Outline
- Top-down vs. bottom-up
- Top-down parsing
  - Recursive-descent parsing
  - LL(1) parsing
    - LL(1) parsing algorithm
    - First and follow sets
    - Constructing LL(1) parsing table
    - Error recovery
- Bottom-up parsing
  - Shift-reduce parsers
  - LR(0) parsing
    - LR(0) items
    - Finite automata of items
    - LR(0) parsing algorithm
    - LR(0) grammar
  - SLR(1) parsing
    - SLR(1) parsing algorithm
    - SLR(1) grammar
    - Parsing conflict
2301373
Chapter 4 Parsing
Introduction
Parsing is a process that constructs a syntactic structure (i.e. a parse tree) from a stream of tokens. We have already learned how to describe the syntactic structure of a language using a (context-free) grammar. So, is this all a parser needs to do?
[Figure: the parser takes a stream of tokens and a context-free grammar as input and produces a parse tree.]
Top-down vs. Bottom-up Parsing
Top-down parsing
- A parse tree is created from root to leaves.
- The traversal of the parse tree is a preorder traversal.
- Traces a leftmost derivation.
- Two types:
  - Backtracking parser: try different structures and backtrack if one does not match the input.
  - Predictive parser: guess the structure of the parse tree from the next input.
Bottom-up parsing
- A parse tree is created from leaves to root.
- The traversal of the parse tree is the reverse of a postorder traversal.
- Traces a rightmost derivation.
- More powerful than top-down parsing.
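Example: parsing id + id * id with the grammar E → E + E | E * E | id
Top-down parsing (leftmost derivation):
  E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Bottom-up parsing (rightmost derivation, traced in reverse):
  E ⇒ E + E ⇒ E + E * E ⇒ E + E * id ⇒ E + id * id ⇒ id + id * id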
Top-down Parsing
What does a parser need to decide?
Why is it difficult?
Next token: if
Structure to be built: St
  St → MatchedSt | UnmatchedSt
  UnmatchedSt → if (E) St | if (E) MatchedSt else UnmatchedSt
  MatchedSt → if (E) MatchedSt else MatchedSt | ...
Both alternatives for St can begin with if, so the next token alone does not tell the parser which production to choose.
Recursive-Descent
- Write one procedure for each set of productions with the same nonterminal on the LHS.
- Each procedure recognizes a structure described by a nonterminal.
- A procedure calls other procedures if it needs to recognize other structures.
- A procedure calls the match procedure if it needs to recognize a terminal.
Recursive-Descent: Example
For this grammar:
  E → E O F | F
  O → + | -
  F → ( E ) | id
We cannot decide which rule to use for E, and if we choose E → E O F, it leads to infinitely recursive loops:
  procedure E
  { E; O; F; }
Rewrite the grammar into EBNF:
  E ::= F {O F}
  O ::= + | -
  F ::= ( E ) | id
  procedure E
  { F;
    while (token=+ or token=-)
    { O; F; }
  }
  procedure F
  { switch token
    { case (: match((); E; match());
      case id: match(id);
      default: error;
    }
  }
Match procedure
procedure match(expTok)
{ if (token==expTok)
  then getToken
  else error
}
The token is not consumed until getToken is executed.
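The procedures for the EBNF grammar E ::= F {O F}, O ::= + | -, F ::= ( E ) | id can be sketched in Python roughly as follows. This is a minimal illustration, not the course's reference code: the `Parser` class, the `ParseError` exception, the `accepts` helper, and the use of a list of token strings ending in `$` are all assumptions made for the example.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar (hypothetical name)."""


class Parser:
    def __init__(self, tokens):
        # "$" is appended as an end-of-input marker (an assumption of this sketch).
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def token(self):
        # The current lookahead token; reading it does not consume it.
        return self.tokens[self.pos]

    def match(self, expected):
        # Mirrors procedure match: consume the token only if it is the expected one.
        if self.token == expected:
            self.pos += 1  # plays the role of getToken
        else:
            raise ParseError(f"expected {expected!r}, found {self.token!r}")

    def E(self):
        # E ::= F {O F}
        self.F()
        while self.token in ("+", "-"):
            self.O()
            self.F()

    def O(self):
        # O ::= + | -
        if self.token in ("+", "-"):
            self.match(self.token)
        else:
            raise ParseError(f"expected an operator, found {self.token!r}")

    def F(self):
        # F ::= ( E ) | id
        if self.token == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.token == "id":
            self.match("id")
        else:
            raise ParseError(f"unexpected token {self.token!r}")


def accepts(tokens):
    """Return True if the whole token list is derived from E."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For instance, `accepts(["id", "+", "(", "id", "-", "id", ")"])` runs E, which calls F and then loops over the operators, while `accepts(["id", "+"])` fails inside F because no operand follows the operator.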
Problems in Recursive-Descent
- Difficult to convert grammars into EBNF
- Cannot decide which production to use at each point
- Cannot decide when to use an ε-production A → ε
LL(1) Parsing
LL(1) stands for:
- (L) Read input from left to right
- (L) Simulate leftmost derivation
- (1) 1 lookahead symbol
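Example: LL(1) parsing of the input ( n + ( n ) ) * n $ with the grammar
  E → T X
  X → A T X | ε
  A → + | -
  T → F N
  N → M F N | ε
  M → *
  F → ( E ) | n
[Figure: step-by-step contents of the parsing stack while the input is consumed, ending with "Finished".]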
LL(1) Parsing Algorithm
[The earlier cases of the listing are not recoverable.]
    ELSE Error
  CASE ($,$): Accept
  OTHER: Error
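A table-driven LL(1) parser can be sketched as below. This is a sketch under stated assumptions, not the listing from the slide: it uses the grammar E → T X, X → A T X | ε, A → + | -, T → F N, N → M F N | ε, M → *, F → ( E ) | n (the `-` alternative of A is an assumption), the `TABLE` entries were worked out by hand for this example, and the function name `ll1_parse` is hypothetical.

```python
NONTERMINALS = {"E", "X", "A", "T", "N", "M", "F"}

# Parsing table: (nonterminal on top of stack, next token) -> right-hand side.
# An empty list stands for an epsilon-production.
TABLE = {
    ("E", "("): ["T", "X"], ("E", "n"): ["T", "X"],
    ("X", "+"): ["A", "T", "X"], ("X", "-"): ["A", "T", "X"],
    ("X", ")"): [], ("X", "$"): [],
    ("A", "+"): ["+"], ("A", "-"): ["-"],
    ("T", "("): ["F", "N"], ("T", "n"): ["F", "N"],
    ("N", "*"): ["M", "F", "N"],
    ("N", "+"): [], ("N", "-"): [], ("N", ")"): [], ("N", "$"): [],
    ("M", "*"): ["*"],
    ("F", "("): ["(", "E", ")"], ("F", "n"): ["n"],
}


def ll1_parse(tokens, start="E"):
    """Return True if the tokens are accepted, False on a parsing error."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # the top of the stack is the end of the list
    i = 0
    while True:
        top, tok = stack[-1], tokens[i]
        if top == "$" and tok == "$":
            return True  # CASE ($,$): accept
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is None:
                return False  # empty table entry: error
            stack.pop()
            # Push the right-hand side reversed so its first symbol ends up on top.
            stack.extend(reversed(rhs))
        elif top == tok:
            stack.pop()  # terminal on top matches the next token
            i += 1
        else:
            return False  # OTHER: error
```

Calling `ll1_parse(list("(n+(n))*n"))` replays the example input one character per token.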
First Set
Let X be ε, a nonterminal (in V), or a terminal (in T). First(X) is the set of terminals that can appear first in any sentential form derived from X.
- If X is a terminal or ε, then First(X) = {X}.
- If X is a nonterminal and X → X1 X2 ... Xn is a rule, then
  - First(X1) - {ε} is a subset of First(X);
  - First(Xi) - {ε} is a subset of First(X) if, for all j < i, First(Xj) contains ε;
  - ε is in First(X) if, for all j ≤ n, First(Xj) contains ε.
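The rules above can be applied repeatedly until no First set changes. The following Python sketch does this; the grammar representation (a dict from nonterminal to a list of right-hand sides, with `[]` for an ε-production), the name `first_sets`, and the example expression grammar are assumptions made for illustration.

```python
EPS = "ε"


def first_sets(grammar):
    """Compute First(A) for every nonterminal A by fixed-point iteration."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[A])
                all_nullable = True
                for X in rhs:
                    # First of a terminal is the terminal itself.
                    fx = first[X] if X in grammar else {X}
                    first[A] |= fx - {EPS}
                    if EPS not in fx:
                        all_nullable = False
                        break  # later symbols cannot contribute
                if all_nullable:
                    first[A].add(EPS)  # every Xj can derive ε (or rhs is empty)
                if len(first[A]) != before:
                    changed = True
    return first


# Hypothetical example: an expression grammar without left recursion.
GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["factor", "term'"]],
    "term'": [["mulop", "factor", "term'"], []],
    "mulop": [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}
FIRST = first_sets(GRAMMAR)
```

With this grammar, `FIRST["exp"]` is `{"(", "num"}` and `FIRST["term'"]` is `{"*", "ε"}`.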
[Worked example of First sets: the table is not recoverable; surviving entries include {ε, *} and {(, num}.]
Follow Set
Let $ denote the end of the input tokens.
- If A is the start symbol, then $ is in Follow(A).
- If there is a rule B → X A Y, then First(Y) - {ε} is a subset of Follow(A).
- If there is a production B → X A Y and ε is in First(Y), then Follow(A) contains Follow(B).
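As with First sets, the three rules can be iterated until nothing changes. The sketch below assumes the First sets are already known and passed in as a dict; the function names, the grammar representation, and the example grammar with its hand-written `FIRST` table are assumptions for illustration only.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """First set of a sequence of symbols; terminals are their own First set."""
    out = set()
    for X in symbols:
        fx = first.get(X, {X})
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)  # every symbol can vanish (or the sequence is empty)
    return out


def follow_sets(grammar, first, start):
    follow = {A: set() for A in grammar}
    follow[start].add("$")  # rule 1
    changed = True
    while changed:
        changed = False
        for B, alternatives in grammar.items():
            for rhs in alternatives:
                for i, A in enumerate(rhs):
                    if A not in grammar:
                        continue  # Follow is only defined for nonterminals
                    fy = first_of_string(rhs[i + 1:], first)
                    new = fy - {EPS}  # rule 2
                    if EPS in fy:
                        new |= follow[B]  # rule 3
                    if not new <= follow[A]:
                        follow[A] |= new
                        changed = True
    return follow


# Hypothetical example grammar and its First sets (written out by hand).
GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["factor", "term'"]],
    "term'": [["mulop", "factor", "term'"], []],
    "mulop": [["*"]],
    "factor": [["(", "exp", ")"], ["num"]],
}
FIRST = {
    "exp": {"(", "num"}, "exp'": {"+", "-", EPS}, "addop": {"+", "-"},
    "term": {"(", "num"}, "term'": {"*", EPS}, "mulop": {"*"},
    "factor": {"(", "num"},
}
FOLLOW = follow_sets(GRAMMAR, FIRST, "exp")
```

Note that an empty Y is handled by `first_of_string` returning {ε}, so a nonterminal at the end of a right-hand side inherits Follow(B) through rule 3.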
[Worked example of First and Follow sets: the table is not recoverable from the source.]
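Grammar, with numbered productions:
   1  exp → term exp'
   2  exp' → addop term exp'
   3  exp' → ε
   4  addop → +
   5  addop → -
   6  term → factor term'
   7  term' → mulop factor term'
   8  term' → ε
   9  mulop → *
  10  factor → ( exp )
  11  factor → num
LL(1) parsing table (entries are production numbers):
            (     num   )     +     -     *     $
  exp       1     1
  exp'                  3     2     2           3
  addop                       4     5
  term      6     6
  term'                 8     8     8     7     8
  mulop                                   9
  factor    10    11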
LL(1) Grammar
A grammar is an LL(1) grammar if its LL(1) parsing table has at most one production in each table entry.
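One way to check this definition mechanically is to build the table and look for entries holding more than one production. The sketch below uses the standard construction (for a production A → α, put it in M[A, a] for every terminal a in First(α), and in M[A, b] for every b in Follow(A) when ε is in First(α)). The function names and both tiny example grammars, with their hand-written First and Follow sets, are assumptions for illustration.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """First set of a sequence of symbols; terminals are their own First set."""
    out = set()
    for X in symbols:
        fx = first.get(X, {X})
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)
    return out


def build_table(grammar, first, follow):
    """Map (nonterminal, lookahead) to the list of productions placed there."""
    table = {}
    for A, alternatives in grammar.items():
        for rhs in alternatives:
            fa = first_of_string(rhs, first)
            lookaheads = fa - {EPS}
            if EPS in fa:
                lookaheads |= follow[A]
            for a in lookaheads:
                table.setdefault((A, a), []).append(rhs)
    return table


def is_ll1(table):
    """LL(1) means at most one production in each table entry."""
    return all(len(productions) <= 1 for productions in table.values())


# S -> ( S ) S | ε  : every entry holds a single production.
T1 = build_table({"S": [["(", "S", ")", "S"], []]},
                 {"S": {"(", EPS}}, {"S": {")", "$"}})

# A -> x y | x z  : both productions land in the entry (A, x).
T2 = build_table({"A": [["x", "y"], ["x", "z"]]},
                 {"A": {"x"}}, {"A": {"$"}})
```

Here `is_ll1(T1)` holds, while `T2[("A", "x")]` contains two productions, so the second grammar is not LL(1).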
[Parsing table example: entries such as "1,2" and "3,4" list more than one production in a single cell; the rest of the table is not recoverable.]
Left Recursion
Immediate left recursion
  A → A X | Y
    is rewritten as  A → Y A',  A' → X A' | ε
  A → A X1 | A X2 | ... | A Xn | Y1 | Y2 | ... | Ym
    is rewritten as  A → Y1 A' | Y2 A' | ... | Ym A',  A' → X1 A' | X2 A' | ... | Xn A' | ε
General left recursion
  A ⇒ X ⇒* A Y
  Can be removed when there is no empty-string production and no cycle in the grammar
Removal of Immediate Left Recursion
  exp → exp + term | exp - term | term
  term → term * factor | factor
  factor → ( exp ) | num
Remove left recursion:
  exp → term exp'                          [exp = term (± term)*]
  exp' → + term exp' | - term exp' | ε
  term → factor term'                      [term = factor (* factor)*]
  term' → * factor term' | ε
  factor → ( exp ) | num
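The rewriting of immediate left recursion is mechanical, so it can be sketched as a small Python function. The function name, the list-of-symbols representation, and the convention of naming the new nonterminal by appending a prime are assumptions of this sketch; it handles one nonterminal at a time and assumes there is no rule of the form A → A.

```python
def remove_immediate_left_recursion(A, alternatives):
    """Rewrite A -> A X1 | ... | A Xn | Y1 | ... | Ym without left recursion."""
    new = A + "'"
    # The X_i: what follows A in each left-recursive alternative.
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == A]
    # The Y_j: alternatives that do not start with A.
    base = [rhs for rhs in alternatives if not rhs or rhs[0] != A]
    if not recursive:
        return {A: alternatives}  # nothing to do
    return {
        A: [y + [new] for y in base],                # A  -> Yj A'
        new: [x + [new] for x in recursive] + [[]],  # A' -> Xi A' | ε
    }


RESULT = remove_immediate_left_recursion(
    "exp", [["exp", "+", "term"], ["exp", "-", "term"], ["term"]]
)
```

Applied to the exp rule, `RESULT` holds exp → term exp' and exp' → + term exp' | - term exp' | ε, with the empty list standing for ε.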
General left recursion can only be removed when there is no empty-string production and no cycle in the grammar.
Good news: it is never seen in the grammars of any programming language.
Left Factoring
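A common left factor makes a grammar non-LL(1).
Given A → X Y | X Z, both A → X Y and A → X Z can be chosen when A is on top of the stack and a token in First(X) is the next token. Left factoring rewrites the rule as A → X A', A' → Y | Z, so that the choice is postponed until X has been recognized.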
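Left factoring rewrites A → X Y | X Z as A → X A', A' → Y | Z. A one-level version of that rewriting can be sketched in Python as follows; the function names, the grouping by first symbol, and the priming scheme for new nonterminals are assumptions made for this example, and a full implementation would repeat the step until no common prefixes remain.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for column in zip(*sequences):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(A, alternatives):
    """Factor out common prefixes among the alternatives of A (one level only)."""
    groups = {}
    for rhs in alternatives:
        key = rhs[0] if rhs else None  # group by first symbol; None marks ε
        groups.setdefault(key, []).append(rhs)

    result = {A: []}
    count = 0
    for key, alts in groups.items():
        if key is None or len(alts) == 1:
            result[A].extend(alts)  # nothing to factor in this group
            continue
        count += 1
        new = A + "'" * count  # A', A'', ... for successive groups
        prefix = common_prefix(alts)
        result[A].append(prefix + [new])
        result[new] = [alt[len(prefix):] for alt in alts]
    return result
```

For example, `left_factor("A", [["X", "Y"], ["X", "Z"]])` yields A → X A' and A' → Y | Z, and the if-statement rule S → if E S | if E S else S | other becomes S → if E S S' | other with S' → ε | else S.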
Bottom-up Parsing
- Use an explicit stack to perform a parse
- Simulate rightmost derivation (R) from left (L) to right, thus called LR parsing
- More powerful than top-down parsing
- Two actions
  - Shift: take the next input token onto the stack
  - Reduce: replace a string B on top of the stack by a nonterminal A, given a production A → B
Example of Shift-reduce Parsing
Grammar: S' → S, S → ( S ) S | ε. The sequence of reductions is the reverse of a rightmost derivation.
       Stack      Input     Action
   1   $          (())$     shift
   2   $(         ())$      shift
   3   $((        ))$       reduce S → ε
   4   $((S       ))$       shift
   5   $((S)      )$        reduce S → ε
   6   $((S)S     )$        reduce S → ( S ) S
   7   $(S        )$        shift
   8   $(S)       $         reduce S → ε
   9   $(S)S      $         reduce S → ( S ) S
  10   $S         $         accept
The contents of the stack at each step form a viable prefix; the string on top of the stack that is replaced in a reduce step, together with the production used, is the handle.
Terminologies
- Right sentential form: a sentential form in a rightmost derivation
- Viable prefix: a sequence of symbols on the parsing stack
- Handle: a right sentential form + the position where a reduction can be performed + the production used for the reduction
- LR(0) item: a production with a distinguished position in its RHS
Shift-reduce parsers
There are two possible actions: shift and reduce.
The grammar is augmented with a new start production S' → S to make sure that parsing is finished when S' is on top of the stack, because S' never appears on the RHS of any production.
LR(0) parsing
Keep track of what is left to be done in the parsing process by using a finite automaton of items.
An item A → w . B y means:
- A → w B y might be used for a reduction in the future;
- at this time, we know we have already constructed w in the parsing process;
- if B is constructed next, we get the new item A → w B . y
LR(0) items
- LR(0) item: a production with a distinguished position in the RHS
- Initial item: an item with the distinguished position at the leftmost end of the production
- Complete item: an item with the distinguished position at the rightmost end of the production
- Closure item of x: item x together with the items which can be reached from x via ε-transitions
- Kernel item: an original item, not including closure items
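The closure of a set of items, and the move over a grammar symbol that creates the next set of kernel items, can be sketched in Python as below. Representing an item as a tuple `(lhs, rhs, dot)` and the names `closure` and `goto` are assumptions of this sketch; following an ε-transition is modelled by adding the initial items of the nonterminal that appears right after the dot.

```python
def closure(items, grammar):
    """Add every item reachable by ε-transitions from the given items."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            B = rhs[dot]  # nonterminal right after the distinguished position
            for production in grammar[B]:
                item = (B, production, 0)  # initial item B -> . production
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


def goto(items, symbol, grammar):
    """Move the dot over `symbol` in every item that allows it, then close."""
    kernel = {
        (lhs, rhs, dot + 1)
        for (lhs, rhs, dot) in items
        if dot < len(rhs) and rhs[dot] == symbol
    }
    return closure(kernel, grammar)


# Augmented grammar S' -> S, S -> ( S ) S | ε; the empty tuple stands for ε.
GRAMMAR = {"S'": [("S",)], "S": [("(", "S", ")", "S"), ()]}
STATE0 = closure({("S'", ("S",), 0)}, GRAMMAR)
```

`STATE0` contains the kernel item S' → .S plus the closure items S → .(S)S and S → . (the last one is both initial and complete).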
Items of the grammar S' → S, S → (S)S | ε:
  S' → .S
  S' → S.
  S → .(S)S
  S → (.S)S
  S → (S.)S
  S → (S).S
  S → (S)S.
  S → .
[Figure: finite automaton of these items, with transitions labelled S, ( and ), and ε-transitions from each item whose distinguished position precedes S to the initial items S → .(S)S and S → .]
[Figure: DFA of LR(0) item sets for A' → A, A → (A) | a; for example, state 3 contains A → (.A), A → .(A) and A → .a]
LR(0) parsing table:
  State   Action   Rule         (    a    )    A
  0       shift                 3    2         1
  1       reduce   A' → A
  2       reduce   A → a
  3       shift                 3    2         4
  4       shift                           5
  5       reduce   A → (A)
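Parsing ((a)) with this table:
       Stack                 Input     Action
  1    $ 0                   ((a))$    shift
  2    $ 0 ( 3               (a))$     shift
  3    $ 0 ( 3 ( 3           a))$      shift
  4    $ 0 ( 3 ( 3 a 2       ))$       reduce A → a
  5    $ 0 ( 3 ( 3 A 4       ))$       shift
  6    $ 0 ( 3 ( 3 A 4 ) 5   )$        reduce A → (A)
  7    $ 0 ( 3 A 4           )$        shift
  8    $ 0 ( 3 A 4 ) 5       $         reduce A → (A)
  9    $ 0 A 1               $         accept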
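The LR(0) table for A' → A, A → (A) | a can be driven by a short loop. The sketch below is an illustration under assumptions: the dictionaries `ACTION`, `RULES` and `GOTO` transcribe the table by hand, only state numbers are kept on the stack, and the function name `lr0_parse` and the returned trace format are hypothetical.

```python
# Each state has exactly one action in an LR(0) table: shift, or reduce by a rule.
ACTION = {0: "shift", 1: "A' -> A", 2: "A -> a",
          3: "shift", 4: "shift", 5: "A -> (A)"}
# Rule -> (left-hand side, number of symbols on the right-hand side).
RULES = {"A' -> A": ("A'", 1), "A -> a": ("A", 1), "A -> (A)": ("A", 3)}
# Transitions of the DFA on terminals and on the nonterminal A.
GOTO = {(0, "("): 3, (0, "a"): 2, (0, "A"): 1,
        (3, "("): 3, (3, "a"): 2, (3, "A"): 4,
        (4, ")"): 5}


def lr0_parse(tokens):
    """Return (accepted, list of actions taken)."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of states; the symbols are implied by the states
    i = 0
    trace = []
    while True:
        action = ACTION[stack[-1]]
        if action == "shift":
            nxt = GOTO.get((stack[-1], tokens[i]))
            if nxt is None:
                return False, trace  # no transition on this token: error
            trace.append("shift")
            stack.append(nxt)
            i += 1
        else:
            lhs, length = RULES[action]
            if lhs == "A'":
                # Reducing by A' -> A means acceptance if the input is used up.
                accepted = tokens[i] == "$"
                if accepted:
                    trace.append("accept")
                return accepted, trace
            trace.append("reduce " + action)
            del stack[-length:]  # pop one state per right-hand-side symbol
            stack.append(GOTO[(stack[-1], lhs)])
```

Running `lr0_parse("((a))")` takes three shifts, then alternates reduce and shift, and ends with accept.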
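Non-LR(0) Grammar
Conflict
- Shift-reduce conflict: a state contains a complete item A → x. and a shift item A → x.By
- Reduce-reduce conflict: a state contains more than one complete item
Example: in the DFA of item sets for S → (S)S | ε, state 0 contains S' → .S, S → .(S)S and the complete item S → ., so the grammar is not LR(0).
[Figure: DFA of LR(0) item sets for S → (S)S | ε, states 0 to 5.]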
SLR(1) parsing
- Simple LR with 1 lookahead symbol
- Examine the next token before deciding to shift or reduce
  - If the next token is the token expected in an item, then it can be shifted onto the stack.
  - If a complete item A → x. is constructed and the next token is in Follow(A), then reduction can be done using A → x.
  - Otherwise, an error occurs.
SLR(1) grammar
Conflict
- Shift-reduce conflict: a state contains a shift item A → x.Wy such that W is a terminal, and a complete item B → z. such that W is in Follow(B).
- Reduce-reduce conflict: a state contains more than one complete item, and their Follow sets have a common element.
Example: SLR(1) parsing table for A' → A, A → (A) | a (rule 1: A → (A), rule 2: A → a)
  State 0: A' → .A, A → .(A), A → .a
  State 1: A' → A.
  State 2: A → a.
  State   (     a     )     $     A
  0       S3    S2                1
  1                         AC
  2                   R2    R2
  3       S3    S2                4
  4                   S5
  5                   R1    R1
Example: SLR(1) parsing table for S → (S)S | ε (rule 1: S → (S)S, rule 2: S → ε)
  State   (     )     $     S
  0       S2    R2    R2    1
  1                   AC
  2       S2    R2    R2    3
  3             S4
  4       S2    R2    R2    5
  5             R1    R1
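The SLR(1) table for S → (S)S | ε can be driven in the same way as an LR(0) table, except that the action is looked up by state and next token. The sketch below transcribes the table by hand into `ACTION` and `GOTO`; those dictionaries, the `RULES` encoding and the function name `slr_parse` are assumptions made for illustration.

```python
# Rule number -> (left-hand side, number of symbols on the right-hand side).
RULES = {1: ("S", 4), 2: ("S", 0)}  # 1: S -> ( S ) S,  2: S -> ε

# (state, next token) -> ("s", state to shift to) | ("r", rule) | ("acc", None)
ACTION = {
    (0, "("): ("s", 2), (0, ")"): ("r", 2), (0, "$"): ("r", 2),
    (1, "$"): ("acc", None),
    (2, "("): ("s", 2), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, ")"): ("s", 4),
    (4, "("): ("s", 2), (4, ")"): ("r", 2), (4, "$"): ("r", 2),
    (5, ")"): ("r", 1), (5, "$"): ("r", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 3, (4, "S"): 5}


def slr_parse(tokens):
    """Return True if the token sequence is a balanced parenthesis string."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of states
    i = 0
    while True:
        entry = ACTION.get((stack[-1], tokens[i]))
        if entry is None:
            return False  # empty table entry: error
        kind, arg = entry
        if kind == "acc":
            return True
        if kind == "s":
            stack.append(arg)
            i += 1
        else:
            lhs, length = RULES[arg]
            if length:  # for S -> ε nothing is popped
                del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
```

The guard on `length` matters in Python because `del stack[-0:]` would clear the whole stack; with it, reducing by S → ε only pushes the goto state.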
- Shift-reduce conflict: in the case of nested if statements, preferring shift over reduce implements the most-closely-nested rule for the dangling else.
- Reduce-reduce conflict: an error in the design of the grammar.
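Dangling Else
Grammar: S' → S, S → I | other, I → if S | if S else S
  State 0: S' → .S, S → .I, S → .other, I → .if S, I → .if S else S
  State 1: S' → S.
  State 2: S → I.
[Figure: the remaining DFA states and the SLR(1) parsing table for this grammar are not recoverable.]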