Вы находитесь на странице: 1из 53

CSC 4181

Compiler Construction

Parsing

1
Outline
Top-down v.s. Bottom-up
Top-down parsing Bottom-up parsing
 Recursive-descent  Shift-reduce parsers
parsing
 LR(0) parsing
 LL(1) parsing
 LR(0) items
 LL(1) parsing
algorithm  Finite automata of items

 First and follow sets  LR(0) parsing algorithm

 Constructing LL(1)  LR(0) grammar


parsing table  SLR(1) parsing
 Error recovery
 SLR(1) parsing algorithm

 SLR(1) grammar

 Parsing conflict

Parsing 2
Introduction

Parsing is a process that constructs a


syntactic structure (i.e. parse tree) from the
stream of tokens.
We already learned how to describe the
syntactic structure of a language using
(context-free) grammar.
So, a parser only needs to do this?
Stream of tokens
Parser Parse tree
Context-free grammar
Parsing 3
Top–Down Parsing Bottom–Up Parsing
A parse tree is created A parse tree is created
from root to leaves from leaves to root
The traversal of parse trees The traversal of parse trees
is a preorder traversal is a reversal of postorder
Tracing leftmost derivation traversal
Two types: Tracing rightmost
 Backtracking parser derivation
 Predictive parser More powerful than top-
Try different
Backtracking:
down parsing
structures and backtrack if it
does not matched the input Guess the
Predictive:
structure of the parse
tree from the next input

Parsing 4
Parse Trees and Derivations
E
E E+E

E E
 id + E
+
E E  id + E * E
id *
id id  id + id * E
Top-down parsing  id + id * id
E
EE+E
E + E  E + E * E

E E  E + E * id
id *
 E + id * id
id id
Bottom-up parsing  id + id * id

Parsing 5
Top-down Parsing

What does a parser need to decide?


 Which production rule is to be used at each point
of time ?
How to guess?
What is the guess based on?
 What is the next token?
 Reserved word if, open parentheses, etc.
 What is the structure to be built?
 If statement, expression, etc.

Parsing 6
Top-down Parsing

Why is it difficult?
 Cannot decide until later
 Next token: if Structure to be built: St
 St  MatchedSt | UnmatchedSt
 UnmatchedSt 
if (E) St| if (E) MatchedSt else UnmatchedSt
 MatchedSt  if (E) MatchedSt else MatchedSt |...
 Production with empty string
 Next token: id Structure to be built: par
 par  parList | 

 parList  exp , parList | exp

Parsing 7
Recursive-Descent

Write one procedure for each set of


productions with the same nonterminal in
the LHS
Each procedure recognizes a structure
described by a nonterminal.
A procedure calls other procedures if it needs
to recognize other structures.
A procedure calls match procedure if it needs
to recognize a terminal.
Parsing 8
Recursive-Descent: Example
EEOF|F E ::= F {O F} For this grammar:
O+|- O ::= + | -  We cannot decide which
F  ( E ) | id F ::= ( E ) | id rule to use for E, and
 If we choose E  E O F,
procedure F procedure E it leads to infinitely
{ switch token { E; O; F; } recursive loops.
{ case (: match(‘(‘); Rewrite the grammar
E; into EBNF
match(‘)’);
case id: match(id); procedure E
default: error; { F;
} while (token=+ or token=-)
} { O; F; }
}
Parsing 9
Match procedure

procedure match(expTok)
{ if (token==expTok)
then getToken
else error
}
The token is not consumed until getToken
is executed.

Parsing 10
Problems in Recursive-Descent

Difficult to convert grammars into EBNF


Cannot decide which production to use at
each point
Cannot decide when to use -production
A

Parsing 11
LL(1) Parsing

LL(1)
 Read input from (L) left to right
 Simulate (L) leftmost derivation

 1 lookahead symbol

Use stack to simulate leftmost derivation


 Part of sentential form produced in the leftmost
derivation is stored in the stack.
 Top of stack is the leftmost nonterminal symbol
in the fragment of sentential form.

Parsing 12
Concept of LL(1) Parsing

Simulate leftmost derivation of the input.


Keep part of sentential form in the stack.
If the symbol on the top of stack is a
terminal, try to match it with the next input
token and pop it out of stack.
If the symbol on the top of stack is a
nonterminal X, replace it with Y if we have a
production rule X  Y.
 Which production will be chosen, if there are
both X  Y and X  Z ?

Parsing 13
Example of LL(1) Parsing
E TX
FNX
n
F
(E)NX
(TX)NX T
N
( ( n + ( n ) ) * n $
(FNX)NX
(nNX)NX
(nX)NX X
E
(nATX)NX
(n+TX)NX
(n+FNX)NX
A
n
F
+
)
(n+(E)NX)NX E  T X
(n+(TX)NX)NX (
T
N X  A T X | 
(n+(FNX)NX)NX
(n+(nNX)NX)NX
(n+(nX)NX)NX Finished
E
X
M
* A  + | -
(n+(n)NX)NX T  F N
(n+(n)X)NX
(n+(n))NX
F
)
n N  M F N | 
(n+(n))MFNX M  *
(n+(n))*FNX T
N
(n+(n))*nNX F  ( E ) | n
(n+(n))*nX
(n+(n))*n
E
X
Parsing $ 14
LL(1) Parsing Algorithm
Push the start symbol into the stack
WHILE stack is not empty ($ is not on top of stack) and the
stream of tokens is not empty (the next input token is not $)
SWITCH (Top of stack, next token)
CASE (terminal a, a):
Pop stack; Get next token
CASE (nonterminal A, terminal a):
IF the parsing table entry M[A, a] is not empty THEN
Get A X1 X2 ... Xn from the parsing table entry M[A,
a] Pop stack;
Push Xn ... X2 X1 into stack in that order
ELSE Error
CASE ($,$): Accept
OTHER: Error
Parsing 15
LL(1) Parsing Table

If the nonterminal N is on
the top of stack and the
next token is t, which
production rule to use?
Choose a rule N  X t N
X
such that N
Y
X t
 X * tY or Q Y
 X *  and S * WNtY
t … … …

Parsing 16
First Set

Let X be  or be in V or T.
First(X ) is the set of the first terminal in any
sentential form derived from X.
 If X is a terminal or , then First(X ) ={X }.
 If X is a nonterminal and X X1 X2 ... Xn is a
rule, then
 First(X1) -{} is a subset of First(X)
 First(Xi )-{} is a subset of First(X) if for all j<i
First(Xj) contains {}
  is in First(X) if for all j≤n First(Xj)contains 

Parsing 17
Examples of First Set
exp  exp addop term | st  ifst | other
term ifst  if ( exp ) st elsepart
addop  + | - elsepart  else st | 
term  term mulop factor | exp 0|1
factor
mulop  * First(exp) = {0,1}
factor  (exp) | num First(elsepart) = {else, }
First(ifst) = {if}
First(addop) = {+, -}
First(st) = {if, other}
First(mulop) = {*}
First(factor) = {(, num}
First(term) = {(, num}
First(exp) = {(, num}
Parsing 18
Algorithm for finding First(A)

For all terminals a, First(a) = {a} If A is a terminal or ,


then First(A) = {A}.
For all nonterminals A, First(A) := {} If A is a nonterminal,
While there are changes to any First(A) then for each rule A
X1 X2 ... Xn, First(A)
For each rule A  X1 X2 ... Xn contains First(X1) - {}.
For each Xi in {X1, X2, …, Xn } If also for some i<n,
First(X1), First(X2), ...,
If for all j<i First(Xj) contains and First(Xi) contain ,
, then First(A) contains
First(Xi+1)-{}.
Then If First(X1), First(X2), ...,
add First(Xi)-{} to First(A) and First(Xn) contain ,
then First(A) also
If  is in First(X1), First(X2), ..., contains .
and First(Xn)
ParsingThen add  to First(A) 19
Finding First Set: An Example

exp  term exp’ First


exp’  addop term exp’ |  exp
addop  + | - exp’ 
term  factor term’ addop + -
term’  mulop factor term’ |  term ( num
mulop  * term’ 
factor  ( exp ) | num mulop *
factor ( num

Parsing 20
Follow Set

Let $ denote the end of input tokens


If A is the start symbol, then $ is in
Follow(A).
If there is a rule B  X A Y, then First(Y) -
{} is in Follow(A).
If there is production B  X A Y and  is in
First(Y), then Follow(A) contains Follow(B).

Parsing 21
Algorithm for Finding Follow(A)
Follow(S) = {$} If A is the start
FOR each A in V-{S} symbol, then $ is
Follow(A)={} in Follow(A).
WHILE change is made to some Follow sets If there is a rule A 
Y X Z, then
FOR each production A  X1 X2 ... Xn, First(Z) - {} is in
FOR each nonterminal Xi Follow(X).
Add First(Xi+1 Xi+2...Xn)-{} If there is production
into Follow(Xi). B  X A Y and 
(NOTE: If i=n, Xi+1 Xi+2...Xn= ) is in First(Y), then
IF  is in First(Xi+1 Xi+2...Xn) THEN Follow(A) contains
Add Follow(A) to Follow(Xi) Follow(B).

Parsing 22
Finding Follow Set: An Example
exp  term exp’
First Follow
exp’  addop term exp’ | 
exp ( num $)
addop  + | -
term  factor term’ exp’  + - $ )
term’  mulop factor term’ | addop + -
mulop  * term ( num + - $ )
factor  ( exp ) | num term’ *
mulop *
factor ( num

Parsing 23
Constructing LL(1) Parsing Tables

FOR each nonterminal A and a production A  X


FOR each token a in First(X)
A  X is in M(A, a)
IF  is in First(X) THEN
FOR each element a in Follow(A)
Add A  X to M(A, a)

Parsing 24
Example: Constructing LL(1) Parsing Table

First Follow
exp {(, num} {$,)} ( ) + - * n $
exp’ {+,-, } {$,)}
addop {+,-} {(,num} exp
1 1
term {(,num} {+,-,),$}
term’ {*, } {+,-,),$} exp’ 3 2 2 3
mulop {*} {(,num}
factor {(, num} {*,+,-,),$} addop 4 5
1 exp  term exp’
2 exp’  addop term exp’ term 6 6
3 exp’  
4 addop  +
5 addop  - term’ 8 8 8 7 8
6 term  factor term’
7 term’  mulop factor term’ mulop
8 term’   9
9 mulop  *
10 factor  ( exp ) factor
11 factor  num 10 11
Parsing 25
LL(1) Grammar

A grammar is an LL(1) grammar if its LL(1)


parsing table has at most one production in
each table entry.

Parsing 26
LL(1) Parsing Table for non-LL(1) Grammar

1 exp  exp addop term


2 exp  term
3 term  term mulop factor
4 term  factor
5 factor  ( exp ) ( ) + - * num $
6 factor  num exp 1,2 1,2
7 addop  +
8 addop  - term 3,4 3,4
9 mulop  * factor 5 6
addop 7 8
First(exp) = { (, num } mulop 9
First(term) = { (, num }
First(factor) = { (, num }
First(addop) = { +, - }
First(mulop) = { * }
Parsing 27
Causes of Non-LL(1) Grammar

What causes grammar being non-LL(1)?


 Left-recursion
 Left factor

Parsing 28
Left Recursion

Immediate left Can be removed very


recursion easily
 A  A X | Y A=Y X*  A  Y A’, A’  X A’| 
 A  A X1 | A X2 |…| A Xn  A  Y1 A’ | Y2 A’ |...| Ym A’,
| Y1 | Y2 |... | Ym A’  X1 A’| X2 A’|…| Xn A’| 
A={Y1, Y2,…, Ym} {X1, X2, …, Xn}*
General left recursion Can be removed when
 A => X =>* A Y there is no empty-string
production and no cycle
in the grammar

Parsing 29
Removal of Immediate Left Recursion

exp  exp + term | exp - term | term


term  term * factor | factor
factor  ( exp ) | num
Remove left recursion
exp  term exp’ exp = term ( term)*
exp’  + term exp’ | - term exp’ | 
term  factor term’ term = factor (* factor)*
term’  * factor term’ | 
factor  ( exp ) | num
Parsing 30
General Left Recursion

Bad News!
 Can only be removed when there is no empty-
string production and no cycle in the grammar.
Good News!!!!
 Never seen in grammars of any programming
languages

Parsing 31
Left Factoring

Left factor causes non-LL(1)


 Given A  X Y | X Z. Both A  X Y and A  X Z
can be chosen when A is on top of stack and a
token in First(X) is the next token.

AXY|XZ
can be left-factored as
A  X A’ and A’  Y | Z

Parsing 32
Example of Left Factor

ifSt  if ( exp ) st else st | if ( exp ) st


can be left-factored as
ifSt  if ( exp ) st elsePart
elsePart  else st | 

seq  st ; seq | st
can be left-factored as
seq  st seq’
seq’ ; seq | 

Parsing 33
Bottom-up Parsing

Use explicit stack to perform a parse


Simulate rightmost derivation (R) from left
(L) to right, thus called LR parsing
More powerful than top-down parsing
 Left recursion does not cause problem
Two actions
 Shift: take next input token into the stack
 Reduce: replace a string B on top of stack by a
nonterminal A, given a production A  B

Parsing 34
Example of Shift-reduce Parsing

Grammar
S’  S
S  (S)S |  Reverse of
Parsing actions rightmost derivation
Stack Input Action from left to right
$ (())$ shift 1 (())
$( ())$ shift 2 (())
$(( ))$ reduce S  3 (())
$((S ))$ shift 4 ((S))
$((S) )$ reduce S  5 ((S))
$((S)S )$ reduce S (S)S 6 ((S)S)
$(S )$ shift 7 (S)
$(S) $ reduce S  8 (S)
$(S)S $ reduce S (S)S 9 (S)S
$S $ accept 10 S’ S

Parsing 35
Example of Shift-reduce Parsing

Grammar
S’  S
S  (S)S | 
Parsing actions
Stack Input Action
$ (())$ shift 1 (()) handle
$( ())$ shift 2 (())
$(( ))$ reduce S  3 (())
$((S ))$ shift 4 ((S))
$((S) )$ reduce S  5 ((S))
$((S)S )$ reduce S (S)S 6 ((S)S)
$(S )$ shift 7 (S)
$(S) $ reduce S  8 (S)
$(S)S $ reduce S (S)S 9 (S)S
$S $ accept 10 S’ S
Viable prefix

Parsing 36
Terminologies
Right sentential form Right sentential form
 sentential form in a rightmost  (S)S
derivation  ((S)S)
Viable prefix Viable prefix
 ( S ) S, ( S ), ( S, (
 sequence of symbols on the
 ( ( S ) S, ( ( S ), ( ( S , ( (, (
parsing stack
Handle
Handle  ( S ) S. with S  
 right sentential form +  ( S ) S . with S  
position where reduction can  ( ( S ) S . ) with S  ( S ) S
be performed + production
used for reduction LR(0) item
 S  ( S ) S.
LR(0) item  S  (S).S
 production with distinguished  S  (S.)S
position in its RHS  S  (.S)S
 S  .(S)S

Parsing 37
Shift-reduce parsers

There are two possible actions:


 shift and reduce
Parsing is completed when
 the input stream is empty and
 the stack contains only the start symbol

The grammar must be augmented


 a new start symbol S’ is added
 a production S’  S is added
 To make sure that parsing is finished when S’ is on
top of stack because S’ never appears on the RHS of
any production.
Parsing 38
LR(0) parsing

Keep track of what is left to be done in the


parsing process by using finite automata of
items
 An item A  w . B y means:
 A  w B y might be used for the reduction in the
future,
 at the time, we know we already construct w in the
parsing process,
 if B is constructed next, we get the new item
AwB.Y

Parsing 39
LR(0) items

LR(0) item
 production with a distinguished position in the RHS
Initial Item
 Item with the distinguished position on the leftmost of
the production
Complete Item
 Item with the distinguished position on the rightmost of
the production
Closure Item of x
 Item x together with items which can be reached from x
via -transition
Kernel Item
 Original item, not including closure items
Parsing 40
Finite automata of items

Grammar: S
S’  .S S’  S.
S’  S
 
S  (S)S
S  S  .(S)S S.
Items:
S’  .S (  
S’  S.
S  .(S)S S
 S  (.S)S S  (S.)S 
S  (.S)S
S  (S.)S )

S  (S).S
S
S  (S)S. S  (S).S S  (S)S.
S.

Parsing 41
DFA of LR(0) Items

S S’  .S S S’  S.
S’  .S S’  S. S  .(S)S
 S.

S  (S.)S
S  .(S)S
S. ( S

  )
( S  (.S)S
S S  .(S)S
S  (.S)S S  (S.)S S.
) ( ( S  (S).S
S  .(S)S
 S.
S  (S).S

S S

S  (S)S.
S  (S)S.

Parsing 42
LR(0) parsing algorithm
Item in state token Action
A-> x.By where B is terminal B shift B and push state s
containing A -> xB.y
A-> x.By where B is terminal not B error
A -> x. - reduce with A -> x (i.e. pop x,
backup to the state s on top of
stack) and push A with new
state d(s,A)
S’ -> S. none accept
S’ -> S. any error

Parsing 43
LR(0) Parsing Table

A’  .A A A’  A. 1
A  .(A) State Action Rule ( a ) A
A  .a 0 a
A  a. 2
0 shift 3 2 1
( a 1 reduce A’ -> A
A  (.A)
A  (A.) 4
2 reduce A -> a
A  .(A)
A  .a 3 A 3 shift 3 2 4
)
( 4 shift 5
A  (A). 5 5 reduce A -> (A)

Parsing 44
Example of LR(0) Parsing
State Action Rule ( a ) A
0 shift 3 2 1
1 reduce A’ -> A
2 reduce A -> a
3 shift 3 2 4
4 shift 5
5 reduce A -> (A)
Stack Input Action
$0 ((a))$ shift
$0(3 (a))$ shift
$0(3(3 a))$ shift
$0(3(3a2 ))$ reduce
$0(3(3A4 ))$ shift
$0(3(3A4)5 )$ reduce
$0(3A4 )$ shift
$0(3A4)5 $ reduce
$0A1 $ accept
Parsing 45
Non-LR(0)Grammar

Conflict
S’  .S S S’  S. 1
 Shift-reduce conflict S  .(S)S
A state contains a S .
 0 S  (S.)S 3
complete item A  x. and (
a shift item A  x.By S

 Reduce-reduce conflict S  (.S)S )


S  .(S)S
 A state contains more S .
2
than one complete items.
( ( S  (S).S
S  .(S)S
A grammar is a LR(0) S .
4
grammar if there is no
S
conflict in the
grammar. S  (S)S. 5

Parsing 46
SLR(1) parsing

Simple LR with 1 lookahead symbol


Examine the next token before deciding to
shift or reduce
 If the next token is the token expected in an
item, then it can be shifted into the stack.
 If a complete item A  x. is constructed and the
next token is in Follow(A), then reduction can be
done using A  x.
 Otherwise, error occurs.

Can avoid conflict

Parsing 47
SLR(1) parsing algorithm

Item in state token Action


A-> x.By (B is terminal) B shift B and push state s containing
A -> xB.y
A-> x.By (B is terminal) not B error
A -> x. in reduce with A -> x (i.e. pop x,
Follow(A) backup to the state s on top of
stack) and push A with new state
d(s,A)
A -> x. not in error
Follow(A)
S’ -> S. none accept
S’ -> S. any error

Parsing 48
SLR(1) grammar

Conflict
 Shift-reduce conflict
 A state contains a shift item A  x.Wy such that W is
a terminal and a complete item B  z. such that W
is in Follow(B).
 Reduce-reduce conflict
 A state contains more than one complete item with
some common Follow set.
A grammar is an SLR(1) grammar if there is
no conflict in the grammar.

Parsing 49
SLR(1) Parsing Table

A  (A) | a
State ( a ) $ A
0 S3 S2 1
A’  .A A A’  A. 1 1 AC
A  .(A)
A  .a 0 a 2 R2
A  a. 2 3 S3 S2 4
( a
A  (.A)
4 S5
A  .(A)
A  .a 3 A A  (A.) 4
5 R1
)

( A  (A). 5

Parsing 50
SLR(1) Grammar not LR(0)
S’  .S S
S  .(S)S S’  S. 1 S  (S)S | 
S. 0
S  (S.)S 3
( S
S  (.S)S ) State ( ) $ S
S  .(S)S
S. 2 0 S2 R2 R2 1
S  (S).S 1 AC
( ( S  .(S)S 2 S2 R2 R2 3
S. 4 3 S4
S 4 S2 R2 R2 5
5 R1 R1
S (S)S. 5

Parsing 51
Disambiguating Rules for Parsing Conflict

Shift-reduce conflict
 Prefer shift over reduce
 In case of nested if statements, preferring shift over
reduce implies most closely nested rule for dangling
else
Reduce-reduce conflict
 Error in design

Parsing 52
Dangling Else
S’  .S S S’  S. 1
0 I  if S else .S 6
S  .I I I S  .I
S  .other S  I. 2 S  .other
I  .if S I  .if S S
I  .if S else I I  .if S else S
S if
else
I  .if S else S 7
other other
if state if else other $ S I
other
0 S4 S3 1 2
S  .other 3 I  if .S 4 1 ACC
other
I  if .S else S
I  if S. 5 S  .I 2 R1 R1
I  if S. else S S S  .other 3 R2 R2
I  .if S
4 S4 S3 5 2
I  .if S else S
5 S6 R3
if
6 S4 S3 7 2
7 R4 R4
Parsing 53

Вам также может понравиться