
Lecture 6
Error Handling
Syntax-Directed Translation
Recursive Descent Parsing

Announcements

• PA1
  – Due today at midnight
  – README, test case
  – Your name(s)!
• WA1
  – Due today at 5pm
• PA2
  – Assigned today
• WA2
  – Assigned Tuesday

Prof. Aiken CS 143 Lecture 6

Outline

• Extensions of CFG for parsing
  – Precedence declarations
  – Error handling
  – Semantic actions
• Constructing a parse tree
• Recursive descent

Error Handling

• Purpose of the compiler is
  – To detect non-valid programs
  – To translate the valid ones
• Many kinds of possible errors (e.g. in C)

  Error kind   | Example               | Detected by …
  Lexical      | …$…                   | Lexer
  Syntax       | … x *% …              | Parser
  Semantic     | … int x; y = x(3); …  | Type checker
  Correctness  | your favorite program | Tester/User

Syntax Error Handling

• Error handler should
  – Report errors accurately and clearly
  – Recover from an error quickly
  – Not slow down compilation of valid code
• Good error handling is not easy to achieve

Approaches to Syntax Error Recovery

• From simple to complex
  – Panic mode
  – Error productions
  – Automatic local or global correction
• Not all are supported by all parser generators
Error Recovery: Panic Mode

• Simplest, most popular method
• When an error is detected:
  – Discard tokens until one with a clear role is found
  – Continue from there
• Such tokens are called synchronizing tokens
  – Typically the statement or expression terminators

Syntax Error Recovery: Panic Mode (Cont.)

• Consider the erroneous expression
  (1 + + 2) + 3
• Panic-mode recovery:
  – Skip ahead to next integer and then continue
• Bison: use the special terminal error to describe how much input to skip
  E → int | E + E | ( E ) | error int | ( error )
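In Bison syntax, a grammar with the recovery alternatives from the slide might look like the following sketch (the rule and token names here are illustrative assumptions, not taken from an actual assignment):

```yacc
expr : INT
     | expr '+' expr
     | '(' expr ')'
     | error INT        /* resynchronize at the next integer          */
     | '(' error ')'    /* discard everything up to the closing ')'  */
     ;
```

When the generated parser detects a syntax error, it discards states and input symbols until one of the `error` rules can apply, then resumes normal parsing from there.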

Syntax Error Recovery: Error Productions

• Idea: specify in the grammar known common mistakes
• Essentially promotes common errors to alternative syntax
• Example:
  – Write 5 x instead of 5 * x
  – Add the production E → … | E E
• Disadvantage
  – Complicates the grammar

Error Recovery: Local and Global Correction

• Idea: find a correct “nearby” program
  – Try token insertions and deletions
  – Exhaustive search
• Disadvantages:
  – Hard to implement
  – Slows down parsing of correct programs
  – “Nearby” is not necessarily “the intended” program
  – Not all tools support it

Syntax Error Recovery: Past and Present

• Past
  – Slow recompilation cycle (even once a day)
  – Find as many errors in one cycle as possible
  – Researchers could not let go of the topic
• Present
  – Quick recompilation cycle
  – Users tend to correct one error/cycle
  – Complex error recovery is less compelling
  – Panic-mode seems enough

Abstract Syntax Trees

• So far a parser traces the derivation of a sequence of tokens
• The rest of the compiler needs a structural representation of the program
• Abstract syntax trees
  – Like parse trees but ignore some details
  – Abbreviated as AST
Abstract Syntax Tree (Cont.)

• Consider the grammar
  E → int | ( E ) | E + E
• And the string
  5 + (2 + 3)
• After lexical analysis (a list of tokens)
  int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’
• During parsing we build a parse tree …

Example of Parse Tree

[Figure: parse tree for 5 + (2 + 3), rooted at E → E + E, with int5 on the left and ( E ) on the right enclosing E → E + E over int2 and int3]

• Traces the operation of the parser
• Does capture the nesting structure
• But too much info
  – Parentheses
  – Single-successor nodes

Example of Abstract Syntax Tree

[Figure: AST PLUS(5, PLUS(2, 3))]

• Also captures the nesting structure
• But abstracts from the concrete syntax
  => more compact and easier to use
• An important data structure in a compiler

Semantic Actions

• This is what we’ll use to construct ASTs
• Each grammar symbol may have attributes
  – For terminal symbols (lexical tokens) attributes can be calculated by the lexer
• Each production may have an action
  – Written as: X → Y1 … Yn { action }
  – That can refer to or compute symbol attributes

Semantic Actions: An Example

• Consider the grammar
  E → int | E + E | ( E )
• For each symbol X define an attribute X.val
  – For terminals, val is the associated lexeme
  – For non-terminals, val is the expression’s value (and is computed from values of subexpressions)
• We annotate the grammar with actions:
  E → int      { E.val = int.val }
    | E1 + E2  { E.val = E1.val + E2.val }
    | ( E1 )   { E.val = E1.val }

Semantic Actions: An Example (Cont.)

• String: 5 + (2 + 3)
• Tokens: int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

  Productions   | Equations
  E → E1 + E2   | E.val = E1.val + E2.val
  E1 → int5     | E1.val = int5.val = 5
  E2 → ( E3 )   | E2.val = E3.val
  E3 → E4 + E5  | E3.val = E4.val + E5.val
  E4 → int2     | E4.val = int2.val = 2
  E5 → int3     | E5.val = int3.val = 3
Semantic Actions: Notes

• Semantic actions specify a system of equations
  – Order of resolution is not specified
• Example:
  E3.val = E4.val + E5.val
  – Must compute E4.val and E5.val before E3.val
  – We say that E3.val depends on E4.val and E5.val
• The parser must find the order of evaluation

Dependency Graph

[Figure: the parse tree for 5 + (2 + 3) with a val slot at each node and edges showing that each node’s val depends on its children’s val]

• Each node labeled E has one slot for the val attribute
• Note the dependencies

Evaluating Attributes

• An attribute must be computed after all its successors in the dependency graph have been computed
  – In previous example attributes can be computed bottom-up
• Such an order exists when there are no cycles
  – Cyclically defined attributes are not legal

Dependency Graph

[Figure: the same tree with all val slots filled in: E.val = 10, E1.val = 5, E2.val = 5, E3.val = 5, E4.val = 2, E5.val = 3]

Semantic Actions: Notes (Cont.)

• Synthesized attributes
  – Calculated from attributes of descendants in the parse tree
  – E.val is a synthesized attribute
  – Can always be calculated in a bottom-up order
• Grammars with only synthesized attributes are called S-attributed grammars
  – Most common case

Inherited Attributes

• Another kind of attribute
• Calculated from attributes of parent and/or siblings in the parse tree
• Example: a line calculator
A Line Calculator

• Each line contains an expression
  E → int | E + E
• Each line is terminated with the = sign
  L → E = | + E =
• In the second form the value of the previous line is used as starting value
• A program is a sequence of lines
  P → ε | P L

Attributes for the Line Calculator

• Each E has a synthesized attribute val
  – Calculated as before
• Each L has an attribute val
  L → E =   { L.val = E.val }
    | + E = { L.val = E.val + L.prev }
• We need the value of the previous line
• We use an inherited attribute L.prev

Attributes for the Line Calculator (Cont.)

• Each P has a synthesized attribute val
  – The value of its last line
  P → ε    { P.val = 0 }
    | P1 L { P.val = L.val;
             L.prev = P1.val }
  – Each L has an inherited attribute prev
  – L.prev is inherited from sibling P1.val
• Example …

Example of Inherited Attributes

[Figure: parse tree for the line “+ 2 + 3 =”: P derives P1 L; L derives + E3 =; E3 derives E4 + E5 over int2 and int3; L.prev flows in from P1.val = 0]

• val synthesized
• prev inherited
• All can be computed in depth-first order

Example of Inherited Attributes

[Figure: the same tree with attribute values filled in: P.val = 5, P1.val = 0, L.prev = 0, L.val = 5, E3.val = 5, E4.val = 2, E5.val = 3]

Semantic Actions: Notes (Cont.)

• Semantic actions can be used to build ASTs
• And many other things as well
  – Also used for type checking, code generation, …
• Process is called syntax-directed translation
  – Substantial generalization over CFGs
Constructing an AST

• We first define the AST data type
  – Supplied by us for the project
• Consider an abstract tree type with two constructors:
  mkleaf(n) = n
  mkplus(T1, T2) = a PLUS node with children T1 and T2

Constructing a Parse Tree

• We define a synthesized attribute ast
  – Values of ast are ASTs
  – We assume that int.lexval is the value of the integer lexeme
  – Computed using semantic actions

  E → int      E.ast = mkleaf(int.lexval)
    | E1 + E2  E.ast = mkplus(E1.ast, E2.ast)
    | ( E1 )   E.ast = E1.ast

Parse Tree Example

• Consider the string int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’
• A bottom-up evaluation of the ast attribute:
  E.ast = mkplus(mkleaf(5), mkplus(mkleaf(2), mkleaf(3)))

[Figure: the resulting AST PLUS(5, PLUS(2, 3))]

Summary

• We can specify language syntax using CFG
• A parser will answer whether s ∈ L(G)
  – … and will build a parse tree
  – … which we convert to an AST
  – … and pass on to the rest of the compiler

Intro to Top-Down Parsing: The Idea

• The parse tree is constructed
  – From the top
  – From left to right
• Terminals are seen in order of appearance in the token stream:
  t2 t5 t6 t8 t9

[Figure: a parse tree whose nodes are numbered 1–9 in the order a top-down parser visits them; the leaves t2, t5, t6, t8, t9 are reached left to right]

Recursive Descent Parsing

• Consider the grammar
  E → T | T + E
  T → int | int * T | ( E )
• Token stream is: ( int5 )
• Start with top-level non-terminal E
  – Try the rules for E in order
Recursive Descent Parsing (walkthrough)

E → T | T + E
T → int | int * T | ( E )

Input: ( int5 )

[Figures: a sequence of snapshots of the partial parse tree as the parser works through the input]

• Start with E and try the rule E → T.
• Try T → int: mismatch, int is not ( ! Backtrack.
• Try T → int * T: mismatch again. Backtrack.
• Try T → ( E ): ‘(’ matches. Advance the input.
• For the inner E, try E → T and then T → int: int5 matches. Advance the input.
• ‘)’ matches. Advance the input.
• End of input: accept.
A Recursive Descent Parser: Preliminaries

• Let TOKEN be the type of tokens
  – Special tokens INT, OPEN, CLOSE, PLUS, TIMES
• Let the global next point to the next token

A Recursive Descent Parser (2)

• Define boolean functions that check the token string for a match of
  – A given token terminal
    bool term(TOKEN tok) { return *next++ == tok; }
  – The nth production of S:
    bool Sn() { … }
  – Try all productions of S:
    bool S() { … }

A Recursive Descent Parser (3)

• For production E → T
  bool E1() { return T(); }

• For production E → T + E
  bool E2() { return T() && term(PLUS) && E(); }

• For all productions of E (with backtracking)
  bool E() {
    TOKEN *save = next;
    return (next = save, E1())
        || (next = save, E2());
  }

A Recursive Descent Parser (4)

• Functions for non-terminal T
  bool T1() { return term(INT); }
  bool T2() { return term(INT) && term(TIMES) && T(); }
  bool T3() { return term(OPEN) && E() && term(CLOSE); }

  bool T() {
    TOKEN *save = next;
    return (next = save, T1())
        || (next = save, T2())
        || (next = save, T3());
  }

Recursive Descent Parsing: Notes

• To start the parser
  – Initialize next to point to first token
  – Invoke E()
• Notice how this simulates the example parse
• Easy to implement by hand

Example

E → T | T + E
T → int | int * T | ( E )

Input: ( int )

bool term(TOKEN tok) { return *next++ == tok; }

bool E1() { return T(); }
bool E2() { return T() && term(PLUS) && E(); }

bool E() {
  TOKEN *save = next;
  return (next = save, E1())
      || (next = save, E2());
}

bool T1() { return term(INT); }
bool T2() { return term(INT) && term(TIMES) && T(); }
bool T3() { return term(OPEN) && E() && term(CLOSE); }

bool T() {
  TOKEN *save = next;
  return (next = save, T1())
      || (next = save, T2())
      || (next = save, T3());
}
When Recursive Descent Does Not Work

• Consider a production S → S a
  bool S1() { return S() && term(a); }
  bool S() { return S1(); }

• S() goes into an infinite loop

• A left-recursive grammar has a non-terminal S
  S →+ S α for some α

• Recursive descent does not work in such cases

Elimination of Left Recursion

• Consider the left-recursive grammar
  S → S α | β

• S generates all strings starting with a β and followed by a number of α’s

• Can rewrite using right-recursion
  S → β S’
  S’ → α S’ | ε

More Elimination of Left-Recursion

• In general
  S → S α1 | … | S αn | β1 | … | βm

• All strings derived from S start with one of β1, …, βm and continue with several instances of α1, …, αn

• Rewrite as
  S → β1 S’ | … | βm S’
  S’ → α1 S’ | … | αn S’ | ε

General Left Recursion

• The grammar
  S → A α | δ
  A → S β
  is also left-recursive because
  S →+ S β α

• This left-recursion can also be eliminated

• See Dragon Book for general algorithm
  – Section 4.3

Summary of Recursive Descent

• Simple and general parsing strategy
  – Left-recursion must be eliminated first
  – … but that can be done automatically

• Unpopular because of backtracking
  – Thought to be too inefficient

• In practice, backtracking is eliminated by restricting the grammar