
Syntax analyzer

Syntax analysis
The syntax of a programming language describes the proper form (structure) of its programs.
The syntax analyzer checks whether the tokens appear in an order allowed by the grammar of the language, i.e. it applies the rules that determine whether a given string is valid or not.

It determines the structure of the source string.
It generates errors for problems such as omitted tokens or tokens in the wrong order.

It recovers from some of the errors.

Error handling
Detection
Finding the position at which errors occur

Reporting
Presenting errors clearly and accurately

Recovery
Skipping over an error so that parsing can continue and later errors can be found
Recovery must not affect the compilation of correct programs
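The recovery idea above can be illustrated with panic-mode recovery. This is one common strategy and is not necessarily the one these slides have in mind. The sketch below is minimal and assumes a hypothetical toy language in which every statement has the form id = num ;. The parser reports where each error occurs, then discards tokens up to the next ';' and continues, so later errors can still be found. A correct program produces no errors.

```python
# Hypothetical toy grammar, for illustration only:
#   statement -> id = num ;
EXPECTED = ["id", "=", "num", ";"]


def parse_statements(tokens):
    """Parse a list of token kinds into statements using panic-mode recovery.

    Returns (statements, errors). On an error the parser records the position,
    then skips ahead past the next ';' so that later errors can still be found.
    """
    statements, errors = [], []
    i = 0
    while i < len(tokens):
        start, ok = i, True
        for kind in EXPECTED:
            if i < len(tokens) and tokens[i] == kind:
                i += 1
                continue
            # Detection and reporting: where the error is and what was found.
            found = tokens[i] if i < len(tokens) else "end of input"
            errors.append(f"position {i}: expected {kind!r}, found {found!r}")
            ok = False
            break
        if ok:
            statements.append(tokens[start:i])
        else:
            # Recovery: discard tokens up to and including the next ';'.
            while i < len(tokens) and tokens[i] != ";":
                i += 1
            i += 1
    return statements, errors


tokens = ["id", "=", "num", ";", "id", "num", ";", "id", "=", "num", ";"]
statements, errors = parse_statements(tokens)
print(len(statements), errors)
```

The second statement is missing '='. The error is reported at token position 5, and both correct statements around it are still accepted.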

Basic issues in parsing

Specification of the syntax of a language
Must be precise and unambiguous
Must cover all syntactic details
Described using a context-free grammar

Representation of the output after parsing
The parse tree

Parsing algorithms
Top-down
Bottom-up

From text to abstract syntax


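Program text: 5 + (7 * x)
The lexical analyzer produces the token stream: num + ( num * id )
The parser (grammar: E → id | num | E + E | E * E | ( E )) either reports a syntax error or accepts the input as valid and builds a parse tree
[Figure: parse tree with root E → E + E; the left operand is E → num (5) and the right operand is E → ( E ), whose inner E → E * E has operands E → num (7) and E → id (x)]
The parse tree is then simplified into an abstract syntax tree: + at the root, with 5 on the left and * (over 7 and x) on the right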
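The lexical-analysis step of this pipeline can be sketched in a few lines of Python. The token patterns are assumptions made for this toy example and do not come from any particular compiler. Integers become num, identifiers become id, and operators and parentheses stand for themselves. Characters that match no pattern are reported as errors.

```python
import re

# Hypothetical token patterns for this toy example.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[-+*/()]"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise on unexpected characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "op":
            kind = lexeme  # operators and parentheses are their own token kind
        if kind != "skip":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens


print([kind for kind, _ in tokenize("5 + (7 * x)")])
```

The printed token kinds form the token stream num + ( num * id ) that the parser receives.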

Context Free Grammar


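A context-free grammar has four components:
1. A set of terminal symbols, "tokens." The terminals are the elementary symbols of the language defined by the grammar.
2. A set of nonterminals, "syntactic variables."
3. A set of productions. Each production consists of a nonterminal, called the head or left side (LHS) of the production, an arrow, and a sequence of terminals and/or nonterminals, called the body or right side (RHS) of the production.
4. A designation of one of the nonterminals as the start symbol.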

Context free grammar


The basic CFG rule form is:
X → Y1 Y2 Y3 … Yn
where X is a nonterminal and each Yi may be a nonterminal or a terminal.

Notation:
expression → id | num | ( expression ) | expression operator expression
operator → + | - | * | /

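CFG
Nonterminals: expression, operator

Terminals: id, num, +, -, *, /, (, ) (id and num are treated as tokens)

Sentence: id * num + num (for example, x*2+3)
The grammar has 8 productions: 4 for expression and 4 for operator.

Production rule (substitution rule): the nonterminal on the left of the arrow may be replaced by the sequence of terminal and nonterminal symbols on its right.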
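The expression grammar above can be stored as plain data, from which the nonterminals, terminals and productions can be read off directly. The dictionary layout below is a minimal sketch and one representation choice among many. It maps each head to the list of its bodies and treats every symbol that never appears as a head as a terminal.

```python
# The expression grammar as a mapping from each nonterminal (head)
# to the list of its production bodies.
GRAMMAR = {
    "expression": [
        ["id"],
        ["num"],
        ["(", "expression", ")"],
        ["expression", "operator", "expression"],
    ],
    "operator": [["+"], ["-"], ["*"], ["/"]],
}
START = "expression"

nonterminals = set(GRAMMAR)
# Every symbol occurring in a body that is not a head is a terminal.
terminals = {sym for bodies in GRAMMAR.values() for body in bodies for sym in body} - nonterminals
# One (head, body) pair per production.
productions = [(head, body) for head, bodies in GRAMMAR.items() for body in bodies]

print(sorted(nonterminals))
print(sorted(terminals))
print(len(productions))
```

Each alternative separated by | in the notation counts as its own production, which is why productions holds 8 entries.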

CFG
Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E )

Derivation
A derivation provides a means of generating the sentences of the language. A grammar derives strings by beginning with the start symbol and repeatedly replacing a nonterminal by the body of a production for that nonterminal. The terminal strings that can be derived from the start symbol form the language defined by the grammar.

CFG
Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E )
Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
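The derivation above can be replayed mechanically. The sketch below is a minimal version that rewrites one occurrence of E per step and records every sentential form. It makes two simplifying assumptions. First, E is the only nonterminal. Second, num tokens are written directly as their digits, so E → num appears as E → 1, E → 2 and so on. The leftmost flag chooses whether the leftmost or the rightmost E is replaced.

```python
def derive(start, steps, leftmost=True):
    """Replay a derivation and return the list of sentential forms.

    steps is a list of production bodies (lists of symbols). Each step
    replaces the leftmost (or, if leftmost=False, the rightmost)
    occurrence of the nonterminal "E" with the given body.
    """
    form = [start]
    forms = ["".join(form)]
    for body in steps:
        positions = [i for i, sym in enumerate(form) if sym == "E"]
        if not positions:
            raise ValueError("no nonterminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + body + form[i + 1:]
        forms.append("".join(form))
    return forms


steps = [["E", "+", "E"], ["1"], ["E", "*", "E"], ["2"], ["3"]]
print(" => ".join(derive("E", steps)))
```

With leftmost=False and the steps E * E, 3, E + E, 2, 1 in that order, the same function produces the rightmost derivation E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3.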

Parsing
Parsing is the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar. If the string cannot be derived from the start symbol, the parser reports syntax errors within the string.
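To make the definition concrete, the naive approach below searches breadth-first over leftmost derivations from E until it reaches the input tokens. If no derivation exists, the input has a syntax error. This exhaustive search is exponential in general and is shown only to illustrate the problem statement; it is not one of the efficient top-down or bottom-up algorithms. Two pruning rules keep the search finite. No production of this grammar shrinks a sentential form, and in a leftmost derivation the terminals before the first nonterminal never change again.

```python
from collections import deque

# Grammar E -> id | num | E + E | E * E | ( E ), over token kinds.
PRODUCTIONS = [["id"], ["num"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]]


def find_derivation(tokens, start="E"):
    """Breadth-first search over leftmost derivations of `tokens`.

    Returns the sentential forms from `start` to `tokens`,
    or None if the tokens cannot be derived (a syntax error).
    """
    target = tuple(tokens)
    parent = {(start,): None}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if form == target:
            path = []
            while form is not None:
                path.append(" ".join(form))
                form = parent[form]
            return path[::-1]
        # Position of the leftmost nonterminal, if any.
        i = next((k for k, sym in enumerate(form) if sym == "E"), None)
        if i is None:
            continue  # all terminals, but not the target string
        for body in PRODUCTIONS:
            new = form[:i] + tuple(body) + form[i + 1:]
            if len(new) > len(target):
                continue  # forms never shrink, so this can never match
            j = next((k for k, sym in enumerate(new) if sym == "E"), len(new))
            if new[:j] != target[:j]:
                continue  # fixed terminal prefix already disagrees with input
            if new not in parent:
                parent[new] = form
                queue.append(new)
    return None


print(find_derivation(["(", "id", ")"]))
print(find_derivation(["id", "+"]))
```

The input ( id ) is derived as E ⇒ ( E ) ⇒ ( id ). The input id + cannot be derived, so the function returns None.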

Parse Tree
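Grammar rules: E → ( E ), E → E * E, E → E + E, E → num, E → id
A parse tree shows how the start symbol derives a string. Each interior node is a nonterminal expanded by one production, and the leaves, read from left to right, form the derived string.
[Figure: a parse tree rooted at E whose leaves are produced by the rules E → num, E → num and E → id]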
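A parse tree can be represented as nodes that hold a grammar symbol and a list of children. The sketch below assumes it is the tree for num + ( num * id ), the token stream of 5 + (7 * x) under the grammar rules above. The Node class is a hypothetical helper. It checks that reading the leaves left to right gives back the token stream, and it prints the tree with indentation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A parse-tree node: a grammar symbol plus its children (none for leaves)."""
    symbol: str
    children: list = field(default_factory=list)

    def leaves(self):
        """Return the tree's yield: its leaf symbols, read left to right."""
        if not self.children:
            return [self.symbol]
        return [leaf for child in self.children for leaf in child.leaves()]

    def pretty(self, indent=0):
        """Return the tree as indented text, one symbol per line."""
        lines = ["  " * indent + self.symbol]
        for child in self.children:
            lines.append(child.pretty(indent + 1))
        return "\n".join(lines)


# E -> E + E, left E -> num, right E -> ( E ), inner E -> E * E over num and id.
tree = Node("E", [
    Node("E", [Node("num")]),
    Node("+"),
    Node("E", [
        Node("("),
        Node("E", [Node("E", [Node("num")]), Node("*"), Node("E", [Node("id")])]),
        Node(")"),
    ]),
])

print(" ".join(tree.leaves()))
print(tree.pretty())
```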

Leftmost derivation
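Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E )
At each step, the leftmost nonterminal is replaced.

Derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3

Parse tree (bracketed form, one pair of square brackets per node):
[E [E 1] + [E [E 2] * [E 3]]]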


Rightmost derivation
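At each step, the rightmost nonterminal is replaced.
Derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
Parse tree (bracketed form, one pair of square brackets per node):
[E [E [E 1] + [E 2]] * [E 3]]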

Ambiguity
A CFG is ambiguous if it produces more than one parse tree for the same sentence. A typical source of ambiguity is a production whose right-hand side contains the same nonterminal twice, as in E → E + E | E * E | ( E ) | id | num.

Ambiguity
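Grammar rules: E → id, E → num, E → E + E, E → E * E, E → ( E )
Leftmost derivation: E ⇒ E + E ⇒ 1 + E ⇒ 1 + E * E ⇒ 1 + 2 * E ⇒ 1 + 2 * 3
Parse tree: [E [E 1] + [E [E 2] * [E 3]]]

Rightmost derivation: E ⇒ E * E ⇒ E * 3 ⇒ E + E * 3 ⇒ E + 2 * 3 ⇒ 1 + 2 * 3
Parse tree: [E [E [E 1] + [E 2]] * [E 3]]

Both derivations produce the sentence 1 + 2 * 3, but with different parse trees, so the grammar is ambiguous.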
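Ambiguity can be checked directly by counting the distinct parse trees of a token string. The sketch below uses memoized dynamic programming over token spans. It assumes the ambiguous grammar E → id | num | E + E | E * E | ( E ) and writes the sentence 1 + 2 * 3 as num + num * num. Every operator position inside a span is a possible root for E → E + E or E → E * E, and the counts of the two sides are multiplied.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count distinct parse trees of `tokens` under the ambiguous grammar
    E -> id | num | E + E | E * E | ( E ).
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if j <= i:
            return 0
        if j - i == 1:
            return 1 if tokens[i] in ("id", "num") else 0
        total = 0
        # E -> ( E )
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # E -> E + E and E -> E * E: each operator position is a possible root.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["num", "+", "num", "*", "num"]))
print(count_parse_trees(["(", "num", "+", "num", ")", "*", "num"]))
```

The first call finds the two trees shown above. With explicit parentheses only one tree remains.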


Eliminating Ambiguity
Ambiguous grammar: E → id, E → num, E → E + E, E → E * E, E → ( E )
Non-ambiguous grammar (T and F are new nonterminals):
E → E + T | T
T → T * F | F
F → id | num | ( E )

Derivation: E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ 1 + T ⇒ 1 + T * F ⇒ 1 + F * F ⇒ 1 + 2 * F ⇒ 1 + 2 * 3



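Parse tree: [E [E [T [F 1]]] + [T [T [F 2]] * [F 3]]]
Because * is handled at the lower level T, the product 2 * 3 is grouped before the addition, and this grammar admits only this one tree for 1 + 2 * 3.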
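The non-ambiguous grammar maps naturally onto a recursive-descent parser with one function per nonterminal. The sketch below is minimal and has three assumptions. First, the input is a list of lexemes in which digit strings are num and identifiers are id. Second, the output is an abstract syntax tree of nested tuples rather than a full parse tree. Third, the left-recursive rules E → E + T and T → T * F are implemented as loops, because calling expr() directly at the start of expr() would never terminate. The loops keep + and * left-associative.

```python
class Parser:
    """Recursive-descent parser for
        E -> E + T | T
        T -> T * F | F
        F -> id | num | ( E )
    Left recursion is replaced by iteration; the result is a tuple AST
    such as ('+', '1', ('*', '2', '3')).
    """

    def __init__(self, lexemes):
        self.lexemes = list(lexemes)
        self.pos = 0

    def peek(self):
        return self.lexemes[self.pos] if self.pos < len(self.lexemes) else None

    def expect(self, lexeme):
        if self.peek() != lexeme:
            raise SyntaxError(f"expected {lexeme!r} at position {self.pos}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")
        return tree

    def expr(self):  # E -> T { + T }
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):  # T -> F { * F }
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):  # F -> id | num | ( E )
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        if tok is not None and (tok.isdigit() or tok.isidentifier()):
            self.pos += 1
            return tok
        raise SyntaxError(f"expected id, num or '(' at position {self.pos}, found {tok!r}")


print(Parser("1 + 2 * 3".split()).parse())
```

For 1 + 2 * 3 the parser groups 2 * 3 under +, which matches the single parse tree of the non-ambiguous grammar.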