
Welcome to

CS4212: Compiler Design


1
Overview
Web-page:
http://www.comp.nus.edu.sg/sulzmann/cs4212
Lecturer: Martin Sulzmann (sulzmann@comp.nus.edu.sg)
Lecture: Monday 10am-12pm, LT 34
Tutorial(s): will be fixed soon. At the beginning, I will make
use of the tutorial time slot for some additional lectures.
Assessment: Midterm, project, final exam?
We will talk about this in detail next week.
2
Course Material
Main text: Compilers: Principles, Techniques, and Tools by
Aho, Sethi and Ullman (available via co-op).
We will/might make use of additional material.
Lecture notes and further material will be provided.
This course involves theoretical and practical exercises
(project!). You can choose your preferred programming
language to carry out the implementation tasks.
3
Semester Plan
Lexical analysis, DFAs (minimization), NFAs, regular expressions etc.
Top-down and bottom-up parsing.
LALR parsing.
Semantic processing, attribute grammars.
Symbol tables, type checking.
Run-time Environment/Activation Records.
Code generation, virtual machine.
...
4
What to expect
One focus will be on lexical analysis and parsing. For the
remaining subjects, we will only loosely stick to the textbook.
Another focus will be on security in programming languages
(PLs).
This course is not only about compiler hacking. We also
discuss programming language design issues.
5
Why study compiler design
Only a few people will ever write their own compiler. So why
bother?
A competent computer professional knows about high-level
programming and about hardware.
A compiler connects the two.
Understanding compilation techniques is essential for
understanding how PLs and computers hang together.
Many applications contain little languages for customization
and flexible control, e.g. word macros, scripts for graphics &
animation, data layout description, ...
Compiler techniques are needed to properly design and
implement these extension languages.
6
Why study compiler design
Data formats are also formal languages, e.g. HTML, XML.
Compiler techniques are useful for reading, manipulating
and writing data.
Besides, compilers are excellent examples of large and
complex systems
which can be specified rigorously,
which can be implemented only by combining theory
and practice.
Formal specification becomes very important!
7
The task of a compiler
The main task of a compiler is to map programs written in a
given source language into a target language.
Often, the source language is a high-level PL and the target
language is a machine language.
Exceptions: Source-to-source translators (e.g. Java to C),
data manipulation in XML.
Part of the task of a compiler is to detect whether a given
program conforms to the rules of the source language.
Formal specification important!
A specification of a compiler consists of
A specification of its source and target languages.
A specification of a mapping between them.
8
Languages
A language is a set of strings (sentences).
Each string in a language has a structure which can be
described by a tree.
Structure rules for strings (sentences) are described by a
grammar.
E.g.
The sentences of a PL are (legal) programs.
Programs are sentences of words (or symbols, tokens),
their structure is given by a context-free grammar.
Words themselves are sequences of characters, their
structure is given by a regular grammar.
9
Compiler structure
Roughly as follows:
Lexical analysis -> Token sequence
Syntax analysis -> Structure tree
Type checking -> Attributed structure tree
Intermediate code generation -> Intermediate code sequence
Optimization -> Optimized ...
Target code generation -> Target code sequence
We will spend a significant amount of time on lexical and syntax analysis.
10
Example
Source program:
x = x + 1

Token sequence:
(Identifier x) Assign (Identifier x) Plus (Const 1)

. . .

Target program:
LD R1 X
LD R2 1
ADD R3 R2 R1
STR R3 X
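The first step of the example, from source program to token sequence, can be sketched in a few lines of Python. This is only an illustration: the token names are taken from the slide, while the regular expressions and the function name `tokenize` are assumptions.

```python
import re

# Token names follow the slide's example; the regular expressions are assumed.
TOKEN_SPEC = [
    ("Const", r"\d+"),
    ("Identifier", r"[A-Za-z]+"),
    ("Assign", r"="),
    ("Plus", r"\+"),
    ("Skip", r"\s+"),  # whitespace is recognized but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Turn a character string into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "Skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x= x + 1"))
```

Each token kind is described by a regular expression, which matches the later claim that the structure of words is given by a regular grammar.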
11
Formal languages
A language is formally defined by:
A set T of terminal symbols.
A set N of non-terminal symbols.
A set P of syntactic rules (or production rules).
A start symbol S.
We define a grammar G by G = (T, N, P, S).
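As a rough illustration, the tuple G = (T, N, P, S) maps directly onto a small data structure. The grammar below (balanced parentheses) and the helper names `is_well_formed` and `sentences` are hypothetical and not part of the lecture.

```python
from collections import namedtuple

Grammar = namedtuple("Grammar", ["T", "N", "P", "S"])

# Hypothetical example grammar (not from the slides): balanced parentheses.
#   S -> ( S ) S | <empty>
G = Grammar(
    T={"(", ")"},
    N={"S"},
    P={"S": [["(", "S", ")", "S"], []]},
    S="S",
)


def is_well_formed(g):
    """Check basic consistency conditions of G = (T, N, P, S)."""
    if g.T & g.N:            # terminals and non-terminals must be disjoint
        return False
    if g.S not in g.N:       # the start symbol is a non-terminal
        return False
    for lhs, alternatives in g.P.items():
        if lhs not in g.N:   # left-hand sides are non-terminals
            return False
        for rhs in alternatives:
            if any(sym not in g.T | g.N for sym in rhs):
                return False
    return True


def sentences(g, max_steps):
    """Sentences derivable from S in at most max_steps leftmost steps."""
    found = set()
    frontier = [[g.S]]
    for _ in range(max_steps):
        next_frontier = []
        for form in frontier:
            # position of the leftmost non-terminal in the sentential form
            idx = next((i for i, s in enumerate(form) if s in g.N), None)
            if idx is None:
                continue
            for rhs in g.P[form[idx]]:
                new_form = form[:idx] + rhs + form[idx + 1:]
                if all(s in g.T for s in new_form):
                    found.add("".join(new_form))
                else:
                    next_frontier.append(new_form)
        frontier = next_frontier
    return found


print(is_well_formed(G), sorted(sentences(G, 5)))
```

The function `sentences` makes the idea of a language as a set of strings concrete: it enumerates the strings that the production rules can derive from the start symbol within a bounded number of steps.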
12
The language of context-free grammars
Expr = Expr Operand Expr
Expr = Identifier
Expr = Constant
Expr = ( Expr )
Operand = + | * | - | /
Identifier = Char Char | Char
Char = A | ... | Z | a | ... | z
Constant = Number Number | Number
Number = 0 | ... | 9
This syntax was originally developed by J. Backus and P. Naur
for the definition of Algol 60. It is commonly called Backus-Naur
form or BNF.
Exercise: Determine start, terminal and non-terminal symbols.
13
Extended Backus Naur Form
Grammars can often be simplified and shortened by using two
more constructs:
{x} expresses repetition: zero, one or more occurrences of
x.
[x] expresses option: zero or one occurrence of x.
The resulting formalism is called extended Backus Naur form or
EBNF.
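One way to read the two EBNF constructs is as control flow in a hand-written recognizer: an option becomes an `if`, a repetition becomes a `while` loop. The rule `number = ["-"] digit {digit}` and the function name `match_number` are hypothetical and only serve as an illustration.

```python
DIGITS = "0123456789"


def match_number(text):
    """Recognize  number = ["-"] digit {digit}  (a hypothetical EBNF rule)."""
    pos = 0
    # ["-"]   : option, zero or one occurrence, becomes an if
    if pos < len(text) and text[pos] == "-":
        pos += 1
    # digit   : exactly one occurrence is required
    if pos >= len(text) or text[pos] not in DIGITS:
        return False
    pos += 1
    # {digit} : repetition, zero or more occurrences, becomes a while loop
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    # the whole input must have been consumed
    return pos == len(text)


for candidate in ["42", "-7", "-", "", "4-2"]:
    print(repr(candidate), match_number(candidate))
```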
14
Warm up: IMP, a simple imperative language
Concrete syntax (already simplified):
com = skip | x := exp | if exp then com else com
| if exp then com | while exp do com
| var x := exp ; com | com ; com
exp = v | x | exp op exp
op = + | - | * | / | = | < | &&
v = i | true | false
i = 1 | 2 | ...
x = l {l}
l = a | ... | z
Problems:
What's the meaning of
if e1 then c1; if e2 then c2 else c3 ?
15
IMP Grammar
Compare
if e1 then c1; (if e2 then c2 else c3)
to
if e1 then (c1; if e2 then c2) else c3
Grammar has conflicts!
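The conflict can be made visible by enumerating every parse tree that the concrete grammar allows for the problematic command. The sketch below is a brute-force enumerator written for this one example only: it assumes that `e1`, `e2` are atomic expressions and `c1`, `c2`, `c3` are atomic commands, and the function name `parses` is hypothetical.

```python
ATOMIC_COMMANDS = {"c1", "c2", "c3"}  # assumed atomic commands for the example


def parses(tokens, i, j):
    """Return every parse tree of tokens[i:j] as a command (com)."""
    trees = []
    # atomic command such as c1
    if j - i == 1 and tokens[i] in ATOMIC_COMMANDS:
        trees.append(tokens[i])
    # if exp then com  and  if exp then com else com  (exp is one token)
    if j - i >= 4 and tokens[i] == "if" and tokens[i + 2] == "then":
        cond = tokens[i + 1]
        for body in parses(tokens, i + 3, j):
            trees.append(("if", cond, body))
        for k in range(i + 4, j - 1):
            if tokens[k] == "else":
                for a in parses(tokens, i + 3, k):
                    for b in parses(tokens, k + 1, j):
                        trees.append(("if", cond, a, b))
    # com ; com
    for k in range(i + 1, j - 1):
        if tokens[k] == ";":
            for a in parses(tokens, i, k):
                for b in parses(tokens, k + 1, j):
                    trees.append(("seq", a, b))
    return trees


tokens = "if e1 then c1 ; if e2 then c2 else c3".split()
trees = parses(tokens, 0, len(tokens))
for tree in trees:
    print(tree)
```

The enumerator finds more than one tree for the same token sequence, including the two bracketings compared above, so the grammar alone does not determine the meaning of the program.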
16
Examples
var x:= 1;
var y:= 2;
var z:= if x < y then true else y; skip
var x:= 1;
var x:= x < 1; skip
Are the above programs valid? What is their meaning?
We need to provide a formal specification of IMP.
17
Abstract syntax
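$$
\begin{array}{llcl}
\text{Variables} & x \\
\text{Numbers} & i \\
\text{Values} & v & ::= & i \mid \mathsf{true} \mid \mathsf{false} \\
\text{Operators} & o & ::= & + \mid - \mid * \mid / \mid = \mid < \mid \wedge \\
\text{Expressions} & e & ::= & v \mid x \mid e \; o \; e \mid \neg e \\
\text{Commands} & c & ::= & \mathsf{skip} \mid x := e \mid c ; c \mid \mathsf{if}\ e\ \mathsf{then}\ c\ \mathsf{else}\ c \mid \\
 & & & \mathsf{while}\ e\ \mathsf{do}\ c \mid \mathsf{var}\ x := e ; c
\end{array}
$$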
Assume that
if exp then com
has been translated to
if exp then com else skip
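A minimal sketch of how the abstract syntax of commands and this translation step could look in Python. The class and field names are assumptions, and expressions are treated as opaque values here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Skip:
    pass


@dataclass(frozen=True)
class Assign:
    x: str
    e: Any


@dataclass(frozen=True)
class Seq:
    c1: Any
    c2: Any


@dataclass(frozen=True)
class If:
    e: Any
    c1: Any
    c2: Optional[Any] = None  # None marks the one-armed concrete form


@dataclass(frozen=True)
class While:
    e: Any
    c: Any


@dataclass(frozen=True)
class NewVar:
    x: str
    e: Any
    c: Any


def desugar(c):
    """Rewrite every `if e then c` into `if e then c else skip`."""
    if isinstance(c, If):
        else_branch = Skip() if c.c2 is None else desugar(c.c2)
        return If(c.e, desugar(c.c1), else_branch)
    if isinstance(c, Seq):
        return Seq(desugar(c.c1), desugar(c.c2))
    if isinstance(c, While):
        return While(c.e, desugar(c.c))
    if isinstance(c, NewVar):
        return NewVar(c.x, c.e, desugar(c.c))
    return c  # Skip and Assign contain no nested commands


print(desugar(Seq(If("e1", Assign("x", 1)), Skip())))
```

After `desugar`, every later phase only has to handle the two-armed conditional of the abstract syntax.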
18
Side conditions
Types of operands and operators must be compatible.
If-then-else condition must be a Boolean expression.
Types of the variable and the expression in an assignment must
be compatible.
We will employ a type system to enforce these side conditions!
Types classify values!
Here: Int, Bool, Cmd
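Clauses $\Gamma \vdash e : \tau$ state expression $e$ is well-typed (with type $\tau$)
under type environment $\Gamma$ where $\tau \in \{\mathit{Int}, \mathit{Bool}\}$.
Similarly, we have $\Gamma \vdash c : \mathit{Cmd}$.
$\Gamma = \{x_1 : \tau_1, \ldots, x_n : \tau_n\}$ where the $x_i$ are free variables.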
19
Rules for expressions
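$$\text{(Taut)} \quad \Gamma \vdash i : \mathit{Int} \qquad \Gamma \vdash \mathsf{true} : \mathit{Bool} \qquad \Gamma \vdash \mathsf{false} : \mathit{Bool}$$

$$\text{(Var)} \quad \frac{(x : \tau) \in \Gamma}{\Gamma \vdash x : \tau}$$

$$\text{(IOP)} \quad \frac{\Gamma \vdash e_1 : \mathit{Int} \qquad \Gamma \vdash e_2 : \mathit{Int} \qquad o \in \{+, -, *, /\}}{\Gamma \vdash e_1 \; o \; e_2 : \mathit{Int}}$$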
20
Rules for expressions
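$$\text{(CMP)} \quad \frac{\Gamma \vdash e_1 : \mathit{Int} \qquad \Gamma \vdash e_2 : \mathit{Int} \qquad o \in \{<, =\}}{\Gamma \vdash e_1 \; o \; e_2 : \mathit{Bool}}$$

$$\text{(AND)} \quad \frac{\Gamma \vdash e_1 : \mathit{Bool} \qquad \Gamma \vdash e_2 : \mathit{Bool}}{\Gamma \vdash e_1 \wedge e_2 : \mathit{Bool}}$$

$$\text{(NOT)} \quad \frac{\Gamma \vdash e : \mathit{Bool}}{\Gamma \vdash \neg e : \mathit{Bool}}$$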
21
Rules for commands
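$$\text{(SKIP)} \quad \Gamma \vdash \mathsf{skip} : \mathit{Cmd}$$

$$\text{(ASSIGN)} \quad \frac{\Gamma \vdash x : \tau \qquad \Gamma \vdash e : \tau}{\Gamma \vdash x := e : \mathit{Cmd}}$$

$$\text{(SEQ)} \quad \frac{\Gamma \vdash c_1 : \mathit{Cmd} \qquad \Gamma \vdash c_2 : \mathit{Cmd}}{\Gamma \vdash c_1 ; c_2 : \mathit{Cmd}}$$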
22
Rules for commands
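$$\text{(IF)} \quad \frac{\Gamma \vdash e : \mathit{Bool} \qquad \Gamma \vdash c_1 : \mathit{Cmd} \qquad \Gamma \vdash c_2 : \mathit{Cmd}}{\Gamma \vdash \mathsf{if}\ e\ \mathsf{then}\ c_1\ \mathsf{else}\ c_2 : \mathit{Cmd}}$$

$$\text{(WHILE)} \quad \frac{\Gamma \vdash e : \mathit{Bool} \qquad \Gamma \vdash c : \mathit{Cmd}}{\Gamma \vdash \mathsf{while}\ e\ \mathsf{do}\ c : \mathit{Cmd}}$$

$$\text{(NEWVAR)} \quad \frac{\Gamma \vdash e : \tau \qquad \Gamma, x : \tau \vdash c : \mathit{Cmd}}{\Gamma \vdash \mathsf{var}\ x := e ; c : \mathit{Cmd}}$$

We define $\Gamma, x : \tau = \{y : \tau' \in \Gamma \mid y \neq x\} \cup \{x : \tau\}$.
For example, $\{x : \mathit{Bool}\}, x : \mathit{Int} = \{x : \mathit{Int}\}$.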
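The typing rules for expressions and commands translate almost line by line into a recursive checker. The following is a sketch under assumptions: programs are encoded as nested Python tuples, the type environment is a dictionary, and all names (`type_of`, `expect`, `check_cmd`, the tuple tags) are hypothetical.

```python
def type_of(env, e):
    """Return 'Int' or 'Bool' for a well-typed expression, else raise TypeError."""
    if isinstance(e, bool):                  # (Taut); bool is tested before int
        return "Bool"
    if isinstance(e, int):                   # (Taut)
        return "Int"
    if isinstance(e, str):                   # (Var)
        if e not in env:
            raise TypeError(f"unbound variable {e}")
        return env[e]
    if e[0] == "not":                        # (NOT)
        expect(env, e[1], "Bool")
        return "Bool"
    _, op, e1, e2 = e                        # ("op", o, e1, e2)
    if op in ("+", "-", "*", "/"):           # (IOP)
        operand, result = "Int", "Int"
    elif op in ("<", "="):                   # (CMP)
        operand, result = "Int", "Bool"
    elif op == "&&":                         # (AND)
        operand, result = "Bool", "Bool"
    else:
        raise TypeError(f"unknown operator {op}")
    expect(env, e1, operand)
    expect(env, e2, operand)
    return result


def expect(env, e, ty):
    """Raise TypeError unless expression e has type ty under env."""
    actual = type_of(env, e)
    if actual != ty:
        raise TypeError(f"expected {ty}, found {actual}")


def check_cmd(env, c):
    """Return None if command c is well-typed under env, else raise TypeError."""
    tag = c[0]
    if tag == "skip":                        # (SKIP)
        return
    if tag == "assign":                      # (ASSIGN)
        _, x, e = c
        expect(env, e, type_of(env, x))
    elif tag == "seq":                       # (SEQ)
        check_cmd(env, c[1])
        check_cmd(env, c[2])
    elif tag == "if":                        # (IF)
        expect(env, c[1], "Bool")
        check_cmd(env, c[2])
        check_cmd(env, c[3])
    elif tag == "while":                     # (WHILE)
        expect(env, c[1], "Bool")
        check_cmd(env, c[2])
    elif tag == "var":                       # (NEWVAR)
        _, x, e, body = c
        # {**env, x: t} drops any old binding of x and adds x : t
        check_cmd({**env, x: type_of(env, e)}, body)
    else:
        raise TypeError(f"unknown command {tag}")


# var x := 1; var x := x < 1; skip
program = ("var", "x", 1, ("var", "x", ("op", "<", "x", 1), ("skip",)))
check_cmd({}, program)
print("well-typed")
```

The second example program from the Examples slide passes the check: the inner declaration rebinds `x` to a Boolean, which the environment extension permits.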
23
IMP semantics
We employ an operational semantics.
Clause $S \vdash c \Downarrow S'$ states that given $S$ is the state before
executing $c$, then $S'$ is the state after the execution of $c$.
State: mapping from variables to values
$v ::= i \mid \mathsf{true} \mid \mathsf{false}$
State operations: lookup, update
lookup: $S(x)$
update: $(S \mid x \mapsto v)$

$$(S \mid x \mapsto v)(y) = \begin{cases} v & \text{if } x = y \\ S(y) & \text{otherwise} \end{cases}$$
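The two state operations can be sketched with a Python dictionary. The function names `lookup` and `update` follow the slide; representing a state as a `dict` and copying it on update are assumptions.

```python
def lookup(S, x):
    """S(x): the value of variable x in state S."""
    return S[x]


def update(S, x, v):
    """(S | x -> v): a new state that agrees with S everywhere except at x."""
    new_state = dict(S)  # copy, so the old state S stays unchanged
    new_state[x] = v
    return new_state


S = {"x": 1, "y": 2}
S2 = update(S, "x", 5)
print(lookup(S2, "x"), lookup(S2, "y"), lookup(S, "x"))
```

Keeping the old state intact is convenient when a rule needs both the state before and the state after a command.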
24
Operational semantics of expressions
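$$\text{(VALUE)} \quad S \vdash i \Downarrow i \qquad S \vdash \mathsf{false} \Downarrow \mathsf{false} \qquad S \vdash \mathsf{true} \Downarrow \mathsf{true}$$

$$\text{(VAR)} \quad S \vdash x \Downarrow v \qquad (S(x) = v)$$

$$\text{(IOP)} \quad \frac{S \vdash e_1 \Downarrow v_1 \qquad S \vdash e_2 \Downarrow v_2 \qquad v = v_1 + v_2}{S \vdash e_1 + e_2 \Downarrow v}$$

$$\frac{S \vdash e_1 \Downarrow v_1 \qquad S \vdash e_2 \Downarrow v_2 \qquad v = v_1 \; o \; v_2}{S \vdash e_1 \; o \; e_2 \Downarrow v} \qquad (o \in \{-, *\})$$

$$\frac{S \vdash e_1 \Downarrow v_1 \qquad S \vdash e_2 \Downarrow v_2 \qquad v_2 \neq 0, \; v = v_1 / v_2}{S \vdash e_1 / e_2 \Downarrow v}$$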
25
Operational semantics of expressions
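$$\text{(CMP)} \quad \frac{S \vdash e_1 \Downarrow v_1 \qquad S \vdash e_2 \Downarrow v_2 \qquad v_1 = v_2}{S \vdash e_1 = e_2 \Downarrow \mathsf{true}} \qquad \frac{S \vdash e_1 \Downarrow v_1 \qquad S \vdash e_2 \Downarrow v_2 \qquad v_1 \neq v_2}{S \vdash e_1 = e_2 \Downarrow \mathsf{false}}$$

$$\text{(NOT)} \quad \frac{S \vdash e \Downarrow \mathsf{true}}{S \vdash \neg e \Downarrow \mathsf{false}} \qquad \frac{S \vdash e \Downarrow \mathsf{false}}{S \vdash \neg e \Downarrow \mathsf{true}}$$

$$\text{(AND)} \quad \frac{S \vdash e_1 \Downarrow \mathsf{true} \qquad S \vdash e_2 \Downarrow v_2}{S \vdash e_1 \wedge e_2 \Downarrow v_2} \qquad \frac{S \vdash e_1 \Downarrow \mathsf{false}}{S \vdash e_1 \wedge e_2 \Downarrow \mathsf{false}}$$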
26
Operational semantics of commands
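$$\text{(SKIP)} \quad S \vdash \mathsf{skip} \Downarrow S$$

$$\text{(ASSIGN)} \quad \frac{S \vdash e \Downarrow v}{S \vdash x := e \Downarrow (S \mid x \mapsto v)}$$

$$\text{(IF)} \quad \frac{S \vdash e \Downarrow \mathsf{true} \qquad S \vdash c_1 \Downarrow S'}{S \vdash \mathsf{if}\ e\ \mathsf{then}\ c_1\ \mathsf{else}\ c_2 \Downarrow S'} \qquad \frac{S \vdash e \Downarrow \mathsf{false} \qquad S \vdash c_2 \Downarrow S'}{S \vdash \mathsf{if}\ e\ \mathsf{then}\ c_1\ \mathsf{else}\ c_2 \Downarrow S'}$$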
27
Operational semantics of commands
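$$\text{(SEQ)} \quad \frac{S \vdash c_1 \Downarrow S' \qquad S' \vdash c_2 \Downarrow S''}{S \vdash c_1 ; c_2 \Downarrow S''}$$

$$\text{(WHILE)} \quad \frac{S \vdash e \Downarrow \mathsf{false}}{S \vdash \mathsf{while}\ e\ \mathsf{do}\ c \Downarrow S} \qquad \frac{S \vdash e \Downarrow \mathsf{true} \qquad S \vdash c \Downarrow S' \qquad S' \vdash \mathsf{while}\ e\ \mathsf{do}\ c \Downarrow S''}{S \vdash \mathsf{while}\ e\ \mathsf{do}\ c \Downarrow S''}$$

$$\text{(NEWVAR)} \quad \frac{S \vdash e \Downarrow v \qquad (S \mid x \mapsto v) \vdash c \Downarrow S'}{S \vdash \mathsf{var}\ x := e ; c \Downarrow (S' \mid x \mapsto S(x))}$$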
Phew! Now we are in a position to write a compiler for IMP!
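Before writing a compiler, the rules can be tried out directly as an interpreter. The sketch below is not a compiler and rests on assumptions: programs are nested Python tuples, states are dictionaries, `/` is taken to be integer division, `<` is the usual ordering on integers, and a variable that was unbound before `var` is simply removed again afterwards. All names are hypothetical.

```python
def eval_expr(S, e):
    """Evaluate expression e in state S, following the expression rules."""
    if isinstance(e, (bool, int)):           # (VALUE)
        return e
    if isinstance(e, str):                   # (VAR)
        return S[e]
    if e[0] == "not":                        # (NOT)
        return not eval_expr(S, e[1])
    _, op, e1, e2 = e                        # ("op", o, e1, e2)
    v1 = eval_expr(S, e1)
    if op == "&&":                           # (AND): e2 is skipped if e1 is false
        return eval_expr(S, e2) if v1 else False
    v2 = eval_expr(S, e2)
    if op == "+":
        return v1 + v2
    if op == "-":
        return v1 - v2
    if op == "*":
        return v1 * v2
    if op == "/":
        if v2 == 0:
            raise ZeroDivisionError("no rule applies when the divisor is 0")
        return v1 // v2                      # assumption: integer division
    if op == "=":
        return v1 == v2
    if op == "<":
        return v1 < v2                       # assumption: usual integer order
    raise ValueError(f"unknown operator {op}")


def exec_cmd(S, c):
    """Execute command c in state S and return the resulting state."""
    tag = c[0]
    if tag == "skip":                        # (SKIP)
        return S
    if tag == "assign":                      # (ASSIGN)
        _, x, e = c
        return {**S, x: eval_expr(S, e)}
    if tag == "seq":                         # (SEQ)
        return exec_cmd(exec_cmd(S, c[1]), c[2])
    if tag == "if":                          # (IF)
        _, e, c1, c2 = c
        return exec_cmd(S, c1 if eval_expr(S, e) else c2)
    if tag == "while":                       # (WHILE), unrolled into a loop
        _, e, body = c
        while eval_expr(S, e):
            S = exec_cmd(S, body)
        return S
    if tag == "var":                         # (NEWVAR)
        _, x, e, body = c
        after = dict(exec_cmd({**S, x: eval_expr(S, e)}, body))
        if x in S:
            after[x] = S[x]                  # restore the outer value of x
        else:
            after.pop(x, None)               # assumption: x was unbound before
        return after
    raise ValueError(f"unknown command {tag}")


# var i := 0; while i < 3 do (x := x + i; i := i + 1)
loop_body = ("seq", ("assign", "x", ("op", "+", "x", "i")),
             ("assign", "i", ("op", "+", "i", 1)))
program = ("var", "i", 0, ("while", ("op", "<", "i", 3), loop_body))
print(exec_cmd({"x": 10}, program))
```

A non-terminating `while` makes `exec_cmd` loop forever, which mirrors the fact that no finite derivation exists for such a command.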
28
Operational Semantics Results
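Lemma 1 We have that $S \vdash \mathsf{while}\ e\ \mathsf{do}\ c \Downarrow S'$ iff
$S \vdash \mathsf{if}\ e\ \mathsf{then}\ (c; \mathsf{while}\ e\ \mathsf{do}\ c)\ \mathsf{else}\ \mathsf{skip} \Downarrow S'$
for any states $S$ and $S'$.
Theorem 1 (a) Let $x \notin fv(e)$ and $S \vdash e \Downarrow v$, then
$(S \mid x \mapsto w) \vdash e \Downarrow v$.
(b) Let $x \notin fv(c)$ and $S \vdash c \Downarrow S'$, then
$(S \mid x \mapsto w) \vdash c \Downarrow (S' \mid x \mapsto w)$.
Theorem 2 IMP is deterministic. For any expression $e$,
command $c$ and state $S$ there exists at most one $v$ and $S'$
such that $S \vdash e \Downarrow v$ and $S \vdash c \Downarrow S'$.
Sound language specification!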
29
Other compiler related issues
Resource usage verification: prevent incorrect resource access
Source program:
main(n) = open(f) ; a(f,n) ; close(f)
a(f,n) = write(f);
if (n mod 2 == 0) then read(f);
if (n > 0) then a(f,n-1)
Resource usage policy (specified as DFA):
[Figure: policy DFA with states 1, 2, 3, 4 and transitions labelled open, close, read, write, read]
What could go wrong?
30
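One way to explore the question is to generate the sequence of resource operations that `main(n)` performs and run it through a policy automaton. The trace functions below follow the source program on the slide. The transition table is an assumed policy chosen for illustration (a write must be followed by a read before the next write or a close) and is not claimed to be the DFA from the figure. State and function names are hypothetical.

```python
def trace_a(n):
    """Resource operations performed by a(f, n) from the source program."""
    ops = ["write"]
    if n % 2 == 0:
        ops.append("read")
    if n > 0:
        ops.extend(trace_a(n - 1))
    return ops


def trace_main(n):
    """Resource operations performed by main(n)."""
    return ["open"] + trace_a(n) + ["close"]


# ASSUMED policy, for illustration only (not the DFA from the slide).
POLICY = {
    ("start", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "write"): "written",
    ("written", "read"): "opened",
    ("opened", "close"): "closed",
}
ACCEPTING = {"closed"}


def accepts(trace):
    """Run the DFA over the trace; a missing transition means a violation."""
    state = "start"
    for op in trace:
        state = POLICY.get((state, op))
        if state is None:
            return False
    return state in ACCEPTING


for n in range(3):
    print(n, trace_main(n), accepts(trace_main(n)))
```

Under this assumed policy the program is fine for `n = 0`, but for odd `n` the call `a(f, n)` writes without reading, so two writes occur in a row and the policy is violated. A verifier would have to establish this for all `n` without running the program.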
Summary
A detailed formal language specification is crucial.
Negative examples: see C, C++.
Good: ANSI-C and ML. C is better, but there are still
ambiguities.
The compiler writer needs to follow the language specification
carefully.
Next week: lexical analysis, some theory on DFAs, NFAs,
regular expressions.
31
