
Compiler Design

1. Overview

CIS 631, CSE 691, CIS 400, CSE 400


Kanat Bolazar
January 19, 2010
Compilers
Compilers translate from a source language (typically a high
level language) to a functionally equivalent target language
(typically the machine code of a particular machine or a
machine-independent virtual machine).
Compilers for high level programming languages are among
the larger and more complex pieces of software
Original languages included Fortran and Cobol
Early compilers were often multi-pass (to facilitate memory reuse)
Compiler development helped in better programming language design
Early development focused on syntactic analysis and optimization
Commercially, compilers are developed by very large software groups
Current focus is on optimization and smart use of resources for
modern RISC (reduced instruction set computer) architectures.
Why Study Compilers?
General background knowledge for a good software engineer
Increases understanding of language semantics
Seeing the machine code generated for language
constructs helps understand performance issues for
languages
Teaches good language design
New devices may need device-specific languages
New business fields may need domain-specific
languages

Applications of Compiler Technology & Tools

Processing XML/other to generate documents, code, etc.


Processing domain-specific and device-specific languages.
Implementing a server that uses a protocol such as http or
imap
Natural language processing, for example, spam filter,
search, document comprehension, summary generation
Translating from a hardware description language to the
schematic of a circuit
Automatic graph layout (graphviz, for example)
Extending an existing programming language
Program analysis and improvement tools
Lexical Analysis
Stream of characters is grouped into tokens
Examples of tokens are identifiers, reserved words, integers, doubles or
floats, delimiters, operators and special symbols

int a;
a = a + 2;

int    reserved word
a      identifier
;      special symbol
a      identifier
=      operator
a      identifier
+      operator
2      integer constant
;      special symbol
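
The grouping shown above can be sketched with a small regular-expression scanner. This is only an illustration under stated assumptions, not how any particular compiler is built: the names `RESERVED`, `TOKEN_SPEC` and `tokenize` are hypothetical, and the token kinds are limited to the ones that appear in the example.

```python
import re

# Hypothetical token categories, named after the kinds listed above.
RESERVED = {"int"}
TOKEN_SPEC = [
    ("integer constant", r"\d+"),
    ("identifier",       r"[A-Za-z_]\w*"),
    ("operator",         r"[=+\-*/]"),
    ("special symbol",   r";"),
    ("skip",             r"\s+"),
]
# One alternation with a named group per category (g0, g1, ...).
MASTER = re.compile("|".join(f"(?P<g{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC)))

def tokenize(source):
    """Group a character stream into (lexeme, kind) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"invalid character {source[pos]!r} at offset {pos}")
        kind = TOKEN_SPEC[int(m.lastgroup[1:])][0]
        lexeme = m.group()
        pos = m.end()
        if kind == "skip":
            continue                      # whitespace separates tokens only
        if kind == "identifier" and lexeme in RESERVED:
            kind = "reserved word"        # reserved words look like identifiers
        tokens.append((lexeme, kind))
    return tokens

for lexeme, kind in tokenize("int a;\na = a + 2;"):
    print(f"{lexeme:4} {kind}")
```

Reserved words are recognized as identifiers first and reclassified afterwards, which keeps the regular expressions short.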
Syntax Analysis or Parsing
Parsing uses a context-free grammar of valid programming
language structures to find the structure of the input
Result of parsing usually represented by a syntax tree

Example of grammar rules:


expression → expression + expression | variable | constant
variable   → identifier
constant   → intconstant | doubleconstant | ...

Example parse tree (for a = a + 2):

      =
     / \
    a   +
       / \
      a   2
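
A hand-written recursive-descent parser is one way to build such a tree. The sketch below is a minimal illustration, with several assumptions: the left-recursive expression rule is rewritten as a loop (which makes + left-associative), the assignment rule is not part of the grammar excerpt above and is assumed here, and the function names and the tuple-based tree shape are hypothetical.

```python
def parse_operand(tokens, pos):
    """operand -> variable | constant"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return ("const", int(tok)), pos + 1
    if tok.isidentifier():
        return ("var", tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse_expression(tokens, pos=0):
    """expression -> operand { '+' operand }, built left-associatively."""
    tree, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_operand(tokens, pos + 1)
        tree = ("+", tree, right)
    return tree, pos

def parse_assignment(tokens):
    """assignment -> identifier '=' expression ';'  (assumed rule)"""
    if len(tokens) < 2 or not tokens[0].isidentifier() or tokens[1] != "=":
        raise SyntaxError("expected 'identifier ='")
    expr, pos = parse_expression(tokens, 2)
    if pos != len(tokens) - 1 or tokens[pos] != ";":
        raise SyntaxError("expected ';' at end of statement")
    return ("=", ("var", tokens[0]), expr)

print(parse_assignment(["a", "=", "a", "+", "2", ";"]))
# prints ('=', ('var', 'a'), ('+', ('var', 'a'), ('const', 2)))
```

The nested tuples mirror the tree drawn above: the root is the assignment, its right child is the addition.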
Semantic Analysis
Parse tree is checked for things that violate the semantic
rules of the language
Semantic rules may be written with an attribute grammar
Examples:
Using undeclared variables
Function called with improper arguments
Number and type of arguments
Array variables used without array syntax
Type checking of operator arguments
Left hand side of an assignment must be a variable (sometimes
called an L-value)
...
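
Some of these checks can be sketched as a recursive walk over a syntax tree. This is a simplified illustration, not an attribute-grammar implementation: the tuple node shapes, the `check` function, and the rule that both operands of + must have identical types are assumptions made for the example.

```python
def check(node, declared, errors):
    """Return the type of `node` ('int' or 'double'), or None after an error."""
    kind = node[0]
    if kind == "const":
        return "int" if isinstance(node[1], int) else "double"
    if kind == "var":
        if node[1] not in declared:
            errors.append(f"undeclared variable '{node[1]}'")
            return None
        return declared[node[1]]
    if kind == "+":
        left = check(node[1], declared, errors)
        right = check(node[2], declared, errors)
        if left is None or right is None:
            return None                   # already reported further down
        if left != right:                 # assumed rule: no implicit conversion
            errors.append(f"operand types differ: {left} + {right}")
            return None
        return left
    if kind == "=":
        target, value = node[1], node[2]
        target_type = None
        if target[0] != "var":
            errors.append("left-hand side of assignment is not a variable")
        else:
            target_type = check(target, declared, errors)
        value_type = check(value, declared, errors)
        if target_type and value_type and target_type != value_type:
            errors.append(f"cannot assign {value_type} to {target_type}")
        return target_type
    raise ValueError(f"unknown node kind {kind!r}")

declared = {"a": "int"}                   # as if "int a;" had been seen
errors = []
check(("=", ("var", "a"), ("+", ("var", "a"), ("const", 2))), declared, errors)
check(("=", ("var", "b"), ("const", 2)), declared, errors)
check(("=", ("const", 1), ("var", "a")), declared, errors)
print(errors)
```

Errors are collected in a list rather than raised, so one pass can report several problems.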

Intermediate Code Generation
An intermediate code representation often helps contain
complexity of compiler and discover code optimizations.
Typical choices include:
Annotated parse trees
Three Address Code (TAC), an abstract machine language
Bytecode, as in Java bytecode.
Example statements:
    if (a <= b)
        { a = a - c; }
    c = b * c
Resulting TAC:
    _t1 = a > b
    if _t1 goto L0
    _t2 = a - c
    a = _t2
L0: _t3 = b * c
    c = _t3
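
Three address code of this kind can be produced by a post-order walk that allocates a fresh temporary for every operator. The following is a rough sketch under assumptions: the `TacGen` class, its method names, and the tuple form of expressions are hypothetical, and the label is emitted on a line of its own rather than in front of the next instruction.

```python
import itertools

class TacGen:
    """Hypothetical three-address-code emitter for tuple expression trees."""

    def __init__(self):
        self.code = []
        self._temps = itertools.count(1)
        self._labels = itertools.count(0)

    def expr(self, node):
        """Emit code for an expression; return the name that holds its value."""
        if isinstance(node, str):         # plain variable: nothing to emit
            return node
        op, left, right = node
        l, r = self.expr(left), self.expr(right)
        temp = f"_t{next(self._temps)}"
        self.code.append(f"{temp} = {l} {op} {r}")
        return temp

    def assign(self, target, node):
        self.code.append(f"{target} = {self.expr(node)}")

    def if_then(self, negated_cond, emit_body):
        """Jump over the body when the negated condition holds."""
        skip = f"L{next(self._labels)}"
        self.code.append(f"if {self.expr(negated_cond)} goto {skip}")
        emit_body()
        self.code.append(f"{skip}:")

gen = TacGen()
# if (a <= b) { a = a - c; }   is compiled by branching on the opposite test, a > b
gen.if_then((">", "a", "b"), lambda: gen.assign("a", ("-", "a", "c")))
gen.assign("c", ("*", "b", "c"))
print("\n".join(gen.code))
```

Every operator result gets its own temporary, which is what makes later optimizations such as common sub-expression elimination easy to express.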
Intermediate Code Generation (cont'd)

Example statements:
    if (a <= b)
        { a = a - c; }
    c = b * c

Java bytecode (javap -c):
    55: iload_1
    56: iload_2
    57: if_icmpgt 64
    60: iload_1
    61: iload_3
    62: isub
    63: istore_1
    64: iload_2
    65: iload_3
    66: imul
    67: istore_3

Postfix/Polish/Stack:
    v1 v2 JumpIf(>)
    v1 v3 - store(v1)
    v2 v3 * store(v3)
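
The postfix form corresponds to a post-order walk of the expression tree, and a stack machine evaluates it by pushing operands and popping them when an operator arrives. A small sketch follows, assuming tuple expression trees; `to_postfix` and `run` are hypothetical names, and v1, v2, v3 stand for the local variable slots used by the bytecode.

```python
def to_postfix(node):
    """Post-order walk: both operands first, then the operator."""
    if isinstance(node, str):
        return [node]
    op, left, right = node
    return to_postfix(left) + to_postfix(right) + [op]

def run(postfix, variables):
    """Evaluate postfix code on an operand stack, in the style of iload/isub/imul."""
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
    }
    stack = []
    for item in postfix:
        if item in ops:
            right = stack.pop()           # the right operand was pushed last
            left = stack.pop()
            stack.append(ops[item](left, right))
        else:
            stack.append(variables[item])
    return stack.pop()

code = to_postfix(("-", "v1", "v3"))
print(" ".join(code))                     # prints: v1 v3 -
print(run(code, {"v1": 10, "v3": 4}))     # prints: 6
```

Popping the right operand first matters for non-commutative operators such as subtraction.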
Code Optimization
Compiler converts the intermediate representation to another
one that attempts to be smaller and faster.
Typical optimizations:
Inhibit code generation for unreachable segments
Getting rid of unused variables
Eliminating multiplication by 1 and addition by 0
Loop optimization: e.g. moving computations whose operands are not
modified in the loop out of the loop
Common sub-expression elimination
...
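
Two of the simpler optimizations listed above, eliminating multiplication by 1 and addition of 0, can be sketched together with constant folding as a bottom-up rewrite of an expression tree. This is an illustrative sketch only: the `simplify` function and the tuple tree shape are assumptions, and only + and * are rewritten.

```python
def simplify(node):
    """Fold constants and drop '* 1' and '+ 0' in an expression tree."""
    if not isinstance(node, tuple):
        return node                       # variable name (str) or constant (int)
    op, left, right = node
    left, right = simplify(left), simplify(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right           # constant folding
        if op == "*":
            return left * right
    if op == "*" and 1 in (left, right):
        return right if left == 1 else left
    if op == "+" and 0 in (left, right):
        return right if left == 0 else left
    return (op, left, right)

print(simplify(("+", ("*", "x", 1), 0)))      # prints: x
print(simplify(("*", ("+", 2, 3), "y")))      # prints: ('*', 5, 'y')
```

Children are simplified before their parent, so a rewrite low in the tree can enable another one higher up.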

Object Code Generation
The target program is generated in the machine language of
the target architecture.
Memory locations are selected for each variable
Instructions are chosen for each operation
Individual tree nodes or TAC instructions are translated into a sequence of
machine language instructions that perform the same task
Typical machine language instructions include things like
Load register
Add register to memory location
Store register to memory
...
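
The steps above (select a memory location per variable, choose an instruction per operation) can be sketched for an imaginary one-register machine. Everything here is an assumption made for illustration: the mnemonics LOAD, ADD, SUB, MUL and STORE, the register name R0, the numeric addresses, and the restriction to TAC lines of the form `x = y op z` or `x = y` with variable operands.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}   # hypothetical mnemonics

def gen_object_code(tac):
    """Translate 'x = y op z' and 'x = y' lines into load/operate/store triples."""
    addresses = {}                        # variable name -> memory location

    def addr(name):
        # Select the next free location the first time a name is seen.
        return addresses.setdefault(name, len(addresses))

    code = []
    for line in tac:
        target, rhs = [part.strip() for part in line.split("=", 1)]
        parts = rhs.split()
        code.append(f"LOAD  R0, [{addr(parts[0])}]")
        if len(parts) == 3:               # binary operation on R0 and memory
            code.append(f"{OPCODES[parts[1]]:5} R0, [{addr(parts[2])}]")
        code.append(f"STORE R0, [{addr(target)}]")
    return code, addresses

code, addresses = gen_object_code(["_t3 = b * c", "c = _t3"])
print("\n".join(code))
print(addresses)
```

The output stores `_t3` and immediately loads it again, which is the kind of redundancy a later object code optimization pass can remove.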

Object Code Optimization
It is possible to have another code optimization phase that
transforms the object code into more efficient object code.
These optimizations use features of the hardware itself to
make efficient use of processors and registers.
Specialized instructions
Pipelining
Branch prediction and other peephole optimizations
JIT (Just-In-Time) compilation of intermediate code (e.g.
Java bytecode) can discover more context-specific
optimizations not available earlier.
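
A peephole optimization inspects a short window of adjacent instructions and replaces it with something cheaper. As a minimal sketch, assuming instructions are tuples of a hypothetical `(opcode, register, address)` form and that no jump targets the removed instruction, a load that directly follows a store of the same register to the same address can be dropped:

```python
def peephole(code):
    """Drop a load that immediately follows a store of the same register and address."""
    out = []
    for instr in code:
        if out and instr[0] == "LOAD" and out[-1] == ("STORE",) + instr[1:]:
            continue                      # the register already holds this value
        out.append(instr)
    return out

before = [
    ("LOAD", "R0", 0),
    ("MUL", "R0", 1),
    ("STORE", "R0", 2),
    ("LOAD", "R0", 2),                    # redundant: R0 was just stored to [2]
    ("STORE", "R0", 1),
]
for instr in peephole(before):
    print(instr)
```

The window here is two instructions wide; real peephole optimizers typically apply many such patterns repeatedly until nothing changes.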

Symbol Table
Symbol table management is a part of the compiler that
interacts with several of the phases
Identifiers are found in lexical analysis and placed in the symbol
table
During syntactic and semantic analysis, type and scope
information is added
During code generation, type information is used to determine what
instructions to use
During optimization, the results of liveness analysis may be kept in the
symbol table
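
One common way to hold type and scope information is a stack of dictionaries, one per open scope. The sketch below is an assumption-laden illustration rather than the structure any specific compiler uses; the `SymbolTable` class and its method names are hypothetical.

```python
class SymbolTable:
    """Hypothetical scoped symbol table: a stack of per-scope dictionaries."""

    def __init__(self):
        self.scopes = [{}]                # the innermost scope is last

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"'{name}' already declared in this scope")
        scope[name] = {"type": type_}     # later phases may add more attributes

    def lookup(self, name):
        """Search from the innermost scope outwards; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

table = SymbolTable()
table.declare("val", "Table")             # a global variable
table.enter_scope()                       # entering a method body
table.declare("x", "int")                 # a local variable
print(table.lookup("x"), table.lookup("val"))
table.exit_scope()
print(table.lookup("x"))                  # prints: None
```

Each entry is itself a dictionary so that code generation or optimization can attach further attributes, such as an address or liveness information, to the same name.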

Error Handling
Error handling and reporting also occurs across many phases
Lexical analyzer reports invalid character sequences
Syntactic analyzer reports invalid token sequences
Semantic analyzer reports type and scope errors, and the like
The compiler may be able to continue with some errors, but
other errors may stop the process

Compiler / Translator Design Decisions
Choose a source language
Large enough to have many interesting language features
Small enough to implement in a reasonable amount of time
Examples for us: MicroJava, Decaf, MiniJava
Choose a target language
Either a real assembly language for a machine with an assembler
Or a virtual machine language with an interpreter
Examples for us: MicroJava VM (a simplified JVM), MIPS (a popular RISC
architecture, for which there is a SPIM simulator)
Choose an approach for implementation:
Either use an existing scanner and parser / compiler generator
lex/flex, yacc/bison/byacc,
Antlr/JavaCC/SableCC/byaccj/Coco/R.
Or implement these yourself (limits the language somewhat)
Example MicroJava Program
program P                    // main program; no separate compilation
  final int size = 10;
  class Table {              // classes (without methods)
    int[] pos;
    int[] neg;
  }
  Table val;                 // global variables
{
  void main()
    int x, i;                // local variables
  { //---------- initialize val ----------
    val = new Table;
    val.pos = new int[size];
    val.neg = new int[size];
    i = 0;
    while (i < size) {
      val.pos[i] = 0; val.neg[i] = 0; i = i + 1;
    }
    //---------- read values ----------
    read(x);
    while (x != 0) {
      if (x > 0) val.pos[x] = val.pos[x] + 1;
      else if (x < 0) val.neg[-x] = val.neg[-x] + 1;
      read(x);
    }
  }
}
References
Original slides: Nancy McCracken.
Niklaus Wirth, Compiler Construction, chapters 1 and 2
Course notes from H. Mössenböck, System Specification and Compiler
Construction, http://www.ssw.uni-linz.ac.at/Misc/CC/
Also notes on MicroJava
Course notes from Jerry Cain, Compilers,
http://www.stanford.edu/class/cs143/
General references:
Aho, A., Lam, M., Sethi, R., Ullman, J., Compilers: Principles,
Techniques and Tools, 2nd Edition, Addison-Wesley, 2006.
Steven Muchnick, Advanced Compiler Design and Implementation,
Morgan Kaufmann, 1997.
Keith Cooper and Linda Torczon, Engineering a Compiler,
Morgan Kaufmann, 2003.
