
Compiler Design

(Course Code : 10B11CI612)

References
Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman

Crafting a Compiler with C by Fischer and LeBlanc

Compiler Design in C by Holub

Programming Language Pragmatics by Scott

Engineering a Compiler by Cooper and Torczon

Modern Compiler Implementation (in C and in Java) by Appel

Writing Compilers and Interpreters by Mak

Motivation for doing this course


Language processing is an important component of programming.

A large number of systems software and application programs require structured input:
Software quality assurance and software testing
Operating systems (command line processing)
Databases (query language processing)
Typesetting systems like LaTeX, nroff, troff, equation editors, m4
VLSI design and testing
XML- and HTML-based systems, awk, sed, Emacs, vi
Form processing: extracting information automatically from forms
Compilers, assemblers and linkers
High level language to language translators
Natural language processing

Wherever input has a structure, one can think of language processing.

Why study compilers? Compilers use the whole spectrum of language processing technology.

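Many common applications require structured input during the development phase (for example, the design of the Unix banner program). A letter such as I, as printed by banner:

xxxxxxxxx
xxxxxxxxx
xxxxxxxxx
   xxx
   xxx
   xxx
   xxx
   xxx
   xxx
xxxxxxxxx
xxxxxxxxx
xxxxxxxxx

is described row by row (nb = n blanks, nx = n copies of x) as:
9x 9x 9x 3b 3x 3b 3x 3b 3x 3b 3x 3b 3x 3b 3x 9x 9x 9x

or, more compactly, with a repeat count in front of each kind of row:
3 9x 6 3b 3x 3 9x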
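The compressed description is itself a tiny input language. As a sketch, assuming that a bare number is the repeat count for the row that follows and that a token such as 3b or 9x means three blanks or nine x characters, a few lines of Python expand it back into the picture. The function name `expand_banner` is hypothetical; this is not the actual banner implementation.

```python
import re

def expand_banner(spec):
    """Expand a compressed banner description such as '3 9x 6 3b 3x 3 9x'.

    Assumed conventions for this sketch: a bare number is the repeat count of the
    row that follows it, and a token <n><c> stands for n copies of character c,
    where the letter b denotes a blank.
    """
    rows = []
    count, segments = None, []          # the row group currently being read

    def flush():
        # Emit the current row `count` times (nothing before the first count).
        if count is not None:
            rows.extend(["".join(segments)] * count)

    for token in spec.split():
        m = re.fullmatch(r"(\d+)([a-z]?)", token)
        if m is None:
            raise ValueError(f"bad token {token!r}")
        n, ch = int(m.group(1)), m.group(2)
        if not ch:                      # bare number: close the previous group, start a new one
            flush()
            count, segments = n, []
        elif count is None:
            raise ValueError("a row must be preceded by a repeat count")
        else:
            segments.append((" " if ch == "b" else ch) * n)
    flush()
    return rows
```

For example, `print("\n".join(expand_banner("3 9x 6 3b 3x 3 9x")))` prints the twelve rows of the letter shown above.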

What will we learn in the course?


How high level languages are implemented to generate machine code.
The complete structure of compilers and how the various parts are composed together to get a compiler.
The course has theoretical and practical components; both are needed in implementing programming languages. The focus will be on practical application of the theory.

Theory of lexical analysis, parsing, type checking, runtime systems, code generation and optimization (without going too deep into the proofs).
Emphasis will be on algorithms and data structures rather than proofs of correctness of algorithms.
Techniques for developing lexical analyzers, parsers, type checkers, runtime systems, code generators and optimizers.
Use of tools and specifications for developing various parts of compilers.

Bit of History
How are programming languages implemented? There are two major strategies:
Interpreters (old and much less studied)
Compilers (very well understood, with mathematical foundations)
Some environments provide both an interpreter and a compiler. Lisp, Scheme etc. provide an interpreter for development and a compiler for deployment.

Some early machines and implementations


IBM developed the 704 in 1954. All programming was done in assembly language.
The cost of software development far exceeded the cost of hardware, and productivity was low.
Speedcoding interpreter: programs ran about 10 times slower than hand-written assembly code.
John Backus (in 1954) proposed a program that translated high level expressions into native machine code. There was skepticism all around; most people thought it was impossible.
Fortran I project (1954-1957): the first compiler was released.

Modern compilers preserve the basic structure of the Fortran I compiler!

Computer Organization

Applications

Compiler
Operating System

Hardware Machine

What are Compilers?

High level program

Compiler

Low level code

Goals of translation
Correctness
High level of abstraction
Good performance for the generated code
Good compile time performance

Overall View
The compiler is part of the program development environment. The other typical components of this environment are the editor, assembler, linker, loader, debugger, profiler etc. The compiler and all the other tools must support each other for easy program development.

Program development cycle:
Programmer -> Editor -> Source Program -> Compiler -> Assembly code -> Assembler -> Machine Code -> Linker -> Resolved Machine Code -> Loader -> Executable Image -> Execution on the target machine

Errors, if any, go back to the programmer, who corrects the code manually in the editor. A first attempt normally ends up with errors.

The first few steps


The first few steps can be understood by analogy with how humans comprehend a natural language.
The first step is recognizing the alphabet of a language. For example:
English text consists of lower and upper case letters, digits, punctuation and white space.
Written programs consist of characters from the ASCII character set (normally codes 9-13 and 32-126).

The next step in understanding a sentence is recognizing words (lexical analysis).
English language words can be found in dictionaries.
Programming languages have a dictionary (keywords etc.) and rules for constructing words (identifiers, numbers etc.).

Lexical Analysis
English: Someone breaks the ice  ->  words: Someone | breaks | the | ice
Program: final := initial + rate * 60  ->  tokens: id1 := id2 + id3 * num(60)
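A lexical analyzer of this kind can be sketched with regular expressions. The sketch below assumes only identifiers, integer or decimal numbers, `:=` and the four arithmetic operators, and turns `final := initial + rate * 60` into the token stream shown above, numbering identifiers by their position in a simple list that stands in for the symbol table. The token patterns and the name `tokenize` are illustrative assumptions; keywords are not handled.

```python
import re

# Token patterns, tried in this order at each position (assumed for this sketch).
TOKEN_SPEC = [
    ("num", r"\d+(?:\.\d+)?"),
    ("id", r"[A-Za-z_]\w*"),
    ("assign", r":="),
    ("op", r"[+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return (tokens, symbols) for a single assignment statement."""
    symbols = []          # symbols[i] is the name behind token id{i+1}
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":                      # white space separates tokens only
            continue
        if kind == "id":
            if lexeme not in symbols:           # first occurrence: new table entry
                symbols.append(lexeme)
            tokens.append(f"id{symbols.index(lexeme) + 1}")
        elif kind == "num":
            tokens.append(f"num({lexeme})")
        else:
            tokens.append(lexeme)
    return tokens, symbols
```

Here `" ".join(tokenize("final := initial + rate * 60")[0])` gives `id1 := id2 + id3 * num(60)`, with `final`, `initial` and `rate` recorded as entries 1, 2 and 3.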

Syntax Analysis
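English: sentence := subject verb object
Someone breaks the ice  ->  subject: Someone, verb: breaks, object: the ice

Program: id1 := id2 + id3 * 60 is grouped into a syntax tree (children indented below their parent):

    :=
        id1
        +
            id2
            *
                id3
                60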
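The grouping into a tree can be sketched as a recursive-descent parser. The grammar in the docstring is an assumption chosen so that `*` binds tighter than `+`; tokens are plain strings (the number written as 60), and the tree is returned as nested `(operator, left, right)` tuples.

```python
def parse(tokens):
    """Parse   assign -> id ':=' expr
               expr   -> term ('+' term)*
               term   -> factor ('*' factor)*
               factor -> id | number
    and return the syntax tree as nested (op, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok in ("+", "*", ":="):
            raise SyntaxError(f"operand expected, got {tok!r}")
        return tok

    def term():                      # left-associative chain of '*'
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                      # left-associative chain of '+'
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = advance()
    if advance() != ":=":
        raise SyntaxError("':=' expected")
    tree = (":=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Because `term` is called from inside `expr`, the multiplication `id3 * 60` becomes a subtree of the addition, which is exactly the shape of the tree for this statement.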

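Semantic Analysis
English: "Someone plays the piano" is meaningful; "The piano plays someone" is meaningless.

Program: because id3 is real and 60 is an integer, the semantic analyzer inserts an integer-to-real conversion i2r (children indented below their parent):

    :=
        id1
        +
            id2
            *
                id3
                i2r
                    60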
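A type checker that inserts the conversion can be sketched as a recursive walk over the tree. This assumes only two types, `int` and `real`, that identifier types come from a dictionary standing in for the symbol table, and a hypothetical rule that assigning a real value to an int variable is an error.

```python
def check(node, types):
    """Return (annotated_tree, type); leaves are identifiers or integer literals."""
    if isinstance(node, str):
        return (node, "int") if node.isdigit() else (node, types[node])
    op, left, right = node
    l, lt = check(left, types)
    r, rt = check(right, types)
    if op == ":=":
        if lt == rt:
            return (op, l, r), lt
        if lt == "real":                     # int value stored into a real variable
            return (op, l, ("i2r", r)), lt
        raise TypeError("assigning a real to an int variable")   # assumed rule
    if lt == rt:
        return (op, l, r), lt
    # Mixed int/real operands: convert the int side and compute in real.
    if lt == "int":
        l = ("i2r", l)
    else:
        r = ("i2r", r)
    return (op, l, r), "real"
```

With `id1`, `id2` and `id3` all declared real, checking `(":=", "id1", ("+", "id2", ("*", "id3", "60")))` wraps only the literal: the result contains `("*", "id3", ("i2r", "60"))`.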

Intermediate Code Generation


The annotated syntax tree is flattened into three-address code, with at most one operator on the right-hand side of each instruction:

temp1 := i2r ( 60 )
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
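Generating this code is a post-order walk: each interior node gets a fresh temporary after its operands have been emitted. The sketch assumes the annotated tree is nested tuples, with `("i2r", x)` for the conversion, and that temporaries are simply numbered temp1, temp2, and so on; the name `gen_tac` is illustrative.

```python
from itertools import count

def gen_tac(tree):
    """Flatten an annotated (':=', target, expr) tree into three-address code."""
    code = []
    temps = count(1)                              # supplies temp1, temp2, ...

    def emit(node):
        # Leaves (names and literals) are used directly as operands.
        if isinstance(node, str):
            return node
        if node[0] == "i2r":                      # unary conversion node
            arg = emit(node[1])
            t = f"temp{next(temps)}"
            code.append(f"{t} := i2r ( {arg} )")
            return t
        op, left, right = node
        l, r = emit(left), emit(right)            # operands first (post-order)
        t = f"temp{next(temps)}"
        code.append(f"{t} := {l} {op} {r}")
        return t

    _, target, value = tree
    code.append(f"{target} := {emit(value)}")
    return code
```

Applied to `(":=", "id1", ("+", "id2", ("*", "id3", ("i2r", "60"))))`, it produces the four instructions listed above, in the same order.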

Code Optimization
temp1 := i2r ( 60 )
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3

is improved to

temp1 := id3 * 60.0
id1 := id2 + temp1
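The two improvements can be sketched as one pass over the instructions: the conversion of the constant 60 is folded to 60.0 at compile time, and a copy such as `id1 := temp3` that immediately follows the definition of `temp3` is merged into that definition. The sketch assumes instructions are `(dst, op, args)` tuples, that temporaries are exactly the names starting with `temp`, and that each temporary is used once (true for code generated from a tree); the surviving temporaries are then renumbered.

```python
def optimize(code):
    """code: list of (dst, op, args), with op in {'i2r', 'copy', '+', '*', ...}."""
    consts = {}                                    # folded temporaries -> constant text
    out = []
    for dst, op, args in code:
        args = tuple(consts.get(a, a) for a in args)
        if op == "i2r" and args[0].isdigit():
            consts[dst] = str(float(args[0]))      # i2r(60) becomes 60.0 at compile time
            continue
        if op == "copy" and out and out[-1][0] == args[0] and args[0].startswith("temp"):
            _, prev_op, prev_args = out.pop()      # t := e ; x := t   ->   x := e
            out.append((dst, prev_op, prev_args))
            continue
        out.append((dst, op, args))

    # Renumber the surviving temporaries in order of definition.
    names = {}
    for dst, _, _ in out:
        if dst.startswith("temp"):
            names[dst] = f"temp{len(names) + 1}"

    def fmt(dst, op, args):
        dst = names.get(dst, dst)
        args = [names.get(a, a) for a in args]
        if op == "copy":
            return f"{dst} := {args[0]}"
        if op == "i2r":
            return f"{dst} := i2r ( {args[0]} )"
        return f"{dst} := {args[0]} {op} {args[1]}"

    return [fmt(*ins) for ins in out]
```

Merging the copy is only safe because the temporary has no other use; a real optimizer would check that with data-flow information rather than assume it.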

Code Generation

temp1 := id3 * 60.0
id1 := id2 + temp1

is translated into target code:

movf id3, r2
mulf #60.0, r2
movf id2, r1
addf r2, r1
movf r1, id1
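A naive code generator for this two-register target can be sketched as follows. It assumes instructions are `(dst, left, op, right)` tuples, that temporaries stay in registers while named variables live in memory, and that free registers are taken from a stack; with registers r1 and r2 that choice happens to reproduce the listing above. Only `+` and `*` are mapped, to `addf` and `mulf` as on the slide, and no spilling is done, so a longer expression could run out of registers.

```python
import re

OPCODE = {"*": "mulf", "+": "addf"}    # floating-point opcodes as named on the slide

def is_literal(x):
    return re.fullmatch(r"\d+(?:\.\d+)?", x) is not None

def gen_code(tac, registers=("r1", "r2")):
    """tac: list of (dst, left, op, right); names starting with 'temp' are temporaries."""
    free = list(registers)   # stack of free registers; pop() takes the last one
    where = {}               # temporary -> register currently holding it
    code = []

    def operand(x):
        if x in where:
            return where[x]
        return f"#{x}" if is_literal(x) else x

    for dst, left, op, right in tac:
        reg = free.pop()
        code.append(f"movf {operand(left)}, {reg}")
        code.append(f"{OPCODE[op]} {operand(right)}, {reg}")
        for x in (left, right):          # a temporary is dead after its single use
            if x in where:
                free.append(where.pop(x))
        if dst.startswith("temp"):
            where[dst] = reg             # keep the temporary in its register
        else:
            code.append(f"movf {reg}, {dst}")
            free.append(reg)
    return code
```

Called as `gen_code([("temp1", "id3", "*", "60.0"), ("id1", "id2", "+", "temp1")])`, it yields the five instructions `movf id3, r2` through `movf r1, id1`.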

Structure of a Compiler
Front End
Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Generation

Back End
Code Optimization, Code Generation

Compiler structure
Source Program -> Lexical Analysis -> token stream -> Syntax Analysis -> syntax tree -> Semantic Analysis -> abstract, unambiguous program representation -> IL code generator -> IL code (IR) -> Optimizer -> optimised code -> Code generator -> Target Program

Front End (language specific): lexical analysis through IL code generation
Back End (machine specific): optimization and code generation

Symbol Table
Information required about the program variables during compilation:
Class of variable: keyword, identifier etc.
Type of variable: integer, float, array, function etc.
Amount of storage required
Address in the memory
Scope information

Location to store this information:
A central repository, to which every phase refers whenever information is required.
This repository is a data structure called the symbol table.
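A symbol table that records these attributes can be sketched as a stack of dictionaries, one per scope, so that an inner declaration hides an outer one. The field names, the byte sizes in `SIZES` and the offset-based addresses are assumptions for illustration, not a layout prescribed by the course.

```python
from dataclasses import dataclass

# Storage sizes in bytes: hypothetical values chosen for this sketch.
SIZES = {"integer": 4, "float": 8}

@dataclass
class Symbol:
    name: str
    kind: str       # class of the name; every declared name here is an "identifier"
    type: str       # e.g. "integer", "float"
    size: int       # storage required in bytes
    address: int    # offset within the declaring scope's storage
    scope: int      # nesting depth of the declaring scope (0 = global)

class SymbolTable:
    def __init__(self):
        self.scopes = [{}]      # stack of scopes, innermost last
        self.offsets = [0]      # next free offset in each scope

    def enter_scope(self):
        self.scopes.append({})
        self.offsets.append(0)

    def exit_scope(self):
        self.scopes.pop()
        self.offsets.pop()

    def declare(self, name, type_):
        if name in self.scopes[-1]:
            raise KeyError(f"{name} already declared in this scope")
        size = SIZES[type_]
        sym = Symbol(name, "identifier", type_, size, self.offsets[-1], len(self.scopes) - 1)
        self.scopes[-1][name] = sym
        self.offsets[-1] += size
        return sym

    def lookup(self, name):
        # Search from the innermost scope outwards; None if undeclared.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

Declaring `rate` as float and `count` as integer at the outer level gives them offsets 0 and 8; a `rate` declared inside a nested scope hides the outer one until `exit_scope` is called.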

Final Compiler structure


Source Program -> Lexical Analysis -> token stream -> Syntax Analysis -> syntax tree -> Semantic Analysis -> abstract, unambiguous program representation -> IL code generator -> IL code -> IL Optimiser -> optimised code -> Code generator -> Target Program

Front End (language specific), Back End (machine specific)

Symbol Table: shared by all the phases of the compiler

Advantages of the model


Also known as the Analysis-Synthesis model of compilation.
Front end phases are known as analysis phases; back end phases are known as synthesis phases.

Each phase has well defined work.

Each phase handles a logical activity in the process of compilation.

The compiler is retargetable.
Source and machine independent code optimization is possible.
An optimization phase can be inserted after the front and back end phases have been developed and deployed.

Cousins of Compilers
1. Preprocessors: provide input to compilers
a. Macro Processing
#define in C does text substitution before compiling:
#define X 3
#define Y A*B+C
#define Z getchar()

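b. File Inclusion
#include in C brings in another file before compiling. The line #include "defs.h" in main.c is replaced by the contents of defs.h:
defs.h:  ////// ////// //////
main.c:  #include "defs.h" followed by -------------------------
after preprocessing:  ////// ////// ////// followed by -------------------------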

c. Language Extensions for a Database System
EQUEL, a database query language, is embedded in C. Statements beginning with ## are taken by the preprocessor to be database-access statements. For example,
## Retrieve (DN=Department.Dnum) where ## Department.Dname = Research
is preprocessed into:
ingres_system(Retr..Research,____,____);
a procedure call in a programming language.
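Macro processing and file inclusion can be sketched together as a tiny preprocessor. This assumes an in-memory dictionary in place of the file system, handles only `#include "file"` and object-like `#define NAME text`, and, unlike a real C preprocessor, does no rescanning of replacement text, no cycle detection and no special treatment of string literals. The function name `preprocess` is illustrative.

```python
import re

def preprocess(name, files, macros=None):
    """Return the lines of `name` after #include and #define processing.

    files: dict mapping file names to their text (stand-in for the file system).
    """
    if macros is None:
        macros = {}                               # shared across included files
    out = []
    for line in files[name].splitlines():
        inc = re.fullmatch(r'\s*#include\s+"([^"]+)"\s*', line)
        if inc:                                   # file inclusion: splice the other file in
            out.extend(preprocess(inc.group(1), files, macros))
            continue
        dfn = re.fullmatch(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)", line)
        if dfn:                                   # record the macro; the line itself is dropped
            macros[dfn.group(1)] = dfn.group(2).strip()
            continue
        # Text substitution: replace each identifier that names a macro.
        out.append(re.sub(r"[A-Za-z_]\w*",
                          lambda m: macros.get(m.group(), m.group()), line))
    return out
```

With `defs.h` holding the three #define lines from the slide and `main.c` holding `#include "defs.h"`, `x = X + Y;` and `c = Z;`, the result is `x = 3 + A*B+C;` and `c = getchar();`. The compiler proper never sees X, Y or Z.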

2. Assemblers
Assembly code: names are used for instructions, and names are used for memory addresses.

MOV a, R1       0001 01 00 00000000 *
ADD #2, R1      0011 01 10 00000010
MOV R1, b       0010 01 00 00000100 *

(* marks the relocation bit)

Two-pass Assembly:
First Pass: all identifiers are assigned memory addresses (0-offset), e.g. substitute 0 for a and 4 for b.
Second Pass: produce relocatable machine code.
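The two passes can be sketched for this toy instruction set. The encoding is read off the example and is an assumption: a 4-bit opcode (0001 load, 0010 store, 0011 add), a 2-bit register number, a 2-bit addressing mode (00 memory, 10 immediate) and an 8-bit address, with `*` marking words whose address must be relocated; identifiers get addresses in 4-byte steps as in the text. Register-to-register moves are not handled.

```python
import re

OPCODES = {"load": "0001", "store": "0010", "add": "0011"}   # assumed encoding
REGISTER = re.compile(r"R([0-3])")                          # fits a 2-bit field

def parse(line):
    op, rest = line.split(None, 1)
    src, dst = (part.strip() for part in rest.split(","))
    return op.upper(), src, dst

def assemble(lines, word_size=4):
    instrs = [parse(line) for line in lines]

    # First pass: give every identifier a 0-offset address, in order of first use.
    symbols = {}
    for _, src, dst in instrs:
        for operand in (src, dst):
            is_name = not operand.startswith("#") and not REGISTER.fullmatch(operand)
            if is_name and operand not in symbols:
                symbols[operand] = len(symbols) * word_size

    # Second pass: emit relocatable machine code.
    words = []
    for op, src, dst in instrs:
        if op == "MOV" and REGISTER.fullmatch(dst):
            kind, reg, operand = "load", dst, src      # MOV mem, Rn
        elif op == "MOV":
            kind, reg, operand = "store", src, dst     # MOV Rn, mem
        elif op == "ADD":
            kind, reg, operand = "add", dst, src       # ADD x, Rn
        else:
            raise ValueError(f"unknown instruction {op}")
        reg_bits = format(int(REGISTER.fullmatch(reg).group(1)), "02b")
        if operand.startswith("#"):                    # immediate: mode 10, absolute
            mode, value, reloc = "10", int(operand[1:]), ""
        else:                                          # memory: mode 00, relocatable
            mode, value, reloc = "00", symbols[operand], " *"
        words.append(f"{OPCODES[kind]} {reg_bits} {mode} {value:08b}{reloc}")
    return symbols, words
```

The first pass is needed because an instruction may refer to a name before the assembler knows its address; here the whole symbol table is built before any word is encoded.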

3. Loaders and Link-Editors
Loader: takes relocatable machine code, alters the addresses and places the altered instructions into memory.

Link-editor: takes many (relocatable) machine code programs (with cross-references) and produces a single file. It needs to keep track of the correspondence between variable names and the corresponding addresses in each piece of code.
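The relocation step of a loader can be sketched on words in the format listed under Assemblers: every address field marked with the relocation bit is shifted by the load address, and the other words are copied unchanged. The 8-bit address field is an assumption carried over from that example; loading at address 15 would place a at 15 and b at 19. Resolving cross-references between several files, the link-editor's job, is not shown.

```python
def relocate(words, load_address):
    """Add load_address to the address field of every word marked with '*'.

    words: strings 'oooo rr mm aaaaaaaa' with an optional trailing ' *'.
    """
    out = []
    for word in words:
        fields = word.split()
        if fields[-1] == "*":                       # relocatable address field
            opcode, reg, mode, addr = fields[:4]
            new_addr = int(addr, 2) + load_address
            if new_addr > 0xFF:
                raise OverflowError("address does not fit in 8 bits")
            out.append(f"{opcode} {reg} {mode} {new_addr:08b}")
        else:
            out.append(" ".join(fields))            # absolute word: copy unchanged
    return out
```

The immediate operand #2 keeps its value because its word carries no relocation bit.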

Next Lecture
Bootstrapping, cross compiler, single-pass and multipass compilers, load-and-go compiler, debugger, optimiser