
Compiler Design

VI Sem (CSE )

Course Outline
Compiler phases: Lexical Analysis, Syntax Analysis, Semantic Analysis, Intermediate Code Generation, Code Optimization, Code Generation
Lexical Analysis
Top Down & Bottom Up Parsing
Semantic Analysis
Storage management & Symbol Table management
IMCG & Code Generation
Data Flow Analysis & Code Optimization Techniques

Preliminaries Required
Basic knowledge of programming languages.
Basic knowledge of FSA and CFG.

Knowledge of a high-level programming language for the programming assignments.
Textbook:

Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools

Lecture Outline
Language Processors
The Structure of a Compiler

Language Processors
A compiler
  source program -> Compiler -> target program

Running the target program
  input -> Target Program -> output

An interpreter
  source program + input -> Interpreter -> output
Program execution is usually much slower than with a compiled target program
Error diagnostics are usually better

A hybrid compiler, e.g. Java Virtual Machine
  source program -> Translator -> intermediate program
  intermediate program + input -> Virtual Machine -> output

A Language Processing System


source program
  -> Preprocessor -> modified source program
  -> Compiler -> target assembly program
  -> Assembler -> relocatable machine code
  -> Linker/Loader -> target machine code
The Linker/Loader also takes library files and relocatable object files as input.

Preprocessors: produce input for compilers. Typical functions are macro processing and file inclusion.
The modified source program is given as input to the compiler. Some compilers produce equivalent code in assembly language, which is passed to the assembler for further processing.
Other compilers perform the job of the assembler themselves, producing machine code that is passed directly to the linker/loader.

Loaders and Link-Editors


Loader: takes relocatable machine code, alters the addresses, and places the altered instructions into memory.
Link-editor: takes many (relocatable) machine code programs (with cross-references) and produces a single file.
The link-editor needs to keep track of the correspondence between variable names and the corresponding addresses in each piece of code.

A Typical C Program Development Environment


Phases of C Programs:
1. Edit: the program is created in the editor and stored on disk.
2. Preprocess: the preprocessor program processes the code.
3. Compile: the compiler creates object code and stores it on disk.
4. Link: the linker links the object code with the libraries.
5. Load: the loader puts the program in primary memory.
6. Execute: the CPU takes each instruction and executes it, possibly storing new data values as the program executes.

A Typical C Program Development Environment (cont.)


Procedure to Prepare a C Program for Execution
1. Enter the program code and save it as a source (*.c) file using a word processor (editor).
   Result: source (.c) file on disk (format: text).
2. The compiler attempts to translate the program into machine code.
   Failure: the compiler produces a list of errors. Correct the syntax errors and compile the revised source file again.
   Success: new object (*.obj) files (format: binary).
3. The linker links the new object file with other object (*.obj) files.
   Result: executable (*.exe, *.out) file (format: binary).
4. The loader places the executable file into memory.
   Result: executable program in memory.
5. The executable program reads the input data and produces the results.

Introduction to Compilers

As a Discipline, Involves Multiple CS&E Areas
Programming Languages and Algorithms
Theory of Computing & Software Engineering
Computer Architecture & Operating Systems

Has a Deceptively Simple Intent:
  Source program -> Compiler -> Target program
The compiler also produces error messages, which are diverse and varied.

The Structure of a Compiler


Analysis (front end)
Using a grammatical structure to create an intermediate representation
Collecting information about the source program in a symbol table

Synthesis (back end)
Constructing the target program from the intermediate representation and the symbol table

Classifications of Compilers
Compilers Viewed from Many Perspectives

By construction: Single Pass, Multiple Pass, Load & Go
By function: Debugging, Optimizing

However, all of them use the same basic tasks to accomplish their actions.

The Many Phases of a Compiler


Source Program
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer
4. Intermediate Code Generator
5. Code Optimizer
6. Code Generator
Target Program

The Symbol-table Manager and the Error Handler interact with all six phases.
1, 2, 3 : Analysis - Our Focus
4, 5, 6 : Synthesis

Phases of A Compiler
Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code Generator -> Code Optimizer -> Code Generator -> Target Program

Each phase transforms the source program from one representation into another representation.
The phases communicate with the error handlers.
The phases communicate with the symbol table.

The Model
The TWO Fundamental Parts:

Analysis: Decompose the source into an intermediate representation
Synthesis: Generate the target program from that representation

The Analysis Task For Compilation


Three Phases:
Linear / Lexical Analysis:
  Left-to-right scan to identify tokens
  Lexeme: a sequence of characters with a collective meaning that forms a token
Hierarchical Analysis:
  Grouping of tokens into meaningful collections
Semantic Analysis:
  Checking to ensure the correctness of components

Phase 1. Lexical Analysis


Easiest analysis: identify the tokens, which are the basic building blocks.
For example: position := initial + rate * 60
All of the following are tokens: position, :=, initial, +, rate, *, 60
Blanks, line breaks, etc. are scanned out.

Lexical Analyzer
The lexical analyzer reads the source program character by character and returns the tokens of the source program.
A token describes a pattern of characters that have the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on).
Ex: newval := oldval + 12 => tokens:
  newval    identifier
  :=        assignment operator
  oldval    identifier
  +         add operator
  12        a number

The lexical analyzer puts information about identifiers into the symbol table.
Regular expressions are used to describe tokens (lexical constructs).
A (deterministic) finite state automaton can be used in the implementation of a lexical analyzer.
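
The idea of describing tokens with regular expressions can be sketched in a few lines of Python. This is only an illustrative sketch, not a production scanner: the token class names (`identifier`, `assign_op`, and so on) and the function name `tokenize` are assumptions chosen to match the example above, and Python's `re` module stands in for a hand-built finite automaton.

```python
import re

# Token classes and their regular expressions (names are illustrative).
TOKEN_SPEC = [
    ("number",     r"\d+(?:\.\d+)?"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assign_op",  r":="),
    ("add_op",     r"\+"),
    ("mul_op",     r"\*"),
    ("skip",       r"\s+"),   # blanks and line breaks are scanned out
    ("error",      r"."),     # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Scan the source left to right and return (token class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "error":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


for kind, lexeme in tokenize("newval := oldval + 12"):
    print(f"{lexeme:8} {kind}")
```

A real lexical analyzer would also enter each identifier into the symbol table; that step is left out here to keep the sketch short.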

Phase 2. Hierarchical Analysis


aka Parsing or Syntax Analysis

Syntax Analyzer (CFG)


The syntax of a language is specified by a context free grammar (CFG).
The rules in a CFG are mostly recursive.
A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
If it does, the syntax analyzer creates a parse tree for the given program.

Ex: We use BNF (Backus Naur Form) to specify a CFG

assgstmt -> identifier := expression
expression -> identifier
expression -> number
expression -> expression + expression
expression -> expression * expression

What is a Grammar?
A grammar is a set of rules which govern the interdependencies and structure among the tokens.

statement is an
  assignment statement, or while statement, or if statement, or ...

assignment statement is an
  identifier := expression ;

expression is an
  (expression), or expression + expression, or expression * expression, or number, or identifier, or ...

Parse tree: the IR form after syntax analysis

For the previous example, we would have the parse tree:

assignment statement
  identifier (position)
  :=
  expression
    expression
      identifier (initial)
    +
    expression
      expression
        identifier (rate)
      *
      expression
        number (60)

Nodes of the tree are constructed using a grammar for the language.

Syntax Analyzer
A syntax analyzer creates the syntactic structure (generally a parse tree) of the given program.
A syntax analyzer is also called a parser.
A parse tree describes a syntactic structure.
In a parse tree, all terminals are at the leaves.
All inner nodes are non-terminals of a context free grammar.

assgstmt
  identifier (newval)
  :=
  expression
    expression
      identifier (oldval)
    +
    expression
      number (12)

Syntax Analyzer versus Lexical Analyzer
Which constructs of a program should be recognized by the lexical analyzer, and which ones by the syntax analyzer?
Both of them do similar things, but the lexical analyzer deals with simple non-recursive constructs of the language.
The syntax analyzer deals with recursive constructs of the language.
The lexical analyzer simplifies the job of the syntax analyzer.
The lexical analyzer recognizes the smallest meaningful units (tokens) in a source program.
The syntax analyzer works on the smallest meaningful units (tokens) in a source program to recognize meaningful structures in our programming language.

Parsing Techniques
Depending on how the parse tree is created, there are different parsing techniques.
These parsing techniques are categorized into two groups:

Top-Down Parsing:
Construction of the parse tree starts at the root and proceeds towards the leaves.
Efficient top-down parsers can be easily constructed by hand.
Examples: recursive predictive parsing, non-recursive predictive parsing (LL parsing).

Bottom-Up Parsing:
Construction of the parse tree starts at the leaves and proceeds towards the root.
Efficient bottom-up parsers are normally created with the help of software tools.
Bottom-up parsing is also known as shift-reduce parsing.
Operator-precedence parsing: simple, restrictive, easy to implement.
LR parsing: a much more general form of shift-reduce parsing; variants include LR, SLR, and LALR.
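
To make the top-down approach concrete, here is a minimal sketch of a hand-written recursive predictive parser for assignment statements. It rests on several assumptions. The expression grammar given earlier is ambiguous and left-recursive, so the sketch parses an equivalent rewritten grammar (expression -> term { + term }, term -> factor { * factor }, factor -> identifier | number). The token kinds (`identifier`, `number`, `:=`, `+`, `*`), the class name `Parser`, and the tuple-shaped tree are likewise illustrative choices. It builds a compressed syntax tree, not the full parse tree.

```python
class Parser:
    """Recursive predictive parser: one method per non-terminal."""

    def __init__(self, tokens):
        self.tokens = tokens   # list of (kind, lexeme) pairs
        self.pos = 0

    def peek(self):
        # Kind of the next token, or "eof" when the input is exhausted.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "eof"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def assgstmt(self):        # assgstmt -> identifier := expression
        target = self.expect("identifier")
        self.expect(":=")
        tree = (":=", target, self.expression())
        if self.peek() != "eof":
            raise SyntaxError(f"unexpected {self.peek()} after statement")
        return tree

    def expression(self):      # expression -> term { + term }
        node = self.term()
        while self.peek() == "+":
            self.expect("+")
            node = ("+", node, self.term())
        return node

    def term(self):            # term -> factor { * factor }
        node = self.factor()
        while self.peek() == "*":
            self.expect("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):          # factor -> identifier | number
        if self.peek() == "identifier":
            return ("id", self.expect("identifier"))
        if self.peek() == "number":
            return ("num", self.expect("number"))
        raise SyntaxError(f"expected identifier or number, found {self.peek()}")


tokens = [("identifier", "position"), (":=", ":="), ("identifier", "initial"),
          ("+", "+"), ("identifier", "rate"), ("*", "*"), ("number", "60")]
print(Parser(tokens).assgstmt())
```

The tree is built from the root downwards: `assgstmt` calls `expression`, which calls `term`, and so on. Splitting expressions into `term` and `factor` is what gives `*` higher precedence than `+`.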

Phase 3. Semantic Analysis


Find more complicated semantic errors and support code generation.
The parse tree is augmented with semantic actions.

Compressed tree:
:=
  position
  +
    initial
    *
      rate
      60

Tree with conversion action:
:=
  position
  +
    initial
    *
      rate
      inttoreal
        60

Phase 3. Semantic Analysis


Most important activity in this phase:
Type checking - legality of operands

Many different situations:
  Real := int + char ;
  A[int] := A[real] + int ;
  while char <> int do
  ... etc.

Semantic Analyzer
A semantic analyzer checks the source program for semantic errors and collects the type information for code generation.
Type checking is an important part of the semantic analyzer.
Semantic information normally cannot be expressed by the context-free grammars used in syntax analysis.
Instead, the context-free grammars used in syntax analysis are augmented with attributes (semantic rules).
The result is a syntax-directed translation (attribute grammars).

Ex: newval := oldval + 12

The type of the identifier newval must match the type of the expression (oldval + 12).
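
The type-checking step can be sketched as a recursive walk over a syntax tree. The sketch below is an assumption-laden illustration, not the method of any particular compiler. Trees are nested tuples such as `("+", left, right)`, and only two types, `int` and `real`, exist. The symbol table is a plain dictionary from names to types. The function name `check` is hypothetical. When an `int` operand meets a `real` operand, the checker inserts an `inttoreal` conversion node, as in the position := initial + rate * 60 example.

```python
def check(node, symtab):
    """Return (annotated tree, type), inserting inttoreal where int meets real."""
    kind = node[0]
    if kind == "num":
        return node, ("real" if "." in node[1] else "int")
    if kind == "id":
        if node[1] not in symtab:
            raise TypeError(f"undeclared identifier {node[1]}")
        return node, symtab[node[1]]
    if kind in ("+", "*"):
        left, left_type = check(node[1], symtab)
        right, right_type = check(node[2], symtab)
        if left_type == right_type:
            return (kind, left, right), left_type
        # Mixed int/real operands: widen the int side to real.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (kind, left, right), "real"
    if kind == ":=":
        if node[1] not in symtab:
            raise TypeError(f"undeclared identifier {node[1]}")
        target_type = symtab[node[1]]
        value, value_type = check(node[2], symtab)
        if target_type == value_type:
            return (":=", node[1], value), target_type
        if target_type == "real" and value_type == "int":
            return (":=", node[1], ("inttoreal", value)), target_type
        raise TypeError(f"cannot assign {value_type} to {target_type} variable {node[1]}")
    raise ValueError(f"unknown node kind {kind}")


symtab = {"position": "real", "initial": "real", "rate": "real"}
tree = (":=", "position", ("+", ("id", "initial"), ("*", ("id", "rate"), ("num", "60"))))
print(check(tree, symtab)[0])
```

Assigning a `real` value to an `int` variable is rejected here. Whether a language allows such narrowing is a design decision of that language, not something this sketch settles.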

The Synthesis Task For Compilation


Intermediate Code Generation
Abstract machine version of the code, independent of the architecture
Makes it easy to produce the intermediate code and then do the final, machine-dependent code generation

Final Code Generation
Generate relocatable, machine-dependent code

Intermediate Code Generation

Intermediate codes are machine-independent codes, but they are close to machine instructions.

The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator.

The intermediate language can be one of many different languages, and the designer of the compiler decides which one to use.
Syntax trees can be used as an intermediate language.
Postfix notation can be used as an intermediate language.
Three-address code (quadruples) can be used as an intermediate language.
Some programming languages have well-defined intermediate languages:
  Java: Java Virtual Machine
  Prolog: Warren Abstract Machine
In fact, there are byte-code emulators to execute instructions in these intermediate languages.

Three Address Code

Statements have the general form x := y op z.
No built-up arithmetic expressions are allowed.
As a result, x := y + z * w should be represented as
  t1 := z * w
  t2 := y + t1
  x := t2
Given the syntax tree or the DAG that represents the expression graphically, we can easily derive three-address code for assignments as above.
In fact, three-address code is a linearization of the tree.
Three-address code is useful because it is close to machine language, simple, and easy to optimize.
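
The claim that three-address code is a linearization of the tree can be illustrated with a short post-order walk. This is a minimal sketch under stated assumptions: an assignment is a pair `(target, expression)`, an expression is either a string leaf or a tuple `(op, left, right)`, and the function name `to_three_address` and the temporary names `t1`, `t2`, ... are illustrative.

```python
import itertools


def to_three_address(assignment):
    """Linearize an expression tree into a list of three-address statements."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))   # t1, t2, t3, ...

    def walk(node):
        if isinstance(node, str):        # leaf: a variable name or a constant
            return node
        op, left, right = node
        left_name = walk(left)           # operands are evaluated first (post-order)
        right_name = walk(right)
        temp = next(temps)
        code.append(f"{temp} := {left_name} {op} {right_name}")
        return temp

    target, expression = assignment
    code.append(f"{target} := {walk(expression)}")
    return code


for line in to_three_address(("x", ("+", "y", ("*", "z", "w")))):
    print(line)
# t1 := z * w
# t2 := y + t1
# x := t2
```

Each inner node of the tree becomes exactly one statement with a fresh temporary, which is why the output matches the hand-written translation of x := y + z * w.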

Code Optimization
Why
  Reduce the programmer's burden
  Allow programmers to concentrate on high-level concepts without worrying about performance issues
Target
  Reduce execution time
  Reduce space
  Sometimes these two goals trade off against each other
Types
  Intermediate code level (we are looking at this part now)
  Assembly level: instruction selection, register allocation, etc.

Code Optimization
Scope
Peephole analysis

Within one or a few instructions

Local analysis

Within a basic block

Global analysis

Entire procedure or within a certain scope

Inter-procedural analysis

Beyond a procedure, consider the entire program

Code Optimization
Techniques
Constant propagation
Constant folding
Algebraic simplification, strength reduction
Copy propagation
Common subexpression elimination
Unreachable code elimination
Dead code elimination
Loop Optimization
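
Two of the techniques listed above, constant propagation and constant folding, can be sketched as a single pass over one basic block. The representation is an assumption made for illustration: each instruction is a quadruple `(dest, op, arg1, arg2)`, `op` is `"+"`, `"*"`, or `"copy"`, variables are strings, and constants are Python numbers. The function name `fold_and_propagate` is hypothetical.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def is_const(value):
    return isinstance(value, (int, float))


def fold_and_propagate(block):
    """Constant propagation and constant folding over straight-line code."""
    consts = {}   # variable name -> constant value known at this point
    out = []
    for dest, op, a, b in block:
        # Constant propagation: replace operands whose value is known.
        a = consts.get(a, a)
        b = consts.get(b, b)
        if op == "copy":
            if is_const(a):
                consts[dest] = a
            else:
                consts.pop(dest, None)   # dest no longer holds a known constant
            out.append((dest, "copy", a, None))
        elif is_const(a) and is_const(b):
            # Constant folding: evaluate the operation at compile time.
            consts[dest] = OPS[op](a, b)
            out.append((dest, "copy", consts[dest], None))
        else:
            consts.pop(dest, None)
            out.append((dest, op, a, b))
    return out


block = [("a", "copy", 4, None), ("b", "*", "a", 15), ("c", "+", "b", "x")]
for quad in fold_and_propagate(block):
    print(quad)
# ('a', 'copy', 4, None)
# ('b', 'copy', 60, None)
# ('c', '+', 60, 'x')
```

After this pass, `a` and `b` are no longer read by later instructions in the block, so a later dead code elimination pass could remove their assignments if they are not used elsewhere.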

Code Generation
Must generate code that is executable by the target machine
The most complex phase of the compiler
Typically, code generation takes an intermediate representation as its input.
The machine code generator should translate all the instructions in the intermediate representation to assembly language.
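
As a rough illustration of translating intermediate instructions to assembly language, the sketch below maps each quadruple `(dest, op, arg1, arg2)` to three instructions. Everything about the target is an assumption. It is a hypothetical two-operand register machine with `MOVF`, `ADDF`, and `MULF` instructions written as `OP source, destination`, with `#` marking an immediate constant. The function name `generate` is also hypothetical. The scheme is deliberately naive: it uses a single register and does no register allocation or instruction selection.

```python
MNEMONIC = {"+": "ADDF", "*": "MULF"}


def operand(value):
    # Constants become immediates with a leading '#'; names are used as-is.
    return f"#{value}" if isinstance(value, (int, float)) else value


def generate(block):
    """Translate quadruples to assembly text, one load/operate/store group each."""
    asm = []
    for dest, op, a, b in block:
        asm.append(f"MOVF {operand(a)}, R1")            # load the first operand
        asm.append(f"{MNEMONIC[op]} {operand(b)}, R1")  # combine with the second
        asm.append(f"MOVF R1, {dest}")                  # store the result
    return asm


block = [("temp1", "*", "id3", 60.0), ("id1", "+", "id2", "temp1")]
print("\n".join(generate(block)))
```

A real code generator would keep `temp1` in a register and skip the store and reload. That saving is what register allocation provides.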

Reviewing the Entire Process


position := initial + rate * 60
  [lexical analyzer]
id1 := id2 + id3 * 60
  [syntax analyzer]
:=
  id1
  +
    id2
    *
      id3
      60
  [semantic analyzer]
:=
  id1
  +
    id2
    *
      id3
      inttoreal
        60
  [intermediate code generator]

Symbol Table: position ..., initial ..., rate ...
Errors: all phases report to the error handler.

Reviewing the Entire Process (cont.)

  [intermediate code generator]
3 address code:
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
  [code optimizer]
temp1 := id3 * 60.0
id1 := id2 + temp1
  [final code generator]
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Symbol Table: position ..., initial ..., rate ...
Errors: all phases report to the error handler.

Supporting Phases / Activities for Analysis
Symbol Table Creation / Maintenance
  Contains info (storage, type, scope, args) on each meaningful token, typically identifiers
  Data structure created / initialized during lexical analysis
  Utilized / updated during later analysis and synthesis
Error Handling
  Detection of different errors, which correspond to all phases
  What kinds of errors are found during the analysis phase?
  What happens when an error is found?
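
The symbol table described above can be sketched as a stack of dictionaries, one per scope. This is only one possible design, and the class and method names (`SymbolTable`, `declare`, `lookup`, `enter_scope`, `exit_scope`) are assumptions made for illustration. It also shows one kind of error the analysis phases can detect with the table: a duplicate declaration in the same scope.

```python
class SymbolTable:
    """A scoped symbol table: the innermost scope is searched first."""

    def __init__(self):
        self.scopes = [{}]             # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, **info):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name} is already declared in this scope")
        scope[name] = dict(info)       # e.g. type, storage, args

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None                    # undeclared identifier


table = SymbolTable()
table.declare("rate", type="real")     # entered when the identifier is first seen
table.enter_scope()
table.declare("rate", type="int")      # an inner declaration shadows the outer one
print(table.lookup("rate"))            # {'type': 'int'}
table.exit_scope()
print(table.lookup("rate"))            # {'type': 'real'}
```

Later phases can update an entry in place, for example by adding a storage location to the dictionary returned by `lookup`.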

Anatomy of a compiler: Revision


Program (character stream)
  -> Lexical Analyzer (Scanner) -> Token Stream
  -> Syntax Analyzer (Parser) -> Parse Tree
  -> Semantic Analyzer -> Intermediate Representation
  -> Intermediate Code Optimizer -> Optimized Intermediate Representation
  -> Code Generator -> Assembly code
