
INTRODUCTION

A translator is a program that reads a program written in one programming
language, the source language, and translates it into an equivalent program
in another, the target language.
[Figure: Source program -> Translator -> Target program]

If the source language is a high-level language such as FORTRAN, COBOL, or C,
and the target language is a low-level language such as an assembly language
or machine language, then the translator is called a compiler.
Interpreters translate the source program into an intermediate language
(code) that is executed by an abstract machine or virtual machine.
Interpreters are often smaller than compilers, but they require the
corresponding virtual machine to run the intermediate code.
If the source language is an assembly language and the target language is
machine language, then the translator is called an assembler.
A translator that converts programs in one high-level language into
equivalent programs in another high-level language is termed a preprocessor.
THE CONTEXT OF A COMPILER
The following figure illustrates the context of a compiler in a real-time
language processing system that takes high-level language source and
generates machine code.

THE STRUCTURE OF A COMPILER


Conceptually, a compiler operates in phases, each of which transforms
the source program from one representation to another. A phase is a logically
cohesive operation that takes as input one representation of the source
program and produces as output another representation.

[Figure: Phases of a compiler. Source program -> (1) Lexical Analyzer ->
(2) Syntax Analyzer -> (3) Semantic Analyzer -> (4) Intermediate Code
Generator -> (5) Code Optimizer -> (6) Code Generator -> Target program.
The Symbol-table Manager and the Error Handler interact with all six phases.]

LEXICAL ANALYSIS
The module corresponding to this phase is the lexical analyzer, or scanner.
The stream of characters making up the source program is read from left to
right and grouped into tokens. A set of strings that conform to a pattern is
associated with a token. Each such string is referred to as a lexeme.
Identifiers, keywords, constants, operators, and delimiters are typical
tokens. A token is often represented by a pair <token type, attribute of the
token>. Some tokens may not have attributes associated with them.
Consider the assignment
position = initial + rate * 60
The token stream corresponding to this is
    <ID, 5> <ASSIGN> <ID, 10> <PLUS> <ID, 15> <MUL> <NUM, 25>

Symbol Table

    Index   Entry
    5       position
    10      initial
    15      rate
    25      60

The token types ID, ASSIGN, PLUS, MUL, and NUM are symbolic constants in the
implementation of the lexical analyzer. The token ID represents an
identifier, and the attribute passed with it is a pointer (index) into the
symbol table, which stores a record of information pertaining to the
identifier.

The principle of operation is that the longest prefix of the remaining input
that corresponds to a token is identified. Note that in this process the
lexical analyzer may need to read past the prefix that corresponds to a token.
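
The longest-prefix rule can be sketched as follows. This is a minimal illustration, not a production scanner: the token names follow the ones used above, while the regular expressions in TOKEN_SPEC and the function name tokenize are assumptions made for the example. It returns lexemes rather than symbol-table indices.

```python
import re

# Hypothetical token patterns; the names mirror the token types used above.
TOKEN_SPEC = [
    ("NUM",    r"\d+(?:\.\d+)?"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("MUL",    r"\*"),
]
PATTERNS = [(name, re.compile(rx)) for name, rx in TOKEN_SPEC]


def tokenize(text):
    """Return (token type, lexeme) pairs, always taking the longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in PATTERNS:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so earlier patterns win ties.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise SyntaxError(f"invalid character {text[pos]!r} at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

Because the longest match wins, an input such as rate60 is grouped into a single identifier rather than an identifier followed by a number; the scanner only learns that a lexeme has ended by examining the character after it.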
SYNTAX ANALYSIS
The module corresponding to this phase is known as syntax analyzer or
parser. Thus, this phase is also referred to as parsing. The parser groups
tokens returned by the lexical analyzer into syntactic constructs/structures.
The grouping is hierarchical in nature as specified by language specification
and is in general represented by a parse tree. A parse tree may not be
constructed in a physical sense.
The following tree is the parse tree for the assignment seen earlier, in
accordance with the CFG given below, which defines the syntactic structure.
Example:

    S -> ID = E
    E -> E + E | E * E | ID | NUM

[Figure: Parse tree]

A concise form of the parse tree is the syntax tree, in which the interior
nodes represent the operators and the leaves represent the tokens.

[Figure: Syntax tree]

Tokens correspond to the terminals of the CFG.
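
As an illustration, the sketch below builds a syntax tree for the assignment from a list of (token type, lexeme) pairs. The grammar above is ambiguous, so the sketch assumes the usual convention that * binds more tightly than + and that both operators are left-associative; the function name parse_assignment and the nested-tuple tree shape are choices made for this example only.

```python
def parse_assignment(tokens):
    """Parse S -> ID = E and return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        # Leaves of the tree are the ID and NUM tokens themselves.
        if peek() == "ID":
            return expect("ID")
        return expect("NUM")

    def term():
        node = factor()
        while peek() == "MUL":
            expect("MUL")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("+", node, term())
        return node

    target = expect("ID")
    expect("ASSIGN")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()} at token {pos}")
    return tree
```

Interior nodes are tuples of the form (operator, left, right), and an invalid token sequence is reported as a SyntaxError.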
SEMANTIC ANALYSIS

The semantic analysis phase checks the source program for semantic errors
and gathers type information for the subsequent code generation phase. An
important aspect of semantic analysis is type checking. For example, an error
is generated whenever a real number is used to index an array. Type coercions
specified by the language are also performed by the semantic analyzer. The
syntax tree is restructured as shown in the figure, assuming that rate was
declared to be of type real.

[Figure: Semantic tree]
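
The restructuring can be sketched as a recursive walk over the syntax tree. The nested-tuple tree shape, the function name check_types, and the symbol_types mapping are assumptions made for this example; the rule that a real value may not be assigned to an int variable is likewise a hypothetical language rule, not something stated above.

```python
def check_types(node, symbol_types):
    """Return (typed_node, type), inserting inttoreal where int meets real."""
    kind = node[0]
    if kind == "ID":
        name = node[1]
        if name not in symbol_types:
            raise TypeError(f"undeclared identifier {name!r}")
        return node, symbol_types[name]
    if kind == "NUM":
        return node, "real" if "." in node[1] else "int"

    # Binary operator node: (op, left, right)
    op, left, right = node
    left, ltype = check_types(left, symbol_types)
    right, rtype = check_types(right, symbol_types)
    if op == "=" and ltype == "int" and rtype == "real":
        # Assumed rule for this sketch: no implicit narrowing on assignment.
        raise TypeError("cannot assign a real value to an int variable")
    if ltype != rtype:
        # Coerce whichever side is int so both operands are real.
        if ltype == "int":
            left, ltype = ("inttoreal", left), "real"
        else:
            right, rtype = ("inttoreal", right), "real"
    return (op, left, right), ltype
```

With position, initial, and rate all declared real, the only change to the tree is an inttoreal node wrapped around the constant 60.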
INTERMEDIATE CODE GENERATION
At a logical level, the output of the syntax analysis phase is some
representation of the parse tree. The intermediate code generation phase
transforms this parse tree into an explicit intermediate representation,
which can be viewed as a program for an abstract machine. The intermediate
representation can take different forms, one of which is three-address code.
The three-address code for the above construct is
temp1 = inttoreal(ID25)
temp2 = ID15 * temp1
temp3 = ID10 + temp2
ID5 = temp3
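
Three-address code of this kind can be produced by a post-order walk of the syntax tree, creating one temporary per interior node. The sketch below assumes a nested-tuple tree in which the inttoreal conversion has already been inserted, and it prints identifier names instead of symbol-table indices; the function name gen_three_address is hypothetical.

```python
from itertools import count


def gen_three_address(tree):
    """Flatten an assignment syntax tree into three-address statements."""
    code = []
    temps = count(1)

    def operand(node):
        kind = node[0]
        if kind in ("ID", "NUM"):
            return node[1]                  # leaves are used directly
        if kind == "inttoreal":
            arg = operand(node[1])
            temp = f"temp{next(temps)}"
            code.append(f"{temp} = inttoreal({arg})")
            return temp
        # Binary operator: evaluate both operands first, then the operator.
        op, left, right = node
        lhs = operand(left)
        rhs = operand(right)
        temp = f"temp{next(temps)}"
        code.append(f"{temp} = {lhs} {op} {rhs}")
        return temp

    _, target, value = tree
    code.append(f"{target[1]} = {operand(value)}")
    return code
```

Each generated statement has at most one operator on its right-hand side, which is what makes the form easy to store as quadruples.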
Physically, this can be represented by a quadruple array as follows. The
numbers represent the indices into the symbol table (here temp1, temp2, and
temp3 occupy entries 35, 36, and 37).

    OP          ARG1    ARG2    RESULT
    INTTOREAL   25      -       35
    MUL         15      35      36
    PLUS        10      36      37
    ASSIGN      37      -       5

CODE OPTIMIZATION

The code optimization phase attempts to improve the intermediate code so
that faster-running machine code will result. For example, the above code can
be optimized as
temp1 = ID15 * 60.0
ID5 = ID10 + temp1
Some of the other possible optimizations are
a. Local optimization
b. Loop optimization
One example of local optimization is that the following three-address code

if A > B GOTO L2
GOTO L3
L2:
can be optimized to
if A <= B GOTO L3
Another example of local optimization is common subexpression elimination.
The high-level language expressions
A = B + C + D
E = B + C + F
can be transformed to
T1 = B + C
A = T1 + D
E = T1 + F
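
The transformation above can be sketched for a single basic block. This is a simplified illustration under two stated assumptions: every name is assigned at most once inside the block, and the names listed in temporaries are compiler temporaries that are not used outside it. The function name local_cse and the tuple layout (dest, op, arg1, arg2) are chosen for the example.

```python
def local_cse(block, temporaries):
    """Local common subexpression elimination over one basic block."""
    available = {}  # (op, arg1, arg2) -> name that already holds the value
    alias = {}      # dropped temporary -> name that holds the same value
    out = []
    for dest, op, a, b in block:
        # Rewrite operands that refer to a dropped temporary.
        a, b = alias.get(a, a), alias.get(b, b)
        key = (op, a, b)
        if key in available:
            if dest in temporaries:
                alias[dest] = available[key]      # reuse, emit nothing
            else:
                out.append((dest, "copy", available[key], None))
            continue
        available[key] = dest
        out.append((dest, op, a, b))
    return out
```

A duplicate computation into a temporary disappears entirely, while a duplicate into a program variable is kept as a copy so the variable still receives its value.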
An example of loop optimization is the identification of loop-invariant
computations and their movement outside the loop.
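
The effect of moving a loop invariant can be shown at source level with a hand-written before-and-after pair. The function and parameter names are hypothetical; an optimizer would perform the same move on the intermediate code rather than on the source.

```python
def add_offset_naive(values, rate, factor):
    """Recomputes rate * factor on every iteration."""
    out = []
    for v in values:
        k = rate * factor   # loop invariant: does not depend on v
        out.append(v + k)
    return out


def add_offset_hoisted(values, rate, factor):
    """Same result, with the invariant computed once before the loop."""
    k = rate * factor
    out = []
    for v in values:
        out.append(v + k)
    return out
```

The move is valid here because neither rate nor factor changes inside the loop, so the product has the same value on every iteration.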
CODE GENERATION
The final phase is the code generation phase. This phase generates either
assembly code or relocatable machine code. Memory locations are selected for
each of the variables used by the program. Then each intermediate instruction
is translated into a sequence of assembly or machine instructions. For the
above intermediate code, the target code is
MOVF ID15, R2
MULF #60.0, R2
MOVF ID10, R1
ADDF R2, R1
MOVF R1, ID5
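
A deliberately naive translation scheme is sketched below. It assumes the two-operand instruction style of the listing above (source first, destination second), writes numeric literals as immediate operands with a leading #, and uses a single register; the function names and the tuple layout are hypothetical. Unlike the listing above, it stores every temporary back to memory, so its output is longer.

```python
OPCODES = {"*": "MULF", "+": "ADDF"}


def operand(x):
    # Assumption: numeric literals become immediate operands written '#...'.
    return f"#{x}" if x[0].isdigit() else x


def generate(code):
    """Translate (dest, op, arg1, arg2) tuples into two-operand assembly."""
    asm = []
    for dest, op, a, b in code:
        asm.append(f"MOVF {operand(a)}, R1")          # load first operand
        asm.append(f"{OPCODES[op]} {operand(b)}, R1")  # apply the operator
        asm.append(f"MOVF R1, {dest}")                 # store the result
    return asm
```

Keeping temp1 in a register instead of storing and reloading it, as the listing above does, is the job of register allocation, which this sketch omits.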
SYMBOL TABLE
The symbol table is a data structure used by a compiler to keep track of
scope and binding information about names. These names are used in the source
program to identify the various program elements, such as variables,
constants, procedures, and the labels of statements. The symbol table is
searched every time a name is encountered in the source text. When a new name
or new information about an existing name is discovered, the content of the
symbol table changes. Therefore, a symbol table must have an efficient
mechanism for accessing the information held in the table as well as for
adding new entries to it.
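
One common way to meet these requirements is a stack of hash tables, one per scope. The sketch below is an illustration under that assumption; the class and method names are hypothetical and not taken from any particular compiler.

```python
class SymbolTable:
    """Scoped symbol table: a stack of dicts, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]          # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, **info):
        """Add a new name to the innermost scope."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = dict(info)

    def lookup(self, name):
        """Return the record for name from the nearest enclosing scope."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

Lookups and insertions are dictionary operations, and new information about an existing name can be recorded by updating the record that lookup returns.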
ERROR HANDLER
An important task that a compiler must perform is the detection and
reporting of errors in the source program. Every phase of compilation
expects its input to be in a particular format, and whenever the input is not
in the required format, an error is reported. The error message should allow
the programmer to determine exactly where the error has occurred. Errors
can be encountered by virtually all of the phases of a compiler. For example:
o The lexical analyzer reports invalid character sequences.
o The syntax analyzer reports invalid token sequences.
o The semantic analyzer reports type and scope errors, and the like.
o The intermediate code generator may detect an operator whose
operands have incompatible types.
o The code optimizer, doing control flow analysis, may detect that
certain statements can never be reached.
o The code generator may find a compiler-created constant that is too
large to fit in a word of the target machine.
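
To show what reporting "exactly where" can look like, the sketch below records the phase, line, and column with each message and uses them for the first case in the list, invalid characters. The Diagnostic record, the message format, and the set of allowed punctuation are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    phase: str
    line: int
    column: int
    message: str

    def __str__(self):
        return f"{self.line}:{self.column}: {self.phase} error: {self.message}"


ALLOWED_PUNCTUATION = set("=+*.")  # hypothetical alphabet of a toy language


def find_invalid_characters(source):
    """Collect one lexical diagnostic per character outside the alphabet."""
    diagnostics = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if (ch.isalnum() or ch.isspace() or ch == "_"
                    or ch in ALLOWED_PUNCTUATION):
                continue
            diagnostics.append(Diagnostic(
                "lexical", line_no, col_no, f"invalid character {ch!r}"))
    return diagnostics
```

Collecting diagnostics in a list, rather than stopping at the first one, lets a compiler report several errors in a single run.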


DISTINCTION BETWEEN PHASES AND PASSES


Passes: the number of times the compiler goes through a program representation
o 1-pass, 2-pass, multi-pass compilation
o As languages become more complex, more passes are needed
Phases: conceptual and sometimes physical stages
o The symbol table coordinates information between phases
o Phases are not completely separate
    o The semantic phase may do things that the syntax phase should do
    o Interactions are possible
