Abstract
General Terms
Compiler, Cross-Compiler, Bootstrapping.
Keywords
Compiler Design, Object Oriented Programming,
High Level Language, Design Patterns, Tools.
1.0 Introduction
The study of compiler design forms a central theme in
the field of computer science. An understanding of the
techniques used by high-level language compilers can give
the programmer a set of skills applicable in many aspects
of software design; one does not have to be a compiler
writer to make use of them.
Compiler writing is not confined to one discipline
but spans several others:
programming languages, computer architecture, theory of
programming languages, algorithms, etc. This paper is
intended as an introduction to the basic essential features
of compiler writing.
1.2.2 Cross-Compiler
A cross-compiler is a compiler that runs on one
machine and generates code for another machine. The
only difference between a cross-compiler and a normal
compiler lies in the code it generates. For
comparison, consider the conventional problem of implementing a Pascal
compiler on a new piece of hardware (a computer called
X) on which assembly language is the only programming
language already available. Under these circumstances,
the obvious approach is to write the Pascal compiler in
assembler. Hence, the compiler in this case is a program
that takes Pascal source as input, produces machine
code for the target machine as output and is written in the
assembly language of the target machine.
This compiler implementation involves a great deal of
work, since a large assembly language program has to be
written for X. Note that in this case the
compiler is very machine specific; that is, not only does
it run on X but it also produces machine code suitable for
running on X. Furthermore, only one computer is
involved in the entire implementation process.
1.2.3 Bootstrapping
Bootstrapping is the technique of developing a compiler for a language by
using a subset (a small part) of the same language.
Suppose that a Modula-2 compiler is required for machine
X, but that the compiler itself is to be coded in Modula-2.
Coding the compiler in the language it is to compile is
nothing special and, as will be seen, it has a great deal in
its favour. Suppose further that Modula-2 is already
available on machine Y. In this case, the compiler can be
run on machine Y, producing object code for machine X.
This is the same situation as before except that the
compiler is coded in Modula-2 rather than Pascal. The
special feature of this approach appears in the next step.
The compiler, running on Y, is nothing more than a large
program written in Modula-2. Its function is to transform
an input file of Modula-2 statements into a functionally
equivalent sequence of statements in X's machine code.
Therefore, the source statements of this Modula-2
compiler can be passed into itself running on Y to
produce a file containing X's machine code. This file is,
of course, a Modula-2 compiler that is capable of being
run on X. By making the compiler compile itself, a
version of the compiler that runs on X has been created.
Once this machine code has been transferred to X, a self-sufficient
Modula-2 compiler is available on X; hence
there is no further use for machine Y in supporting
Modula-2 compilation.
This implementation plan is very attractive. Machine Y is
only required for compiler development and once this
development has reached the stage at which the compiler
can (correctly) compile itself, machine Y is no longer
required. Consequently, the original compiler
implemented on Y need not be of the highest quality - for
example, optimization can be completely disregarded.
Further development (and obviously conventional use) of
the compiler can then continue at leisure on machine X.
This approach to compiler implementation is called
bootstrapping. Many languages, including C, Pascal,
FORTRAN and LISP, have been implemented in this
way.
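The two compilation steps described above can be traced with a small model. This is only an illustrative sketch, not part of any real toolchain: the `Compiler` record and the `compile_compiler` helper are hypothetical names, and the existing Modula-2 compiler on Y is assumed to be native to Y (it runs on Y and emits code for Y).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    """A compiler described by the three labels of a T-diagram."""
    source: str   # language the compiler accepts
    target: str   # machine whose code the compiler emits
    runs_as: str  # form the compiler itself exists in (a language or a machine's code)


def compile_compiler(tool: Compiler, subject: Compiler) -> Compiler:
    """Translate `subject` with `tool` and return the translated compiler.

    The result performs the same translation as `subject`, but it now
    exists as code for `tool.target`.
    """
    if subject.runs_as != tool.source:
        raise ValueError(f"tool cannot read a program written in {subject.runs_as}")
    return Compiler(subject.source, subject.target, runs_as=tool.target)


# Assumption: the Modula-2 compiler already available on Y is native to Y.
m2_on_y = Compiler("Modula-2", "Y", runs_as="Y")
# Source text of the new compiler: Modula-2 to X code, written in Modula-2.
new_source = Compiler("Modula-2", "X", runs_as="Modula-2")

# Step 1: build the new compiler on Y; it runs on Y but emits X code.
cross = compile_compiler(m2_on_y, new_source)
# Step 2: pass the same source through that compiler; the output runs on X.
native_x = compile_compiler(cross, new_source)

print(cross)
print(native_x)
```

After the second step the `runs_as` label is X, which corresponds to the point in the text where machine Y is no longer needed.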
Lexical analysis
Syntax analysis
Semantic analysis
constructs can be described by Backus-Naur Form (BNF) notation. Notations of this type are also called
context-free grammars. Well-formed grammars offer
significant advantages to the compiler designer:
3.
Figure 2: Semantic analysis of an arithmetic expression
The semantic analyzer can determine the types of the
intermediate results and thus propagate the type attributes
through the tree, checking for compatibility as it goes. In
our example, the semantic analyzer first considers the
operands c and d. According to the Pascal semantic rule
integer * real --> real, the * node can be labelled as real.
This is shown in figure 4(c).
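The bottom-up propagation of type attributes can be sketched as follows. This is a minimal illustration under stated assumptions: the node classes `Leaf` and `BinOp` and the function `annotate` are hypothetical names, only the types integer and real are handled, and c is assumed to be the integer operand and d the real one.

```python
from dataclasses import dataclass
from typing import Optional, Union

NUMERIC = ("integer", "real")


@dataclass
class Leaf:
    name: str
    type: str  # declared type of the identifier


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"
    type: Optional[str] = None  # filled in by annotate()


Node = Union[Leaf, BinOp]


def annotate(node: Node) -> str:
    """Label every operator node with its result type, bottom-up."""
    if isinstance(node, Leaf):
        return node.type
    left_type = annotate(node.left)
    right_type = annotate(node.right)
    if left_type not in NUMERIC or right_type not in NUMERIC:
        raise TypeError(f"operator {node.op} cannot combine {left_type} and {right_type}")
    # Mixed integer/real arithmetic yields real; integer with integer stays integer.
    node.type = "real" if "real" in (left_type, right_type) else "integer"
    return node.type


# Assumption: c is declared integer and d is declared real.
product = BinOp("*", Leaf("c", "integer"), Leaf("d", "real"))
print(annotate(product))
```

The compatibility check is the `TypeError` branch: an operand of any other type stops the propagation at the node where the mismatch is found.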
Compilers vary widely in the role taken by the semantic
analyzer. In some simpler compilers, there is no easily
identifiable semantic analysis phase; the syntax analyzer
itself does semantic analysis and intermediate code
generation directly. In other compilers, syntax analysis,
semantic analysis and code generation are separate
phases. In the next section we will discuss the code
generation phase.
Producing an absolute machine language program as output
has the advantage that it can be placed in a
1.4.1 Lex
Lex is a software tool that takes as input a specification
of a set of regular expressions together with actions to be
taken on recognising each of these expressions. The
output of Lex is a program that recognizes the regular
expressions and acts appropriately on each. Lex is
generally used in the manner depicted in
figure 6.
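The idea of pairing regular expressions with actions can be illustrated with a small scanner. This is a sketch in Python rather than actual Lex input or generated C code; the token names `NUMBER`, `IDENT`, `OP` and `SKIP` are hypothetical, and the only "action" shown is collecting or skipping the matched text. Note also that Python's alternation takes the first rule that matches, whereas Lex prefers the longest match.

```python
import re

# (token name, regular expression) pairs, tried in the order listed.
RULES = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[+*()]"),
    ("SKIP", r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES))


def tokenize(text):
    """Split text into (token name, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # action for SKIP: discard the lexeme
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("rate * (x1 + 60)"))
```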
Operator notation   Example   Meaning
* (asterisk)        a*        Set of all strings of zero or more a's, i.e. (empty, a, aa, aaa, ...)
| (or)              a|b       Either a or b
+                   a+        One or more a's
?                   a?        Zero or one a
[ ]                 [a b c]   a | b | c. An alphabetical character class such as [a-z]
1.4.2 Yacc
Example: To illustrate how to prepare a Yacc source
program, let us construct a simple desk calculator that
reads an arithmetic expression, evaluates it, and then
prints its numeric value. We shall build the desk
calculator starting with the following grammar for
arithmetic expressions:
expr   --> expr + term | term
term   --> term * factor | factor
factor --> ( expr ) | digit
The token digit is a single digit ranging from 0 to 9. A
Yacc desk calculator program derived from this grammar
is shown in figure 8.
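To make the behaviour of the desk calculator concrete, the same grammar can be evaluated by a small hand-written parser. This is not the Yacc program and not what Yacc would generate (Yacc produces a table-driven bottom-up parser in C); it is a recursive-descent sketch in Python, in which the left-recursive rules for expr and term are replaced by loops. The function name `evaluate` is hypothetical.

```python
def evaluate(text):
    """Evaluate an expression built from single digits, +, * and parentheses."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        # expr --> expr + term | term, written as: term { + term }
        value = term()
        while peek() == "+":
            advance()
            value += term()
        return value

    def term():
        # term --> term * factor | factor, written as: factor { * factor }
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def factor():
        # factor --> ( expr ) | digit
        token = peek()
        if token == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if token is not None and token in "0123456789":
            advance()
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("2 + 3 * 4"))
print(evaluate("(2 + 3) * 4"))
```

Because term is parsed below expr, multiplication binds more tightly than addition, which is the precedence the grammar encodes.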
1.6 Conclusion
This paper discussed several issues related to
compilers. The initial discussion focused on the
phases of compiler design, which
included lexical analysis, parsing, semantic analysis and
code generation, while the latter part examined two
important software tools, Lex and Yacc, along with program
development tools, which greatly simplify the
implementation of a compiler.