
COMP SCI 4TB3 / 6TB3 Compiler Construction

Emil Sekerinski Department of Computing and Software McMaster University

Copyright Emil Sekerinski, 2013

Objectives
Taking this course is not just about writing compilers!
  Compilers are the missing link in connecting architecture, programming languages, formal languages, and operating systems.
  Parsing techniques have broader applicability than compilers, e.g. command lines, file formats (XML), protocols.
  You learn to understand why the syntax of programming languages is defined in a particular way.
  Understanding the memory layout of data types (e.g. arrays, objects) and the compilation of control structures (e.g. short-circuit evaluation, recursion) is essential to fully understanding the efficiency considerations of programming languages.
  The issues of separate compilation and the (unavoidable) effects of garbage collection are central to judging many languages.
  You learn to understand the various optimization options of compilers.
  You learn to understand the implications of "byte code" vs. RISC and "just-in-time" compilation.

Objectives
In summary,
  you will have a deeper understanding of programming languages, which will help you develop a better programming style,
  you will be able to implement analysers and interpreters for "small languages", which appear everywhere (e.g. configuration files, query languages),
  you will be able to write whole compilers for simple processors,
  you will know if and when to use compiler tools like lex and yacc.
Compiler construction is a highly specialised subject. For writing optimising, commercial-quality compilers, additional study is necessary.

Reference Books
Niklaus Wirth. Compiler Construction, Addison-Wesley, 176 pages, 1996. Most of the lectures and assignments are based on this book. It is rather thin, easy to read, and covers most topics.
Alfred V. Aho, Monica Lam, Ravi Sethi, Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools, Second Edition, Addison-Wesley, 1009 pages, 2007. The revision of the classic book on compiler design; an excellent reference for all traditional topics.
Dick Grune, Henri E. Bal, Ceriel J. H. Jacobs, Koen G. Langendoen. Modern Compiler Design. Wiley, 754 pages, 2000. Covers imperative, object-oriented, functional, logic, and distributed languages; has been used in this course.

Course Text
Andrew W. Appel. Modern Compiler Implementation in Java, Cambridge University Press, 548 pages, 1998. Three versions of the book exist, each covering the same material but using a different programming language for the implementation: Modern Compiler Implementation in Java, Modern Compiler Implementation in C, and Modern Compiler Implementation in ML. The book gives excellent coverage of all modern issues in compiler design for traditional, object-oriented, and functional programming languages.
Steven S. Muchnick. Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, 856 pages, 1997. Very comprehensive coverage of all issues around code generation and code optimization.

Outline
1. Concepts of Compilation
2. Language and Syntax
3. Regular Languages
4. Analysis of Context-free Languages
5. Syntax-Directed Translation
6. The Construction of a Parser
7. Context-Dependencies
8. A RISC Architecture as Target
9. Expressions and Assignments
10. Conditionals, Iterations, and Boolean Expressions
11. Procedures and Locality
12. Further Data Types
13. Object-Oriented Concepts
14. Modules and Separate Compilation
15. Code Optimization
16. Garbage Collection
17. Virtual Machines
18. Generalized Parsing

1. Concepts of Compilation
Task of a compiler:
  source text (source program, source code) -> Compiler -> target program (target code)
  Compiler -> error messages

In a broader view, the compiler is a program which processes a structured source and generates (more simply structured) target code.
Source:
  programming languages: C, Pascal, Assembler
  text formatting languages: PDF, TeX, HTML, RTF
  scripting languages: bash, emacs, Python, JavaScript
  database query languages
  hardware description languages
  machine control languages
Target:
  machine code: MC68000, SPARC
  assembly language
  interpreted code: Java Virtual Machine, P-Code
  special processor code: DSP
  text formatting languages: PDF, TeX, HTML, RTF
  machine tool instructions

Syntax-Directed Translation
Starting with Algol 60, the first programming language with a formally defined syntax, the translation process of a compiler is guided by the syntactical structure of the source text: syntax-directed compilation. All languages since Algol 60 follow the structure of its definition: the structure of symbols in terms of characters and the structure of the language in terms of symbols are defined by a formal grammar. The conditions for type-correct programs (the context-dependencies) and the meaning of programs (the semantics) are part of the definition but not part of that grammar. This leads to the following model of compilation:
  Analysis: recognising the structure of the source text according to the grammar(s) and checking the context-dependencies.
  Synthesis: generating code for the target processor.
Note that in a compiler these activities are intertwined.

Phases of Compilation
More precisely, compilation is typically split into a number of consecutive phases. Symbols (also called tokens) are sequences of characters like a number (a sequence of digits), an identifier (a sequence of letters and digits), a keyword (if, while), or a separator (e.g. :). Lexical analysis is usually called scanning and syntactic analysis parsing. The corresponding parts of the compiler are referred to as the scanner and the parser.
analysis:
  source text -> lexical analysis -> seq. of symbols
  seq. of symbols -> syntactic analysis -> syntax tree
  syntax tree -> contextual analysis -> syntax tree + context info.
synthesis:
  syntax tree + context info. -> intermediate code generation -> intermediate code
  intermediate code -> code optimization -> intermediate code
  intermediate code -> code generation -> target code
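The lexical analysis phase can be illustrated with a minimal scanner sketch in Python. The symbol class names (ident, const, becomes, plus, times) and the regular expressions are assumptions made for this illustration; they are not taken from the course material.

```python
import re

# Symbol classes and their patterns (assumed for this sketch).
TOKEN_SPEC = [
    ("const",   r"\d+"),
    ("ident",   r"[A-Za-z][A-Za-z0-9]*"),
    ("becomes", r":="),
    ("plus",    r"\+"),
    ("times",   r"\*"),
    ("skip",    r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Return the sequence of (symbol class, text) pairs for the source text."""
    symbols = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":      # white space separates symbols but is not one
            symbols.append((match.lastgroup, match.group()))
        pos = match.end()
    return symbols
```

For example, `scan("pos := pos + r * 60")` yields the seven symbols ident, becomes, ident, plus, ident, times, const, each paired with its text.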

Intermediate Representations
Suppose the following declarations are processed:
  var pos: integer;
  procedure update (r: integer);
Example: pos := pos + r * 60

lexical analysis:
  idpos becomes idpos plus idr times const60

syntactic analysis:
  assignment
    idpos
    plus
      idpos
      times
        idr
        const60

contextual analysis:
  assignment
    idpos, var, integer
    plus
      idpos, var, integer
      times
        idr, var, integer
        const60, integer

Intermediate Representations
intermediate code generation:
  R1 := r
  R2 := 60
  R3 := R1 * R2
  R4 := pos
  R5 := R4 + R3
  pos := R5

code optimization:
  R1 := r
  R1 := R1 * 60
  R1 := R1 + pos
  pos := R1

code generation:
  MOV R1,SP+$8
  MULI R1,60
  ADD R1, $4000
  MOV $4000, R1
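The step from syntax tree to intermediate code can be sketched as a recursive walk over the tree. This is a minimal Python illustration, assuming a tree encoded as nested tuples and one fresh register per value; the encoding and the function name `generate` are hypothetical. Operands are evaluated strictly left to right here, so the register numbering differs from the slide.

```python
from itertools import count

def generate(tree):
    """Translate an assignment tree into a list of three-address instructions."""
    code = []
    fresh = count(1)                      # R1, R2, ...: one new register per value

    def expr(node):
        kind = node[0]
        if kind in ("id", "const"):       # leaf: load a variable or a constant
            reg = f"R{next(fresh)}"
            code.append(f"{reg} := {node[1]}")
            return reg
        op = {"plus": "+", "times": "*"}[kind]
        left = expr(node[1])              # operands are evaluated left to right
        right = expr(node[2])
        reg = f"R{next(fresh)}"
        code.append(f"{reg} := {left} {op} {right}")
        return reg

    _, target, value = tree               # ("assignment", ("id", name), expression)
    result = expr(value)
    code.append(f"{target[1]} := {result}")
    return code

# pos := pos + r * 60
tree = ("assignment", ("id", "pos"),
        ("plus", ("id", "pos"), ("times", ("id", "r"), ("const", 60))))
```

Calling `generate(tree)` returns the six instructions `R1 := pos`, `R2 := r`, `R3 := 60`, `R4 := R2 * R3`, `R5 := R1 + R4`, `pos := R5`, which a later optimization phase could shorten.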

Symbol Table
All the context information is stored in the symbol table, e.g.

  Identifier  Class      Description       Value / Address
  pos         variable   type integer      absolute at 4000 hex
  update      procedure  1 integer param.  absolute at 4 hex
  r           variable   type integer      relative at 8 hex

The symbol table initially contains all the information given by the declarations for the purpose of type-checking; information for code generation is added later. Although it is represented as a graph (linked data structure), it is historically called the symbol table.
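A symbol table with the three entries above could be sketched as follows. This is a deliberately simplified Python illustration: it uses a flat dictionary instead of a linked structure and ignores scopes, and the names `Entry`, `SymbolTable`, `declare`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    cls: str          # "variable" or "procedure"
    description: str  # type or parameter information from the declaration
    addressing: str   # "absolute" or "relative"
    address: int      # filled in for code generation

class SymbolTable:
    def __init__(self):
        self._entries = {}

    def declare(self, entry):
        """Record a declaration; a second declaration of the same name is an error."""
        if entry.name in self._entries:
            raise KeyError(f"{entry.name} declared twice")
        self._entries[entry.name] = entry

    def lookup(self, name):
        """Return the entry for a name used in the program text."""
        if name not in self._entries:
            raise KeyError(f"{name} not declared")
        return self._entries[name]

table = SymbolTable()
table.declare(Entry("pos", "variable", "type integer", "absolute", 0x4000))
table.declare(Entry("update", "procedure", "1 integer param.", "absolute", 0x4))
table.declare(Entry("r", "variable", "type integer", "relative", 0x8))
```

The contextual analysis would call `lookup` for every identifier it meets, for instance `table.lookup("r")` when checking the operands of the multiplication.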


Passes
Phases are a conceptual decomposition of the task of a compiler, which does not necessarily reflect the structure of the compiler. Typically, several phases are merged into passes such that no intermediate data structure is necessary between the phases of a pass.

  Lexical Analysis -> Syntactic & Context Analysis -> Synthesis

Files are traditionally used for passing the data between the passes. Modern compilers use main memory.


Single-Pass Compilers
Traditionally, compilers would have 4-6 passes in order to keep the memory requirements for each pass down. Modern compilers, for which main memory is not a limitation, are often single-pass compilers, where the various tasks are interleaved. In a syntax-directed translation scheme, the parser is the main program. Other tasks are implemented as modules which are called by the parser, e.g.:

  [Module diagram, arrows meaning "imported/used by": Parser, Scanner, Generator, Symbol Table]
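The idea of the parser as main program can be sketched with a small recursive-descent parser in Python that asks the scanner for symbols and calls the generator directly, without building a syntax tree. The grammar (assignments with + and *), the register machine, and all class and method names are assumptions made for this illustration; a symbol table is omitted for brevity.

```python
import re

def scanner(text):
    """Scanner module: yields symbols on demand; a final None marks the end.
    Characters that match no symbol are silently skipped in this sketch."""
    for match in re.finditer(r"\d+|[A-Za-z]\w*|:=|[+*]", text):
        yield match.group()
    yield None

class Generator:
    """Generator module: collects code for a hypothetical register machine."""
    def __init__(self):
        self.code = []
        self.next_reg = 0

    def load(self, operand):
        self.next_reg += 1
        self.code.append(f"R{self.next_reg} := {operand}")
        return f"R{self.next_reg}"

    def binary(self, op, left, right):
        self.code.append(f"{left} := {left} {op} {right}")
        return left

    def store(self, name, reg):
        self.code.append(f"{name} := {reg}")

class Parser:
    """Parser: the main program; it pulls symbols and calls the generator."""
    def __init__(self, text):
        self.symbols = scanner(text)
        self.gen = Generator()
        self.sym = next(self.symbols)

    def advance(self):
        self.sym = next(self.symbols)

    def factor(self):                       # factor = ident | number
        if self.sym is None or not self.sym[0].isalnum():
            raise SyntaxError(f"identifier or number expected, got {self.sym!r}")
        reg = self.gen.load(self.sym)
        self.advance()
        return reg

    def term(self):                         # term = factor {"*" factor}
        reg = self.factor()
        while self.sym == "*":
            self.advance()
            reg = self.gen.binary("*", reg, self.factor())
        return reg

    def expression(self):                   # expression = term {"+" term}
        reg = self.term()
        while self.sym == "+":
            self.advance()
            reg = self.gen.binary("+", reg, self.term())
        return reg

    def assignment(self):                   # assignment = ident ":=" expression
        name = self.sym
        if name is None or not name[0].isalpha():
            raise SyntaxError("identifier expected")
        self.advance()
        if self.sym != ":=":
            raise SyntaxError("':=' expected")
        self.advance()
        self.gen.store(name, self.expression())
        if self.sym is not None:
            raise SyntaxError(f"unexpected symbol {self.sym!r}")
        return self.gen.code
```

`Parser("pos := pos + r * 60").assignment()` returns `R1 := pos`, `R2 := r`, `R3 := 60`, `R2 := R2 * R3`, `R1 := R1 + R2`, `pos := R1`. Code is emitted while the text is being parsed, which is what makes a single pass sufficient.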

Front End / Back End ...
A common and advantageous separation of the tasks is to divide the compiler into two parts, the front end and the back end.

  front end (Pascal, C, Java): analysis and target-independent transformations
    -> syntax tree and context info ->
  back end (MIPS, SPARC, PowerPC): target-dependent code generation


Front End / Back End
This division helps reduce the effort of writing compilers for different targets for the same language by sharing the front end, or for different languages for the same target by sharing the back end. Theoretically, with

  m source languages
  n target machines

this reduces m × n compilers to m front ends + n back ends.

In practice, this only works if the languages, or respectively the targets, are sufficiently similar. It is nevertheless a good structuring principle for flexibility.


Interpreted Code Compilers
A variation of this scheme is when a sequential interpreted representation rather than a hierarchical syntax tree is used.

  Compiler (Pascal, C, Java) -> byte code -> Interpreter (MIPS, SPARC, PowerPC)
  e.g. JVM (Java Virtual Machine), .NET, LLVM

For compactness, interpreted codes usually represent each instruction by a single byte, hence are called byte-codes.
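How such a byte-code is executed can be sketched with a tiny stack-machine interpreter in Python. The six opcodes and their encoding are hypothetical and not taken from the JVM or any other real virtual machine; each instruction is one byte, optionally followed by a one-byte operand.

```python
# Hypothetical opcodes, one byte each.
PUSH, LOAD, STORE, ADD, MUL, HALT = range(6)

def run(code, memory):
    """Interpret a bytes object of byte-code against a list of variable slots."""
    stack = []
    pc = 0
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:                    # operand byte: constant 0..255
            stack.append(code[pc])
            pc += 1
        elif op == LOAD:                  # operand byte: slot number
            stack.append(memory[code[pc]])
            pc += 1
        elif op == STORE:                 # operand byte: slot number
            memory[code[pc]] = stack.pop()
            pc += 1
        elif op == ADD:
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif op == MUL:
            right = stack.pop()
            stack.append(stack.pop() * right)
        elif op == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {op}")

# pos := pos + r * 60, with pos in slot 0 and r in slot 1
program = bytes([LOAD, 0, LOAD, 1, PUSH, 60, MUL, ADD, STORE, 0, HALT])
```

With `pos = 100` and `r = 2`, `run(program, [100, 2])` returns `[220, 2]`. The same eleven bytes run unchanged on any machine for which this interpreter exists, which is the point of the scheme.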


Cross-Compilers
Compilers which produce machine code for a computer other than the one on which they run are called cross-compilers. These are typically used for programming microcontrollers in embedded applications, but also for general multi-platform program development. (For testing purposes, a compiler with two back ends is particularly useful.)

