
AKGEC/IAP/FM/03

Ajay Kumar Garg Engineering College, Ghaziabad


Department of CSE
MODEL SOLUTION PUT
Course: B.Tech              Semester: VI
Session: 2014-15            Section: A, B
Subject: Compiler Design    Sub. Code: ECS-603
Max Marks: 100              Time: 3 hours
Note: All questions are compulsory. All questions carry equal marks.
Q1. Attempt any FOUR parts of the following.
4X5=20
a. Explain all phases of compiler with suitable diagram.
Ans. A compiler is a program that translates a source program written in a high-level language into an equivalent machine-code program.

Phases of Compiler:
Lexical Analyzer:
Lexical Analyzer reads the source program character by character and returns the tokens of the source
program. A token describes a pattern of characters having same meaning in the source program. (such as
identifiers, operators, keywords, numbers, delimiters and so on)
Ex: newval := oldval + 12
=> tokens:
newval    identifier
:=        assignment operator
oldval    identifier
+         add operator
12        a number
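
To make the phase concrete, here is a minimal lexer sketch in Python; the token category names and the regular expressions are illustrative assumptions chosen to match the example above, not part of the original solution.

import re

# Token patterns, tried in order; the category names follow the example above.
TOKEN_SPEC = [
    ("number",     r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assign_op",  r":="),
    ("add_op",     r"\+"),
    ("skip",       r"\s+"),
]

def tokenize(source):
    # Read the source left to right, matching one token pattern at a time.
    pos = 0
    while pos < len(source):
        for category, pattern in TOKEN_SPEC:
            match = re.match(pattern, source[pos:])
            if match:
                if category != "skip":        # white space is discarded
                    yield (category, match.group())
                pos += match.end()
                break
        else:
            raise SyntaxError("illegal character: " + source[pos])

print(list(tokenize("newval := oldval + 12")))
# [('identifier', 'newval'), ('assign_op', ':='), ('identifier', 'oldval'),
#  ('add_op', '+'), ('number', '12')]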

Syntax Analyzer:
Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program.
A syntax analyzer is also called a parser. A parse tree describes the syntactic structure of the program. The syntax of a
language is specified by a context free grammar (CFG). The rules in a CFG are mostly recursive.
A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
If it satisfies, the syntax analyzer creates a parse tree for the given program.
Semantic Analyzer:
A semantic analyzer checks the source program for semantic errors and collects the type information for
the code generation.
Type-checking is an important part of semantic analyzer.
Normally, semantic information cannot be represented by a context-free grammar of the kind used in syntax analysis.
The context-free grammar used in syntax analysis is therefore augmented with attributes (semantic rules); the result is a syntax-directed translation, also known as an attribute grammar.
Ex:
newval := oldval + 12
The type of the identifier newval must match with type of the expression (oldval+12).
Intermediate Code Generation:
A compiler may produce explicit intermediate code representing the source program.
This intermediate code is generally machine (architecture) independent, but its level is close to the level of machine code.
Ex:
newval := oldval * fact + 1
id1 := id2 * id3 + 1
Intermediate code (quadruples):
MULT id2, id3, temp1
ADD temp1, #1, temp2
MOV temp2, id1
Code Optimizer:
The code optimizer optimizes the code produced by the intermediate code generator in terms of time and space.
Ex:
MULT id2, id3, temp1
ADD temp1, #1, id1
Code Generator:
Produces the target language for a specific architecture.
The target program is normally a relocatable object file containing the machine codes.
Ex: (assume an architecture in which at least one operand of each instruction is a machine register)
MOVE id2, R1
MULT id3, R1
ADD #1, R1
MOVE R1, id1
b. Differentiate between
(i)
Compiler and interpreter
Ans. A compiler translates a complete source program into machine code. The whole source code
file is compiled in one go, and a complete, compiled version of the file is produced. This can be
saved on some secondary storage medium (e.g. floppy disk, hard disk...). This means that:
The program can only be executed once translation is complete
ANY changes to the source code require a complete recompilation.

An interpreter, on the other hand, provides a means by which a program written in source language can be
understood and executed by the CPU line by line. As the first line is encountered by the interpreter, it is
translated and executed. Then it moves to the next line of source code and repeats the process. This means
that:
The interpreter is a program which is loaded into memory alongside the source program
Statements from the source program are fetched and executed one by one
No copy of the translation exists, and if the program is to be re-run, it has to be interpreted all over
again.
(ii)
Macro processor and Pre processor
Ans. A macro processor is a program that copies a stream of text from one place to another, making a
systematic set of replacements as it does so. Macro processors are often embedded in other programs, such
as assemblers and compilers. Sometimes they are standalone programs that can be used to process any kind
of text. Macro processors have been used for language expansion, for systematic text replacements that
require decision making, and for text reformatting.
A preprocessor is a program that processes its input data to produce output that is used as input to
another program. The output is said to be a preprocessed form of the input data, which is often used by
some subsequent programs like compilers. The amount and kind of processing done depends on the nature
of the preprocessor; some preprocessors are only capable of performing relatively simple textual
substitutions and macro expansions.
c. What is bootstrapping? Explain with example.
Ans. In computer science, bootstrapping is the process of writing a compiler (or assembler) in the target programming language which it is intended to compile. Applying this technique leads to a self-hosting compiler.
Bootstrapping a compiler has the following advantages:
- it is a non-trivial test of the language being compiled;
- compiler developers only need to know the language being compiled;
- improvements to the compiler's back-end improve not only general-purpose programs but also the compiler itself; and
- it is a comprehensive consistency check, as the compiler should be able to reproduce its own object code.
For example, the first compiler for a new language L is written in an existing language; once it works, the compiler is rewritten in L itself and compiled with the first compiler, after which it can compile, and thus maintain, itself.
d. Explain Thompson's construction algorithm.


Ans. Converting a regular expression into an NFA (Thompson's construction):
This is one way to convert a regular expression into an NFA. There can be other (more efficient) ways for the conversion. Thompson's construction is a simple and systematic method. It guarantees that the resulting NFA will have exactly one final state and one start state. Construction starts from the simplest parts (alphabet symbols). To create an NFA for a complex regular expression, the NFAs of its sub-expressions are combined to create its NFA.
The construction cases are given below (the original diagrams are rendered in words):
- To recognize an empty string ε: a start state is connected to a final state by a single ε-transition.
- To recognize a symbol a in the alphabet: a start state is connected to a final state by a single transition on a.
- If N(r1) and N(r2) are NFAs for regular expressions r1 and r2:
- For the regular expression r1 | r2: a new start state has ε-transitions to the start states of N(r1) and N(r2), and the final states of N(r1) and N(r2) have ε-transitions to a new final state; this is the NFA for r1 | r2.
- For the regular expression r1 r2: the final state of N(r1) is connected by an ε-transition to the start state of N(r2); the start state of N(r1) is the start state, and the final state of N(r2) becomes the final state of N(r1 r2); this is the NFA for r1 r2.
- For the regular expression r*: a new start state has ε-transitions to the start state of N(r) and to a new final state, and the final state of N(r) has ε-transitions back to the start state of N(r) and to the new final state; this is the NFA for r*.
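
The construction can be sketched directly in Python. The representation below (an NFA as a (start, final) pair of states, with None labelling an ε-transition) is an assumption made for illustration; the four functions mirror the four cases above.

class State:
    def __init__(self):
        self.edges = []                  # list of (label, target) pairs

def symbol(a):
    # Base case: recognize the single symbol a.
    s, f = State(), State()
    s.edges.append((a, f))
    return s, f

def union(n1, n2):
    # r1 | r2: new start and final states glued on with epsilon-moves.
    s, f = State(), State()
    s.edges += [(None, n1[0]), (None, n2[0])]
    n1[1].edges.append((None, f))
    n2[1].edges.append((None, f))
    return s, f

def concat(n1, n2):
    # r1 r2: the final state of N(r1) feeds the start state of N(r2).
    n1[1].edges.append((None, n2[0]))
    return n1[0], n2[1]

def star(n):
    # r*: allow zero or more passes through N(r).
    s, f = State(), State()
    s.edges += [(None, n[0]), (None, f)]
    n[1].edges += [(None, n[0]), (None, f)]
    return s, f

def accepts(nfa, word):
    # Simulate the NFA using epsilon-closures.
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for label, t in stack.pop().edges:
                if label is None and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen
    current = closure({nfa[0]})
    for ch in word:
        current = closure({t for s in current
                             for label, t in s.edges if label == ch})
    return nfa[1] in current

# (a|b)*a accepts exactly the words over {a, b} that end in a.
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("a"))
print(accepts(nfa, "abba"), accepts(nfa, "abb"))   # True False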
e. Explain the need of lexical analyzer in compilation process. Also explain the concept of input
buffering and preliminary scanning.
Ans. A lexical analyzer is needed in the compilation process for the reasons given below:
Efficiency: A lexer may do the simple parts of the work faster than the more general parser can.
Furthermore, a system that is split in two may be smaller than a combined system. This may seem paradoxical, but there is a non-linear factor involved which can make a separated system smaller than a combined one.
Modularity: The syntactical description of the language need not be cluttered with small lexical details
such as white-space and comments.
Tradition: Languages are often designed with separate lexical and syntactical phases in mind, and the
standard documents of such languages typically separate lexical and syntactical elements of the languages.
Input buffering:
The lexical analyzer may need to look at least one character ahead to make a token decision. Buffering is used to reduce the overhead required to process a single character.

Preliminary scanning:
Comments and white space are removed so that tokens can be recognized effectively.
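
A minimal sketch of buffered input with one-character lookahead in Python follows; the buffer size and the use of "" as an end-of-file sentinel are illustrative assumptions.

import io

BUF_SIZE = 4096                       # assumed buffer size

class BufferedInput:
    # Characters are fetched a buffer at a time; peek() gives one
    # character of lookahead without consuming it.
    def __init__(self, stream):
        self.stream = stream
        self.buf = stream.read(BUF_SIZE)
        self.pos = 0

    def peek(self):
        if self.pos == len(self.buf) and self.buf:
            self.buf = self.stream.read(BUF_SIZE)   # reload on exhaustion
            self.pos = 0
        return self.buf[self.pos] if self.pos < len(self.buf) else ""  # "" = eof

    def advance(self):
        ch = self.peek()
        if ch:
            self.pos += 1
        return ch

src = BufferedInput(io.StringIO("newval := oldval + 12"))
print(src.advance(), src.peek())      # n e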
f. Design an NDFA with ε-moves for the following regular expression:
((0+1)*10+(00)*(11)*)*
Sol.

Q2. Attempt any TWO parts of the following.
2X10=20
a. Write the algorithm to find FIRST and FOLLOW for predictive parser. Also write the
algorithm for constructing predictive parsing table.
Ans. FIRST(X) is computed as follows:
- If X is a terminal symbol, then FIRST(X) = {X}.
- If X is a non-terminal symbol and X → ε is a production rule, then ε is in FIRST(X).
- If X is a non-terminal symbol and X → Y1Y2...Yn is a production rule: if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j = 1,...,i-1, then a is in FIRST(X); if ε is in all FIRST(Yj) for j = 1,...,n, then ε is in FIRST(X).
FIRST is extended to strings of grammar symbols in the same way:
- If X is ε, then FIRST(X) = {ε}.
- If X is Y1Y2...Yn: if a terminal a is in FIRST(Yi) and ε is in all FIRST(Yj) for j = 1,...,i-1, then a is in FIRST(X); if ε is in all FIRST(Yj) for j = 1,...,n, then ε is in FIRST(X).

FOLLOW(X) is computed as follows:
- If S is the start symbol, then $ is in FOLLOW(S).
- If A → αBβ is a production rule, then everything in FIRST(β) except ε is in FOLLOW(B).
- If A → αB is a production rule, or A → αBβ is a production rule and ε is in FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B).

Constructing LL(1) Parsing Table -- Algorithm
for each production rule A → α of a grammar G:
- for each terminal a in FIRST(α), add A → α to M[A, a];
- if ε is in FIRST(α), then for each terminal a in FOLLOW(A), add A → α to M[A, a];
- if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
All other undefined entries of the parsing table are error entries.
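
The FIRST computation can be sketched as a fixed-point iteration in Python. The grammar encoding below, with "" standing for ε, and the sample grammar itself are assumptions for illustration.

# Grammar: non-terminal -> list of right-hand sides (tuples of symbols).
# "" stands for epsilon.
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ("",)],
    "T":  [("id",)],
}
NONTERMINALS = set(GRAMMAR)

def first_sets(grammar):
    # Apply the FIRST rules repeatedly until no set changes (a fixed point).
    first = {x: set() for x in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                all_nullable = True
                for sym in body:
                    f = first[sym] if sym in NONTERMINALS else {sym}
                    added = (f - {""}) - first[head]
                    if added:
                        first[head] |= added
                        changed = True
                    if "" not in f:          # Yi cannot derive epsilon: stop
                        all_nullable = False
                        break
                if all_nullable and "" not in first[head]:
                    first[head].add("")      # every Yj derives epsilon
                    changed = True
    return first

print(first_sets(GRAMMAR))
# {'E': {'id'}, "E'": {'+', ''}, 'T': {'id'}}  (set order may vary)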

b. Write the algorithm for computing the CLOSURE and GOTO functions for LR(0) item sets. Also find the LR(0) item sets for the following grammar:
E → E+T
T → TF | F
F → F+ | id
Ans. Augmented Grammar:
G' is G with a new production rule S' → S, where S' is the new starting symbol.
CLOSURE(I): If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0) items constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G, then B → .γ will be in closure(I). This rule is applied until no more new LR(0) items can be added to closure(I).
GOTO(I, X): If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I, X) is defined as follows:
If A → α.Xβ is in I, then every item in closure({A → αX.β}) will be in goto(I, X).
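
These two functions can be sketched directly in Python; encoding an item as a (head, body, dot-position) triple is an assumption for illustration, using the productions of the augmented grammar of this question.

# Productions of the augmented grammar; an LR(0) item is (head, body, dot).
PRODS = [("E'", ("E",)), ("E", ("E", "+", "T")),
         ("T", ("T", "F")), ("T", ("F",)),
         ("F", ("F", "+")), ("F", ("id",))]
NONTERMINALS = {"E'", "E", "T", "F"}

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            if dot < len(body) and body[dot] in NONTERMINALS:
                for h, b in PRODS:           # add B -> .gamma for B after the dot
                    if h == body[dot] and (h, b, 0) not in result:
                        result.add((h, b, 0))
                        changed = True
    return result

def goto(items, x):
    # Move the dot over x, then close the resulting kernel.
    kernel = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == x}
    return closure(kernel)

i0 = closure({("E'", ("E",), 0)})            # I0
i1 = goto(i0, "E")                           # I1
print(sorted(i1))
# [('E', ('E', '+', 'T'), 1), ("E'", ('E',), 1)]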

Initially, add the production E' → E to the production set of the given grammar to make it an augmented grammar.
The LR(0) sets of items for the given grammar are:
I0: E' → .E,
E → .E+T
I1: GOTO(I0, E)
E' → E.,
E → E.+T
I2: GOTO(I1, +)
E → E+.T,
T → .TF,
T → .F,
F → .F+,
F → .id
I3: GOTO(I2, T)
E → E+T.,
T → T.F,
F → .F+,
F → .id
I4: GOTO(I2, F)
T → F.,
F → F.+
I5: GOTO(I2, id)
F → id.
I6: GOTO(I3, F)
T → TF.,
F → F.+
I7: GOTO(I4, +)
F → F+.
The sets of LR(0) items are I0, ..., I7.
c. Construct CLR parsing table for the following grammar:
S → Aa | bAc | dc | bda
A → d
Sol. Initially make it an augmented grammar by adding the production S' → S. The LR(1) item sets are calculated below:
I0: [S' → .S, $], [S → .Aa, $],
[S → .bAc, $], [S → .dc, $],
[S → .bda, $], [A → .d, a]
I1: GOTO(I0, S)
[S' → S., $]
I2: GOTO(I0, A)
[S → A.a, $]
I3: GOTO(I0, b)
[S → b.Ac, $], [S → b.da, $], [A → .d, c]
I4: GOTO(I0, d)
[S → d.c, $],
[A → d., a]
I5: GOTO(I2, a)
[S → Aa., $]
I6: GOTO(I3, A)
[S → bA.c, $]
I7: GOTO(I3, d)
[S → bd.a, $], [A → d., c]
I8: GOTO(I4, c)
[S → dc., $]
I9: GOTO(I6, c)
[S → bAc., $]
I10: GOTO(I7, a)
[S → bda., $]
CLR parsing table (r1: S → Aa, r2: S → bAc, r3: S → dc, r4: S → bda, r5: A → d; blank entries are errors):

state     ACTION                                GOTO
          a       b       c       d       $     S       A
0                 s3              s4            1       2
1                                         acc
2         s5
3                                 s7                    6
4         r5              s8
5                                         r1
6                         s9
7         s10             r5
8                                         r3
9                                         r2
10                                        r4
Q3. Attempt any TWO parts of the following.
2X10=20

a. Consider the following grammar and give the syntax directed definitions to construct a parse tree. For the input expression 4*7+1*2, construct an annotated parse tree according to your syntax directed definition:
S → E$
E → E+T | T
T → T*F | F
F → digit
Ans. SDT scheme for the grammar:

Production        Semantic action
S → E$            { print E.VAL }
E → E1+T          { E.VAL := E1.VAL + T.VAL }
E → T             { E.VAL := T.VAL }
T → T1*F          { T.VAL := T1.VAL * F.VAL }
T → F             { T.VAL := F.VAL }
F → digit         { F.VAL := LEXVAL }
Annotated parse tree for 4*7+1*2 (the tree from the original figure, rendered textually; each node carries its VAL attribute):

S { prints 30 }
|-- E.VAL = 30
|   |-- E.VAL = 28
|   |   `-- T.VAL = 28
|   |       |-- T.VAL = 4
|   |       |   `-- F.VAL = 4
|   |       |       `-- digit (lexval 4)
|   |       |-- *
|   |       `-- F.VAL = 7
|   |           `-- digit (lexval 7)
|   |-- +
|   `-- T.VAL = 2
|       |-- T.VAL = 1
|       |   `-- F.VAL = 1
|       |       `-- digit (lexval 1)
|       |-- *
|       `-- F.VAL = 2
|           `-- digit (lexval 2)
`-- $
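
Since this is a standard expression grammar, the SDD can be sketched as a recursive-descent evaluator in Python, where each parsing function returns the VAL attribute of its non-terminal; the function names and the single-digit token handling are illustrative assumptions.

def parse(expr):
    tokens = list(expr) + ["$"]       # single-digit operands, as in 4*7+1*2
    pos = 0

    def peek():
        return tokens[pos]

    def eat():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def F():                          # F -> digit    { F.VAL := LEXVAL }
        return int(eat())

    def T():                          # T -> T*F | F  { T.VAL := T1.VAL * F.VAL }
        val = F()
        while peek() == "*":
            eat()
            val *= F()
        return val

    def E():                          # E -> E+T | T  { E.VAL := E1.VAL + T.VAL }
        val = T()
        while peek() == "+":
            eat()
            val += T()
        return val

    val = E()                         # S -> E$       { print E.VAL }
    assert eat() == "$"
    print(val)

parse("4*7+1*2")                      # prints 30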
b. What is 3-address code? Explain the types of 3-address code. Convert the following expression into quadruples, triples and indirect triples:
S = (a + b) / (c - d) * (e + f)
Ans. A three-address code (TAC) statement has the form:
x := y op z
where x, y and z are names, constants or compiler-generated temporaries; op is any operator.
Types of three-address statements:
Assignment instructions: x = y op z
Includes binary arithmetic and logical operations
Unary assignments:
x = op y
Includes unary arithmetic minus (-), logical negation (!) and type conversion
Copy instructions:
x=y
These may be optimized later.
Unconditional jump: goto L
L is a symbolic label of an instruction
Conditional jumps:
if x goto L
and
ifFalse x goto L
The former: if x is true, execute instruction L next.
The latter: if x is false, execute instruction L next.
Conditional jumps:
if x relop y goto L
Procedure calls. For a procedure call p(x1, ..., xn):
param x1
...
param xn
call p, n
Indexed copy instructions: x = y[i] and x[i] = y
The former sets x to the value in the location i memory units beyond y.
The latter sets the contents of the location i memory units beyond x to the value of y.
Address and pointer instructions:
x = &y sets the value of x to be the location (address) of y.
x = *y, presumably y is a pointer or temporary whose value is a location. The value of x is
set to the contents of that location.
*x = y sets the value of the object pointed to by x to the value of y.
The given expression is:
S = (a + b) / (c - d) * (e + f)
The three-address code is:
T1 = a + b
T2 = c - d
T3 = T1 / T2
T4 = e + f
T5 = T3 * T4
S = T5
Quadruples:

#    OP    ARG1    ARG2    RESULT
1    +     a       b       T1
2    -     c       d       T2
3    /     T1      T2      T3
4    +     e       f       T4
5    *     T3      T4      T5
6    =     T5              S

Triples:

#      OP    ARG1    ARG2
(1)    +     a       b
(2)    -     c       d
(3)    /     (1)     (2)
(4)    +     e       f
(5)    *     (3)     (4)
(6)    =     S       (5)

Indirect triples (the triples are stored once, here numbered (14)-(19), and a separate statement list points to them in execution order):

Statement    Pointer
1            (14)
2            (15)
3            (16)
4            (17)
5            (18)
6            (19)

#       OP    ARG1    ARG2
(14)    +     a       b
(15)    -     c       d
(16)    /     (14)    (15)
(17)    +     e       f
(18)    *     (16)    (17)
(19)    =     S       (18)

c. Explain syntax directed translation of switch case statement. Also translate the following program segment into three address code:
switch (a + b)
{
case 2: {x = y; break;}
case 5: switch (x)
{
case 0: {a = b + 1; break;}
case 1: {a = b + 3; break;}
default: {a = 2; break;}
}
case 9: {x = y - 1; break;}
default: {a = 2; break;}
}
Ans. SDT scheme for the switch-case statement: the switch expression is evaluated into a temporary, and control jumps to a test sequence placed after the code for all the case bodies. Each case body is translated in place and ends with a jump to the exit; the test sequence compares the temporary against each case value in turn and jumps to the matching body, falling through to the default code if no value matches. A nested switch is translated the same way, with its own test sequence.
The 3-address code for the given fragment is:
1. T1=a+b
2. goto 19
3. x=y
4. goto 26
5. goto 23
6. T2=b+1
7. a=T2
8. goto 26
9. T2=b+3
10. a=T2
11. goto 26
12. a=2
13. goto 26
14. T2=y-1
15. x=T2
16. goto 26
17. a=2
18. goto 26
19. if T1=2 goto 3
20. if T1=5 goto 5
21. if T1=9 goto 14
22. goto 17
23. if x=0 goto 6
24. if x=1 goto 9
25. goto 12
26. exit
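
The test-list-at-end scheme used above can be sketched in Python; the simplified switch representation (flat, single-statement bodies) and the symbolic labels in place of the numbered lines above are illustrative assumptions.

def translate_switch(expr, cases, default_stmt):
    # Emit the bodies first; the comparison tests go at the end,
    # exactly as in the numbered code above.
    code = [f"T1 = {expr}", "goto TEST"]
    tests = []
    for value, stmt in cases:
        label = f"L{value}"
        tests.append(f"if T1 = {value} goto {label}")
        code += [f"{label}: {stmt}", "goto EXIT"]
    code += ["DEFAULT: " + default_stmt, "goto EXIT"]
    code += ["TEST:"] + tests + ["goto DEFAULT", "EXIT:"]
    return code

for line in translate_switch("a + b",
                             [(2, "x = y"), (9, "x = y - 1")],
                             "a = 2"):
    print(line)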
Q4. Attempt any TWO parts of the following.
2X10=20
a. What is a symbol table? Explain the various data structures used for symbol tables.
Ans. Symbol tables:
Gather information about names which are in a program.
A symbol table is a data structure, where information about program objects is gathered.
Is used in both the analysis and synthesis phases.
The symbol table is built up during the lexical and syntactic analysis.
Help for other phases during compilation:
Semantic analysis: type conflict?
Code generation: how much and what type of run-time space is to be allocated?
Error handling: Has the error message already been issued?
"Variable A undefined"
The symbol table phase, or symbol table management, refers to the symbol table's storage structure, its construction in the analysis phase and its use during the whole compilation.
Requirements for symbol table management:
- Quick insertion of an identifier
- Quick search for an identifier
- Efficient insertion of information (attributes) about an id
- Quick access to information about a certain id
- Space and time efficiency
Data structures for symbol tables
Linear lists
Trees
Hash tables
Linear lists:
- Search: search linearly from beginning to end; stop if found.
- Adding: search first (does it exist?); add at the beginning if not found.
- Efficiency: to insert n names and search for m names, the cost will be c·n(n+m) comparisons. Inefficient.
Positive:
- Easy to implement
- Uses little space
- Easy to represent scoping
Negative:
- Slow for large n and m
Trees:
The symbol table can be kept in the form of trees:
- Each subprogram has a symbol table associated to its node in the abstract syntax tree.
- The main program has a similar table for globally declared objects.
- Quicker than linear lists.
- Easy to represent scoping.
Hash tables (with chaining):
- Search: hash the name with a hash function h(symbol) ∈ [0, k-1], where k = table size. If the entry is occupied, follow the link field.
- Insertion: search + simple insertion at the end of the symbol table (use the sympos pointer).
- Efficiency: search is proportional to n/k, and the number of comparisons is (m + n)·n/k for n insertions and m searches; k can be chosen arbitrarily large.
Positive:
- Very quick search
Negative:
- Relatively complicated
- Extra space required: k words for the hash table
- More difficult to introduce scoping
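
A sketch of a chained hash symbol table in Python follows; the fixed table size and the keyword-argument attribute payload are illustrative assumptions.

K = 64                                # table size, chosen arbitrarily

class SymbolTable:
    # Hash table with chaining: each slot holds a list of (name, attributes).
    def __init__(self):
        self.slots = [[] for _ in range(K)]

    def _chain(self, name):
        return self.slots[hash(name) % K]      # h(symbol) in [0, k-1]

    def insert(self, name, **attributes):
        chain = self._chain(name)
        if all(entry[0] != name for entry in chain):   # search before inserting
            chain.append((name, attributes))

    def lookup(self, name):
        for entry_name, attributes in self._chain(name):
            if entry_name == name:
                return attributes
        return None                   # undefined variable

table = SymbolTable()
table.insert("newval", type="int", offset=0)
print(table.lookup("newval"))         # {'type': 'int', 'offset': 0}
print(table.lookup("oldval"))         # None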

b. Discuss the following storage allocation strategies


(i)
Stack allocation
Ans. A function's prolog is responsible for allocating stack space for local variables, saved registers, stack
parameters, and register parameters.
The parameter area is always at the bottom of the stack (even if alloca is used), so that it will always
be adjacent to the return address during any function call. It contains at least four entries, but always enough
space to hold all the parameters needed by any function that may be called. Note that space is always allocated
for the register parameters, even if the parameters themselves are never homed to the stack; a callee is
guaranteed that space has been allocated for all its parameters. Home addresses are required for the register
arguments so a contiguous area is available in case the called function needs to take the address of the
argument list (va_list) or an individual argument. This area also provides a convenient place to save register
arguments during thunk execution and as a debugging option (for example, it makes the arguments easy to find
during debugging if they are stored at their home addresses in the prolog code). Even if the called function has
fewer than 4 parameters, these 4 stack locations are effectively owned by the called function, and may be used
by the called function for other purposes besides saving parameter register values. Thus the caller may not save
information in this region of stack across a function call.
If space is dynamically allocated (alloca) in a function, then a nonvolatile register must be used as a frame
pointer to mark the base of the fixed part of the stack and that register must be saved and initialized in the
prolog. Note that when alloca is used, calls to the same callee from the same caller may have different home
addresses for their register parameters.
The stack will always be maintained 16-byte aligned, except within the prolog (for example, after the
return address is pushed), and except where indicated in Function Types for a certain class of frame functions.
The following is an example of the stack layout where function A calls a non-leaf function B. Function A's
prolog has already allocated space for all the register and stack parameters required by B at the bottom of the
stack. The call pushes the return address and B's prolog allocates space for its local variables, nonvolatile
registers, and the space needed for it to call functions. If B uses alloca, the space is allocated between the local
variable/nonvolatile register save area and the parameter stack area.
When the function B calls another function, the return address is pushed just below the home address for
RCX.

(ii)
Heap allocation
Ans. In a compiler's run-time environment, dynamic memory allocation (also known as heap-based memory allocation) is the allocation of memory storage for use in a computer program during the run-time of that program. It can also be seen as a way of distributing ownership of limited memory resources among many pieces of data and code.
Dynamically allocated memory exists until it is released either explicitly by the programmer, or by the
garbage collector. This is in contrast to static memory allocation, which has a fixed duration. It is said that
an object so allocated has a dynamic lifetime.
The task of fulfilling an allocation request consists of finding a block of unused memory of sufficient size.
Problems in fulfilling an allocation request:
o Internal and external fragmentation. Reducing fragmentation needs special care, which makes the implementation more complex.
o The allocator's metadata can inflate the size of (individually) small allocations; chunking attempts to reduce this effect.
Usually, memory is allocated from a large pool of unused memory area called the heap (also called the
free store). Since the precise location of the allocation is not known in advance, the memory is accessed
indirectly, usually via a pointer reference. The precise algorithm used to organize the memory area and
allocate and deallocate chunks is hidden behind an abstract interface and may use any of the methods
described below.
Fixed-size-blocks allocation
Fixed-size-blocks allocation, also called memory pool allocation, uses a free list of fixed-size blocks of
memory (often all of the same size). This works well for simple embedded systems.
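
A fixed-size-block (memory pool) allocator can be sketched in Python; the block size, pool capacity, and the use of a bytearray to stand in for raw memory are illustrative assumptions.

BLOCK_SIZE = 32                       # assumed block size in bytes
POOL_BLOCKS = 128                     # assumed pool capacity

class Pool:
    # A free list of equally sized blocks carved out of one arena.
    def __init__(self):
        self.memory = bytearray(BLOCK_SIZE * POOL_BLOCKS)
        self.free_list = list(range(POOL_BLOCKS))   # indices of free blocks

    def alloc(self):
        if not self.free_list:
            raise MemoryError("pool exhausted")
        return self.free_list.pop() * BLOCK_SIZE    # offset into self.memory

    def free(self, offset):
        self.free_list.append(offset // BLOCK_SIZE)

pool = Pool()
a = pool.alloc()
b = pool.alloc()
pool.free(a)
print(pool.alloc() == a)              # True: the freed block is reused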
Buddy blocks
In this system, memory is allocated from a large block in memory that is a power of two in size. If the
block is more than twice as large as desired, it is broken in two. One of the halves is selected, and the
process repeats (checking the size again and splitting if needed) until the block is just large enough.
All the blocks of a particular size are kept in a sorted linked list or tree. When a block is freed, it is compared to its buddy; if both are free, they are combined and placed in the next-largest-size buddy-block list. (When a block is allocated, the allocator starts with the smallest sufficiently large block, to avoid needlessly breaking blocks.)
Garbage collection (GC) is a form of automatic memory management. It is a special case of resource management, in which the limited resource being managed is memory. The garbage collector, or just collector, attempts to reclaim garbage, that is, memory occupied by objects that are no longer in use by the program. Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.[1][2]
Garbage collection is often portrayed as the opposite of manual memory management, which requires the
programmer to specify which objects to deallocate and return to the memory system. However, many
systems use a combination of the two approaches, and other techniques such as stack allocation and region
inference can carve off parts of the problem. There is an ambiguity of terms, as theory often uses the terms
manual garbage collection and automatic garbage collection rather than manual memory management and
garbage collection, and does not restrict garbage collection to memory management, rather considering
that any logical or physical resource may be garbage collected.

c. Explain lexical and syntactic phase errors. Also explain the error recovery technique for both
types of errors.
Ans. Lexical phase error: There are not many errors that can be caught at the lexical level; those you
should be looking for are:
Characters that cannot appear in any token in our source language, such as @ or #.
Integer constants out of bounds (range is 0 to 32767).
Identifier names that are too long (maximum length is 32 characters).
Text strings that are too long (maximum length is 256 characters).
Text strings that span more than one line.
Certain other errors, such as malformed identifiers, could be caught here, or by the parser (the
"interpretation" of the error will be affected by the stage at which the error is caught). The only one
of these errors you are responsible for at this stage is the following: Unmatched right comment
delimiters (*/).
Error Recovery:
The lexical analyzer is unable to proceed when no pattern matches. Possible recovery actions:
- Panic mode recovery: delete successive characters from the remaining input until a token is found.
- Insert a missing character.
- Delete a character.
- Replace a character by another.
- Transpose two adjacent characters.
Syntax Errors:
A syntax error occurs when the stream of tokens is an invalid string. In LL(k) or LR(k) parsing tables, blank entries refer to syntax errors.
How should syntax errors be handled?
1. Report the error and terminate compilation: not user friendly.
2. Report the error, recover from it, and search for more errors: better.
Error Recovery:
Error recovery is the process of adjusting the input stream so that parsing can continue after a syntax error is reported. The techniques are:
Panic mode: ignore all symbols until a "synchronising" token is found, e.g. an "end" or ";", etc.
- simple to implement
- guaranteed to halt
- ignores a lot of code
Phrase level: replace a prefix of the current input by a string allowing the parser to continue. Normally replaces/deletes delimiters.
- danger of looping
- unable to deal with cases where the error is on the stack and not on the input
Error productions: include extra productions in the grammar which recognise commonly occurring errors.
- requires analysis of language use
- ensures messages and recovery procedures are specific to the actual error
Global correction: the compiler carries out the minimum number of changes to get a correct program.
- algorithms exist to determine the minimum change
- requires complete traversal of the program before correction
- extremely expensive in time and space
Q5. Attempt any TWO parts of the following.
2X10=20
a. What are code improving transformation techniques for code optimization? Explain with
examples.
Ans. The main code-improving transformations are:
- Common subexpression elimination: if an expression was computed earlier and its operands have not changed since, reuse the earlier value (e.g. t1 = b*c; ... t2 = b*c becomes t2 = t1).
- Copy propagation: after a copy x = y, use y in place of x where possible; the copy often becomes dead code.
- Dead-code elimination: remove statements whose results are never used.
- Constant folding: evaluate constant expressions at compile time (e.g. x = 2*3.14 becomes x = 6.28).
- Loop optimizations: code motion (move loop-invariant computations out of the loop), induction-variable elimination, and strength reduction (e.g. replace a multiplication by an equivalent addition inside a loop).
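
One of these transformations, constant folding over three-address statements, can be sketched in Python; the tuple encoding of statements is an illustrative assumption.

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def fold_constants(code):
    # Replace x = c1 op c2 with x = result when both operands are constants.
    folded = []
    for dest, op, left, right in code:
        if left.replace(".", "").isdigit() and right.replace(".", "").isdigit():
            folded.append((dest, "=", str(OPS[op](float(left), float(right))), ""))
        else:
            folded.append((dest, op, left, right))
    return folded

print(fold_constants([("x", "*", "2", "3.14"), ("y", "+", "a", "1")]))
# [('x', '=', '6.28', ''), ('y', '+', 'a', '1')]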
b. What is a basic block and a flow graph? Consider the following three address statements:

Generate the flow graph for the above code.
Sol: Basic blocks: a basic block is a sequence of consecutive statements which may be entered only at the beginning, and which, when entered, is executed in sequence without halt or possibility of branch.
Flow graphs: a flow graph represents the basic blocks and their successor relationships:
- The nodes of the flow graph are the basic blocks.
- There is an edge from block B to block C if and only if it is possible for the first instruction in block C to immediately follow the last instruction in block B.
There are two ways that such an edge could be justified:
- There is a conditional or unconditional jump from the end of B to the beginning of C.
- C immediately follows B in the original order of the three-address instructions, and B does not end in an unconditional jump.
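
The partitioning of three-address code into basic blocks (the leader algorithm) can be sketched in Python; the instruction encoding, with jump targets given as instruction indices, is an illustrative assumption.

def basic_blocks(code):
    # Leaders: (1) the first instruction, (2) any jump target,
    # (3) any instruction immediately after a jump.
    leaders = {0}
    for i, (op, target) in enumerate(code):
        if op in ("goto", "if"):
            leaders.add(target)
            if i + 1 < len(code):
                leaders.add(i + 1)
    order = sorted(leaders)
    return [code[start:end] for start, end in zip(order, order[1:] + [len(code)])]

code = [("assign", None),             # 0: i = 1
        ("if", 4),                    # 1: if i > n goto 4
        ("assign", None),             # 2: i = i + 1
        ("goto", 1),                  # 3: goto 1
        ("assign", None)]             # 4: x = i
for block in basic_blocks(code):
    print(block)                      # four blocks: [0], [1], [2, 3], [4]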

c. Discuss the use of algebraic laws in code optimization. Draw the DAG for the following expression:
a+a*(b-c)+(b-c)*d
Sol: Value numbering and algebraic laws can be used to:
- Eliminate computations (e.g. x + 0 = x and x * 1 = x need no code).
- Reduce strength: replace an expensive operation by a cheaper one (e.g. 2*x by x+x).
- Fold constants: 2*3.14 = 6.28 is evaluated at compile time.
Other algebraic transformations:
- Commutativity: x*y = y*x.
- Relational rewriting: x > y can be evaluated as x - y > 0.
- Associativity and common subexpressions: given a = b+c; e = c+d+b; the second statement can be rewritten as e = a+d.

The three-address code for the given expression is:
S1 := b-c
S2 := a*S1
S3 := S1*d
S4 := S2+S3
S5 := a+S4
The DAG (from the original figure, rendered textually; the subexpression b-c is a single shared node, and the leaf a is shared):

+ (S5)
|-- a                (shared leaf)
`-- + (S4)
    |-- * (S2)
    |   |-- a        (the same shared leaf)
    |   `-- - (S1)   (the shared b-c node)
    |       |-- b
    |       `-- c
    `-- * (S3)
        |-- - (S1)   (the same shared node)
        `-- d
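
The sharing shown in the DAG can be obtained by value numbering, sketched here in Python; the tuple encoding of statements and nodes is an illustrative assumption.

def dag(code):
    # Identical (op, left, right) triples receive the same value number,
    # so common subexpressions become one shared node.
    nodes = {}                        # (op, left, right) -> value number
    names = {}                        # current value number of each name
    def node(key):
        if key not in nodes:
            nodes[key] = len(nodes)
        return nodes[key]
    for dest, op, left, right in code:
        l = names[left] if left in names else node(("leaf", left, None))
        r = names[right] if right in names else node(("leaf", right, None))
        names[dest] = node((op, l, r))
    return nodes

# a + a*(b-c) + (b-c)*d
code = [("S1", "-", "b", "c"), ("S2", "*", "a", "S1"),
        ("S3", "*", "S1", "d"), ("S4", "+", "S2", "S3"),
        ("S5", "+", "a", "S4")]
print(dag(code))   # 9 nodes: the b-c node and the leaf a each appear once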
