Semester: VI
Section: A, B
Sub. Code: ECS-603
Time: 3 hours
Note: All questions are compulsory. All questions carry equal marks.
Q1. Attempt any FOUR parts of the following.
4X5=20
a. Explain all phases of compiler with suitable diagram.
Ans. A compiler is a program that translates a source program written in a high-level language into an equivalent machine-code program.
Phases of Compiler:
Lexical Analyzer:
Lexical Analyzer reads the source program character by character and returns the tokens of the source program. A token describes a pattern of characters having the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on).
Ex:
newval := oldval + 12
=> tokens:
newval   identifier
:=       assignment operator
oldval   identifier
+        add operator
12       a number
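A minimal tokenizer for this example might be sketched as follows (the token names and patterns are illustrative, not from any particular compiler):

```python
import re

# Ordered token patterns; the first pattern that matches wins.
TOKEN_SPEC = [
    ("number",     r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assign_op",  r":="),
    ("add_op",     r"\+"),
    ("skip",       r"\s+"),          # white space produces no token
]

def tokenize(source):
    tokens = []
    pos = 0
    while pos < len(source):
        for name, pattern in TOKEN_SPEC:
            m = re.match(pattern, source[pos:])
            if m:
                if name != "skip":
                    tokens.append((name, m.group()))
                pos += len(m.group())
                break
        else:
            raise ValueError(f"illegal character {source[pos]!r}")
    return tokens

print(tokenize("newval := oldval + 12"))
```

Running it on the example statement yields exactly the five tokens listed above.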
Syntax Analyzer:
Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program.
A syntax analyzer is also called a parser. A parse tree describes the syntactic structure. The syntax of a language is specified by a context-free grammar (CFG). The rules in a CFG are mostly recursive.
A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
If it satisfies, the syntax analyzer creates a parse tree for the given program.
Semantic Analyzer:
A semantic analyzer checks the source program for semantic errors and collects the type information for
the code generation.
Type-checking is an important part of semantic analysis.
Normally, semantic information cannot be represented by the context-free grammar used in syntax analysis. The context-free grammar is therefore augmented with attributes (semantic rules); the result is a syntax-directed translation based on an attribute grammar.
Ex:
newval := oldval + 12
The type of the identifier newval must match with type of the expression (oldval+12).
Intermediate Code Generation:
A compiler may produce explicit intermediate code representing the source program. This intermediate code is generally machine (architecture) independent, but its level is close to that of machine code.
Ex:
newval := oldval * fact + 1
id1 := id2 * id3 + 1
Intermediate Code (Quadruples):
MULT id2,id3,temp1
ADD  temp1,#1,temp2
MOV  temp2,id1
Code Optimizer:
The code optimizer improves the code produced by the intermediate code generator in terms of time and space.
Ex:
MULT id2,id3,temp1
ADD  temp1,#1,id1
Code Generator:
Produces the target code for a specific architecture.
The target program is normally a relocatable object file containing the machine code.
Ex: (assume an architecture whose instructions require at least one operand to be a machine register)
MOVE id2,R1
MULT id3,R1
ADD  #1,R1
MOVE R1,id1
b. Differentiate between
(i) Compiler and interpreter
Ans. A compiler translates a complete source program into machine code. The whole source code
file is compiled in one go, and a complete, compiled version of the file is produced. This can be
saved on some secondary storage medium (e.g. floppy disk, hard disk...). This means that:
The program can only be executed once translation is complete
ANY changes to the source code require a complete recompilation.
An interpreter, on the other hand, provides a means by which a program written in source language can be
understood and executed by the CPU line by line. As the first line is encountered by the interpreter, it is
translated and executed. Then it moves to the next line of source code and repeats the process. This means
that:
The interpreter is a program which is loaded into memory alongside the source program
Statements from the source program are fetched and executed one by one
No copy of the translation exists, and if the program is to be re-run, it has to be interpreted all over
again.
(ii) Macro processor and Preprocessor
Ans. A macro processor is a program that copies a stream of text from one place to another, making a
systematic set of replacements as it does so. Macro processors are often embedded in other programs, such
as assemblers and compilers. Sometimes they are standalone programs that can be used to process any kind
of text. Macro processors have been used for language expansion, for systematic text replacements that
require decision making, and for text reformatting.
A preprocessor is a program that processes its input data to produce output that is used as input to
another program. The output is said to be a preprocessed form of the input data, which is often used by
some subsequent programs like compilers. The amount and kind of processing done depends on the nature
of the preprocessor; some preprocessors are only capable of performing relatively simple textual
substitutions and macro expansions
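As a sketch of the systematic-replacement idea, a toy macro processor for object-like macros might look like this (macro names and bodies are illustrative; real macro processors also handle parameterized macros and must guard against self-referential expansion, which this sketch does not):

```python
import re

def expand_macros(text, macros):
    # Replace each whole-word macro name with its body, repeatedly,
    # until no more expansions apply (object-like macros only).
    changed = True
    while changed:
        changed = False
        for name, body in macros.items():
            new = re.sub(rf"\b{re.escape(name)}\b", body, text)
            if new != text:
                text, changed = new, True
    return text

print(expand_macros("area = PI * r * r", {"PI": "3.14159"}))
```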
c. What is bootstrapping? Explain with example.
Ans. In computer science, bootstrapping is the process of writing a compiler (or assembler) in the
target programming language which it is intended to compile. Applying this technique leads to a self-hosting compiler.
Bootstrapping a compiler has the following advantages:
[Figures: Thompson-construction NFAs - N(r1) and N(r2) combined for r1 | r2, N(r1) followed by N(r2) for r1 r2, and N(r) with loops for r*]
e. Explain the need of lexical analyzer in compilation process. Also explain the concept of input
buffering and preliminary scanning.
Ans. Lexical analyzer is needed in compilation process because of the reasons given below:
Efficiency: A lexer may do the simple parts of the work faster than the more general parser can.
Furthermore, the size of a system that is split in two may be smaller than a combined system. This may
seem paradoxical but, as we shall see, there is a non-linear factor involved which may make a separated
system smaller than a combined system.
Modularity: The syntactical description of the language need not be cluttered with small lexical details
such as white-space and comments.
Tradition: Languages are often designed with separate lexical and syntactical phases in mind, and the
standard documents of such languages typically separate lexical and syntactical elements of the languages.
Input buffering
A lexical analyzer may need to look at least one character ahead to make a token decision.
Buffering reduces the overhead required to process a single character.
Preliminary scanning
Comments and white space are removed before token recognition, making recognition more effective.
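Preliminary scanning might be sketched as a pass that strips comments and collapses white space before tokenization (C-style comment syntax is assumed here):

```python
import re

def preliminary_scan(source):
    # Remove /* ... */ comments (non-greedy, may span lines),
    # then // line comments, then collapse runs of white space.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", " ", source)
    return re.sub(r"\s+", " ", source).strip()

print(preliminary_scan("a = b; /* note */\n  c = d;"))
```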
f. Design NDFA with move for the following regular expression:
((0+1)*10+(00)*(11)*)*
Sol. [NFA diagram not reproduced]
FIRST(X) = {X} if X is a terminal.
FOLLOW(X):
If S is the start symbol, then $ is in FOLLOW(S).
b. Write algorithm for computing CLOSURE and GOTO function for LR(0) item set. Also find
the LR(0) itemset for the following grammar
E → E+T
T → TF | F
F → F+ | id
Ans. Augmented Grammar:
G' is G with a new production rule S' → S, where S' is the new start symbol.
CLOSURE (I): If I is a set of LR(0) items for a grammar G, then closure(I) is the set of LR(0) items
constructed from I by the two rules:
1. Initially, every LR(0) item in I is added to closure(I).
2. If A → α.Bβ is in closure(I) and B → γ is a production rule of G, then B → .γ will be in closure(I). This rule is applied until no more new LR(0) items can be added to closure(I).
GOTO (I, X): If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I, X) is the closure of the set of all items A → αX.β such that A → α.Xβ is in I.
Initially, add the production E' → E to the production set of the given grammar to make it an augmented grammar.
The LR(0) set of items for the given grammar are:
I0: E' → .E,
    E → .E+T
I1: GOTO(I0, E)
    E' → E.,
    E → E.+T
I2: GOTO(I1, +)
    E → E+.T,
    T → .TF,
    T → .F,
    F → .F+,
    F → .id
I3: GOTO(I2, T)
    E → E+T.,
    T → T.F,
    F → .F+,
    F → .id
I4: GOTO(I2, F)
    T → F.,
    F → F.+
I5: GOTO(I2, id)
    F → id.
I6: GOTO(I3, F)
    T → TF.,
    F → F.+
I7: GOTO(I4, +)
    F → F+.
The states of LR(0) items are I0, ..., I7.
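The CLOSURE and GOTO functions above can be sketched in code for this grammar (the tuple-based item representation, with an integer dot position, is a convention of this sketch, not from the source):

```python
from collections import deque

# Augmented grammar: E' -> E; E -> E+T; T -> TF | F; F -> F+ | id
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T")],
    "T":  [("T", "F"), ("F",)],
    "F":  [("F", "+"), ("id",)],
}

def closure(items):
    # An item is (lhs, rhs_tuple, dot_position).
    result = set(items)
    queue = deque(items)
    while queue:
        lhs, rhs, dot = queue.popleft()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before a non-terminal
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    queue.append(item)
    return frozenset(result)

def goto(items, symbol):
    # Advance the dot over `symbol`, then take the closure.
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})     # = {E' -> .E, E -> .E+T}
I2 = goto(goto(I0, "E"), "+")          # = GOTO(GOTO(I0, E), +), 5 items
```

The computed I0 and I2 match the item sets listed above.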
c. Construct CLR parsing table for the following grammar
S → Aa | bAc | dc | bda
A → d
Sol. Initially make it an augmented grammar by adding the production S' → S. The LR(1) item sets are calculated below.
I0: S' → .S, $;  S → .Aa, $;
    S → .bAc, $;  S → .dc, $;
    S → .bda, $;  A → .d, a
I1: GOTO(I0, S)
    S' → S., $
I2: GOTO(I0, A)
    S → A.a, $
I3: GOTO(I0, b)
    S → b.Ac, $;  S → b.da, $;  A → .d, c
I4: GOTO(I0, d)
    S → d.c, $;
    A → d., a
I5: GOTO(I2, a)
    S → Aa., $
I6: GOTO(I3, A)
    S → bA.c, $
I7: GOTO(I3, d)
    S → bd.a, $;  A → d., c
I8: GOTO(I4, c)
    S → dc., $
I9: GOTO(I6, c)
    S → bAc., $
I10: GOTO(I7, a)
    S → bda., $
CLR parsing table
state
ACTION
GOTO
a
b
c
d
$
S
A
0
s3
s4
1
2
1
acc
2
s5
3
r5
s7
6
4
r5
s8
5
r1
6
s9
7
s10
r5
8
r3
9
r2
10
r4
Q3 Attempt any TWO parts of the following.
2X10=20
a.
Consider the following grammar and give the syntax directed definitions to construct parse
tree. For the input expression 4*7+1*2 construct an annotated parse tree according to your
syntax directed definition:
S → E$
E → E+T | T
T → T*F | F
F → digit
Ans. SDT scheme for the grammar:

Production      Semantic action
S → E$          {print E.VAL}
E → E1 + T      {E.VAL := E1.VAL + T.VAL}
E → T           {E.VAL := T.VAL}
T → T1 * F      {T.VAL := T1.VAL * F.VAL}
T → F           {T.VAL := F.VAL}
F → digit       {F.VAL := LEXVAL}
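As a sketch, the SDD above can be implemented as a recursive evaluator (the left-recursive productions are handled by iteration; token handling here is deliberately minimal):

```python
def evaluate(tokens):
    # E -> T { + T } ; T -> F { * F } ; F -> digit
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        return int(tok)              # F.VAL := LEXVAL

    def term():
        nonlocal pos
        val = factor()               # T.VAL := F.VAL
        while peek() == "*":
            pos += 1
            val *= factor()          # T.VAL := T1.VAL * F.VAL
        return val

    def expr():
        nonlocal pos
        val = term()                 # E.VAL := T.VAL
        while peek() == "+":
            pos += 1
            val += term()            # E.VAL := E1.VAL + T.VAL
        return val

    return expr()

print(evaluate(["4", "*", "7", "+", "1", "*", "2"]))  # prints 30
```

For 4*7+1*2 the attribute values at the tree nodes are 28 and 2, and the root E.VAL is 30.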
b. What is 3-address code? Explain types of 3-address code. Convert the following expression into quadruples, triples and indirect triples.
S = (a + b) / (c - d) * (e + f)
Ans. A TAC is:
x := y op z
where x, y and z are names, constants or compiler-generated temporaries; op is any operator.
Types of three-address statements:
Assignment instructions: x = y op z
Includes binary arithmetic and logical operations
Unary assignments:
x = op y
Includes unary arithmetic op (-) and logical op (!) and type conversion
Copy instructions:
x=y
These may be optimized later.
Unconditional jump: goto L
L is a symbolic label of an instruction
Conditional jumps:
if x goto L        (if x is true, execute instruction L next)
ifFalse x goto L   (if x is false, execute instruction L next)
Conditional jumps:
if x relop y goto L
Procedure calls. For a procedure call p(x1, ..., xn):
param x1
...
param xn
call p, n
Indexed copy instructions: x = y[i] and x[i] = y
x = y[i] sets x to the value in the location i memory units beyond y (as in C).
x[i] = y sets the contents of the location i memory units beyond x to the value of y.
Address and pointer instructions:
x = &y sets the value of x to be the location (address) of y.
x = *y, presumably y is a pointer or temporary whose value is a location. The value of x is
set to the contents of that location.
*x = y sets the value of the object pointed to by x to the value of y.
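As a sketch, the three-address code for an expression can be produced by a post-order walk of an expression tree (the nested-tuple tree format and the T-numbering are conventions of this sketch):

```python
def compile_expr(tree):
    """Generate three-address code for an expression tree given as nested
    tuples (op, left, right); leaves are variable names (strings)."""
    code = []
    counter = 0

    def gen(node):
        nonlocal counter
        if isinstance(node, str):        # a leaf: just a name
            return node
        op, left, right = node
        l, r = gen(left), gen(right)     # post-order: children first
        counter += 1
        temp = f"T{counter}"
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    result = gen(tree)
    code.append(f"S = {result}")
    return code

# S = (a + b) / (c - d) * (e + f)
tac = compile_expr(("*", ("/", ("+", "a", "b"), ("-", "c", "d")),
                         ("+", "e", "f")))
```

The generated `tac` list is exactly the T1..T5 sequence shown below.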
The given expression is
S = (a + b) / (c - d) * (e + f)
The three address code is:
T1=a+b
T2=c-d
T3=T1/T2
T4=e+f
T5=T3*T4
S=T5
Quadruples:

 #  OP   ARG1   ARG2   RESULT
 1  +    a      b      T1
 2  -    c      d      T2
 3  /    T1     T2     T3
 4  +    e      f      T4
 5  *    T3     T4     T5
 6  =    T5            S

Triples (results are referred to by the number of the statement that computes them, so no RESULT field is needed):

 #    OP   ARG1   ARG2
 (1)  +    a      b
 (2)  -    c      d
 (3)  /    (1)    (2)
 (4)  +    e      f
 (5)  *    (3)    (4)
 (6)  =    S      (5)

Indirect triples (a separate statement list points at the triples, here stored at positions 14-19, so statements can be reordered without rewriting the triples):

Statement list:
 1  (14)
 2  (15)
 3  (16)
 4  (17)
 5  (18)
 6  (19)

 #     OP   ARG1   ARG2
 (14)  +    a      b
 (15)  -    c      d
 (16)  /    (14)   (15)
 (17)  +    e      f
 (18)  *    (16)   (17)
 (19)  =    S      (18)
2. goto 19
3. x=y
4. goto 26
5. goto 23
6. T2=b+1
7. a=T2
8. goto 26
9. T2=b+3
10. a=T2
11. goto 26
12. a=2
13. goto 26
14. T2=y-1
15. x=T2
16. goto 26
17. a=2
18. goto 26
19. if T1=2 goto 3
20. if T1=5 goto 5
21. if T1=9 goto 14
22. goto 17
23. if x=0 goto 6
24. if x=1 goto 9
25. goto 12
26. exit
Q4. Attempt any TWO parts of the following.
2X10=20
a. What is symbol table? Explain various data structure used for symbol table.
Ans. Symbol tables:
Gather information about names which are in a program.
A symbol table is a data structure, where information about program objects is gathered.
Is used in both the analysis and synthesis phases.
The symbol table is built up during the lexical and syntactic analysis.
Help for other phases during compilation:
Semantic analysis: type conflict?
Code generation: how much and what type of run-time space is to be allocated?
Error handling: Has the error message already been issued?
"Variable A undefined"
Symbol table phase or symbol table management refers to the symbol table's storage structure, its construction in the analysis phase, and its use during the whole compilation.
Requirements for symbol table management
Quick insertion of an identifier
Quick search for an identifier
You can have the symbol table in the form of trees as:
Each subprogram has a symbol table associated to its node in the abstract syntax tree.
The main program has a similar table for globally declared objects.
Quicker than linear lists.
Easy to represent scoping.
Hash tables (with chaining)
-Search
Hash the name with a hash function h, where h(symbol) is in [0, k-1] and k = table size.
If the entry is occupied, follow the link field.
-Insertion
Search + simple insertion at the end of the symbol table (use the sympos pointer).
-Efficiency
Search is proportional to n/k, and the number of comparisons is (m + n)n/k for n insertions and m searches. k can be chosen arbitrarily large.
-Positive
Very quick search
-Negative
Relatively complicated
Extra space required, k words for the hash table.
More difficult to introduce scoping.
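A hash table with chaining for a symbol table might be sketched as follows (the attribute record stored per name is illustrative):

```python
class SymbolTable:
    """Symbol table as a hash table with chaining (a sketch; real
    compilers store richer attribute records and handle scoping)."""

    def __init__(self, k=8):
        self.k = k
        self.buckets = [[] for _ in range(k)]   # each bucket is a chain

    def _hash(self, name):
        return hash(name) % self.k              # h(symbol) in [0, k-1]

    def insert(self, name, info):
        # Insert at the end of the chain the name hashes to.
        self.buckets[self._hash(name)].append((name, info))

    def lookup(self, name):
        # Follow the chain; search cost is proportional to chain length.
        for entry, info in self.buckets[self._hash(name)]:
            if entry == name:
                return info
        return None

table = SymbolTable()
table.insert("count", {"type": "int"})
```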
(ii) Heap allocation
Ans. In compiler run time environment, dynamic memory allocation (also known as heap-based
memory allocation) is the allocation of memory storage for use in a computer program during the run-time
of that program. It can be seen also as a way of distributing ownership of limited memory resources among
many pieces of data and code.
Dynamically allocated memory exists until it is released either explicitly by the programmer, or by the
garbage collector. This is in contrast to static memory allocation, which has a fixed duration. It is said that
an object so allocated has a dynamic lifetime.
The task of fulfilling an allocation request consists of finding a block of unused memory of sufficient size.
Problems during fulfilling allocation request
o Internal and external fragmentation.
Reduction needs special care, thus making implementation more complex (see
algorithm efficiency).
o Allocator's metadata can inflate the size of (individually) small allocations;
Chunking attempts to reduce this effect.
Usually, memory is allocated from a large pool of unused memory area called the heap (also called the
free store). Since the precise location of the allocation is not known in advance, the memory is accessed
indirectly, usually via a pointer reference. The precise algorithm used to organize the memory area and
allocate and deallocate chunks is hidden behind an abstract interface and may use any of the methods
described below.
Fixed-size-blocks allocation
Fixed-size-blocks allocation, also called memory pool allocation, uses a free list of fixed-size blocks of
memory (often all of the same size). This works well for simple embedded systems.
Buddy blocks
In this system, memory is allocated from a large block in memory that is a power of two in size. If the
block is more than twice as large as desired, it is broken in two. One of the halves is selected, and the
process repeats (checking the size again and splitting if needed) until the block is just large enough.
All the blocks of a particular size are kept in a sorted linked list or tree. When a block is freed, it is compared to its buddy. If they are both free, they are combined and placed in the next-largest-size buddy-block list. (When a block is allocated, the allocator starts with the smallest sufficiently large block, avoiding needlessly breaking blocks.)
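The size rounding and buddy computation can be sketched as follows (offsets are relative to the start of the pool; the minimum block size of 16 is an assumption of this sketch):

```python
def buddy_block_size(request, min_block=16):
    # Round a request up to the smallest power-of-two block that fits.
    size = min_block
    while size < request:
        size *= 2
    return size

def buddy_of(offset, size):
    # A block's buddy is found by flipping the bit corresponding to the
    # block's size in its offset within the pool.
    return offset ^ size

print(buddy_block_size(100))   # a 100-byte request gets a 128-byte block
print(buddy_of(0, 128))        # the buddy of the block at offset 0
```

When the block at offset 0 and its buddy at offset 128 are both free, they coalesce into one 256-byte block.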
Garbage collection (GC) is a form of automatic memory management. It is a special case of resource
management, in which the limited resource being managed is memory. The garbage collector, or just
collector, attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the
program. Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp.[1][2]
Garbage collection is often portrayed as the opposite of manual memory management, which requires the
programmer to specify which objects to deallocate and return to the memory system. However, many
systems use a combination of the two approaches, and other techniques such as stack allocation and region
inference can carve off parts of the problem. There is an ambiguity of terms, as theory often uses the terms
manual garbage collection and automatic garbage collection rather than manual memory management and
garbage collection, and does not restrict garbage collection to memory management, rather considering
that any logical or physical resource may be garbage collected.
c. Explain lexical and syntactic phase errors. Also explain the error recovery technique for both
types of errors.
Ans. Lexical phase error: There are not many errors that can be caught at the lexical level; those you
should be looking for are:
Characters that cannot appear in any token in our source language, such as @ or #.
Integer constants out of bounds (range is 0 to 32767).
Identifier names that are too long (maximum length is 32 characters).
Text strings that are too long (maximum length is 256 characters).
Text strings that span more than one line.
Certain other errors, such as malformed identifiers, could be caught here, or by the parser (the
"interpretation" of the error will be affected by the stage at which the error is caught). The only one
of these errors you are responsible for at this stage is the following: Unmatched right comment
delimiters (*/).
Error Recovery
Lexical analyzer unable to proceed: no pattern matches.
Panic mode recovery: delete successive characters from the remaining input until a token is found.
Insert a missing character.
Delete a character.
Replace a character by another.
Transpose two adjacent characters.
Syntax Errors:
A syntax error occurs when the stream of tokens is an invalid string.
In LL(k) or LR(k) parsing tables, blank entries refer to syntax errors.
How should syntax errors be handled?
1. Report the error and terminate compilation (not user friendly).
2. Report the error, recover from it, and search for more errors (better).
Error Recovery
Error Recovery: the process of adjusting the input stream so that parsing can continue after a syntax error is reported. The techniques are:
Panic mode - ignore all symbols until a "synchronizing" token is found, e.g. an "end" or ";", etc.
- simple to implement
- guaranteed to halt
- ignores a lot of code
Phrase level - replace a prefix of the current input by a string allowing the parser to continue. Normally replaces/deletes delimiters.
- danger of looping
- unable to deal with cases where error is on stack and not on input
Error productions - include extra productions in grammar which recognise commonly occurring
errors.
- requires analysis of language use
- ensures messages and recovery procedures are specific to the actual error
Global correction - compiler carries out minimum number of changes to get a correct program
- algorithms exist to determine minimum change
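Panic mode, the first technique above, might be sketched as follows (the synchronizing-token set is illustrative):

```python
SYNC_TOKENS = {";", "end"}

def panic_mode_recover(tokens, pos):
    # Skip input symbols until a synchronizing token is found, then
    # resume parsing just after it; guaranteed to halt at end of input.
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos + 1 if pos < len(tokens) else pos

# After an error at position 2, skip to just past the next ";".
print(panic_mode_recover(["x", "=", "@", "y", ";", "z"], 2))  # prints 5
```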
c. Discuss the use of algebraic laws in code optimization. Draw DAG for the following
expression.
a+a*(b-c)+(b-c)*d
Sol: Value numbering and algebraic laws:
Eliminate redundant computations
Reduction in strength
Constant folding
2*3.14 = 6.28, evaluated at compile time
Other algebraic transformations:
x*y = y*x
x>y is equivalent to x-y>0
a = b+c; e = c+d+b;  can be rewritten as  e = a+d;
S1:=b-c
S2:=a*S1
S3:=S1*d
S4:=S2+S3
S5:=a+S4
The DAG is shown in the diagram below; the common subexpression b-c is represented by a single shared node. [diagram not reproduced]
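The sharing in the DAG can be sketched with value numbering: identical (op, left, right) triples map to the same node, so the two b-c subexpressions share one node (the node naming is a convention of this sketch):

```python
def dag_node(op, left, right, nodes):
    # Reuse an existing DAG node for an identical (op, left, right)
    # triple instead of creating a new one (value numbering).
    key = (op, left, right)
    if key not in nodes:
        nodes[key] = f"n{len(nodes) + 1}"
    return nodes[key]

# Build the DAG for a + a*(b-c) + (b-c)*d
nodes = {}
bc1 = dag_node("-", "b", "c", nodes)    # b - c
m1  = dag_node("*", "a", bc1, nodes)    # a * (b - c)
s1  = dag_node("+", "a", m1, nodes)     # a + a*(b-c)
bc2 = dag_node("-", "b", "c", nodes)    # reuses the existing b-c node
m2  = dag_node("*", bc2, "d", nodes)    # (b-c) * d
root = dag_node("+", s1, m2, nodes)
```

Only five nodes are created for the six operator occurrences because b-c is computed once and shared.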