
NOV/DEC-'07/CS1352-Answer Key
CS1352 Principles of Compiler Design
University Question Key Nov/Dec 07

PART-A

1. What are the functions of preprocessors?
Preprocessors produce input to compilers. Functions: macro processing, file inclusion, rational preprocessors and language extensions.

2. Define a symbol table.
A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. Whenever an identifier is detected by the lexical analyzer, it is entered into the symbol table. The attributes of an identifier cannot normally be determined by the lexical analyzer.

3. What is an ambiguous grammar?
A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence of the language L(G), i.e. some sentence has more than one leftmost derivation or more than one rightmost derivation.

4. What is a predictive parser?
It is a top-down parser. A recursive predictive parser is a program based on a transition diagram that attempts to match the terminal symbols against the input and makes a recursive procedure call whenever it has to follow an edge labeled by a nonterminal. A non-recursive predictive parser is a program that matches the terminal symbols against the input by maintaining a stack rather than using recursive calls.

5. What are the notations used to represent an intermediate language?
There are mainly three types of intermediate code representations:
o Syntax tree
o Postfix notation
o Three address code

6. Give the ways of representing three address statements.
o Quadruples: record with four fields, op, arg1, arg2 and result.
o Triples: record with three fields, op, arg1 and arg2, used to avoid entering temporary names into the symbol table. A temporary value is referred to by the position of the statement that computes it.
o Indirect triples: list the pointers to triples rather than listing the triples themselves.


http://engineerportal.blogspot.in/


7. What are basic blocks and flow graphs?
A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end.
A flow graph is a directed graph in which the flow-of-control information is added to the basic blocks. The nodes of the flow graph are the basic blocks. The block whose leader is the first statement is called the initial block. There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some execution sequence. We then say that B1 is a predecessor of B2 and B2 is a successor of B1.

8. What are the limitations of static allocation?
o The size of a data object and constraints on its position in memory must be known at compile time.
o Recursive procedures are restricted, because all activations of a procedure use the same bindings for local names.
o Data structures cannot be created dynamically, since there is no mechanism for storage allocation at run time.

9. Define activation tree.
Each execution of a procedure is referred to as an activation of the procedure. An activation tree depicts the way control enters and leaves activations. In it:
o each node represents an activation of a procedure;
o the root represents the activation of the main program;
o the node for a is the parent of the node for b iff control flows from activation a to b;
o the node for a is to the left of the node for b iff the lifetime of a occurs before the lifetime of b.

10. What is inline expansion?
The body of the procedure is substituted for the call in the caller, with the actual parameters literally substituted for the formals, i.e. the procedure is treated as if it were a macro.

PART B

11. a. i. Explain in detail about the role of lexical analyzer with the possible error recovery actions. (6)
Few errors are discernible at the lexical level alone, because a lexical analyzer has a very localized view of a source program.
The simplest recovery strategy is panic mode recovery: delete successive characters from the remaining input until the lexical analyzer can find a well-formed token.
Other possible error recovery actions are:
o Deleting an extraneous character
o Inserting a missing character
o Replacing an incorrect character by a correct character
o Transposing two adjacent characters
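Panic mode recovery, as described above, can be sketched in a few lines. This is only an illustration: the token patterns in TOKEN_RE and the function name tokenize are assumptions made for a toy language, not part of the answer key.

```python
import re

# Hypothetical token patterns for a toy language (assumed for illustration).
TOKEN_RE = re.compile(
    r"\s*(?:(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*/=()]))"
)

def tokenize(text):
    """Return (tokens, errors). On a bad character, skip input until a token fits."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m:
            tokens.append((m.lastgroup, m.group(m.lastgroup)))
            pos = m.end()
        elif text[pos].isspace():
            pos += 1                      # whitespace not followed by a token
        else:
            # Panic mode: delete successive characters until a
            # well-formed token can be found at the current position.
            start = pos
            while pos < len(text) and not TOKEN_RE.match(text, pos):
                pos += 1
            errors.append((start, text[start:pos]))
    return tokens, errors
```

For example, tokenize("x = 3 $# + y") skips the two characters "$#" and still delivers the tokens for x, =, 3, + and y, so later phases can continue and report further errors.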


ii. What is a compiler? Explain the various phases of compiler in detail, with a neat sketch. (10)
A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language).
The process of compilation is very complex, so it is customary, from both the logical and the implementation point of view, to partition it into several phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. (2)

Source program: a stream of characters, e.g.
pos = init + rate * 60 (6)

Lexical analysis: groups characters into non-separable units, called tokens, and generates the token stream
id1 = id2 + id3 * const
The information about the identifiers must be stored somewhere (symbol table).

Syntax analysis: checks whether the token stream meets the grammatical specification of the language and generates the syntax tree.

Semantic analysis: checks whether the program has a meaning (e.g. if pos is a record and init and rate are integers, then the assignment does not make sense).

[Figure: syntax tree for id1 := id2 + id3 * 60 after syntax analysis, and the same tree after semantic analysis with inttoreal applied to 60]

Intermediate code generation: intermediate code is something that is both close to the final machine code and easy to manipulate (for optimization). One example is three-address code:
dst = op1 op op2
The three-address code for the assignment statement:
temp1 = inttoreal(60)
temp2 = id3 * temp1
temp3 = id2 + temp2
id1 = temp3

Code optimization: produces better but semantically equivalent code.
temp1 = id3 * 60.0
id1 = id2 + temp1

Code generation: generates assembly.
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Symbol table creation / maintenance:
o Contains information (storage, type, scope, args) on each meaningful token, typically identifiers.
o The data structure is created and initialized during lexical analysis.
o It is utilized and updated during later analysis and synthesis.
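The lexical-analysis step and the symbol-table entries described above can be illustrated with a minimal sketch. The function name lex and the representation of the symbol table as a dictionary from names to 1-based indices are assumptions made for this example; a real symbol table would hold a record of attributes per identifier.

```python
import re

def lex(source):
    """Group characters into tokens; identifiers are entered into a symbol table."""
    symtab = {}   # identifier name -> index; attributes are filled in by later phases
    stream = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            # Enter the identifier on first sight, reuse its entry afterwards.
            index = symtab.setdefault(lexeme, len(symtab) + 1)
            stream.append(f"id{index}")
        elif lexeme.isdigit():
            stream.append("const")
        else:
            stream.append(lexeme)
    return stream, symtab
```

Calling lex("pos = init + rate * 60") yields the token stream id1 = id2 + id3 * const, with pos, init and rate entered in the symbol table in that order.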


Error handling:
o Detects the different errors that correspond to all phases.
o Each phase must know how to deal with an error, so that compilation can proceed and further errors can be detected.

[Figure: phases of a compiler. Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Intermediate Code Generator -> Code Optimizer -> Code Generator -> Target Program, with the Symbol-table Manager and the Error Handler alongside the phases] (2)

(OR)
b. i. Give the minimized DFA for the following expression: (a|b)*abb. (10)
Syntax tree for (a|b)*abb#:
[Figure: syntax tree for (a|b)*abb#]


Calculation of firstpos, lastpos and nullable for nodes in syntax tree:
[Figure: syntax tree annotated with firstpos, lastpos and nullable]

Calculation of followpos:

Node   followpos
1      {1, 2, 3}
2      {1, 2, 3}
3      {4}
4      {5}
5      {6}
6      -

Now, the start state of the DFA is firstpos of the root. So, A = {1, 2, 3}.

Consider the input symbol a: positions 1 and 3 are for a in A.
So, let B = followpos(1) U followpos(3) = {1, 2, 3} U {4} = {1, 2, 3, 4}
DTrans[A, a] = B
Consider the input symbol b: position 2 is for b in A.
So, followpos(2) = {1, 2, 3} = A
DTrans[A, b] = A


Now continue with B.
Consider the input symbol a: positions 1 and 3 are for a in B.
So, followpos(1) U followpos(3) = {1, 2, 3} U {4} = {1, 2, 3, 4} = B
DTrans[B, a] = B
Consider the input symbol b: positions 2 and 4 are for b in B.
So, followpos(2) U followpos(4) = {1, 2, 3} U {5} = {1, 2, 3, 5} = C
DTrans[B, b] = C

Now continue with C.
Consider the input symbol a: positions 1 and 3 are for a in C.
So, followpos(1) U followpos(3) = {1, 2, 3} U {4} = {1, 2, 3, 4} = B
DTrans[C, a] = B
Consider the input symbol b: positions 2 and 5 are for b in C.
So, followpos(2) U followpos(5) = {1, 2, 3} U {6} = {1, 2, 3, 6} = D
DTrans[C, b] = D

Now continue with D.
Consider the input symbol a: positions 1 and 3 are for a in D.
So, followpos(1) U followpos(3) = {1, 2, 3} U {4} = {1, 2, 3, 4} = B
DTrans[D, a] = B
Consider the input symbol b: position 2 is for b in D.
So, followpos(2) = {1, 2, 3} = A
DTrans[D, b] = A

Position 6, associated with the end marker #, is in D. So, D is the final state. No two of the four states are equivalent, so this DFA is already minimal.

DFA:
[Figure: DFA with start state A and final state D; edges A -a-> B, A -b-> A, B -a-> B, B -b-> C, C -a-> B, C -b-> D, D -a-> B, D -b-> A]


Transition table:

State   Input symbol
        a   b
A       B   A
B       B   C
C       B   D
D       B   A
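The subset construction driven by followpos can be checked mechanically. The sketch below is a minimal illustration: the followpos sets and the start state are copied from the calculation above, while the names SYMBOL, FOLLOWPOS, build_dfa and accepts are assumptions introduced for this example.

```python
from collections import deque

# Positions of (a|b)*abb# and their followpos sets, as in the followpos calculation.
SYMBOL = {1: "a", 2: "b", 3: "a", 4: "b", 5: "b", 6: "#"}
FOLLOWPOS = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
START = frozenset({1, 2, 3})      # firstpos of the root
END_MARKER = 6                    # position of #

def build_dfa(alphabet="ab"):
    """Return (dtrans, accepting); each DFA state is a frozenset of positions."""
    dtrans, accepting = {}, set()
    seen, work = {START}, deque([START])
    while work:
        state = work.popleft()
        if END_MARKER in state:
            accepting.add(state)
        for ch in alphabet:
            # Union of followpos(p) over all positions p in the state labeled ch.
            target = frozenset().union(
                *(FOLLOWPOS[p] for p in state if SYMBOL[p] == ch)
            )
            if not target:
                continue
            dtrans[state, ch] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    return dtrans, accepting

def accepts(text):
    """Run the constructed DFA on text."""
    dtrans, accepting = build_dfa()
    state = START
    for ch in text:
        state = dtrans.get((state, ch))
        if state is None:
            return False
    return state in accepting
```

The construction produces four states, matching A, B, C and D in the transition table, and the resulting automaton accepts exactly the strings over {a, b} that end in abb.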

ii. Draw the transition diagram for unsigned numbers. (6)

12. a. i. Explain the role of parser in detail. (4)
The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language. It should report any syntax error in an intelligible fashion. Errors can be lexical, syntactic, semantic or logical.
The error handler in a parser has simple-to-state goals:
o It should report the presence of errors clearly and accurately.
o It should recover from each error quickly enough to be able to detect subsequent errors.
o It should not significantly slow down the processing of correct programs.


ii. Construct predictive parsing table for the grammar E->E+T | T, T->T*F | F, F->(E) | id (12)

Eliminating left recursion: (2)
E  -> T E'
E' -> + T E' | ε
T  -> F T'
T' -> * F T' | ε
F  -> ( E ) | id

Calculation of First: (2)
First(E) = First(T) = First(F) = {(, id}
First(E') = {+, ε}
First(T') = {*, ε}

Calculation of Follow: (2)
Follow(E) = Follow(E') = {), $}
Follow(T) = Follow(T') = {+, ), $}
Follow(F) = {+, *, ), $}

Predictive parsing table: (6)

Nonterminal | id      | +          | *          | (       | )      | $
E           | E->TE'  |            |            | E->TE'  |        |
E'          |         | E'->+TE'   |            |         | E'->ε  | E'->ε
T           | T->FT'  |            |            | T->FT'  |        |
T'          |         | T'->ε      | T'->*FT'   |         | T'->ε  | T'->ε
F           | F->id   |            |            | F->(E)  |        |

(OR)
b. i. Give the LALR parsing table for the grammar S->L=R | R, L->*R | id, R->L. (12)

Given grammar:
1. S->L=R
2. S->R
3. L->*R
4. L->id
5. R->L

Augmented grammar:
S'->S
S->L=R
S->R
L->*R
L->id
R->L
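The LR(1) item sets for the augmented grammar can be generated mechanically with the usual closure and goto operations. The following is a minimal sketch, assuming the grammar has no ε-productions (true here), so FIRST of a string is FIRST of its first symbol; the names GRAMMAR, first, closure, goto and canonical_collection are chosen for this example. An item is a tuple (head, body, dot position, lookahead).

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}

def first(symbol, seen=frozenset()):
    """FIRST of one symbol; valid because the grammar has no epsilon productions."""
    if symbol not in GRAMMAR:
        return {symbol}                   # terminal or the end marker $
    if symbol in seen:
        return set()
    result = set()
    for body in GRAMMAR[symbol]:
        result |= first(body[0], seen | {symbol})
    return result

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        head, body, dot, look = work.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue
        rest = body[dot + 1:] + (look,)   # beta followed by the lookahead
        for b in first(rest[0]):
            for prod in GRAMMAR[body[dot]]:
                item = (body[dot], prod, 0, b)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return items

def goto(items, symbol):
    moved = {(h, r, d + 1, a) for (h, r, d, a) in items
             if len(r) > d and r[d] == symbol}
    return frozenset(closure(moved))

def canonical_collection():
    start = frozenset(closure({("S'", ("S",), 0, "$")}))
    states, work = [start], [start]
    while work:
        state = work.pop()
        for symbol in {r[d] for (_, r, d, _) in state if len(r) > d}:
            nxt = goto(state, symbol)
            if nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states
```

For this grammar the collection has 14 item sets. In the initial set, the items for L->.*R and L->.id carry both = and $ as lookaheads, because L is reached both through S->.L=R and through R->.L.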


Canonical collection of LR(1) items:

I0: S'->.S, $
    S->.L=R, $
    S->.R, $
    L->.*R, =/$
    L->.id, =/$
    R->.L, $
I1: goto(I0, S)
    S'->S., $
I2: goto(I0, L)
    S->L.=R, $
    R->L., $
I3: goto(I0, R)
    S->R., $
I4: goto(I0, *)
    L->*.R, =/$
    R->.L, =/$
    L->.*R, =/$
    L->.id, =/$
I5: goto(I0, id)
    L->id., =/$
I6: goto(I2, =)
    S->L=.R, $
    R->.L, $
    L->.*R, $
    L->.id, $
I7: goto(I4, R)
    L->*R., =/$
I8: goto(I4, L)
    R->L., =/$
    goto(I4, *) = I4
    goto(I4, id) = I5
I9: goto(I6, R)
    S->L=R., $
I10: goto(I6, L)
    R->L., $
I11: goto(I6, *)
    L->*.R, $
    R->.L, $
    L->.*R, $
    L->.id, $
I12: goto(I6, id)
    L->id., $
I13: goto(I11, R)
    L->*R., $
    goto(I11, L) = I10
    goto(I11, *) = I11
    goto(I11, id) = I12

LR(1) table construction:

State | action                    | goto
      | =    | *    | id   | $    | S  | L  | R
0     |      | s4   | s5   |      | 1  | 2  | 3
1     |      |      |      | acc  |    |    |
2     | s6   |      |      | r5   |    |    |
3     |      |      |      | r2   |    |    |
4     |      | s4   | s5   |      |    | 8  | 7
5     | r4   |      |      | r4   |    |    |
6     |      | s11  | s12  |      |    | 10 | 9
7     | r3   |      |      | r3   |    |    |
8     | r5   |      |      | r5   |    |    |
9     |      |      |      | r1   |    |    |
10    |      |      |      | r5   |    |    |
11    |      | s11  | s12  |      |    | 10 | 13
12    |      |      |      | r4   |    |    |
13    |      |      |      | r3   |    |    |

This grammar is LR(1), since it does not produce any multi-defined entry in its parsing table.


LALR table construction:

I4 and I11 are similar. Combine them as I411 or I4:
    L->*.R, =/$
    R->.L, =/$
    L->.*R, =/$
    L->.id, =/$
I5 and I12 are similar. Combine them as I512 or I5:
    L->id., =/$
I7 and I13 are similar. Combine them as I713 or I7:
    L->*R., =/$
I8 and I10 are similar. Combine them as I810 or I8:
    R->L., =/$

State | action                    | goto
      | =    | *    | id   | $    | S  | L  | R
0     |      | s4   | s5   |      | 1  | 2  | 3
1     |      |      |      | acc  |    |    |
2     | s6   |      |      | r5   |    |    |
3     |      |      |      | r2   |    |    |
4     |      | s4   | s5   |      |    | 8  | 7
5     | r4   |      |      | r4   |    |    |
6     |      | s4   | s5   |      |    | 8  | 9
7     | r3   |      |      | r3   |    |    |
8     | r5   |      |      | r5   |    |    |
9     |      |      |      | r1   |    |    |

ii. What are the reasons for using LR parser technique? (4)

o LR parsers can be constructed to recognize virtually all programming language constructs for which CFGs can be written.
o The LR parsing method is the most general non-backtracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce methods.
o The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.
o An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
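To see how a table like the LALR table of part i is used, here is a minimal sketch of the table-driven shift-reduce driver. The dictionaries ACTION, GOTO and PRODUCTIONS encode the ten-state LALR table and the numbered productions of the grammar S->L=R | R, L->*R | id, R->L; the function name parse and its return convention (list of production numbers used, or None on a syntax error) are assumptions made for this example.

```python
# Production number -> (head, length of body), numbered as in the given grammar.
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1), 3: ("L", 2), 4: ("L", 1), 5: ("R", 1)}

ACTION = {
    0: {"*": "s4", "id": "s5"},
    1: {"$": "acc"},
    2: {"=": "s6", "$": "r5"},
    3: {"$": "r2"},
    4: {"*": "s4", "id": "s5"},
    5: {"=": "r4", "$": "r4"},
    6: {"*": "s4", "id": "s5"},
    7: {"=": "r3", "$": "r3"},
    8: {"=": "r5", "$": "r5"},
    9: {"$": "r1"},
}
GOTO = {0: {"S": 1, "L": 2, "R": 3}, 4: {"L": 8, "R": 7}, 6: {"L": 8, "R": 9}}

def parse(tokens):
    """Return the reductions performed, or None if a syntax error is detected."""
    tokens = list(tokens) + ["$"]
    stack, i, reductions = [0], 0, []
    while True:
        act = ACTION[stack[-1]].get(tokens[i])
        if act is None:
            return None                   # blank entry: error on this token
        if act == "acc":
            return reductions
        if act[0] == "s":                 # shift: push the state, advance input
            stack.append(int(act[1:]))
            i += 1
        else:                             # reduce by production n
            n = int(act[1:])
            head, length = PRODUCTIONS[n]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][head])
            reductions.append(n)
```

For the input * id = id the driver performs the reductions 4, 5, 3, 4, 5, 1, which is a rightmost derivation in reverse. For id = = id it stops at the second =, the earliest point at which the error can be detected on a left-to-right scan.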


13. a. i. Explain about the different type of three address statements. (8)
Three-address code is one of the intermediate representations. It is a sequence of statements of the form x := y op z, where x, y and z are names, constants or compiler-generated temporaries and op is an operator, which can be an arithmetic or a logical operator. E.g. x+y*z is translated as t1 = y*z and t2 = x+t1. The reason for the term three-address code is that each statement usually contains three addresses, two for the operands and one for the result. (2)

Common three address statements: (2)
o x := y op z (assignment statements)
o x := op y (assignment statements)
o x := y (copy statements)
o goto L (unconditional jump)
o Conditional jumps like if x relop y goto L
o param x, call p,n and return y for procedure calls
o Indexed assignments x := y[i] and x[i] := y
o Address and pointer assignments x := &y, x := *y and *x := y

Implementation: (4)
o Quadruples: record with four fields, op, arg1, arg2 and result.
o Triples: record with three fields, op, arg1 and arg2, used to avoid entering temporary names into the symbol table. A temporary value is referred to by the position of the statement that computes it.
o Indirect triples: list the pointers to triples rather than listing the triples themselves.

For a := b * -c + b * -c

Quadruples:
      op       arg1   arg2   result
(0)   uminus   c             t1
(1)   *        b      t1     t2
(2)   uminus   c             t3
(3)   *        b      t3     t4
(4)   +        t2     t4     t5
(5)   :=       t5            a

Triples:
      op       arg1   arg2
(0)   uminus   c
(1)   *        b      (0)
(2)   uminus   c
(3)   *        b      (2)
(4)   +        (1)    (3)
(5)   assign   a      (4)

Indirect triples:
      statement
(0)   (14)
(1)   (15)
(2)   (16)
(3)   (17)
(4)   (18)
(5)   (19)

       op       arg1   arg2
(14)   uminus   c
(15)   *        b      (14)
(16)   uminus   c
(17)   *        b      (16)
(18)   +        (15)   (17)
(19)   assign   a      (18)
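The quadruple and triple tables for a := b * -c + b * -c can be reproduced by a postorder walk over the expression tree. The sketch below is only an illustration: the nested-tuple tree encoding and the function names to_quadruples and to_triples are assumptions. In the triples, an integer argument stands for the position of the statement that computes the value, while a string is a name.

```python
# Expression tree for b * -c + b * -c as nested tuples (op, operand, ...).
EXPR = ("+", ("*", "b", ("uminus", "c")), ("*", "b", ("uminus", "c")))

def to_quadruples(target, expr):
    """Return a list of (op, arg1, arg2, result), with temporaries t1, t2, ..."""
    quads = []

    def walk(node):
        if isinstance(node, str):
            return node
        op, *args = node
        places = [walk(arg) for arg in args]
        temp = f"t{len(quads) + 1}"
        quads.append((op, places[0], places[1] if len(places) > 1 else None, temp))
        return temp

    result = walk(expr)
    quads.append((":=", result, None, target))
    return quads

def to_triples(target, expr):
    """Return a list of (op, arg1, arg2); int arguments are statement positions."""
    triples = []

    def walk(node):
        if isinstance(node, str):
            return node
        op, *args = node
        refs = [walk(arg) for arg in args]
        triples.append((op, refs[0], refs[1] if len(refs) > 1 else None))
        return len(triples) - 1           # position of the computing statement

    value = walk(expr)
    triples.append(("assign", target, value))
    return triples
```

An indirect-triple listing would add a separate statement list, for example list(range(len(triples))), so that statements can be reordered without renumbering the references inside the triples.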

ii. What are the methods of translating Boolean expression? (8)
Boolean expressions are used to compute logical values and as conditional expressions in statements that alter the flow of control. (2)
The operators used are and, or and not. The elements are Boolean variables or relational expressions.

Methods of translating Boolean expressions: (2)
o Encode true and false numerically and evaluate the expression like an arithmetic expression.
o By flow of control, i.e. represent the value of the Boolean expression by a position reached in the program.
The semantics of the programming language determines whether all parts of a Boolean expression must be evaluated. If not, the compiler can optimize the evaluation by computing only enough of the expression to determine its value.

Syntax directed definition to produce 3AC for Booleans: (4)
E -> E1 or E2      { E1.true = E.true; E1.false = newlabel(); E2.true = E.true; E2.false = E.false;
                     E.code = E1.code || gen(E1.false, ':') || E2.code }
E -> E1 and E2     { E1.true = newlabel(); E1.false = E.false; E2.true = E.true; E2.false = E.false;
                     E.code = E1.code || gen(E1.true, ':') || E2.code }
E -> not E1        { E1.true = E.false; E1.false = E.true; E.code = E1.code }
E -> ( E1 )        { E1.true = E.true; E1.false = E.false; E.code = E1.code }
E -> id1 relop id2 { E.code = gen('if' id1.place relop id2.place 'goto' E.true) || gen('goto' E.false) }
E -> true          { E.code = gen('goto' E.true) }
E -> false         { E.code = gen('goto' E.false) }

(OR)
b. i. Write short notes on back-patching. (8)
Backpatching is the activity of filling in unspecified label information, using appropriate semantic actions, during the code generation process. (2)


In the semantic actions the functions used are: (2)
o mklist(i): creates a new list containing only i, an index into the array of quadruples.
o merge(p1, p2): merges the two lists pointed to by p1 and p2.
o backpatch(p, j): inserts the target label j into each statement on the list pointed to by p.

Example: (4)
Source:
if a or b then
    if c then
        x = y+1

Translation:
    if a go to L1
    if b go to L1
    go to L3
L1: if c goto L2
    goto L3
L2: x = y+1
L3:

After backpatching:
100: if a goto 103
101: if b goto 103
102: goto 106
103: if c goto 105
104: goto 106
105: x = y+1
106:

ii. Explain procedure calls with an example. (8)
The procedure is such an important and frequently used programming construct that it is imperative for a compiler to generate good code for procedure calls and returns. (2)
Consider the following grammar for a simple procedure call statement:
S -> call id ( Elist )
Elist -> Elist , E
Elist -> E

Calling sequences: (2)
The translation for a call includes a calling sequence, a sequence of actions taken on entry to and exit from each procedure.

Example: (4)
Syntax directed translation:
S -> call id ( Elist )   { for each item p on queue do emit(param p);
                           emit(call id.place) }
Elist -> Elist , E       { append E.place to the end of the queue }
Elist -> E               { initialize queue to contain only E.place }

E.g. for the call p1(a, b), the generated code is
param a
param b
call p1

14. a. i. Construct the DAG for the following basic block: (6)
d := b*c
e := a+b
b := b*c
a := e-d
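One way to construct the DAG for this block is sketched below: a new interior node is created only if no existing node has the same operator and the same children in the same order. The function name build_dag and the node encoding are assumptions for this illustration; leaves stand for the initial values of the names.

```python
BLOCK = [("d", "b", "*", "c"),
         ("e", "a", "+", "b"),
         ("b", "b", "*", "c"),
         ("a", "e", "-", "d")]

def build_dag(block):
    """Return (nodes, labels): node id -> description, node id -> attached names."""
    nodes = []     # node id -> ("leaf", name) or (op, left id, right id)
    index = {}     # node description -> node id, to detect common subexpressions
    current = {}   # name -> id of the node holding its current value
    labels = {}    # interior node id -> identifiers attached to that node

    def node_for(name):
        if name not in current:           # first use: leaf for the initial value
            key = ("leaf", name)
            index[key] = len(nodes)
            nodes.append(key)
            current[name] = index[key]
        return current[name]

    for target, left, op, right in block:
        key = (op, node_for(left), node_for(right))
        if key not in index:              # no existing node computes this value
            index[key] = len(nodes)
            nodes.append(key)
        for names in labels.values():     # target no longer names its old node
            if target in names:
                names.remove(target)
        labels.setdefault(index[key], []).append(target)
        current[target] = index[key]
    return nodes, labels
```

For the block above this gives three leaves (the initial b, c and a) and three interior nodes: the * node labeled d and b, the + node labeled e, and the - node labeled a. The second b*c reuses the first node because b still has its initial value when the third statement is reached.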


ii. Explain in detail about primary structure-preserving transformations on basic blocks. (10)
Structure-preserving transformations are implemented by constructing a dag for a basic block. A common subexpression can be detected by noticing, as a new node m is about to be added, whether there is an existing node n with the same children, in the same order, and with the same operator. If so, n computes the same value as m and may be used in its place.
E.g. the DAG for the basic block
d := b*c
e := a+b
b := b*c
a := e-d
is given by
[Figure: DAG for the basic block; the node for b*c is labeled with both d and b]

For dead-code elimination, delete from a dag any root (node with no ancestors) that has no live variables. Repeated application of this will remove all nodes from the dag that correspond to dead code.

(OR)
b. i. Describe in detail about a simple code generator with the appropriate algorithm. (8)
It generates target code for a sequence of three address statements. (2)
Assumptions:
o For each operator in a three address statement, there is a corresponding target language operator.
o Computed results can be left in registers as long as possible.
E.g. a = b+c: (2)
o Add Rj, Ri where Ri has b, Rj has c and the result is in Ri. Cost = 1
o Add c, Ri where Ri has b and the result is in Ri. Cost = 2
o Mov c, Rj; Add Rj, Ri. Cost = 3


Register descriptor: keeps track of what is currently in each register.
Address descriptor: keeps track of the location where the current value of a name can be found at run time.

Code generation algorithm for x = y op z: (2)
1. Invoke the function getreg to determine the location L where the result of y op z should be stored (a register or a memory location).
2. Check the address descriptor for y to determine y', the current location of y. If the value of y is not already in L, generate MOV y', L.
3. Generate the instruction OP z', L where z' is the current location of z. Update the address descriptor of x to indicate that x is in L.
4. If the current values of y and/or z have no next uses, alter the register descriptor to indicate that those registers no longer contain y and/or z.

Getreg: (2)
1. If y is in a register that holds the values of no other names and y is not live after the statement, return the register of y for L.
2. Failing that, return an empty register for L if there is one.
3. Failing that, if x has a next use in the block, find an occupied register, store its value into memory and return it.
4. If x is not used in the block, or no suitable register is found, select the memory location of x as L.

ii. Explain in detail about run-time storage management. (8)
Information needed during an execution of a procedure is kept in a block of storage called an activation record; storage for names local to the procedure also appears in the activation record. An activation record for a procedure has fields to hold parameters, results, machine status information, local data, temporaries and the like. Two standard storage-allocation strategies are static allocation and stack allocation.

Static allocation: (4)
The position of an activation record in memory is fixed at compile time. A call statement is implemented by a sequence of two target-machine instructions: a MOV instruction saves the return address and a GOTO transfers control to the target code for the called procedure.

Stack allocation: (4)
A new activation record is pushed onto the stack for each execution of a procedure. The record is popped when the activation ends. Static allocation becomes stack allocation by using relative addresses for storage in activation records. The position of the record for an activation of a procedure is not known until run time.
In stack allocation, this position is usually stored in a register (indexed address mode). Relative addresses in an activation record can be taken as offsets from any known position in the activation record.

15. a. i. Explain in detail about principal sources of optimization. (10)
Code optimization is needed to make the code run faster or take less space or both.
Function-preserving transformations:
o Common subexpression elimination
o Copy propagation
o Dead-code elimination
o Constant folding


Common subexpression elimination: (2)
E is called a common subexpression if E was previously computed and the values of the variables in E have not changed since the previous computation.

Copy propagation: (2)
Assignments of the form f := g are called copy statements, or copies for short. The idea is to use g for f wherever possible after the copy statement.

Dead-code elimination: (2)
A variable is live at a point in the program if its value can be used subsequently; otherwise it is dead. Code that computes values that are never used is dead code and can be removed.
Constant folding: deducing at compile time that the value of an expression is a constant and using the constant instead.

Loop optimization: (4)
o Code motion: moving code outside the loop. It takes an expression that yields the same result independent of the number of times the loop is executed (a loop-invariant computation) and places the expression before the loop.
o Induction variable elimination.
o Reduction in strength: replacing an expensive operation by a cheaper one.

ii. Describe in detail about optimization of basic blocks with example. (6)
Code-improving transformations:
o Structure-preserving transformations
  - Common subexpression elimination
  - Dead-code elimination
o Algebraic transformations like reduction in strength

Structure-preserving transformations: (3)
They are implemented by constructing a dag for a basic block. A common subexpression can be detected by noticing, as a new node m is about to be added, whether there is an existing node n with the same children, in the same order, and with the same operator. If so, n computes the same value as m and may be used in its place.
E.g. the DAG for the basic block
d := b*c
e := a+b
b := b*c
a := e-d
is given by
[Figure: DAG for the basic block; the node for b*c is labeled with both d and b]


For dead-code elimination, delete from a dag any root (node with no ancestors) that has no live variables. Repeated application of this will remove all nodes from the dag that correspond to dead code.

Use of algebraic identities: (3)
e.g. x + 0 = 0 + x = x
     x - 0 = x
     x * 1 = 1 * x = x
     x / 1 = x
Reduction in strength: replace an expensive operator by a cheaper one, e.g. x ** 2 = x * x
Constant folding: evaluate constant expressions at compile time and replace them by their values.
Commutative and associative laws can also be used. E.g. the source
a = b + c
e = c + d + b
has the intermediate code
a = b + c
t = c + d
e = t + b
If t is not needed outside the block, this can be changed to
a = b + c
e = a + d
using both the associativity and commutativity of +.

(OR)
b. i. Describe in detail about storage organization. (10)
Subdivision of run time memory: (4)
Run time storage is the block of memory obtained by the compiler from the OS to execute the compiled program. It is subdivided into:
o Code: the generated target code
o Static data: data objects
o Stack: to keep track of the activations
o Heap: to store all other information
[Figure: run time memory laid out as Code, Static data, Stack and Heap]

Activation record (frame): (4)
It is used to store the information required by a single procedure call. Its fields are:
o Returned value
o Actual parameters
o Optional control link
o Optional access link
o Saved machine status
o Local data
o Temporaries


o Temporaries are used to hold values that arise in the evaluation of expressions.
o Local data is the data that is local to the execution of the procedure.
o Saved machine status represents the status of the machine just before the procedure is called.
o Control link (dynamic link) points to the activation record of the calling procedure.
o Access link refers to the non-local data in other activation records.
o Actual parameters are the ones passed to the called procedure.
o Returned value field is used by the called procedure to return a value to the calling procedure.

Compile time layout of local data: (2)
The amount of storage needed for a name is determined by its type. The field for the local data is laid out as the declarations in a procedure are examined at compile time. The storage layout for data objects is strongly influenced by the addressing constraints on the target machine.

ii. Explain in detail various methods of passing parameters. (6)
Call by value:
o A formal parameter is treated just like a local name. Its storage is in the activation record of the called procedure.
o The caller evaluates the actual parameters and places their r-values in the storage for the formals.

Call by reference:
o If an actual parameter is a name or an expression having an l-value, then that l-value itself is passed.
o However, if it is an expression that has no l-value (e.g. a+b or 2), then the expression is evaluated in a new location and the address of that location is passed.

Copy-restore: a hybrid between call by value and call by reference (copy in, copy out).
o The actual parameters are evaluated, their r-values are passed and the l-values of the actuals are determined.
o When the called procedure is done, the r-values of the formals are copied back to the l-values of the actuals.

Call by name:
o Inline expansion (procedures are treated like a macro).
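The difference between call by value, call by reference and copy-restore shows up when the same variable is passed for two formals. The sketch below simulates the three mechanisms over a dictionary that plays the role of memory. The procedure p(x, y) with body x := x + 1; y := y + 1 and all function names are hypothetical, chosen only for this illustration.

```python
def run_body(get, put):
    """Hypothetical procedure p(x, y): x := x + 1; y := y + 1."""
    put("x", get("x") + 1)
    put("y", get("y") + 1)

def call_by_value(store, actual_x, actual_y):
    # The r-values of the actuals are copied into the callee's own storage.
    frame = {"x": store[actual_x], "y": store[actual_y]}
    run_body(frame.get, frame.__setitem__)

def call_by_reference(store, actual_x, actual_y):
    # The formals denote the l-values (here: names in store) of the actuals.
    lvalue = {"x": actual_x, "y": actual_y}

    def put(formal, value):
        store[lvalue[formal]] = value

    run_body(lambda formal: store[lvalue[formal]], put)

def copy_restore(store, actual_x, actual_y):
    frame = {"x": store[actual_x], "y": store[actual_y]}   # copy in
    run_body(frame.get, frame.__setitem__)
    store[actual_x] = frame["x"]                           # copy out
    store[actual_y] = frame["y"]

def demo():
    """Call p(a, a) with a = 1 under each mechanism and report the final a."""
    results = {}
    for name, mechanism in [("call by value", call_by_value),
                            ("call by reference", call_by_reference),
                            ("copy-restore", copy_restore)]:
        store = {"a": 1}
        mechanism(store, "a", "a")
        results[name] = store["a"]
    return results
```

With a = 1 before the call p(a, a), the variable is left unchanged under call by value, is incremented twice under call by reference because both formals alias a, and ends up as 2 under copy-restore because each formal is incremented in its own copy before being copied back.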
