
Intermediate Representation

Introduction

Intermediate Representation (IR) is language-independent and machine-independent.

A good intermediate representation is one which:
- Captures high-level language constructs
  - Should be easy to translate from the abstract syntax tree
  - Should support high-level optimizations
- Captures low-level machine features
  - Should be easy to translate to assembly
  - Should support machine-dependent optimizations
- Has a narrow interface, i.e. a small number of node types (instructions)
  - Should be easy to optimize and retarget

IR Types
- High Level IR (HIR)
- Medium Level IR (MIR)
- Low Level IR (LIR)

High Level IR (HIR)
- Language independent, but closer to the high-level language.
- Preserves high-level language constructs: structured control flow (if, for, while, etc.), variables, expressions, functions, etc.
- Allows high-level optimizations that depend on the source language, e.g. function inlining, memory dependence analysis, loop transformations, etc.

Medium Level IR (MIR)
- Machine and language independent; can represent a set of source languages.
- Good for code generation for one or more architectures.
- Uses simple control-flow structures: "if" and "goto".
- Allows source-language variables (human-readable names) and front-end-created "temporaries" (symbolic registers).
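As a rough illustration of the step from HIR to MIR described above, the following Python sketch lowers a structured if/else into "if"/"goto" form and introduces front-end temporaries for nested expressions. The node classes (Var, Const, BinOp, Assign, If), the Lowerer class, the lower function and the textual output format are all assumptions made for this example; they are not taken from any particular compiler.

```python
from dataclasses import dataclass
from itertools import count


# Hypothetical HIR node types, invented for this sketch.
@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    target: str
    value: object


@dataclass
class If:
    cond: object
    then_body: list
    else_body: list


class Lowerer:
    """Turns structured HIR statements into a flat list of MIR-like lines."""

    def __init__(self):
        self.code = []
        self._temps = count(1)   # t1, t2, ...
        self._labels = count(1)  # L1, L2, ...

    def operand(self, node):
        """Return a name or constant for node, emitting temporaries as needed."""
        if isinstance(node, Var):
            return node.name
        if isinstance(node, Const):
            return str(node.value)
        left, right = self.operand(node.left), self.operand(node.right)
        temp = f"t{next(self._temps)}"
        self.code.append(f"{temp} <- {left} {node.op} {right}")
        return temp

    def statement(self, node):
        if isinstance(node, Assign):
            value = node.value
            if isinstance(value, BinOp):
                # One operator per instruction; only nested operands get temporaries.
                left, right = self.operand(value.left), self.operand(value.right)
                self.code.append(f"{node.target} <- {left} {value.op} {right}")
            else:
                self.code.append(f"{node.target} <- {self.operand(value)}")
        elif isinstance(node, If):
            cond = self.operand(node.cond)
            then_label = f"L{next(self._labels)}"
            end_label = f"L{next(self._labels)}"
            # The structured if/else becomes a conditional jump plus labels.
            self.code.append(f"if {cond} goto {then_label}")
            for stmt in node.else_body:
                self.statement(stmt)
            self.code.append(f"goto {end_label}")
            self.code.append(f"{then_label}:")
            for stmt in node.then_body:
                self.statement(stmt)
            self.code.append(f"{end_label}:")
        else:
            raise TypeError(f"unsupported HIR node: {node!r}")


def lower(statements):
    lowerer = Lowerer()
    for stmt in statements:
        lowerer.statement(stmt)
    return lowerer.code


# HIR for: if a < b then x <- a + b * 2 else x <- a - b endif
program = [
    If(
        cond=BinOp("<", Var("a"), Var("b")),
        then_body=[Assign("x", BinOp("+", Var("a"), BinOp("*", Var("b"), Const(2))))],
        else_body=[Assign("x", BinOp("-", Var("a"), Var("b")))],
    )
]

for line in lower(program):
    print(line)
```

In this sketch the else branch is emitted first so that a single "if ... goto" suffices and no negated condition is needed; a real front end might well choose a different layout.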

Compared to HIR, MIR reveals computations in greater detail (it is much closer to the machine than HIR), and is therefore usually preferred for optimization.

Low Level IR (LIR)
- Machine independent, but closer to the machine.
- It is easy to generate code from LIR, but generating LIR from the input program may involve some work.
- LIR has low-level constructs such as unstructured jumps, registers and memory locations.
- LIR has the features of MIR; it can also have features of HIR, depending on the needs.

Issues in IR Design
- Source language and target language
- Porting cost or reuse of an existing design
- Whether it is appropriate for optimizations
- Example: U-code, an IR used on PA-RISC and MIPS
  - Suitable for expression evaluation on stacks, but less suited to load-store architectures
  - Both compilers translate U-code to another form
  - HP translates it to a very low-level representation
  - MIPS translates it to MIR and translates back to U-code for the code generator

Issues in IR Design (continued)
- Machine dependence: for machine-level optimizations.
- Expressiveness: for ease of understanding and extensibility.
- Appropriateness for code optimization.
- Appropriateness for code generation.
- Whether to reuse an existing design. This is an important issue: where feasible, reusing a pre-existing design avoids portability problems with previously supported architectures, among other issues.
- Use of more than one IR: different IRs make different levels of optimization possible.
  - A higher-level form represents array subscripts as a list of subscripts, which is suitable for dependence analysis.
  - A lower-level form makes addresses explicit in linearized form, which is suitable for constant folding, strength reduction, loop-invariant code motion and other basic optimizations.

High level IR

    int f(int a, int b) {
        int c;
        c = a + 2;
        print(b, c);
    }

Abstract syntax tree
- keeps enough information to reconstruct the source form
- keeps information about the symbol table

Abstract syntax tree (AST)
An abstract syntax tree (AST) is a finite, labeled, directed tree, where the nodes are labeled by operators and the edges represent the operands of the node operators. Thus, the leaves have nullary operators, i.e., pointers to the symbol table entries of the variables or constants. An AST differs from a parse tree by omitting nodes and edges for syntax rules that do not affect the semantics of the program. The classic example of such an omission is grouping parentheses, since in an AST the grouping of operands is explicit in the tree structure.

Medium level IR
- reflects the range of features in a set of source languages
- language independent
- good for code generation for a number of architectures
- appropriate for most of the optimizations
- normally three-address code

Three address code
- Only one operator on the right-hand side is allowed.
- A source expression like x + y * z might be translated into

      t1 := y * z
      t2 := x + t1

  where t1 and t2 are compiler-generated temporary names.
- Unraveling of complicated arithmetic expressions and of control flow makes three-address code desirable for code generation and optimization.
- The use of names for intermediate values allows three-address code to be easily rearranged.
- Three-address code is a linearized representation of a syntax tree in which explicit names correspond to the interior nodes of the graph.

Low level IR
- Corresponds one-to-one to target machine instructions
- Architecture dependent
- Deviates from one-to-one correspondence generally in cases where there are alternatives for the most effective code to generate, e.g.:
  - LLIR has an integer multiply operator; the target code may not have a multiply instruction, or multiply may not be the best choice
  - LLIR has register + register and register + constant addressing; the target code may have more complex forms, such as one using an index register
- The instruction selection phase of compilation selects the appropriate instruction from the intermediate code

Multi-level IR
- has features of MIR and LIR
- may also have some features of HIR

MIR Instructions
- MIR consists of a symbol table and quadruples, each consisting of an operator and a maximum of three operands
- Special symbols are reserved to denote temporary variables, registers and labels
- Assignment operator: <-

LIR
- Replaces the variables of MIR by registers and memory addresses
- Five types of assignment instructions:
  - Assign an expression to a register
  - Assign an operand to an element of a register
  - Conditionally assign an operand to a register, depending on the value in a register
  - Store an operand at a memory address
  - Load a register from a memory address
- Reserved registers:
  - r0, r1, ..., r31 for integer registers (GPRs)
  - f0, f1, ..., f31 for floating-point registers
  - s0, s1, ... for symbolic registers

ICAN types
Var, Const, Register, Symbol, Operand and LIROperand are defined as:
- Var = CharString: a quoted sequence of alphanumeric characters, the first of which is a letter

- Const = CharString: an integer or floating-point number
- Register = CharString: a symbol that begins with one of s, r, or f, followed by one or more decimal digits
  - Symbolic register: begins with s
  - Integer register: begins with r
  - Floating-point register: begins with f
- Symbol = Var ∪ Const
- Variables:
  - Temporaries begin with t
  - Symbols that begin with s, r, or f are registers, not variables
- Operand = Var ∪ Const ∪ TypeName
- LIROperand = Register ∪ Const ∪ TypeName
- Instruction = HIRInst ∪ MIRInst ∪ LIRInst

Representing MIR in ICAN
Each kind of MIR instruction is represented by an ICAN tuple, which implicitly declares the type MIRInst.

    MIRKind = enum{ label, receive, binasgn, unasgn, valasgn, condasgn,
                    castasgn, indasgn, eltasgn, indeltasgn, goto, binif,
                    unif, valif, bintrap, untrap, valtrap, call, callasgn,
                    return, retval, sequence }
    OpdKind = enum{ var, const, type }
    ExpKind = enum{ binexp, unexp, noexp, listexp }
    Exp_Kind: MIRKind -> ExpKind
    Has_Left: MIRKind -> boolean
    Exp_Kind := { ... }
    Has_Left := { ... }

We represent a sequence of intermediate-code instructions by the array Inst[1..n] of Instruction. E.g.

    L1: b <- a
        c <- b + 1

is represented by the array of tuples

    Inst[1] = <kind:label, lbl:L1>

    Inst[2] = <kind:valasgn, left:b, opd:<kind:var, val:a>>
    Inst[3] = <kind:binasgn, left:c, opr:add, opd1:<kind:var, val:b>, opd2:<kind:const, val:1>>

Representing HIR in ICAN

    HIRKind = enum{ label, receive, binasgn, unasgn, valasgn, condasgn,
                    castasgn, indasgn, eltasgn, indeltasgn, goto, trap,
                    call, callasgn, return, retval, sequence, for, endfor,
                    strbinif, strunif, strvalif, else, endif, arybinasgn,
                    aryunasgn, aryvalasgn }
    HIROpdKind = enum{ var, const, type, aryref }
    HIRExpKind = enum{ terexp, binexp, unexp, noexp, listexp }
    HIR_Exp_Kind: HIRKind -> HIRExpKind
    HIR_Has_Left: HIRKind -> boolean
    HIR_Exp_Kind := { ... }
    HIR_Has_Left := { ... }

HIR to ICAN

Example 1:

    for v <- opd1 by opd2 to opd3
        v <- opd2 + opd3
    endfor

Example 2:

    if a < b then
        t1 <- a + b
    else
        t1 <- a - b
    endif

Representing LIR in ICAN

    LIRKind = enum{ label, regbin, regun, regval, regcond, regelt, stormem,
                    loadmem, goto, gotoaddr, regbinif, regunif, regvalif,
                    regbintrap, reguntrap, regvaltrap, callreg, callreg2,
                    callregasgn, callreg3, return, retval, sequence }

    LIROpdKind = enum{ regno, const, type }
    LIRExpKind = enum{ binexp, unexp, noexp }
    LIR_Exp_Kind: LIRKind -> LIRExpKind
    LIR_Has_Left: LIRKind -> boolean
    LIR_Exp_Kind := { ... }
    LIR_Has_Left := { ... }

Representation of Memory Address (tran(MemAddr))

    Memory address                ICAN representation
    [RegName](Length)             <kind:addr1r, reg:RegName, len:Length>
    [RegName1+RegName2](Length)   <kind:addr2r, reg:RegName1, reg2:RegName2, len:Length>
    [RegName+Integer](Length)     <kind:addrrc, reg:RegName, disp:Integer, len:Length>

Example for LIR code

    L1: r1 <- [r7+4]
        r2 <- [r7+r8]
        r3 <- r1 + r2
        r4 <- -r3
        if r3 > 0 goto L2
        r5 <- (r9) r1
        [r7-8](2) <- r5
    L2: return r4

ICAN tuple for LIR code

    Inst[1] = <kind:label, lbl:L1>
    Inst[2] = <kind:loadmem, left:r1, addr:<kind:addrrc, reg:r7, disp:4, len:4>>
    Inst[3] = <kind:loadmem, left:r2, addr:<kind:addr2r, reg:r7, reg2:r8, len:4>>
    Inst[4] = <kind:regbin, left:r3, opr:add, opd1:<kind:regno, val:r1>, opd2:<kind:regno, val:r2>>
    Inst[5] = <kind:regun, left:r4, opr:neg, opd:<kind:regno, val:r3>>
    Inst[6] = <kind:regbinif, opr:grtr, opd1:<kind:regno, val:r3>