The final phase in our compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program.
The requirements traditionally imposed on a code generator are severe. The output code must be correct and of high quality, meaning that it should make effective use of the resources of the target machine. Moreover, the code generator itself should run efficiently.
fig. 1
2) Write in detail the issues in the design of code generator.
ISSUES IN THE DESIGN OF A CODE GENERATOR
While the details are dependent on the target language and the operating system, issues such as memory management, instruction selection, register allocation, and evaluation order are inherent in almost all code generation problems.
INPUT TO THE CODE GENERATOR
The input to the code generator consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the intermediate representation.
There are several choices for the intermediate language, including: linear representations such as postfix notation, three-address representations such as quadruples, virtual machine representations such as stack machine code, and graphical representations such as syntax trees and dags.
We assume that prior to code generation the front end has scanned, parsed, and translated the source program into a reasonably detailed intermediate representation, so the values of names appearing in the intermediate language can be represented by quantities that the target machine can directly manipulate (bits, integers, reals, pointers, etc.). We also assume that the necessary type checking has taken place, so type conversion operators have been inserted wherever necessary and obvious semantic errors (e.g., attempting to index an array by a floating point number) have already been detected. The code generation phase can therefore proceed on the assumption that its input is free of errors. In some compilers, this kind of semantic checking is done together with code generation.
TARGET PROGRAMS
The output of the code generator is the target program. The output may take on a variety of forms: absolute machine language, relocatable machine language, or assembly language.
Producing an absolute machine language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed. A small program can be compiled and executed quickly. A number of student-job compilers, such as WATFIV and PL/C, produce absolute code.
Producing a relocatable machine language program as output allows subprograms to be compiled separately. A set of relocatable object modules can be linked together and loaded for execution by a linking loader. Although we must pay the added expense of linking and loading if we produce relocatable object modules, we gain a great deal of flexibility in being able to compile subroutines separately and to call other previously compiled programs from an object module. If the target machine does not handle relocation automatically, the compiler must provide explicit relocation information to the loader to link the separately compiled program segments.
Producing an assembly language program as output makes the process of code generation somewhat easier. We can generate symbolic instructions and use the macro facilities of the assembler to help generate code. The price paid is the assembly step after code generation. Because producing assembly code does not duplicate the entire task of the assembler, this choice is another reasonable alternative, especially for a machine with a small memory, where a compiler must use several passes.
MEMORY MANAGEMENT
Mapping names in the source program to addresses of data objects in run-time memory is done cooperatively by the front end and the code generator. We assume that a name in a three-address statement refers to a symbol table entry for the name.
If machine code is being generated, labels in three-address statements have to be converted to addresses of instructions. This process is analogous to backpatching. Suppose that labels refer to quadruple numbers in a quadruple array. As we scan each quadruple in turn we can deduce the location of the first machine instruction generated for that quadruple, simply by maintaining a count of the number of words used for the instructions generated so far. This count can be kept in the quadruple array (in an extra field), so if a reference such as j: goto i is encountered, and i is less than j, the current quadruple number, we may simply generate a jump instruction with the target address equal to the machine location of the first instruction in the code for quadruple i. If, however, the jump is forward, so i exceeds j, we must store on a list for quadruple i the location of the first machine instruction generated for quadruple j. Then, when we process quadruple i, we fill in the proper machine location for all instructions that are forward jumps to i.
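The label-resolution scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name resolve_jumps, the tuple encoding of quadruples, and the words_per_quad list (how many machine words each quadruple expands to) are all hypothetical and not part of the original notes.

```python
def resolve_jumps(quads, words_per_quad):
    """Resolve 'goto i' quadruples to machine addresses in a single scan.

    quads: list of ("op",) or ("goto", i), where i is a 0-based quadruple number.
    words_per_quad[k]: words of machine code generated for quadruple k (assumed known).
    Returns (address_of, jumps): address_of maps a quadruple number to the address
    of its first instruction; jumps maps each jump's address to its target address.
    """
    address_of = {}   # the "extra field": quadruple number -> machine address
    pending = {}      # target quadruple -> addresses of forward jumps waiting for it
    jumps = {}
    location = 0      # running count of words generated so far
    for j, quad in enumerate(quads):
        address_of[j] = location
        # Fill in all forward jumps that were waiting for quadruple j.
        for jump_location in pending.pop(j, []):
            jumps[jump_location] = location
        if quad[0] == "goto":
            i = quad[1]
            if i <= j:
                jumps[location] = address_of[i]   # backward jump: target known
            else:
                pending.setdefault(i, []).append(location)
                jumps[location] = None            # placeholder until i is reached
        location += words_per_quad[j]
    return address_of, jumps


quads = [("op",), ("goto", 3), ("op",), ("op",), ("goto", 0)]
address_of, jumps = resolve_jumps(quads, [2, 1, 3, 2, 1])
print(address_of)
print(jumps)
```

The forward jump at quadruple 1 is recorded on the pending list for quadruple 3 and patched as soon as the scan reaches it, while the backward jump at quadruple 4 is resolved immediately.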
INSTRUCTION SELECTION
The nature of the instruction set of the target machine determines the difficulty of instruction selection. The uniformity and completeness of the instruction set are important factors. If the target machine does not support each data type in a uniform manner, then each exception to the general rule requires special handling.
Instruction speeds and machine idioms are other important factors. If we do not care about the efficiency of the target program, instruction selection is straightforward. For each type of three-address statement we can design a code skeleton that outlines the target code to be generated for that construct.
For example, every three-address statement of the form x := y + z, where x, y, and z are statically allocated, can be translated into the code sequence
MOV y, R0 /* load y into register R0 */
ADD z, R0 /* add z to R0 */
MOV R0, x /* store R0 into x */
Unfortunately, this kind of statement-by-statement code generation often produces poor code. For example, the sequence of statements
a := b + c
d := a + e
would be translated into
MOV b, R0
ADD c, R0
MOV R0, a
MOV a, R0
ADD e, R0
MOV R0, d
Here the fourth statement is redundant, and so is the third if a is not subsequently used.
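The statement-by-statement translation above, and the redundant reload it produces, can be reproduced with a short sketch. The function names naive_codegen and drop_redundant_loads are hypothetical, and the cleanup pass assumes that no jump targets the removed instruction.

```python
def naive_codegen(statements):
    """Translate each (x, y, z), meaning x := y + z, with the fixed code skeleton."""
    code = []
    for x, y, z in statements:
        code += [f"MOV {y}, R0", f"ADD {z}, R0", f"MOV R0, {x}"]
    return code


def drop_redundant_loads(code):
    """Drop a MOV that merely undoes the previous MOV, e.g. 'MOV a, R0'
    directly after 'MOV R0, a' (the value is already in the register)."""
    result = []
    for instr in code:
        if result and instr.startswith("MOV ") and result[-1].startswith("MOV "):
            prev_src, prev_dst = result[-1][4:].split(", ")
            src, dst = instr[4:].split(", ")
            if src == prev_dst and dst == prev_src:
                continue
        result.append(instr)
    return result


code = naive_codegen([("a", "b", "c"), ("d", "a", "e")])
for line in drop_redundant_loads(code):
    print(line)
```

Removing the store MOV R0, a as well would require knowing that a is not used later, which this sketch does not attempt.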
The quality of the generated code is determined by its speed and size.
A target machine with a rich instruction set may provide several ways of implementing a given operation. Since the cost differences between different implementations may be significant, a naive translation of the intermediate code may lead to correct, but unacceptably inefficient target code. For example, if the target machine has an increment instruction (INC), then the three-address statement a := a + 1 may be implemented more efficiently by the single instruction INC a, rather than by a more obvious sequence that loads a into a register, adds one to the register, and then stores the result back into a:
MOV a, R0
ADD #1, R0
MOV R0, a
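A minimal sketch of how a code generator might prefer the INC idiom over the generic skeleton. The function select_add and its has_inc flag are hypothetical names introduced only for this illustration.

```python
def select_add(x, y, z, has_inc=True):
    """Choose instructions for x := y + z, where z is a name or an int constant."""
    if has_inc and x == y and z == 1:
        return [f"INC {x}"]                       # machine idiom for a := a + 1
    operand = f"#{z}" if isinstance(z, int) else z
    return [f"MOV {y}, R0", f"ADD {operand}, R0", f"MOV R0, {x}"]


print(select_add("a", "a", 1))
print(select_add("a", "a", 1, has_inc=False))
```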
Instruction speeds are needed to design good code sequences but, unfortunately, accurate timing information is often difficult to obtain. Deciding which machine code sequence is best for a given three-address construct may also require knowledge about the context in which that construct appears.
REGISTER ALLOCATION
Instructions involving register operands are usually shorter and faster than those involving operands in memory. Therefore, efficient utilization of registers is particularly important in generating good code. The use of registers is often subdivided into two subproblems:
1. During register allocation, we select the set of variables that will reside in registers at a point in the program.
2. During a subsequent register assignment phase, we pick the specific register that a variable will reside in.
Finding an optimal assignment of registers to variables is difficult, even with single register values. Mathematically, the problem is NP-complete. The problem is further complicated because the hardware and/or the operating system of the target machine may require that certain register usage conventions be observed.
Certain machines require register pairs (an even and next odd numbered register) for some operands and results. For example, in the IBM System/370 machines integer multiplication and integer division involve register pairs. The multiplication instruction is of the form
M x, y
where x, the multiplicand, is the even register of an even/odd register pair. The multiplicand value is taken from the odd register of the pair. The multiplier y is a single register. The product occupies the entire even/odd register pair.
The division instruction is of the form
D x, y
where the 64-bit dividend occupies an even/odd register pair whose even register is x; y represents the divisor. After division, the even register holds the remainder and the odd register the quotient.
Now consider the two three-address code sequences (a) and (b) in fig. 2, in which the only difference is the operator in the second statement. The shortest assembly sequences for (a) and (b) are given in fig. 3. Ri stands for register i. L, ST and A stand for load, store and add respectively. The optimal choice for the register into which a is to be loaded depends on what will ultimately happen to t.
(a)             (b)
t := a + b      t := a + b
t := t * c      t := t + c
t := t / d      t := t / d
fig. 2 Two three-address code sequences
(a)             (b)
L R1, a         L R0, a
A R1, b         A R0, b
M R0, c         A R0, c
D R0, d         SRDA R0, 32
ST R1, t        D R0, d
                ST R1, t
fig. 3 Optimal machine code sequences
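To see why the two sequences in fig. 3 load a into different registers, the register-pair behaviour can be simulated. The interpreter below is a rough sketch, not a faithful System/370 model: the function run and the tuple encoding of instructions are hypothetical, values are assumed to be non-negative, and overflow is ignored.

```python
def run(program, memory):
    """Interpret (opcode, register, operand) triples on a four-register machine."""
    regs = [0] * 4
    for op, r, operand in program:
        if op == "L":
            regs[r] = memory[operand]
        elif op == "A":
            regs[r] += memory[operand]
        elif op == "M":     # multiplicand in odd register r+1; product fills the pair
            product = regs[r + 1] * memory[operand]
            regs[r], regs[r + 1] = product >> 32, product & 0xFFFFFFFF
        elif op == "D":     # dividend in pair (r, r+1); remainder -> r, quotient -> r+1
            dividend = (regs[r] << 32) | regs[r + 1]
            regs[r], regs[r + 1] = dividend % memory[operand], dividend // memory[operand]
        elif op == "SRDA":  # shift the pair right; moves r into r+1 when the shift is 32
            pair = ((regs[r] << 32) | regs[r + 1]) >> operand
            regs[r], regs[r + 1] = pair >> 32, pair & 0xFFFFFFFF
        elif op == "ST":
            memory[operand] = regs[r]
    return memory


seq_a = [("L", 1, "a"), ("A", 1, "b"), ("M", 0, "c"), ("D", 0, "d"), ("ST", 1, "t")]
seq_b = [("L", 0, "a"), ("A", 0, "b"), ("A", 0, "c"), ("SRDA", 0, 32),
         ("D", 0, "d"), ("ST", 1, "t")]
print(run(seq_a, {"a": 7, "b": 5, "c": 3, "d": 4})["t"])   # (7 + 5) * 3 / 4
print(run(seq_b, {"a": 7, "b": 5, "c": 3, "d": 4})["t"])   # (7 + 5 + 3) / 4
```

In (a) the sum is built in the odd register R1 because the multiply expects it there. In (b) there is no multiply, so the sum is built in R0 and SRDA moves it into R1 to form the 64-bit dividend.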
EVALUATION ORDER
The order in which computations are performed can affect the efficiency of the target code. Some computation orders require fewer registers to hold intermediate results than others. Picking a best order is another difficult, NP-complete problem. Initially, we shall avoid the problem by generating code for the three-address statements in the order in which they have been produced by the intermediate code generator.
fig. 8 Three-address code computing dot product
prod := 0
i := 1
4) What are the structure preserving transformations on basic blocks?
TRANSFORMATIONS ON BASIC BLOCKS
A basic block computes a set of expressions. These expressions are the values of the names live on exit from the block. Two basic blocks are said to be equivalent if they compute the same set of expressions.
A number of transformations can be applied to a basic block without changing the set of expressions computed by the block. Many of these transformations are useful for improving the quality of code that will be ultimately generated from a basic block. There are two important classes of local transformations that can be applied to basic blocks; these are the structure-preserving transformations and the algebraic transformations.
STRUCTURE-PRESERVING TRANSFORMATIONS
The primary structure-preserving transformations on basic blocks are:
1. common sub-expression elimination
2. dead-code elimination
3. renaming of temporary variables
4. interchange of two independent adjacent statements
We assume basic blocks have no arrays, pointers, or procedure calls.
1. Common sub-expression elimination
Consider the basic block
a := b + c
b := a - d
c := b + c
d := a - d
The second and fourth statements compute the same expression, namely b + c - d, and hence this basic block may be transformed into the equivalent block
a := b + c
b := a - d
c := b + c
d := b
Although the 1st and 3rd statements in both cases appear to have the same expression on the right, the second statement redefines b. Therefore, the value of b in the 3rd statement is different from the value of b in the 1st, and the 1st and 3rd statements do not compute the same expression.
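One way to detect such repeated computations mechanically is to tag every operand with a version number that changes whenever the name is reassigned. The sketch below is an assumption-laden illustration, not the method of the original notes: the function name, the (dest, left, op, right) tuple encoding, and the use of (dest, source, None, None) for a copy are all hypothetical, and commutativity (b + c versus c + b) is not considered.

```python
def eliminate_common_subexpressions(block):
    """Replace a recomputation by a copy from the name that already holds the value."""
    version = {}     # name -> how many times it has been assigned so far
    available = {}   # (left, left version, op, right, right version) -> (holder, holder version)
    result = []
    for dest, left, op, right in block:
        key = (left, version.get(left, 0), op, right, version.get(right, 0))
        hit = available.get(key)
        # Reuse only if the holder has not been reassigned since it was recorded.
        if hit is not None and version.get(hit[0], 0) == hit[1]:
            result.append((dest, hit[0], None, None))
            version[dest] = version.get(dest, 0) + 1
            continue
        version[dest] = version.get(dest, 0) + 1
        available[key] = (dest, version[dest])
        result.append((dest, left, op, right))
    return result


block = [("a", "b", "+", "c"), ("b", "a", "-", "d"),
         ("c", "b", "+", "c"), ("d", "a", "-", "d")]
for statement in eliminate_common_subexpressions(block):
    print(statement)
```

On the block from the text, the third statement is kept because b has a new version by then, while the fourth becomes the copy d := b.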
2. Dead-code elimination
Suppose x is dead, that is, never subsequently used, at the point where the statement x := y + z appears in a basic block. Then this statement may be safely removed without changing the value of the basic block.
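A backward scan over the block is a simple way to find dead statements. This is a minimal sketch under the same assumption the notes make (no arrays, pointers, or procedure calls, so statements have no side effects); the function name and the (dest, left, op, right) tuple encoding are hypothetical.

```python
def eliminate_dead_code(block, live_on_exit):
    """Remove statements whose destination is never used afterwards."""
    live = set(live_on_exit)
    kept = []
    for dest, left, op, right in reversed(block):
        if dest not in live:
            continue                  # dest is dead at this point: drop the statement
        live.discard(dest)            # this definition satisfies the later uses
        live.update(name for name in (left, right) if name is not None)
        kept.append((dest, left, op, right))
    kept.reverse()
    return kept


block = [("x", "y", "+", "z"), ("t", "a", "+", "b"), ("u", "t", "*", "c")]
print(eliminate_dead_code(block, live_on_exit={"u"}))
```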
3. Renaming temporary variables
Suppose we have a statement t := b + c, where t is a temporary. If we change this statement to u := b + c, where u is a new temporary variable, and change all uses of this instance of t to u, then the value of the basic block is not changed. In fact, we can always transform a basic block into an equivalent block in which each statement that defines a temporary defines a new temporary. We call such a basic block a normal-form block.
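The conversion to a normal-form block can be sketched as a single forward pass. The names below are hypothetical: to_normal_form, the (dest, left, op, right) tuple encoding, and the fresh names u1, u2, ... (assumed not to clash with existing names). The sketch also assumes temporaries are not live on exit from the block.

```python
from itertools import count


def to_normal_form(block, temporaries):
    """Give every definition of a temporary its own fresh name."""
    fresh = (f"u{n}" for n in count(1))
    current = {}   # original temporary -> fresh name of its latest definition
    result = []
    for dest, left, op, right in block:
        # Uses refer to the most recent definition of each temporary.
        left = current.get(left, left)
        right = current.get(right, right)
        if dest in temporaries:
            current[dest] = next(fresh)
            dest = current[dest]
        result.append((dest, left, op, right))
    return result


block = [("t", "b", "+", "c"), ("x", "t", "*", "d"),
         ("t", "e", "+", "f"), ("y", "t", "-", "x")]
for statement in to_normal_form(block, temporaries={"t"}):
    print(statement)
```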
4. Interchange of statements
Suppose we have a block with the two adjacent statements
t1 := b + c
t2 := x + y
Then we can interchange the two statements without affecting the value of the block if and only if neither x nor y is t1 and neither b nor c is t2. A normal-form basic block permits all statement interchanges that are possible.
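The interchange condition translates directly into a predicate. The function name and the (dest, left, op, right) tuple encoding are hypothetical; the extra check that the two destinations differ is an assumption added here, since the text takes t1 and t2 to be distinct names.

```python
def can_interchange(first, second):
    """True if two adjacent statements may be swapped without changing the block."""
    d1, l1, _, r1 = first
    d2, l2, _, r2 = second
    return d1 != d2 and d1 not in (l2, r2) and d2 not in (l1, r1)


print(can_interchange(("t1", "b", "+", "c"), ("t2", "x", "+", "y")))
print(can_interchange(("t1", "b", "+", "c"), ("t2", "t1", "+", "y")))
```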
5) What are the instructions and address modes of the target machine?
The target machine characteristics are:
- Byte-addressable, 4 bytes/word, n registers
- Two-operand instructions of the form op source, destination
- Example opcodes: MOV, ADD, SUB, MULT
- Several addressing modes
- An instruction has an associated cost
- Cost corresponds to length of instruction
Addressing Modes & Extra Costs
6) Generate target code for the source language statement
(a-b) + (a-c) + (a-c);
The 3AC for this can be written as
t := a - b
u := a - c
v := t + u
d := v + u //d live at the end
Show the code sequence generated by the simple code generation algorithm
What is its cost? Can it be improved?
Total cost=12
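The generated code sequence itself is not reproduced in these notes. The sketch below is therefore an assumption: it shows one sequence that computes d in two registers and whose cost comes to the stated total of 12, using the assumed cost rule of one word per instruction plus one extra word for each operand that is a memory name (register operands add nothing). The names instruction_cost, simple_sequence, and improved_sequence are hypothetical.

```python
import re


def instruction_cost(instr):
    """1 for the instruction itself plus 1 per memory operand (assumed cost model)."""
    _, operands = instr.split(" ", 1)
    return 1 + sum(1 for operand in operands.split(",")
                   if not re.fullmatch(r"R\d+", operand.strip()))


# Assumed output of a simple statement-by-statement generator.
simple_sequence = [
    "MOV a, R0", "SUB b, R0",     # t := a - b   (t in R0)
    "MOV a, R1", "SUB c, R1",     # u := a - c   (u in R1)
    "ADD R1, R0",                 # v := t + u   (v in R0)
    "ADD R1, R0",                 # d := v + u   (d in R0)
    "MOV R0, d",                  # d is live at the end, so store it
]

# One possible improvement: fetch a from memory once and copy it between registers.
improved_sequence = [
    "MOV a, R0", "MOV R0, R1",
    "SUB b, R0", "SUB c, R1",
    "ADD R1, R0", "ADD R1, R0",
    "MOV R0, d",
]

print(sum(instruction_cost(i) for i in simple_sequence))
print(sum(instruction_cost(i) for i in improved_sequence))
```

Under this cost model the register-to-register copy is cheaper than a second load of a from memory, which is where the saving comes from.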
7) What is an activation record?
Information needed by a single execution of a procedure is managed using a contiguous block of storage called an activation record or frame. It is customary to push the activation record of a procedure on the run-time stack when the procedure is called and to pop the activation record off the stack when control returns to the caller.
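The push-on-call, pop-on-return discipline can be illustrated with an explicit stack. This is only a toy model: the helper call, the record fields "procedure" and "arguments", and the depths list are hypothetical and say nothing about what a real activation record contains.

```python
runtime_stack = []   # explicit model of the run-time stack
depths = []          # stack depth observed at each call, for inspection


def call(procedure, *args):
    """Push a record when the procedure is called, pop it when control returns."""
    runtime_stack.append({"procedure": procedure.__name__, "arguments": args})
    depths.append(len(runtime_stack))
    try:
        return procedure(*args)
    finally:
        runtime_stack.pop()


def factorial(n):
    return 1 if n <= 1 else n * call(factorial, n - 1)


print(call(factorial, 4))
print(max(depths), len(runtime_stack))
```

Each recursive activation gets its own record, and the stack is empty again once the outermost call has returned.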
8) What are the contents of activation record?