- ARM core: key component of many embedded systems that need high code density, small size and low power, e.g. cell phones, handheld PDAs, cameras
- Adopts the RISC design philosophy
  - Reduced number of fixed-size instructions (simple and powerful)
  - Pipelining, load/store architecture, large register set
- But differs from pure RISC
  - Variable cycle execution for certain instructions
  - Inline barrel shifter, leading to a few complex instructions
  - Thumb state (16-bit instruction set)
  - Conditional execution of instructions
  - DSP instructions
- Pipeline:
  - Three basic stages (in ARM7TDMI): fetch, decode, execute
  - Five stages in ARM9 and six in ARM10
- Performance: MIPS @ clock frequency, mW @ (voltage, clock frequency)
- Software for an ARM embedded system: boot code, operating system and application programs

- Sign extend: converts a signed 8/16-bit value to a 32-bit value and places it in a register
- Two source registers (Rn and Rm) and one result register Rd
- Barrel shifter: preprocesses Rm before it enters the ALU
- On-chip debug hardware
- Instructions are 32 bits wide and addresses are word aligned

ARM Architecture

CPU states and modes:
- The mode determines which registers are active and the access rights to the program status register
- A non-privileged mode has write access only to the condition flags of the current program status register (CPSR) and read access to the remaining fields
- After reset, the processor is in supervisor mode, in which the OS kernel operates
- Programs and applications run in user mode
- IRQ and FIQ modes are associated with interrupts
- Exception modes are the modes other than user and system

- When the processor is executing in ARM state:
  - All instructions are 32 bits wide
  - All instructions must be word aligned
  - Therefore the pc value is stored in bits [31:2], with bits [1:0] undefined (an instruction cannot be halfword or byte aligned)
- When the processor is executing in Thumb state:
  - All instructions are 16 bits wide
  - All instructions must be halfword aligned
  - Therefore the pc value is stored in bits [31:1], with bit [0] undefined (an instruction cannot be byte aligned)
- When the processor is executing in Jazelle state:
  - All instructions are 8 bits wide
  - The processor executes Java bytecodes

CPSR:
- 32-bit register with condition flags, control bits, status and extension fields
- Only privileged modes have full write access to the CPSR
- Every processor mode except user mode can change mode by writing directly to the mode bits of the CPSR
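
As a rough illustration of the CPSR fields listed above, the following Python sketch splits a 32-bit CPSR value into flags, control bits and mode. The bit positions and mode encodings are not given in these notes; they are assumed from the standard ARMv4T/ARMv5 layout, and the helper name `decode_cpsr` is hypothetical.

```python
# Assumed standard CPSR layout (ARMv4T/ARMv5); not stated in the notes:
#   N=31, Z=30, C=29, V=28, I=7, F=6, T=5, mode = bits [4:0]
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flag, control and mode fields."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "I": (cpsr >> 7) & 1,  # 1 = IRQ disabled
        "F": (cpsr >> 6) & 1,  # 1 = FIQ disabled
        "T": (cpsr >> 5) & 1,  # 1 = Thumb state, 0 = ARM state
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }


# Control byte 0xD3: supervisor mode, ARM state, IRQ and FIQ disabled,
# which matches the post-reset state described in these notes.
print(decode_cpsr(0x000000D3))
```

Only the low five bits select the mode, which is why a privileged mode can switch mode simply by rewriting them.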
- N = Negative result from ALU (bit 31 of the result)
- Z = Zero result from ALU
- C = ALU operation resulted in a Carry (for a subtraction, C is cleared if a borrow occurs, i.e. the result is negative)
- V = ALU operation oVerflowed
- Flags are updated only if the suffix S is added to the instruction

Banked Registers:
- Total 37 registers = 30 general purpose + 6 status + 1 PC
- A different set of registers is visible in each mode of operation
- User and System modes use the same set of registers
- Shaded registers (banked registers) are hidden from user/system mode and available only in exception modes
- R13 = stack pointer (SP)
- R14 = link register (LR): holds the return address of a subroutine when it is called with the BL instruction
- Each exception mode has its own SP and LR
    BL<cc> subroutine_label    ; LR automatically stores the return address
- The return can be done in two ways:
    MOV PC, LR
  or
    BX LR

ARM Data Processing
- Syntax: <opcode>{<cc>}{S} Rd, Rn, op2
- op2 normally comes from the barrel shifter: an immediate, a register Rm, or Rm shifted/rotated by an immediate or by a register Rs
- Rm and Rs should not be PC (r15) in the shift/rotate-by-register mode of op2
- Shift and rotate affect the N, Z and C flags
- The # value for shift and rotate is a 5-bit unsigned integer

The Barrel Shifter
- LSL (logical shift left): multiplication by a power of 2
- LSR (logical shift right): division by a power of 2
- ASR (arithmetic shift right): division by a power of 2, preserving the sign bit
- ROR (rotate right): bit rotate with wrap-around from LSB to MSB
- RRX (rotate right extended): single-bit rotate with wrap-around from CF to MSB

ARM Data Processing Instructions
- CMP, CMN, TST and TEQ always update the flags (even if S is not used as a suffix) and do not alter any register. They use only Rn and op2.
- MOV and MVN use only two operands, i.e. Rd and op2

ARM Immediate Operand
Immediate operand (32-bit):
- Obtained from an 8-bit constant rotated right by an even number of positions, i.e. 0, 2, 4, ..., 30.
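
The rotation rule above can be checked mechanically. The sketch below is a minimal Python model, not an assembler: it searches the sixteen even rotate amounts for an 8-bit constant that reproduces a given 32-bit value. The function names `ror32` and `encode_immediate` are hypothetical.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bit positions."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encode_immediate(value):
    """Return (imm8, rotate) such that imm8 ROR rotate == value, or None.

    Only even rotate amounts 0, 2, ..., 30 are tried, as in the rule above.
    The first match is returned; a value may have several encodings.
    """
    value &= 0xFFFFFFFF
    for rotate in range(0, 32, 2):
        # Rotating left by `rotate` undoes a rotate right by `rotate`.
        imm8 = ror32(value, (32 - rotate) % 32)
        if imm8 <= 0xFF:
            return imm8, rotate
    return None


for constant in (0xFF, 0x104, 0xFF00, 0xF000000F, 0x101, 0xFF1):
    print(hex(constant), encode_immediate(constant))
```

A value that cannot be encoded directly may still be usable if its bitwise complement can be encoded, which is checked by calling `encode_immediate(~value & 0xFFFFFFFF)`.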
- The instruction code contains 8 bits for the constant and 4 bits for the rotate amount
- The assembler converts immediate values to the rotate form:
    MOV r0, #4096            ; uses 0x40 ror 26
    ADD r1, r2, #0xFF0000    ; uses 0xFF ror 16
- Examples: (range of 32-bit constants obtained by rotating #0, #8 & #32 positions)
- The complement of a valid 32-bit constant obtained as above is also a valid 32-bit constant
- Valid 32-bit constants: 0xFF, 0x104, 0xFF00, 0xF000000F, 0x0FFFFFF0
- Invalid 32-bit constants: 0x101, 0x103, 0xFF1, 0xFF03, 0xFF04

Data processing:
    ADD R9, R5, R5, LSL #3    ; R9 = R5 + (R5*8)
    RSB R9, R5, R5, LSR #3    ; R9 = (R5/8) - R5
    MOV R12, R4, ROR R3       ; R12 = R4 rotated right by the value of R3
    CMP R7, R5                ; update flags after (R7 - R5)

Conditional Execution:
- ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field (e.g. MOVEQ R0, R1)
- The condition reflects the status of the flags
- If the condition is true the instruction executes normally; otherwise it is not executed
- Advantage: greater pipeline performance and higher code density, leading to
  higher instruction throughput

ARM Conditional Execution
- Set the flags, then use the various condition codes (here r0 = a, r1 = x):
    CMP   r0, #0        ; if (a == 0) x = 0;
    MOVEQ r1, #0        ; if (a > 0)  x = 1;
    MOVGT r1, #1
- A sequence of conditional compare instructions:
    CMP   r0, #4        ; if (a == 4 or a == 10)
    CMPNE r0, #10       ;     x = 0;
    MOVEQ r1, #0
- Reduces the number of instructions. Example (here r1 = a, r2 = b):
    while (a != b) { if (a > b) a = a - b; else b = b - a; }
  With branches only:
    loop:     CMP r1, r2
              BEQ finish
              BLT lessthan
              SUB r1, r1, r2
              B   loop
    lessthan: SUB r2, r2, r1
              B   loop
    finish:
  With conditional execution:
    loop1:    CMP   r1, r2
              SUBGT r1, r1, r2
              SUBLT r2, r2, r1
              BNE   loop1

ARM Branch Instructions
- B<cc> label : branch to label
  (MOV LR, PC can be used before this instruction to store the return address)
- BL<cc> subroutine_label : branch with link (LR automatically stores the return address)
  The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC
  - 32 Mbyte range
  - How to perform longer branches? (use BX Rm)
- BX Rm : branch with exchange
  - If the LSB of Rm is 1, the processor switches to Thumb state; otherwise it remains in ARM state. PC = Rm & 0xFFFFFFFE
  - Useful for interworking between ARM and Thumb state
- BLX Rm : similar to BX Rm, but additionally stores the return address in LR
- BLX label :
  - Branches within a 32 Mbyte range, with LR storing the return address
  - Makes T=1 and enters Thumb state
- The T bit must not be changed by writing directly to the CPSR in order to change the state of the CPU

ARM Multiply
- Normal (32-bit result) and long (64-bit result) multiplication
- Syntax:
    MUL{<cc>}{S} Rd, Rm, Rs                       ; Rd = Rm * Rs
    MLA{<cc>}{S} Rd, Rm, Rs, Rn                   ; Rd = (Rm * Rs) + Rn
    [U or S]MULL{<cond>}{S} RdLo, RdHi, Rm, Rs    ; RdHi,RdLo := Rm*Rs
    [U or S]MLAL{<cond>}{S} RdLo, RdHi, Rm, Rs    ; RdHi,RdLo := (Rm*Rs) + RdHi,RdLo
- MUL and MLA
  truncate the result to the least significant 32 bits
- Rd must be a different register from Rm or Rs
- Rs and Rm can be swapped
- N and Z flags are affected (if the suffix S is used)

ARM Load & Store Instructions
- Data movement between registers and memory
- Instructions: opcode<cc> Rd, <address>
    LDR    STR     ; 32-bit word load and store
    LDRB   STRB    ; byte load and store
    LDRH   STRH    ; 16-bit halfword load and store
    LDRSB          ; signed byte load
    LDRSH          ; signed halfword load
- LDRB and LDRH copy 8-bit and 16-bit quantities from memory to the destination register and force the high bits of the destination register to zero.
  For LDRSB and LDRSH, the high bits of the destination register are filled by sign extension
- Address:
  - Formed from a base register and an offset
  - The base register can be any general purpose register, including PC
  - Offset (for 32-bit word and unsigned byte):
    - immediate (12-bit value)
    - register, or
    - scaled register (Rm with shift/rotate by #immediate only)
  - Offset for H, SH and SB: immediate value (8-bit) or register
- Choice of indexing: pre-index, pre-index with write back, and post-index addressing
- Post-index and pre-index with write back modify the base register value.
Examples:
    LDR  R8, [R3, #-3]         ; load R8 from address R3-3 (pre-index)
    LDR  R3, [R9], #4          ; load R3 from address R9, then R9 = R9+4 (post-index)
    STRB R7, [R6, #-1]!        ; store byte from R7 at R6-1 and update R6 = R6-1 (pre-index with write back)
    LDR  R0, [PC, -R2]         ; load R0 from PC-R2
    LDR  R11, [R3, R5, LSL #2] ; load R11 from R3 + R5*4
Note: By default, we assume little-endian format, where the lower byte of a word is stored at the lower address.
In big-endian format, the lower byte of a word is stored at the higher address.

ARM Pre & Post indexing
(Figure: r0 = 0x5 is the source register for STR, r1 = 0x200 is the original base register, offset = 12.)
- Pre-indexed: STR r0, [r1, #12] stores 0x5 at 0x20c; r1 remains 0x200
- Pre-indexed with write back: STR r0, [r1, #12]! stores 0x5 at 0x20c; updated base register r1 = 0x20c
- Post-indexed: STR r0, [r1], #12 stores 0x5 at 0x200; r1 = 0x20c after the instruction

ARM Load/Store Multiple
- Multiple register load and store with a single instruction
- Syntax:
    LDM<cc><add_mode> Rn{!}, {registers}{^}
    STM<cc><add_mode> Rn{!}, {registers}{^}
  where add_mode is IA | IB | DA | DB
  Rn (base address): must not be PC, and must not appear in the register list if ! (write back) is specified
- Block memory copy: R9 points to the start of the source, R11 points to the end of the source, R10 points to the start of the destination
    loop:   LDMIA R9!, {R0}
            STMIA R10!, {R0}
            CMP   R9, R11
            BNE   loop
- Stack operations:
  - SP replaces Rn
  - add_mode is FD | FA | ED | EA

ARM Stack Operations
Example: let R1 = 0x00000002, R4 = 0x00000003, SP = 0x00000814
    STMFD sp!, {R1,R4}    ; full descending stack write
  After the instruction: SP = 0x0000080c, mem[0x810] = R4, mem[0x80c] = R1
- Only exception modes use ^ (it is not used in user/system mode)
- F and E signify whether SP points to a location that is full or empty
- The stack is either ascending (growing towards high memory addresses) or descending (growing towards low memory addresses)
- One of the following pairs is used to save context at the start of a routine/handler and retrieve context at the end of the routine/handler

ARM Miscellaneous Instructions
- SWP<cc> Rd, Rm, [Rn]
  - Swaps a word between memory and a register
  - tmp = mem32[Rn], mem32[Rn] = Rm and Rd = tmp
- SWPB<cc> Rd, Rm, [Rn]
  - Swaps a byte between memory and a register
  - tmp = mem8[Rn], mem8[Rn] = Rm and Rd = tmp
- The swap instruction is atomic: it reads and writes a location in the same bus cycle. Useful in implementing semaphores and mutual exclusion.
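
To make the semaphore use of SWP concrete, here is a minimal single-threaded Python model of the tmp/mem/Rd semantics given above, used as a test-and-set lock. The names `swp`, `try_lock` and `unlock`, the lock address and the 0/1 lock values are assumptions chosen for illustration; Python itself does not provide the hardware atomicity that the real instruction has.

```python
UNLOCKED, LOCKED = 0, 1  # assumed convention for the lock word


def swp(memory, rn, rm):
    """Model of SWP Rd, Rm, [Rn]: store rm at address rn, return the old word.

    On hardware the read and the write are indivisible; here they are just
    two ordinary Python statements.
    """
    tmp = memory[rn]
    memory[rn] = rm & 0xFFFFFFFF
    return tmp


def try_lock(memory, lock_addr):
    """Attempt to take the lock; True if it was free before the swap."""
    return swp(memory, lock_addr, LOCKED) == UNLOCKED


def unlock(memory, lock_addr):
    """Release the lock with a plain store."""
    memory[lock_addr] = UNLOCKED


memory = {0x1000: UNLOCKED}  # hypothetical lock word at address 0x1000
print(try_lock(memory, 0x1000))  # first caller gets the lock
print(try_lock(memory, 0x1000))  # second attempt sees it already taken
```

Because the old value is returned in the same operation that writes LOCKED, two contenders can never both observe UNLOCKED.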
CPSR instructions:
    MRS{<cc>} Rd, <CPSR | SPSR>                   ; copy from PSR to register
    MSR{<cc>} <CPSR | SPSR>_<fields>, Rm
    MSR{<cc>} <CPSR | SPSR>_<fields>, #immediate
- <fields> can be f, s, x and c, each representing the respective byte of the CPSR/SPSR
    MSR cpsr_c, R0      ; update only the control byte of CPSR
    MSR cpsr_fsc, R0    ; update the flags, status and control bytes of CPSR
- In user mode you can read all CPSR bits, but you can update only the f byte
- Count leading zeros: CLZ<cc> Rd, Rm

Pseudo Instructions:
- LDR Rd, =constant (assembly pseudo instruction)
  If the constant can be constructed with MOV or MVN, then that instruction is actually generated. Otherwise the assembler generates a PC-relative LDR instruction that reads the constant from the literal pool. You are responsible for ensuring that there is a literal pool within 4KB range.
- ADR Rd, label
  This pseudo instruction writes the address of label into the register, using a PC-relative expression

Exceptions:
- Generated by internal (e.g. undefined instruction)
  or external (e.g. interrupts) sources
- On an exception, the processor changes mode. The address of the next instruction is copied to LR_<mode> and the CPSR is copied to SPSR_<mode>. Here LR_<mode> and SPSR_<mode> are the LR and SPSR of the newly entered exception mode
- A forced mode change does not copy the CPSR to SPSR_<mode>

ARM Exceptions
- Events from internal and external sources that divert the normal flow of execution
- Reset and SWI switch the processor to Supervisor mode
- Exception vector table: starting address of each exception handler
- Each exception handler needs to restore the registers and the state of the CPU
- When an exception occurs, the ARM automatically:
  - Copies CPSR into SPSR_<mode>
  - Sets the appropriate CPSR bits to:
    - Switch to ARM state (i.e. makes T=0)
    - Change to the exception mode
    - Disable IRQ interrupts
    - Disable FIQ, only when an FIQ or reset occurs
  - Stores the return address
    (i.e. PC - 4) in LR_<mode>
  - Sets PC to the vector address
- To return, the exception handler needs to:
  - Restore CPSR from SPSR_<mode>
  - Restore PC from LR_<mode>

Return from Exceptions:
- When an exception occurs, the return address stored in LR (i.e. PC-4) may not be the address of the next instruction (because the PC may or may not have been updated when the exception occurs)
- Normally PC points to the instruction being fetched, PC-4 to the instruction being decoded and PC-8 to the instruction being executed
- Return from SWI and undefined instruction:
  - PC is not updated when these exceptions are taken, so PC-4 is the actual return address, which is already in LR
  - Return from handler:
        MOVS PC, LR
- Return from IRQ and FIQ exceptions:
  - An interrupt exception is taken only after the PC is updated.
    So PC-4 points one instruction beyond the actual return address
  - Return from handler:
        SUB  LR, LR, #4
        MOVS PC, LR
- Return from prefetch abort:
  - PC is not updated; the handler returns to the same instruction
  - Return:
        SUB  LR, LR, #4
        MOVS PC, LR
- Return from data abort:
  - PC is updated; the handler returns to the same instruction
  - Return:
        SUB  LR, LR, #8
        MOVS PC, LR
- The suffix S after MOV and SUB restores CPSR from SPSR_<mode>

- Exception priorities:
  - Reset is the highest priority exception; it initializes memory, caches, stack pointer etc.
  - The lowest priority is shared by two mutually exclusive exceptions: SWI and Undefined
  - IRQ is disabled when any exception occurs
  - FIQ is disabled only when a Reset or FIQ occurs; otherwise it remains unchanged
  - Placing Data Abort above the FIQ exception ensures that the data abort is actually registered before the FIQ is handled.

Software Interrupt
- User mode uses the SWI instruction (which causes an exception) to access privileged operations (e.g.
  OS services) from Supervisor mode
- Syntax: SWI<cc> SWI_number (24-bit)
- SWI_number: represents a particular service or feature of the OS
- SWI_number = SWI_opcode AND 0x00ffffff
- When the CPU executes an SWI instruction, it:
  - Copies CPSR to SPSR_svc of Supervisor mode
  - Sets the appropriate CPSR bits to:
    - Change to the exception mode
    - Disable IRQ
  - Stores the return address in LR_svc
  - Sets PC to the vector address
- The top-level SWI handler determines the SWI_number and uses this number to call the appropriate SWI service routine.
    STMFD SP!, {R0-R12, LR}        ; save context of user mode (LR is LR_svc)
    LDR   R10, [LR, #-4]           ; read SWI instruction opcode
    BIC   R10, R10, #0xFF000000    ; get 24-bit number in R10 (clear the top 8 bits)
    MOV   R10, R10, LSL #2         ; word align the offset
    ADD   R9, R9, R10              ; add base R9 to offset
    BLX   R9                       ; go to appropriate location in jump table
    LDMFD SP!, {R0-R12, PC}^       ; return from handler (to user mode), restore registers and CPSR
- R9 is a pointer to the beginning of the jump table.
  R10 (the offset) picks out a particular entry from the jump table
- The ^ in the last instruction causes SPSR_svc to be copied to CPSR automatically if PC appears in the register list
- The instruction BL jump_table saves the return address in LR_svc. Routine num0 returns to supervisor mode after completion.
- When the context is restored in Supervisor mode, LR is copied to PC and the processor switches back to user mode
- Software interrupts can be nested by writing an SWI instruction in an SWI routine

Nested SWIs
Reentrant SWI handling:
- Corruption of SPSR and LR by nested SWI calls causes a problem, e.g. a 2nd SWI exception in the 1st SWI routine may corrupt SPSR_svc and LR_svc
- Remedy:
  - Save the context (i.e. registers, SPSR and LR) at the beginning of the handler, so that each SWI call preserves the environment of its caller. When the SWI routine has completed, restore the context.
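
The corruption problem and its remedy can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not a description of real hardware state: the class `SvcBank`, the function `swi`, and the sample addresses and mode labels are all hypothetical. The model keeps one shared LR_svc/SPSR_svc pair, lets every SWI entry overwrite it, and optionally saves and restores the pair on a stack around the service routine.

```python
class SvcBank:
    """The single LR_svc / SPSR_svc pair shared by every SWI entry."""

    def __init__(self):
        self.lr = None
        self.spsr = None
        self.stack = []


def swi(bank, return_addr, caller_cpsr, routine, save_context):
    """Model one SWI: exception entry, service routine, exception return."""
    # Exception entry: the core overwrites the banked registers.
    bank.lr, bank.spsr = return_addr, caller_cpsr
    if save_context:
        bank.stack.append((bank.lr, bank.spsr))  # save LR_svc and SPSR_svc
    routine(bank)  # may itself issue a nested SWI
    if save_context:
        bank.lr, bank.spsr = bank.stack.pop()  # restore before returning
    # Exception return: PC <- LR_svc, CPSR <- SPSR_svc
    return bank.lr, bank.spsr


def run(save_context):
    """User code calls an SWI whose routine issues a second, nested SWI."""
    bank = SvcBank()

    def inner(_bank):
        pass

    def outer(b):
        swi(b, 0x9000, "svc", inner, save_context)

    return swi(bank, 0x8004, "usr", outer, save_context)


print(run(save_context=True))   # returns to the user-mode caller
print(run(save_context=False))  # returns to the nested call site instead
```

Without the save/restore step the outer call returns using the values left behind by the nested call, which is exactly the corruption described above.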
- The following assembly code for the SWI handler is reentrant and safely handles nested SWI calls
- Register R9 (base address) points to the beginning of the branch table
SWI handler:
    STMFD SP!, {R0-R12, LR}        ; store registers and LR_svc
    MRS   R2, SPSR                 ; get SPSR_svc into register R2
    STR   R2, [SP, #-4]!           ; store SPSR_svc on the stack
    LDR   R10, [LR, #-4]           ; read SWI instruction opcode
    BIC   R10, R10, #0xFF000000    ; get 24-bit number in R10 (clear the top 8 bits)
    MOV   R10, R10, LSL #2         ; word align the offset
    ADD   R9, R9, R10              ; add base R9 to offset
    BLX   R9                       ; go to appropriate location in branch table
    LDR   R2, [SP], #4             ; restore SPSR_svc from the stack
    MSR   SPSR, R2
    LDMFD SP!, {R0-R12, LR}        ; restore registers
    MOVS  PC, LR                   ; return from current routine
- The suffix S in MOVS signifies that SPSR is also copied to CPSR

Thumb Instructions
- On average, a Thumb program takes 35% less memory (high code density)
- 16-bit fixed-size instructions give higher performance than ARM with a 16-bit data bus
- How do Thumb instructions differ from ARM?
  - Only the branch instruction (B<cc> label) can be executed conditionally.
  - Barrel shift operations are separate instructions.
  - Multiple load/store (LDM/STM) supports only the IA addressing mode.
  - PUSH and POP instructions are provided for stack operations (full descending stack only).
  - There are no instructions to access CPSR, SPSR or a coprocessor.
  - Register access is restricted (most instructions can use only the low registers R0-R7).
  - You must switch to ARM state to alter CPSR and SPSR and to access a coprocessor.
- ARM-Thumb interworking: the BX and BLX instructions exist in both ARM and Thumb and do the same thing in each.

        CODE32                          ; the following code is word aligned (ARM)
        LDR     R0, =thumbcode+1        ; set LSB of R0 to 1, point R0[31:1] to thumbcode
        MOV     LR, PC                  ; store return address (the instruction after BX)
        BX      R0                      ; branch to Thumb state
        ; ---------------------------------------------------------------
        CODE16                          ; the following code is halfword aligned (Thumb)
thumbcode
        ADD     R1, #1                  ; Thumb instructions
        ; . . . . . . .
        BX      LR                      ; return to ARM state (LSB of LR is 0)

Thumb Instructions:
- Branch Instructions:
  - B<cc> label : branch to label if the condition holds. Branch range is -256 to +254 bytes.
  - B label : branch to label without a condition code. Branch range is -2048 to +2046 bytes.
  - BL subroutine_label : branch with link (LR automatically stores the return address). Range is ±4 Mbytes. Like every Thumb instruction other than B, it cannot take a condition code.
  - BX Rm : branch with exchange. If the LSB of Rm is 0, the processor switches to ARM state; otherwise it remains in Thumb state. PC = Rm & 0xFFFFFFFE.
  - BLX Rm : similar to BX Rm, but additionally stores the return address in LR.
  - BLX label : branches within a ±4 Mbytes range with LR storing the return address; makes T = 0 and enters ARM state.
- Data Processing Instructions:
  - ADD/ADC/AND/BIC/EOR/MOV/MUL/MVN/NEG/ORR/SBC/SUB Rd, Rn
  - ADD/SUB Rd, Rn, #immed3
  - ADD/MOV/SUB Rd, #immed8
  - ADD/SUB Rd, Rn, Rm
  - ADD Rd, PC, #immed8*4 (i.e. 0, 4, 8, ..., 1020)
  - ADD Rd, SP, #immed8*4
  - ADD/SUB SP, #immed7*4 (i.e. 0, 4, 8, ..., 508)
  - CMN/CMP/TST Rn, Rm
  - CMP Rn, #immed8
  - MOV Rn, Rd
- Barrel Shift Instructions:
  - LSL/LSR/ASR Rd, Rm, #immed5
  - ASR/LSL/LSR/ROR Rd, Rs
- Single Register Load/Store Instructions:
  - LDR/STR{B|H} Rd, [Rn, #immed5]
  - LDR{H|SB|SH} Rd, [Rn, Rm]
  - STR{B|H} Rd, [Rn, Rm]
  - LDR Rd, [PC, #immed8*4]
  - LDR/STR Rd, [SP, #immed8*4]
- Multiple Register Load/Store:
  - LDM/STM{IA} Rn!, {low register list}
- Stack Instructions:
  - POP {low register list, PC}
  - PUSH {low register list, LR}
  - SP does not appear in the instruction, but it is updated automatically.
  - The stack is always full descending.
- Software Interrupt: SWI number (8-bit)
  - Switches to ARM state and takes actions similar to the equivalent ARM SWI.
  - Unlike the ARM SWI, it cannot be executed conditionally.

ARM Programs
[1] Bit-Field Manipulation:
- Packing/unpacking of variable-size bit fields, e.g. variable-length codes.
- Used to create compressed files that pack items at bit granularity.

Ex: Bit Field Pack/Unpack: R0 contains the code to be written to R1. Let Rm contain the number of free bits available in R1, and let codelen be the length of the code. (The ARM code below uses its own register assignment.)

Algorithm for Variable Length Code Packing:
- Pack variable-length codes to create a bytestream.
- Initially, codes are packed into a 32-bit buffer (reg. R1) from MSB to LSB. Once the buffer is full, it is stored to memory.
- Sometimes a code needs to be split into two parts. We fill the buffer with the 1st part, store the buffer to memory and write the 2nd part into the empty buffer.
- Three functions in packing: (1) align the byte-stream pointer, (2) insert codes into bitbuff and store bitbuff to memory when it is full, (3) finish the byte stream.
- The byte-stream pointer may not be word aligned at the end of a write, but the next write must begin at a word-aligned address.
- ARM CODE:

bytestream  RN 0    ; current byte address in the output stream
code        RN 4    ; current code
codelen     RN 5    ; length of the current code
bitbuff     RN 6    ; 32-bit big-endian buffer
bitsfree    RN 7    ; number of bits free in bitbuff
temp        RN 8    ; used bits of bitbuff

write_start                                     ; 1st routine (word-aligns bytestream)
        MOV     bitbuff, #0
        MOV     bitsfree, #32
align_loop
        TST     bytestream, #3                  ; is bytestream aligned?
        LDRNEB  code, [bytestream, #-1]!        ; if not, step back and get the previous byte
        SUBNE   bitsfree, bitsfree, #8          ; update bitsfree
        ORRNE   bitbuff, code, bitbuff, ROR #8  ; copy the byte into bitbuff
        BNE     align_loop                      ; loop until bytestream is aligned
        MOV     bitbuff, bitbuff, ROR #8        ; adjust bitbuff
        MOV     PC, LR                          ; return

write_code                                      ; 2nd routine (writes a code into the buffer
                                                ; and stores the buffer if it gets full)
        SUBS    bitsfree, bitsfree, codelen     ; is bitsfree > code length?
        BLE     buff_full                       ; if not, branch to buff_full
        ORR     bitbuff, bitbuff, code, LSL bitsfree    ; otherwise write the code
        MOV     PC, LR                          ; return
buff_full
        RSB     bitsfree, bitsfree, #0          ; make bitsfree positive
        ORR     bitbuff, bitbuff, code, LSR bitsfree    ; write 1st part of the split code
        STR     bitbuff, [bytestream], #4       ; store bitbuff in memory
        RSB     bitsfree, bitsfree, #32         ; update bitsfree
        MOV     bitbuff, code, LSL bitsfree     ; write 2nd part of the split code
        MOV     PC, LR                          ; return

write_finish                                    ; 3rd routine (finishes packing)
        RSBS    temp, bitsfree, #32             ; temp = number of used bits in bitbuff
finish_loop
        MOVGT   bitbuff, bitbuff, ROR #24       ; bring the next byte, MSB first, into bits [7:0]
        STRGTB  bitbuff, [bytestream], #1       ; store that byte (STRB writes bits [7:0])
        SUBGTS  temp, temp, #8                  ; update temp
        BGT     finish_loop                     ; loop while temp > 0
        MOV     PC, LR                          ; return

Note: the above code assumes big-endian data transfer.

[2] SIMD processing:
- Consider a graphics example that processes multiple 8-bit pixels of an image.
- Problem: merge two images X and Y to produce a new image Z by scaling X by a/256 and Y by 1 - (a/256), where 0 < a < 256.
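As a rough illustration of the merge just stated, the Python sketch below computes one pixel directly and then two pixels at a time inside a single 32-bit value, which is the packed arithmetic the ARM routine later in this section relies on. The function names `blend_pixel`, `blend_pair` and `blend_word` and the constants `MASK` and `U32` are hypothetical; integer division by 256 is modelled as a right shift by 8.

```python
MASK = 0x00FF00FF   # selects the pixels in bits [7:0] and [23:16]
U32 = 0xFFFFFFFF    # models 32-bit register wrap-around


def blend_pixel(x, y, a):
    """Reference: z = (a*x + (256 - a)*y) / 256 for one 8-bit pixel."""
    return (a * x + (256 - a) * y) >> 8


def blend_pair(xw, yw, a):
    """Blend two pixels held in bits [7:0] and [23:16] of xw and yw.

    Computes w = a*(x - y) + 256*y for both 16-bit fields at once.
    Each w fits in 16 bits, so the fields do not disturb each other.
    """
    w = (a * ((xw - yw) & U32) + (yw << 8)) & U32
    return (w >> 8) & MASK


def blend_word(xx, yy, a):
    """Blend four packed 8-bit pixels: even pixels first, then odd pixels."""
    even = blend_pair(xx & MASK, yy & MASK, a)
    odd = blend_pair((xx >> 8) & MASK, (yy >> 8) & MASK, a)
    return even | (odd << 8)


print(hex(blend_word(0x12345678, 0x9ABCDEF0, 64)))
```

Each byte of the packed result matches `blend_pixel` applied to the corresponding bytes of the inputs, even when an individual difference x - y is negative, because the borrow between fields cancels once 256*y is added back.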
- Let $x_n$, $y_n$ and $z_n$ denote the nth 8-bit pixel of X, Y and Z:
  $z_n = \frac{a}{256}\,x_n + \left(1 - \frac{a}{256}\right) y_n$
- Equivalently, $z_n = w_n / 256$ where $w_n = a\,(x_n - y_n) + 256\,y_n$.
- We load four pixels at once into a 32-bit ARM register: xx = [x3, x2, x1, x0].
- We need two expanded pixels in an ARM register: x = [0, x2, 0, x0].

IMG_W   EQU     176
IMG_H   EQU     144
pz      RN 0        ; pointer to destination image Z
px      RN 1        ; pointer to first image X
py      RN 2        ; pointer to second image Y
a       RN 3        ; 8-bit scaling factor
xx      RN 4        ; holds four pixels of X
yy      RN 5        ; holds four pixels of Y
x       RN 6        ; holds two expanded pixels of X, i.e. [0, x2, 0, x0]
y       RN 7        ; holds two expanded pixels of Y, i.e. [0, y2, 0, y0]
z       RN 8        ; holds four pixels of Z
cnt     RN 9        ; number of remaining pixels
mask    RN 12       ; constant 0x00FF00FF (register not given in the original; R12 is assumed)

        STMFD   sp!, {R4-R9, LR}        ; R9 is saved as well because cnt uses it
        MOV     cnt, #IMG_W * IMG_H
        LDR     mask, =0x00FF00FF
loop
        LDR     xx, [px], #4            ; load four pixels of X
        LDR     yy, [py], #4            ; load four pixels of Y
        AND     x, mask, xx             ; x = [0, x2, 0, x0]
        AND     y, mask, yy             ; y = [0, y2, 0, y0]
        SUB     x, x, y                 ; x - y
        MUL     x, a, x                 ; a(x - y)
        ADD     x, x, y, LSL #8         ; w = a(x - y) + 256y
        AND     z, mask, x, LSR #8      ; z = [0, z2, 0, z0]
        AND     x, mask, xx, LSR #8     ; x = [0, x3, 0, x1]
        AND     y, mask, yy, LSR #8     ; y = [0, y3, 0, y1]
        SUB     x, x, y
        MUL     x, a, x
        ADD     x, x, y, LSL #8
        AND     x, mask, x, LSR #8      ; x = [0, z3, 0, z1]
        ORR     z, z, x, LSL #8         ; z = [z3, z2, z1, z0]
        STR     z, [pz], #4             ; store four pixels of Z
        SUBS    cnt, cnt, #4
        BGT     loop
        LDMFD   sp!, {R4-R9, PC}

ARM7TDMI block diagram
[Figure: ARM7TDMI block diagram]

External Interface through AMBA Bus
[Figure: ARM core with CP15, MMU, instruction and data cache, write buffer, and EmbeddedICE & JTAG, connected through the AMBA interface; labeled signals are virtual address, physical address, AMBA address and AMBA data.]

- JTAG TAP controller:
  - Used to test a PCB assembly, its interconnect, or even a sub-block inside an IC without any physical probe.
  - The JTAG scan chain is an embedded solution for testing an IC for certain static faults (shorts, opens and logic errors).
  - ICs supporting JTAG have four additional pins: Test Clock (TCK), Test Mode Select (TMS), Test Data Input (TDI) and Test Data Output (TDO).
- EmbeddedICE (In-Circuit Emulator):
  - Used to debug the software of an embedded system through breakpoints and watchpoints.
  - A breakpoint is an address at which program execution halts.
  - A watchpoint is a value that may combine address, data or control signals. When a match occurs, a debug event is generated that halts processor execution.
  - Uses JTAG as the transport mechanism to access the on-chip debug modules inside the target CPU.

DATA BUS
- Unidirectional and bidirectional data bus:
  - When BUSEN is HIGH, all instructions and input data are presented on DIN[31:0], whereas output data appears on DOUT[31:0].
  - When BUSEN is LOW, only the bidirectional bus D[31:0] is used.
  - The unidirectional data bus is used for coprocessor/external IC connection.