ARM Architecture

- ARM core: key component for many embedded systems that need high code density, small size and low power, e.g. cell phones, handheld PDAs, cameras
- Adopted RISC design philosophy
  - Reduced number of fixed-size instructions (simple and powerful)
  - Pipelining, load/store architecture, large register set
- But different from pure RISC
  - Variable cycle execution for certain instructions
  - Inline barrel shifter leading to a few complex instructions
  - Thumb state (16-bit instruction set)
  - Conditional execution of instructions
  - DSP instructions
- Pipeline:
  - Three basic stages (in ARM7TDMI): fetch, decode, execute
  - Five stages in ARM9 and six in ARM10
- Performance: MIPS @ clock freq., mW @ (voltage, clock freq.)
- Software for an ARM embedded system: boot code, operating system and application programs
- Sign extend: converts a signed 8/16-bit value to a 32-bit value and places it in a register
- Two source registers (Rn and Rm) and one result register Rd
- Barrel shifter: preprocesses Rm before it enters the ALU
- On-chip debug hardware
- Instructions are 32 bits wide and addresses are word aligned

CPU States and Modes:
- The mode determines which registers are active and the access rights to the program status register
- A non-privileged mode has write access only to the condition flags of the current program status register (CPSR) and read access to the remaining fields
- After reset, the processor is in supervisor mode, in which the OS kernel operates
- Programs and applications run in user mode
- IRQ and FIQ modes are associated with interrupts
- Exception modes are the modes other than user and system
- When the processor is executing in ARM state:
  - All instructions are 32 bits wide
  - All instructions must be word aligned
  - Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as an instruction cannot be halfword or byte aligned)
- When the processor is executing in Thumb state:
  - All instructions are 16 bits wide
  - All instructions must be halfword aligned
  - Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as an instruction cannot be byte aligned)
- When the processor is executing in Jazelle state:
  - All instructions are 8 bits wide
  - Executes Java bytecodes
CPSR:
- 32-bit register with condition flags, control bits, status and extension fields
- Only privileged modes have full write access to the CPSR
- Every processor mode except user mode can change mode by writing directly to the mode bits of the CPSR
- N = Negative result from ALU (bit 31 of the result)
- Z = Zero result from ALU
- C = ALU operation produced a Carry (for a subtraction, C is cleared when a borrow occurs, i.e. when the unsigned result would be negative)
- V = ALU operation oVerflowed
- Flags are updated only if the suffix S is added to the instruction
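The flag definitions above can be made concrete with a small model. The following is a minimal Python sketch, not taken from the slides: the helper names `add_flags` and `sub_flags` are hypothetical, and it assumes the usual ARM convention that a subtraction is computed as `a + NOT(b) + 1`, so C = 1 means "no borrow".

```python
MASK32 = 0xFFFFFFFF

def add_flags(a, b):
    """Return (result, N, Z, C, V) for a 32-bit add with the S suffix."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = result >> 31                  # N: bit 31 of the result
    z = int(result == 0)              # Z: result is zero
    c = int(full > MASK32)            # C: unsigned carry out of bit 31
    # V: both operands have the same sign and the result's sign differs
    v = (((a ^ result) & (b ^ result)) >> 31) & 1
    return result, n, z, c, v

def sub_flags(a, b):
    """Return (result, N, Z, C, V) for a 32-bit subtract (a - b) with the S suffix."""
    a &= MASK32
    b &= MASK32
    full = a + (~b & MASK32) + 1      # a - b computed as a + NOT(b) + 1
    result = full & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(full > MASK32)            # C = 0 when a borrow occurred (a < b unsigned)
    # V: operands have different signs and the result's sign differs from a
    v = (((a ^ b) & (a ^ result)) >> 31) & 1
    return result, n, z, c, v
```

For example, `sub_flags(3, 5)` clears C because the subtraction borrows, while `add_flags(0x7FFFFFFF, 1)` sets V because the signed result overflows.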
Banked Registers:
- Total 37 registers = 30 general purpose + 6 status + 1 PC
- A different set of registers is visible in each mode of operation
- User and System modes use the same set of registers
- Shaded registers (banked registers) are hidden from user/system mode and available only in exception modes
- R13 = stack pointer (SP)
- R14 = link register (LR): holds the return address of a subroutine when it is called with the BL instruction
- Each exception mode has its own SP and LR
      BL<cc> subroutine_label     ; LR automatically stores the return address
- The return can be done in two ways:
      MOV PC, LR    or
      BX LR
ARM Data Processing
- Syntax: <opcode>{<cc>}{S} Rd, Rn, op2
- op2 normally comes from the barrel shifter and can be an immediate value, a register (Rm), or Rm shifted/rotated by an immediate (#value) or by a register (Rs)
- Rm and Rs should not be PC (r15) in the shift/rotate-by-register mode of op2
- Shift and rotate operations affect the N, Z and C flags (when the S suffix is used)
- The #value for a shift or rotate is a 5-bit unsigned integer
ARM The Barrel Shifter
- LSL: Logical Shift Left. Multiplication by a power of 2. (CF <- Destination <- 0)
- LSR: Logical Shift Right. Division by a power of 2. (0 -> Destination -> CF)
- ASR: Arithmetic Shift Right. Division by a power of 2, preserving the sign bit. (sign bit -> Destination -> CF)
- ROR: Rotate Right. Bit rotate with wrap around from LSB to MSB. (Destination -> CF)
- RRX: Rotate Right Extended. Single-bit rotate with wrap around from CF to MSB. (CF -> Destination -> CF)
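The five barrel shifter operations above can be sketched on 32-bit values as follows. This is an illustrative Python model, not part of the slides; the function names are hypothetical, and only `rrx` returns the new carry because it is the only operation here that takes the carry as an input.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros enter at the LSB (multiply by 2**n)."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeros enter at the MSB (unsigned divide by 2**n)."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: copies of the sign bit enter at the MSB."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32                  # reinterpret as a negative number
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving the LSB wrap around to the MSB."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry):
    """Rotate right extended: one-bit rotate through the carry flag.

    Returns (new value, new carry)."""
    x &= MASK32
    return ((carry & 1) << 31) | (x >> 1), x & 1
```

For instance, `ror(0x40, 26)` gives 4096 and `asr(0x80000000, 4)` gives 0xF8000000, keeping the sign bit set.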
ARM Data Processing Instructions
- CMP, CMN, TST and TEQ always update the flags (even if S is not used as a suffix) and do not alter any register. They use only Rn and op2.
- MOV and MVN use only two operands, i.e. Rd and op2
ARM Immediate Operand
Immediate operand (32-bit):
- Obtained from an 8-bit constant rotated right by an even number of positions, i.e. 0, 2, 4, ..., 30
- The instruction code contains 8 bits for the constant and 4 bits for the rotate (the 4-bit value is doubled to give the rotation)
- The assembler converts immediate values to the rotate form:
      MOV r0, #4096            ; uses 0x40 ror 26
      ADD r1, r2, #0xFF0000    ; uses 0xFF ror 16
- Examples: (range of 32-bit constants obtained by rotating by #0, #8 and #30 positions)
- The complement of a valid 32-bit constant obtained as above is also a valid 32-bit constant (the assembler can use MVN instead of MOV)
- Valid 32-bit constants: 0xFF, 0x104, 0xFF00, 0xF000000F, 0x0FFFFFF0
- Invalid 32-bit constants: 0x101, 0x103, 0xFF1, 0xFF03, 0xFF04
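The rotate rule above can be checked mechanically. Below is a minimal Python sketch (the names `encode_immediate` and `is_valid_constant` are hypothetical) that searches for an 8-bit value and an even rotation reproducing a constant, and also accepts a constant whose complement is encodable, as the slide describes. It returns the first rotation it finds, which may differ from the one a real assembler picks (for 4096 it finds 0x01 ror 20 rather than 0x40 ror 26; both are valid).

```python
MASK32 = 0xFFFFFFFF

def rol(x, n):
    """Rotate a 32-bit value left by n bits."""
    x &= MASK32
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def encode_immediate(value):
    """Return (imm8, rotate) with imm8 ROR rotate == value, or None."""
    value &= MASK32
    for rotate in range(0, 32, 2):        # only even rotations are encodable
        imm8 = rol(value, rotate)         # undo a rotate-right by `rotate`
        if imm8 <= 0xFF:
            return imm8, rotate
    return None

def is_valid_constant(value):
    """True if the constant or its complement fits the rotate form."""
    return (encode_immediate(value) is not None
            or encode_immediate(~value) is not None)
```

Running it over the two lists on the slide, every "valid" constant is accepted and every "invalid" one is rejected.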
Data processing:
      ADD R9, R5, R5, LSL #3    ; R9 = R5 + (R5*8)
      RSB R9, R5, R5, LSR #3    ; R9 = (R5/8) - R5
      MOV R12, R4, ROR R3       ; R12 = R4 rotated right by the value of R3
      CMP R7, R5                ; update flags after (R7 - R5)
Conditional Execution:
- ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field (e.g. MOVEQ R0, R1)
- The condition reflects the status of the flags
- If the condition is true, the instruction executes normally; otherwise it is not executed
- Advantage: greater pipeline performance (fewer branches) and higher code density, leading to higher instruction throughput
ARM Conditional Execution
- Set the flags, and then use various condition codes
      CMP   r0, #0        ; if (a == 0) x = 0;    (here r0 = a, r1 = x)
      MOVEQ r1, #0        ; if (a > 0)  x = 1;
      MOVGT r1, #1
- Use a set of conditional compare instructions
      CMP   r0, #4        ; if (a == 4 or a == 10)
      CMPNE r0, #10       ;     x = 0;
      MOVEQ r1, #0
- Reduces the number of instructions
      while (a != b) {
          if (a > b) a = a - b; else b = b - a;
      }                                  (here r1 = a, r2 = b)

  Without conditional execution:
      loop:     CMP r1, r2
                BEQ finish
                BLT lessthan
                SUB r1, r1, r2
                B   loop
      lessthan: SUB r2, r2, r1
                B   loop
      finish:

  With conditional execution:
      loop1:    CMP   r1, r2
                SUBGT r1, r1, r2
                SUBLT r2, r2, r1
                BNE   loop1
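To see why the four-instruction conditional loop computes the same result as the branching version, the loop can be traced in a small model. This is a Python sketch under stated assumptions: `cmp_flags`, `cond` and `gcd_conditional` are hypothetical names, only the EQ, NE, GT and LT conditions are modelled, and the inputs are assumed to be positive.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags (N, Z, C, V) that CMP a, b would set, i.e. the flags of a - b."""
    a &= MASK32
    b &= MASK32
    full = a + (~b & MASK32) + 1
    res = full & MASK32
    n = res >> 31
    z = int(res == 0)
    c = int(full > MASK32)
    v = (((a ^ b) & (a ^ res)) >> 31) & 1
    return n, z, c, v

def cond(code, n, z, c, v):
    """Evaluate a condition code against the flags."""
    return {
        "EQ": z == 1,
        "NE": z == 0,
        "GT": z == 0 and n == v,      # signed greater than
        "LT": n != v,                 # signed less than
    }[code]

def gcd_conditional(r1, r2):
    """Model of: loop1: CMP / SUBGT / SUBLT / BNE loop1."""
    while True:
        flags = cmp_flags(r1, r2)             # CMP r1, r2
        if cond("GT", *flags):                # SUBGT r1, r1, r2
            r1 = (r1 - r2) & MASK32
        if cond("LT", *flags):                # SUBLT r2, r2, r1 (flags unchanged: no S suffix)
            r2 = (r2 - r1) & MASK32
        if not cond("NE", *flags):            # BNE loop1 falls through when equal
            return r1
```

Because SUBGT has no S suffix, SUBLT still sees the flags from the CMP, so at most one of the two subtractions takes effect per iteration.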
ARM Branch Instructions
- B<cc> label: branch to label
  (MOV LR, PC can be used before the above instruction to store the return address)
- BL<cc> subroutine_label (LR automatically stores the return address)
  The processor core shifts the 24-bit offset field left by 2 positions, sign-extends it and adds it to the PC
  - ±32 Mbyte range
  - How to perform longer branches? (use BX Rm)
- BX Rm: branch with exchange
  - If the LSB of Rm is 1, the processor switches to Thumb state; otherwise it remains in ARM state. PC = Rm & 0xFFFFFFFE
  - Useful for interworking between ARM and Thumb state
- BLX Rm: similar to BX Rm but additionally stores the return address in LR
- BLX label:
  - Branches within a ±32 Mbyte range, with LR storing the return address
  - Sets T = 1 and enters Thumb state
- The T bit must not be changed by writing directly to the CPSR to change the state of the CPU
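The offset arithmetic for B and BL can be worked through in a few lines. The sketch below is an illustrative Python model with hypothetical function names; it assumes ARM state, where the PC read by the branch is the address of the branch instruction plus 8 (the same pipeline effect that makes MOV LR, PC store a usable return address).

```python
def encode_branch_offset(pc, target):
    """24-bit offset field for a B/BL located at address pc (ARM state)."""
    delta = target - (pc + 8)          # PC reads as current instruction + 8
    if delta % 4:
        raise ValueError("target must be word aligned")
    words = delta >> 2                 # the field counts words, not bytes
    if not -(1 << 23) <= words < (1 << 23):
        raise ValueError("target outside the +/-32 MB range")
    return words & 0xFFFFFF

def branch_target(pc, field):
    """Recover the target address from the 24-bit offset field."""
    if field & 0x800000:               # sign-extend the 24-bit field
        field -= 1 << 24
    return (pc + 8 + (field << 2)) & 0xFFFFFFFF
```

A branch to itself at address 0x8000 therefore encodes the field 0xFFFFFE (minus two words), and a target 64 MB away is rejected, which is when BX Rm is needed.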
ARM Multiply
- Normal (32-bit result) and long (64-bit result) multiplication
- Syntax:
      MUL{<cc>}{S}  Rd, Rm, Rs                  ; Rd = Rm * Rs
      MLA{<cc>}{S}  Rd, Rm, Rs, Rn              ; Rd = (Rm * Rs) + Rn
      [U|S]MULL{<cc>}{S}  RdLo, RdHi, Rm, Rs    ; RdHi,RdLo := Rm * Rs
      [U|S]MLAL{<cc>}{S}  RdLo, RdHi, Rm, Rs    ; RdHi,RdLo := (Rm * Rs) + RdHi,RdLo
- MUL and MLA truncate the result to the least significant 32 bits
- Rd must be a different register from Rm (on cores before ARMv6)
- Rs and Rm can be swapped (multiplication is commutative), which avoids this restriction
- N and Z flags are affected (if the S suffix is used)
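The difference between the truncating and the long multiplies is easy to see numerically. The following Python sketch uses hypothetical helper names and models only the register results, not the flags.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul(rm, rs):
    """MUL: low 32 bits of Rm * Rs."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """MLA: low 32 bits of (Rm * Rs) + Rn."""
    return (rm * rs + rn) & MASK32

def umull(rm, rs):
    """UMULL: unsigned 64-bit product as (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32

def smull(rm, rs):
    """SMULL: signed 64-bit product as (RdLo, RdHi)."""
    product = (to_signed(rm) * to_signed(rs)) & MASK64
    return product & MASK32, product >> 32
```

With Rm = 0xFFFFFFFF and Rs = 2, UMULL treats Rm as 4294967295 and returns RdHi = 1, whereas SMULL treats it as -1 and returns RdHi = 0xFFFFFFFF.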
ARM Load & Store Instructions
- Data movement between registers and memory
- Instructions: opcode<cc> Rd, <address>
      LDR     STR     ; 32-bit word load and store
      LDRB    STRB    ; byte load and store
      LDRH    STRH    ; 16-bit halfword load and store
      LDRSB           ; signed byte load
      LDRSH           ; signed halfword load
- LDRB and LDRH copy 8-bit and 16-bit quantities from memory to the destination register and force the high bits of the destination register to zero. For LDRSB and LDRSH the high bits of the destination register are filled by sign extension.
- Address:
  - Formed from a base register and an offset
  - The base register can be any general-purpose register, including PC
  - Offset (for 32-bit word and unsigned byte):
    - immediate (12-bit value)
    - register, or
    - scaled register (Rm with shift/rotate by #immediate only)
  - Offset for H, SH and SB: immediate value (8-bit) or register
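The zero-extending and sign-extending loads described above differ only in how the upper bits of the destination are filled. Here is a minimal Python sketch, assuming memory is a `bytes` object and little-endian halfwords; the function names simply mirror the mnemonics and are not a real API.

```python
def ldrb(mem, addr):
    """LDRB: load a byte, upper 24 bits forced to zero."""
    return mem[addr]

def ldrsb(mem, addr):
    """LDRSB: load a byte, upper 24 bits filled with bit 7."""
    value = mem[addr]
    return value | 0xFFFFFF00 if value & 0x80 else value

def ldrh(mem, addr):
    """LDRH: load a little-endian halfword, upper 16 bits forced to zero."""
    return mem[addr] | (mem[addr + 1] << 8)

def ldrsh(mem, addr):
    """LDRSH: load a little-endian halfword, upper 16 bits filled with bit 15."""
    value = ldrh(mem, addr)
    return value | 0xFFFF0000 if value & 0x8000 else value
```

With the byte 0x80 in memory, `ldrb` yields 0x00000080 while `ldrsb` yields 0xFFFFFF80.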
- Choice of indexing: pre-index, pre-index with write back, and post-index addressing
- Post-index and pre-index with write back modify the base register value
Examples:
      LDR  R8, [R3, #-3]           ; load R8 from address R3-3 (pre-index)
      LDR  R3, [R9], #4            ; load R3 from address R9, then R9 = R9+4 (post-index)
      STRB R7, [R6, #-1]!          ; store byte from R7 at R6-1, then R6 = R6-1 (pre-index with write back)
      LDR  R0, [PC, -R2]           ; load R0 from PC-R2
      LDR  R11, [R3, R5, LSL #2]   ; load R11 from R3 + R5*4
Note: By default, we assume little-endian format, where the least significant byte of a word is stored at the lowest address. In big-endian format the least significant byte of a word is stored at the highest address.
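The endianness note can be illustrated with Python's built-in `int.to_bytes`; the helper name `store_word` is hypothetical and the returned list is ordered from the lowest address to the highest.

```python
def store_word(value, byteorder):
    """Bytes of a 32-bit word as laid out in memory, lowest address first.

    byteorder is "little" or "big"."""
    return list((value & 0xFFFFFFFF).to_bytes(4, byteorder))
```

Storing 0x11223344 puts 0x44 at the lowest address in little-endian format and 0x11 at the lowest address in big-endian format.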
ARM Pre & Post indexing
- Pre-indexed: STR r0, [r1, #12]
      r0 (source register for STR) = 0x5, r1 (base register) = 0x200, offset = 12
      0x5 is stored at address 0x20c; r1 remains 0x200
- Pre-indexed with write back: STR r0, [r1, #12]!
      As above, and r1 = 0x20c after the instruction
- Post-indexed: STR r0, [r1], #12
      r0 (source register for STR) = 0x5, r1 (original base register) = 0x200, offset = 12
      0x5 is stored at address 0x200; r1 (updated base register) = 0x20c
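The three indexing modes differ only in which address is used and whether the base register is updated. A minimal Python sketch of a word store follows; `str_word`, the mode strings and the dictionary-based registers and memory are all hypothetical modelling choices.

```python
def str_word(regs, mem, rd, rn, offset, mode):
    """Model STR rd, <address> for the three indexing modes.

    mode is "pre" ([rn, #offset]), "pre_wb" ([rn, #offset]!) or
    "post" ([rn], #offset). Returns the address written."""
    base = regs[rn]
    if mode == "pre":
        addr = base + offset          # base register unchanged
    elif mode == "pre_wb":
        addr = base + offset
        regs[rn] = addr               # write back the computed address
    elif mode == "post":
        addr = base                   # store at the original base
        regs[rn] = base + offset      # then update the base register
    else:
        raise ValueError(mode)
    mem[addr] = regs[rd]
    return addr
```

Starting from r0 = 0x5 and r1 = 0x200 with offset 12, the "pre" and "pre_wb" modes write to 0x20c and the "post" mode writes to 0x200; only "pre" leaves r1 at 0x200.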
ARM Load/Store Multiple
- Multiple register load and store with a single instruction
- Syntax:
      LDM<cc><add_mode> Rn{!}, {registers}{^}
      STM<cc><add_mode> Rn{!}, {registers}{^}
  where add_mode: IA | IB | DA | DB (increment/decrement, after/before)
  Rn (base address): must not be PC, and must not appear in the register list if ! (write back) is specified
- Block memory copy: R9 points to the start of the source, R11 points to the end of the source, R10 points to the start of the destination
      loop: LDMIA R9!, {R0}
            STMIA R10!, {R0}
            CMP   R9, R11
            BNE   loop
- Stack operations:
  - SP replaces Rn
  - add_mode: FD | FA | ED | EA
ARM Stack Operations
Example: Let R1 = 0x00000002, R4 = 0x00000003, SP = 0x00000814
      STMFD sp!, {R1, R4}    ; full descending stack write
  After the instruction: SP = 0x0000080c, mem[0x810] = R4, mem[0x80c] = R1
- Only exception modes use ^ (not used in user/system mode)
- F and E signify whether SP points to a location that is full or empty
- The stack is either ascending (growing towards high memory addresses) or descending (growing towards low memory addresses)
- One of the following pairs is used to save context at the start of a routine/handler and retrieve it at the end of the routine/handler: STMFD/LDMFD, STMFA/LDMFA, STMED/LDMED or STMEA/LDMEA
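The STMFD example on this slide can be traced with a small model of a full descending stack. This Python sketch uses hypothetical names and dictionaries for registers (keyed by register number) and memory; it relies on the rule that the lowest-numbered register always ends up at the lowest address.

```python
def stmfd(regs, mem, sp, reg_list):
    """Push registers on a full descending stack; returns the new SP."""
    for r in sorted(reg_list, reverse=True):   # highest register at highest address
        sp -= 4                                # "full": decrement before storing
        mem[sp] = regs[r]
    return sp

def ldmfd(regs, mem, sp, reg_list):
    """Pop registers from a full descending stack; returns the new SP."""
    for r in sorted(reg_list):                 # lowest register from lowest address
        regs[r] = mem[sp]
        sp += 4                                # increment after loading
    return sp
```

With R1 = 2, R4 = 3 and SP = 0x814, `stmfd` leaves SP = 0x80c with R1 at 0x80c and R4 at 0x810, matching the slide, and `ldmfd` on the same list restores both registers and SP.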
ARM Miscellaneous Instructions
- SWP<cc> Rd, Rm, [Rn]
  - Swap a word between memory and a register
  - tmp = mem32[Rn], mem32[Rn] = Rm and Rd = tmp
- SWPB<cc> Rd, Rm, [Rn]
  - Swap a byte between memory and a register
  - tmp = mem8[Rn], mem8[Rn] = Rm and Rd = tmp
- The swap instruction is atomic: it reads and writes a location in the same bus operation, which cannot be interrupted. Useful for implementing semaphores and mutual exclusion.
CPSR instructions:
      MRS{<cc>} Rd, <CPSR | SPSR>                     ; copy from PSR to register
      MSR{<cc>} <CPSR | SPSR>_<fields>, Rm
      MSR{<cc>} <CPSR | SPSR>_<fields>, #immediate
- <fields> can be f, s, x and c, representing the respective byte of the CPSR/SPSR
      MSR cpsr_c, R0      ; update only the control byte of the CPSR
      MSR cpsr_fsc, R0    ; update the flags, status and control bytes of the CPSR
- In user mode you can read all CPSR bits but you can update only the f byte
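The field suffixes on MSR act as byte masks over the 32-bit status register. The sketch below is a Python model with hypothetical names; it assumes the standard layout in which f covers bits [31:24], s bits [23:16], x bits [15:8] and c bits [7:0], and it models the user-mode restriction by limiting the mask to the f byte.

```python
FIELD_MASKS = {
    "f": 0xFF000000,   # flags byte
    "s": 0x00FF0000,   # status byte
    "x": 0x0000FF00,   # extension byte
    "c": 0x000000FF,   # control byte
}

def msr(psr, value, fields, privileged=True):
    """Return the new PSR after MSR <psr>_<fields>, value."""
    mask = 0
    for f in fields:
        mask |= FIELD_MASKS[f]
    if not privileged:
        mask &= FIELD_MASKS["f"]      # user mode can only update the flags byte
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)
```

For example, `msr(0x600000D3, 0x10, "c")` changes only the low byte, while the same kind of write with `privileged=False` leaves the control byte untouched.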
- Count leading zeros: CLZ<cc> Rd, Rm
Pseudo Instructions:
- LDR Rd, =constant (assembler pseudo instruction)
  If the constant can be constructed with MOV or MVN, then that instruction is actually generated. Otherwise the assembler generates a PC-relative LDR instruction that reads the constant from the literal pool. You are responsible for ensuring that there is a literal pool within 4 KB range.
- ADR Rd, label
  This pseudo instruction writes the address of label into the register, using a PC-relative expression.
ARM Exceptions
Exceptions:
- Generated by internal (e.g. undefined instruction) or external (e.g. interrupts) sources
- On an exception, the processor changes mode. The address of the next instruction is copied to LR_<mode> and the CPSR is copied to SPSR_<mode>. Here LR_<mode> and SPSR_<mode> are the LR and SPSR of the newly entered exception mode.
- A forced mode change (writing the CPSR mode bits directly) does not copy the CPSR to SPSR_<mode>
- Events from internal and external sources that divert the normal flow of execution
- Reset and SWI switch the processor to Supervisor mode
- Exception vector table -> starting address of the exception handler
- Each exception handler needs to restore the registers and state of the CPU
- When an exception occurs, the ARM automatically:
  - Copies CPSR into SPSR_<mode>
  - Sets the appropriate CPSR bits to:
    - Switch to ARM state (i.e. makes T = 0)
    - Change to the exception mode
    - Disable IRQ interrupts
    - Disable FIQ, only when an FIQ or reset occurs
  - Stores the return address (i.e. PC - 4) in LR_<mode>
  - Sets PC to the vector address
- To return, the exception handler needs to:
  - Restore CPSR from SPSR_<mode>
  - Restore PC from LR_<mode>
Return from Exceptions:
- When an exception occurs, the return address stored in LR (i.e. PC - 4) may not be the address of the next instruction (because the PC may or may not have been updated when the exception occurs)
- Normally PC points to the instruction being fetched, PC - 4 to the instruction being decoded and PC - 8 to the instruction being executed
- Return from SWI and undefined instruction:
  - PC is not updated when these exceptions are taken, so PC - 4 is the actual return address, which is already in LR
  - Return from handler:
        MOVS PC, LR
- Return from IRQ and FIQ exceptions:
  - An interrupt exception occurs only after the PC is updated, so PC - 4 points one instruction beyond the actual return address
  - Return from handler:
        SUB  LR, LR, #4
        MOVS PC, LR
- Return from prefetch abort:
  - PC is not updated; to return to the same instruction:
        SUB  LR, LR, #4
        MOVS PC, LR
- Return from data abort:
  - PC is updated; to return to the same instruction:
        SUB  LR, LR, #8
        MOVS PC, LR
- The suffix S on an instruction whose destination is PC (e.g. MOVS PC, LR) restores CPSR from SPSR_<mode>
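The return sequences above all amount to "subtract a fixed amount from LR_<mode>, then copy it to PC". A small Python lookup summarises the adjustments; the dictionary keys and the name `return_pc` are hypothetical labels chosen for this sketch, and the values are the ones given on the slides.

```python
# Amount subtracted from LR_<mode> to form the return PC, per exception type.
LR_ADJUST = {
    "swi": 0,              # MOVS PC, LR
    "undefined": 0,        # MOVS PC, LR
    "irq": 4,              # SUB LR, LR, #4 then MOVS PC, LR
    "fiq": 4,
    "prefetch_abort": 4,   # re-execute the aborted instruction
    "data_abort": 8,       # re-execute the aborted instruction
}

def return_pc(exception, lr):
    """PC value the handler should return to, given LR_<mode>."""
    return (lr - LR_ADJUST[exception]) & 0xFFFFFFFF
```

For a data abort with LR_abt = 0x8008, the handler returns to 0x8000, the instruction that caused the abort.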
- Exception priorities:
  - Reset is the highest-priority exception; its handler initializes memory, caches, stack pointers etc.
  - The lowest priority is shared by two mutually exclusive exceptions: SWI and Undefined
  - IRQ is disabled when any exception occurs
  - FIQ is disabled only when Reset or FIQ occurs; otherwise it remains unchanged
  - Placing Data Abort above FIQ ensures that the data abort is actually registered before the FIQ is handled
Software Interrupt
- User mode uses the SWI instruction (which causes an exception) to access privileged operations (e.g. OS services) from Supervisor mode
- Syntax: SWI<cc> SWI_number (24-bit)
  - SWI_number: represents a particular service or feature of the OS
  - SWI_number = SWI_opcode AND 0x00ffffff
- When the CPU executes an SWI instruction, it:
  - Copies CPSR to SPSR_svc of Supervisor mode
  - Sets the appropriate CPSR bits to:
    - Change to Supervisor mode
    - Disable IRQ
  - Stores the return address in LR_svc
  - Sets PC to the vector address
- The top-level SWI handler determines the SWI_number and uses this number to call the appropriate SWI service routine.
      STMFD SP!, {R0-R12, LR}       ; save context of user mode (LR here is LR_svc)
      LDR   R10, [LR, #-4]          ; read SWI instruction opcode
      AND   R10, R10, #0x00FFFFFF   ; get 24-bit number in R10 (equivalent to BIC R10, R10, #0xFF000000)
      MOV   R10, R10, LSL #2        ; word align the offset
      ADD   R9, R9, R10             ; add base R9 to offset
      BLX   R9                      ; go to appropriate location in jump table
      LDMFD SP!, {R0-R12, PC}^      ; return from handler (to user mode), restore registers and CPSR
- R9 is a pointer to the beginning of the jump table. R10 (offset) picks out a particular entry from the jump table.
- ^ in the last instruction causes SPSR_svc to be copied to CPSR automatically if PC appears in the list
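The address arithmetic in the top-level handler (mask the opcode, scale by 4, add the table base) can be checked with a short model. This Python sketch uses hypothetical names; the example opcode 0xEF000002 assumes the standard encoding of an unconditional SWI, with the 24-bit number in the low bits.

```python
def swi_number(instruction):
    """Extract the 24-bit SWI number from the 32-bit instruction word."""
    return instruction & 0x00FFFFFF

def jump_table_entry(table_base, instruction):
    """Address of the jump-table entry selected by the SWI number.

    Each entry is one word, so the number is shifted left by 2."""
    return table_base + (swi_number(instruction) << 2)
```

With the table at 0x1000, SWI number 2 selects the entry at 0x1008.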
- The instruction BL jump_table saves the return address in LR_svc. Routine num0 returns to supervisor mode after completion.
- When the context is restored in Supervisor mode, LR is copied to PC and the processor switches back to user mode
- Software interrupts can be nested by writing an SWI instruction in an SWI routine

Nested SWIs
Reentrant SWI handling:
- A nested SWI call corrupts SPSR and LR, e.g. a 2nd SWI exception taken inside the 1st SWI routine may overwrite SPSR_svc and LR_svc.
- Remedy:
  - Save the context (registers, SPSR and LR) at the beginning of the handler so that each SWI call preserves the environment of its caller. When the SWI routine completes, restore the context.
- The following assembly code for an SWI handler is reentrant and safely handles nested SWI calls.
- Register R9 (base address) points to the beginning of the branch table.
SWI_handler:
        STMFD   SP!, {R0-R12, LR}       ; store registers and LR_svc
        MRS     R2, SPSR                ; get SPSR_svc into R2
        STR     R2, [SP, #-4]!          ; push SPSR_svc onto the stack
        LDR     R10, [LR, #-4]          ; read the SWI instruction opcode
        BIC     R10, R10, #0xFF000000   ; keep the 24-bit SWI number in R10
                                        ; (0x00FFFFFF is not a valid ARM immediate for AND)
        MOV     R10, R10, LSL #2        ; scale the number to a word offset
        ADD     R9, R9, R10             ; add the offset to the table base in R9
        BLX     R9                      ; go to the selected entry in the branch table
        LDR     R2, [SP], #4            ; pop SPSR_svc from the stack
        MSR     SPSR_cxsf, R2           ; restore SPSR_svc
        LDMFD   SP!, {R0-R12, LR}       ; restore registers and LR_svc
        MOVS    PC, LR                  ; return from the current routine
- The S suffix in MOVS PC, LR means that SPSR is also copied to CPSR.
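
To see why saving LR_svc and SPSR_svc matters, the following toy Python model replays a nested SWI with and without the context save. It is a sketch under stated assumptions: `SvcBank`, `swi` and `run` are hypothetical names, the addresses are arbitrary, and processor modes are represented as plain strings.

```python
class SvcBank:
    """Toy model of the banked Supervisor registers LR_svc and SPSR_svc."""

    def __init__(self):
        self.lr = 0
        self.spsr = ""
        self.stack = []


def swi(bank, return_addr, caller_mode, body, save_context):
    """Model one SWI: exception entry overwrites LR_svc and SPSR_svc."""
    bank.lr, bank.spsr = return_addr, caller_mode
    if save_context:
        bank.stack.append((bank.lr, bank.spsr))   # STMFD / MRS / STR
    body(bank)                                    # the routine may issue another SWI
    if save_context:
        bank.lr, bank.spsr = bank.stack.pop()     # LDR / MSR / LDMFD
    return bank.lr, bank.spsr                     # what MOVS PC, LR would use


def run(save_context):
    """Outer SWI issued from user code; its routine issues a second SWI."""
    bank = SvcBank()

    def inner(b):
        swi(b, 0x2000, "svc", lambda _: None, save_context)

    return swi(bank, 0x1000, "usr", inner, save_context)


print(run(save_context=True))    # outer return address and mode survive
print(run(save_context=False))   # outer handler sees the inner SWI's values
```

Without the save, the outer handler would return to the inner SWI's return address and stay in Supervisor mode, which is the corruption the reentrant handler avoids.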

Thumb Instructions
- On average, a Thumb program takes 35% less memory (high code density).
- Fixed-size 16-bit instructions give higher performance than ARM code on a system with a 16-bit data bus.
- How do Thumb instructions differ from ARM?
  - Only the branch instruction (B label) can be executed conditionally
  - Barrel shift operations are separate instructions
  - Multiple load/store (LDM/STM) supports only the IA mode
  - PUSH and POP instructions for stack operations (full descending stack only)
  - No instructions to access CPSR, SPSR or the coprocessors
  - Restricted register access
- You must switch to ARM state to alter CPSR and SPSR and to access a coprocessor.
- ARM-Thumb interworking:
  The BX and BLX instructions do the same thing in ARM and in Thumb.
        CODE32                          ; the following code is word aligned (ARM)
        LDR     R0, =thumbcode+1        ; set LSB of R0 to 1; R0[31:1] points to thumbcode
        MOV     LR, PC                  ; store return address
        BX      R0                      ; branch to Thumb state
        ; ---------------------------------------------------------------
        CODE16                          ; the following code is halfword aligned (Thumb)
thumbcode
        ADD     R1, #1                  ; Thumb instructions
        ; . . . . . . .
        BX      LR                      ; return to ARM state

Thumb Instructions:
- Branch Instructions:
  B<cc> label : conditional branch to label
    - Branch range is -256 to +254 bytes
  B label : unconditional branch to label
    - Branch range is -2048 to +2046 bytes
  BL subroutine_label : branch with link (LR automatically stores the return address); in Thumb, BL takes no condition code
    - Range is +/-4 Mbytes
  BX Rm : branch with exchange
    - If the LSB of Rm is 0 the processor switches to ARM state, otherwise it remains in Thumb state. PC = Rm & 0xFFFFFFFE
  BLX Rm : like BX Rm, but additionally stores the return address in LR
  BLX label
    - Branches within a +/-4 Mbyte range, with LR storing the return address
    - Sets T = 0 and enters ARM state
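
The branch ranges and the BX state switch listed above follow from simple bit arithmetic, sketched here in Python. The helper names `branch_range` and `bx` are hypothetical; the field widths (8, 11 and 22 bits, each counting halfwords) are the Thumb offset field sizes that produce the ranges quoted above.

```python
def branch_range(offset_bits):
    """Byte range reachable by a signed offset field that counts halfwords."""
    lowest = -(1 << (offset_bits - 1)) * 2
    highest = ((1 << (offset_bits - 1)) - 1) * 2
    return lowest, highest


def bx(rm):
    """Model BX Rm: bit 0 selects the state, the other bits give the new PC."""
    state = "Thumb" if rm & 1 else "ARM"
    return rm & 0xFFFFFFFE, state


print(branch_range(8))    # B<cc> label
print(branch_range(11))   # B label
print(branch_range(22))   # BL label, offset split over two halfwords
print(bx(0x00008001))     # odd address: stay in / enter Thumb state
print(bx(0x00008000))     # even address: enter ARM state
```

This is also why the interworking code loads `thumbcode + 1`: the set LSB is consumed by BX as the state selector and never reaches the PC.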

- Data Processing Instructions:
  - ADD/ADC/AND/BIC/EOR/MOV/MUL/MVN/NEG/ORR/SBC/SUB Rd, Rn
  - ADD/SUB Rd, Rn, #immed3
  - ADD/MOV/SUB Rd, #immed8
  - ADD/SUB Rd, Rn, Rm
  - ADD Rd, PC, #immed8*4 (i.e. 0, 4, 8, ..., 1020)
  - ADD Rd, SP, #immed8*4
  - ADD/SUB SP, #immed7*4 (i.e. 0, 4, 8, ..., 508)
  - CMN/CMP/TST Rn, Rm
  - CMP Rn, #immed8
  - MOV Rn, Rd
- Barrel Shift Instructions:
  - LSL/LSR/ASR Rd, Rm, #immed5
  - ASR/LSL/LSR/ROR Rd, Rs
- Single Register Load/Store Instructions:
  - LDR/STR{B|H} Rd, [Rn, #immed5]
  - LDR{H|SB|SH} Rd, [Rn, Rm]
  - STR{B|H} Rd, [Rn, Rm]
  - LDR Rd, [PC, #immed8*4]
  - LDR/STR Rd, [SP, #immed8*4]
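
The scaled immediates in the lists above (immed8*4, immed7*4) restrict which offsets an instruction can encode. A minimal Python sketch, with the hypothetical helpers `scaled_immediates` and `encodable`, makes the rule concrete:

```python
def scaled_immediates(bits, scale=4):
    """All offsets expressible as an unsigned `bits`-wide field times `scale`."""
    return range(0, (1 << bits) * scale, scale)


def encodable(offset, bits, scale=4):
    """True if `offset` fits the scaled immediate field exactly."""
    return offset in scaled_immediates(bits, scale)


print(max(scaled_immediates(8)))   # largest offset for ADD Rd, PC, #immed8*4
print(max(scaled_immediates(7)))   # largest offset for ADD SP, #immed7*4
print(encodable(1022, 8))          # not a multiple of 4
```

An offset that is out of range or not a multiple of 4 has to be built with extra instructions.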
- Multiple Register Load/Store:
  - LDM/STM{IA} Rn!, {low register list}
- Stack Instructions:
  - POP {low register list, PC}
  - PUSH {low register list, LR}
  - SP does not appear in the instruction, but it is updated automatically
  - The stack is always full descending
- Software Interrupt:
  SWI number (8-bit)
  - Switches to ARM state and takes the same actions as the equivalent ARM SWI
  - Unlike in ARM, it cannot be executed conditionally

ARM Programs
[1] Bit-Field Manipulation:
- Packing/unpacking of variable-size bit fields, e.g. variable-length codes
- Used to create compressed files that pack items at bit granularity
Ex: Bit-field pack/unpack:
R0 contains the code to be written into R1. Let Rm hold the number of free bits available in R1, and let codelen be the length of the code.
Algorithm for variable-length code packing:
- Pack variable-length codes to create a byte stream
- Codes are first packed into a 32-bit buffer (register R1) from MSB to LSB. Once the buffer is full, it is stored to memory.
- Sometimes a code must be split into two parts. We fill the buffer with the 1st part, store the buffer in memory and write the 2nd part into the empty buffer.
- Packing involves three routines: (1) align the byte-stream pointer, (2) insert codes into bitbuff and store bitbuff in memory when it is full, (3) finish the byte stream.
- The byte-stream pointer may not be word aligned at the end of a write. The next write must begin at a word-aligned address.
- ARM code:
bytestream  RN 0        ; current byte address in the output stream
code        RN 4        ; current code
codelen     RN 5        ; length of the current code
bitbuff     RN 6        ; 32-bit big-endian buffer
bitsfree    RN 7        ; number of bits free in bitbuff
temp        RN 8        ; number of used bits in bitbuff
write_start                                     ; 1st routine (word-aligns bytestream)
        MOV     bitbuff, #0
        MOV     bitsfree, #32
align_loop:
        TST     bytestream, #3                  ; is bytestream aligned?
        LDRNEB  code, [bytestream, #-1]!        ; if not, step back and get the byte
        SUBNE   bitsfree, bitsfree, #8          ; update bitsfree
        ORRNE   bitbuff, code, bitbuff, ROR #8  ; copy the byte into bitbuff
        BNE     align_loop                      ; loop until bytestream is aligned
        MOV     bitbuff, bitbuff, ROR #8        ; adjust bitbuff
        MOV     PC, LR                          ; return
        ; ---------------------------------------------------------------
write_code                                      ; 2nd routine (writes codes into the buffer and
                                                ; stores the buffer when it gets full)
        SUBS    bitsfree, bitsfree, codelen     ; is bitsfree > code length?
        BLE     buff_full                       ; if not, branch to buff_full
        ORR     bitbuff, bitbuff, code, LSL bitsfree ; otherwise write the code
        MOV     PC, LR                          ; return
buff_full:
        RSB     bitsfree, bitsfree, #0          ; make bitsfree positive
        ORR     bitbuff, bitbuff, code, LSR bitsfree ; write 1st part of the split code
        STR     bitbuff, [bytestream], #4       ; store bitbuff in memory
        RSB     bitsfree, bitsfree, #32         ; update bitsfree
        MOV     bitbuff, code, LSL bitsfree     ; write 2nd part of the split code
        MOV     PC, LR                          ; return
        ; ---------------------------------------------------------------
write_finish                                    ; 3rd routine (finishes packing)
        RSBS    temp, bitsfree, #32             ; temp = number of used bits in bitbuff
finish_loop:
        MOVGT   bitbuff, bitbuff, ROR #24       ; rotate the current top byte into bits [7:0]
                                                ; (STRB stores the low byte of the register)
        STRGTB  bitbuff, [bytestream], #1       ; store bytes of bitbuff in memory, MSB first
        SUBGTS  temp, temp, #8                  ; update temp
        BGT     finish_loop                     ; loop while temp > 0
        MOV     PC, LR                          ; return
Note: The above code assumes big-endian data transfer.
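
The write_code and write_finish routines can be cross-checked against a small Python model of the same algorithm. This is a sketch, not a translation of every instruction: the class name `BitPacker` is hypothetical, the output is a `bytearray` standing in for memory at bytestream, the stream is assumed to start word aligned (so the alignment routine is omitted), and each code is assumed to be at most 32 bits long.

```python
class BitPacker:
    """Pack variable-length codes MSB-first through a 32-bit buffer."""

    def __init__(self):
        self.out = bytearray()   # models memory written through bytestream
        self.bitbuff = 0         # 32-bit buffer, filled from bit 31 downwards
        self.bitsfree = 32       # free bits left in bitbuff

    def write_code(self, code, codelen):
        code &= (1 << codelen) - 1
        self.bitsfree -= codelen
        if self.bitsfree > 0:                        # code fits with room to spare
            self.bitbuff |= code << self.bitsfree
            return
        spill = -self.bitsfree                       # bits that do not fit
        self.bitbuff |= code >> spill                # 1st part fills the buffer
        self.out += self.bitbuff.to_bytes(4, "big")  # store the full buffer
        self.bitsfree = 32 - spill
        self.bitbuff = (code << self.bitsfree) & 0xFFFFFFFF  # 2nd part

    def finish(self):
        used = 32 - self.bitsfree
        while used > 0:                              # flush used bytes, MSB first
            self.out.append((self.bitbuff >> 24) & 0xFF)
            self.bitbuff = (self.bitbuff << 8) & 0xFFFFFFFF
            used -= 8
        return bytes(self.out)


packer = BitPacker()
packer.write_code(0b101, 3)
packer.write_code(0b11, 2)
print(packer.finish().hex())   # bits 10111 followed by zero padding
```

The final partial byte is padded with zero bits, which matches a buffer that starts out cleared.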

[2] SIMD processing:
- Consider a graphics example: processing multiple 8-bit pixels of an image.
- Problem: merge two images X and Y into a new image Z by scaling X by a/256 and Y by 1 - a/256, where 0 < a < 256.
- Let $x_n$, $y_n$ and $z_n$ denote the nth 8-bit pixel of X, Y and Z.
- $z_n = \frac{a}{256}\,x_n + \left(1 - \frac{a}{256}\right) y_n$
- $z_n = w_n / 256$, where $w_n = a\,(x_n - y_n) + 256\,y_n$
- Four pixels are loaded at once into a 32-bit ARM register: xx = [x3, x2, x1, x0]
- Two expanded pixels are needed per ARM register: x = [0, x2, 0, x0]
IMG_W   EQU 176
IMG_H   EQU 144
pz      RN 0    ; pointer to destination image
px      RN 1    ; pointer to first image X
py      RN 2    ; pointer to second image Y
a       RN 3    ; 8-bit scaling factor
xx      RN 4    ; holds four pixels of X
yy      RN 5    ; holds four pixels of Y
x       RN 6    ; holds two expanded pixels of X, i.e. [0, x2, 0, x0]
y       RN 7    ; holds two expanded pixels of Y, i.e. [0, y2, 0, y0]
z       RN 8    ; holds four pixels of Z
cnt     RN 9    ; number of remaining pixels
mask    RN 14   ; 0x00FF00FF (no register is assigned in the original listing;
                ; LR is assumed here because it is free once stacked)
        STMFD   sp!, {R4-R9, LR}        ; R9 is stacked too because cnt uses it
        MOV     cnt, #IMG_W * IMG_H
        LDR     mask, =0x00FF00FF
loop:
        LDR     xx, [px], #4            ; load four pixels of X
        LDR     yy, [py], #4            ; load four pixels of Y
        AND     x, mask, xx             ; x = [0, x2, 0, x0]
        AND     y, mask, yy             ; y = [0, y2, 0, y0]
        SUB     x, x, y                 ; x - y
        MUL     x, a, x                 ; a * (x - y)
        ADD     x, x, y, LSL #8         ; w = a * (x - y) + 256 * y
        AND     z, mask, x, LSR #8      ; z = [0, z2, 0, z0]
        AND     x, mask, xx, LSR #8     ; x = [0, x3, 0, x1]
        AND     y, mask, yy, LSR #8     ; y = [0, y3, 0, y1]
        SUB     x, x, y
        MUL     x, a, x
        ADD     x, x, y, LSL #8
        AND     x, mask, x, LSR #8      ; x = [0, z3, 0, z1]
        ORR     z, z, x, LSL #8         ; z = [z3, z2, z1, z0]
        STR     z, [pz], #4             ; store four pixels of Z
        SUBS    cnt, cnt, #4
        BGT     loop
        LDMFD   sp!, {R4-R9, PC}
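
The packed arithmetic in the loop can be checked against the per-pixel formula w = a(x - y) + 256y with a short Python model. It is a sketch with hypothetical function names (`blend_pair`, `blend_word`, `blend_pixel`); Python integers are unbounded, so results are masked to 32 bits to mimic an ARM register.

```python
MASK = 0x00FF00FF   # selects pixels 0 and 2 of a packed word
M32 = 0xFFFFFFFF    # emulate a 32-bit register


def blend_pair(xx, yy, a):
    """Blend the two pixels in bits [7:0] and [23:16] of xx and yy at once."""
    x = xx & MASK
    y = yy & MASK
    w = ((x - y) * a + (y << 8)) & M32   # two 16-bit results, one per lane
    return (w >> 8) & MASK               # keep the top 8 bits of each lane


def blend_word(xx, yy, a):
    """Blend four packed 8-bit pixels, two lanes at a time."""
    z = blend_pair(xx, yy, a)
    z |= blend_pair(xx >> 8, yy >> 8, a) << 8
    return z


def blend_pixel(x, y, a):
    """Reference: one pixel, z = (a*(x - y) + 256*y) / 256."""
    return (a * (x - y) + 256 * y) >> 8


xx, yy, a = 0x10FF0080, 0xF000FF20, 64
z = blend_word(xx, yy, a)
for n in range(4):
    lane = (z >> (8 * n)) & 0xFF
    ref = blend_pixel((xx >> (8 * n)) & 0xFF, (yy >> (8 * n)) & 0xFF, a)
    print(n, lane, ref)
```

The two lanes never interfere because each w stays between 0 and 255 * 256 for 0 < a < 256, so it fits in 16 bits and cannot carry into the neighbouring lane.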

ARM7TDMI block diagram
[Figure: ARM7TDMI block diagram]

External Interface through AMBA Bus
[Figure: block diagram. Blocks: ARM Core, CP15, EmbeddedICE & JTAG, MMU, Inst. & data cache, Write Buffer, AMBA Interface. Signals: Virtual Address, Physical Address, Inst. & data, AMBA Address, AMBA Data]
- JTAG TAP controller:
  - Used to test a PCB assembly, its interconnect or even a sub-block inside an IC without any physical probe.
  - JTAG scan chain => an embedded solution for testing an IC for certain static faults (shorts, opens and logic errors).
  - ICs supporting JTAG have four additional pins: Test Clock (TCK), Test Mode Select (TMS), Test Data Input (TDI) and Test Data Output (TDO).
- EmbeddedICE (In-Circuit Emulator):
  - Used to debug the software of an embedded system through breakpoints and watchpoints
  - A breakpoint is an address at which program execution halts
  - A watchpoint is a value that may combine address, data or control signals. When a match occurs, a debug event is generated that halts processor execution
  - Uses JTAG as the transport mechanism to access the on-chip debug modules inside the target CPU
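
The watchpoint idea (compare bus signals against a programmed value and halt on a match) can be illustrated with a masked comparison in Python. This is a sketch only: the dictionary layout, the field names and the convention that a set mask bit means "ignore this bit" are assumptions made for the example, not a description of the EmbeddedICE register map.

```python
def matches(observed, value, mask):
    """True when observed equals value on every bit not set in mask."""
    return ((observed ^ value) & ~mask & 0xFFFFFFFF) == 0


def watchpoint_hit(addr, data, wp):
    """A debug event fires when both the address and the data comparison match."""
    return (matches(addr, wp["addr_value"], wp["addr_mask"])
            and matches(data, wp["data_value"], wp["data_mask"]))


# Hypothetical unit set up as a breakpoint: exact address, any data
breakpoint_at_8000 = {
    "addr_value": 0x00008000, "addr_mask": 0x00000000,
    "data_value": 0x00000000, "data_mask": 0xFFFFFFFF,
}

print(watchpoint_hit(0x00008000, 0xE1A00000, breakpoint_at_8000))
print(watchpoint_hit(0x00008004, 0xE1A00000, breakpoint_at_8000))
```

Masking all data bits turns the same comparison into a plain address breakpoint, while leaving data bits unmasked watches for a specific value at that address.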

DATA BUS
- Unidirectional and bidirectional data bus:
  - When BUSEN is HIGH, all instructions and input data are presented on DIN[31:0], whereas output data appears on DOUT[31:0]
  - When BUSEN is LOW, only the bidirectional bus D[31:0] is used
  - The unidirectional data bus is used for coprocessor/external IC connection