ARM
Acorn RISC Machine (1983-1985): Acorn Computers Limited, Cambridge, England.
Advanced RISC Machine (1990): ARM Limited, now named ARM Holdings.
ARM designs have been licensed to many semiconductor manufacturers. Companies holding ARM licences include Altera, Intel, IBM, Microsoft, Epson, NEC, Nokia, Motorola, and Panasonic.
Advanced RISC Machines
The ARM core uses a RISC architecture. ARM is a hardware design company: it licenses its cores, and other companies make processors based on them.
RISC design philosophy
A limited set of simple but powerful instructions, each executing within a single cycle at a high clock speed.
A complex operation is obtained as a sequence of simple instructions.
In a RISC processor the software (compiler) is complex, but the processor architecture (hardware) is simple.
Examples: ARM, Atmel AVR, MIPS, PowerPC.
Instructions
Reduced number of instructions, each executing in a single cycle.
The compiler synthesizes complicated operations from simple ones.
Each instruction is a fixed length.
Pipelined instruction execution
The processing of instructions is broken down into smaller units that can be executed in parallel by pipelines.
The pipeline advances by one step on each cycle for maximum throughput.
Data bus: carries data (an instruction or a data item). The core follows a Von Neumann architecture.
Instruction decoder: translates each instruction.
There are no data processing instructions that manipulate data directly in memory; the core uses a load-store architecture to read and write data between memory and the register file.
Register file: r0 to r15, a storage bank of 32-bit registers.
Sign extend hardware converts signed 8-bit and 16-bit numbers to 32-bit values as they are read from memory into the register file.
ARM instructions typically have two source registers, Rn and Rm, and a single result or destination register, Rd.
The source operands are read from the register file using the internal buses A and B respectively.
The ALU or MAC (Multiply-ACcumulate unit) takes the Rn and Rm values from the A and B buses and computes a result.
Data processing instructions write the result in Rd directly to the register file through the result bus.
Load and store instructions use the ALU to generate an address, hold it in the address register, and send it on the address bus to access external memory.
The incrementer updates the address register to point to the next sequential memory location for a load-store operation.
Register Rm can alternatively be preprocessed in the barrel shifter before it enters the ALU.
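r13 to r15 have special functions:
r13: Stack pointer
r14: Link register, which holds the return address whenever the core calls a subroutine
r15: Program counter
r13 and r14 can also be used as general-purpose registers, but using r13 this way is unsafe when the processor is running an operating system, because the operating system normally assumes that r13 points to a valid stack.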
The cpsr is divided into four fields, each 8 bits wide:
Flags
Status (reserved for future use)
Extension (reserved for future use)
Control
The control field contains the processor mode, state, and interrupt mask bits. The flags field contains the condition flags.
Processor modes
The processor mode determines which registers are active and the access rights to the cpsr register itself.
A privileged mode allows full read-write access to the cpsr.
A non-privileged mode allows only read access to the control field in the cpsr but still allows read-write access to the condition flags.
Privileged modes
Abort: entered when an attempt to access memory fails.
Fast interrupt request (FIQ)
Interrupt request (IRQ)
Supervisor: the mode the processor is in after reset, and generally the mode in which an OS kernel operates.
System: a special version of user mode that allows full read-write access to the cpsr.
Undefined: used when the processor encounters an instruction that is undefined or not supported by the implementation.
Non-privileged mode
User: used for programs and applications.
Every processor mode except user mode can change mode by writing directly to the mode bits of the cpsr.
All processor modes except system mode have a set of banked registers that are a subset of the main 16 registers.
For example, when the processor is in interrupt request mode, the instructions it executes still access registers named r13 and r14, but these are the banked registers r13_irq and r14_irq.
The processor mode can be changed by a program that writes directly to the cpsr, or by hardware when the core responds to an exception or interrupt.
A change to interrupt request mode happens when an external device raises an interrupt to the processor core.
The user registers r13 and r14 are replaced by the banked registers r13_irq (which contains the stack address) and r14_irq (which contains the return address) respectively.
A new register, spsr_irq, appears and stores the previous mode's cpsr; on return, the cpsr is restored from spsr_irq.
The cpsr is saved to the spsr only when an exception or interrupt is raised.
The default mode (when power is applied to the core) is supervisor mode. It is a privileged mode with full access to the cpsr, which makes it useful for initialization code.
States and instruction sets
The state of the core determines which instruction set is being executed.
Three instruction sets:
1. ARM
2. Thumb
3. Jazelle (The Jazelle instruction set is a closed instruction set and is not openly available. Jazelle executes 8-bit instructions and is a hybrid mix of software and hardware designed to speed up the execution of Java bytecodes.)
Interrupt masks
Interrupt masks are used to stop specific interrupt requests, IRQ and FIQ, from interrupting the processor.
The I bit masks IRQ when set to binary 1, and the F bit masks FIQ when set to binary 1.
Condition flags
The condition flags are updated according to the result of an ALU operation.
Notation for cpsr bits: a lowercase letter means the bit is 0 and an uppercase letter means the bit is 1, for example nzCvqiFt_USR.
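The notation can be illustrated with a short Python sketch that turns a 32-bit cpsr value into the letter form. The bit positions (N=31, Z=30, C=29, V=28, Q=27, I=7, F=6, T=5, mode in bits 4 to 0) and the mode encodings are taken from the standard ARM cpsr layout rather than from these notes, and the function name format_cpsr is only illustrative.

```python
# Mode field encodings (cpsr bits 4..0); assumed from the standard ARM layout.
MODES = {
    0b10000: "USR", 0b10001: "FIQ", 0b10010: "IRQ", 0b10011: "SVC",
    0b10111: "ABT", 0b11011: "UND", 0b11111: "SYS",
}

# (letter, bit position) pairs in the order used by the notation.
FLAG_BITS = (("n", 31), ("z", 30), ("c", 29), ("v", 28),
             ("q", 27), ("i", 7), ("f", 6), ("t", 5))


def format_cpsr(cpsr):
    """Return the cpsr in letter notation: uppercase = 1, lowercase = 0."""
    letters = "".join(
        name.upper() if (cpsr >> bit) & 1 else name
        for name, bit in FLAG_BITS
    )
    mode = MODES.get(cpsr & 0x1F, "???")  # unknown mode bits shown as ???
    return f"{letters}_{mode}"


print(format_cpsr(0x20000050))  # C flag set, FIQ masked, user mode
```

Here 0x20000050 has bit 29 (C), bit 6 (F) and the user mode bits set, so it is written nzCvqiFt_USR.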
Pipeline
The pipeline is the mechanism a RISC processor uses to execute instructions.
Using a pipeline speeds up execution by fetching the next instruction while other instructions are being decoded and executed.
Different ARM cores have pipelines with different numbers of stages. The ARM7 core has a three-stage pipeline:
Fetch: loads an instruction from memory.
Decode: identifies the instruction to be executed.
Execute: processes the instruction and writes the result back to a register.
This process is called filling the pipeline. Once the pipeline is full, the core can execute an instruction every cycle.
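Filling the pipeline can be sketched in Python as a cycle-by-cycle table. This is a simplified model that assumes every stage takes exactly one cycle and that there are no stalls or branches; the instruction names ADD, SUB and CMP and the function name pipeline_schedule are only illustrative.

```python
def pipeline_schedule(instructions, stages=("Fetch", "Decode", "Execute")):
    """Return, for each cycle, which instruction occupies each stage."""
    n_cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth  # instruction that has reached this stage
            if 0 <= index < len(instructions):
                row[stage] = instructions[index]
        schedule.append(row)
    return schedule


for cycle, row in enumerate(pipeline_schedule(["ADD", "SUB", "CMP"]), start=1):
    print(cycle, row)
```

In this model the pipeline is full from the third cycle onward, and n instructions on a k-stage pipeline take k + n - 1 cycles in total.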
A higher operating frequency results in higher performance, but latency increases.
Instruction throughput increases by around 13% with the 5-stage pipeline.
Fetch: the instruction is fetched from memory and placed in the instruction pipeline.
Decode: the instruction is decoded and the register operands are read from the register file.
Execute: an operand is shifted and the ALU result is generated.
Memory (Buffer/Data): data memory is accessed if required. Otherwise the ALU result is buffered for one clock cycle to give the same pipeline flow for all instructions.
Instruction throughput increases by around 34% with the 6-stage pipeline, giving 1.3 Dhrystone MIPS per MHz.
Code written for the ARM7 will execute on the ARM9 and ARM10.
Here MSR is the instruction used to mask interrupts.
An instruction in the execute stage will complete even though an interrupt has been raised.
The execution of a branch instruction, or branching by direct modification of the pc, causes the ARM core to flush its pipeline.
Exceptions, Interrupts, and the Vector Table
When an exception or interrupt occurs, the processor sets the pc to a specific memory address.
The address is within a special address range called the vector table.
The entries in the vector table are instructions that branch to specific routines designed to handle a particular exception or interrupt.
When an exception or interrupt occurs, the processor suspends normal execution and starts loading instructions from the exception vector table.
The higher vector addresses are used by operating systems such as Linux and Microsoft's embedded products.
Exception priority
When an exception causes a mode change, the core automatically:
Saves the cpsr to the spsr of the exception mode
Saves the pc to the lr of the exception mode
Sets the cpsr to the exception mode
Sets the pc to the address of the exception handler
Core Extensions
Core extensions are standard components placed next to the ARM core to improve performance, manage resources, and provide extra functionality.
Three hardware extensions:
1. Caches and Tightly Coupled Memory (TCM)
2. Memory Management
3. Coprocessor interface
Cache and TCM
A cache is a block of fast memory placed between main memory and the core.
It allows more efficient fetches from memory.
Usually there is a single level of cache.
A simplified Von Neumann style architecture with cache is:
Memory management
Embedded systems have multiple memory devices, and memory management hardware provides appropriate access to them.
Three types:
1. Non-protected memory: fixed and less flexible; useful for small systems.
2. Memory Protection Unit (MPU): uses a limited number of memory regions controlled by a set of special coprocessor registers; suitable for medium-range systems.
3. Memory Management Unit (MMU): the most comprehensive type. It uses a set of translation tables to provide fine-grained control over memory. These tables are stored in main memory and provide a virtual-to-physical address map as well as access permissions. MMUs are designed for more sophisticated platform operating systems that support multitasking.
Coprocessors
Coprocessors can be attached to the ARM processor.
A coprocessor is a secondary processor that speeds up operation by taking over a specific part of the main processor's work. It performs calculations on behalf of the microprocessor, relieving the CPU of some of its work and so improving the overall speed of the system.
Department of ECE, VKCET
Data Processing Instructions
The data processing instructions manipulate data within registers. They are move instructions, arithmetic instructions, logical instructions, comparison instructions, and multiply instructions. Most data processing instructions can process one of their operands using the barrel shifter. If you use the S suffix on a data processing instruction, then it updates the flags in the cpsr. Move and logical operations update the carry flag C, negative flag N, and zero flag Z.
1. Move Instructions
Move is the simplest ARM instruction. It copies N into a destination register Rd, where N is a register or immediate value. This instruction is useful for setting initial values and transferring data between registers.
Eg. 2
PRE r0 = 0x0000 0000 r1 = 0x0000 FFFF
MVN r0,r1 ; r0 = ~r1
POST r0 = 0xFFFF 0000 r1 = 0x0000 FFFF
Barrel Shifter
A unique and powerful feature of the ARM processor is the ability to shift the 32-bit binary pattern in one of the source registers left or right by a specific number of positions before it enters the ALU. This shift increases the power and flexibility of many data processing operations. There are data processing instructions that do not use the barrel shifter, for example the MUL (multiply), CLZ (count leading zeros), and QADD (signed saturated 32-bit add) instructions.
Eg. 1
LSL:
LSR:
ASR:
ROR:
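The four shift operations, together with RRX (rotate right extended), can be modeled on 32-bit values with a short Python sketch. It is a simplified model, not the hardware definition: shift amounts are assumed to be in the range 1 to 31, and each function returns the shifted value together with the carry-out that an S-suffixed instruction would place in the C flag. The function names are only illustrative.

```python
MASK32 = 0xFFFFFFFF


def _check(amount):
    # This sketch only models shift amounts of 1..31.
    if not 1 <= amount <= 31:
        raise ValueError("shift amount must be in 1..31")


def lsl(value, amount):
    """Logical shift left; carry-out is the last bit shifted out."""
    _check(amount)
    full = value << amount
    return full & MASK32, (full >> 32) & 1


def lsr(value, amount):
    """Logical shift right; zeros enter from the left."""
    _check(amount)
    return value >> amount, (value >> (amount - 1)) & 1


def asr(value, amount):
    """Arithmetic shift right; the sign bit (bit 31) is replicated."""
    _check(amount)
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> amount) & MASK32, (value >> (amount - 1)) & 1


def ror(value, amount):
    """Rotate right; bits leaving on the right re-enter on the left."""
    _check(amount)
    result = ((value >> amount) | (value << (32 - amount))) & MASK32
    return result, (result >> 31) & 1


def rrx(value, carry_in):
    """Rotate right by one through the carry flag."""
    return (carry_in << 31) | (value >> 1), value & 1


print(hex(asr(0x80000001, 1)[0]))  # sign bit is kept while shifting right
```

For instance, asr(0x80000001, 1) gives 0xC0000000 with carry-out 1, and rrx(0x80000001, 0) gives 0x40000000 with carry-out 1.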
Eg. 2 (University question)
PRE r0 = 0x0000 0000 r1 = 0x0000 0001 r2 = 0x0000 000A
MOV r1,r0, ROR r2
Find the register contents after the execution of the given instruction.
Soln: MOV r1, r0, ROR r2 sets r1 to r0 rotated right by the number of positions held in r2. Since r0 = 0, rotating it by 10 positions still gives 0.
POST r0 = 0x0000 0000 r1 = 0x0000 0000 r2 = 0x0000 000A
(A left shift on an instruction with the S suffix can change the C and Z flags. Here a left shift by 1 multiplies r1 by two, so r0 = r1 x 2. Because r1 x 2 = 0x1 0000 0008 does not fit in 32 bits, the C flag is set to 1 and r0 = 0x0000 0008.)
Eg. 4
PRE r0 = 0x8000 0001 r1 = 0x0000 0001
MOV r1,r0,ASR #1
POST r0 = 0x8000 0001 r1 = 0xC000 0000
(ASR is arithmetic shift right; (signed)r0 >> 1 gives 0b1100 0000 0000 0000 0000 0000 0000 0000 = 0xC000 0000, which is written to the destination register r1. r0 is unchanged.)
Eg. 5
PRE r0 = 0x0000 0000 r2 = 0x8000 0001 cpsr = nzcvqiFt_USR
MOVS r0,r2,RRX
POST cpsr = nzCvqiFt_USR r0 = 0x4000 0000 r2 = 0x8000 0001
(RRX rotates r2 right by one position through the carry flag. The S suffix is needed for the C flag to be updated with bit 0 of r2.)
Eg. 1
Eg. 2
Eg. 3
PRE r0 = 0x0000 0000 r1 = 0xFFFF FFFF cpsr = nzCvqiFt_USR
ADCS r0,r1,#1 ; r0 = r1 + 1 + Carry
POST r0 = 0x0000 0001 r1 = 0xFFFF FFFF cpsr = nzCvqiFt_USR
; r1 = r1 - 1
Eg. 5 (University question)
PRE r0 = 0x0000 0000 r2 = 0x0000 000A
RSB r0,r2,r2,LSL #3
Soln: The instruction performs r0 = (r2 x 8) - r2 = 0x0000 0050 - 0x0000 000A = 0x0000 0046
POST r0 = 0x0000 0046 r2 = 0x0000 000A
Eg. 6
PRE r0 = 0x0000 0000 r1 = 0x0000 0004
ADD r0,r1,r1,LSL #2 ; r0 = r1 + (r1 x 4)
POST r0 = 0x0000 0014 r1 = 0x0000 0004
Eg. 7
PRE r0 = 0x0000 0000 r1 = 0xFFFF FFF6 ; r1 = -10
ADD r0,r1,r1,ASR #1 ; r0 = r1 + (r1/2)
POST r0 = 0xFFFF FFF1 r1 = 0xFFFF FFF6
(r1, ASR #1 gives 0xFFFF FFFB = -5, and 0xFFFF FFF6 + 0xFFFF FFFB = 0xFFFF FFF1 = -15)
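The arithmetic examples above can be checked with a small Python model of 32-bit arithmetic. It is a sketch under simple assumptions: registers are plain integers masked to 32 bits, only the carry flag is modeled, and the helper names (lsl, asr, add, rsb, adcs) are illustrative rather than part of any ARM toolchain.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, amount):
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK32


def asr(value, amount):
    """Arithmetic shift right: the sign bit is replicated."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> amount) & MASK32


def add(rn, operand2):
    """ADD Rd, Rn, operand2 -> Rd = Rn + operand2."""
    return (rn + operand2) & MASK32


def rsb(rn, operand2):
    """RSB Rd, Rn, operand2 -> Rd = operand2 - Rn (reverse subtract)."""
    return (operand2 - rn) & MASK32


def adcs(rn, operand2, carry_in):
    """ADCS Rd, Rn, operand2 -> (Rd, carry-out), Rd = Rn + operand2 + C."""
    total = rn + operand2 + carry_in
    return total & MASK32, total >> 32


r2 = 0x0000000A
print(hex(rsb(r2, lsl(r2, 3))))  # RSB r0,r2,r2,LSL #3 -> (r2 x 8) - r2
```

With r2 = 0xA the reverse subtract gives 0x46, and add(0xFFFFFFF6, asr(0xFFFFFFF6, 1)) gives 0xFFFFFFF1, which is -15 in two's complement.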
Eg. 1
PRE r0 = 0x0000 0000 r1 = 0xF1F1 1111 r2 = 0xF0F0 AAAA
AND r0,r1,r2 ; r0 = r1 & r2
POST r0 = 0xF0F0 0000
(0x1111 & 0xAAAA = 0x0000, because no bit position is set in both values.)
Eg. 2
PRE r0 = 0x0000 0000 r1 = 0x1234 0000 r2 = 0x0000 5678
ORR r0,r1,r2 ; r0 = r1 | r2
POST r0 = 0x1234 5678
Eg. 3
PRE r0 = 0xFFFF FFFF r1 = 0xFFFF FFFF cpsr = nzcvqiFt_USR
EORS r0,r1,r0 ; r0 = r1 ^ r0
POST r0 = 0x0000 0000 cpsr = nZcvqiFt_USR
Eg. 4 (University question)
PRE r0 = 0x0000 0000 r1 = 0x0000 0001 r2 = 0x0000 000A
BIC r0,r1,r2 ; r0 = r1 & ~r2
POST r0 = 0x0000 0001
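The four logical instructions map directly onto bitwise operators, which makes the examples easy to check. The Python sketch below uses the operand values from the examples above; the function names are illustrative, and the only assumption is that results are masked to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def and_(rn, rm):
    """AND Rd, Rn, Rm -> Rd = Rn & Rm."""
    return rn & rm & MASK32


def orr(rn, rm):
    """ORR Rd, Rn, Rm -> Rd = Rn | Rm."""
    return (rn | rm) & MASK32


def eor(rn, rm):
    """EOR Rd, Rn, Rm -> Rd = Rn ^ Rm."""
    return (rn ^ rm) & MASK32


def bic(rn, rm):
    """BIC Rd, Rn, Rm -> Rd = Rn & ~Rm (clear the bits that are set in Rm)."""
    return rn & ~rm & MASK32


print(hex(and_(0xF1F11111, 0xF0F0AAAA)))  # only bits set in both operands
print(hex(bic(0x00000001, 0x0000000A)))   # bit 0 is not cleared by 0xA
```

BIC is useful for clearing selected bits: here r2 = 0xA has bits 1 and 3 set, so bit 0 of r1 survives.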
Eg. 1
PRE r0 = 0x0000 0000 r3 = 0x0000 000F cpsr = nzcvqiFt_USR ; Assumption
TST r0,r3
POST cpsr = nZcvqiFt_USR
(TST computes r0 & r3 without storing the result. Here r0 & r3 = 0, so the Z flag is set.)
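Eg. 3 (University question)
Obtain the status of the cpsr after executing TEQ r0,r1. Assume r0 = 0x0000 000A, r1 = 0xFF00 0000 and the processor mode is USER.
Soln:
PRE cpsr = nzcvqIFt_USR ; Assumption
r0 = 0x0000 000A r1 = 0xFF00 0000
TEQ r0,r1
POST cpsr = NzcvqIFt_USR
(TEQ computes r0 ^ r1 = 0xFF00 000A without storing the result. The result is nonzero, so Z stays clear, and bit 31 is 1, so the N flag is set.)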
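The effect of TST and TEQ on the N and Z flags can be sketched in Python. This model covers only N and Z; it assumes an unshifted register operand, in which case the C flag is left unchanged, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def nz_flags(result):
    """N is bit 31 of the result; Z is 1 when the result is zero."""
    result &= MASK32
    return {"N": result >> 31, "Z": int(result == 0)}


def tst(rn, rm):
    """TST Rn, Rm: flags set from Rn & Rm, result discarded."""
    return nz_flags(rn & rm)


def teq(rn, rm):
    """TEQ Rn, Rm: flags set from Rn ^ Rm, result discarded."""
    return nz_flags(rn ^ rm)


print(tst(0x00000000, 0x0000000F))  # AND is zero, so Z is set
print(teq(0x0000000A, 0xFF000000))  # XOR has bit 31 set, so N is set
```

TEQ sets Z only when the two operands are equal, which is why it is described as a test for equality.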
5. Multiply Instructions
The multiply instructions multiply the contents of a pair of registers and, depending upon the instruction, accumulate the results with another register. The long multiplies accumulate onto a pair of registers representing a 64-bit value. The final result is placed in a destination register or a pair of registers.
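The multiply, multiply-accumulate and unsigned long multiply-accumulate operations described above can be modeled in Python as follows. The sketch assumes the operand order UMLAL RdLo, RdHi, Rm, Rs, where RdLo holds the lower 32 bits and RdHi the upper 32 bits of the 64-bit accumulator; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul(rm, rs):
    """MUL Rd, Rm, Rs -> Rd = lower 32 bits of Rm x Rs."""
    return (rm * rs) & MASK32


def mla(rm, rs, rn):
    """MLA Rd, Rm, Rs, Rn -> Rd = lower 32 bits of (Rm x Rs) + Rn."""
    return (rm * rs + rn) & MASK32


def umlal(rd_lo, rd_hi, rm, rs):
    """UMLAL RdLo, RdHi, Rm, Rs -> [RdHi,RdLo] += Rm x Rs (unsigned)."""
    accumulator = (rd_hi << 32) | rd_lo
    accumulator = (accumulator + rm * rs) & MASK64
    return accumulator & MASK32, accumulator >> 32  # (new RdLo, new RdHi)


lo, hi = umlal(0x00000000, 0x00000001, 0x0000000A, 0x0000000F)
print(hex(lo), hex(hi))
```

With RdLo = 0, RdHi = 1, Rm = 0xA and Rs = 0xF, the product 0x96 is added to the accumulator 0x0000 0001 0000 0000, so RdLo becomes 0x96 and RdHi stays 1.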
Eg. 1
Eg. 3 (University question):
PRE r0 = 0x0000 0000 r1 = 0x0000 0001 r2 = 0x0000 000A r3 = 0x0000 000F
UMLAL r0,r1,r2,r3
Soln: The syntax is UMLAL RdLo, RdHi, Rm, Rs, so the instruction performs [r1,r0] = [r1,r0] + (r2 x r3), with r1 holding the upper 32 bits and r0 the lower 32 bits.
r2 x r3 = 0x0000 0000 0000 0096
[r1,r0] = 0x0000 0001 0000 0000
[r1,r0] + (r2 x r3) = 0x0000 0001 0000 0096
POST r0 = 0x0000 0096 r1 = 0x0000 0001
2. Branch Instructions
A branch instruction changes the flow of execution or is used to call a routine. This type of instruction allows programs to have subroutines, if-then-else structures, and loops. The change of execution flow forces the program counter to point to a new address.
The ARMv5E instruction set includes four different branch instructions.
The address label is stored in the instruction as a signed pc-relative offset and must be within approximately 32 MB of the branch instruction. T refers to the Thumb bit in the cpsr. When instructions set T, the ARM switches to Thumb state.
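The pc-relative offset and the 32 MB limit can be illustrated with a Python sketch. The details it relies on come from the standard ARM-state encoding of B and BL rather than from these notes: the offset is a signed 24-bit word count, and it is applied relative to the address of the branch plus 8, because the pc runs two instructions ahead. The function names and the address 0x8000 are only illustrative.

```python
def sign_extend_24(imm24):
    """Interpret a 24-bit field as a signed two's complement number."""
    return imm24 - (1 << 24) if imm24 & 0x800000 else imm24


def branch_target(instr_addr, imm24):
    """Address reached by a B/BL at instr_addr with the given 24-bit field."""
    return (instr_addr + 8 + (sign_extend_24(imm24) << 2)) & 0xFFFFFFFF


def encode_offset(instr_addr, target_addr):
    """Return the 24-bit field for a branch from instr_addr to target_addr."""
    offset = target_addr - (instr_addr + 8)
    if offset % 4 != 0:
        raise ValueError("target must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target is outside the +/-32 MB branch range")
    return (offset >> 2) & 0xFFFFFF


# A branch to itself (an infinite loop) needs an offset of -8 bytes.
print(hex(encode_offset(0x8000, 0x8000)))
```

The range check shows where the figure of approximately 32 MB comes from: a signed 24-bit word offset covers 2^25 bytes in each direction.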
Eg. 1
A branch to a forward label skips the instructions that follow the B instruction, while a branch to a backward label creates a loop (an infinite loop if the branch is unconditional).
3. Load-Store Instructions
Load-store instructions transfer data between memory and processor registers. There are three types of load-store instructions:
1. Single-register transfer
2. Multiple-register transfer
3. Swap
Single-Register Transfer
These instructions are used for moving a single data item in and out of a register. The data types supported are signed and unsigned words (32-bit), half words (16-bit), and bytes. Various load-store single-register transfer instructions:
Single-Register Load-Store Addressing Modes The ARM instruction set provides different modes for addressing memory. These modes incorporate one of the indexing methods: 1. Pre-index with write back 2. Pre-index 3. Post-index
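The difference between the three indexing methods is where the offset is applied and whether the base register is updated. The Python sketch below models a word load in each mode. The register names, memory contents and the mode strings are hypothetical values chosen for illustration, and memory is modeled as a simple dictionary of word addresses.

```python
def ldr(regs, mem, rd, rn, offset, mode):
    """Model LDR Rd, [Rn, #offset] in one of the three indexing modes."""
    base = regs[rn]
    if mode == "preindex_writeback":   # LDR Rd, [Rn, #offset]!
        address = base + offset
        regs[rn] = address             # base register is updated
    elif mode == "preindex":           # LDR Rd, [Rn, #offset]
        address = base + offset        # base register is left unchanged
    elif mode == "postindex":          # LDR Rd, [Rn], #offset
        address = base                 # access uses the original base
        regs[rn] = base + offset       # base is updated after the access
    else:
        raise ValueError(f"unknown mode: {mode}")
    regs[rd] = mem[address]


def run(mode):
    """Run one load from a fixed starting state and return (r0, r1)."""
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    ldr(regs, mem, "r0", "r1", 4, mode)
    return regs["r0"], regs["r1"]


for mode in ("preindex_writeback", "preindex", "postindex"):
    r0, r1 = run(mode)
    print(mode, hex(r0), hex(r1))
```

Pre-index with write back is handy for stepping through an array before each access, while post-index steps after each access.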
Eg.
Multiple-Register Transfer
Load-store multiple instructions can transfer multiple registers between memory and the processor in a single instruction. The transfer occurs from a base address register Rn pointing into memory. Multiple-register transfer instructions are more efficient than single-register transfers for moving blocks of data around memory and for saving and restoring context and stacks.
Swap Instruction The swap instruction is a special case of a load-store instruction. It swaps the contents of memory with the contents of a register.
4. Exception generating instructions
One of these is the software interrupt instruction (SWI), which causes a software interrupt exception and so provides a mechanism for applications to call operating system routines.
When the processor executes an SWI instruction, it sets the program counter to the offset 0x8 in the vector table. The instruction also forces the processor mode to SVC, which allows an operating system routine to be called in a privileged mode. Each SWI instruction has an associated SWI number, which is used to represent a particular function call or feature. Eg.
This example is in SVC mode. In user mode you can read all cpsr bits, but you can only update the condition flag field f.
The cp field represents the coprocessor number between p0 and p15. The opcode fields describe the operation to take place on the coprocessor. The Cn, Cm, and Cd fields describe registers within the coprocessor.
Eg. Example shows a CP15 register being copied into a general-purpose register.
7. Extension instructions: The ARMv5E extensions provide many new instructions. One of the most important additions is the signed multiply accumulate instructions that operate on 16-bit data. New instructions are:
Saturated Arithmetic
Normal ARM arithmetic instructions wrap around when an integer value overflows. For example, 0x7fffffff + 1 = -0x80000000. Thus, when you design an algorithm, you have to be careful not to exceed the maximum representable value in a 32-bit integer. Eg.
In the example, registers r1 and r2 contain positive numbers. Register r2 is equal to 0x7fffffff, which is the maximum positive value you can store in 32 bits. In a perfect world adding these numbers together would result in a large positive number. Instead the value becomes negative and the overflow flag, V, is set. Using the ARMv5E instructions you can saturate the result: once the highest number is exceeded, the result remains at the maximum value of 0x7fffffff. This avoids the requirement for any additional code to check for possible overflows.
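The difference between wrapping and saturating addition can be sketched in Python. The model treats register values as signed 32-bit integers; add_wrap stands in for a normal ADD and qadd for the saturating QADD, and both function names are only illustrative.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def add_wrap(a, b):
    """Normal 32-bit addition: the result wraps around on overflow."""
    total = (a + b) & 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


def qadd(a, b):
    """Saturating addition: the result is clamped to the 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


print(add_wrap(0x7FFFFFFF, 1))  # wraps to the most negative value
print(qadd(0x7FFFFFFF, 1))      # stays at the maximum positive value
```

The same clamping applies in the negative direction, where the result stops at -0x80000000.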
Eg.