
UNIT - 1

ARM Cortex-M3 Processor

The ARM Cortex-M3 processor offers superior efficiency and flexibility and is specifically developed
for response- and power-sensitive applications. The EFM32 32-bit MCUs combine the Cortex-M3's low
power and high performance with Silicon Labs' unique low-power peripherals to create a superior
low-power embedded systems platform.

Low Power

 32-bit Cortex-M3 designed for low power operation

 High power efficiency with Thumb®-2 instruction set

 Small core footprint with integrated power mode support

High Performance

 Cortex-M3 delivering 1.25 DMIPS/MHz

 Separate data and instruction bus

 High code density and performance with Thumb-2 instruction set

 Excellent clock per instruction ratio

 Nested Vectored Interrupt Controller (NVIC) for outstanding interrupt handling

 Superior math capability

Thumb-2 Instruction Set Architecture (ISA)

Cortex-M3 supports 16- and 32-bit instructions available in the Thumb-2 instruction set. Both can be
mixed without extra complexity and without reducing the Cortex-M3 performance. Hardware divide
instructions and a number of multiply instructions give EFM32 users high data-crunching
throughput.

3-stage Pipeline Core Based on Harvard Architecture

The ARM Cortex-M3 3-stage pipeline comprises instruction fetch, instruction decode and instruction
execution. The Cortex-M3 also has separate buses for instructions and data; this Harvard architecture
reduces the bottlenecks common to shared data and instruction buses.

Quickly Servicing Critical Tasks and Interrupts

From the low energy modes, the EFM32's Cortex-M3 is active within 2 µs and delivers 1.25 DMIPS/MHz
on the Dhrystone 2.1 benchmark. The NVIC is an integral part of the Cortex-M3 processor and ensures
outstanding interrupt handling. It is possible to configure up to 240 physical interrupts with 1-256
levels of priority, and the Non-Maskable Interrupt further strengthens interrupt handling. For
embedded systems this enhanced determinism makes it possible to handle critical tasks in a known
number of cycles.

Reducing the 32-bit Footprint

The Cortex-M3 has a small footprint which reduces system cost. High 32-bit performance reduces an
application's active periods, the periods where the CPU is handling data. Reducing the active periods
increases the application's battery lifetime significantly, and the EFM32 can spend most of the time
in the efficient low energy modes.

Salient Features of the Cortex-M3

 32-bit microprocessor.

 32-bit data path, 32-bit register bank and 32-bit memory interfaces.

 Harvard Architecture – separate instruction bus and data bus.

 3-stage pipeline with branch speculation.

 Thumb-2 instruction set.

 No switching between ARM state and Thumb state.

 Instruction fetches are 32 bits. Up to two instructions can be fetched in one cycle. As
a result, there’s more available bandwidth for data transfer.

 ALU with hardware divide and single cycle multiply.

 Configurable Nested Vector Interrupt Controller (NVIC).

 Maximum of 240 external interrupts can be configured.

 Low gate count, suitable for low power designs.

 Memory Protection Unit (MPU).

 Operation Mode Selection – user and privilege modes.

 Advanced debug components.

Applications:

i) Low-cost microcontrollers:

The Cortex-M3 processor is ideally suited for low-cost microcontrollers, which are commonly used in
consumer products. Low power, high performance and ease of use are its advantages.

ii) Automotive:

The Cortex-M3 has high performance efficiency and low interrupt latency, allowing it to be used in
real-time systems.
iii) Data communication:

The processor's low power and high efficiency, coupled with the Thumb-2 instruction set, make the
Cortex-M3 ideal for many communication applications (e.g., Bluetooth, ZigBee).

iv) Industrial control:

In industrial control applications, simplicity, fast response and reliability are key factors. The
Cortex-M3's low interrupt latency makes it well suited to these applications.

v) Consumer products:

The Cortex-M3 is a small, highly efficient, low-power processor, and it supports an MPU, enabling
complex software to execute while providing robust memory protection.

Advantages:

 Many instructions are single cycle.

 Simultaneous data and instruction access can be performed.

 Faster time to market and easier code maintenance.

 Higher code density and reduced memory requirements.

 Instruction fetches are 32-bit, so more bandwidth is available for data transfer.

 It has advanced interrupt handling features.

ARM Cortex-M3 Architecture

The Cortex-M3 is a high-performance 32-bit ARM processor that offers significant benefits to
developers. It uses a Harvard architecture, with separate data and instruction buses for
communicating with the ROM and RAM memories. It has a 3-stage pipeline to fetch, decode and
execute instructions. The Cortex-M3 targets cost-sensitive devices, keeping the processor area
small while providing improved interrupt handling and system debug capabilities.
Architecture of ARM Cortex-M3

The Cortex-M3 implements the Thumb instruction set based on Thumb-2 technology, which ensures high
code density and reduces the program memory requirement. The Cortex-M3 instruction set provides
excellent performance thanks to its modern 32-bit architecture. The Cortex-M3 core is closely
integrated with the Nested Vectored Interrupt Controller (NVIC) to provide good interrupt
performance.

Additional Features of the Cortex-M3 Processor

It is a RISC Controller

 32-bit high performance CPU

 Compact 3-stage pipeline


It has THUMB-2 technology

 Optimal mix of 16- and 32-bit instructions

 High performance

It supports tools and RTOS, and has CoreSight debug and trace

 JTAG or 2-pin serial wire debug connection

 Support for multiple processors

Low power Modes

 It supports sleep modes

 Controlled by software

 Multiple power domains

Nested vectored interrupt controller (NVIC)

 Low-latency, low-jitter interrupt response

 No need for assembly programming

Registers

1. The Cortex-M3 has registers R0 through R15. R13 (the stack pointer) is banked, with only
one copy of R13 visible at a time.

 R0 – R12: General Purpose Registers. These are 32 bit registers for data operations.
Some 16-bit Thumb instructions can only access a subset of these registers (low
registers R0-R7).

 R13: Stack Pointers. Contains two stack pointers. They are banked so that only one
is visible at a time.

 Main Stack Pointer (MSP) – used by the operating system and exception handlers.

 Process Stack Pointer (PSP) – used by the application code.

 R14: Link Register. When a subroutine is called, the return address is stored in the
link register.

 R15: The Program Counter. The program counter is the current program address.

2. The Cortex-M3 also has a number of special registers. They are -

 Program Status registers (PSR)

 Interrupt Mask registers (PRIMASK, FAULTMASK and BASEPRI).


 Control register (CONTROL)

3. The Cortex-M3 has 18 registers in total compared to 37 registers for traditional ARM.
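
As a small illustration of this register model, the sketch below reads the two stack pointers and the
CONTROL register from C. It assumes a CMSIS-based toolchain (which provides the __get_MSP, __get_PSP
and __get_CONTROL intrinsics) and a hypothetical device header named device.h; the variable names are
illustrative only.

    #include <stdint.h>
    #include "device.h"                     /* hypothetical CMSIS device header for your MCU */

    volatile uint32_t msp_val, psp_val, control_val;   /* for inspection in a debugger */

    void capture_core_registers(void)
    {
        msp_val     = __get_MSP();      /* Main Stack Pointer: used by the OS and exception handlers */
        psp_val     = __get_PSP();      /* Process Stack Pointer: used by application (thread) code  */
        control_val = __get_CONTROL();  /* bit 0: 0 = privileged, 1 = unprivileged (Thread mode)     */
                                        /* bit 1: 0 = MSP in use, 1 = PSP in use (Thread mode)       */
    }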

ARM Microcontroller Register Modes

An ARM microcontroller has a load-store, reduced instruction set computer (RISC) architecture, which
means the core cannot operate directly on memory: data operations must be done through the registers,
and information is stored in memory and accessed by address. A classic (traditional) ARM core has 37
registers, of which 31 are general-purpose registers and 6 are status registers, and it uses seven
processor modes to run tasks.

 USER Mode

 FIQ Mode

 IRQ Mode

 SVC Mode

 UNDEFINED Mode

 ABORT Mode

 Monitor Mode

Register Modes

USER Mode: The User mode is the normal mode; it has the least number of accessible registers. It has
no SPSR and only limited access to the CPSR.

FIQ and IRQ: FIQ and IRQ are the two interrupt modes of the CPU. FIQ is the fast-interrupt mode and
IRQ is the standard-interrupt mode. The FIQ mode has five additional banked registers (R8-R12) to
provide more flexibility and higher performance when critical interrupts are handled.
SVC Mode: The Supervisor mode is entered on reset or when a software interrupt (SVC/SWI) instruction
is executed.

Undefined Mode: The Undefined mode is entered (traps) when an illegal or undefined instruction is
executed.

In addition to these modes, the core supports different instruction states. In ARM state the core
uses full 32-bit instructions over a 32-bit data bus for fast data flow.

THUMB State: In Thumb state, the 32-bit ARM instructions are replaced by compressed 16-bit
instructions, which improves code density and processing efficiency.

THUMB-2: With Thumb-2 technology, instructions can be either 16-bit or 32-bit, which increases the
performance of the ARM Cortex-M3 microcontroller. The ARM Cortex-M3 microcontroller uses only
Thumb-2 instructions.

Some of the registers are reserved in each mode for the specific use of the core. The reserved
registers are

 Stack Pointer (SP).

 Link Register (LR).

 Program Counter (PC).

 Current Program Status Register (CPSR).

 Saved Program Status Register (SPSR).

The reserved registers are used for specific functions. The CPSR and SPSR contain the status and
control bits, which record the current operating mode, the interrupt enable/disable flags, and the
ALU status (condition) flags. The ARM core operates in one of two states: ARM (32-bit) state or
Thumb state.

ARM-Cortex Microcontroller Programming

Nowadays, microcontroller vendors offer 32-bit microcontrollers based on the ARM Cortex-M3
architecture, and many embedded system developers are starting to use these 32-bit microcontrollers
in their projects. ARM microcontrollers support both low-level and high-level programming languages.
Some traditional microcontroller architectures have many limitations that make it difficult to use
high-level programming languages.
Programming

For example, the memory size may be limited and the performance may not be sufficient. ARM
microcontrollers run at 100 MHz and above with high performance, so they support high-level
languages well. ARM microcontrollers can be programmed with different IDEs such as Keil µVision3,
Keil µVision4, CooCox and so on. An 8-bit microcontroller uses 8-bit instructions, whereas an ARM
Cortex-M microcontroller uses the 16/32-bit Thumb-2 instructions.

Operation Modes

The processor supports two modes of operation, Thread mode and Handler mode:

 Thread mode is entered on Reset, and can be entered as a result of an exception return.
Privileged and User (Unprivileged) code can run in Thread mode.

 Handler mode is entered as a result of an exception. All code is privileged in Handler mode.

The operation modes (thread mode and handler mode) determine whether the processor is running
a normal program or running an exception handler like an interrupt handler or system exception
handler. The privilege levels provide a mechanism for safeguarding memory accesses to critical
regions as well as providing a basic security model.

When the processor is running a main program (thread mode), it can be either in a privileged state
or a user state, but exception handlers can only be in a privileged state. When the processor exits
reset, it is in thread mode with privileged access right. In this state, a program has access to all
memory ranges and can use all supported instructions.

Software in the privileged access level can switch the program into the user access level using the
control register. When an exception takes place, the processor will always switch back to the
privileged state and return to the previous state when exiting the exception handler. A user program
cannot change back to the privileged state by writing to the control register. It has to go through an
exception handler that programs the control register to switch the processor back into the privileged
access level when returning to thread mode.
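
A minimal sketch of this privilege flow, assuming CMSIS intrinsics, GCC-style inline assembly and a
hypothetical device.h header: Thread-mode code drops to the user (unprivileged) level through the
CONTROL register, and only an exception handler (here an SVC handler, named as in typical CMSIS
startup code) can restore the privileged level on return.

    #include "device.h"                       /* hypothetical CMSIS device header */

    void drop_to_user_level(void)
    {
        __set_CONTROL(__get_CONTROL() | 1u);  /* CONTROL[0] = 1 -> unprivileged Thread mode */
        __ISB();                              /* ensure the new privilege level takes effect */
    }

    void request_privilege(void)
    {
        __asm volatile ("svc #0");            /* trap into Handler mode (always privileged) */
    }

    void SVC_Handler(void)                    /* exception handler, runs privileged */
    {
        __set_CONTROL(__get_CONTROL() & ~1u); /* Thread mode will be privileged again on return */
    }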

Vector Table

The vector table defines the entry addresses of the processor exceptions and the device-specific
interrupts. It is typically located at the beginning of the program memory; however, using interrupt
vector remapping it can be relocated to RAM. The symbol __Vectors is the address of the vector table
in the startup code, and the register SCB->VTOR holds the start address of the vector table.

An Armv8-M implementation with TrustZone provides two vector tables:

 vector table for Secure handlers

 vector table for Non-Secure handlers
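
A hedged sketch of the relocation described above, under CMSIS: the flash table exported by the
startup code as __Vectors is copied to a suitably aligned RAM array and SCB->VTOR is pointed at it.
The vector count, alignment and section handling are assumptions that depend on your device and
linker script.

    #include <stdint.h>
    #include <string.h>
    #include "device.h"                            /* hypothetical CMSIS device header */

    #define NUM_VECTORS  (16u + 32u)               /* 16 system exceptions + assumed 32 IRQs */

    /* VTOR requires alignment to the next power of two >= the table size */
    static uint32_t ram_vectors[NUM_VECTORS] __attribute__((aligned(512)));

    void relocate_vector_table(void)
    {
        extern uint32_t __Vectors[];               /* flash vector table from the startup code */

        memcpy(ram_vectors, __Vectors, sizeof(ram_vectors));
        __disable_irq();
        SCB->VTOR = (uint32_t)ram_vectors;         /* point the processor at the RAM copy */
        __DSB();                                   /* make sure the write completes */
        __enable_irq();
        /* individual entries in ram_vectors[] can now be patched at run time */
    }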

Nested Vector Interrupt Controller (NVIC)

The Cortex-M3 processor includes an interrupt controller called the Nested Vectored Interrupt
Controller (NVIC). It is closely coupled to the processor core and provides a number of features as
follows:

 Nested interrupt support

 Vectored interrupt support

 Dynamic priority changes support

 Reduction of interrupt latency

 Interrupt masking

Nested Interrupt support - All the external interrupts and most of the system exceptions can be
programmed to different priority levels. When an interrupt occurs, the NVIC compares the priority of
this interrupt with the current running priority level. If the priority of the new interrupt is
higher than the current level, the handler of the new interrupt preempts the currently running task
or handler.

Vectored Interrupt Support - When an interrupt is accepted, the starting address of the interrupt
service routine (ISR) is located from a vector table in memory.

Dynamic Priority Changes Support - Priority levels of interrupts can be changed by software during
run time. Interrupts that are being serviced are blocked from further activation until the ISR is
completed, so their priority can be changed without risk of accidental reentry.

Reduction of Interrupt Latency - The Cortex-M3 processor also includes a number of advanced
features to lower the interrupt latency. These include automatic saving and restoring some register
contents and reducing delay in switching from one ISR to another.
Interrupt Masking - Interrupts and system exceptions can be masked based on their priority level or
masked completely using the interrupt masking registers BASEPRI, PRIMASK, and FAULTMASK.
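
These NVIC features are reachable from C through a few CMSIS core functions. The sketch below is
illustrative only: UART0_IRQn, UART0_IRQHandler and device.h are placeholders for whatever your
device header and startup code actually define, and __NVIC_PRIO_BITS is the priority-bit count
supplied by the device header.

    #include "device.h"                               /* hypothetical CMSIS device header */

    void configure_uart_interrupt(void)
    {
        NVIC_SetPriority(UART0_IRQn, 2u);             /* dynamic priority: lower value = higher priority */
        NVIC_EnableIRQ(UART0_IRQn);                   /* enable this external interrupt line in the NVIC */

        /* interrupt masking by priority: block level 3 and below (numerically higher) */
        __set_BASEPRI(3u << (8u - __NVIC_PRIO_BITS));
    }

    void UART0_IRQHandler(void)                       /* vectored: name taken from the vector table */
    {
        /* clear the peripheral's interrupt flag here (device specific), then service the request */
    }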

Exceptions and interrupts

The Cortex-M3 processor supports interrupts and system exceptions. The processor and the NVIC
prioritize and handle all exceptions. An exception changes the normal flow of software control. The
processor uses Handler mode to handle all exceptions except for reset.

 Exceptions are numbered:

 1 to 15 for system exceptions.

 16 and above for external interrupt inputs.

 Most of the exceptions have programmable priority, and a few have fixed priority.

 Supports 1 to 240 interrupts.

External Interrupts

 When an enabled exception occurs but cannot be carried out immediately (for example, because a
higher-priority interrupt service routine is running or the interrupt mask register is set), it will
be pended.

 This means that a register in the NVIC (the pending status) will hold the exception request until
the exception can be carried out.

 This is different from traditional ARM processors, where on interrupt entry the CPSR is copied to
the SPSR, the core switches to ARM state, and IRQs are disabled.

Instruction Set

 In the classic ARM instruction set, all instructions are 32 bits long (the Cortex-M3 itself uses
the 16/32-bit Thumb-2 encodings).

 Most instructions execute in a single cycle.

 Every instruction can be conditionally executed.

 A load/store architecture:

   • Data processing instructions act only on registers: three-operand format, with a combined ALU
     and shifter for high-speed bit manipulation.
   • Specific memory access instructions with powerful auto-indexing addressing modes: 32-bit and
     8-bit data types, and also 16-bit data types on ARM architecture v4.
   • Flexible multiple-register load and store instructions.

Reset Sequence

The addressable memory space of the processor always starts at zero, i.e., 0x00000000. The beginning
of the memory space, starting from zero, actually contains the "vector table". The vector table is
simply a table holding the initial stack pointer value and the addresses of the various exception
handlers.

1) After reset, the PC is loaded with the address 0x00000000.

2) The processor then fetches the value at 0x00000000 into the MSP, i.e., the Main Stack Pointer.

3) Next, the processor reads the address of the reset handler from location 0x00000004 into the
Program Counter.

4) The processor then jumps to your reset handler and starts executing the first instruction you
wrote there.

5) After the required initialization, you can call main() from the reset handler; that is how
control reaches the main() function of your application.
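
A stripped-down startup sketch in C showing how the first two words of the vector table supply steps
2 and 3 above. It uses GCC-style attributes; _estack, the .isr_vector section name and the omitted
.data/.bss initialisation are assumptions tied to a typical linker script, not a specific vendor's
startup file.

    #include <stdint.h>

    extern uint32_t _estack;                 /* top of stack, provided by the linker script (assumed) */
    extern int main(void);

    void Reset_Handler(void)
    {
        /* real startup code would also copy .data and zero .bss here */
        main();
        for (;;) { }                         /* trap if main() ever returns */
    }

    /* first entries of the vector table, placed at address 0x00000000 by the linker */
    __attribute__((section(".isr_vector"), used))
    const uint32_t vector_table[] = {
        (uint32_t)&_estack,                  /* 0x00000000: initial Main Stack Pointer value */
        (uint32_t)Reset_Handler,             /* 0x00000004: address of the reset handler     */
        /* ... remaining exception and interrupt vectors ... */
    };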

Unified Assembler Language

Unified Assembler Language (UAL) is a common syntax for ARM and Thumb instructions. It
supersedes earlier versions of both the ARM and Thumb assembler languages. Code written using
UAL can be assembled for ARM or Thumb for any ARM processor. The assembler faults the use of
unavailable instructions.

RealView Compilation Tools (RVCT) v2.1 and earlier can only assemble the pre-UAL syntax. Later
versions of RVCT and the ARM Compiler toolchain can assemble code written in both pre-UAL and UAL
syntax.

By default, the assembler expects source code to be written in UAL. The assembler accepts UAL
syntax if any of the directives CODE32, ARM, THUMB, or THUMBX is used or if you assemble with
any of the --32, --arm, --thumb, or --thumbx command line options. The assembler also accepts
source code written in pre-UAL ARM assembly language when you assemble with CODE32 or ARM.

The assembler accepts source code written in pre-UAL Thumb assembly language when
you assemble using the --16 command line option, or the CODE16 directive in the source code.

Memory Map

The Cortex-M3 has a predefined memory map, which allows built-in peripherals, such as the interrupt
controller and debug components, to be accessed by simple memory access instructions. The
predefined memory map also allows the Cortex-M3 processor to be highly optimized for speed and
ease of integration in system-on-a-chip (SoC) designs.

The Cortex-M3 design has an internal bus infrastructure optimized for this memory usage. In
addition, the design allows these regions to be used differently. For example, data memory can still
be put into the CODE region, and program code can be executed from an external Random Access
Memory (RAM) region. The Cortex-M3 memory map is outlined in figure as shown.
Figure : Processor memory map

Memory interfaces

The following summarizes which bus interface serves each memory region:

Code: Instruction fetches are performed over the ICode bus; data accesses are performed over the
DCode bus.

SRAM: Instruction fetches and data accesses are performed over the system bus.

SRAM bit-band: Alias region. Data accesses are aliases; instruction accesses are not aliases.

Peripheral: Instruction fetches and data accesses are performed over the system bus.

Peripheral bit-band: Alias region. Data accesses are aliases; instruction accesses are not aliases.

External RAM: Instruction fetches and data accesses are performed over the system bus.

External Device: Instruction fetches and data accesses are performed over the system bus.

Private Peripheral Bus: Accesses to the Instrumentation Trace Macrocell (ITM), Nested Vectored
Interrupt Controller (NVIC), Flash Patch and Breakpoint (FPB), Data Watchpoint and Trace (DWT), and
Memory Protection Unit (MPU) are performed over the processor-internal Private Peripheral Bus (PPB).
Accesses to the Trace Port Interface Unit (TPIU), Embedded Trace Macrocell (ETM), and the system
areas of the PPB memory map are performed over the external PPB interface. This memory region is
Execute Never (XN), so instruction fetches are prohibited; an MPU, if present, cannot change this.

Memory access attributes

The memory map shows what is included in each memory region. Aside from decoding which
memory block or device is accessed, the memory map also defines the memory attributes of the
access. The memory attributes you can find in the Cortex®-M3 and Cortex-M4 processors include the
following:

Bufferable: Write to memory can be carried out by a write buffer while the processor continues on
to next instruction execution.

Cacheable: Data obtained from memory read can be copied to a memory cache so that next time it
is accessed the value can be obtained from the cache to speed up program execution.

Executable: The processor can fetch and execute program code from this memory region.

Sharable: Data in this memory region could be shared by multiple bus masters. The memory system
needs to ensure coherency of data between different bus masters in the shareable memory region.

The processor bus interfaces output the memory access attribute information to the memory system
for each instruction and data transfer. The default memory attribute settings can be overridden if
the MPU is present and the MPU region configurations are programmed differently from the default.
In most existing Cortex-M3 and Cortex-M4 microcontrollers, only the Executable and Bufferable
attributes affect the operation of the applications. The Cacheable and Bufferable attributes are
usually used by a cache controller, which specifies memory types and caching scheme.

The Sharable memory attribute is needed in systems with multiple processors and multiple cache
units with cache coherency control. When a data access is indicated as Sharable, the cache
controller needs to ensure the value is coherent with other cache units as it could have been cached
and modified by another processor.

Though the Cortex-M3 and Cortex-M4 processors do not have a cache memory or cache controller, a
cache unit can be added on the microcontroller, which can use the memory attribute information to
define the memory access behaviors. In addition, the cache attributes might also affect the
operation of memory controllers for on-chip memory and off-chip memory, depending on the
memory controllers used by the chip manufacturers.

The Bufferable attribute is used inside the processor. In order to provide better performance, the
Cortex-M3 and Cortex-M4 processors support a single entry write buffer on the bus interface. A data
write to a bufferable memory region can be carried out in a single clock cycle and continue to the
next instruction execution, even if the actual transfer needs several clock cycles to be completed on
the bus interface (Figure 6.17).
MPU access permission attributes

This section describes the MPU access permission attributes. The access permission bits, TEX, C, B,
S, AP, and XN, of the MPU Region Attribute and Size Register (RASR), control access to the
corresponding memory region. If an access is made to an area of memory without the required
permissions, the MPU generates a permission fault.

TEX, C, B encoding

TEX    C  B  Description                                            Memory type       Region shareability
b000   0  0  Strongly ordered.                                      Strongly ordered  Shareable
b000   0  1  Shared device.                                         Device            Shareable
b000   1  0  Outer and inner write-through. No write allocate.      Normal            S
b000   1  1  Outer and inner write-back. No write allocate.         Normal            S
b001   0  0  Outer and inner noncacheable.                          Normal            S
b001   0  1  Reserved.                                              Reserved          Reserved
b001   1  0  Implementation-defined.                                -                 -
b001   1  1  Outer and inner write-back. Write and read allocate.   Normal            S
b010   0  0  Nonshared device.                                      Device            Not shareable
b010   0  1  Reserved.                                              Reserved          Reserved
b010   1  X  Reserved.                                              Reserved          Reserved
b1BB   A  A  Cached memory; BB = outer policy, AA = inner policy.   Normal            S

Bit Band Operations

Bit-banding maps a complete word of memory onto a single bit in the bit-band region. For example,
writing to one of the alias words sets or clears the corresponding bit in the bit-band region. This
enables every individual bit in the bit-banding region to be directly accessible from a word-aligned
address using a single LDR instruction. It also enables individual bits to be toggled without
performing a read-modify-write sequence of instructions.

The processor memory map includes two bit-band regions. These occupy the lowest 1MB of the
SRAM and Peripheral memory regions respectively. These bit-band regions map each word in an
alias region of memory to a bit in a bit-band region of memory.
The System bus interface contains logic that controls bit-band accesses as follows:

 It remaps bit-band alias addresses to the bit-band region.

 For reads, it extracts the requested bit from the read byte, and returns this in the Least
Significant Bit (LSB) of the read data returned to the core.

 For writes, it converts the write to an atomic read-modify-write operation.

 The processor does not stall during bit-band operations unless it attempts to access the
System bus while the bit-band operation is being carried out.

The memory map has two 32-MB alias regions that map to two 1-MB bit-band regions:

 Accesses to the 32-MB SRAM alias region map to the 1-MB SRAM bit-band region.

 Accesses to the 32-MB peripheral alias region map to the 1-MB peripheral bit-band region.

A mapping formula shows how to reference each word in the alias region to a corresponding bit, or
target bit, in the bit-band region. The mapping formula is:

 bit_word_offset = (byte_offset × 32) + (bit_number × 4)

 bit_word_addr = bit_band_base + bit_word_offset

where:

 bit_word_offset is the position of the target bit in the bit-band memory region.

 bit_word_addr is the address of the word in the alias memory region that maps to the
targeted bit.

 bit_band_base is the starting address of the alias region.

 byte_offset is the number of the byte in the bit-band region that contains the targeted bit.

 bit_number is the bit position, 0 to 7, of the targeted bit.

Example showing Bit Band Operation

 The alias word at 0x23FFFFE0 maps to bit [0] of the bit-band byte
at 0x200FFFFF: 0x23FFFFE0 = 0x22000000 + (0xFFFFF*32) + 0*4.

 The alias word at 0x23FFFFFC maps to bit [7] of the bit-band byte
at 0x200FFFFF: 0x23FFFFFC = 0x22000000 + (0xFFFFF*32) + 7*4.

 The alias word at 0x22000000 maps to bit [0] of the bit-band byte
at 0x20000000: 0x22000000 = 0x22000000 + (0*32) + 0*4.

 The alias word at 0x2200001C maps to bit [7] of the bit-band byte
at 0x20000000: 0x2200001C = 0x22000000 + (0*32) + 7*4.
Figure : Bit-band mapping

How Bit-banding Works

Bit-banding is a term that ARM uses to describe a feature that is available on the Cortex M3 and M4
CPU cores. Basically, the device takes a region of memory (the Bit-band region) and maps each bit in
that region to an entire word in a second memory region (the Bit-band Alias Region).

The benefit of Bit-banding is that a write to a word in the alias region performs a write to the
corresponding bit in the Bit-band region. Also, reading a word in the alias region will return the value
of the corresponding bit in the Bit-band region. These operations take a single machine instruction
thus eliminate race conditions. This is especially useful for interacting with peripheral registers
where it is often necessary to set and clear individual bits.

The image below shows a byte in the bit-band region on the top. The bottom row of bytes represents
the bit-band alias region. For simplicity, only 8-bit words are shown here; on the Cortex-M3 and M4
the alias region would contain 32-bit words.
To use this feature, you must first get the address of the word in the alias region that corresponds
to the bit you wish to read or write. This is done in C with a simple pre-processor macro.
Bit-banding is not the only solution to this problem: all common architectures have mechanisms for
atomically setting and clearing bits. ARM's approach is elegant in that it can be exercised with ANSI
C, while most other implementations require special C extensions or the use of assembly language.
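
A commonly used form of that macro is sketched below for the Cortex-M3 SRAM and peripheral bit-band
regions; the base addresses come from the memory map above, while SOME_PERIPH_REG and the chosen bit
in the helper functions are purely illustrative.

    #include <stdint.h>

    /* bit_word_addr = bit_band_base + (byte_offset * 32) + (bit_number * 4) */
    #define BITBAND_SRAM(addr, bit)   (*(volatile uint32_t *)(0x22000000u + \
                                       (((uint32_t)(addr) - 0x20000000u) * 32u) + ((bit) * 4u)))
    #define BITBAND_PERIPH(addr, bit) (*(volatile uint32_t *)(0x42000000u + \
                                       (((uint32_t)(addr) - 0x40000000u) * 32u) + ((bit) * 4u)))

    /* usage sketch: set or clear bit 7 of a (hypothetical) peripheral register atomically */
    #define SOME_PERIPH_REG  0x40020000u

    static inline void set_bit7(void) { BITBAND_PERIPH(SOME_PERIPH_REG, 7) = 1u; }
    static inline void clr_bit7(void) { BITBAND_PERIPH(SOME_PERIPH_REG, 7) = 0u; }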

Unaligned Transfers

The Cortex-M3 supports unaligned transfers on single accesses. Data memory accesses can be defined
as aligned or unaligned. Traditionally, ARM processors such as the ARM7, ARM9 and ARM10 allow only
aligned transfers: when accessing memory, a word transfer must have address bits [1] and [0] equal
to 0, and a halfword transfer must have address bit [0] equal to 0. For example, word data can be
located at 0x1000 or 0x1004, but not at 0x1001, 0x1002 or 0x1003.

In the Cortex-M3, unaligned transfers are supported in normal memory accesses (such as the LDR,
LDRH, STR and STRH instructions). There are a number of limitations:

 Unaligned transfers are not supported in load/store multiple instructions.

 Stack operations (push/pop) must be aligned.

 Exclusive accesses (such as LDREX or STREX) must be aligned; otherwise, a fault exception will be
triggered.

 Unaligned transfers are not supported in bit-band operations; results will be unpredictable if you
attempt them.

When unaligned transfers are used, they are actually converted into multiple aligned transfers by
the processor's bus interface unit. This conversion is transparent, so application programmers do
not have to worry about it. However, when an unaligned transfer takes place, it is broken into
separate transfers, so a single data access takes more clock cycles, which might not be acceptable
where high performance is required. To get the best performance, it is worth making sure that data
are aligned properly.

Exclusive Accesses

The Cortex-M3 has no SWP (swap) instruction, which was used for semaphore operations in traditional
ARM processors; it has been replaced by exclusive access operations. In newer ARM processors, read
and write accesses can be carried out on separate buses, so the SWP instruction can no longer make a
memory access atomic, because the read and write of a locked transfer sequence must be on the same
bus. Locked transfers are therefore replaced by exclusive accesses. The concept of exclusive access
is simple but, unlike SWP, it allows for the possibility that the memory location used for a
semaphore could be accessed by another bus master or by another process running on the same
processor.

To allow exclusive accesses to work properly in a multi-processor environment, an additional
hardware block called an exclusive access monitor is required. This monitor checks transfers to
shared address locations and reports to the processor whether an exclusive access succeeded. The
processor bus interface also provides additional control signals to this monitor to indicate whether
a transfer is an exclusive access.
If the memory location has been accessed by another bus master between the exclusive read and the
exclusive write, the exclusive access monitor flags an exclusive-access failure through the bus
system when the processor attempts the exclusive write.
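
A sketch of a simple lock built on exclusive accesses, assuming the CMSIS __LDREXW, __STREXW and
__DMB intrinsics and a hypothetical device.h header; a real implementation would also consider
interrupt masking and sleeping while spinning.

    #include <stdint.h>
    #include "device.h"                      /* hypothetical CMSIS device header */

    static volatile uint32_t lock = 0;       /* 0 = free, 1 = taken */

    void lock_acquire(void)
    {
        uint32_t failed;
        do {
            while (__LDREXW(&lock) != 0u) {  /* exclusive read: spin while the lock is held */
                /* optionally __WFE() here to save power */
            }
            failed = __STREXW(1u, &lock);    /* exclusive write: 0 on success, 1 if exclusivity lost */
        } while (failed != 0u);
        __DMB();                             /* memory barrier before touching the protected data */
    }

    void lock_release(void)
    {
        __DMB();                             /* ensure protected accesses complete first */
        lock = 0u;                           /* a normal store is enough to free the lock */
    }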

The Pipeline

The Cortex-M3 processor has a three-stage pipeline. The pipeline stages are instruction fetch,
instruction decode, and instruction execution.

The Three-Stage Pipeline in the Cortex-M3

Some people might argue that there are four stages because of the pipeline behavior in the bus
interface when it accesses memory, but this stage is outside the processor, so the processor itself
still has only three stages.

When running programs with mostly 16-bit instructions, you will find that the processor might not
fetch instructions in every cycle. This is because the processor fetches up to two instructions (32-bit)
in one go, so after one instruction is fetched, the next one is already inside the processor. In this
case, the processor bus interface may try to fetch the instruction after the next or, if the buffer is
full, the bus interface could be idle. Some of the instructions take multiple cycles to execute; in this
case, the pipeline will be stalled.

When a branch instruction is executed, the pipeline is flushed, and the processor has to fetch
instructions from the branch destination to fill up the pipeline again. However, the Cortex-M3
processor supports a number of conditional-execution instructions in the ARMv7-M architecture (the
Thumb-2 IT instruction block), so some short-distance branches can be avoided by replacing them with
conditionally executed code.

Due to the pipeline nature of the processor and to ensure that the program is compatible with
Thumb codes, when the program counter is read during instruction execution, the read value will be
the address of the instruction plus 4. This offset is constant, independent of the combination of 16-
bit Thumb instructions and 32-bit Thumb-2 instructions. This ensures consistency between Thumb
and Thumb-2.

Inside the instruction pre-fetch unit of the processor core, there is also an instruction buffer. This
buffer allows additional instructions to be queued before they are needed. This buffer prevents the
pipeline being stalled when the instruction sequence contains 32-bit Thumb-2 instructions that are
not word aligned. However, this buffer does not add an extra stage to the pipeline, so it does not
increase the branch penalty.
Figure : Use of a Buffer in Instruction Fetch Unit to Improve 32-Bit Instruction Handling

Bus Interfaces on the Cortex-M3

Unless you are designing a SoC product using the Cortex-M3 processor, it is unlikely that you can
directly access the bus interface signals described here. Normally the chip manufacturer will hook up
all the bus signals to memory blocks and peripherals, and in a few cases, you might find that the chip
manufacturer connected the bus to a bus bridge and allows external bus systems to be connected
off-chip. The bus interfaces on the Cortex-M3 processor are based on AHB-Lite and APB protocols,
which are documented in the AMBA Specification.

The I-Code Bus

The I-Code bus is a 32-bit bus based on the AHB-Lite bus protocol for instruction fetches in memory
regions from 0x00000000 to 0x1FFFFFFF. Instruction fetches are performed in word size, even for
Thumb instructions. Therefore, during execution, the CPU core could fetch up to two Thumb
instructions at a time.

The D-Code Bus

The D-Code bus is a 32-bit bus based on the AHB-Lite bus protocol; it is used for data access in
memory regions from 0x00000000 to 0x1FFFFFFF. Although the Cortex-M3 processor supports
unaligned transfers, you won’t get any unaligned transfer on this bus, because the bus interface on
the processor core converts the unaligned transfers into aligned transfers for you. Therefore, devices
(such as memory) that attach to this bus need only support AHB-Lite (AMBA 2.0) aligned transfers.
The System Bus

The system bus is a 32-bit bus based on the AHB-Lite bus protocol; it is used for instruction fetches
and data accesses in the memory regions 0x20000000 to 0xDFFFFFFF and 0xE0100000 to 0xFFFFFFFF. As
with the D-Code bus, all transfers are aligned.

The External Private Peripheral Bus

The External Private Peripheral bus (External PPB) is a 32-bit bus based on the APB bus protocol. This
is intended for private peripheral accesses in memory regions 0xE0040000 to 0xE00FFFFF. However,
since some part of this APB memory is already used for TPIU, ETM, and the ROM table, the memory
region that can be used for attaching extra peripherals on this bus is only 0xE0042000 to
0xE00FF000. Transfers on this bus are word aligned.

The Debug Access Port Bus

The Debug Access Port (DAP) bus interface is a 32-bit bus based on an enhanced version of the APB
specification. This is for attaching debug interface blocks such as SWJ-DP or SW-DP. Do not use this
bus for other purposes.
UNIT – 2

Exception Model

The processor and the Nested Vectored Interrupt Controller (NVIC) prioritize and handle all
exceptions. All exceptions are handled in Handler mode. Processor state is automatically stored to
the stack on an exception, and automatically restored from the stack at the end of the Interrupt
Service Routine (ISR). The vector is fetched in parallel to the state saving, enabling efficient interrupt
entry. The processor supports tail-chaining that enables back-to-back interrupts without the
overhead of state saving and restoration. The following features enable efficient, low latency
exception handling:

 Automatic state saving and restoring. The processor pushes state registers on the stack
before entering the ISR, and pops them after exiting the ISR with no instruction overhead.

 Automatic reading of the vector table entry that contains the ISR address in code memory or
data SRAM. This is performed in parallel to the state saving.

Note

Vector table entries are ARM/Thumb interworking compatible.

This causes bit [0] of the vector value to load into the EPSR T-bit on exception entry. Creating a table
entry with bit [0] clear generates an INVSTATE fault on the first instruction of the handler
corresponding to this vector.

 Support for tail-chaining. In tail-chaining, the processor handles back-to-back interrupts without
popping and pushing registers between ISRs.

 Dynamic reprioritization of interrupts.

 Closely-coupled interface between the processor core and the NVIC to enable early
processing of interrupts and processing of late-arriving interrupts with higher priority.

 Configurable number of interrupts, from 1 to 240.

 Configurable number of interrupt priorities, from 3 to 8 bits (8 to 256 levels).

 Separate stacks and privilege levels for Handler and Thread modes.

 ISR control transfer using the calling conventions of the C/C++ standard ARM Architecture
Procedure Call Standard (AAPCS).

 Priority masking to support critical regions.

Note

The number of interrupts, and bits of interrupt priority, are configured during implementation.
Software can choose only to enable a subset of the configured number of interrupts, and can choose
how many bits of the configured priorities to use.
Exception types

The exception types are:

Reset

Reset is invoked on power up or a warm reset. The exception model treats reset as a special form of
exception. When reset is asserted, the operation of the processor stops, potentially at any point in
an instruction. When reset is deasserted, execution restarts from the address provided by the reset
entry in the vector table. Execution restarts as privileged execution in Thread mode.

NMI

A Non-Maskable Interrupt (NMI) can be signalled by a peripheral or triggered by software. This is the
highest priority exception other than reset. It is permanently enabled and has a fixed priority of -2.
NMIs cannot be:

 masked or prevented from activation by any other exception

 preempted by any exception other than Reset.

HardFault

A HardFault is an exception that occurs because of an error during exception processing, or because
an exception cannot be managed by any other exception mechanism. HardFaults have a fixed
priority of -1, meaning they have higher priority than any exception with configurable priority.

MemManage

A MemManage fault is an exception that occurs because of a memory protection related fault. The
MPU or the fixed memory protection constraints determines this fault, for both instruction and data
memory transactions. This fault is always used to abort instruction accesses to Execute Never (XN)
memory regions.

BusFault

A BusFault is an exception that occurs because of a memory related fault for an instruction or data
memory transaction. This might be from an error detected on a bus in the memory system.

UsageFault

A UsageFault is an exception that occurs because of a fault related to instruction execution. This
includes:

 an undefined instruction

 an illegal unaligned access

 invalid state on instruction execution


 an error on exception return.

The following can cause a UsageFault when the core is configured to report them:

 an unaligned address on word and halfword memory access

 division by zero.

SVCall

A supervisor call (SVC) is an exception that is triggered by the SVC instruction. In an OS environment,
applications can use SVC instructions to access OS kernel functions and device drivers.

PendSV

PendSV is an interrupt-driven request for system-level service. In an OS environment, use PendSV for
context switching when no other exception is active.

SysTick

A SysTick exception is an exception the system timer generates when it reaches zero. Software can
also generate a SysTick exception. In an OS environment, the processor can use this exception as
system tick.

Interrupt (IRQ)

An interrupt, or IRQ, is an exception signalled by a peripheral or generated by a software request.
All interrupts are asynchronous to instruction execution. In the system, peripherals use interrupts
to communicate with the processor.

Various types of exceptions exist in the processor. A fault is an exception that results from an error
condition because of instruction execution. Faults can be reported synchronously or asynchronously
to the instruction that caused them. In general, faults are reported synchronously. The Imprecise Bus
Fault is an asynchronous fault supported in the ARMv7-M profile. A synchronous fault is always
reported with the instruction that caused the fault. An asynchronous fault does not guarantee how it
is reported with respect to the instruction that caused the fault.
Exception states

Each exception is in one of the following states:

Inactive

The exception is not active and not pending.

Pending

The exception is waiting to be serviced by the processor.

An interrupt request from a peripheral or from software can change the state of the corresponding
interrupt to pending.

Active

An exception that is being serviced by the processor but has not completed.

Note - An exception handler can interrupt the execution of another exception handler. In this case
both exceptions are in the active state.

Active and pending

The exception is being serviced by the processor and there is a pending exception from the same
source.

Exception handlers

The processor handles exceptions using:


Interrupt Service Routines (ISRs)

The IRQ interrupts are the exceptions handled by ISRs.

Fault handlers

HardFault, MemManage fault, UsageFault, and BusFault are fault exceptions handled by the fault
handlers.

System handlers

NMI, PendSV, SVCall, SysTick, and the fault exceptions are all system exceptions that are handled by
system handlers.

Exception Priority

All exceptions have an associated priority, with:

 a lower priority value indicating a higher priority

 configurable priorities for all exceptions except Reset, HardFault, and NMI.

If software does not configure any priorities, then all exceptions with a configurable priority have a
priority of 0.

For example, assigning a higher priority value to IRQ[0] and a lower priority value to IRQ[1] means
that IRQ[1] has higher priority than IRQ[0]. If both IRQ[1] and IRQ[0] are asserted, IRQ[1] is
processed before IRQ[0].

If multiple pending exceptions have the same priority, the pending exception with the lowest
exception number takes precedence. For example, if both IRQ[0] and IRQ[1] are pending and have
the same priority, then IRQ[0] is processed before IRQ[1].

When the processor is executing an exception handler, the exception handler is preempted if a
higher priority exception occurs. If an exception occurs with the same priority as the exception being
handled, the handler is not preempted, irrespective of the exception number. However, the status of
the new interrupt changes to pending.

Pre-emption: A new exception with higher priority than the current exception or thread priority
interrupts the current flow. This is the response to a pended interrupt, causing entry to an ISR if
the pended interrupt is higher priority than the active ISR or thread. When one ISR pre-empts
another, the interrupts are nested. On exception entry the processor automatically saves processor
state, which is pushed on to the stack. In parallel with this, the vector corresponding to the
interrupt is fetched. Execution of the first instruction of the ISR starts when the processor state
has been saved and the first instruction of the ISR enters the execute stage of the processor
pipeline. The state saving is performed over the System bus and DCode bus. The vector fetch is
performed over either the System bus or the ICode bus, depending on where the vector table is
located.

Tail-chaining: A mechanism used by the processor to speed up interrupt servicing. On completion of
an ISR, if there is a pending interrupt of higher priority than the ISR or thread that is being
returned to, the stack pop is skipped and control is transferred to the new ISR.

Return: With no pending exceptions, or no pending exceptions of higher priority than the stacked
ISR, the processor pops the stack and returns to the stacked ISR or to Thread mode. On completion of
an ISR the processor automatically restores the processor state by popping the stack, returning to
the state prior to the interrupt that caused the ISR to be entered. If a new interrupt arrives
during the state restoration, and that interrupt is of higher priority than the ISR or thread that
is being returned to, the state restoration is abandoned and the new interrupt is handled as a
tail-chain.

Late-arriving: A mechanism used by the processor to speed up pre-emption. If a higher-priority
interrupt arrives during the state saving for a previous pre-emption, the processor switches to
handling the higher-priority interrupt instead and initiates the vector fetch for that interrupt.
State saving is not affected by late arrival, because the state saved is the same for both
interrupts and the state saving continues uninterrupted. Late-arriving interrupts are managed until
the first instruction of the ISR enters the execute stage of the processor pipeline. On return, the
normal tail-chaining rules apply.

In the processor exception model, priority determines when and how the processor takes
exceptions. You can:

 assign software priority levels to interrupts

 group priorities by splitting priority levels into pre-emption priorities and subpriorities.

Priority levels

The NVIC supports software-assigned priority levels. You can assign a priority level from 0 to 255 to
an interrupt by writing to the eight-bit PRI_N field in an Interrupt Priority Register. Hardware priority
decreases with increasing interrupt number. Priority level 0 is the highest priority level, and priority
level 255 is the lowest. The priority level overrides the hardware priority. For example, if you assign
priority level 1 to IRQ[0] and priority level 0 to IRQ[31], then IRQ[31] has higher priority than IRQ[0].

Note - Software prioritization does not affect reset, Non-Maskable Interrupt (NMI), and hard fault.
They always have higher priority than the external interrupts.
When multiple interrupts have the same priority number, the pending interrupt with the lowest
interrupt number takes precedence. For example, if both IRQ[0] and IRQ[1] are priority level 1,
then IRQ[0] has higher priority than IRQ[1].

Priority grouping

To increase priority control in systems with large numbers of interrupts, the NVIC supports priority
grouping. You can use the PRIGROUP field in the Application Interrupt and Reset Control Register to
split the value in every PRI_N field into a pre-emption priority field and a subpriority field. The pre-
emption priority group is referred to as the group priority. Where multiple pending exceptions share
the same group priority, the sub-priority bit field resolves the priority within a group. This is referred
to as the sub-priority within the group. The combination of the group priority and the sub-priority is
referred to generally as the priority. Where two pending exceptions have the same priority, the
lower pending exception number has priority over the higher pending exception number. This is
consistent with the priority precedence scheme.

PRIGROUP[2:0]  Binary point position  Pre-emption priority field  Subpriority field  Pre-emption priorities  Subpriorities
b000           bxxxxxxx.y             [7:1]                       [0]                128                     2
b001           bxxxxxx.yy             [7:2]                       [1:0]              64                      4
b010           bxxxxx.yyy             [7:3]                       [2:0]              32                      8
b011           bxxxx.yyyy             [7:4]                       [3:0]              16                      16
b100           bxxx.yyyyy             [7:5]                       [4:0]              8                       32
b101           bxx.yyyyyy             [7:6]                       [5:0]              4                       64
b110           bx.yyyyyyy             [7]                         [6:0]              2                       128
b111           b.yyyyyyyy             None                        [7:0]              0                       256
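
With CMSIS, the PRIGROUP field and the group/sub-priority encoding are handled by library calls. The
sketch below assumes a hypothetical device.h header and a placeholder TIM0_IRQn; note that the
number of priority bits actually implemented (and therefore the usable groupings) is device specific.

    #include "device.h"                              /* hypothetical CMSIS device header */

    void setup_priority_grouping(void)
    {
        NVIC_SetPriorityGrouping(5u);                /* PRIGROUP = b101: bits [7:6] group, [5:0] subpriority */

        /* encode group priority 1, subpriority 2 for this grouping and apply it */
        uint32_t prio = NVIC_EncodePriority(5u, 1u, 2u);
        NVIC_SetPriority(TIM0_IRQn, prio);
        NVIC_EnableIRQ(TIM0_IRQn);
    }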

Vector table

The vector table contains the reset value of the stack pointer, and the start addresses, also called
exception vectors, of all exception handlers. The figure shows the order of the exception vectors in
the vector table. The least-significant bit of each vector must be 1, indicating that the exception
handler is Thumb code.

Figure : Vector table

On system reset, the vector table is fixed at address 0x00000000. Privileged software can write to
the VTOR to relocate the vector table start address to a different memory location, in the
range 0x00000080 to 0x3FFFFF80.

Interrupts

An interrupt is the automatic transfer of software execution in response to a hardware event that is
asynchronous with the current software execution. This hardware event is called a trigger. The
hardware event can either be a busy-to-ready transition in an external I/O device (like the UART
input/output) or an internal event (like a bus fault, memory fault, or a periodic timer). When the
hardware needs service, signified by a busy-to-ready state transition, it will request an interrupt by
hardware needs service, signified by a busy to ready state transition, it will request an interrupt by
setting its trigger flag. A thread is defined as the path of action of software as it executes. The
execution of the interrupt service routine is called a background thread. This thread is created by the
hardware interrupt request and is killed when the interrupt service routine returns from interrupt
(e.g., by executing a BX LR). A new thread is created for each interrupt request. It is important to
consider each individual request as a separate thread because local variables and registers used in
the interrupt service routine are unique and separate from one interrupt event to the next interrupt.
In a multi-threaded system, we consider the threads as cooperating to perform an overall task.
Consequently we will develop ways for the threads to communicate (e.g., FIFO) and to synchronize
with each other. Most embedded systems have a single common overall goal. On the other hand,
general-purpose computers can have multiple unrelated functions to perform. A process is also
defined as the action of software as it executes. Processes do not necessarily cooperate towards a
common shared goal. Threads share access to I/O devices, system resources, and global variables,
while processes have separate global variables and system resources. Processes do not share I/O
devices.

There are no standard definitions for the terms mask, enable, and arm in the professional, Computer
Science, or Computer Engineering communities. Nevertheless, in this class we will adhere to the
following specific meanings. To arm a device means to allow the hardware trigger to interrupt.
Conversely, to disarm a device means to shut off or disconnect the hardware trigger from the
interrupts. Each potential interrupting trigger has a separate arm bit. One arms a trigger if one is
interested in interrupts from this source. Conversely, one disarms a trigger if one is not interested in
interrupts from this source.

To enable means to allow interrupts at this time. Conversely, to disable means to postpone
interrupts until a later time. On the ARM Cortex-M processor there is one interrupt enable bit for the
entire interrupt system. We disable interrupts if it is currently not convenient to accept interrupts. In
particular, to disable interrupts we set the I bit in PRIMASK. In C, we enable and disable interrupts by
calling the functions EnableInterrupts() and DisableInterrupts() respectively.

The software has dynamic control over some aspects of the interrupt request sequence. First, each
potential interrupt trigger has a separate arm bit that the software can activate or deactivate. The
software will set the arm bits for those devices from which it wishes to accept interrupts, and will
deactivate the arm bits within those devices from which interrupts are not to be allowed. In other
words it uses the arm bits to individually select which devices will and which devices will not request
interrupts. For most devices there is an enable bit in the NVIC that must be set (periodic SysTick
interrupts are an exception, having no NVIC enable). The third aspect that the software controls is
the interrupt enable bit. Specifically, bit 0 of the special register PRIMASK is the interrupt mask bit.

If this bit is 1 most interrupts and exceptions are not allowed, which we will define as disabled. If the
bit is 0, then interrupts are allowed, which we will define as enabled. The fourth aspect is priority.
The BASEPRI register blocks interrupts whose priority level is numerically greater than or equal to
BASEPRI, while allowing higher-priority (numerically lower) requests. For example, if the software
sets BASEPRI to 3, then requests with levels 0, 1, and 2
can interrupt, while requests at levels 3 and higher will be postponed. The software can also specify
the priority level of each interrupt request. If BASEPRI is zero, then the priority feature is disabled
and all interrupts are allowed. The fifth aspect is the external hardware trigger. One example of a
hardware trigger is the Count flag in the NVIC_ST_CTRL_R register which is set periodically by
SysTick. Another example of hardware triggers are bits in the GPIO_PORTF_RIS_R register that are
set on rising or falling edges of digital input pins. Five conditions must be true for an interrupt to be
generated:

 Device arm
 NVIC enable
 Global enable
 Interrupt priority level must be higher than current level executing
 Hardware event trigger

For an interrupt to occur, these five conditions must be simultaneously true but can occur in any
order.
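
Translating those five conditions into code, a hedged sketch follows; MYPERIPH, its IER register and
bit name, and MYPERIPH_IRQn are placeholders for your specific device, and only calls from the
standard CMSIS core API are used.

    #include "device.h"                           /* hypothetical CMSIS device header */

    void enable_my_interrupt(void)
    {
        MYPERIPH->IER |= MYPERIPH_IER_RXIE;       /* 1. device arm: interrupt-enable bit in the peripheral */
        NVIC_SetPriority(MYPERIPH_IRQn, 2u);      /* 4. priority: must beat the currently executing level  */
        NVIC_EnableIRQ(MYPERIPH_IRQn);            /* 2. NVIC enable for this interrupt number              */
        __enable_irq();                           /* 3. global enable: clear the PRIMASK I bit             */
        /* 5. the hardware event (trigger flag) is generated by the peripheral itself */
    }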

An interrupt causes the following sequence of five events. First, the current instruction is finished.
Second, the execution of the currently running program is suspended, pushing eight registers on the
stack (R0, R1, R2, R3, R12, LR, PC, and PSR with the R0 on top). If the floating point unit on the
TM4C123 is active, an additional 18 words will be pushed on the stack representing the floating
point state, making a total of 26 words. Third, the LR is set to a specific value signifying an interrupt
service routine (ISR) is being run (bits [31:4] are set to 0xFFFFFFF, and bits [3:0] specify the type of interrupt
return to perform). In our examples we will see LR is set to 0xFFFFFFF9. If the floating point registers
were pushed, the LR will be 0xFFFFFFE9. Fourth, the IPSR is set to the interrupt number being
processed. Lastly, the PC is loaded with the address of the ISR (vector).

 Current instruction is finished


 Eight registers are pushed on the stack
 LR is set to 0xFFFFFFF9
 IPSR is set to the interrupt number
 PC is loaded with the interrupt vector

Interrupt Service Routine (ISR)

The interrupt service routine (ISR) is the software module that is executed when the hardware
requests an interrupt. There may be one large ISR that handles all requests (polled interrupts), or
many small ISRs specific for each potential source of interrupt (vectored interrupts). The design of
the interrupt service routine requires careful consideration of many factors. Except for the SysTick
interrupt, the ISR software must explicitly clear the trigger flag that caused the interrupt
(acknowledge). After the ISR provides the necessary service, it will execute BX LR. Because LR
contains a special value (e.g., 0xFFFFFFF9), this instruction pops the 8 registers from the stack, which
returns control to the main program. If the LR is 0xFFFFFFE9, then 26 registers (R0-R3, R12, LR, PC,
PSR, and 18 floating-point registers) will be popped by BX LR. There are two stack
pointers: PSP and MSP. The software in this class will exclusively use the MSP. It is imperative that
the ISR software balance the stack before exiting. Execution of the previous thread will then
continue with the exact stack and register values that existed before the interrupt. Although
interrupt handlers can create and use local variables, parameter passing between threads must be
implemented using shared global memory variables. A private global variable can be used if an
interrupt thread wishes to pass information to itself, e.g., from one interrupt instance to another.
The execution of the main program is called the foreground thread, and the executions of the
various interrupt service routines are called background threads.

An axiom with interrupt synchronization is that the ISR should execute as fast as possible. The
interrupt should occur when it is time to perform a needed function, and the interrupt service
routine should perform that function, and return right away. Placing backward branches (busy-wait
loops, iterations) in the interrupt software should be avoided if possible. The percentage of time
spent executing interrupt software should be small when compared to the time between interrupt
triggers.
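
A minimal ISR sketch following these rules: keep the handler short, acknowledge the trigger, and
pass data to the foreground through a volatile shared global. The MYPERIPH names are placeholders,
while SysTick_Handler follows the usual CMSIS startup naming.

    #include <stdint.h>
    #include "device.h"                         /* hypothetical CMSIS device header */

    volatile uint32_t tick_count = 0;           /* shared between background (ISR) and foreground */

    void SysTick_Handler(void)                  /* SysTick needs no explicit acknowledge */
    {
        tick_count++;                           /* do the minimum work and return immediately */
    }

    void MYPERIPH_IRQHandler(void)              /* placeholder vectored ISR name */
    {
        MYPERIPH->ICR = MYPERIPH_ICR_RXCLR;     /* acknowledge: clear the trigger flag first */
        /* copy the data into a FIFO or global for the foreground thread, then return (BX LR) */
    }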

Performance measures: latency and bandwidth

For an input device, the interface latency is the time between when new input is available, and the
time when the software reads the input data. We can also define device latency as the response
time of the external I/O device. For example, if we request that a certain sector be read from a disk,
then the device latency is the time it take to find the correct track and spin the disk (seek) so the
proper sector is positioned under the read head. For an output device, the interface latency is the
time between when the output device is idle, and the time when the software writes new data.
A real-time system is one that can guarantee a worst case interface latency. Bandwidth is defined as
the amount of data/sec being processed.

Interrupt Inputs and Pending Behavior

When an interrupt input is asserted, it will be pended. Even if the interrupt source de-asserts the
interrupt, the pended interrupt status will still cause the interrupt handler to be executed when the
priority is allowed.

If the pending status is cleared before the processor starts responding to the pended interrupt, the
interrupt can be canceled (for example, if the pending status register is cleared while
PRIMASK/FAULTMASK is set to 1).

The pending status of the interrupt can be accessed in the NVIC and is writable, so you can clear a
pending interrupt or use software to pend a new interrupt by setting the pending register.
When the processor starts to execute an interrupt, the interrupt becomes active and the pending bit
will be cleared automatically.

When an interrupt is active, you cannot start processing the same interrupt again until the interrupt
service routine is terminated with an interrupt return (also called an exception exit).

Then the active status is cleared and the interrupt can be processed again if the pending status is 1.

If an interrupt source continues to hold the interrupt request signal active, the interrupt will be
pended again at the end of the interrupt service routine.

If an interrupt is pulsed several times before the processor starts processing it, it will be treated as
one single interrupt request.

If an interrupt is de-asserted and then pulsed again during the interrupt service routine, it will be
pended again.

Interrupt Set Pending and Clear Pending

If an interrupt takes place but cannot be executed immediately (for instance, if another higher-
priority interrupt handler is running), it will be pended. The interrupt-pending status can be accessed
through the Interrupt Set Pending (SETPEND) and Interrupt Clear Pending (CLRPEND) registers.
Similarly to the enable registers, the pending status controls might contain more than one register if
there are more than 32 external interrupt inputs.

The values of pending status registers can be changed by software, so you can cancel a currently
pended exception through the CLRPEND register, or generate software interrupts through the
SETPEND register.
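
A minimal sketch of manipulating the pending status from software, assuming the CMSIS-Core access functions (which write the SETPEND/CLRPEND registers internally) and a device header that defines an external interrupt number such as EINT0_IRQn:

#include "LPC17xx.h"                          /* any CMSIS device header will do       */

void pending_demo(void)
{
    NVIC_SetPendingIRQ(EINT0_IRQn);           /* software-pend the interrupt (SETPEND) */

    if (NVIC_GetPendingIRQ(EINT0_IRQn))       /* read back the pending status          */
    {
        NVIC_ClearPendingIRQ(EINT0_IRQn);     /* cancel it before it is taken (CLRPEND) */
    }
}
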
Fault Exceptions

A number of system exceptions are useful for fault handling. There are several categories of faults:

• Bus faults

• Memory management faults

• Usage faults

• Hard faults

Usage Faults

A usage fault can be caused by:

 Execution of an undefined instruction (including trying to execute floating point instructions
when the floating point unit is disabled).
 Execution of Co-processor instructions – the Cortex-M3 and Cortex-M4 processors do not
support Co-processor access instructions, but it is possible to use the usage fault mechanism
to emulate co-processor instruction support.
 Trying to switch to ARM state – classic ARM processors like ARM7TDMI support both ARM
instruction and Thumb instruction sets, while Cortex-M processors only support Thumb ISA.
 Software ported from classic ARM processors might contain code that switches the
processor to ARM state, and software could potentially use this feature to test whether the
processor it is running on supports ARM code.
 Invalid EXC_RETURN code during exception-return sequence. For example, trying to return
to Thread level with exceptions still active (apart from the current serving exception).
 Unaligned memory access with multiple load or multiple store instructions (including load
double and store double).
 Execution of SVC when the priority level of the SVC is the same or lower than current level;
this scenario leads to a HardFault not a Usage Fault.
 Exception return with Interrupt-Continuable Instruction (ICI) bits in the unstacked xPSR, but
the instruction being executed after exception return is not a multiple-load/store
instruction.

It is also possible, by setting bits in the Configuration Control Register (CCR), to generate usage faults
for the following (a register-level sketch follows this list):

 Divide by zero
 All unaligned memory accesses
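
A minimal sketch, assuming the CMSIS-Core SCB definitions, of enabling these two optional usage-fault sources through the Configuration Control Register:

#include "LPC17xx.h"          /* pulls in core_cm3.h, which defines SCB and the CCR bit masks */

void enable_optional_usage_faults(void)
{
    SCB->CCR |= SCB_CCR_DIV_0_TRP_Msk        /* UsageFault on SDIV/UDIV divide by zero  */
              | SCB_CCR_UNALIGN_TRP_Msk;     /* UsageFault on any unaligned access      */
    /* The UsageFault exception itself is enabled separately via SCB->SHCSR (USGFAULTENA);
       otherwise these faults escalate to HardFault. */
}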

Please note that the floating point instructions supported by the Cortex-M4 are not co-processor
instructions (e.g., MCR, MRC). However, slightly confusingly, the register that enables the floating
point unit is called the Coprocessor Access Control Register (CPACR).

Supervisor Calls (SVC)

As with previous ARM cores there is an instruction, SVC (formerly SWI) that generates a supervisor
call. Supervisor calls are normally used to request privileged operations or access to system
resources from an operating system.
The SVC instruction has a number embedded within it, often referred to as the SVC number. This is
sometimes used to indicate what the caller is requesting. On previous ARM cores you had to extract
the SVC number from the instruction using the return address in the link register, and the other SVC
arguments were already available in R0 through R3.

On the Cortex-M3, the core saves the argument registers to the stack on the initial exception entry.
A late-arriving exception, taken before the first instruction of the SVC handler executes, might
corrupt the copy of the arguments still held in R0 to R3. This means that the stack copy of the
arguments must be used by the SVC handler. Any return value must also be passed back to the caller
by modifying the stacked register values. In order to do this, a short piece of assembly code must be
implemented as the start of the SVC handler. This identifies which stack the registers were saved to,
extracts the SVC number from the instruction, and passes the number and a pointer to the
arguments to the main body of the handler written in C.

Example SVC Handler

__asm void SVCHandler(void)
{
    IMPORT SVCHandler_main
    TST    lr, #4              ; test bit 2 of EXC_RETURN (which stack was in use)
    MRSEQ  r0, MSP             ; 0: arguments were stacked on the main stack
    MRSNE  r0, PSP             ; 1: arguments were stacked on the process stack
    B      SVCHandler_main     ; r0 now points to the stacked register frame
}

void SVCHandler_main(unsigned int *svc_args)
{
    unsigned int svc_number;

    /* Stacked frame layout: r0, r1, r2, r3, r12, lr, return address, xPSR.
       The SVC number is the low byte of the SVC instruction, located two
       bytes before the stacked return address (svc_args[6]). */
    svc_number = ((char *)svc_args[6])[-2];

    switch (svc_number)
    {
    case SVC_00:               /* Handle SVC 00 (SVC_00/SVC_01 are application-defined constants) */
        break;

    case SVC_01:               /* Handle SVC 01 */
        break;

    default:                   /* Unknown SVC */
        break;
    }
}

This code tests the EXC_RETURN value set by the processor to determine which stack pointer was in
use when the SVC was called. On most systems this will be unnecessary, because in a typical system
design supervisor calls will only be made from user code which uses the process stack. In this case,
the assembly code can consist of a single MRS instruction followed by a tail-call branch (B instruction)
to the C body of the handler.

Supervisor Call and Pendable Service Call

 Supervisor Call (SVC) and Pendable Service Call (PendSV) are two exceptions targeted at
software and operating systems.
 SVC is for generating system function calls. For example, instead of allowing user programs
to directly access hardware, an operating system may provide access to hardware through
an SVC.
 So when a user program wants to use certain hardware, it generates the SVC exception
using SVC instructions, and then the software exception handler in the operating system is
executed and provides the service the user application requested.
 SVC can also make software more portable because the user application does not need to
know the programming details of the hardware.
 The user program will only need to know the application programming interface (API)
function ID and parameters; the actual hardware-level programming is handled by device
drivers.
 PendSV (Pendable Service Call) works with SVC in the OS.
 SVC (by SVC instruction) cannot be pended (an application calling SVC will expect the
required task to be done immediately), whereas PendSV can be pended and is useful for an
OS to pend an exception so that an action can be performed after other important tasks are
completed.
 PendSV is generated by writing 1 to the PENDSVSET bit in the NVIC Interrupt Control State
register (a minimal register-level sketch follows this list).
 A typical use of PendSV is context switching (switching between tasks).
 If context switching is carried out directly in the SysTick handler while an IRQ is still being
serviced, the SysTick exception preempts that IRQ; trying to return to Thread mode with the
IRQ still active would generate a usage fault. Solution: pend a PendSV instead, so the context
switch runs only after the IRQ (and any other active exceptions) have completed.
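
A minimal register-level sketch, assuming the CMSIS-Core header, of how software pends a PendSV exception; the handler body is only a placeholder for where an OS would perform the context switch.

#include "LPC17xx.h"

void request_context_switch(void)
{
    SCB->ICSR = SCB_ICSR_PENDSVSET_Msk;   /* write 1 to PENDSVSET: pend the PendSV exception */
    __DSB();                              /* make sure the write reaches the register        */
    __ISB();
}

void PendSV_Handler(void)                 /* runs once no higher-priority exception is active */
{
    /* placeholder: a real OS would save the current task context here,
       select the next task and restore its context */
}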

Nested Vector Interrupt Controller (NVIC)

In a microcontroller, such as those at the heart of industrial motion controllers, interrupts serve
as a way to immediately divert the central processing unit from its current task to another, more
important task.

An interrupt can be triggered internally from the microcontroller (MCU) or externally, by a
peripheral. The interrupt alerts the central processing unit (CPU) to an occurrence such as a
time-based event (a specified amount of time has elapsed or a specific time is reached, for
example), a change of state, or the start or end of a process.

Depending on the implementation used by the silicon manufacturer, the NVIC can support up to
240 external interrupts with up to 256 different priority levels that can be dynamically
reprioritized. It supports both level and pulse interrupt sources. The processor state is
automatically saved by hardware on interrupt entry and is restored on interrupt exit. The NVIC
also supports tail-chaining of interrupts.

The use of an NVIC in the Cortex-M3 means that the vector table for a Cortex-M3 is very
different to previous ARM cores. The Cortex-M3 vector table contains the address of the
exception handlers and ISR, not instructions as most other ARM cores do. The initial stack
pointer and the address of the reset handler must be located at 0x0 and 0x4 respectively. These
values are then loaded into the appropriate CPU registers at reset.
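
A minimal sketch (assumed symbol and handler names, GNU toolchain attribute) of how the first few vector table entries look when laid down at address 0x0; in practice this table comes from the vendor startup file.

#include <stdint.h>

extern uint32_t _estack;                 /* top-of-stack symbol from the linker script (assumed) */
void Reset_Handler(void);
void NMI_Handler(void);
void HardFault_Handler(void);

__attribute__((section(".isr_vector")))
const void *vector_table[] = {
    &_estack,             /* 0x00: initial Main Stack Pointer value */
    Reset_Handler,        /* 0x04: address of the reset handler     */
    NMI_Handler,          /* 0x08: NMI handler                      */
    HardFault_Handler,    /* 0x0C: HardFault handler                */
    /* ... remaining system exceptions and device ISRs ... */
};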

Interrupts can be triggered internally – from a timer, for example – or externally, from
peripherals.

Nested vector interrupt control (NVIC) is a method of prioritizing interrupts, improving the MCU’s
performance and reducing interrupt latency. NVIC also provides implementation schemes for
handling interrupts that occur when other interrupts are being executed or when the CPU is in the
process of restoring its previous state and resuming its suspended process.

The term “nested” refers to the fact that in NVIC, a number of interrupts (up to several hundred in
some processors) can be defined, and each interrupt is assigned a priority, with “0” being the highest
priority. In addition, the most critical interrupt can be made non-maskable, meaning it cannot be
disabled (masked).

One function of NVIC is to ensure that higher priority interrupts are completed before lower-priority
interrupts, even if the lower-priority interrupt is triggered first. For example, if a lower-priority
interrupt is being registered or executed and a higher-priority interrupt occurs, the CPU will stop
the lower-priority interrupt and process the higher-priority one first.

Exits and Tail Chaining

Similarly, a handling scheme referred to as “tail-chaining” specifies that if an interrupt is pending
as the ISR for another, higher-priority interrupt completes, the processor will immediately begin
the ISR for the next interrupt, without first restoring its previous state.

The term “vector” in nested vector interrupt control refers to the way in which the CPU finds the
program, or ISR, to be executed when an interrupt occurs. Nested vector interrupt control uses a
vector table that contains the addresses of the ISRs for each interrupt. When an interrupt is
triggered, the processor gets the address from the vector table.

The prioritization and handling schemes of nested vector interrupt control reduce the latency and
overhead that interrupts typically introduce and ensure low power consumption, even with high
interrupt loading on the controller.

Interrupt Latency

The term interrupt latency refers to the number of clock cycles required for a processor to respond
to an interrupt request. It is typically measured as the number of clock cycles between the
assertion of the interrupt request and the cycle in which the first instruction of the interrupt handler
is executed (figure).
Figure : Definition of interrupt latency

In many cases, when the clock frequency of the system is known, the interrupt latency can also be
expressed in terms of time delay, for example, in µsec.

In many processors, the exact interrupt latency depends on what the processor is executing at the
time the interrupt occurs. For example, in many processor architectures, the processor starts to
respond to a interrupt request only when the current executing instruction completes, which can
add a number of extra clock cycles. As a result, the interrupt latency value can contain a best case
and a worst case value. This variation can result in jitters of interrupt responses, which could be
problematic in certain applications like audio processing (with the introduction of signal distortions)
and motor control (which can result in harmonics or vibrations).

Ideally, a processor should have the following characteristics:

 The interrupt latency should be low

 The interrupt response is deterministic and low jitter

 The interrupt handler should take as short a time to execute as possible

 Can be configured to enter sleep mode on the last instruction of the interrupt service
routine if no other interrupt needs service (for interrupt driven applications)

The interrupt latency itself is not the full story. A microcontroller marketing leaflet highlighting an
extremely low interrupt latency doesn’t necessarily mean that the microcontroller can satisfy the
real-time requirements of a product. A real embedded system might have many interrupt sources
and normally each interrupt source has an associated priority level. Many processor architectures
support the nesting of interrupts, which means that during the execution of a low priority interrupt
service routine (ISR), a high priority service can pre-empt it; the low priority ISR is suspended and
resumes when the high priority ISR has completed (figure).
Figure : Nested Interrupt support

Many embedded systems require nested interrupt handling, and when a high priority handler is
running, services to low priority interrupt requests would be delayed. Thus the interrupt latency is
normally a lot worse for low priority interrupts, as would be expected.

The nested interrupt handling requirement means that the interrupt controller in the system needs
to be flexible in interrupt management, and ideally provide all the essential interrupt prioritization
and masking capability. In some cases this could be handled in software, but this can increase the
software overhead of the interrupt processing (and code size) and increase the effective latency of
serving interrupts.

Interrupt sequence of NVIC

This section describes how to use interrupts and exceptions and the access functions for the Nested
Vector Interrupt Controller (NVIC).

Arm provides a template file startup_<device> for each supported compiler. The file must be adapted
by the silicon vendor to include interrupt vectors for all device-specific interrupt handlers. Each
interrupt handler is defined as a weak function to a dummy handler. These interrupt handlers can
be used directly in application software without being adapted by the programmer.

The table below lists the core exception vectors of the various Cortex-M processors.

Exception Vector          IRQn Value   Description
NonMaskableInt_IRQn       -14          Non Maskable Interrupt
HardFault_IRQn            -13          Hard Fault Interrupt
MemoryManagement_IRQn     -12          Memory Management Interrupt
BusFault_IRQn             -11          Bus Fault Interrupt
UsageFault_IRQn           -10          Usage Fault Interrupt
SecureFault_IRQn          -9           Secure Fault Interrupt
SVCall_IRQn               -5           SV Call Interrupt
DebugMonitor_IRQn         -4           Debug Monitor Interrupt
PendSV_IRQn               -2           Pend SV Interrupt
SysTick_IRQn              -1           System Tick Interrupt

Availability of individual vectors varies with the processor: MemoryManagement, BusFault, UsageFault
and DebugMonitor are not present on Armv6-M cores such as the Cortex-M0/M0+, and SecureFault
exists only on Armv8-M processors with the Security Extension.
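
A minimal sketch of the CMSIS-Core access functions built on these vector numbers, assuming an LPC17xx device header that also supplies device-specific IRQ numbers such as TIMER0_IRQn:

#include "LPC17xx.h"

void nvic_demo(void)
{
    NVIC_SetPriority(SysTick_IRQn, 2);   /* core exceptions use the negative IRQn values above */
    NVIC_SetPriority(TIMER0_IRQn, 5);    /* lower number = higher priority                     */
    NVIC_EnableIRQ(TIMER0_IRQn);         /* enable the device interrupt in the NVIC            */
}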

System timer, SysTick

The processor has a 24-bit system timer, SysTick, that counts down from the reload value to zero,
reloads (that is, wraps to) the value in the SYST_RVR register on the next clock edge, and then counts
down on subsequent clocks.

Note

When the processor is halted for debugging the counter does not decrement.

The system timer registers are:


Table : System timer registers summary

Address      Name         Type   Required privilege   Reset value   Description
0xE000E010   SYST_CSR     RW     Privileged           -             SysTick Control and Status Register
0xE000E014   SYST_RVR     RW     Privileged           UNKNOWN       SysTick Reload Value Register
0xE000E018   SYST_CVR     RW     Privileged           UNKNOWN       SysTick Current Value Register
0xE000E01C   SYST_CALIB   RO     Privileged           -             SysTick Calibration Value Register

Systick Module

The System Tick Timer is an integral part of the Cortex-M3. The System Tick Timer is intended to
generate a fixed 10 millisecond (user configurable) interrupt for use by an operating system or other
system management software. The System Tick Timer is a 24-bit timer that counts down to zero and
generates an interrupt. The intent is to provide a fixed time interval between interrupts.

In order to generate recurring interrupts at a specific interval, the STRELOAD register must be
initialized with the correct value for the desired interval.

Registers

Register Address Description

STCTRL 0xE000 E010 System Timer Control and status register

STRELOAD 0xE000 E014 System Timer Reload value register

STCURR 0xE000 E018 System Timer Current value register

STCALIB 0xE000 E01C System Timer Calibration value register

Bit 0 – ENABLE This bit is used to enable/disable the systick counter.

 0-Disables the systick timer.


 1-Enables the systick timer.

Bit 1 – TICKINT This bit is used to enable/disable the systick timer interrupt. When enabled the
SysTick_Handler ISR will be called.

 0-Disables the systick timer Interrupt.


 1-Enables the systick timer Interrupt.

Bit 2 – CLKSOURCE This bit is used to select clock source for System Tick timer.

 0-The external clock pin (STCLK) is selected.


 1-CPU clock is selected.

Bit 16 – COUNTFLAG System Tick counter flag. This flag is set when the System Tick counter counts
down to 0, and is cleared by reading this register.

Bit 23:0 – RELOAD This is the 24-bit timer value that is loaded into the System Tick counter when it
counts down to 0.

Bit 23:0 – CURRENT Reading this register returns the current value of the System Tick counter.
Writing any value clears the System Tick counter and the COUNTFLAG bit in STCTRL.

Timer Calculation

SysTick timer calculation for a 1 ms tick with a 100 MHz CPU clock (cclk).

cclk = 100 MHz

ticks = 1000/sec

$$RELOAD = (cclk / ticks) - 1 = (100 MHz / 1000) - 1 = 100,000 - 1 = 99,999$$

Steps To configure Systick

1. Set the Reload value for required tick in STRELOAD.

2. Enable the Systick Module by setting Enable bit in STCTRL.

3. Select the CPU Clock Source by setting CLKSOURCE bit in STCTRL.

4. Finally enable the SysTick interrupt by setting TICKINT bit in STCTRL.
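
The four steps can be condensed with the CMSIS-Core helper SysTick_Config(), which writes the reload value (STRELOAD, i.e. SysTick->LOAD), selects the CPU clock and sets both ENABLE and TICKINT in STCTRL (SysTick->CTRL). A minimal sketch, assuming the CMSIS SystemCoreClock variable holds cclk:

#include "LPC17xx.h"

volatile uint32_t msTicks = 0;              /* incremented every 1 ms                   */

void SysTick_Handler(void)                  /* called because TICKINT is enabled        */
{
    msTicks++;
}

int main(void)
{
    SysTick_Config(SystemCoreClock / 1000); /* RELOAD = (cclk / 1000) - 1 -> 1 ms tick  */
    while (1)
    {
        /* application code, e.g. simple delays based on msTicks */
    }
}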


UNIT – 3

LPC17xx Microcontroller

The LPC17xx is an ARM Cortex-M3 based microcontroller for embedded applications requiring a high
level of integration and low power dissipation. The ARM Cortex-M3 is a next generation core that
offers system enhancements such as modernized debug features and a higher level of support block
integration. High speed versions (LPC1769 and LPC1759) operate at up to a 120 MHz CPU frequency.
Other versions operate at up to a 100 MHz CPU frequency. The ARM Cortex-M3 CPU incorporates a
3-stage pipeline and uses a Harvard architecture with separate local instruction and data buses as
well as a third bus for peripherals. The ARM Cortex-M3 CPU also includes an internal prefetch unit
that supports speculative branches. The peripheral complement of the LPC17xx includes up to 512
kB of flash memory, up to 64 kB of data memory, Ethernet MAC, a USB interface that can be
configured as either Host, Device, or OTG, 8 channel general purpose DMA controller, 4 UARTs, 2
CAN channels, 2 SSP controllers, SPI interface, 3 I2C interfaces, 2-input plus 2-output I2S interface, 8
channel 12-bit ADC, 10-bit DAC, motor control PWM, Quadrature Encoder interface, 4 general
purpose timers, 6-output general purpose PWM, ultra-low power RTC with separate battery supply,
and up to 70 general purpose I/O pins.

Features

 ARM Cortex-M3 processor, running at frequencies of up to 120 MHz on high speed versions
(LPC1769 and LPC1759), up to 100 MHz on other versions. A Memory Protection Unit (MPU)
supporting eight regions is included.
 ARM Cortex-M3 built-in Nested Vectored Interrupt Controller (NVIC).
 Up to 512 kB on-chip flash program memory with In-System Programming (ISP) and In-
Application Programming (IAP) capabilities. The combination of an enhanced flash memory
accelerator and location of the flash memory on the CPU local code/data bus provides high
code performance from flash.
 Up to 64 kB on-chip SRAM includes: – Up to 32 kB of SRAM on the CPU with local code/data
bus for high-performance CPU access. – Up to two 16 kB SRAM blocks with separate access
paths for higher throughput. These SRAM blocks may be used for Ethernet, USB, and DMA
memory, as well as for general purpose instruction and data storage.
 Eight channel General Purpose DMA controller (GPDMA) on the AHB multilayer matrix that
can be used with the SSP, I2S, UART, the Analog-to-Digital and Digital-to-Analog converter
peripherals, timer match signals, GPIO, and for memory-to-memory transfers.
 Multilayer AHB matrix interconnect provides a separate bus for each AHB master. AHB
masters include the CPU, General Purpose DMA controller, Ethernet MAC, and the USB
interface. This interconnect provides communication with no arbitration delays unless two
masters attempt to access the same slave at the same time.
 Split APB bus allows for higher throughput with fewer stalls between the CPU and DMA. A
single level of write buffering allows the CPU to continue without waiting for completion of
APB writes if the APB was not already busy.

Serial interfaces:

– Ethernet MAC with RMII interface and dedicated DMA controller.


– USB 2.0 full-speed controller that can be configured for either device, Host, or OTG operation with
an on-chip PHY for device and Host functions and a dedicated DMA controller.

– Four UARTs with fractional baud rate generation, internal FIFO, IrDA, and DMA support. One UART
has modem control I/O and RS-485/EIA-485 support.

– Two-channel CAN controller.

– Two SSP controllers with FIFO and multi-protocol capabilities. The SSP interfaces can be used with
the GPDMA controller.

– SPI controller with synchronous, serial, full duplex communication and programmable data length.
SPI is included as a legacy peripheral and can be used instead of SSP0.

– Three enhanced I2C-bus interfaces, one with an open-drain output supporting the full I2C
specification and Fast mode plus with data rates of 1Mbit/s, two with standard port pins.
Enhancements include multiple address recognition and monitor mode.

– I2S (Inter-IC Sound) interface for digital audio input or output, with fractional rate control. The I2S
interface can be used with the GPDMA. The I2S interface supports 3-wire data transmit and receive
or 4-wire combined transmit and receive connections, as well as master clock output.

Other peripherals:

– 70 (100 pin package) or 52 (80-pin package) General Purpose I/O (GPIO) pins with configurable
pull-up/down resistors, open drain mode, and repeater mode. All GPIOs are located on an AHB bus
for fast access, and support Cortex-M3 bit-banding. GPIOs can be accessed by the General Purpose
DMA Controller. Any pin of ports 0 and 2 can be used to generate an interrupt.

– 12-bit Analog-to-Digital Converter (ADC) with input multiplexing among eight pins, conversion
rates up to 200 kHz, and multiple result registers. The 12-bit ADC can be used with the GPDMA
controller.

– 10-bit Digital-to-Analog Converter (DAC) with dedicated conversion timer and DMA support.

– Four general purpose timers/counters, with a total of eight capture inputs and ten compare
outputs. Each timer block has an external count input. Specific timer events can be selected to
generate DMA requests.

– One motor control PWM with support for three-phase motor control.

– Quadrature encoder interface that can monitor one external quadrature encoder.

– One standard PWM/timer block with external count input.

– Real-Time Clock (RTC) with a separate power domain. The RTC is clocked by a dedicated RTC
oscillator. The RTC block includes 20 bytes of battery-powered backup registers, allowing system
status to be stored when the rest of the chip is powered off. Battery power can be supplied from a
standard 3 V Lithium button cell. The RTC will continue working when the battery voltage drops to as
low as 2.1 V. An RTC interrupt can wake up the CPU from any reduced power mode.
– Watchdog Timer (WDT). The WDT can be clocked from the internal RC oscillator, the RTC oscillator,
or the APB clock.

– Cortex-M3 system tick timer, including an external clock input option.

– Repetitive interrupt timer provides programmable and repeating timed interrupts.

Applications

 E-Metering
 Lighting
 Industrial networking
 Alarm systems
 White goods
 Motor control

LPC1768 Memory Map

As it is a 32-bit architecture, it can address 2^32 locations (4 GB). This 4 GB of addressable locations is
divided into ROM, RAM, GPIO and AHB peripheral regions, as shown in the memory map image below. In
the LPC1768 the GPIO registers are mapped to memory locations 0x2009 C000 - 0x2009 FFFF.
GPIO of LPC17xx

ARM is a 32-bit architecture and provides 32-bit GPIO ports. In this tutorial, we are going to cover
GPIO pins, how to use them, how to configure the GPIO registers, and an example of how the
microcontroller can interact with the outside world through GPIO pins. For this tutorial we are taking
LPC1769 as reference and with the use of the CMSIS library.

In order to get started with the GPIO ports, we need to look into the five ‘registers’ that control the
port pins: FIODIR, FIOMASK, FIOPIN, FIOSET and FIOCLR. Each of these registers is explained in detail
below with some basic examples of how they work.

GPIO port Direction register FIODIR (FIO0DIR to FIO4DIR)

This word accessible register is used to control the direction of the pins when they are configured as
GPIO port pins. Direction bit for any pin must be set according to the pin functionality. For example,
if we want to use our GPIO pin to send signals ‘out’ from the microcontroller to some external
device, we need to set the pin as output (‘1’).

Consider the below example to understand more about this register. Suppose we want to set 0th pin
of port0 as input and 0th pin of port1 as output, the code will be as follows.

LPC_GPIO0->FIODIR = 0x0;  /* all pins of port0, including P0.0, configured as input   */
LPC_GPIO1->FIODIR = 0x1;  /* P1.0 configured as output, remaining port1 pins as input */

The first line shows how to set the 0th pin of port0 as input to receive information from the outside
world. In the second line, we set the 0th pin of port1 as output to send information to the outside
world. We can configure more than one pin as input or output by just setting the register values.

LPC_GPIO0->FIODIR = 0x0;   /* whole of port0 as input            */
LPC_GPIO1->FIODIR = 0xFF;  /* P1.0-P1.7 as output, rest as input */

GPIO port output Set register FIOSET (FIO0SET to FIO4SET)

This register is used to produce a HIGH level output at the port pins configured as GPIO in an OUTPUT
mode. Writing 1 produces a HIGH level at the corresponding port pins. Writing 0 has no effect. If any
pin is configured as an input or a secondary function, writing 1 to the corresponding bit in FIOxSET has
no effect.

GPIO port output Clear register FIOCLR (FIO0CLR to FIO4CLR)

This register is used to produce a LOW level output at port pins configured as GPIO in an OUTPUT
mode. Writing 1 produces a LOW level at the corresponding port pin and clears the corresponding bit
in the FIOxSET register. Writing 0 has no effect. If any pin is configured as an input or a secondary
function, writing to FIOxCLR has no effect.

LPC_GPIO2->FIOCLR = 0x000000FF;

This line of code clears the lowest 8 bits of port2 (configured as an output port using FIODIR). If your
GPIO pin is set as output (set using the FIODIR register), you can use FIOSET to drive the pin high
or FIOCLR to drive it low.

GPIO port Pin value register FIOPIN (FIO0PIN to FIO4PIN)

You can use the FIOPIN register to read the current logic state of every GPIO pin in the pin block,
regardless of whether the pin is configured as input or output. Since FIOPIN returns the current state
of all 32 pins in the pin block, a little bit of extra work with a bit mask is needed to determine the value
of one specific pin, as shown in the sketch below.
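
A minimal sketch tying the four registers together, assuming the CMSIS LPC17xx.h header and, purely for illustration, a switch wired to P0.0 and an LED on P1.0:

#include "LPC17xx.h"

int main(void)
{
    LPC_GPIO1->FIODIR |=  (1 << 0);          /* P1.0 as output (LED, assumed wiring)   */
    LPC_GPIO0->FIODIR &= ~(1u << 0);         /* P0.0 as input  (switch, assumed)       */

    while (1)
    {
        if (LPC_GPIO0->FIOPIN & (1 << 0))    /* read one pin by masking FIOPIN         */
            LPC_GPIO1->FIOSET = (1 << 0);    /* drive P1.0 high                        */
        else
            LPC_GPIO1->FIOCLR = (1 << 0);    /* drive P1.0 low                         */
    }
}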

Timer Interrupt in LPC1768 Microcontroller

The timer is an internal peripheral of the LPC1768. Timers use the CPU clock to keep track of time and
count, and they enhance the use of the microcontroller in a number of ways: they generate periodic
events and allow precise measurements, making time available to your microcontroller project. This
means you can start using temporal information in your program without having to use unwieldy spin loops.

Timers in LPC1768 Basics

In the LPC1768 microcontroller there are four timers: Timer 0, Timer 1, Timer 2 and Timer 3. These are
32-bit timers/counters with a programmable 32-bit prescaler. All are identical, but they can be
configured independently and used without interfering with each other.

Timer Counter Block Diagram

All timers are built around an internal Timer Counter (TC), which is a 32-bit register that is
incremented periodically. The rate of change is derived from the current speed of the CPU clock and
from what the Prescale Counter is set to. There is nothing more to it than that. The Prescale Counter is
a clock divider. As mentioned earlier, the Timer Counter is a 32-bit register and it counts in the range
0x00000000 to 0xFFFFFFFF.

Before we get into programming fundamentals, it is important to understand the powering procedure
for the LPC1768 microcontroller. Powering the peripheral is necessary before choosing the peripheral
clock and setting the prescale register; in other words, we need to turn the timer on. Upon RESET most
of the LPC's peripherals are OFF and are not being supplied power by the microcontroller.

This can save a lot of power, but a few core peripherals are turned on when the LPC starts, among
them GPIO, TIMER0 and TIMER1. This means that TIMER2 and TIMER3 start off, and if you need
them in your application then you will have to turn them on. Additionally, if you do not need TIMER0 or
TIMER1, you can even turn them OFF to save some power. Usually, power control is considered a
system feature and is controlled by the register LPC_SC->PCONP, as used in the sketch below.
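
A minimal sketch of a periodic TIMER0 interrupt, assuming the CMSIS LPC17xx.h header and the default peripheral clock of CCLK/4 (so the prescale and match values are only illustrative):

#include "LPC17xx.h"

void TIMER0_IRQHandler(void)
{
    LPC_TIM0->IR = (1 << 0);              /* acknowledge the MR0 interrupt flag             */
    /* periodic work goes here */
}

int main(void)
{
    LPC_SC->PCONP |= (1 << 1);            /* PCTIM0: power TIMER0 (already set after reset) */

    LPC_TIM0->PR  = 24;                   /* assuming PCLK = 25 MHz: TC now ticks at 1 MHz  */
    LPC_TIM0->MR0 = 1000000 - 1;          /* match after 1,000,000 ticks = 1 s              */
    LPC_TIM0->MCR = (1 << 0) | (1 << 1);  /* interrupt and reset TC on an MR0 match         */
    LPC_TIM0->TCR = (1 << 1);             /* reset the counter ...                          */
    LPC_TIM0->TCR = (1 << 0);             /* ... then enable it                             */

    NVIC_EnableIRQ(TIMER0_IRQn);          /* enable the timer interrupt in the NVIC         */
    while (1) { /* foreground work */ }
}
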
ADC Block

Features

 The ADC module in the LPC1768 uses the technique of successive approximation to convert
signals from analog to digital values. The internal SAR (Successive Approximation Register) is
designed to take the input from a voltage comparator of the internal module to give a 12-bit
output resulting in a high precision result.

 The 12-bit conversion rate is up to 200 kHz.

 This speed is achieved with 8 multiplexed channels.

 The measurement range is set between Vrefn and Vrefp, which is usually about 3 V, and should not
exceed the Vdd voltage level.

 The module supports burst conversion mode as well.

 The clock required for analog to digital converter is provided by the APB clock and is scaled
to the clock required for the successive approximation process using a programmable
divider that is included in each converter.

Configuration

 The power to the ADC module must be given initially. The reset value for PCADC or the
power control for ADC bit in the PCONP register is 0. Therefore, it is mandatory that this step
must not be skipped in the software.

 The clock to the ADC module must be set.

 The corresponding pins in the PINSEL register must be selected to function as ADC pins.

 Control the ADC operation using the ADC control register, ADCR. The basic operational flow
is to start the conversion process, read the result and stop after conversion and reading is
completed.

LPC1768 has an inbuilt 12-bit Successive Approximation ADC which is multiplexed among 8 input
pins. The ADC reference voltage is measured across VREFN to VREFP, meaning it can do the
conversion within this range. Usually VREFP is connected to VDD and VREFN is connected to
GND. As the LPC1768 works on 3.3 volts, this will be the ADC reference voltage.

Now the $$resolution of ADC = 3.3/(2^{12}) = 3.3/4096 = 0.000805 V ≈ 0.8 mV$$

The below block diagram shows the ADC input pins multiplexed with other GPIO pins.
The ADC pin can be enabled by configuring the corresponding PINSEL register to select ADC function.
When the ADC function is selected for that pin in the Pin Select register, other Digital signals are
disconnected from the ADC input pins.

Adc Channel Port Pin Pin Functions Associated PINSEL Register


AD0 P0.23 0-GPIO, 1-AD0[0], 2-I2SRX_CLK, 3-CAP3[0] 14,15 bits of PINSEL1

AD1 P0.24 0-GPIO, 1-AD0[1], 2-I2SRX_WS, 3-CAP3[1] 16,17 bits of PINSEL1

AD2 P0.25 0-GPIO, 1-AD0[2], 2-I2SRX_SDA, 3-TXD3 18,19 bits of PINSEL1

AD3 P0.26 0-GPIO, 1-AD0[3], 2-AOUT, 3-RXD3 20,21 bits of PINSEL1

AD4 P1.30 0-GPIO, 1-VBUS, 2- , 3-AD0[4] 28,29 bits of PINSEL3

AD5 P1.31 0-GPIO, 1-SCK1, 2- , 3-AD0[5] 30,31 bits of PINSEL3

AD6 P0.3 0-GPIO, 1-RXD0, 2-AD0[6], 3- 6,7 bits of PINSEL0

AD7 P0.2 0-GPIO, 1-TXD0, 2-AD0[7], 3- 4,5 bits of PINSEL0

ADC Registers

The below table shows the registers associated with LPC1768 ADC. We are going to focus only on
ADCR and ADGDR as these are sufficient for simple A/D conversion. However once you are familiar
with LPC1768 ADC, you can explore the other features and the associated registers.

Register Description

ADCR A/D Control Register: Used for configuring the ADC

ADGDR A/D Global Data Register: This register contains the ADC’s DONE bit and the result of
the most recent A/D conversion

ADINTEN A/D Interrupt Enable Register

ADDR0 - A/D Channel Data Register: Contains the recent ADC value for respective channel
ADDR7

ADSTAT A/D Status Register: Contains DONE & OVERRUN flag for all the ADC channels

Some other registers

Though there are some more registers, we are restricting ourselves to use these registers only as this
will be more convenient. Apart from ADC Global Data register there are more 8 ADC Data registers
(one Data register per ADC channel). DONE and OVERRUN bits for each channel can be monitored
separately from the bits present in ADC Status register.

One can use the A/D Global Data Register to read all data from the ADC else use the A/D Channel
Data Registers. It is important to use one method consistently because the DONE and OVERRUN
flags can otherwise get out of synch between the AD0GDR and the A/D Channel Data Registers,
potentially causing erroneous interrupts or DMA activity.
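
A minimal sketch of a single polled conversion on channel AD0.0 (P0.23), assuming the CMSIS LPC17xx.h header and the default PCLK of CCLK/4; the CLKDIV value is only illustrative and must keep the ADC clock at or below 13 MHz.

#include "LPC17xx.h"

uint16_t adc_read_ch0(void)
{
    uint32_t gdr;

    LPC_ADC->ADCR |= (1u << 24);                /* START = 001: start a conversion now     */
    do {
        gdr = LPC_ADC->ADGDR;
    } while ((gdr & (1u << 31)) == 0);          /* wait for the DONE bit                   */
    return (uint16_t)((gdr >> 4) & 0xFFF);      /* 12-bit RESULT field                     */
}

int main(void)
{
    LPC_SC->PCONP |= (1 << 12);                 /* PCADC: power up the ADC                 */
    LPC_PINCON->PINSEL1 |= (1 << 14);           /* P0.23 = AD0.0 (PINSEL1 bits 15:14 = 01) */
    LPC_ADC->ADCR = (1 << 0)                    /* SEL: channel 0                          */
                  | (3 << 8)                    /* CLKDIV: ADC clock = PCLK/(3+1)          */
                  | (1 << 21);                  /* PDN: ADC operational                    */
    while (1)
    {
        volatile uint16_t sample = adc_read_ch0();
        (void)sample;                           /* use the sample in the application       */
    }
}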

Introduction: UART in LPC1768

The LPC1768 micro-controller consists of 4 UART peripherals (UART0, UART1, UART2, and UART3). A
few of the striking features of these peripherals are:

 Like any other UART peripheral, they can handle data sizes of 5 to 8 bits.

 They support 16-byte receive and transmit FIFOs, which means that they can store 16 bytes
of data in a first-in first-out fashion without overwriting existing data in the FIFO buffer
before it gets filled.

 It has a built-in baud rate generator.

 It supports built-in DMA (Direct Memory Access) for transmission and reception which is
ideal when data of byte size has to be transmitted and the controller has to be relieved from
basic data communication to perform other tasks.

 It has multi-processor addressing modes and has an IrDA mode to support infrared
communication as well.
POWER: This is a register that is used to switch on or off the different peripherals in the LPC 1768
module to increase the power efficiency. A particular peripheral block will be turned on or off by
gating on or off the clock source to that particular block. The bit in the PCONP register that
controls the clock source to the UART0 peripheral, is PCUART0 or bit 3 in the PCONP register. The
reset value of this bit in the power control register is 1. Therefore, it is already powered on reset and
even if this step is skipped, the program will work.

CLOCK FREQUENCY: The peripheral clock frequency is set in the peripheral clock selection register
(PCLKSEL0). The pair of bits that control the UART0 clock frequency is bits <7:6>, named
PCLK_UART0. On reset, the value of PCLK_UART0 is 00.

BAUD RATE: There are a few bits that must be set in the U0LCR or the Line Control Register of
UART0.

 The baud rate calculation is done on the basis of the values entered in the DLL and DLM
registers. The formula for the baud rate is:

$$UART0baudrate = PCLK / (16 * (256 * U0DLM + U0DLL) * (1 + DivAddVal/MulVal))$$

where PCLK is the UART0 peripheral clock frequency and DivAddVal/MulVal are the fractional
divider values held in the U0FDR register.

But first, we must gain access to these registers by enabling the DLAB bit in the U0LCR
register which is bit 7.

 The 8-bit character length must also be selected through the U0LCR register by entering
values <1,1> for bits 0 and 1.
UART FIFO: The FIFO is enabled by setting the FIFO Enable bit in the U0FCR register. Next, both the RX
FIFO Reset and TX FIFO Reset bits must be set to 1 to clear the bytes in the respective buffers.

UART0 PINS: The PINSEL0 register controls the functions in the lower half of PORT0. The bits that
must be set for the UART0 function are as follows: PINSEL0 bits <5:4> = <0:1> selects TXD0, and
PINSEL0 bits <7:6> = <0:1> selects RXD0.

UART Interrupt: We will not be using the interrupt procedure in our example. But if you want to
access the UART interrupt enable register (U0IER), the DLAB bit, which was set for accessing the divisor
latch registers to enter the baud rate, must be cleared to 0. In the interrupt register, the lower three
bits are set to enable the receive data available interrupt (bit 0), the THRE interrupt (bit 1) and the RX
line status interrupt (bit 2).
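
A minimal sketch of polled UART0 transmission, assuming the CMSIS LPC17xx.h header, the default PCLK_UART0 of CCLK/4 = 25 MHz, and the fractional divider left at its reset value (so the DLM/DLL pair below only approximates 9600 baud):

#include "LPC17xx.h"

void uart0_init(void)
{
    LPC_SC->PCONP       |= (1 << 3);             /* PCUART0 (already 1 after reset)               */
    LPC_PINCON->PINSEL0 |= (1 << 4) | (1 << 6);  /* P0.2 = TXD0, P0.3 = RXD0 (<0:1> in each pair) */

    LPC_UART0->LCR = 0x83;                       /* DLAB = 1, 8-bit characters, 1 stop, no parity */
    LPC_UART0->DLM = 0;                          /* divisor for roughly 9600 baud at PCLK = 25 MHz */
    LPC_UART0->DLL = 163;                        /* 25 MHz / (16 * 163) is about 9586 baud        */
    LPC_UART0->LCR = 0x03;                       /* DLAB = 0, keep the 8-bit character length     */
    LPC_UART0->FCR = 0x07;                       /* enable FIFOs, reset the RX and TX FIFOs       */
}

void uart0_putc(char c)
{
    while ((LPC_UART0->LSR & (1 << 5)) == 0)     /* wait for THRE (transmit holding empty)        */
        ;
    LPC_UART0->THR = c;
}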

LPC1768 SPI Block

The below block diagram shows the SPI input pins multiplexed with other GPIO pins.
The SPI pin can be enabled by configuring the corresponding PINSEL register to select SPI function.
When the SPI function is selected for that pin in the Pin Select register, other Digital signals are
disconnected from the SPI input pins.

Port Pin PINSEL_FUNC_0 PINSEL_FUNC_1 PINSEL_FUNC_2 PINSEL_FUNC_3

P0.15 GPIO TXD1 SCK0 SCK

P0.16 GPIO RXD1 SSEL0 SSEL

P0.17 GPIO CTS1 MISO0 MISO

P0.18 GPIO DCD1 MOSI0 MOSI

SPI Registers

Register Description

SPCR SPI Control Register : used to configure SPI

SPSR SPI Status Register : contains the transfer-complete (SPIF) and error status flags

SPDR SPI Data Register : contains received data or data to be transmitted

SPCCR SPI Clock Counter Register : used to control master SCK frequency
Steps for using SPI

 Initialize SPI
 Send Data
 Receive Data
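
A minimal sketch of these three steps for the legacy SPI block in master mode, assuming the CMSIS LPC17xx.h header; the clock divider is illustrative, and P0.16 is deliberately left as GPIO so it can be driven manually as chip-select.

#include "LPC17xx.h"

void spi_init(void)
{
    LPC_SC->PCONP |= (1 << 8);                         /* PCSPI: power the SPI block              */
    LPC_PINCON->PINSEL0 |= (3u << 30);                 /* P0.15 = SCK (function 3)                */
    LPC_PINCON->PINSEL1 |= (3 << 2) | (3 << 4);        /* P0.17 = MISO, P0.18 = MOSI (function 3) */

    LPC_SPI->SPCCR = 8;                                /* SCK = PCLK/8 (must be even and >= 8)    */
    LPC_SPI->SPCR  = (1 << 5);                         /* MSTR = 1: master, 8-bit, CPOL = CPHA = 0 */
}

uint8_t spi_transfer(uint8_t out)
{
    LPC_SPI->SPDR = out;                               /* writing the data register starts the transfer */
    while ((LPC_SPI->SPSR & (1 << 7)) == 0)            /* wait for SPIF (transfer complete)       */
        ;
    return (uint8_t)LPC_SPI->SPDR;                     /* reading SPDR returns the received byte  */
}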

Pulse Width Modulator operation in LPC1768

The LPC 1768 micro-controller has a motor control PWM and a 6-output general-purpose PWM.

Features

 The PWM module operates using a timer or a counter.

 There are seven match registers that allow 6 single-edge controlled or 3 double-edge
controlled PWM outputs. A match can result in any of the following, with optional interrupt
generation:

o Continuous operation.

o Stop timer.

o Reset timer.

 The accuracy of the pulses is generated with minimum error as the pulse outputs are
synchronized with the match registers. This is achieved in the software where it is required
for the programmer to release new match values for the output pins to generate the pulses.

Description

In reality, the PWM module is a timer module that has been given PWM functions. If the PWM mode
is not enabled, the module can be used as a standard timer. It can be used as a 32-bit timer/counter
with a 32-bit prescaler.

Basic Configuration

 Like every other module configuration, the PWM module has to be powered to start any
PWM operation. In the PCONP register, bit 6 or PCPWM1 should be 1 for the module to be
powered. But on reset, this pin is already set to 1. So it is alright to skip this step.

 Now the peripheral clock must be selected through the PCLKSEL0 register by setting the
PCLK_PWM1 bits in it, which are bits <13,12>. The reset value is 00. The PCLK_PWM1 value selects
the peripheral clock for the PWM module (00 = CCLK/4, 01 = CCLK, 10 = CCLK/2, 11 = CCLK/8).

 The selection of the PWM pins comes next. PINSEL4 register controls the functions of the
lower half of PORT 2. The bits 0 to 11 are responsible for the functioning of the 6 PWM
outputs. When configured as <0,1>, in each pair of the bits, the pins of the PINSEL4 register
are selected for PWM output.
 Finally, the match register and counter registers in the peripheral have to be selected
appropriately for specific functions.

POWER: The first step, as we mentioned, is to power up the module. But since the reset value of the
PCPWM1 bit in the PCONP register is 1, this step is pretty much optional.

PIN SELECT: Now we must select the appropriate pins that we will be using to get the PWM output
by setting the appropriate bits as shown in TABLE 2.

MATCH CONTROL REGISTER: This step must be performed prior to enabling the PWM enable pin in
the PWM1TCR (timer control register). Otherwise, a match event will not occur to cause shadow
register contents to become effective. The idea here is simple.

 We will be setting the PWMMR0R bit in the PWM1MCR register as 1 to enable the reset
mode

 The PWMMR0 register will hold the value of 1 complete cycle or the full cycle.

LATCH ENABLE REGISTER: The use of this register makes the PWM output more accurate. When the
software writes a new value for a match register, the value is not used straight away; it is held in a
shadow register. When the event of a match occurs, the contents of the shadow register will be
transferred to the actual match register only if the corresponding bit in the Latch Enable Register (LER)
has been set to 1. So, until a match event occurs and the corresponding LER bit is set, no effect in the
PWM operation will take place.

COUNTER & PRESCALER: Now we must enter the counter and prescaler values through the PWM1
timer control register TCR and PWM1 prescaler register PR.

ENABLE THE PWM OUTPUT PINS: The corresponding bits have to be set as 1 to enable the PWM
output.

DUTY CYCLE: The final step is to enter the duty cycle of the PWM operation. This is set in the PWM
match register MRx where x is 1 to 6. After entering the value to the necessary match register the
corresponding latch enable must be set for the PWM operation to take effect.
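
A minimal sketch of a single-edge PWM1.1 output on P2.0 following the steps above, assuming the CMSIS LPC17xx.h header; the prescale, period and duty values are only illustrative.

#include "LPC17xx.h"

void pwm1_init(void)
{
    LPC_SC->PCONP |= (1 << 6);            /* PCPWM1 (already 1 after reset)                 */
    LPC_PINCON->PINSEL4 |= (1 << 0);      /* P2.0 = PWM1.1 (PINSEL4 bits 1:0 = 01)          */

    LPC_PWM1->PR  = 24;                   /* assuming PCLK = 25 MHz: PWM counter at 1 MHz   */
    LPC_PWM1->MR0 = 1000;                 /* MR0 sets the period: 1000 us -> 1 kHz          */
    LPC_PWM1->MR1 = 250;                  /* MR1 sets the PWM1.1 duty cycle: 25 %           */
    LPC_PWM1->MCR = (1 << 1);             /* PWMMR0R: reset the counter on an MR0 match     */
    LPC_PWM1->LER = (1 << 0) | (1 << 1);  /* latch the new MR0 and MR1 values               */
    LPC_PWM1->PCR = (1 << 9);             /* PWMENA1: enable the PWM1.1 output              */
    LPC_PWM1->TCR = (1 << 0) | (1 << 3);  /* counter enable + PWM mode enable               */
}

void pwm1_set_duty(uint32_t match)        /* new duty cycle, applied at the next period     */
{
    LPC_PWM1->MR1 = match;
    LPC_PWM1->LER = (1 << 1);             /* latch MR1 at the next MR0 match                */
}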

RTC Registers

The RTC includes a number of registers. We will concentrate on some of the basic required registers.

Register Description

CCR Clock Control Register

Consolidated Time Registers

CTIME 0 Consolidated Time Register 0


CTIME 1 Consolidated Time Register 1

CTIME 2 Consolidated Time Register 2

Time Counter Registers

SEC Seconds Counter

MIN Minutes Register

HOUR Hours Register

DOM Day of Month Register

DOW Day of Week Register

DOY Day of Year Register

MONTH Months Register

YEAR Years Register

CALIBRATION Calibration value Register

RTC Register Configuration

CCR ( Clock Control Register ) The Clock Control Register register controls the operation of the clock
divide circuit.

Bit 0 – CLKEN : Clock Enable This bit is written with one to enable time counters and written with
zero to disable time counters.

Bit 1 - CTCRST : CTC Reset When this bit is set to one, the internal oscillator divider is reset.

Bit 3:2 – Reserved These bits must be set to zero for normal RTC operation.

Bit 4 - CCALEN : Calibration counter enable To disable the calibration counter and reset it to zero, write
one to this bit. When written with zero, the counter is enabled and starts counting.

Steps for Using RTC

Initialize RTC
 Disable RTC clock using CCR ( Clock Control Register )

 Reset clock using CCR register

 Enable RTC calibration in RTC Calibration Register

 Enable the clock for RTC using CCR register

Set Date and Time

As we saw previously, there are separate Time Counter registers for each time parameter (hour, min,
sec) and the same is the case for the date. We just need to copy the required values to these registers.

Read Date and Time

The values of date and time can be read from associated Time Counter registers. Alternately, date
and time can also be read from Consolidated Time registers. We will use Time Counter registers for
both reading and writing.
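
A minimal sketch of the initialize/set/read sequence above, assuming the CMSIS LPC17xx.h header; the date and time values written are arbitrary examples.

#include "LPC17xx.h"

void rtc_init(void)
{
    LPC_SC->PCONP |= (1 << 9);        /* PCRTC: power the RTC block              */
    LPC_RTC->CCR   = 0;               /* CLKEN = 0: disable the time counters    */
    LPC_RTC->CCR   = (1 << 1);        /* CTCRST = 1: reset the internal divider  */
    LPC_RTC->CCR   = 0;               /* release the reset                       */
    LPC_RTC->CALIBRATION = 0;         /* no calibration used in this sketch      */
    LPC_RTC->CCR   = (1 << 0);        /* CLKEN = 1: start the clock              */
}

void rtc_set(void)                    /* arbitrary example: 12:30:00, 1 Jan 2024 */
{
    LPC_RTC->HOUR = 12;  LPC_RTC->MIN = 30;   LPC_RTC->SEC  = 0;
    LPC_RTC->DOM  = 1;   LPC_RTC->MONTH = 1;  LPC_RTC->YEAR = 2024;
}

void rtc_read(uint32_t *h, uint32_t *m, uint32_t *s)
{
    *h = LPC_RTC->HOUR;  *m = LPC_RTC->MIN;  *s = LPC_RTC->SEC;
}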

Watchdog Timer Dialog

The Watchdog Timer Dialog shows the current state of the on-chip Watchdog Timer. You can change
the Watchdog Timer settings using the controls in this dialog.

Mode Register

 WDMOD (Watchdog Mode Register) contains the Watchdog Timer mode (Debug or Operate) and
the status bits of the Watchdog Timer.

 WDEN (Watchdog Enable) is set to enable the Watchdog Timer.

 WDRESET (Watchdog Reset Enable) is set so that a Watchdog time-out causes a chip reset.

 WDTOF (Watchdog Timer Time-Out Flag) is set when the Watchdog Timer times out.

 WDINT (Watchdog Timer Interrupt Flag) is set when a Watchdog Timer interrupt occurs.

Time Constant and Value

 WDTC (Watchdog Timer Constant Register) contains the time-out value used to reload the
Watchdog Timer.

 WDTV (Watchdog Timer Value Register) contains the current value of the Watchdog Timer.

Feed Register

 WDFEED (Watchdog Timer Feed Sequence Register) is used to reload the Watchdog Timer.
Writing 0xAA followed by 0x55 to this register reloads the Watchdog Timer to its preset
value.
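
A minimal sketch of starting and feeding the Watchdog, assuming the CMSIS LPC17xx.h header and the internal RC oscillator (about 4 MHz) as the Watchdog clock; the time-out value is illustrative.

#include "LPC17xx.h"

void wdt_init(void)
{
    LPC_WDT->WDCLKSEL = 0;                 /* clock the WDT from the internal RC oscillator */
    LPC_WDT->WDTC     = 1000000;           /* roughly 1 s: time-out = WDTC x 4 / 4 MHz      */
    LPC_WDT->WDMOD    = (1 << 0)           /* WDEN: enable the Watchdog                     */
                      | (1 << 1);          /* WDRESET: a time-out causes a chip reset       */
    LPC_WDT->WDFEED   = 0xAA;              /* first feed starts the Watchdog running        */
    LPC_WDT->WDFEED   = 0x55;
}

void wdt_feed(void)
{
    __disable_irq();                       /* keep the two-write feed sequence atomic       */
    LPC_WDT->WDFEED = 0xAA;
    LPC_WDT->WDFEED = 0x55;
    __enable_irq();
}
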
UNIT – 4

Programmable DSP

 A digital signal processor is a specialized microprocessor for the purpose of real-time DSP
computing.
 DSP applications commonly share the following characteristics:
 Algorithms are mathematically intensive; common algorithms require many multiply and
accumulates.
 Algorithms must run in real-time; processing of a data block must occur before next block
arrives.
 Algorithms are under constant development; DSP systems should be flexible to support
changes and improvements in the state-of-the-art.
 Programmable DSPs should provide for instructions that one would find in most general
microprocessors.
 The basic instruction capabilities (provided with dedicated high-speed hardware) should
include:
 arithmetic operations: add, subtract and multiply
 logic operations: AND, OR, XOR, and NOT
 multiply and accumulate (MAC) operation
 signal scaling operations before and/or after digital signal processing

Support architecture should include:

 RAM; i.e., on-chip memories for signal samples

 ROM; on-chip program memory for programs and algorithm parameters such as filter
coefficients

 on-chip registers for storage of intermediate results

DSP Computational Building Blocks

 Multiplier
 Shifter
 Multiply and accumulate (MAC) unit
 Arithmetic logic unit

Multiplier

The following specifications are important when designing a multiplier:

 speed – decided by the architecture, which trades off with circuit complexity and power
dissipation
 accuracy – decided by the format representation (number of bits and fixed/floating point)
 dynamic range – decided by the format representation

Shifter

It is required to scale down or scale up operands and results to avoid errors resulting from overflows
and underflows during computations.
MAC Unit

 Multiply and accumulate (MAC) unit performs the accumulation of a series of successively
generated products
 Common operation in DSP applications such as filtering

Arithmetic and Logic Unit

Arithmetic logic unit (ALU) carries out additional arithmetic and logic operations required for a DSP:

 Add, subtract, increment, decrement, negate
 AND, OR, NOT, XOR, compare
 Shift, multiply (uncommon in general microprocessors)

With additional features common to general microprocessors:

 Status flags for sign, zero, carry and overflow
 Overflow management via saturation logic
 Register files for storing intermediate results

Bus Architecture and Memory

 Bus architecture and memory play a significant role in dictating cost, speed and size of DSPs.
 Common architectures include the von Neumann and Harvard architectures.

Harvard Architecture

DSP chips often have a Harvard-type architecture (see Figure) or some modified version of the Harvard
architecture. This type of system architecture implies that there are at least two system buses, one
for instruction transfers and one for data. Quite often, three system buses can be found on DSPs:
one for instructions, one for data (including I/O) and one for transferring coefficients from a
separate memory area or chip.

Figure: Harvard architecture


Some examples of Harvard architectures involve early computer systems where programming input
could be in one media, for example, punch cards, and stored data could be in another media, for
example, on tape. More modern computers may have modern CPU processes for both systems, but
separate them in a hardware design. The Harvard architecture, with its strict separation of code and
data processes, can be contrasted with a modified Harvard architecture, which may combine some
features of code and data systems while preserving separation in others. One example is the use of
two caches, with one common address space. It can also be contrasted with Von Neumann
architecture, named for John von Neumann, which does not focus on separating input from data.

It has separate memory for instructions and data and has separate buses for instruction fetches and
data transfer. The Memory cell sizes for instructions and data are different. A more complex Control
unit is required to handle two buses. Both instruction fetches and data transfer can take place
simultaneously. Programs can’t write themselves and memory organisation is not in the hands of the
programmer.

The key features are :

1. The two different memories can have different characteristics: for example, in embedded
systems, instructions may be held in read-only memory while data may require read-write memory.

2. In some systems, there is much more instruction memory than data memory, so a larger word size
is used for instructions.

3. The instruction address bus may be wider than the data bus. Embedded systems include special-
purpose devices built into devices often operating in real-time, such as those used in navigation
systems, traffic lights, aircraft control systems and simulators. Harvard architecture can be faster
than Von Neumann architecture because data and instructions can be fetched in parallel instead of
competing on the same bus.

Advantages

 There is less chance of corruption since data and instructions are transferred via different
buses.

 Data and instructions are accessed in the same way.

Disadvantages

 When there is free data memory it cannot be used for instructions and vice versa. Memory
dedicated to each must be carefully balanced in manufacture.

 Production of a computer with two buses is more expensive and takes more time to
manufacture.

Multi-port memories

Another type of memory that can be used is called multi-port memory. It has multiple independent
sets of address and data lines, allowing multiple independent memory accesses in parallel. So in this
case we do not need to have separate banks of program and data memory since they can be
accessed simultaneously from the same bank. Figure shows a Harvard architecture combined with
dual-port data memory and single-port program memory.

This architecture is used in the Motorola DSP561xx processors. The disadvantage of multi-port
memory is that it takes up more silicon area to implement.

The Multiplier-accumulator (MAC) unit supports a large number of digital signal processing (DSP)
applications. It also furnishes signal processing ability to the microcontroller for various applications
such as servo/audio control. MAC, being an execution unit in the processor, implements a 3-stage
pipelined arithmetic architecture which optimizes 16×16 multiplies.

This design supports both 16-bit and 32-bit operands. It also supports signed/unsigned integers plus
signed, fixed-point fractional input operands. The MAC unit supports mainly three functions:

 Signed and unsigned integer multiplies

 Multiply-accumulate operations supporting signed, unsigned, and signed fractional operands

 Miscellaneous register operations


An accumulator adder and a multiplier together form a MAC unit. Usually Carry-select/Carry-save
adders are mostly implemented due to the DSP application requirement of fast speed. The memory
fetches the inputs from their locations to the multiplier for further multiply-accumulate operations. The
generated result of the MAC unit is stored at a relevant memory location. The situation demands that
this complete process should be carried out in a single clock cycle.
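
To make the MAC operation concrete, here is a short C sketch (not tied to any particular DSP) of the multiply-accumulate loop at the heart of an FIR filter; a hardware MAC unit performs each multiply and add of this loop in a single cycle.

#include <stdint.h>

/* One FIR output sample: y = sum over k of h[k] * x[k],
   where x points at the current window of input samples. */
int32_t fir_mac(const int16_t *x, const int16_t *h, int ntaps)
{
    int32_t acc = 0;                       /* the accumulator register     */
    for (int k = 0; k < ntaps; k++)
    {
        acc += (int32_t)x[k] * h[k];       /* one multiply-accumulate step */
    }
    return acc;
}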

Barrel Shifter

A barrel shifter is a digital circuit that can shift a data word by a specified number of bits without the
use of any sequential logic, only pure combinational logic. One way to implement it is as a sequence
of multiplexers where the output of one multiplexer is connected to the input of the next
multiplexer in a way that depends on the shift distance. A barrel shifter is often used to shift and
rotate n bits in modern microprocessors, typically within a single clock cycle.
For example, take a four-bit barrel shifter, with inputs A, B, C and D. The shifter can cycle the order
of the bits ABCD as DABC, CDAB, or BCDA; in this case, no bits are lost. That is, it can shift all of the
outputs up to three positions to the right (and thus make any cyclic combination of A, B, C and D).
The barrel shifter has a variety of applications, including being a useful component
in microprocessors (alongside the ALU).
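
The cyclic shift described above can be written in C as shown below; in hardware, a barrel shifter produces the result combinationally within a single cycle, whereas a conventional shifter would need n single-bit steps. This is a generic 32-bit rotate-right sketch.

#include <stdint.h>

/* rotate right by n: bits shifted out of the LSB re-enter at the MSB */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;                              /* keep the shift distance in 0..31 */
    if (n == 0) return x;                  /* avoid the undefined shift by 32  */
    return (x >> n) | (x << (32u - n));
}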

Introduction to TI DSP Processor Family

Texas Instruments TMS320 is a blanket name for a series of digital signal processors (DSPs)
from Texas Instruments. It was introduced on April 8, 1983 through the TMS32010 processor, which
was then the fastest DSP on the market.

The processor is available in many different variants, some with fixed-point arithmetic and some
with floating point arithmetic. The TMS320 processors were fabricated on MOS integrated circuit
chips, including both NMOS and CMOS variants. The floating point DSP TMS320C3x, which
exploits delayed branch logic, has as many as three delay slots.

The flexibility of this line of processors has led to it being used not merely as a co-processor
for digital signal processing but also as a main CPU. Newer implementations support standard
IEEE JTAG control for boundary scan and/or in-circuit debugging.

The original TMS32010 and its subsequent variants is an example of a CPU with a modified Harvard
architecture, which features separate address spaces for instruction and data memory but the ability
to read data values from instruction memory. The TMS32010 featured a fast multiply-and-
accumulate operation useful in both DSP applications as well as transformations used in computer
graphics. The graphics controller card for the Apollo Computer DN570 Workstation, released in
1985, was based on the TMS32010 and could transform 20,000 2D vectors every second.

DSP Product Generations

TMS320C2xx generation: The TMS320C2xx was introduced in 1995. Manufactured with triple-level
metal and full complementary CMOS static logic, the ’C2xx provides 20-40 MIPS performance. The
’C2xx, also available as a core for TI’s customizable DSPs, is the low-cost, fixed-point DSP of the
future. The TMS320C24x generation high-speed central processing unit (CPU) allows the use of
advanced algorithms, yielding better performance and reducing system component count.

TMS320C3x generation: The TMS320C3x is an easy-to-use 32-bit floating-point DSP that achieves
33–60 million floating-point operations per second (MFLOPS) and 16.67–30 MIPS. The architecture
of the ’C3x is specifically designed to be an efficient compiler platform. The highly optimized C
compiler, the parallel instruction set, and the ’C3x general-purpose features ensure shorter time to
market.

TMS320C4x generation: The TMS320C4x is a high-performance parallel processor with up to 488
Mbytes/s of data throughput, 40-80 MFLOPS, and 20-40 MIPS. It accepts source code from the ’C3x.
Parallel processing development tools are available for the ’C4x.

TMS320C5x generation: The TMS320C5x is a high-performance fixed-point DSP that achieves 20-50
MIPS and accepts source code from the ’C1x, ’C2x, and ’C2xx generations. The architecture of the
’C5x generation includes flexible power-management features. The ’C5x is available in low-voltage
versions.

TMS320C54x generation: The TMS320C54x provides the cost-effective combination of high
performance and low power. The ’C54x executes up to 66 MIPS and can operate at 3.0 V, 3.3 V, or 5
V. The specialized architecture is optimized to meet the needs of a variety of existing and emerging
worldwide telecommunication and wireless applications.

TMS320C6x generation: The TMS320C6x generation offers cost-effective solutions to high-
performance DSP programming challenges. The ’C6x devices are the first to feature VelociTI, which
allows performance of up to 1600 million instructions per second (MIPS).

TMS320C8x generation: The TMS320C8x integrates multiple (up to four) 32-bit advanced DSPs, a 32-
bit RISC master processor with a 100-MFLOPS floating-point unit, a transfer controller with up to
400-Mbytes/s off-chip transfer rate, and up to 50K bytes of on-chip RAM—on a single piece of
silicon. The ’C80 also includes two on-chip frame timers.

TMS320AVxxx generation: The TMS320AVxxx high-speed, application-specific digital compression
products meet the demand of compressed audio and video playback in applications such as video
conferencing, digital broadcasting, high-definition TV, graphic workstations, and others supported by
international compression standards.
UNIT - 5

Specifications of C6000 DSP core

 DSPs with total active core power of less than 0.15 mW/MHz at 1.05 V, standby power of less
than 0.15 mW, and performance up to 300 MHz (600 MIPS)

 Extremely low power consumption and more power modes extend battery life

 Higher integration with large on-chip memory and a variety of peripheral offerings on a
smaller form factor

 Software compatible with all C5000 DSPs

 Easy-to-use development tools and more value-added third-party solutions speed time to
market

TMS320C6000 Architecture

 The ’C62x is a fixed-point digital signal processor (DSP) and is the first DSP to use the VelociTI
architecture. VelociTI is a high-performance, advanced very-long-instruction-word (VLIW)
architecture, making it an excellent choice for multichannel, multifunction, and
performance-driven applications.
 The ’C67x is a floating-point DSP with the same features. It is the second DSP to use the
VelociTI architecture.
 The ’C64x is a fixed-point DSP with the same features. It is the third DSP to use the VelociTI
architecture.
The ’C6000 DSPs are based on the ’C6000 CPU, which consists of:

1. Program fetch unit
2. Instruction dispatch unit
3. Instruction decode unit
4. Two data paths, each with four functional units
5. Thirty-two 32-bit registers (’C62x and ’C67x)
6. Sixty-four 32-bit registers (’C64x)
7. Control registers
8. Control logic
9. Test, emulation, and interrupt logic

TMS320C6000 Pipeline

The ’C6000 pipeline has several features that provide optimum performance, low cost, and simple
programming.

 Increased pipelining eliminates traditional architectural bottlenecks in program fetch, data
access, and multiply operations.
 Pipeline control is simplified by eliminating pipeline locks.
 The pipeline can dispatch eight parallel instructions every cycle.
 Parallel instructions proceed simultaneously through the same pipeline phases.

VLIW stands for Very Long Instruction Word

Texas Instruments uses this architecture, under the name VelociTI, in its C6000 devices. As the name
denotes, a “long instruction” word is fetched: in the C6000, eight instructions are fetched in every
clock cycle, and this group constitutes a fetch packet. A traditional VLIW architecture consists of
multiple execution units running in parallel, performing multiple instructions during a single clock
cycle. Refer to Figure 6. Here there are three ALUs (i.e., execution units) that share the same program
and data memory. Each ALU has its own address and data bus, independent of all other buses.
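
As a hedged, compiler-independent illustration in plain C, the loop body below contains four
operations with no data dependences between them; a VLIW scheduler is free, in principle, to issue
such independent operations on different functional units in the same cycle. The unit names in the
comments are only suggestive.

#include <stdint.h>

/* Four independent operations per iteration: nothing here depends on
   the result of anything else in the same iteration (restrict tells the
   compiler the arrays do not overlap), so a VLIW compiler may place
   them in one execute packet on separate functional units. */
void independent_ops(const int32_t *restrict a, const int32_t *restrict b,
                     int32_t *restrict sum, int32_t *restrict diff,
                     int32_t *restrict prod, int32_t *restrict dbl, int n)
{
    for (int i = 0; i < n; i++) {
        sum[i]  = a[i] + b[i];   /* e.g., an .L or .S adder        */
        diff[i] = a[i] - b[i];   /* e.g., the other side's adder   */
        prod[i] = a[i] * b[i];   /* e.g., a .M multiplier          */
        dbl[i]  = a[i] << 1;     /* e.g., a shifter in an .S unit  */
    }
}

Whether a real compiler actually packs these into one execute packet depends on the device and
optimization settings; the point is only that independent operations are what parallel functional
units need in order to stay busy.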
The demands placed upon a DSP microprocessor are somewhat different from those placed upon a
general purpose processor. These differences impact the choices that are made in the design of the
architectures for each problem. Some of the most significant differences are:

Processor Core

In a general purpose machine, the processor core is called upon to perform a variety of arithmetic
and logic operations on a variety of data types ranging from characters, to integers, to floating point
numbers. In a DSP machine, the processor core primarily has to perform multiply-accumulate
operations efficiently, since these dominate most DSP applications.

Memory Addressing

In DSP applications most processing is performed on arrays of data. The types of processing to be
performed usually demand special addressing features such as circular or bit-reversed addressing of
arrays. Due to the dominance of these operations in DSP, justification for inclusion of these
resources at the hardware level is easily obtained, whereas in general purpose architectures these
features usually cannot be justified.
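
For instance, circular addressing keeps a delay-line index wrapping around a fixed-length buffer.
DSP hardware performs the wrap-around inside the address generation unit; the plain C sketch
below (names and buffer length are ours, purely illustrative) has to do the same wrap explicitly.

#include <stdint.h>

#define BUF_LEN 64u                       /* illustrative circular buffer length */

/* Push one sample into a circular delay line.
   The index update is the "circular addressing" step: after the last
   element it wraps back to element 0 instead of running off the end. */
static void push_sample(int16_t *buf, unsigned *index, int16_t sample)
{
    buf[*index] = sample;
    *index = (*index + 1u) % BUF_LEN;     /* wrap-around done in software here */
}

Bit-reversed addressing is the analogous hardware feature, used to reorder FFT inputs or outputs
without an explicit software permutation loop.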

Data Locality

Data locality differs greatly between general purpose applications and DSP applications. Since most
DSP operations are upon arrays of data, there is usually little benefit to a large register file as would
be found in a general purpose architecture. Likewise, since DSP applications tend to operate on large
arrays of data, they violate the “cache assumption” that justifies the inclusion of data cache
memories in general purpose architectures. Instead, since access to arrays of data in DSP
applications is predictable, inclusion of on-chip data memories is more beneficial than data cache.

Programming

The issues in programming a DSP are significantly different from those in general purpose
computing. Since most DSP applications are single tasking, hard real-time in nature, instruction
execution timing must be predictable in order to guarantee task completion. Since most DSP
applications are embedded, the use of assembly language programming or additional programmer
effort to optimize code can usually be justified if it results in more efficient use of hardware
resources, which can be reflected in manufacturing savings.

Data paths

A data path is a set of functional units that carry out data processing operations. Data paths, along
with a control unit, make up the CPU (central processing unit) of a computer system. A larger data
path can also be created by joining more than one together using multiplexers. Currently, data paths
can only be configured once. Researchers are trying to find ways to imprint data paths on fabrics and
make them reconfigurable. This action would allow them to be configured at runtime, providing for
improved efficiency and power savings.

For example, if area is not an issue, but speed is the primary concern, then a large design could be
constructed to generate the output in potentially a single clock cycle. If the area is not an issue, but
throughput is required, then pipelining could be used to maximize the overall data rates, although
the individual latency may be high. Finally, if area is the critical factor, then single functional blocks
can be used, with registers to store intermediate values and the same function applied repeatedly.
Clearly this will be a lot slower, but it potentially takes a lot less space.

In the basic data path model there are blocks of combinational logic separated by registers. Clearly
there are options for optimizing the data flow by considering how best to move the data between
the registers for speed or area optimization.

It is important to ensure that some simple rules are followed to ensure robust synthesis. The first is
to make sure that each signal in the combinational block is defined for every cycle; in other words it
is important not to leave undefined branches in case or if statements. If this occurs, a memory
element is inferred and a latch will be synthesized; as this latch is not linked to the global clock,
unpredictable behavior can result.

Fig : Data Path of TMS320C6x

Cross Path

Each functional unit reads directly from and writes directly to the register file within its own data
path. That is, the .L1, .S1, .D1, and .M1 units write to register file A and the .L2, .S2, .D2, and .M2
units write to register file B. The register files are connected to the opposite-side register file's
functional units via the 1X and 2X cross paths. These cross paths allow functional units from one data
path to access a 32-bit operand from the opposite side register file. The 1X cross path allows the
functional units of data path A to read their source from register file B, and the 2X cross path allows
the functional units of data path B to read their source from register file A.
On the C67x DSP, six of the eight functional units have access to the register file on the opposite
side, via a cross path. The .M1, .M2, .S1, and .S2 units' src2 inputs are selectable between the cross
path and the same-side register file. In the case of the .L1 and .L2 units, both src1 and src2 inputs are also
selectable between the cross path and the same-side register file. Only two cross paths, 1X and 2X,
exist in the C6000 architecture. Thus, the limit is one source read from each data path's opposite
register file per cycle, or a total of two cross path source reads per cycle. In the C67x DSP, only one
functional unit per data path, per execute packet, can get an operand from the opposite register file.

Features of the C6000 devices include:

 Advanced VLIW CPU with eight functional units, including two multipliers and six arithmetic
units
 Executes up to eight instructions per cycle for up to ten times the performance of typical
DSPs
 Allows designers to develop highly effective RISC-like code for fast development time
 Instruction packing
 Gives code size equivalence for eight instructions executed serially or in parallel
 Reduces code size, program fetches, and power consumption
 Conditional execution of all instructions
 Reduces costly branching
 Increases parallelism for higher sustained performance
 Efficient code execution on independent functional units
 Industry’s most efficient C compiler on DSP benchmark suite
 Industry’s first assembly optimizer for fast development and improved parallelization
 8/16/32-bit data support, providing efficient memory support for a variety of applications
 40-bit arithmetic options add extra precision for vocoders and other computationally
intensive applications
 Saturation and normalization provide support for key arithmetic operations
 Field manipulation and bit-field instructions (extract, set, clear, and bit counting) support
common operations found in control and data manipulation applications.

Instructions and Instruction set

The language to command a computer architecture is comprised of instructions and the vocabulary
of that language is called the instruction set. The only way computers can represent information is
based on high or low electric signals, i.e., transistors (electric switches) being turned on or off. Being
limited to those 2 alternatives, we represent information in computers using bits (binary digits),
which can have one of two values: 0 or 1. So, instructions will be stored in and read by computers as
sequences of bits. This is called machine language. So that we do not have to read and write
programs using bits, every instruction also has a “natural language” equivalent, called the
assembly language notation. For example, in C, we can use the expression c = a + b; or, in assembly
language, we can use add c, a, b; and these instructions will be represented by a sequence of bits
000000 · · · 010001001 in the computer.
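
The same idea in code form (a hedged, generic illustration: the three-operand instruction in the
comment is schematic and not the exact mnemonic or encoding of any particular processor):

/* One C statement and the kind of three-operand assembly instruction
   it may compile to. The assembler then encodes that instruction as a
   fixed-width word of 0s and 1s -- the machine language form. */
int add_example(int a, int b)
{
    int c = a + b;   /* roughly:  add  c, a, b  */
    return c;
}
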
Groups of bits are named as follows:

 bit 0 or 1
 byte 8 bits
 half word 16 bits
 word 32 bits
 double word 64 bits

Features of TMS320C30

 High-Performance Floating-Point Digital Signal Processor (DSP)

– TMS320C30-50 (5 V) 40-ns Instruction Cycle Time 275 MOPS, 50 MFLOPS, 25 MIPS

– TMS320C30-40 (5 V) 50-ns Instruction Cycle Time 220 MOPS, 40 MFLOPS, 20 MIPS

– TMS320C30-33 (5 V) 60-ns Instruction Cycle Time 183.3 MOPS, 33.3 MFLOPS, 16.7
MIPS

– TMS320C30-27 (5 V) 74-ns Instruction Cycle Time 148.5 MOPS, 27 MFLOPS, 13.5 MIPS
(a short worked example relating cycle time to MIPS and MFLOPS follows this feature list)

 32-Bit High-Performance CPU

 16-/32-Bit Integer and 32-/40-Bit Floating-Point Operations

 32-Bit Instruction Word, 24-Bit Addresses

 Two 1K × 32-Bit Single-Cycle Dual-Access On-Chip RAM Blocks

 One 4K × 32-Bit Single-Cycle Dual-Access On-Chip ROM Block

 On-Chip Memory-Mapped Peripherals:

– Two Serial Ports

– Two 32-Bit Timers

– One-Channel Direct Memory Access (DMA) Coprocessor for Concurrent I/O and CPU
Operation
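
As a quick worked check of the cycle-time figures above (with one assumption made explicit: the
parallel multiply-and-ALU capability described in the next section is taken as two floating-point
operations per cycle):

#include <stdio.h>

/* Relating instruction cycle time to MIPS and MFLOPS for the
   TMS320C30-50 figures quoted in the feature list above. */
int main(void)
{
    double cycle_ns = 40.0;               /* 40-ns instruction cycle time      */
    double mips     = 1000.0 / cycle_ns;  /* 1/(40 ns) = 25 million instr/s    */
    double mflops   = 2.0 * mips;         /* multiply + ALU op per cycle -> 50 */
    printf("MIPS = %.0f, MFLOPS = %.0f\n", mips, mflops);
    return 0;
}

The same arithmetic reproduces the other speed grades; for example, a 74-ns cycle gives roughly
13.5 MIPS and 27 MFLOPS.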

Block Diagram of TMS320C30

The TMS320C30 is a 32-bit floating-point processor manufactured in 0.7-µm triple-level-metal CMOS
technology. The TMS320C30’s internal busing and special DSP instruction set have the speed and
flexibility to execute up to 50 MFLOPS (million floating-point operations per second). The
TMS320C30 optimizes speed by implementing functions in hardware that other processors
implement through software or microcode. This hardware-intensive approach provides performance
previously unavailable on a single chip.
The TMS320C30 can perform parallel multiply and ALU operations on integer or floating-point data
in a single cycle. Each processor also possesses a general-purpose register file, a program cache,
dedicated ARAUs, internal dual-access memories, one DMA channel supporting concurrent I/O, and
a short machine-cycle time. High performance and ease of use are results of these features. General-
purpose applications are enhanced greatly by the large address space, multiprocessor interface,
internally and externally generated wait states, two external interface ports, two timers, serial ports,
and multiple interrupt structure. The TMS320C30 supports a wide variety of system applications
from host processor to dedicated coprocessor.
