Overview
Recap
Packet processing functions (forwarding, queuing)
Traditional network processing systems (CPU + NICs)
General network processor architecture and tradeoffs
Intel IXP network processors: overall architecture
Ning Weng
ECE 526
Memory Mapping
Allocation of the address space (2^32 addresses) to different system components
Accesses to memory are translated into accesses to the corresponding component
The mapping needs to be carefully crafted
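The address translation described above can be sketched as a lookup over address ranges. This is only an illustration: the region names, base addresses, and sizes below are hypothetical and are not the actual IXP memory map.

```python
from typing import List, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    name: str
    base: int
    size: int


# Hypothetical layout of a 2^32 address space (NOT the real IXP map).
MEMORY_MAP: List[Region] = [
    Region("SDRAM", 0x0000_0000, 0x8000_0000),
    Region("SRAM", 0x8000_0000, 0x4000_0000),
    Region("Scratchpad", 0xC000_0000, 0x0000_4000),
    Region("Device CSRs", 0xD000_0000, 0x1000_0000),
]


def has_overlap(regions: List[Region]) -> bool:
    """Return True if any two regions claim the same address."""
    ordered = sorted(regions, key=lambda r: r.base)
    for lower, upper in zip(ordered, ordered[1:]):
        if lower.base + lower.size > upper.base:
            return True
    return False


def decode(address: int) -> Optional[Tuple[str, int]]:
    """Translate an address into (component name, offset within component)."""
    if not 0 <= address < 2**32:
        raise ValueError("address outside the 32-bit address space")
    for region in MEMORY_MAP:
        if region.base <= address < region.base + region.size:
            return region.name, address - region.base
    return None  # unmapped hole in the address space
```

The overlap check hints at why the map has to be crafted carefully: two components that claim the same range would make an access ambiguous.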
Microengines
Microengines are data-path packet processors
The IXP2400 has 8 microengines
Simpler than the XScale
Low-level device that acts as a micro-sequencer
Optimized for packet processing
More complex to use
Often abbreviated as uE
uE Functions
uEs handle ingress and egress packet processing:
Packet ingress from physical layer hardware
Checksum verification
Header processing and classification
Packet buffering in memory
Table lookup and forwarding
Header modification
Checksum computation
Packet egress to physical layer hardware
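As an illustration of the checksum verification and checksum computation steps listed above, here is a minimal sketch of the 16-bit ones' complement Internet checksum used for IPv4 headers. The slides do not say which checksum is meant, so the choice of the IPv4 header checksum and the sample header bytes are assumptions made for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte IPv4 header (assumed for illustration).
# Checksum field (bytes 10-11) zeroed, as it is before computation on egress:
HEADER_ZEROED = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
# Same header with the checksum field filled in, as seen on ingress:
HEADER_FILLED = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")

computed = internet_checksum(HEADER_ZEROED)        # egress: compute the field
is_valid = internet_checksum(HEADER_FILLED) == 0   # ingress: verify the header
```

On ingress a header is accepted when the checksum over the whole header, including the checksum field, comes out as zero. On egress the field is recomputed after header modification, for example after a TTL decrement.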
uE Architecture
uE characteristics:
Programmable microcontroller
RISC design
256 general-purpose registers
512 transfer registers
128 next neighbor registers
Hardware support for 8 threads and context switching
640 words of local memory
Control of an Arithmetic and Logic Unit
Direct access to various functional units
A unit to compute a Cyclic Redundancy Check (CRC)
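To make the last item concrete, the sketch below computes a CRC bit by bit in software, which is the work a dedicated CRC unit offloads. The slides do not state which polynomial the uE unit uses; the common CRC-32 (reflected polynomial 0xEDB88320) is assumed here purely for illustration.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Bit-at-a-time CRC-32 (reflected form, polynomial 0xEDB88320)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# Cross-check against the standard library implementation.
sample = b"123456789"
matches_zlib = crc32_bitwise(sample) == zlib.crc32(sample)
```

The inner loop runs eight iterations per byte, which suggests why a hardware unit is attractive on a packet data path.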
uE as Micro-sequencer
Micro-sequencer does not contain native instructions for every possible operation
Instead of using native instructions, the uE invokes functional units to perform operations
Control unit is much simpler
Example 1:
uE does not have an ADD R2, R3 instruction
Instead: ALU ADD R2, R3
ALU indicates that the ALU should be used
ADD is a parameter to the ALU
Example 2:
Memory access is not done by a simple LOAD R2, 0xdeadbeef
Instead: SRAM LOAD R2, 0xdeadbeef
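The two examples above can be modeled as a dispatcher that only selects a functional unit and passes the rest of the instruction along as parameters. This is a toy model, not uE microcode: the class name, the register count, the STORE operation, and the reading of ADD R2, R3 as R2 = R2 + R3 are all assumptions made for the sketch.

```python
class MicroSequencer:
    """Toy model: the sequencer picks a unit; the unit interprets the operation."""

    def __init__(self) -> None:
        self.regs = {f"R{i}": 0 for i in range(8)}   # hypothetical register file
        self.sram = {}                               # address -> word
        self.units = {"ALU": self._alu, "SRAM": self._sram}

    def _alu(self, op: str, dst: str, src: str) -> None:
        if op != "ADD":
            raise ValueError(f"ALU does not support {op}")
        # Assumed semantics: dst = dst + src, wrapped to 32 bits.
        self.regs[dst] = (self.regs[dst] + self.regs[src]) & 0xFFFFFFFF

    def _sram(self, op: str, reg: str, address: str) -> None:
        addr = int(address, 0)  # accepts literals such as 0xdeadbeef
        if op == "LOAD":
            self.regs[reg] = self.sram.get(addr, 0)
        elif op == "STORE":
            self.sram[addr] = self.regs[reg]
        else:
            raise ValueError(f"SRAM does not support {op}")

    def execute(self, line: str) -> None:
        unit, op, *operands = line.replace(",", " ").split()
        if unit not in self.units:
            raise ValueError(f"no functional unit named {unit}")
        self.units[unit](op, *operands)
```

A bare "ADD R2, R3" is rejected because no unit is named ADD, which mirrors the point that the operation is a parameter to a unit rather than an instruction in its own right.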
uE Instruction Set
General
ALU, etc.
CAM
CAM_CLEAR: clear all entries in the CAM
uE Memories
uEs view memories differently than the XScale does
Do not map memories and I/O devices into a linear address space
Do not view memories as a seamless, uniform repository
uE ISA requires a separate instruction for each type of memory and I/O device
SRAM[read, $$x, address1, address2]
Execution Pipeline
What is a pipeline? Why is pipelining employed?
One instruction completes per cycle if the pipeline is properly designed
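A small worked calculation shows where the "one instruction per cycle" claim comes from. The sketch assumes an idealized pipeline with k equal-length stages. The function names and the 5-stage, 100-instruction figures are illustrative and are not taken from the uE design.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction occupies the whole datapath for n_stages cycles."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int, stall_cycles: int = 0) -> int:
    """Fill the pipeline once, then complete one instruction per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


# Illustrative numbers: 100 instructions on a 5-stage pipeline.
before = unpipelined_cycles(100, 5)
after = pipelined_cycles(100, 5)
cycles_per_instruction = after / 100
```

For long instruction streams the fill cost of k - 1 cycles is amortized, so the cost per instruction approaches one cycle. Every stall cycle adds directly to the total.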
Pipelining
Pipelining Problems
Possible sources of pipelining problems
Data dependencies
Control dependencies
Resource dependencies
Memory accesses
How pipelining problems impact system performance
How these impacts can be removed or reduced
Remove the sources so that no stall happens
Hide the impact of pipeline stalls
Pipeline Stalls
K:   ALU ADD R2, R1, R2
K+1: ALU ADD R3, R2, R3
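The stall between instructions K and K+1 comes from a data dependency: K+1 reads R2 before K has written it. A minimal sketch of detecting this read-after-write case follows. It assumes the first register operand is the destination and the remaining ones are sources, which the slide does not state explicitly.

```python
from typing import List, Tuple


def parse(instr: str) -> Tuple[str, List[str]]:
    """Split 'ALU ADD Rd, Ra, Rb' into (destination, [sources])."""
    _unit, _op, *regs = instr.replace(",", " ").split()
    dest, *sources = regs  # assumption: destination register comes first
    return dest, sources


def raw_hazard(first: str, second: str) -> bool:
    """True if `second` reads the register that `first` writes."""
    dest, _ = parse(first)
    _, sources = parse(second)
    return dest in sources


instr_k = "ALU ADD R2, R1, R2"
instr_k1 = "ALU ADD R3, R2, R3"
must_stall = raw_hazard(instr_k, instr_k1)
```

With an unrelated second instruction such as ALU ADD R5, R6, R7 the check returns False and the pipeline can keep issuing one instruction per cycle.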
Threading Illustration
Hardware Threads
uEs support 8 hardware thread contexts
Only one thread can execute at any given time
When a stall occurs, the uE can switch to another thread (if that thread is not stalled)
Switching rules
If a thread stalls, check whether the next thread is ready for processing
Keep trying until a ready thread is found
If none is available, stall the uE and wait for any thread to unblock
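The switching rules above amount to a round-robin scan over the thread contexts. The sketch below models that scan in software. The function name and the list-of-booleans representation of thread readiness are assumptions for illustration, not how the hardware tracks state.

```python
from typing import List, Optional


def next_ready_thread(current: int, ready: List[bool]) -> Optional[int]:
    """Scan the contexts after `current` in round-robin order.

    Returns the index of the first ready thread, or None if every
    thread is blocked (the uE would then stall until one unblocks).
    """
    n = len(ready)
    for offset in range(1, n + 1):   # the last step re-checks `current` itself
        candidate = (current + offset) % n
        if ready[candidate]:
            return candidate
    return None


# Thread 0 has just stalled; threads 1 and 3..7 are blocked, thread 2 is ready.
ready_flags = [False, False, True, False, False, False, False, False]
chosen = next_ready_thread(0, ready_flags)
```

Because only one thread executes at a time, the benefit is latency hiding rather than parallel execution: while one thread waits on memory, another uses the otherwise idle datapath.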
Summary
Control processor (slow path): XScale core
Overall architecture
Typical functions
Processor features
Lab3 Brief
Intel Reference Systems SDK Tutorial Lab 3
Tools: All three parts require access to a machine that has the Intel SDK installed. You can also request an installation CD for your own machine; check with the TA.
How to do Lab3
Windows machine with SDK installed
Download lab instructions and source code from Blackboard
Start early. Very exciting lab.
Due dates:
Part I and Part II: 10/13
Part III: 10/20