L13 IXAXscaleMicroengine SDK Tutorial

ECE 526 Network Processing Systems Design
IXP XScale and Microengines Chapter 18 & 19: D. E. Comer
Overview
Recalled
Packet processing functions (forwarding, queuing) Traditional network processing systems (CPU + NICs) General network processor architecture and tradeoffs Intel IXP network processors overall architecture
Focus on individual components of Intel IXP chip

Control processor (slow path): XScale core
Overall architecture Typical functions Processor features
Packet processing processor (fast path): Microengines

Architecture and features Differences to conventional processors Pipelining and multi-threading
Ning Weng
ECE 526
Purpose of Control Processor

Functions typically executed by embedded control proc:
Bootstrapping Exception handling Higher-layer protocol processing Interactive debugging Diagnostics and logging Memory allocation Application programs (if needed) User interface and/or interface to the GPP Control of packet processors Other administrative functions
Ning Weng
ECE 526
XScale Memory Architecture

Memory architecture
Uses 32-bit linear address space configurable endian mode Byte addressable
Memory Mapping
Allocation of address space (2^32) to different system components Accesses to memory is translated into access to component Needs to be carefully crafted
XScale assumes byte addressable memory

Underlying memory uses different size (SDRAM) How does this work?
Support for Virtual Memory

For demand paging to secondary storage
Ning Weng ECE 526 4
Shared Memory Address Issues

Memory is shared between XScale and Microengines Same data, but different addresses What impact does this have?
Pointers need to be translated Data structures with pointers can not be shared
Ning Weng
ECE 526
Microengines
Microengines are data-path packet processors IXP IXP 2400 have 8 Microengines Simpler than XScale Low level device as a micro-sequencer Optimized for packet processing More complex to use Often abbreviated as uE
Ning Weng
ECE 526
uE Functions
uEs handle ingress and egress packet processing:
Packet ingress from physical layer hardware Checksum verification Header processing and classification Packet buffering in memory Table lookup and forwarding Header modification Checksum computation Packet egress to physical layer hardware
Ning Weng
ECE 526
uE Architecture
uE characteristics:
Programmable microcontroller RISC design 256 general-purpose registers 512 transfer registers 128 next neighbor registers Hardware support for 8 threads and context switching 640 words of local memory Control of an Arithmetic and Logic Unit Direct access to various functional units A unit to compute a Cyclic Redundancy Check (CRC)
Ning Weng
ECE 526
uE as Micro-sequencer
Micro-sequencer does not contain native instructions for possible operations
Instead of using instructions, uE invokes functional units to perform operations Control unit is much simpler
Example 1:
uE does not have ADD R2,R3 instruction Instead: ALU ADD R2, R3 ALU indicates that ALU should be used ADD is a parameter to ALU
Example 2:
Memory access not by simple LOAD R2, 0xdeadbeef Instead: SRAM LOAD R2, 0xdeadbeef
Altogether similar to normal processor, but more basic

Ning Weng ECE 526 9
uE Instruction Set
General
ALU and etc
Brach and Jump

BR: branch unconditionally
CAM
CAM_CLEAR: clear all entries in local memories
I/O and context swap

SCRATCH (read and write)
For detail see Figure 19.1, 19.2, Comer.
Ning Weng
ECE 526
10
uE Memories
uEs: viewing memories differently than XScale does
Does not map memories and I/O devices into a liner address space Does not view memories as a seamless, uniform repository
uE ISA: requiring a separate instruction for each type of memory and I/O device
SRAM[read, $$x, address1, address2]
Programmer: required binding of data items to specific type of memory permanently.
Ning Weng
ECE 526
11
Execution Pipeline
What is pipeline? Why pipeline is employed?
One instruction is executed per cycle if pipeline is proper designed
uEs use five-stage or six-stage pipeline:
Ning Weng
ECE 526
12
Pipelining
Ning Weng
ECE 526
13
Pipelining Problems
Possible sources of pipelining problems
Data dependencies Control dependencies Resource dependencies Memory accesses
How pipelining problem impact system performance How these impact can be removed or reduced
Remove the sources so that no stall happened Hide the impact of pipelining stall
Ning Weng
ECE 526
14
Pipeline Stalls
K: K+1 ALU ADD R2, R1, R2 ALU ADD R3, R2, R3
Control dependencies, memory have even bigger impact

Ning Weng ECE 526 15
Threading Illustration
Ning Weng
ECE 526
16
Hardware Threads
uEs support 8 hardware thread contexts
One thread can execute at any given time When stall occurs, uE can switch to other thread (if not stalled)
Very low overhead for context switch

Zero-cycle context switch Effectively can take around three cycles due to pipeline flush
Switching rules
If thread stalls, check if next is ready for processing Keep trying until ready thread is found If none is available, stall uE and wait for any thread to unblock
Improves overall throughput Questions:

Why not 16, 32 threads why not have 48 uEs with 1 thread?
Ning Weng ECE 526 17
Summary
Control processor (slow path): XScale core
Overall architecture Typical functions Processor features
Packet processing processor (fast path): Microengines

Architecture and features Differences to conventional processors Pipelining and multi-threading
Ning Weng
ECE 526
18
Lab3 Brief
Intel Reference Systems SDK Tutorial Lab 3
Ning Weng
ECE 526
19
Intel Reference Systems

Hardware Testbed
IXP2400 network processors QDRM-SRAM, Flash ROM and other memories 1G optical ethernet ports 100M ethernet management port Serial interface PCI interfaces Compiler Assembler, linker Simulator Reference codes
SDK (software development kit)
Ning Weng
ECE 526
20
Lab3: Forwarding, Counting & Classification

Goal: to explore the basic functionalities of the IXP2400 software development kit and Microengines. 3 parts:
Part I: collecting a number of workload statistics from the IXP SDK simulator. Follow steps of lab instruction. Part II: adding one counting block to count the number of packets. Part III: implementing a simple packet classification mechanism.
Tools: All three parts require access to a machine that has the Intel SDK installed. If you want, you can also request an installation CD for your own machine, check with TA.
Ning Weng
ECE 526
21
Part I: Forwarding Simulation

run an implementation of IP forwarding on the IXP2400 simulator. All the code is provided to you. collect a set of workload statistics that are reported by the simulator.
Ning Weng
ECE 526
22
Part II: Forwarding and Counting

modify above applications by adding counter block store how many packets are received.
Ning Weng
ECE 526
23
Part III: Classification and Counting

classifying packets based on the packet header information. There are four types of traffic that are considered in this lab:
Web traffic over TCP over IPv4 Non-Web traffic over TCP over IPv4 UDP over IPv4 IPv6
modifying the code to report the number of packets in each type.
Ning Weng
ECE 526
24
How to do Lab3
Windows machine with SDK installed Download lab instructions and source code from blackboard Start early. Very exciting lab. Due day
Part I and Part II 10/13 Part III 10/20
Ning Weng
ECE 526
25

L13 IXAXscaleMicroengine SDK Tutorial

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

L13 IXAXscaleMicroengine SDK Tutorial

Загружено:

Авторское право:

Доступные форматы

ECE 526 Network Processing Systems Design

IXP XScale and Microengines Chapter 18 & 19: D. E. Comer

Focus on individual components of Intel IXP chip

Packet processing processor (fast path): Microengines

Purpose of Control Processor

XScale Memory Architecture

XScale assumes byte addressable memory

Support for Virtual Memory

Shared Memory Address Issues

Altogether similar to normal processor, but more basic

Brach and Jump

I/O and context swap

For detail see Figure 19.1, 19.2, Comer.

Programmer: required binding of data items to specific type of memory permanently.

uEs use five-stage or six-stage pipeline:

Control dependencies, memory have even bigger impact

Very low overhead for context switch

Improves overall throughput Questions:

Packet processing processor (fast path): Microengines

Intel Reference Systems

SDK (software development kit)

Lab3: Forwarding, Counting & Classification

Part I: Forwarding Simulation

Part II: Forwarding and Counting

Part III: Classification and Counting

modifying the code to report the number of packets in each type.

Вам также может понравиться