
William Stallings
Computer Organization
and Architecture
10th Edition
© 2016 Pearson Education, Inc., Hoboken,
NJ. All rights reserved.
Chapter 3
A Top-Level View of Computer
Function and Interconnection
Computer Components

• Originally, computers used a hardwired program
  • The result of the process of connecting the various components in the desired configuration
  • The computer must be rewired to run a different program
• Contemporary computer designs are based on concepts developed by John von Neumann at the Institute for Advanced Study, Princeton
• Referred to as the von Neumann architecture, it has three key concepts:
  • Data and instructions are stored in a single read-write memory
  • The contents of this memory are addressable by location, without regard to the type of data contained there
  • Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next

Hardware and Software Approaches

Figure 3.1 Hardware and Software Approaches
(a) Programming in hardware: data pass through a fixed sequence of arithmetic and logic functions to produce results
(b) Programming in software: an instruction interpreter turns instruction codes into control signals for general-purpose arithmetic and logic functions, which turn data into results

Software
• A sequence of codes or instructions
• Part of the hardware interprets each instruction and generates control signals
• Provide a new sequence of codes for each new program instead of rewiring the hardware

Major components:
• CPU
  • Instruction interpreter
  • Module of general-purpose arithmetic and logic functions
• I/O components
  • Input module
    • Contains basic components for accepting data and instructions and converting them into an internal form of signals usable by the system
  • Output module
    • Means of reporting results

Memory address register (MAR)
• Specifies the address in memory for the next read or write

Memory buffer register (MBR)
• Contains the data to be written into memory or receives the data read from memory

I/O address register (I/OAR)
• Specifies a particular I/O device

I/O buffer register (I/OBR)
• Used for the exchange of data between an I/O module and the CPU

Fetch Cycle
• At the beginning of each instruction cycle the processor fetches an instruction from memory
• The program counter (PC) holds the address of the instruction to be fetched next
• The processor increments the PC after each instruction fetch so that it will fetch the next instruction in sequence
• The fetched instruction is loaded into the instruction register (IR)
• The processor interprets the instruction and performs the required action

Figure 3.2 Computer Components: Top-Level View
(the CPU, containing the PC, IR, MAR, MBR, I/O AR, I/O BR, and an execution unit, connects over the system bus to main memory, whose locations 0 to n-1 hold instructions and data, and to an I/O module containing buffers)

PC = Program counter
IR = Instruction register
MAR = Memory address register
MBR = Memory buffer register
I/O AR = Input/output address register
I/O BR = Input/output buffer register


Fetch Cycle/Execute Cycle
• We will expand and add details to these basic fetch and execute cycles throughout the lecture

Figure 3.3 Basic Instruction Cycle
(START, then Fetch Next Instruction in the fetch cycle, then Execute Instruction in the execute cycle, then either back to the fetch cycle or HALT)


Action Categories
• Processor-memory: data transferred from processor to memory or from memory to processor
• Processor-I/O: data transferred to or from a peripheral device by transferring between the processor and an I/O module
• Data processing: the processor may perform some arithmetic or logic operation on data
• Control: an instruction may specify that the sequence of execution be altered

(a) Instruction format: bits 0 to 3 hold the opcode, bits 4 to 15 hold the address

(b) Integer format: bit 0 holds the sign (S), bits 1 to 15 hold the magnitude

(c) Internal CPU registers
Program Counter (PC) = Address of instruction
Instruction Register (IR) = Instruction being executed
Accumulator (AC) = Temporary storage

(d) Partial list of opcodes
0001 = Load AC from Memory
0010 = Store AC to Memory
0101 = Add to AC from Memory

Figure 3.4 Characteristics of a Hypothetical Machine
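
The instruction format in Figure 3.4 can be illustrated with a short Python sketch. It assumes that bit 0 in the figure is the most significant bit of the 16-bit word, so the opcode is the top four bits; the names `decode` and `OPCODES` are made up for this illustration.

```python
# Opcode table copied from Figure 3.4(d).
OPCODES = {
    0b0001: "Load AC from Memory",
    0b0010: "Store AC to Memory",
    0b0101: "Add to AC from Memory",
}

def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    opcode = (word >> 12) & 0xF   # bits 0-3 of the figure: top four bits
    address = word & 0x0FFF       # bits 4-15 of the figure: low twelve bits
    return opcode, address

opcode, address = decode(0x1940)
print(OPCODES[opcode], "at address", format(address, "03X"))
```

Because each hexadecimal digit is four bits, the first hex digit of an instruction is its opcode and the remaining three digits are the address.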

Figure 3.5 Example of Program Execution (contents of memory and registers in hexadecimal)

Memory: 300 = 1940, 301 = 5941, 302 = 2941, 940 = 0003, 941 = 0002 (941 holds 0005 after step 6)

Step | PC  | AC   | IR   | Note
1    | 300 |      | 1940 |
2    | 301 | 0003 | 1940 |
3    | 301 | 0003 | 5941 |
4    | 302 | 0005 | 5941 | 3 + 2 = 5
5    | 302 | 0005 | 2941 |
6    | 303 | 0005 | 2941 | location 941 now holds 0005

Example Program Execution
1. The PC contains 300, the address of the first instruction.
  • The value at address 300 (1940) is loaded into the IR
  • The first digit (1) is the opcode (Load AC)
  • The last three digits are the address (940)
2. The first instruction is executed (0003 is loaded into the AC) and the PC is incremented
3. The next instruction (at address 301) is loaded (5941) into the IR
  • The first digit (5) is the opcode (Add to AC)
  • The last three digits are the address (941)
4. The data at address 941 are added to the AC and the PC is incremented
5. The next instruction (at address 302) is loaded (2941) into the IR
  • The first digit (2) is the opcode (Store AC)
  • The last three digits are the address (941)
6. The contents of the AC are written to memory and the PC is incremented
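
The six steps above can be replayed with a small simulator. This is only a sketch of the hypothetical machine: the function name `run` and the dictionary used for memory are illustrative choices, the sign-magnitude integer format is ignored, and the PC is incremented right after the fetch, as the Fetch Cycle slide describes.

```python
LOAD, STORE, ADD = 0x1, 0x2, 0x5   # opcodes from Figure 3.4(d)

def run(memory, pc, count):
    """Execute `count` instructions; return (PC, AC, IR) after each one."""
    ac = 0
    trace = []
    for _ in range(count):
        ir = memory[pc]                 # fetch cycle: load the instruction into IR
        pc += 1                         # PC now points at the next instruction
        opcode, address = ir >> 12, ir & 0xFFF
        if opcode == LOAD:              # execute cycle
            ac = memory[address]
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == STORE:
            memory[address] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
        trace.append((pc, ac, ir))
    return trace

memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
          0x940: 0x0003, 0x941: 0x0002}
for pc, ac, ir in run(memory, 0x300, 3):
    print(f"PC={pc:03X} AC={ac:04X} IR={ir:04X}")
print(f"memory[941]={memory[0x941]:04X}")
```

Each printed line corresponds to the register contents at steps 2, 4, and 6 of Figure 3.5.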

Multiple Memory Accesses
• The execute cycle may involve more than one memory access
• For example, the PDP-11 includes the instruction ADD B,A
  • A PDP-11 has multiple registers rather than a single accumulator
• The instruction cycle would look like this:
  • Fetch the instruction ADD B,A
  • Read the contents of memory location A into a register
  • Read the contents of memory location B into a different register
  • Add the two values
  • Write the result from the processor to memory location A
• This results in a more complicated execute cycle
• A more detailed instruction cycle reflects this
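
To make the count of memory accesses concrete, here is a hedged sketch that logs every access made while carrying out ADD B,A. The class `TracedMemory`, the function `add_b_a`, and the addresses are hypothetical; this does not model real PDP-11 encoding.

```python
class TracedMemory:
    """Memory that records every read and write for later inspection."""
    def __init__(self, cells):
        self.cells = dict(cells)
        self.log = []

    def read(self, address):
        self.log.append(("read", address))
        return self.cells[address]

    def write(self, address, value):
        self.log.append(("write", address))
        self.cells[address] = value

def add_b_a(memory, pc, a, b):
    """Carry out ADD B,A: memory[a] = memory[a] + memory[b]."""
    memory.read(pc)               # fetch the instruction itself
    first = memory.read(a)        # operand fetch: location A into a register
    second = memory.read(b)       # operand fetch: location B into another register
    memory.write(a, first + second)   # operand store: result back to location A

# Hypothetical addresses: instruction at 0x100, A at 0x200, B at 0x201.
memory = TracedMemory({0x100: "ADD B,A", 0x200: 3, 0x201: 2})
add_b_a(memory, pc=0x100, a=0x200, b=0x201)
print(memory.log)
```

The log shows four memory accesses for one instruction: one instruction fetch, two operand reads, and one operand write.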


Figure 3.6 Instruction Cycle State Diagram
(states: instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, operand fetch, data operation, operand address calculation, operand store; loops handle multiple operands and multiple results; when an instruction completes, the next instruction is fetched, or control returns for string or vector data)


Interrupts
• An interrupt allows some module (I/O, memory) to stop the normal processing of the processor
  • The processor may or may not resume the interrupted processing after handling the interrupt
• Interrupts are a way to improve processing efficiency
  • External devices are slower than the processor
  • We don't want the processor to wait for the printer
  • The printer can interrupt the processor when it needs attention
• Interrupts are also used to stop a process when some type of error has occurred
  • The error may or may not be recoverable

Table 3.1 Classes of Interrupts

Program: Generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user's allowed memory space.
Timer: Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.
I/O: Generated by an I/O controller, to signal normal completion of an operation, request service from the processor, or to signal a variety of error conditions.
Hardware failure: Generated by a failure such as power failure or memory parity error.

Figure 3.8 Transfer of Control via Interrupts
(an interrupt occurs after instruction i of the user program; control passes to the interrupt handler and then returns to instruction i+1)

Figure 3.7 Program Flow of Control Without and With Interrupts
(a) No interrupts
(b) Interrupts; short I/O wait
(c) Interrupts; long I/O wait
(each panel shows a user program with code segments 1, 2, and 3 separated by WRITE calls, and an I/O program with segment 4, the I/O command, and segment 5 before END; panels (b) and (c) add an interrupt handler and split segments 2 and 3 into 2a, 2b, 3a, and 3b; a marker shows where an interrupt occurs during the course of execution of the user program)


Interrupts
• In these examples a "write" involves three steps:
  • Preprocessing
  • The write itself, which takes place on a peripheral device
  • Postprocessing
• The first system, without interrupts, must wait during each write operation
• The second system, with interrupts, can initiate the write and then proceed with normal operations
  • When the write is complete, the processor is interrupted to do the postprocessing necessary at the end of the write
• On the third system the processor tries to initiate a second write before the first is completed
  • Again, even with interrupts, the processor must wait

Figure 3.10 Program Timing: Short I/O Wait
(a) Without interrupts: the processor waits during each I/O operation
(b) With interrupts: each I/O operation runs concurrently with processor execution of segments 2a, 2b, 3a, and 3b

Figure 3.11 Program Timing: Long I/O Wait
(a) Without interrupts: the processor waits during each I/O operation
(b) With interrupts: each I/O operation runs concurrently with processor execution, then the processor waits

Interrupt Cycle
• An interrupt cycle is added to the instruction cycle

(Figure: instruction cycle with interrupts. START leads to Fetch Next Instruction in the fetch cycle, then Execute Instruction in the execute cycle. With interrupts enabled, the interrupt cycle follows: Check for Interrupt; Process Interrupt. With interrupts disabled, control returns directly to the fetch cycle. HALT exits from the execute cycle.)
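
A minimal sketch of the extended cycle, under the assumption that at most one pending interrupt is processed per instruction; `instruction_cycle` and its `raised_during` argument are hypothetical names used only for this illustration.

```python
from collections import deque

def instruction_cycle(instructions, raised_during, interrupts_enabled=True):
    """Return a log of the fetch, execute, and interrupt cycles.

    raised_during maps an instruction index to the names of devices that
    raise an interrupt while that instruction executes.
    """
    log = []
    pending = deque()
    for index, instruction in enumerate(instructions):
        log.append(f"fetch {instruction}")                 # fetch cycle
        log.append(f"execute {instruction}")               # execute cycle
        pending.extend(raised_during.get(index, []))
        if interrupts_enabled and pending:                 # interrupt cycle
            log.append(f"process interrupt from {pending.popleft()}")
    return log

for entry in instruction_cycle(["I1", "I2", "I3"], {0: ["printer"]}):
    print(entry)
```

The interrupt is serviced between the execution of I1 and the fetch of I2, which is the only point in the cycle where the processor checks for it.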

Figure 3.12 Instruction Cycle State Diagram, With Interrupts
(the states of Figure 3.6 plus an interrupt check after operand store: if no interrupt is pending, the next instruction is fetched; otherwise the interrupt state is entered before the next instruction fetch)

Multiple Interrupts
• Once interrupts are introduced, it becomes possible (even likely) that an interrupt will occur while the system is processing the handler for an earlier interrupt
• There are two strategies to handle this situation:

1. Disabled interrupts: once the processor begins handling an interrupt, it ignores further interrupts
  • If a further interrupt occurs, it will usually remain pending so it can be handled once the current handler completes
  • We need an interrupt queue
  • This approach does not take into account priority or time-critical needs

2. Define interrupt priorities: a higher-priority interrupt can interrupt the handler of one with a lower priority

Figure 3.13 Transfer of Control with Multiple Interrupts
(a) Sequential interrupt processing
(b) Nested interrupt processing
(each panel shows a user program, interrupt handler X, and interrupt handler Y)

Priority Interrupts
• As an example of this second approach, consider a system with three I/O devices: a printer, a disk, and a communications line, with increasing priorities of 2, 4, and 5, respectively
• A user program begins at t = 0
• At t = 10, a printer interrupt occurs
• While the printer interrupt service routine (ISR) is still executing, at t = 15, a communications interrupt occurs
• Because the communications line has higher priority than the printer, the interrupt is honored and the printer ISR is interrupted
• While the communications ISR is executing, a disk interrupt occurs (t = 20)
• Because this interrupt is of lower priority, it is simply held, and the communications ISR runs to completion
• When the communications ISR is complete (t = 25), the system processes the disk ISR before resuming the printer ISR, because the disk has a higher priority than the printer
• Only when the disk ISR is complete (t = 35) is the printer ISR resumed
• When the printer ISR completes (t = 40), control finally returns to the user program
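
The timeline above can be reproduced with a small nested-priority simulation. This is a sketch under stated assumptions: every ISR needs 10 time units of processor time (inferred from the times in the example), the user program has priority 0, and the function name `simulate` is made up.

```python
def simulate(arrivals, service_time, end_time):
    """Return (time, event) pairs for nested priority interrupt handling."""
    stack = []     # active ISRs, innermost last: [name, priority, remaining]
    pending = []   # held interrupts as (priority, name)
    events = []
    for t in range(end_time + 1):
        finished = False
        if stack and stack[-1][2] == 0:          # running ISR has completed
            events.append((t, f"{stack.pop()[0]} ISR completes"))
            finished = True
        for name, priority in arrivals.get(t, []):
            pending.append((priority, name))
        current = stack[-1][1] if stack else 0   # user program: priority 0
        best = max(pending, default=None)
        if best is not None and best[0] > current:
            pending.remove(best)                 # honor the higher priority
            stack.append([best[1], best[0], service_time])
            events.append((t, f"{best[1]} ISR starts"))
        elif finished:
            resumed = f"{stack[-1][0]} ISR" if stack else "user program"
            events.append((t, f"{resumed} resumes"))
        if stack:
            stack[-1][2] -= 1                    # run the current ISR one unit
    return events

arrivals = {10: [("printer", 2)], 15: [("communications", 5)], 20: [("disk", 4)]}
for time, event in simulate(arrivals, service_time=10, end_time=40):
    print(f"t = {time}: {event}")
```

The disk interrupt that arrives at t = 20 produces no event until t = 25, because it is held while the higher-priority communications ISR runs.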

Figure 3.14 Example Time Sequence of Multiple Interrupts
(the user program starts at t = 0; the printer interrupt service routine starts at t = 10; the communication interrupt service routine runs from t = 15 to t = 25; the disk interrupt service routine runs from t = 25 to t = 35; the printer routine resumes at t = 35 and returns to the user program at t = 40)


I/O Function
• In addition to exchanging data with memory, the processor can exchange data directly with I/O modules
• I/O modules control one or more devices
• The processor can read data from or write data to an I/O module
  • The processor identifies a specific device that is controlled by a particular I/O module
  • It uses I/O instructions rather than memory-referencing instructions
• In some cases it is desirable to allow I/O exchanges to occur directly with memory
  • The processor grants to an I/O module the authority to read from or write to memory so that the I/O-memory transfer can occur without tying up the processor
  • The I/O module issues read or write commands to memory, relieving the processor of responsibility for the exchange
  • This operation is known as direct memory access (DMA)

Figure 3.15 Computer Modules
(Memory, N words at addresses 0 to N-1: inputs Read, Write, Address, Data; output Data)
(I/O module, M ports: inputs Read, Write, Address, Internal Data, External Data; outputs Internal Data, External Data, Interrupt Signals)
(CPU: inputs Instructions, Data, Interrupt Signals; outputs Address, Control Signals, Data)

The interconnection structure must support the following types of transfers:
• Memory to processor: the processor reads an instruction or a unit of data from memory
• Processor to memory: the processor writes a unit of data to memory
• I/O to processor: the processor reads data from an I/O device via an I/O module
• Processor to I/O: the processor sends data to the I/O device
• I/O to or from memory: an I/O module is allowed to exchange data directly with memory without going through the processor, using direct memory access

Bus Interconnection
• There needs to be a means for the different components of a computer system to communicate
• Modern general-purpose computers use various specialized point-to-point interconnection structures
• A common bus structure was previously dominant and is still used for many embedded systems
  • Multiple devices connect to the bus
  • A signal transmitted by one device can be received by any device on the bus
  • The devices must be synchronized so that signals do not overlap

Bus Interconnection
• A communication pathway connecting two or more devices
  • Key characteristic is that it is a shared transmission medium
• Signals transmitted by any one device are available for reception by all other devices attached to the bus
  • If two devices transmit during the same time period their signals will overlap and become garbled
• Typically consists of multiple communication lines
  • Each line is capable of transmitting signals representing binary 1 and binary 0
• Computer systems contain a number of different buses that provide pathways between components at various levels of the computer system hierarchy
• System bus
  • A bus that connects major computer components (processor, memory, I/O)
• The most common computer interconnection structures are based on the use of one or more system buses

Data Bus
• Data lines that provide a path for moving data among system modules
• May consist of 32, 64, 128, or more separate lines
• The number of lines is referred to as the width of the data bus
• The number of lines determines how many bits can be transferred at a time
• The width of the data bus is a key factor in determining overall system performance

Address Bus
• Used to designate the source or destination of the data on the data bus
  • If the processor wishes to read a word of data from memory, it puts the address of the desired word on the address lines
• Width determines the maximum possible memory capacity of the system
• Also used to address I/O ports
  • The higher-order bits are used to select a particular module on the bus and the lower-order bits select a memory location or I/O port within the module

Control Bus
• Used to control the access and the use of the data and address lines
• Because the data and address lines are shared by all components, there must be a means of controlling their use
• Control signals transmit both command and timing information among system modules
• Timing signals indicate the validity of data and address information
• Command signals specify operations to be performed
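
Two of the address-bus points can be checked with a few lines of Python: the width fixes how many locations can be addressed, and the high-order bits can select a module while the low-order bits select a location inside it. The function names and the 8-bit bus with one module-select bit are assumptions chosen only to keep the numbers small.

```python
def max_addressable(address_lines):
    """Number of distinct locations an address bus of this width can name."""
    return 2 ** address_lines

def split_address(address, address_lines, module_bits):
    """Split an address into (module number, location within the module)."""
    offset_bits = address_lines - module_bits
    module = address >> offset_bits                # high-order bits
    offset = address & ((1 << offset_bits) - 1)    # low-order bits
    return module, offset

print(max_addressable(8))                 # locations on an 8-line address bus
print(split_address(0b10000001, 8, 1))    # (module, location) for one address
```

With these assumed parameters, addresses whose top bit is 0 fall in module 0 and addresses whose top bit is 1 fall in module 1.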

Figure 3.16 Bus Interconnection Scheme
(the CPU, memory modules, and I/O modules all attach to a bus made up of control lines, address lines, and data lines)

Point-to-Point Interconnect
• The principal reason for the change was the electrical constraints encountered with increasing the frequency of wide synchronous buses
• At higher and higher data rates it becomes increasingly difficult to perform the synchronization and arbitration functions in a timely fashion
• A conventional shared bus on the same chip magnified the difficulties of increasing bus data rate and reducing bus latency to keep up with the processors
• Compared with a shared bus, a point-to-point interconnect has lower latency, higher data rate, and better scalability

QuickPath Interconnect (QPI)
• Introduced in 2008
• Multiple direct connections
  • Direct pairwise connections to other components, eliminating the need for arbitration found in shared transmission systems
• Layered protocol architecture
  • These processor-level interconnects use a layered protocol architecture rather than the simple use of control signals found in shared bus arrangements
• Packetized data transfer
  • Data are sent as a sequence of packets, each of which includes control headers and error control codes

Figure 3.17 Multicore Configuration Using QPI
(four cores, A to D, each with attached DRAM on a memory bus; QPI links connect the cores and two I/O hubs; each I/O hub connects to I/O devices over PCI Express)


QPI Layers
QPI is defined as a four-layer protocol architecture:
• Physical Layer: the actual wires carrying signals, along with the circuitry and logic to support the transfer of 1s and 0s
  • The unit of data transfer at the physical layer is 20 bits, which is called a phit (physical unit)
• Link Layer: responsible for reliable transmission and flow control
  • The link layer's unit of transfer is an 80-bit flit (flow control unit)
• Routing Layer: provides the framework for directing packets
• Protocol Layer: the high-level set of rules for exchanging packets of data between devices
  • A packet is an integral number of flits

Figure 3.18 QPI Layers
(two devices, each with protocol, routing, link, and physical layers; peers exchange packets at the protocol layer, flits at the link layer, and phits at the physical layer)


Figure 3.19 Physical Interface of the Intel QPI Interconnect
(components A and B each have an Intel QuickPath Interconnect port; in each direction, the transmission lanes and forwarded clock (Fwd Clk) of one port connect to the reception lanes and received clock (Rcv Clk) of the other)

QPI Physical Layer
• The previous figure shows the physical architecture of a QPI port
• The QPI port consists of 84 individual links grouped as follows
  • Each data path consists of a pair of wires that transmits data one bit at a time; each pair is referred to as a lane
  • There are 20 data lanes in each direction (transmit and receive), plus a clock lane in each direction, so 21 lanes × 2 wires × 2 directions = 84 links
  • Thus, QPI is capable of transmitting 20 bits in parallel in each direction
  • The 20-bit unit is referred to as a phit
• The form of transmission on each lane is known as differential signaling, or balanced transmission: signals are transmitted as a current that travels down one conductor and returns on the other
  • The binary value depends on the voltage difference
  • Typically, one line has a positive voltage value and the other line has zero voltage, and one line is associated with binary 1 and the other with binary 0

QPI Physical Layer Lanes
• Another function performed by the physical layer is that it manages the translation between 80-bit flits and 20-bit phits using a technique known as multilane distribution
  • The flits can be considered as a bit stream that is distributed across the data lanes in a round-robin fashion (first bit to first lane, second bit to second lane, etc.)
• This approach enables QPI to achieve very high data rates by implementing the physical link between two ports as multiple parallel channels
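
Multilane distribution amounts to round-robin slicing of the bit stream. The sketch below uses bit numbers 1 to 80 in place of real bit values so the placement is easy to read; `distribute` and `to_phits` are illustrative names, not part of any QPI specification.

```python
LANES = 20       # data lanes in each direction
FLIT_BITS = 80   # bits in one flit

def distribute(bits, lanes=LANES):
    """Round-robin: lane i carries bits i, i + lanes, i + 2*lanes, ..."""
    return [bits[i::lanes] for i in range(lanes)]

def to_phits(bits, lanes=LANES):
    """Group the stream into phits, one bit per lane per phit."""
    return [bits[i:i + lanes] for i in range(0, len(bits), lanes)]

flit = list(range(1, FLIT_BITS + 1))   # bit numbers #1 .. #80
per_lane = distribute(flit)
print("lane 0 carries bits", per_lane[0])
print("lane 19 carries bits", per_lane[19])
print("phits per flit:", len(to_phits(flit)))
```

Each 80-bit flit therefore crosses the link as four 20-bit phits, with every lane contributing one bit to each phit.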

Figure 3.20 QPI Multilane Distribution
(the bit stream of flits, #1, #2, ..., #n, #n+1, #n+2, ..., #2n, #2n+1, ..., is distributed round-robin: QPI lane 0 carries bits #1, #n+1, #2n+1; QPI lane 1 carries #2, #n+2, #2n+2; QPI lane 19 carries #n, #2n, #3n)

QPI Link Layer
• Performs two key functions: flow control and error control
  • These operate on the level of the flit (flow control unit)
  • Each flit consists of a 72-bit message payload and an 8-bit error control code called a cyclic redundancy check (CRC)
• Flow control function
  • Needed to ensure that a sending QPI entity does not overwhelm a receiving QPI entity by sending data faster than the receiver can process the data and clear buffers for more incoming data
• Error control function
  • Detects and recovers from bit errors, and so isolates higher layers from experiencing bit errors
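
The flit layout (72 payload bits plus an 8-bit CRC) can be sketched as follows. The slides do not say which CRC polynomial QPI uses, so the generator 0x07 (x^8 + x^2 + x + 1) below is purely an assumption for illustration, as are the function names.

```python
CRC8_POLY = 0x07   # assumed generator polynomial, not taken from the QPI spec

def crc8(data):
    """Bitwise CRC-8 over a bytes object, most significant bit first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ CRC8_POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

def make_flit(payload):
    """Append the 8-bit CRC to a 72-bit (9-byte) payload: 80 bits in total."""
    if len(payload) != 9:
        raise ValueError("payload must be exactly 72 bits (9 bytes)")
    return payload + bytes([crc8(payload)])

def check_flit(flit):
    """Receiver side: recompute the CRC and compare with the received one."""
    return len(flit) == 10 and crc8(flit[:9]) == flit[9]

flit = make_flit(bytes(range(9)))
damaged = bytes([flit[0] ^ 0x01]) + flit[1:]   # flip one payload bit
print(check_flit(flit), check_flit(damaged))
```

A receiver that finds a mismatch would ask for the flit again, which is how the link layer keeps bit errors from reaching the higher layers.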

QPI Routing and Protocol Layers

Routing Layer
• Used to determine the course that a packet will traverse across the available system interconnects
• Routing tables are defined by firmware and describe the possible paths that a packet can follow

Protocol Layer
• The packet is defined as the unit of transfer
• One key function performed at this level is a cache coherency protocol, which deals with making sure that main memory values held in multiple caches are consistent
• A typical data packet payload is a block of data being sent to or from a cache

Peripheral Component Interconnect (PCI)
• A popular high-bandwidth, processor-independent bus that can function as a mezzanine or peripheral bus
• Delivers better system performance for high-speed I/O subsystems
• PCI Special Interest Group (SIG)
  • Created to develop further and maintain the compatibility of the PCI specifications
• PCI Express (PCIe)
  • Point-to-point interconnect scheme intended to replace bus-based schemes such as PCI
  • Key requirement is high capacity to support the needs of higher data rate I/O devices, such as Gigabit Ethernet
  • Another requirement deals with the need to support time-dependent data streams

Figure 3.21 Typical Configuration Using PCIe
(cores and memory attach to a chipset; PCIe links run from the chipset to a Gigabit Ethernet device, a PCIe-PCI bridge, and a PCIe switch; the switch fans out over PCIe links to a legacy endpoint and PCIe endpoints)


PCIe Layers

PCIe is described by a three-layer architecture:
• Physical Layer: the physical wires that carry the data
  • Also includes the circuitry and logic to support the transmission and receipt of 1s and 0s
• Data Link Layer: responsible for reliable transmission and flow control
  • Data packets at the data link layer are called DLLPs
• Transaction Layer: generates and consumes data packets and manages the flow between two components
  • Data packets at the transaction layer are called TLPs

Figure 3.22 PCIe Protocol Layers
(two devices, each with transaction, data link, and physical layers; peers exchange transaction layer packets (TLPs) at the transaction layer and data link layer packets (DLLPs) at the data link layer)

PCIe Physical Layer
• PCIe operates in a similar manner to QPI, with the exception that lanes are bidirectional and the number of lanes on a PCIe port can be 1, 4, 6, 16, or 32
• Bits are sent to lanes in a round-robin scheme
• At each physical lane, data are buffered and processed 16 bytes (128 bits) at a time
• Each block is encoded into a 130-bit code word for transmission
• There is no common clock; the receiver looks for transitions in the data for synchronization
• The extra 2 bits ensure that in a long sequence of 1s there are at least some 0s to provide these transitions
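
A short sketch of two points from this slide: round-robin distribution across lanes (shown here per byte, as in Figure 3.23) and the cost of 128b/130b encoding. The function names are invented for the example, and the encoding itself is not implemented, only its overhead.

```python
def stripe(stream, lanes):
    """Round-robin distribution: lane i gets items i, i + lanes, ..."""
    return [stream[i::lanes] for i in range(lanes)]

def encoding_efficiency(block_bits=128, codeword_bits=130):
    """Fraction of transmitted bits that carry data under 128b/130b."""
    return block_bits / codeword_bits

byte_stream = [f"B{i}" for i in range(8)]      # B0 .. B7 as in Figure 3.23
for lane, contents in enumerate(stripe(byte_stream, lanes=4)):
    print(f"lane {lane}: {contents}")
print(f"efficiency: {encoding_efficiency():.4f}")
```

Two extra bits per 128-bit block cost about 1.5 percent of the raw line rate, which is the price paid for guaranteed transitions without a shared clock.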

Figure 3.23 PCIe Multilane Distribution
(the byte stream B0 to B7 is distributed across four lanes: PCIe lane 0 carries B0 and B4, lane 1 carries B1 and B5, lane 2 carries B2 and B6, lane 3 carries B3 and B7; each lane applies 128b/130b encoding)

PCIe Data Link Layer
• The data link layer exchanges its own packets (DLLPs) between two devices; these are concerned with "bookkeeping" on the link between the two devices
• The data link layer also adds a sequence number and an error-checking code (LCRC) to each TLP, as shown in Figure 3.25, so that the receiver can detect damaged or missing packets
• DLLPs fall into the following categories:
  • Flow control packets: regulate the rate at which TLPs and DLLPs can be transmitted
  • Power management packets: manage platform power budgeting
  • ACK packets: acknowledge the receipt of a valid data packet
  • NAK packets: report the receipt of an invalid data packet; the packet must be resent

PCIe Transaction Layer (TL)
• Receives read and write requests from the software above the TL and creates request packets for transmission to a destination via the link layer
• Most transactions use a split transaction technique
  • A request packet is sent out by a source PCIe device, which then waits for a response called a completion packet
• TL messages and some write transactions are posted transactions (meaning that no response is expected)
• TL packet format supports 32-bit memory addressing and extended 64-bit memory addressing

The TL supports four address spaces:
• Memory
  • The memory space includes system main memory and PCIe I/O devices
  • Certain ranges of memory addresses map into I/O devices
• I/O
  • This address space is used for legacy PCI devices, with reserved address ranges used to address legacy I/O devices
• Configuration
  • This address space enables the TL to read/write configuration registers associated with I/O devices
• Message
  • This address space is for control signals related to interrupts, error handling, and power management

Table 3.2 PCIe TLP Transaction Types

Address Space | TLP Type | Purpose
Memory | Memory Read Request; Memory Read Lock Request; Memory Write Request | Transfer data to or from a location in the system memory map.
I/O | I/O Read Request; I/O Write Request | Transfer data to or from a location in the system memory map for legacy devices.
Configuration | Config Type 0 Read Request; Config Type 0 Write Request; Config Type 1 Read Request; Config Type 1 Write Request | Transfer data to or from a location in the configuration space of a PCIe device.
Message | Message Request; Message Request with Data | Provides in-band messaging and event reporting.
Memory, I/O, Configuration | Completion; Completion with Data; Completion Locked; Completion Locked with Data | Returned for certain requests.

(a) Transaction Layer Packet

Octets    | Field           | Added by
1         | STP framing     | Physical Layer
2         | Sequence number | Data Link Layer
12 or 16  | Header          | Transaction Layer
0 to 4096 | Data            | Transaction Layer
0 or 4    | ECRC            | Transaction Layer
4         | LCRC            | Data Link Layer
1         | STP framing     | Physical Layer

(b) Data Link Layer Packet

Octets | Field | Added by
1      | Start | Physical Layer
4      | DLLP  | Data Link Layer
2      | CRC   | Data Link Layer
1      | End   | Physical Layer

Figure 3.25 PCIe Protocol Data Unit Format
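
The field sizes in Figure 3.25 make it easy to work out how many octets a packet occupies on the link. The helper below is a sketch; `tlp_octets` and `DLLP_OCTETS` are hypothetical names, and only the sizes come from the figure.

```python
def tlp_octets(data_len, header_len=12, ecrc=False):
    """Total octets on the link for one transaction layer packet."""
    if header_len not in (12, 16):
        raise ValueError("header is 12 or 16 octets")
    if not 0 <= data_len <= 4096:
        raise ValueError("data field is 0 to 4096 octets")
    transaction = header_len + data_len + (4 if ecrc else 0)  # header, data, ECRC
    data_link = 2 + 4      # sequence number + LCRC
    physical = 1 + 1       # framing octet at each end
    return transaction + data_link + physical

# Start framing + DLLP body + CRC + end framing.
DLLP_OCTETS = 1 + 4 + 2 + 1

print(tlp_octets(0))                               # smallest TLP
print(tlp_octets(4096, header_len=16, ecrc=True))  # largest TLP
print(DLLP_OCTETS)
```

The fixed overhead ranges from 20 to 28 octets per TLP, so small payloads use the link much less efficiently than large ones.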

Summary
Chapter 3: A Top-Level View of Computer Function and Interconnection
• Computer components
• Computer function
  • Instruction fetch and execute
  • Interrupts
  • I/O function
• Interconnection structures
• Bus interconnection
• Point-to-point interconnect
  • QPI physical layer
  • QPI link layer
  • QPI routing layer
  • QPI protocol layer
• PCI express
  • PCI physical and logical architecture
  • PCIe physical layer
  • PCIe transaction layer
  • PCIe data link layer

Homework
Chapter 3: A Top-Level View of Computer Function and Interconnection
 Problems:

Review Questions: 3.1, 3.3, 3.5, 3.11, 3.12, 3.14


 3.2, 3.3, 3.5
