Root Complex (RC): It is the PCIe host. It usually provides slots through which other PCI /
PCIe devices can be connected.
End Point (EP): It is a PCIe device which usually implements a peripheral such as USB or SATA. It
has its own address space (32b/64b).
Bridge: A bridge is used to connect a PCI/PCI-X device to a PCIe root complex.
Switch: A switch is used to connect multiple PCI Express devices to the root complex. If
there are not enough slots on the board, a switch is used.
The communication between the initiator (TX) and the target (RX) passes through the following
layers:
Transaction Layer
Data Link Layer
Physical Layer
Communication is done through Transaction Layer Packets (TLPs). These transactions are of four
types:
Memory Read/Write (MRd/MWr): These are used to transfer data from/to a memory-mapped
location.
IO Read/Write (IORd/IOWr): These are used to transfer data from/to an I/O location.
Configuration Read/Write (CfgRd/CfgWr): These are used to configure the end points.
Message Transaction: These are used for signaling events and for general messaging.
Vendor-defined messages are supported.
Non-Posted Transactions (NP): In these transactions, when a request packet is sent, the target is
expected to return a completion packet to the requester. Completions for reads contain the
requested data; completions for non-posted writes carry only a status. Example: Memory Read,
Memory Read Lock, I/O Read, I/O Write, etc.
Posted Transactions (P): In these transactions, when a request packet is sent, no completion is
expected from the target. Example: Memory Write.
Address routing
ID routing
Implicit routing
Implicit Routing: This routing is used only with message requests, such as interrupt signaling,
vendor-defined messages, error signaling and LTR (Latency Tolerance Reporting) messages. The
route is determined by a subfield of the Type field in the header.
The Data Link Layer is the middle layer. It is responsible for link management, data integrity,
flow control, error detection and error correction. The basic data reliability mechanism is
implemented by this layer.
Physical layer is the last layer which is responsible for data exchange. This layer consists of two
sub-blocks.
Logical sub-block
Electrical sub-block
Services:
Logical sub-block: This block uses symbol encoding (8-bit/10-bit) at the transmitter and decoding
at the receiver, together with special symbols for framing and link management. COM (used for
initialization and management of the link), STP (start of TLP), SDP (start of DLLP) and SKP
(skip) are some of the special symbols which act as control signals. A scrambler applies a
pseudo-random algorithm that eliminates repetitive bit patterns. Because data has to be
transferred bit by bit, this block contains a serializer at the transmitter and a de-serializer
at the receiver.
Electrical sub-block: This block consists of differential drivers at the transmitter and
differential receivers at the receiver. At the transmitter, the bit stream leaving the serializer
of each lane is converted to electrical signals. At the receiver, the electrical signals on each
lane are detected and converted back to a bit stream, which is de-serialized into symbols.
The root complex can access memory directly without CPU intervention, much like DMA. A PCIe
device can use the same feature to read or write memory. The host should give permission to
access the memory, and the address at which the data is to be read or written must be known. The
root complex allows other devices to be connected through root ports. In Figure 1, three root
ports are used.
A switch is connected to the second root port, and PCIe end points are connected to that switch.
A point-to-point topology is followed here, which means a single link connects exactly two
devices. A host bridge connects the CPU to the root ports, which reside on Bus 0.
Each bus is assigned a bus number by software during the enumeration process. In the case of the
switch, the primary bus is Bus 3, since it is the lowest-numbered bus connected to the switch; the
secondary bus is Bus 4 and the subordinate bus is Bus 8 (the highest bus number behind the
switch). Any transaction routed to Bus 4 through Bus 8 is accepted by the switch and routed
accordingly.
The number of I/O lines in PCIe is small because it is based on serial bus technology. For a
one-lane card, the I/O lines consist of a differential receive pair (Rx+, Rx-) and a differential
transmit pair (Tx+, Tx-). There are also auxiliary signals such as the reference clock pair
REFCLK+ and REFCLK-. A PERST# signal is present which indicates when the clock and voltage
supplies are stable. PRSNT1# and PRSNT2# are used for hot plug detection. In-band signaling
is used for interrupt handling.
PCI Express supports hot plug and surprise hot unplug without usage of sideband signals. Hot
plug interrupt messages, communicated in-band to the root complex, trigger hot plug software to
detect a hot plug or removal event. Rather than implementing a centralized hot plug controller as
exists in PCI platforms, the hot plug controller function is distributed to the port logic associated
with a hot plug capable port of a switch or root complex. Two colored LEDs, a Manually-
operated Retention Latch (MRL), MRL sensor, attention button, power control signal and
PRSNT2# signal are some of the elements of a hot plug capable port.
PCIe defines the registers necessary to support the integration of a hot plug controller within
individual root and switch ports. Under hot plug software control, these hot plug controllers and
the associated port interface within the root or switch port must control the card interface
signals to ensure orderly power-down and power-up as cards are removed and replaced.
The hot plug controller must assert and de-assert the PERST# signal to the PCIe card connector. It
must remove or apply power to the card connector. It must selectively turn on or off the Power
and Attention Indicators associated with a specific card connector to draw the user's attention to
the connector and to advertise whether power is applied to the slot. It must monitor slot events
(e.g. card removal) and report these events to software via interrupts.
PCIe Hot-Plug is designed as a "no surprises" hot plug methodology. In other words, the user is
not permitted to install or remove a PCIe card without first notifying software. System software
prepares both the card and the slot for the card's removal and replacement, and finally indicates
to the end user the status of the hot plug process and that installation or removal may be
performed.
PCI Express cards (unlike PCI) must implement the edge contacts so that the card presence detect
pins (PRSNT1# and PRSNT2#) break contact first when the card is removed from the slot. This
gives software advance notice of a "surprise" removal and enough time to remove power
before the signal pins break contact.
The host system can access the PCIe endpoint only using the PCIe address space. This address
space is a virtual address space for which no physical memory is allocated. It is just like a list of
addresses used in the Transaction Layer Packet in order to identify the target of the packet.
The root complex has IP registers for configuring the IP. It has registers to enable the clocks, to
program the file, to configure the lane width (1 lane or 2 lanes) and to configure the speed mode
(Gen1, Gen2 or Gen3). It also has registers to translate CPU addresses to PCIe addresses.
A configurable address space is used by the CPU in order to access the PCIe address space. There
are four spaces in a PCIe system: memory space, IO space, message space and configuration
space. Apart from message space, each of these has a physical address associated with it. The
size of the configuration space is 4 KB, while that of the IO space is 64 KB. The configuration
space holds all the information about the device: device ID, vendor ID, class code and
the various capabilities of the device. It is software backward compatible with PCI, whose
configuration space is 256 bytes. Of the 4 KB of configuration space, the first 64 bytes are
standard and are called the Standardized Header.
These Standardized Headers are of two types (Type 0 and Type 1). Type 0 is used by PCIe end
points and holds information that is applicable to end points. Type 1 is used by root ports,
bridges and switches, and contains information applicable only to those. Every PCIe end point
therefore also has a configuration space; it simply uses a different header type from the one
used in the root complex.
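As a rough illustration of the standardized header, the following Python sketch pulls the vendor ID, device ID, class code and header type out of the first 64 bytes of a configuration space image. The function name, the `ConfigHeader` tuple and the sample values are hypothetical; only the field offsets (0x00, 0x02, 0x09 to 0x0B, 0x0E) follow the standard header layout.

```python
import struct
from typing import NamedTuple


class ConfigHeader(NamedTuple):
    vendor_id: int
    device_id: int
    class_code: int       # 24 bits: base class, sub-class, programming interface
    header_type: int      # 0 = end point, 1 = bridge / root port / switch port
    multi_function: bool


def parse_config_header(raw: bytes) -> ConfigHeader:
    """Decode a few fields of the 64-byte standardized header (little endian)."""
    if len(raw) < 64:
        raise ValueError("need at least the 64-byte standardized header")
    vendor_id, device_id = struct.unpack_from("<HH", raw, 0x00)
    # Offsets 0x09..0x0B hold programming interface, sub-class and base class.
    class_code = raw[0x09] | (raw[0x0A] << 8) | (raw[0x0B] << 16)
    header_byte = raw[0x0E]
    return ConfigHeader(
        vendor_id,
        device_id,
        class_code,
        header_byte & 0x7F,          # bits 6:0 select Type 0 or Type 1
        bool(header_byte & 0x80),    # bit 7 flags a multi-function device
    )


# Hypothetical header image, built by hand for illustration only.
sample = bytearray(64)
struct.pack_into("<HH", sample, 0x00, 0x1234, 0xABCD)
sample[0x09:0x0C] = bytes([0x00, 0x00, 0x02])   # base class 0x02: network controller
sample[0x0E] = 0x00                             # Type 0 header, single function
header = parse_config_header(bytes(sample))
```

A header type of 1 in the same position would tell enumeration software to expect bridge fields such as the bus number registers instead of end point fields.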
During the enumeration process the host system must read the configuration space of each end
point, so that space has to be mapped to some address in the PCIe address space. The mechanism by
which a PCIe end point's configuration space is mapped into the address space is called the
Enhanced Configuration Access Mechanism (ECAM).
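The mapping can be sketched in a few lines of Python. Each function gets a 4 KB window, so the bus, device and function numbers are placed in address bits 27:20, 19:15 and 14:12, and the register offset fills the low 12 bits. The base address used below is a made-up value; on a real platform it is reported by firmware.

```python
def ecam_address(base: int, bus: int, device: int, function: int, offset: int = 0) -> int:
    """Return the memory address of a configuration register under ECAM."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < 4096:
        raise ValueError("offset must lie inside the 4 KB configuration space")
    return base + ((bus << 20) | (device << 15) | (function << 12) | offset)


ECAM_BASE = 0xE000_0000  # hypothetical base address, for illustration only

# Vendor ID register (offset 0) of bus 4, device 0, function 0.
addr = ecam_address(ECAM_BASE, bus=4, device=0, function=0)
```

With this layout, 256 buses of 32 devices with 8 functions each occupy 256 MB of address space.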
The detailed block diagram of a PCIe device's layers is given below which explains key
functions of each layer as it relates to outbound traffic and response to inbound traffic. The
different layers which are involved in the transaction process are Device core/Software layer,
Transaction layer, Data link layer and Physical layer.
The device core consists of root complex core logic or end point core logic, such as that of an
Ethernet controller, SCSI controller, USB controller, etc.
The device core logic provides the information (transaction type, address, amount of
data to transfer, data, traffic class, message index, etc.) that the PCIe device needs to generate
TLPs. This information is sent via the transmit interface to the Transaction Layer of the device.
The Header is 3 double words or 4 double words in size and may include information such as
address, TLP type, transfer size, requester ID/completer ID, tag, traffic class, byte enables,
completion codes, and attributes (including "no snoop" and "relaxed ordering" bits). The transfer
size or length field indicates the amount of data to transfer, calculated in double words (DWs).
The data transfer length can be between 1 and 1024 DWs. Write request TLPs include a data
payload of the amount indicated by the length field of the header. For a read request TLP, the
length field indicates the amount of data requested from a completer. This data is returned in one
or more completion packets.
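To make the header fields concrete, here is a minimal Python sketch that packs a 3DW memory read request header for a 32-bit address. The function names and the example requester ID, tag and address are assumptions chosen for illustration. The bit positions follow the usual header layout (Fmt and Type in the first byte, Length in the low 10 bits of the first DW, requester ID, tag and byte enables in the second DW), and a length of 1024 DWs is encoded as 0 because the field is only 10 bits wide.

```python
import struct


def encode_length(dw_count: int) -> int:
    """Encode a transfer length of 1..1024 DWs into the 10-bit Length field."""
    if not 1 <= dw_count <= 1024:
        raise ValueError("length must be between 1 and 1024 DWs")
    return dw_count & 0x3FF   # 1024 wraps to 0


def build_mrd32_header(requester_id: int, tag: int, address: int, dw_count: int,
                       traffic_class: int = 0) -> bytes:
    """Pack a 3DW Memory Read header (32-bit addressing, no ECRC)."""
    if address & 0x3 or not 0 <= address < 2**32:
        raise ValueError("address must be DW aligned and fit in 32 bits")
    if not (0 <= requester_id <= 0xFFFF and 0 <= tag <= 0xFF and 0 <= traffic_class <= 7):
        raise ValueError("field out of range")
    fmt, tlp_type = 0b000, 0b00000            # 3DW header without data, memory read
    first_be = 0xF
    last_be = 0x0 if dw_count == 1 else 0xF   # single-DW requests use only the first BE
    dw0 = (fmt << 29) | (tlp_type << 24) | (traffic_class << 20) | encode_length(dw_count)
    dw1 = (requester_id << 16) | (tag << 8) | (last_be << 4) | first_be
    dw2 = address
    return struct.pack(">III", dw0, dw1, dw2)


# Hypothetical request: bus 1, device 0, function 0 reads 4 DWs.
hdr = build_mrd32_header(requester_id=0x0100, tag=0x2A, address=0x8000_1000, dw_count=4)
```

A read request consists of the header only; the data comes back later in one or more completions that carry the same tag.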
Read request TLPs do not include a data payload field. Byte enables specify byte-level address
resolution. Request packets contain the requester ID (bus number, device number, function number)
of the device transmitting the request. The tag field in the request is recorded by the completer
and the same tag is used in the completion. A bit in the header (TD = TLP Digest) indicates
whether this packet contains an ECRC field, also referred to as the Digest. This field is 32 bits
wide and contains an End-to-End CRC (ECRC). The ECRC field is generated by the Transaction Layer
when the outbound TLP is created. It is calculated over the entire TLP, from the first byte of the
header to the last byte of the data payload, with the exception of the EP bit and bit 0 of the
Type field; these two bits are always treated as 1 for the ECRC calculation. The TLP never changes
as it traverses the fabric (except possibly for those two bits). The receiver device uses the ECRC
to detect errors that may occur as the packet moves through the fabric.
The receiver side of the Transaction Layer stores inbound TLPs in receiver virtual channel
buffers. The receiver checks for CRC errors based on the ECRC field in the TLP. If there are no
errors, the ECRC field is stripped and the resultant information in the TLP header as well as the
data payload is sent to the Device Core.
The primary function of the Data Link Layer is to ensure data integrity during packet
transmission and reception on each Link. If a transmitter device sends a TLP to a remote receiver
device at the other end of a Link and a CRC error is detected, the transmitter device is notified
with a NAK DLLP and automatically replays the TLP, with the expectation that the replay arrives
without error. With error checking and automatic replay of packets received in error, PCI
Express ensures a very high probability that a TLP transmitted by one device will reach its
final destination with no errors. This makes PCI Express well suited to high-availability systems
that require a low error rate, such as servers.
The Transaction Layer must observe the flow control mechanism before forwarding outbound
TLPs to the Data Link Layer. If sufficient credits exist, a TLP stored within the virtual channel
buffer is passed from the Transaction Layer to the Data Link Layer for transmission.
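The credit check can be pictured with a deliberately simplified Python sketch. It assumes a single credit pool with one header credit per TLP and one data credit per 4 DWs of payload; real devices keep separate pools per virtual channel and per transaction category and replenish them through flow control DLLPs, none of which is modeled here. The class and method names are hypothetical.

```python
from dataclasses import dataclass

DW_PER_DATA_CREDIT = 4   # one data credit covers 4 DWs (16 bytes) of payload


@dataclass
class CreditPool:
    """Simplified view of the credits advertised by the receiver."""
    header_credits: int
    data_credits: int

    def try_consume(self, payload_dw: int) -> bool:
        """Reserve credits for one TLP; return False if it must wait."""
        needed_data = -(-payload_dw // DW_PER_DATA_CREDIT)   # ceiling division
        if self.header_credits < 1 or self.data_credits < needed_data:
            return False
        self.header_credits -= 1
        self.data_credits -= needed_data
        return True


pool = CreditPool(header_credits=2, data_credits=8)
sent_write = pool.try_consume(payload_dw=16)   # needs 1 header + 4 data credits
blocked = pool.try_consume(payload_dw=17)      # needs 5 data credits, only 4 left
```

A TLP that fails the check stays in the virtual channel buffer until the receiver returns more credits.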
The Data Link Layer is responsible for TLP CRC generation and TLP error checking. For
outbound TLPs from the Transmitter, a Link CRC (LCRC) is generated and appended to the TLP. In
addition, a sequence ID is added to the TLP. The Transmitter's Data Link Layer keeps a copy of
the TLP in a replay buffer and transmits the TLP to the Receiver. The Data Link Layer of the
remote Receiver receives the TLP and checks for CRC errors. If there is no error, the Data Link
Layer of the Receiver returns an ACK DLLP with a sequence ID to the Transmitter. The Transmitter
then has confirmation that the TLP has reached the Receiver (not necessarily the final
destination) successfully, and it clears the TLP associated with that sequence ID from its replay
buffer.
If, on the other hand, a CRC error is detected in the TLP received at the remote Receiver, then a
NAK DLLP with a sequence ID is returned to the Transmitter. For a given TLP in the replay buffer,
if the transmitter device receives a NAK 4 times and the TLP is replayed 3 additional times as a
result, then the Data Link Layer logs the error, reports a correctable error, and re-trains the
Link.
The receive side of the Data Link Layer is responsible for LCRC error checking on inbound
TLPs. If no error is detected, the device schedules an ACK DLLP for transmission back to the
remote transmitter device. The receiver strips the LCRC field and sequence ID from the TLP.
If a CRC error is detected, it schedules a NAK to return to the remote transmitter, and the TLP
is discarded.
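The ACK/NAK handshake can be sketched as a small Python model of the transmitter's replay buffer. The class and method names are hypothetical. The sketch assumes that an ACK or NAK carries the sequence ID of the last TLP received correctly, so everything up to that ID can be purged and a NAK causes everything still in the buffer to be replayed; the replay count limit and timers are left out.

```python
from collections import OrderedDict
from typing import List, Tuple

SEQ_MODULO = 4096   # sequence IDs are 12 bits wide


class ReplayBuffer:
    def __init__(self) -> None:
        self.next_seq = 0
        self.pending = OrderedDict()   # sequence ID -> TLP bytes, in transmit order

    def transmit(self, tlp: bytes) -> int:
        """Assign a sequence ID, keep a copy for replay and return the ID."""
        seq = self.next_seq
        self.pending[seq] = tlp
        self.next_seq = (seq + 1) % SEQ_MODULO
        return seq

    def _purge_through(self, seq: int) -> None:
        """Drop every stored TLP up to and including `seq`."""
        if seq not in self.pending:
            return
        while self.pending:
            oldest, _ = self.pending.popitem(last=False)
            if oldest == seq:
                break

    def on_ack(self, seq: int) -> None:
        self._purge_through(seq)

    def on_nak(self, last_good_seq: int) -> List[Tuple[int, bytes]]:
        """Purge what did arrive, then return the TLPs that must be replayed."""
        self._purge_through(last_good_seq)
        return list(self.pending.items())


buf = ReplayBuffer()
s0, s1, s2 = buf.transmit(b"tlp-a"), buf.transmit(b"tlp-b"), buf.transmit(b"tlp-c")
buf.on_ack(s0)                 # first TLP arrived intact
to_replay = buf.on_nak(s1)     # second arrived, third was corrupted
```

The replayed copies keep their original sequence IDs, which is how the receiver recognizes them.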
Both TLP and DLLP type packets are sent from the Data Link Layer to the Physical Layer for
transmission over the Link. Also, packets are received by the Physical Layer from the Link and
sent to the Data Link Layer. The Physical Layer is divided in two portions, the Logical Physical
Layer and the Electrical Physical Layer. The Logical Physical Layer contains digital logic
associated with processing packets before transmission on the Link, or processing packets
inbound from the Link before sending to the Data Link Layer. The Electrical Physical Layer is
the analog interface of the Physical Layer that connects to the Link.
TLPs and DLLPs from the Data Link Layer are clocked into a buffer in the Logical Physical
Layer. The Physical Layer frames the TLP or DLLP with a Start and End character. The symbol
is a framing code byte which a receiver device uses to detect the start and end of a packet. The
Start and End characters are shown appended to a TLP and DLLP.
The transmit logical sub-block conditions the packet received from the Data Link Layer into the
correct format for transmission. Packets are byte striped across the available Lanes on the Link.
Each byte of a packet is then scrambled with the aid of a Linear Feedback Shift Register type
scrambler. By scrambling the bytes, repeated bit patterns on the Link are eliminated, thus
reducing the average EMI noise generated. The resultant bytes are encoded into a 10b code by
the 8b/10b encoding logic. The primary purpose of encoding 8b characters to 10b symbols is to
create sufficient 1-to-0 and 0-to-1 transition density in the bit stream to facilitate recreation of a
receive clock with the aid of a PLL at the remote receiver device. Note that data is not
transmitted along with a clock. Instead, the bit stream contains sufficient transitions to allow the
receiver device to recreate a receive clock. The parallel-to-serial converter generates a serial bit
stream of the packet on each Lane and transmits it differentially at 2.5 Gbits/s (Gen1) or
5 Gbits/s (Gen2). Gen3 links, which run at 8 Gbits/s, use 128b/130b encoding instead of 8b/10b.
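The cost of 8b/10b encoding is easy to quantify: only 8 of every 10 transmitted bits carry data. The short Python sketch below works this out per lane and per link for the 2.5 and 5 Gbits/s signaling rates; the function names are illustrative, and packet overhead (headers, sequence ID, LCRC, framing) is ignored.

```python
def lane_data_rate_gbps(line_rate_gbps: float, data_bits: int = 8, symbol_bits: int = 10) -> float:
    """Usable bit rate of one lane, in one direction, after encoding overhead."""
    return line_rate_gbps * data_bits / symbol_bits


def link_bandwidth_mbytes_per_s(line_rate_gbps: float, lanes: int) -> float:
    """Usable bandwidth of a link in MB/s (1 MB = 10**6 bytes), one direction."""
    return lane_data_rate_gbps(line_rate_gbps) * lanes * 1000 / 8


gen1_x1 = link_bandwidth_mbytes_per_s(2.5, lanes=1)   # 250 MB/s
gen2_x4 = link_bandwidth_mbytes_per_s(5.0, lanes=4)   # 2000 MB/s
```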
The receive Electrical Physical Layer clocks in a packet arriving differentially on all Lanes. The
serial bit stream of the packet is converted into a 10b parallel stream using the serial-to-parallel
converter. The receiver logic also includes an elastic buffer which compensates for the
frequency difference between the transmit clock, with which the packet bit stream is clocked into
the receiver, and the receiver's own clock. The 10b symbol stream is decoded back to the 8b
representation of each symbol by the 8b/10b decoder. The 8b characters are de-scrambled. The byte
unstriping logic re-creates the original packet stream transmitted by the remote device.
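Byte striping and unstriping can be illustrated with a few lines of Python: byte 0 goes to lane 0, byte 1 to lane 1, and so on, wrapping around after the last lane. This is only a sketch of the distribution rule; framing symbols, scrambling and encoding are not modeled, and the function names are made up for the example.

```python
from typing import List


def stripe(packet: bytes, lanes: int) -> List[bytes]:
    """Distribute the bytes of a packet round-robin across the lanes."""
    if lanes < 1:
        raise ValueError("a link has at least one lane")
    return [packet[lane::lanes] for lane in range(lanes)]


def unstripe(lane_data: List[bytes]) -> bytes:
    """Interleave the per-lane byte streams back into the original packet."""
    lanes = len(lane_data)
    out = bytearray(sum(len(stream) for stream in lane_data))
    for lane, stream in enumerate(lane_data):
        out[lane::lanes] = stream
    return bytes(out)


per_lane = stripe(b"ABCDEFGH", lanes=4)
restored = unstripe(per_lane)
```

On a x4 link the eight bytes above leave as two symbols per lane, so the packet needs a quarter of the symbol times it would take on a x1 link.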
We would like to illustrate a memory read transaction by showing the packets transmitted between
the requester (Root Complex) and the completer (End Point).
The root complex, on behalf of the processor, initiates a non-posted memory read (MRd). The root
complex transmits an MRd packet which contains, amongst other fields, an address, TLP type,
requester ID and length of transfer field. Switch A, which is a 3-port switch, receives the packet
on its upstream port. The switch logically appears as three virtual bridge devices connected by an
internal bus. The local bridges within the switch contain memory and I/O base and limit address
registers within their configuration space, similar to PCI bridges. The MRd packet address is
decoded by the switch and compared with the base/limit address range registers of the two
downstream local bridges. The switch internally forwards the MRd packet from the upstream ingress
port to the correct downstream port. The MRd packet is forwarded to Switch B, which decodes the
address in a similar manner. Assume the MRd packet is forwarded to the right-hand port so that the
completer end point receives it.
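The routing decision inside the switch comes down to a range comparison. The Python sketch below models the downstream virtual bridges with a memory base and an inclusive memory limit and picks the port whose window contains the request address. The class, the port names and the address windows are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VirtualBridge:
    name: str
    memory_base: int
    memory_limit: int   # inclusive upper bound of the window

    def claims(self, address: int) -> bool:
        return self.memory_base <= address <= self.memory_limit


def route_downstream(bridges: List[VirtualBridge], address: int) -> Optional[str]:
    """Return the downstream port that claims the address, if any."""
    for bridge in bridges:
        if bridge.claims(address):
            return bridge.name
    return None


# Hypothetical windows programmed during enumeration.
switch_a = [
    VirtualBridge("left port", 0x9000_0000, 0x90FF_FFFF),
    VirtualBridge("right port", 0x9100_0000, 0x91FF_FFFF),
]
egress = route_downstream(switch_a, 0x9100_2000)
```

A request whose address falls outside every downstream window is not forwarded downstream by this model, which is why the function can return `None`.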
If we compare the life cycle of a bus write operation with that of a read, there is an evident
difference. A write TLP operation is fire-and-forget. Once the packet has been formed and
handed over to the Data Link Layer, the requester does not need to track it any further.
The root complex, on behalf of the processor, initiates a posted memory write (MWr). The root
complex transmits an MWr packet which contains, amongst other fields, an address, TLP type,
requester ID and length of transfer field. Switch A, which is a 3-port switch, receives the packet
on its upstream port. The switch logically appears as three virtual bridge devices connected by an
internal bus. The local bridges within the switch contain memory and I/O base and limit address
registers within their configuration space, similar to PCI bridges. The MWr packet address is
decoded by the switch and compared with the base/limit address range registers of the two
downstream local bridges. The switch internally forwards the MWr packet from the upstream ingress
port to the correct downstream port. The MWr packet is forwarded to Switch B, which decodes the
address in a similar manner. Assume the MWr packet is forwarded to the right-hand port so that the
completer end point receives it. Because the write is posted, the completer returns no completion.
We would like to illustrate the transaction of IO write showing packet transmission between
requester (Root Complex) and completer (End Point) to accomplish this transaction. IO requests
can only be initiated by a Root Complex or a legacy end point. PCI Express end points do not
initiate IO transactions. IO transactions are intended for legacy support. Native PCI Express
devices are not prohibited from implementing IO space, but the specification states that a PCI
Express End point must not depend on the operating system allocating I/O resources that are
requested.
IO requests are routed by switches in a similar manner to memory requests. Switches route IO
request packets by comparing the IO address in the packet with the IO base and limit address
range registers in the virtual bridge configuration space associated with a switch.
The CPU initiates an IO write on the Front Side Bus (FSB). The write contains a target IO address
and up to 4 bytes of data. The Root Complex creates an IO Write request TLP (IOWr) using the
address and data from the CPU transaction. It uses its own requester ID in the packet header. This
packet is routed through Switches A and B. The completer end point returns a completion without
data (Cpl) with a completion status of 'successful completion' to confirm the reception of good
data from the requester.
The Root Complex initiates an IO read request on behalf of the processor. The read contains a
target IO address. The Root Complex creates an IO Read request TLP (IORd) using the address from
the CPU transaction. It uses its own requester ID in the packet header. This packet is routed
through Switches A and B. The completer end point returns a completion with data (CplD) with a
completion status of 'successful completion' to confirm the transmission of good data to the
requester.
MRd -> CplD
IORd -> CplD or Cpl
IOWr -> Cpl
CfgRd -> CplD
CfgWr -> Cpl
The transaction layer uses this information to build a CplD TLP. A 3DW header is created. In
addition, the transaction layer adds its own completer ID to the header. The TD bit in the TLP
header is set if a 32-bit end-to-end CRC is added to the tail of the TLP. The TLP includes
the data payload. The flow control logic confirms that sufficient credits are available for the
virtual channel associated with the traffic class. The CplD TLP is then sent to the Data Link
Layer, which adds a 12-bit sequence ID and a 32-bit LCRC calculated over the
entire packet. A copy of the TLP with sequence ID and LCRC is stored in the replay buffer.
The packet is forwarded to the Physical Layer, which adds a start symbol and an end symbol to
the packet. The packet is byte striped across the available Lanes, scrambled and 10-bit encoded.
Finally, the CplD packet is converted to a serial bit stream on all Lanes and transmitted to the
ingress port of the switch (or end point). Inside the switch, port arbitration and VC arbitration
select the packet and steer it to the correct egress port so that it reaches the requester.
The requester converts the incoming serial bit stream back to 10b symbols while assembling the
packet in an elastic buffer. The 10b symbols are converted back to bytes, and the bytes from all
lanes are de-scrambled and unstriped. The start and end symbols are detected and removed. The
resultant TLP is sent to the Data Link Layer. The Data Link Layer checks for LCRC errors in the
received CplD TLP and checks the sequence ID for missing or out-of-sequence TLPs. Assume no error.
The Data Link Layer creates an ACK DLLP which contains the same sequence ID as the CplD TLP
received. A 16-bit CRC is added to the ACK DLLP. The DLLP is sent to the Physical Layer,
which transmits the ACK DLLP to the completer.
The completer's Physical Layer reassembles the ACK DLLP and sends it up to the Data Link Layer,
which evaluates the sequence ID and compares it with the TLPs stored in the replay buffer. The
stored CplD TLP associated with the ACK received is discarded from the replay buffer. If a
NAK DLLP had been received by the completer instead, it would re-send a copy of the stored CplD
TLP. In the meantime, the requester's transaction layer receives the CplD TLP in the appropriate
virtual channel buffer mapped to the TLP's traffic class. The transaction layer uses the tag in
the header of the CplD TLP to associate the completion with the original request. The transaction
layer checks for an ECRC error and forwards the header contents and data payload, including the
completion status, to the requester's Device Core. This completes the memory read transaction.
All devices capture the bus number and device number provided by the upstream
device during each Type 0 configuration write cycle. This information is contained in bytes 8-9
of the configuration request header.
Assume that the request is started by the root complex and travels through a bridge or switch (it
traverses them like other request transactions) to a PCIe end point or PCI/PCI-X device. When a
Type 0 configuration read or write arrives on the destination bus, the devices on the bus decode
the header's device number field to determine which of them is the target device. The
selected device then decodes the header's function number field to determine the selected function
within the device. The selected function uses the concatenated extended register number and
register number fields to select the target DWord in the function's configuration space. Lastly,
the function uses the first DWord byte enable field to select the bytes to be read or written
within the selected DWord.
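The decode steps above map directly onto bytes 8 to 11 of the configuration request header, which hold the bus number, device and function numbers, extended register number and register number. The following Python sketch extracts them and turns the register fields into a byte offset; the function and type names, and the sample header, are hypothetical.

```python
from typing import List, NamedTuple


class ConfigTarget(NamedTuple):
    bus: int
    device: int
    function: int
    byte_offset: int   # offset of the selected DWord in configuration space


def decode_config_target(header: bytes) -> ConfigTarget:
    """Decode the target fields of a 3DW configuration request header."""
    if len(header) < 12:
        raise ValueError("configuration requests use a 3DW (12-byte) header")
    bus = header[8]
    device = header[9] >> 3            # upper 5 bits of byte 9
    function = header[9] & 0x7         # lower 3 bits of byte 9
    ext_register = header[10] & 0xF    # 4-bit extended register number
    register = header[11] >> 2         # 6-bit register number
    dword_index = (ext_register << 6) | register
    return ConfigTarget(bus, device, function, dword_index * 4)


def enabled_bytes(first_dw_be: int) -> List[int]:
    """List the byte lanes (0..3) selected by the first DWord byte enables."""
    return [lane for lane in range(4) if first_dw_be & (1 << lane)]


# Hypothetical request aimed at bus 4, device 2, function 1, offset 0x140.
sample_header = bytes(8) + bytes([0x04, (2 << 3) | 1, 0x1, 0x10 << 2])
target = decode_config_target(sample_header)
```

Because the two register fields together are 10 bits wide, they can address all 1024 DWords of the 4 KB configuration space.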
The only devices that pay attention to a Type 1 configuration read or write are PCI-to-PCI
bridges. Upon receipt of a Type 1 configuration read or write request packet, a PCI-to-PCI bridge
compares the target bus number in the packet header to the range of buses that reside behind the
bridge. If the target bus is the bridge's secondary bus, the packet is converted from a Type 1 to a
Type 0 configuration request when it is passed to the secondary bus. The devices on that bus then
decode the packet header and process it further. If the target bus is not the bridge's secondary
bus but is a bus that resides beneath its secondary bus, the Type 1 request is passed through to
the bridge's secondary bus as is.
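The bridge's decision can be summarized in a short Python sketch. The enum and function names are made up, and the bus numbers in the example (secondary bus 4, subordinate bus 8) are illustrative values only.

```python
from enum import Enum


class Action(Enum):
    CONVERT_TO_TYPE0 = "convert to Type 0 on the secondary bus"
    FORWARD_AS_TYPE1 = "pass through unchanged as Type 1"
    IGNORE = "target bus is not behind this bridge"


def handle_type1_request(target_bus: int, secondary: int, subordinate: int) -> Action:
    """Decide what a PCI-to-PCI bridge does with a Type 1 configuration request."""
    if target_bus == secondary:
        return Action.CONVERT_TO_TYPE0
    if secondary < target_bus <= subordinate:
        return Action.FORWARD_AS_TYPE1
    return Action.IGNORE


# Bridge with secondary bus 4 and subordinate bus 8 (illustrative numbers).
decisions = {bus: handle_type1_request(bus, secondary=4, subordinate=8) for bus in (3, 4, 6, 8, 9)}
```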
A native PCIe device must use MSI (Message Signaled Interrupt) delivery.
A legacy end point must support MSI and may optionally support INTx messages. Such
devices may be boot devices that must use legacy interrupts during boot, but once the
driver loads, MSIs are used.
A PCIe to PCI-X bridge must support INTx messages.
5.1 Native PCIe Interrupt Delivery
Message Signaled Interrupts are delivered to the Root Complex via memory write transactions. These
are edge-triggered signals. At startup time, the configuration software scans the PCIe fabric and
discovers devices. When a PCIe function is discovered, the configuration software reads the
Capabilities List Pointer to obtain the location of the first capability register set within the
chain of registers. It then searches the capability register sets until it discovers the MSI
capability register set. Software assigns a DWord-aligned memory address to the device's Message
Address register. This is the destination address of the memory write used when delivering an
interrupt request. Software checks the Multiple Message Capable field in the device's Message
Control register to determine how many event-specific messages the device would like assigned to
it. The software then allocates a number of messages equal to or less than what the device
requested. At a minimum, one message will be allocated to the device. Software then writes the base
message data pattern into the device's Message Data register. Finally, the software sets the MSI
Enable bit in the device's Message Control register, thereby enabling it to generate interrupts
using MSI memory writes.
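The capability search described above can be sketched in Python against a byte image of a function's configuration space. The function names and the sample image are hypothetical. The offsets used are the standard ones: the Capabilities List bit is bit 4 of the Status register at 0x06, the Capabilities Pointer sits at 0x34, each capability starts with an ID byte followed by a next pointer, and MSI has capability ID 0x05 with its Message Control register two bytes into the structure.

```python
from typing import Optional

STATUS_OFFSET = 0x06
CAP_POINTER_OFFSET = 0x34
MSI_CAPABILITY_ID = 0x05


def find_capability(cfg: bytes, cap_id: int) -> Optional[int]:
    """Walk the capability chain and return the offset of `cap_id`, if present."""
    status = cfg[STATUS_OFFSET] | (cfg[STATUS_OFFSET + 1] << 8)
    if not status & 0x10:                 # no capability list implemented
        return None
    pointer = cfg[CAP_POINTER_OFFSET] & 0xFC
    visited = set()                       # guards against a malformed, looping chain
    while pointer and pointer not in visited:
        visited.add(pointer)
        if cfg[pointer] == cap_id:
            return pointer
        pointer = cfg[pointer + 1] & 0xFC
    return None


def requested_messages(cfg: bytes, msi_offset: int) -> int:
    """Number of messages the device asks for (Multiple Message Capable field)."""
    message_control = cfg[msi_offset + 2] | (cfg[msi_offset + 3] << 8)
    return 1 << ((message_control >> 1) & 0x7)


# Hypothetical image: power management capability at 0x40, MSI at 0x50.
image = bytearray(256)
image[STATUS_OFFSET] = 0x10
image[CAP_POINTER_OFFSET] = 0x40
image[0x40], image[0x41] = 0x01, 0x50
image[0x50], image[0x51] = MSI_CAPABILITY_ID, 0x00
image[0x52] = 0x04                         # Multiple Message Capable = 2, i.e. 4 messages
msi_offset = find_capability(bytes(image), MSI_CAPABILITY_ID)
wanted = requested_messages(bytes(image), msi_offset)
```

Software is free to grant fewer messages than `wanted`; it records its choice in the Multiple Message Enable field before setting MSI Enable.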
PCIe defines a variety of mechanisms for checking, reporting and identifying errors, and
specifies how they are handled by the appropriate hardware and software elements.
PCIe error checking focuses on errors associated with the PCIe interface and the delivery of
transactions between the requester and completer functions.
These mechanisms are controlled and reported through configuration registers mapped into
distinct regions of configuration space.
There are two types of errors: correctable errors and uncorrectable errors.
Correctable errors: The hardware recovers from these errors by itself without any loss of
information. Example: when an LCRC error occurs in a TLP, the Data Link Layer corrects it by
having the packet replayed. This is considered a correctable error.
Uncorrectable errors: These errors impact the functionality of the interface. They are of two
types, fatal and non-fatal.
Fatal errors are uncorrectable errors that make a particular link and its related hardware
unreliable. To restore reliable operation, the components on the link must be reset.
Non-fatal errors, in contrast, make a specific transaction unreliable while the link itself stays
functional. These errors can be isolated, and resetting the components on the link is not
necessary.