Specification
Doc #: H02174
Project: Xenon
Revision: 0.43
Date: 06/24/04
Version: 1.3
Xenon System Architecture
Proprietary Notice
The information contained herein is confidential, is submitted in confidence, and is proprietary information of
Microsoft Corporation, and shall only be used in the furtherance of the contract of which this document forms
a part, and shall not, without Microsoft Corporation’s prior written approval, be reproduced or in any way
used in whole or in part in connection with services or equipment offered for sale or furnished to others. The
information contained herein may not be disclosed to a third party without consent of Microsoft Corporation,
and then, only pursuant to a Microsoft approved non-disclosure agreement. Microsoft assumes no liability for
incidental or consequential damages arising from the use of this specification contained herein, and reserves
the right to update, revise, or change any information in this document without notice.
Published by
X-box Console Group
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
Telephone (425) 882-8080
Revision History
Revision | Changes | Date | By | Status
0.1 | Original | 2/26/03 | Nick Baker | Draft
0.1a | Added Structure | 3/15/03 | Nick Baker | Draft
0.2 | Updated Southbridge description. Added Big Endian notes as they apply to Southbridge | 6/5/03 | Greg Williams | Draft
0.25 | Updated top-level block diagram. Updated Southbridge. Updated memory. | 8/27/03 | Greg Williams | Draft
0.30 | Updated Ana relevant sections | 9/21/03 | John Tardif | Draft
0.35 | Updated Southbridge Endianness section | 12/15/03 | Greg Williams, Stephen Au | Draft
0.36 | Rolled in Andy’s WMA Endianness document | 12/16/03 | Greg Williams, Andy Walters | Draft
0.37 | General clean-up | 1/7/04 | Nick Baker | Draft
0.4 | Added BackSide Bus Spec subdocument | 2/13/04 | Greg Williams | Draft
0.41 | Corrected and updated system information | 3/3/04 | L. Del Castillo | Draft
0.42 | Updated memory, memory interface, system boot, reset, and time sections | 3/9/04 | Harjit Singh | Draft
0.43 | Southbridge update – USB overhaul | 6/24/04 | Greg Williams | Draft
Conventions Used
Description Represents Examples
Documents Referenced
Title Document Number Author
Xenon Product Specification Todd Roshak (toddro)
\\xenon\specs\Xenon Product Spec.doc
Xenon Design Specification Harjit Singh (harjits)
\\xenon\specs\Hardware\Xenon Design Specification.doc
Table of Contents
1 Introduction
2 System Block Diagram
3 Architecture Overview
3.1 Introduction
3.2 Core Digital Components
3.3 Architecture Justification
3.4 System Components
3.5 Distributed Components
3.6 Key Architectural Mechanisms
3.7 Low Level Software Architecture (?)
3.8 Alternate SKU Considerations
3.9 Technical Specifications
3.10 Performance Overview (Sue)
4 Core Digital Components
4.1 CPU
4.1.1 Further Reading
4.2 GPU
4.3 Memory
4.4 Southbridge
4.4.1 Notes
4.5 Ana
4.6 Front Side Bus (Art)
4.6.1 Link Layer
4.7 Back Side Bus
4.8 Memory Bus
4.9 SMBus
5 System Components
5.1 Questions
5.2 DVD
5.3 HDD
5.4 MU
5.5 Game Controllers
5.6 Network
5.7 Expansion
6 Distributed Components
6.1 Audio
6.2 Video
7 Key Architectural Mechanisms
7.1 System Dataflow
7.2 System Memory Map
7.3 System Coherence
7.4 System Interrupt Mechanism
7.5 System Ordering
7.6 System Security
7.7 System Endianness
7.8 System Clocking
7.9 System Boot
7.10 System Reset
7.11 System Time
7.12 Power States & Power Management
7.12.1 Off
7.12.2 Standby
7.12.3 Quiet
7.12.4 Full Power
7.12.5 Power Control Events
7.13 CPU/GPU Synchronization (Nick)
7.14 CPU – GPU Procedural Geometry Communication (Nick)
7.14.1 Vertex commands
7.14.2 GPU Memory mapped registers
7.14.3 Current Implementation
7.14.4 Requirements
7.15 System Debug Facilities (?)
7.15.1 Low Level Debug
7.15.2 Development Systems
7.16 System Bandwidth / Latency Roll-Up (Nick)
8 Low Level Software Architecture (MarcW?)
8.1 Flash Resident Drivers
8.2 BIOS
8.3 Low Level Drivers
8.4 Network Stack
8.5 Procedural Geometry
8.6 Video Mode Selection
9 Alternate SKU Considerations (Nick)
10 Other
10.1 Not Covered
1 Introduction
The purpose of this document is to capture the specification for the Xenon system architecture.
It provides a high level description of the system, together with the requirements for interoperability among the
core components and between those components and other IO devices, peripherals and software. It is not a detailed
architecture description of individual components. The intended audience is hardware and software architects and
designers who want a high level overview of the system.
2 System Block Diagram
[Figure: Xenon system block diagram, showing the CPU, GPU, SB and Ana chips and an RJ-45 connector. The per-chip
annotations recovered from the diagram are listed below.]
• CPU: Launch Process: 90nm enhanced SOI (10KE0). Launch Die Size: 168 mm^2. Frequency: 3.0-3.5 GHz.
Core VDD: APS 1.075V – 1.275V (@ Ball). Power: 85W. Power/Ground Bumps: 2113. Signal I/O: 219.
Power/Ground Balls: 680. Package: Flip-Chip 899 Ball Plastic/Organic BGA.
• GPU: Launch Process: 90 nm bulk (TSMC 90GT). Launch Die Size (main): 177 mm^2. Launch Die Size (EDRAM): 71 mm^2.
Core Frequency: 500 MHz. Core VDD: 1.1V. Power: 38W (29W + 9W eDRAM). Signal I/O: 443.
Power I/O: 582 (282 I/O, 300 core). Package: Flip-Chip 35x35 1025 ball BGA.
• SB: Launch Process: 0.15u. Launch Die Size: 34.7 mm^2. Frequency: 125 MHz. Core VDD: 1.8V. Power: 3.2W.
Signal I/O: 183. Package: 23x23 382 TEBGA.
• Ana: Launch Process: 0.18u. Launch Die Size: 13.4 mm^2. Frequency: 170 MHz. Core VDD: 1.8V. Power: 1.3W.
Signal I/O: 81. Power I/O: 61. Package: LQFP144.
• Thermal target, as labeled on the diagram: Target ThetaJc = 0.2-0.5 degrees C/W.
The diagram shows the main system components. These are described in more detail in the next section. Note that
the latest version of the diagram may be obtained from: \\xenon\specs\Architecture\Xenon System Block Diagram.vsd
3 Architecture Overview
3.1 Introduction
The following sections give a high level view of the system architecture. For further reading see the
corresponding main chapters later in this document.
• Ana: Ana is the ANAlog chip that contains the system clock reference, video DACs and
thermal sensors, as well as the digital encoder for analog video standards
(NTSC/PAL/HDTV/VGA).
See \\xenon\specs\Ana\Ana_ One_Pager.doc for an overview.
• Memory: The system has a unified memory architecture consisting of GDDR memory.
128MB, 256MB, 512MB and 1GB memory configurations are supported, although a 256MB
console and 512MB development systems are the plan of record (POR).
See \\xenon\specs\memory\specs\ati_spec_16mx32_8b_v11.pdf for a sample part spec.
The components communicate over the following interfaces:
• Front Side Bus (FSB): Interface between the CPU and GPU. This is a 5.4Gbps differential
link custom to the CPU vendor. It is symmetrical, with a peak of 10.8GB/sec in each of the
write and read directions. The high bandwidth (in the write direction at least) is to support
procedural data generation (XPS) on the CPU, which is pushed in a tightly coupled fashion
to the GPU.
See \\xenon\specs\CPU\FSB\FSB_BUSSPEC.pdf for the FSB documentation.
• Back Side Bus (BSB): This connects the GPU to the SouthBridge. This is a PCI-Express
x2 bus with a peak of 500 MB/sec in each direction.
See \\xenon\HWDev\Electrical\Southbridge\Interface Specs\PCI\pciexpress_base_10a.pdf
for the PCI-Express base specification.
• Memory Bus: The interface between the GPU and main memory. This is GDDR, running at
1.4-1.6Gbps. At 128bit wide, this provides a peak of 22.4GB/sec (@1.4Gbps). The exact
frequency will be determined later based on the availability of parts.
The sample part spec (\\xenon\specs\memory\specs\ati_spec_16mx32_8b_v11.pdf) also
serves as the interface definition.
• Xenon Digital Video Output (XDVO): This is the pixel output bus that interfaces the GPU to
the video encoder portion of Ana. It is a 15bit 135MPix/sec DDR (2 cycles to transfer one
pixel) bus that supports most HDTV and HD Monitor standards.
See (\\xenon\Specs\Ana\XDVO.doc) for the XDVO bus documentation.
• System Management Bus (SMBus): This is a low pin count serial interface (similar to IIC)
that the various chips use to communicate with one another for reset and power
management purposes. Most likely there is no direct SMBus connection to the GPU or CPU;
they are affected only indirectly, for example through resets.
See \\xenon\HWDev\Electrical\Industry Standards\SMBus Version 2.0.pdf for the base
specification.
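The peak figures quoted for the interfaces above follow from simple width-times-rate arithmetic. The sketch
below reproduces them and is illustrative only. The 16-bit FSB width per direction is taken from the labels on the
GPU block diagram later in this document, the PCI-Express figures (2.5 Gbps per lane, 8b/10b encoding) are those
of the PCI-Express 1.0a base specification, and all function and variable names are hypothetical.

```python
def parallel_peak_gb_per_sec(width_bits, gbps_per_pin):
    """Peak bandwidth in GB/sec of a parallel link: width x per-pin rate / 8."""
    return width_bits * gbps_per_pin / 8


def pcie_gen1_peak_mb_per_sec(lanes):
    """Peak bandwidth in MB/sec, per direction, of a PCI-Express 1.0a link."""
    raw_gbps_per_lane = 2.5        # raw signalling rate per lane
    encoding_efficiency = 8 / 10   # 8b/10b line coding carries 8 data bits per 10
    return lanes * raw_gbps_per_lane * encoding_efficiency * 1000 / 8


# FSB: 5.4Gbps per pin. ASSUMPTION: 16 bits per direction (from the GPU diagram).
fsb_gb = parallel_peak_gb_per_sec(16, 5.4)

# Memory bus: 128 bits wide at 1.4 to 1.6Gbps per pin.
mem_low_gb = parallel_peak_gb_per_sec(128, 1.4)
mem_high_gb = parallel_peak_gb_per_sec(128, 1.6)

# BSB: PCI-Express x2.
bsb_mb = pcie_gen1_peak_mb_per_sec(2)

print(f"FSB:    {fsb_gb:.1f} GB/sec per direction")
print(f"Memory: {mem_low_gb:.1f} to {mem_high_gb:.1f} GB/sec")
print(f"BSB:    {bsb_mb:.0f} MB/sec per direction")
```

Under these assumptions the sketch gives 10.8 GB/sec per direction for the FSB, 22.4 GB/sec for the memory bus at
1.4Gbps (25.6 GB/sec at 1.6Gbps), and 500 MB/sec per direction for the BSB.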
The full reasoning behind the architecture is beyond the scope of this document (please read the
“Think Week White Papers” for a more thorough analysis).
At the highest level the system looks like a multi-processor PC with integrated graphics. This was
not necessarily the intention, and actually the distance of the CPU from main memory was longer
than hoped. A split memory architecture with local processor memory was desired, but cost
constraints dictated a unified memory architecture. Once that decision was made, placing the
memory next to the highest bandwidth customer (the GPU) was the next logical step. This does
present a memory latency problem to the CPU, so large caches and CPU pre-fetching are required
to compensate.
The next level is the exact intent of the multi-processing and multi-threading. Again this was driven
by cost efficiency. For developers, the easiest architecture to program is a single high
performance processor. However, the CPU industry has reached the point where extracting more
instruction level parallelism comes at steeply increasing cost and complexity. Forcing parallel
programming with several simpler (cheaper) processors, especially in a closed environment such as
a game console, is a logical step, for which there is also prior art. Furthermore, several areas of a
game (especially at the low levels of physics and rendering) are parallelizable.
To aid in the parallel programming within the rendering pipeline in particular, the CPU and GPU
have been closely coupled to allow procedural generation of data. This also helps in a cost
constrained environment where the amount of system memory a developer would need to store all
the offline generated art they want can never be achieved. Even if a developer does not want to
tackle a multi-processing problem, by using Microsoft supplied APIs they can effectively take
advantage of the extra processing power to perform parallel number crunching as well as
decompression of geometry and to some extent textures.
An interesting note here is that because parallel programming is hard, we do want to get significant
performance out of a single CPU core, so the processor (read single core) chosen still has
competitive SpecINT.
The use of Embedded DRAM also requires some discussion here. Graphics processors are
extremely bandwidth hungry, and this is typically solved in PC graphics by using very wide memory
interfaces, render target and z-buffer compression, and on-chip caching. EDRAM was chosen
because going wider than 128bits was not a cost option, and because compression and caching
typically behave unpredictably.
The choice of a DirectX compatible 3DCore should be self explanatory, and the most up to date
version of the standard was chosen (DX10). However, given schedule / cost constraints, not all of
the DX10 spec made it into the hardware. The main rendering features of interest are:
• Unified Shader Core: This allows effective load balancing of vertex and pixel shaders, so
achieving better efficiency of the compute resources.
• Multi-Render Target: This allows deferred lighting passes, where per pixel computations
can be performed in a geometry independent fashion.
• High Dynamic Range: Floating point and high precision fixed point formats are supported
to allow HDR effects.
The core digital components (CPU, GPU, SB, Memory, Ana) comprise the minimum architecture
components required to be able to boot and run the OS. In addition there are several storage and
IO devices supported, all of which may or may not be present in a given product configuration.
- DVD: Used for game content delivery.
- HDD: Optional component used for the alternate SKUs to enhance certain capabilities
such as ripped audio, saved games.
- Ethernet (10/100BaseT): For Live and sharing content with PCs. Will also be used for
development systems.
See \\xenon\HWDev\Electrical\Ethernet PHY\VIA-DS6103110.pdf for example external
PHY specification.
- Audio DACs: Separate component for audio output.
See \\xenon\HWDev\Electrical\Audio\Wolfson\WM8726.pdf for example spec.
- Memory Cards: As in Xbox1, saved game data can be stored on USB based memory
cards. Without a HDD in all configurations, this will be required for all game saving on
Xenon.
- Wired Controllers: As with Xbox1, the wired controllers use a modified version of USB
(XUSB).
- Expansion Devices: A couple of standard USB ports will be available for other expansion
devices such as USB HDDs, Cameras, etc.
- IR Input: IR is supported directly by the SB.
See \\xenon\HWDev\Electrical\IR Receiver\Xenon Infrared Receiver Spec.doc for spec.
- Wireless Controllers: Wireless controllers are supported via additional circuitry that
interfaces to one of the SB USB ports.
- AV Packs: Again similar to Xbox1, Xenon supports an Audio Video Interface Port (AVIP) to
break-out the audio and video signals depending on the AV components the end customer
has. Planned AVPacks would be: Standard (Analog Stereo Audio+Composite Video), RF
(one each for North America, Japan, Europe), Enhanced (Digital Audio + S-Video added),
SCART (RGB component along with Composite video), Component, and VGA. See
\\xenon\Specs\Peripherals\Xenon_AV_Pack_Design_Spec.doc for design spec.
- HDMI/HDCP: The base architecture also supports routing the XDVO bus to an optional
HDMI chip.
Documentation for the different IO busses can be referenced as follows:
- Serial ATA:
\\xenon\HWDev\Electrical\Southbridge\Interface Specs\ATA\Serial ATA 1.0 gold.pdf
There are a few features for which there is no dedicated processing, rather processing is shared
amongst the different chips and emulated. These are called out below:
- Digital Video Processing: There is no dedicated MPEG decoder in the system. The
processing (decode) is expected to be performed completely on the CPU, with maybe
some help from the 3DCore’s shader array if needed.
- Audio: The audio support on the hardware is limited to WMAPro decode and audio out
DMA. All voice generation, mixing, and effects processing is done on the CPU.
The salient points about how certain architectural features are implemented are listed below. Refer
later to more detailed descriptions:
- Endianness: The CPU is Big Endian (byte ordering). All devices on the system are big
endian as well, except for the SouthBridge IO components which are Little Endian (due to
their PC heritage).
- CPU Coherence: Coherence between the CPU cores is maintained by hardware. The
coherence point is the L2 cache. To aid in this the L2 is inclusive of the L1 caches, and the
L1 caches are write-through.
- DMA Coherence: Only IO coherence via snooping is implemented. High bandwidth
devices, such as rendering, should avoid using this mechanism and software should use
non-cached write combining, or software managed coherence when synchronizing data in
these cases.
- Instruction Ordering: PowerPC is loosely ordered. This requires the use of barrier
operations to force ordering when required, e.g. when accessing hardware registers.
- DMA Ordering: As always, ordering rules are required to guard against race conditions
between DMA transfers, interrupts and Memory Mapped IO (MMIO) operations. At a high
level, the hardware will use interrupts to guarantee that ordering is maintained. There is no
support for simultaneous fine grain (e.g. within a CPU cache line) access to any memory
location or register by different devices.
- Interrupts: Interrupts are message based (there is no interrupt pin on the CPU). Messages
flow from the Southbridge to the main collector in the Northbridge which then forwards the
messages to the interrupt processor on the CPU chip. Emulation for edge triggered and
level sensitive interrupts is supported.
- Memory Map: The CPU can address a 42bit memory range. Only 32bits are available
outside of the CPU (providing a 4GB main system memory map). This 32bit space is
broken down into a 1GB main memory window (not all of which may be present), 2GB of
reserved, and 1GB of MMIO and configuration space. Internal to the CPU, the additional
memory range is used to implement a secure boot environment to guard against certain
security attacks, i.e. there are certain structures on the CPU that only it can address, and
only in a super-privileged (HyperVisor) mode.
- Security: Secure boot, piracy prevention and DRM are implemented via a security scheme
that relies on a boot sequence that starts on the main CPU die itself, as well as with a
security engine that allows blocks of main memory to be protected.
- Boot Procedure: The OS kernel is booted via a several stage boot process. Initially the
CPU’s internal bootloader (BL1) starts up. This fetches and decrypts the stage 2
bootloader (BL2) from external FLASH. The BL2 enables main memory and copies the
kernel from FLASH to main memory before entering the kernel.
- Xenon Procedural Synthesis: There is a collection of features implemented in the CPU and
GPU that allow transient geometry data to be generated by the CPU and absorbed directly
by the GPU without hitting main memory. Briefly, a 128kB set of the CPU’s L2 can
optionally be locked down for several geometry FIFOs. These FIFOs can be read directly
by the GPU so as to fetch vertex data. A low latency (non-interrupt based) synchronization
scheme is achieved by allowing the GPU to write command processor tail-pointer updates
directly to the CPU. The CPU also allows streaming data past the L2 for reads, and past
the L1 for writes to assist in avoiding cache pollution. Intrinsic VMX2 data pack instructions
are also an important feature.
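Two of the mechanisms above lend themselves to a short illustration: the byte swap needed when the big endian
CPU handles little endian SouthBridge data, and the size accounting of the memory map. The following is a
minimal sketch rather than driver code. The function name and the sample register value are hypothetical, and no
base addresses are implied for the memory map regions.

```python
GIB = 1 << 30


def swap32(value):
    """Reverse the byte order of a 32-bit value (big endian <-> little endian)."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "big"), "little")


# A little endian device stores 0x12345678 as the bytes 78 56 34 12.
# A big endian CPU that loads those four bytes as one word sees 0x78563412,
# so software must swap the bytes to recover the intended value.
device_bytes = (0x12345678).to_bytes(4, "little")   # hypothetical register contents
cpu_view = int.from_bytes(device_bytes, "big")
restored = swap32(cpu_view)

# Memory map sizes from the text, in GB. Only the sizes are checked here.
external_regions_gb = {
    "main memory window": 1,
    "reserved": 2,
    "MMIO and configuration": 1,
}
external_bytes = sum(external_regions_gb.values()) * GIB   # visible outside the CPU
cpu_only_bytes = (1 << 42) - (1 << 32)                     # addressable only inside the CPU

print(hex(cpu_view), hex(restored))
print(external_bytes == 1 << 32)
```

The three external regions add up to exactly the 4GB reachable with 32 address bits; the remainder of the
42bit range is the space available only inside the CPU.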
3.7.1 Overview
3.7.2 Hardware Resident APIs/Code
This section defines what drivers and/or code should live in FLASH vs. in the title library (and
therefore game media). This is so that hardware can be rev’d over time and still provide backwards
compatibility.
1. Power Management
2. DVE
a. HDMI support may/will require additional code space
b. Closed captioning
c. Wide-screen signalling
3. Video Resize
4. Video Colorspace conversion
5. Video Gamma
6. Temp sensor
a. Calibration parameters need to be stored on a per box basis
7. Ana (Clocks, etc.)
8. Ethernet Phy
9. Audio codec
10. Flash
11. USB
12. SATA
13. FSB settings
The core architecture has the requirement to support a few different SKUs.
This list can be boiled down to allowing for 2x memory, future use of HDMI/HDCP. I think other
expansion issues such as RTC and USB for Helium are out of place in this document, as they are
more implementation-specific.
• Development Systems: The main constraint imposed by the DevKits is the requirement to
double the amount of main memory (512MB) and still maintain the same memory
performance. The challenge here will be the electricals when doubling the memory parts.
The DevKits will also use the serial debugger interface on the SouthBridge for kernel
debugging.
• PRO SKU: This is identical to the game console, except it will come with different
peripherals (including HDD and wireless gamepads) as standard.
• PC SKU: This is the most challenging and to some extent the least well understood.
Nominally Windows (XP or Longhorn) will be run using the Connectix emulation layer.
Additional peripherals including a Real Time Clock (RTC) and USB Hub will be added to
the motherboard. A maximum display resolution of 1280x1024@75Hz has been chosen,
which drives the minimum amount of Embedded DRAM and the maximum pixel display
rate (135MPix/sec).
• HDTV DVD: Not currently POR, but we are planning hooks for future HDTV DVD playback.
Other than the correct choice of media and compression scheme, the main issue for the
core architecture is the copy protection required on the display output. It looks like this will
be HDMI/HDCP which is not supported in the current Ana. A couple of options for design
updates at a later date are possible and discussed later.
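The link between the 1280x1024@75Hz display mode and the 135MPix/sec pixel rate can be checked with a quick
calculation. The sketch below is illustrative only. The total raster size including blanking (1688 x 1066) is the
standard VESA timing for this mode and is not stated in this specification, so treat it as an assumption; the
names are hypothetical.

```python
ACTIVE_WIDTH, ACTIVE_HEIGHT, REFRESH_HZ = 1280, 1024, 75

# ASSUMPTION: total raster size (active + blanking) from the standard VESA
# timing for 1280x1024@75Hz; this document does not give these numbers.
TOTAL_WIDTH, TOTAL_HEIGHT = 1688, 1066


def pixels_per_second(width, height, refresh_hz):
    """Pixel rate for a raster of the given size and refresh rate."""
    return width * height * refresh_hz


active_rate = pixels_per_second(ACTIVE_WIDTH, ACTIVE_HEIGHT, REFRESH_HZ)
total_rate = pixels_per_second(TOTAL_WIDTH, TOTAL_HEIGHT, REFRESH_HZ)

# XDVO moves one pixel in two transfers, so 135MPix/sec needs
# 270 million transfers per second on the bus.
xdvo_transfers_per_second = 135_000_000 * 2

print(f"active pixels:  {active_rate / 1e6:.1f} MPix/sec")
print(f"with blanking:  {total_rate / 1e6:.1f} MPix/sec")
print(f"XDVO transfers: {xdvo_transfers_per_second / 1e6:.0f} M/sec")
```

The visible pixels alone need about 98 MPix/sec; it is the blanking intervals that bring the required rate up to
roughly 135 MPix/sec.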
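4 Core Digital Components
4.1 CPU
[Figure: CPU block diagram. Labels recovered from the diagram: Waternoose SOC containing the CIU, 1 MB Shared L2
(L2C, NCU, 1MB Bank 1), BIU, MPi Bus, Security Engine, Interrupt C., Boot RAM, Boot ROM, Sec. fuses, and the FSB
Link Layer and Physical Layer; the IBM FSB connects to the GPU's FSB Physical Layer, Link Layer and Coherency
Block.]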
The CPU is a multi-core SOC arranged in an SMP fashion. All cores are identical and are
optimized for vector floating point, as is common in 3D graphics applications. It is the belief that
certain portions of a game are parallelizable, so we provide parallelism both at the core level (SMP)
and thread level (SMT). To ensure that this system is programmable by a wide range of
developers, the cores are all coherent with each other, and Microsoft intends to provide middleware
libraries to developers so that the parallelism is hidden from those developers that do not want to
take this programming challenge.
To help visualize how this system may be programmed in this fashion, one possible example is
discussed. One core is allocated to the game engine; this is in fact the only CPU core that the
developers program. All threads associated with that game run on this “Host” core. A set of API
functions implement accelerated and optimized routines for physics, animation, collision detection,
audio, etc. that run on a second core. A third core is dedicated to procedural synthesis. This last
one is an important subset as the CPU and GPU have dedicated hardware to allow procedural
synthesis on the CPU to be efficiently pushed in a tightly coupled fashion to the GPU, without
spilling to memory. For the API functions mentioned, including the procedural geometry, the
developer can provide their own routines that the XOS schedules appropriately.
Important features of the CPU:
• ISA: 64bit PowerPC, derived from Power4 architecture.
• SMP: All cores are identical and coherent with one another.
• SMT: All cores support 2 simultaneous threads.
• Vector Floating Point (VMX2, 128bit): The cores each have a dedicated vector floating
point unit that is capable of performing the equivalent of a DP4 each cycle at sustained
throughput. There are also special instructions for data swizzling and compaction so as to
be compatible with common 3D datatypes and operations.
• Advanced Cache Management: The cache controls address many of the issues common
to multi-media systems, with optimizations for data streaming, set locking, etc.
• System Coherency: The CPU supports a coherency protocol that allows the L2 and L1
caches to be coherent (if so desired) with any DMA hardware (3D or otherwise). Use of this
is via snoops, so this should only be used for low-bandwidth operation.
• System Security: The CPU implements a confidential scheme that allows the system to be
protected for copy protection, DRM and privacy purposes.
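For readers unfamiliar with the term used in the feature list above, a DP4 is a four-component dot product,
the basic operation in transforming a vertex by a 4x4 matrix. The sketch below shows the operation in scalar form
together with a rough per-core rate estimate. It does not model the VMX2 unit or its instructions. The 3.0 GHz
clock is the low end of the frequency range given on the system block diagram, and the function names are
hypothetical.

```python
def dp4(a, b):
    """Four-component dot product: four multiplies and three adds."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def transform(matrix_rows, vertex):
    """Multiply a 4x4 matrix (given as four rows) by a vertex: four DP4s."""
    return tuple(dp4(row, vertex) for row in matrix_rows)


IDENTITY = (
    (1, 0, 0, 0),
    (0, 1, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 0, 1),
)

# Upper-bound estimate only: one DP4 per cycle per core at sustained
# throughput, ignoring loads, stores and all other instructions.
CLOCK_HZ = 3.0e9                     # ASSUMPTION: low end of the 3.0-3.5 GHz range
dp4_per_second_per_core = CLOCK_HZ
transforms_per_second_per_core = dp4_per_second_per_core / 4

print(dp4((1, 2, 3, 4), (5, 6, 7, 8)))
print(transform(IDENTITY, (1, 2, 3, 1)))
```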
MMU
Cached write combining and the RC machines
Instruction throughput / latency
Streaming support
Locking support
XPS support
Scalar FP
4.1.1 Further Reading
The following documentation is recommended for a better understanding of PowerPC and this
processor in particular.
Standard PowerPC documentation:
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PowerPC_Architecture_Book_I.pdf
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PowerPC_Architecture_Book_II.pdf
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PowerPC_Architecture_Book_III.pdf
Xenon specific documentation:
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PPC_WN_Book4.pdf
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\vmx128-isa-000.pdf
4.2 GPU
NorthBridge Description
3D core features
Control / synchronization
Driver implementation issues and architecture
Procedural synthesis
The GPU is the main system controller hub, connecting the FSB, memory interface and BSB. It
also contains the 3D rendering core.
The following diagram shows conceptually how the major components within the GPU are
connected.
[Figure: GPU block diagram (ATI/MS Confidential). Labels recovered from the diagram: FSB PHY and FSB Link, Tx and
Rx (16b/5.4GHz in each direction); Bus Interface Unit (BIU) with master and slave read and write ports; PCI-E PHY
and PCI-E Link with SB read and SB write paths; snoop read/write path; GFX 3D Core with bc/rb, vgt, vc, tc and XPS
clients; EDRAM; Display Controller and XDVO PHY (16b/270MHz); Memory Hub and Buffer; Memory Controller 0 and
Memory Controller 1 (64b at 1.6GHz each).
Clocks listed on the diagram: FSB CLK: 675MHz. MEM CLK: 800MHz. Core CLK: 500MHz. PIX CLK: 135MHz.
PCI-E CLK: 2.5GHz. EDRAM CLK: 500MHz.]
Note in the diagram that there are no DACs but there is an independent Video output bus to Ana
which contains a digital encoder for analog video as well as the video DACs.
TODO: Add RBClk domain.
GPU Memory Configurations
Configuration Total Memory # of Ranks Banks Row Column
4.3 Memory
The memory devices shall conform to the GDDR3 memory specification. The key features of the
devices are:
• 512Mb device density configured as 16Mx32 devices
• Two data accesses per clock cycle with a 4n prefetch
• Differential clock. Clock frequency of 800 MHz
• Single ended, per byte read and write strobes
• Pseudo open drain I/O with calibrated output drive
• On die termination
• Packaging supports mirror function to allow a clamshell memory design
• Packaging supports 1.6Gbps signaling
• Need to add: banks, cycle time
Important features of the memory are:
• Given our best estimate of memory pricing and our overall cost target, a total capacity of
256 MB is targeted for the product. This would be based on (4) 512 Mb, x32 parts.
• Given the need for extra memory to accommodate debug and pre-optimized games, a total
capacity of 512 MB is targeted for the development systems.
This will be based on either (8) 512 Mb, x32 parts (2 DRAM loads on the data wires), or (8)
512 Mb, x16 parts. The first of these cases might require special operating conditions as
we do not need to guarantee operation in high volume and over all conditions. The latter
requires special part development by a DRAM vendor, though some vendors have
indicated the possibility of a single design supporting both x16 and x32.
• Because memory pricing is volatile and difficult to forecast, these targets could change and
the device availability over the life of the product must support a range from 128 MB to 1
GB of memory.
• Support for a path to (2) 1 Gb parts is necessary in the controller, though it may never be
cost effective to make this transition. This must be completely seamless.
• The memory devices shall support a boundary scan capability to allow verification of the
connection between the GPU memory controller and the memory devices. The test
coverage shall include shorts between signals, opens on the GPU, the signal trace or
memory device. It may support coverage that allows isolation of the problem to a particular
end of the signal; for example, the open is on the GPU side and not the signal trace or
memory device.
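The capacity and bus width figures in the bullets above can be cross-checked with a few lines of arithmetic. The
sketch below is illustrative only. The helper and its parameter names are hypothetical, and it assumes that parts
sharing data wires (two DRAM loads) add capacity without widening the bus.

```python
def memory_config(parts, density_mbit, width_bits, loads_per_data_wire=1):
    """Return (capacity in MB, data bus width in bits) for a set of DRAM parts."""
    capacity_mb = parts * density_mbit // 8
    # ASSUMPTION: parts that share data wires do not widen the bus.
    bus_width_bits = parts * width_bits // loads_per_data_wire
    return capacity_mb, bus_width_bits


# One device: 16M locations x 32 bits = 512 Mb.
device_mbit = 16 * 32

# Product target: (4) 512 Mb, x32 parts.
console = memory_config(4, 512, 32)

# Development system options: (8) 512 Mb x32 parts with 2 loads per data wire,
# or (8) 512 Mb x16 parts.
devkit_x32 = memory_config(8, 512, 32, loads_per_data_wire=2)
devkit_x16 = memory_config(8, 512, 16)

# 800 MHz clock with two data accesses per clock cycle.
data_rate_gbps = 800 * 2 / 1000

for name, cfg in [("console", console), ("devkit x32", devkit_x32), ("devkit x16", devkit_x16)]:
    print(f"{name}: {cfg[0]} MB on a {cfg[1]}-bit bus")
print(f"per-pin data rate: {data_rate_gbps} Gbps")
```

All three configurations keep the 128bit bus, which is why the development systems can double capacity without
changing the peak memory bandwidth.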
The memory configurations supported by the GPU and memory interface are documented in the
GPU section. See chapter 4.2.
Eventually, we’ll need specifications for the exact timing parameters (on the order of 1 or more
clock cycles) used by the bus.
4.4 Southbridge
[Figure: Southbridge block diagram. Blocks shown include the PCI-Express x2 PHY and BackSide Bus control to the NB; the PCIE2ECB bridge (for CPU-initiated configuration) and the ECB2HDEDB bridge (for device DMA); the ECB (PCI) and HBEDB buses; Group A and Group B USB 2.0/1.1/XUSB devices with EHCI/OHCI controllers and PHYs; the DVD and HDD drive interfaces; the WMA Pro decoder; the Ethernet MAC and PHY; the JTAG DFT TAP; the interrupt collector; the kernel debug UART; SMM; timer; GPIOs; system flash interface; audio out; PWM x2; the Ana interface; and the 8051-based SMC with debug UART, IR, RTC, and connections to the front panel, power supply unit, DVD tray, and AVIP. Note: the bus arbitration/muxing module for both buses is not shown. ECB: 15 units, HBEDB: 6 units.]
The SouthBridge chip is akin to those found in traditional PC architectures, but is customized for
the game console, primarily for cost reasons. Some of the more standard (or soon to be standard)
functions and interfaces in the SouthBridge are:
4.5 Ana
This section needs to focus on the only real architecturally-significant functional blocks in the ANA
chip: the DVE and the clock architecture. The thermal sensor, fan control, etc are design-specific
details that are not relevant to the Xenon console.
The Ana chip consists of four main functional blocks: A Thermal Sensor (TS) block which is used to
monitor system temperatures, a Clock Synthesizer (CS) which is used to generate the required
system clocks for Xenon, a Digital Video Encoder (DVE) which is required to convert a digital pixel
stream from the Northbridge into analog outputs suitable for connection to a television or monitor,
and a System Management Bus Interface which provides the host interface for Ana. A simplified
block diagram of the Ana chip is shown in Figure 4.1.
[Figure 4.1: Simplified block diagram of the Ana chip. Blocks shown: the 27 MHz system crystal; the SMBus control interface to/from the SMC; the CSR interface; the thermal sensor with analog inputs from external sensor diodes; the video clock; the digital pixel port (1X pixel clock in, pixel data, and digital timing/syncs from the GPU pixel generator); the digital video encoder; and DACs A through D with analog outputs to the filters and video interface.]
• A pulse width modulated signal converted to a fixed voltage from the Southbridge
• A feedback signal from external fan driver circuitry
• The outputs of the op-amps drive the external fan drive circuitry.
Ana contains a JTAG interface for boundary scan of the XDVO bus and for controlling an internal
tap controller which can be used to access analog IP test structures. Parallel scan chains will be
used for ATPG coverage of digital logic.
Interrupt functionality is provided via an interrupt pin (VID_INT). This pin is attached to the closed
caption logic in the DVE and will signify when the hardware is ready to accept more closed caption
data for a specific field.
There are also miscellaneous pins on Ana devoted to various contingency/visibility options (e.g.
bypasses for PLLs, power on reset cell and crystal oscillator, bringing out video clock, oscillator
output and output of power-on reset cell to pins).
The following pertains to the system clock generation:
• All component outputs are observable on Ana pins for debug and measurement purposes.
Similarly, all inputs to components can be directly supplied from pins (either during
standard operation or via bypass inputs) for debug, measurement, and contingency
purposes.
• Power down capability for every PLL (but not the oscillator).
• Single 27 MHz oscillator source with bypass allowing external clock source. Output of
oscillator available on external pin for debug, measurement purposes as well as ability to
slave external clock generation device to internally generated clocks (though phase locking
not supported). Always running as long as box is plugged into wall.
• Programmable Video (and 2xPixel) clock generation (up to 170MHz video DAC frequency
or 135 Mpix/sec pixel rate) for video encoder and pixel interface. With bypass capability
allowing video clock to be supplied from external clock source. Locked to audio clock.
2Xpixel clock is output externally via differential outputs. Output enabled only when box is
powered on.
• 24.576MHz Audio clock generation. Locked to video clock. Output enabled only when box
is powered on.
• Input to audio and video PLLs from either the on-chip oscillator or from external clock
source. Allows for off-chip pullable oscillator.
• 25 MHz clock generation for Ethernet clock and Southbridge. Outputs enabled only when
box is powered on.
• 48 MHz standby clock generation for internal SMBus clock (bypassable with external pin)
and for standby clock to Southbridge. Always on as long as box is plugged into wall.
• 100 MHz clock generation for Serial ATA components and interfaces. Selectable spread
spectrum with -0.5% down-spread triangle modulation. Output via differential outputs.
Output enabled only when box is powered on.
• 100 MHz clock generation for Northbridge, PCI Express and CPU (separate differential
outputs for each). Selectable spread spectrum with up to -1.5% down-spread
modulation. Outputs enabled only when box is powered on.
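To make the spread spectrum figures above concrete, the sketch below computes the frequency range swept by a down-spread clock, where the modulation only moves the frequency below nominal. The function name is hypothetical, and the modulation profile (triangle or otherwise) is not modeled.

```python
# Illustrative arithmetic; the function name is an assumption, not part of the spec.

def down_spread_range_mhz(nominal_mhz: float, spread_percent: float) -> tuple:
    """Return (lowest, highest) frequency for a down-spread of `spread_percent` percent."""
    lowest = nominal_mhz * (1 - spread_percent / 100)
    return lowest, nominal_mhz

# 100 MHz Serial ATA clock with -0.5% down-spread.
sata_range = down_spread_range_mhz(100.0, 0.5)
# 100 MHz Northbridge / PCI Express / CPU clock with the maximum -1.5% down-spread.
cpu_range = down_spread_range_mhz(100.0, 1.5)

print(sata_range, cpu_range)
```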
The following pertains to the thermal measurement block:
• remote temperature sensing channels to monitor CPU, GPU, EDRAM and board
temperature diodes in addition to calibration channel
• Metal fuse window planned for trimming band gap
The FSB is divided into six sections consisting of three Layers which are further broken into
Transmit and Receive sections: The Transport layer communicates with the rest of the CPU/GPU,
The Link layer is responsible for CRC generation/checking and handling error checking and packet
retransmission (as well as data alignment in the receive section), and the Phy layer, which changes
from a wide, slower interface to the 5.4 Gb/sec. PCB lanes. The phy receive layer is responsible
for bit-aligning the data to the forwarding clock transmitted with the data.
The CPU FSB transport layer also communicates with the Security Unit for encryption and
decryption of some data. It also has some other logic not needed in the GPU version: An ability to
map the 10-bit MPI tags used inside the CPU to the 5-bit Transaction IDs (TIDs) used by the FSB
(and the GPU). Other than these differences, the primary differences between the CPU and GPU
versions of the FSB are that they are implemented in different silicon processes and that the units
have different primary datapath widths: 8 bytes in the CPU and 16 bytes in the GPU.
Needs update …
The Design of the FSB Link and PHY will be provided by the CPU company.
The FSB is a high performance bus that connects the GPU and the CPU. Like many high speed
busses, it is comprised of a link and a physical layer. The general physical specifications of the
bus are in the following table.
For each of the FSB commands, the packets include the following informational fields:
The BackSide Bus is a 2-lane PCI-Express link connecting the SouthBridge to the GPU.
The memory bus shall conform to the GDDR3 specification. The key features of the GDDR3 spec.
are:
• Unified memory architecture utilizing high-speed graphics memory devices.
• Total data bus width of 128-bits
• Data signaling at 1.6 Gbps, implying peak bandwidth 25.6 GB/sec.
• Because memory pricing is volatile and difficult to forecast, these targets could change and
the interface must support a range from 128 MB to 1 GB of memory.
• To provide better utilization of the memory bus, two independent memory controllers are
likely. The memory interleaving between these controllers, as well as between the internal
banks within the DRAM, is to be determined.
• Support for a path to (2) 1 Gb parts is necessary in the controller, though it may never be
cost effective to make this transition. This must be completely seamless.
• Note that identical performance between XDKs and consoles is only met when running in
the lower half of the XDK memory. When the upper half of the XDK memory is used, there
is an extra cycle penalty when doing back to back reads from alternating ranks.
• The memory interface shall support a boundary scan capability to allow verification of the
connection between the GPU memory controller and the memory devices. The test
coverage shall include shorts between signals, opens on the GPU, the signal trace or
memory device. It may support coverage that allows isolation of the problem to a particular
end of the signal; for example, the open is on the GPU side and not the signal trace or
memory device.
The memory configurations supported by the GPU and memory interface are documented in the
GPU section. See chapter 4.2.
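The peak bandwidth quoted above follows directly from the bus width and the per-pin signaling rate. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
# Illustrative arithmetic; the function name is an assumption.

def peak_bandwidth_gbytes_per_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/sec: total bits per second across the bus, divided by 8."""
    return bus_width_bits * gbps_per_pin / 8

# 128-bit data bus signaling at 1.6 Gbps per pin.
peak = peak_bandwidth_gbytes_per_s(128, 1.6)
print(peak)
```

This is a theoretical peak; it does not account for refresh, bank conflicts, or read/write turnaround.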
4.9 SMBus
I think only one is required: the one that must communicate with the AVIP. The mechanism by which
the DVE is configured is, I believe, implementation-specific.
The system shall support two SMBus V2.0 compliant interfaces. These interfaces may be
contained in the Southbridge and shall be accessible by the CPU and System Management
Controller.
One SMBus interface shall be used for communication internal to the system. In the existing
system architecture, the interface is used to connect the Southbridge to the Ana IC. If other SMBus
devices are added to the system, they shall use this interface.
The second SMBus interface shall be used for communication with external devices over the Audio
Video Interface Port (AVIP). The two uses identified are:
• Video Electronics Standards Association Display Data Channel (VESA DDC) found on
Video Graphics Adapter (VGA) monitors
• Controlling the console in a Kiosk for point of sale demonstration purposes. The controls
include power, DVD and system settings. This application may be extended to include
development and factory test.
The SMBus interfaces may be time multiplexed, thus the Southbridge is the only supported master.
5 System Components
5.1 Questions
Are we going to encrypt data. Linked to per box basis? What level or performance extraction.
Critical levels of performance. Be explicit on this.
How do we bundle things together for online. Harddrive be some of internal. Assume user level
expansion.
Some of these should be pluggable by the user. Have Xbox LOGO program.
5.2 DVD
The DVD drive is a custom form factor drive built specifically for the Xenon console.
General Specifications:
Form-Factor: Sub-half-height, custom form factor for Xenon console
Interface: SATA 1.0 + sideband tray control and status
Speed: CAV, 12x DVD at outer diameter
Media formats (read-only): CD, CD-R, DVD-5, DVD-9, DVD-X2 (Xbox 1.0), DVD-X3 (Xbox
2.0)
Access time: 115ms average
5.3 HDD
5.4 MU
The console architecture supports connectivity to both wired and wireless game pads designed
specifically for Xenon.
Wired Gamepad:
Interface: X-USB, low-speed USB signaling with expanded payload capability
Power: 4 units or less of USB power
Connector: Standard USB connector
Wireless Gamepad:
This section should focus on how the wireless radio interconnects to the system via USB. There
should be reference to the remote resume requirement.
Wireless Interface: Custom 2.4GHz spread spectrum, half-duplex transceiver
Wired Interface: Used during recharging, X-USB
5.6 Network
Expansion of the base console and connection to peripheral devices is via three USB 2.0 ports, two
mounted on the front panel, one mounted on the rear panel.
6 Distributed Components
6.1 Audio
A description of the distributed Audio processing can be found in the following document.
\\xenon\specs\Architecture\Xenon Audio.doc
6.2 Video
A description of the distributed Video processing can be found in the following document.
\\xenon\specs\Architecture\Xenon Video.doc
This document discussed the main producer / consumer models of the system.
2-6-04 Ordering
I’ve extracted the relevant part of Hartog’s response on this (long and tortuous) thread:
I think that the implication of the behavior we're discussing here is that the GPU and the CPU cannot reliably use a flag
in memory to indicate the arrival of some chunk of data from the CPU, since if the GPU polls such a flag, it can see it
set before the data is actually delivered. This occurs because the in-order property for writes is not preserved for
locations that are in different banks (channels) , and because there is no concept of a "conflicting read" for reads issued
by the GPU. I think the question for us is whether the GPU ever needs to do such polling. Right?
[SH] Exactly.
The CPU, GPU and SB shall support a mechanism to allow fetching and execution of the
1BL from the system flash. This mechanism shall be functional without any configuration or
setup.
4. The 2BL sets up the hardware to get memory going and anything to get ROM XIP more
performant. Once the memory is up and running, the 2BL copies the 3BL into memory and
jumps to it.
5. The 3BL uses the System Flash controller’s DMA interface and copies the kernel from the
system flash into memory. It patches and verifies the memory image. Again system flash
interface in SB deals with idiosyncrasies of NAND flash. Incidentally, some value slightly
less than 8 MB is actually available to software due to parts shipping with bad blocks.
6. The CPU jumps to the kernel now in memory.
7. Main system boot phase. Note that none of what was described dealt with encryption or
security in the Southbridge or the system flash itself. Believe this is consistent with
Dinarte’s preferred scheme but need to parse latest mail and verify this.
This section should focus on the aspects of reset sequence that are relevant to architecture. It is
my opinion that the details of reset sequence are entirely design-specific.
System Reset is controlled by the System Management Controller. The various reset signals are
configured in a star topology, with the SMC as the hub. The reset sequence follows the power-
supply sequence, and consists of sequentially releasing reset to various components, verifying an
acknowledgement, and proceeding to the next stage.
The detailed power-on reset diagram is located in
\\xenon\Specs\Hardware\Xenon Power on Reset.vsd
The three power states for the system are: “Off”, “Standby”, and “Operational”.
To reduce overall system power consumption, power minimization techniques may be employed in
all states.
This section should describe the aspects in which the architecture needs flexibility in order to
support quiet mode operation:
1. CPU cores must be capable of being individually enabled/disabled
2. The GPU must have capability of disabling shaders
3. The CPU shall have selectable clock frequency between “Fast” and “Slow”
SW will need to go through a rigid sequence of operations to gracefully change power modes
(there is no automatic hardware power sequencing control).
7.12.1 Off
This is the state of the system when the power supply is not plugged into the wall and/or the power
supply is not plugged into the system.
When the system power supply is plugged in and the power supply is connected to the system, the
system shall transition to the Standby state.
Regardless of how briefly the system is in this state, on application of
power it shall transition to the Standby state without operator intervention, subject to certain ESD
susceptibility limitations.
7.12.2 Standby
This state shall be entered from either Off, Quiet or Full Power states.
In this state, the functions that shall be powered include: The clock generator for the SMC, the
SMC, the SMC firmware store, the front panel button circuitry, the expansion ports, the power
circuit in the AVIP, the IR receiver and demodulation block, the wired and wireless controller ports.
On detection of any power up event on the front panel buttons circuitry, the AVIP, the IR
demodulator, the wired and wireless controller ports, and the SMC power cycle timer, the system
shall transition to the Full Power state.
For details on how the system handles power events via the expansion port, see section 7.12.4.2
In the event of a power interruption that is less than 16 milliseconds in duration, the system shall
remain in this state and continue to operate as if no power interruption occurred. If the power
interruption is longer than 16 milliseconds, the system may transition to the Off state.
7.12.3 Operating States
This section should take the place of the Quiet and Full Power states below, and focus on the
configurations of the system to implement various power states. The degrees of freedom are:
• CPU Cores enabled {1, 2, 3}
• GPU Shaders enabled { 1, 4, 12}
• ODD spin max speed: {slow, med, fast}
7.12.3.1 Quiet
This state shall be entered from Full Power state only.
This state is designed for A/V playback and wireless game pad charging. The goal is to minimize
system acoustic level. This shall be accomplished by reducing the system power consumption to a
minimum. The CPU and NB shall include circuitry that allows portions of the chips to be turned off or
slowed down, either under program control or by internal circuit usage determination functions.
The SB may include circuitry that allows portions of the chip to be turned off or slowed down, either
under program control or by internal circuit usage determination functions.
On detection of a power down event, the system shall transition to the Standby state.
On detection of loss of power, the system software shall put the system in a safe state and
immediately transition to the Standby state.
The system software may transition the system to Full Power state based on user input.
The system software shall monitor transitions between this state and Full Power state and ensure
that they occur at a frequency that is not annoying to the user.
7.12.3.2 Full Power
This state shall be entered from Standby or Quiet state.
In this state, the system shall operate at its maximum capabilities.
On detection of a power down event, the system shall transition to Standby state.
On detection of loss of power, the system software shall put the system in a safe state and
immediately transition to the Standby state.
The system software may transition the system to Quiet state based on user input.
The system software shall monitor transitions between this state and Quiet state and ensure that
they occur at a frequency that is not annoying to the user.
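The state descriptions above can be summarized as a small transition table. The Python sketch below is one possible reading; the event names are hypothetical labels for the conditions in the text, and the Standby to Off transition is one the text only says "may" occur.

```python
from enum import Enum

class State(Enum):
    OFF = "Off"
    STANDBY = "Standby"
    QUIET = "Quiet"
    FULL_POWER = "Full Power"

# (current state, event) -> next state. Event names are illustrative assumptions.
TRANSITIONS = {
    (State.OFF, "power_applied"): State.STANDBY,
    (State.STANDBY, "power_up_event"): State.FULL_POWER,
    (State.STANDBY, "power_lost_over_16ms"): State.OFF,  # "may" in the text
    (State.QUIET, "power_down_event"): State.STANDBY,
    (State.QUIET, "loss_of_power"): State.STANDBY,
    (State.QUIET, "user_requests_full_power"): State.FULL_POWER,
    (State.FULL_POWER, "power_down_event"): State.STANDBY,
    (State.FULL_POWER, "loss_of_power"): State.STANDBY,
    (State.FULL_POWER, "user_requests_quiet"): State.QUIET,
}

def next_state(state: State, event: str) -> State:
    """Return the next state, or the current state if the event does not apply."""
    return TRANSITIONS.get((state, event), state)

def entry_sources(target: State) -> set:
    """States from which `target` can be entered, derived from the table."""
    return {src for (src, _), dst in TRANSITIONS.items() if dst is target}

print(sorted(s.value for s in entry_sources(State.STANDBY)))
```

Deriving the entry sources from the table gives a cheap consistency check against the "shall be entered from" statements in each subsection.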
7.12.4 Power Control Events
This section documents the power control events that can be generated via the front panel button
circuitry, AVIP, the IR receiver and demodulator, the USB expansion port, and the wireless
controller ports. The event detection is done by a different subsystem depending on the
current state of the system.
Since there is the possibility of multiple power control events occurring simultaneously, the SMC
shall implement the following behavior:
• On detection of the first power up event, the SMC shall ignore all power control events for
a period of 500 milliseconds.
• The SMC shall report the first power up event
• Need to work with usability to create a chart that has priorities, multiple sources, etc. For
example, box is in standby, user presses and holds remote control on button and then
presses the power button on the console. Should the console go to full power (remote),
then transition to standby (front panel button), and then go back to full power (remote) ?
• I have put verbiage below on power transitions however, keep in mind that this verbiage
will be modified once we have the full chart from usability.
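The first two bullets above describe a simple lockout window. A minimal sketch follows, assuming a millisecond timestamp supplied by the caller and assuming the filter re-arms once the window expires (the text does not say what happens afterwards). The class and method names are hypothetical.

```python
class PowerEventFilter:
    """Accepts the first power up event, then ignores events for 500 ms."""

    LOCKOUT_MS = 500

    def __init__(self):
        self.reported_source = None    # source of the most recently accepted event
        self.lockout_until_ms = None   # end of the current ignore window, if any

    def on_event(self, source: str, now_ms: int) -> bool:
        """Return True if the event is accepted, False if it is ignored."""
        if self.lockout_until_ms is not None and now_ms < self.lockout_until_ms:
            return False
        # Assumption: after the window expires, the next event starts a new window.
        self.reported_source = source
        self.lockout_until_ms = now_ms + self.LOCKOUT_MS
        return True

smc_filter = PowerEventFilter()
first = smc_filter.on_event("ir", now_ms=0)
second = smc_filter.on_event("front_panel", now_ms=120)
print(first, second, smc_filter.reported_source)
```

This does not address the priority questions raised in the remaining bullets, which depend on the chart from usability.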
7.12.4.1 Front Panel Buttons
The front panel consists of two momentary push buttons: Power Control Button, DVD Tray Control
Button.
signal for a low value. A power on sequence is a low value for >40 msec. Upon detection of the
power up sequence, the SMC shall note the source of the power up event is the AVIP power up
command, transition the system to Full Power state and when requested by the system software,
provide the power event source.
In Quiet or Full power states, the SMC shall continue to monitor the PWRON signal. A low value
sequence that is ~40msec in length is used to indicate that the SMC needs to issue a transaction
on the DDC interface on AVIP to go read further control/status information. One of the control
modes returned could be a power down request. Also, a low value sequence that is ~200msec or
more is used to indicate a forcible power down state. Upon detection of either of the power down
sequences, the SMC shall note the source of the power down event is the AVIP power down
command, send a message to the system software and wait for a response. If the system software
doesn’t respond within 5 seconds, the SMC shall transition the system to the Standby state.
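The PWRON timing described above can be sketched as a pulse classifier. The text gives only approximate durations, so the tolerance around 40 msec and the treatment of pulses between the two thresholds are assumptions, as are all names in the sketch.

```python
# Hypothetical classifier for a low pulse on PWRON; durations are in milliseconds.

def classify_pwron_low_pulse(system_state: str, low_ms: float,
                             short_tolerance_ms: float = 10.0) -> str:
    """Map a low pulse of `low_ms` to the action the SMC takes.

    system_state is "Standby", "Quiet", or "Full Power".
    short_tolerance_ms is an assumed window around the ~40 msec pulse.
    """
    if system_state == "Standby":
        # A power on sequence is a low value for more than 40 msec.
        return "power_up" if low_ms > 40 else "ignore"
    if low_ms >= 200:
        # ~200 msec or more indicates a forcible power down.
        return "forced_power_down"
    if abs(low_ms - 40) <= short_tolerance_ms:
        # ~40 msec asks the SMC to read control/status over DDC on the AVIP.
        return "read_ddc_status"
    return "ignore"  # assumption: anything else is not a valid sequence

print(classify_pwron_low_pulse("Full Power", 40))
print(classify_pwron_low_pulse("Full Power", 250))
```

In either power down case the SMC would then notify the system software and fall back to Standby if no response arrives within 5 seconds.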
If auto power on is required, a controller connected to the AVIP interface could use a relay (such that
when power is not applied, the signal is shorted to ground) to ground the signal which the SMC
samples to indicate power on. +5V then becomes available to the controller, which energizes the relay
to isolate the signal from ground and allow the controller to drive the signal. If no auto power on is
needed, one could hook a momentary switch to PWRON to pull the signal to ground for power
up. The SMC maintains power on state so that when the machine is commanded to shut down
(and the +5V goes away, which causes the relay to short PWRON to ground), the SMC will then
ignore the fact that PWRON is pulled low (so you don't get endless repetition). Similarly,
because the enabling of +5V may take some time, the SMC will wait to see PWRON go high on a
transition from standby to power on. If that event doesn’t happen within 5 seconds, the SMC will log an
error event and power down the machine until it sees the PWRON signal asserted.
The AVIP Power Control protocol is documented in the Xenon Design Specification.
7.12.4.4 IR Receiver and Demodulator
This block is different from other blocks in that the output of this block has to be processed to
determine whether a power control event has occurred.
When the system is in Standby state, this block shall decode the signal from the IR receiver and
output it to the SMC. The SMC shall interpret the command and compare it to the IR power up
command. If the received command matches the IR power up command, the SMC shall note the
source of the power up event is the IR power up command, transition the system to Full Power
state and when requested by the system software, provide the power event source.
In Quiet or Full power states, this block shall continue to decode the signal from the IR receiver and
output it to the SMC. The SMC shall interpret the command and compare it to the IR power down
command. If the received command matches the IR power down command, the SMC shall note the
source of the power down event is the IR receiver, send a message to the system software and
wait for a response. If the system software doesn’t respond within 5 seconds, the SMC shall
transition the system to the Standby state.
7.12.4.5 Wired Controller Ports
In a typical wired system topology there are one or more game pads that are plugged into the wired
controller ports on the console – one game pad per controller port.
Please note that throughout this section the term game pad refers to the device that the user
manipulates and the term controller port refers to the hardware in the console that the game pad
connects to.
The wired game pads, controller ports, the SMC and system software shall be designed to allow
control of the system power states via the XUSB protocol. It shall not require any additional signal
wires or methods. The XUSB protocol is a derivative of the USB protocol where the data bit time
has been increased and intelligent downstream traffic routing is implemented. In the USB protocol,
downstream traffic is broadcast to all leaf nodes of the USB tree, in intelligent downstream traffic
routing it is only sent to the specific leaf that the traffic is intended for. Both of these changes result
in lowered emissions, as well as increased cable lengths.
When the system is in Standby state, the controller port shall be powered and be able to detect
XUSB protocol based connect, disconnect and resume signaling from each wired controller port
and output it to the SMC; the state of the output to the system software is undefined since the
system software is not running in this state.
The SMC shall interpret the command using XUSB protocol connect, disconnect and resume
timing to determine whether the controller is present and, if so, whether it is requesting that the
system be powered up.
On determination that a power up event has occurred via the controller port, the SMC shall note
the particular controller port which is the source of the power up event, transition the system to Full
Power state and when requested by the system software, provide the power event source.
When the system enters Standby state:
• If there is no game pad plugged into a controller port, the XUSB signal lines shall be at a
single-ended 0 (SE0) state and the output to the SMC shall indicate the controller port
state is disconnected.
• If the user plugs in a game pad to the controller port and the power up button sequence
has not been activated, the game pad shall drive XUSB signal lines to the J state. This
behavior is the same when the controller is plugged into a system that is in Quiet or Full
power states. Furthermore, once the game pad detects that the bus is idle (J state with no
traffic) for the XUSB suspend time duration, it shall transition to a low power state. In this
low power state, the game pad shall be able to monitor its buttons for the power up button
sequence and perform XUSB resume signaling.
• If there is a game pad plugged into a controller port and the power up button sequence has
not been activated, the controller shall drive the XUSB signal lines to the suspend state (J
state) and the output to the SMC shall indicate the controller port state is connected.
• If there is a game pad plugged into a controller port and the power up button sequence is
activated, the game pad shall generate XUSB resume signaling by transitioning the XUSB
signal lines to the resume state (K state) for the XUSB resume time duration. The controller
port shall detect the resume state and the output to the SMC shall indicate the game
controller port state is resume.
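The Standby-state cases above reduce to a mapping from XUSB line state to the controller port state reported to the SMC. The sketch below uses hypothetical names; only the SE0/J/K meanings come from the text, and the XUSB resume time duration is not modeled.

```python
# Line state -> controller port state reported to the SMC.
PORT_STATE_BY_LINE_STATE = {
    "SE0": "disconnected",  # no game pad plugged in
    "J": "connected",       # game pad present, bus idle or suspended
    "K": "resume",          # game pad signaling the power up button sequence
}

def controller_port_state(line_state: str) -> str:
    """Return the port state for a given XUSB line state."""
    return PORT_STATE_BY_LINE_STATE[line_state]

def smc_power_up_requested(system_state: str, line_state: str) -> bool:
    """Only resume signaling seen while in Standby is treated as a power up event."""
    return system_state == "Standby" and controller_port_state(line_state) == "resume"

print(controller_port_state("J"), smc_power_up_requested("Standby", "K"))
```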
When the system is in Quiet or Full power states, this block shall continue to detect the connect
and disconnect signaling and output it to the system software. While it may continue to output
these states to the SMC, the SMC shall ignore this output.
In the Quiet and Full power states the power sequencing is controlled by the system software in
concert with the game pad. When the user activates the Standby state button sequence, the
controller shall send the button sequence to the system software. The system software shall
process the sequence and this processing may ask the user for confirmation of the power state
change and after confirmation, initiate the process of changing the power state to the Standby
state.
In the event the system software is non-responsive to the Standby state button sequence, the
sequence will be lost and the system power state will not be alterable via this power control event.
Since the button sequence is sent as part of a XUSB data transfer, the controller port shall not
incorporate any logic to detect this.
7.12.4.6 Wireless Controller
In a typical system topology, there are one or more wireless game pad devices that connect via a
bidirectional wireless link to the wireless transceiver in the console. The wireless transceiver
connects to a wireless controller port via a USB interface. One notable difference between wired
and wireless controllers is that in the wired system, there is one controller port per game pad. In the
wireless system, all the game pads share the same wireless link and connect to a single
transceiver which connects to a wireless controller port in the console.
Please note that throughout this section the term game pad refers to the wireless game pad device
that the user manipulates and the term controller port refers to the wireless controller port in the
console.
There are two ways to do this: one way is to use USB suspend and resume like the wired controller,
and the second way is to use a separate wakeup signal from the transceiver.
The game pads, link, transceiver, controller port, the SMC and system software shall be designed
to allow control of the system power states via a separate set of signals between the transceiver
and the SMC instead of using the USB signaling between the transceiver and the controller port.
When the system is in Standby state, the controller port may be powered down and does not need
to detect USB protocol based resume signaling. If the controller port is powered down, the
transceiver connection to the controller port shall also be powered down.
The transceiver and SMC power control signals consist of a console standby power state signal
(TRAN_STANDBY) that goes from the SMC to the transceiver and a system resume signal
(TRAN_RESUME) from the transceiver to the SMC. The TRAN_STANDBY signal indicates to the
transceiver whether it should monitor the links for power up button sequences and relay the status
to the SMC via the TRAN_RESUME signal. On determination that a power up event has occurred
via the TRAN_RESUME signal, the SMC shall note that the wireless controller port is the source of
the power up event, transition the system to Full Power state and when requested by the system
software, provide the power event source.
When the system is in Standby state:
• The SMC shall assert TRAN_STANDBY
• On detection of TRAN_STANDBY, the transceiver shall transition to a low power state. In
this low power state, the transceiver shall be able to establish a link with game pad(s) and
assert TRAN_RESUME as appropriate.
• While the transceiver hasn’t established a link with a game pad, it shall negate
TRAN_RESUME.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has not been detected, it shall negate TRAN_RESUME. In this low power state,
the transceiver shall be able to maintain the link with game pad and establish links with
other game pad(s) and perform USB resume signaling.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has been activated, the transceiver shall pulse TRAN_RESUME for TTRANRESUME.
The SMC shall detect the TRAN_RESUME pulse and initiate the transition of the system to
Full Power state.
In the Quiet and Full power states, the SMC shall negate the TRAN_STANDBY signal and ignore
the state of the TRAN_RESUME signal.
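The TRAN_STANDBY/TRAN_RESUME handshake above can be sketched as two small predicates, one for each side. The function names are hypothetical, and the TRAN_RESUME pulse is modeled as a boolean level for simplicity.

```python
def transceiver_tran_resume(tran_standby: bool, link_established: bool,
                            power_up_sequence_seen: bool) -> bool:
    """Transceiver side: signal TRAN_RESUME only while TRAN_STANDBY is asserted,
    a link with a game pad exists, and the power up button sequence was activated."""
    return tran_standby and link_established and power_up_sequence_seen

def smc_tran_standby(system_state: str) -> bool:
    """SMC side: assert TRAN_STANDBY in Standby, negate it in Quiet and Full Power."""
    return system_state == "Standby"

def smc_power_up_requested(system_state: str, tran_resume: bool) -> bool:
    """SMC side: TRAN_RESUME is honored only in Standby and ignored otherwise."""
    return system_state == "Standby" and tran_resume

standby = smc_tran_standby("Standby")
resume = transceiver_tran_resume(standby, link_established=True, power_up_sequence_seen=True)
print(smc_power_up_requested("Standby", resume))
```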
In the Quiet and Full power states the power sequencing is controlled by the system software in
concert with the game pad. When the user activates the Standby state button sequence, the game
pad shall send the button sequence to the system software. The system software shall process the
sequence and this processing may ask the user for confirmation of the power state change and
after confirmation, initiate the process of changing the power state to the Standby state.
In the event the system software is non-responsive to the Standby state button sequence, the
sequence will be lost and the system power state will not be alterable via this power control event.
This section describes, at a high level, how the CPU and GPU are expected to communicate and
synchronize with one another. Most of the synchronization mechanisms are fairly standard with
respect to PC graphics, with the exception of the procedural geometry scheme discussed in the next section.
This section describes what the GPU hardware must do to implement the procedural geometry
algorithm. For further details of this algorithm, refer to the document “Xenon Procedural Geometry”.
Conceptually, the GPU’s main command list is stored in main memory. The CPU kicks off the
command processor in the GPU through a register write that points the command processor to a
memory address. The command processor starts fetching commands and data from this address
until a pre-programmed stop address is reached.
[Figure: block diagram. Labels: GPU, CPU, Command proc., Rest of the pipeline, Reg writes,
Start_addr, Curr_addr, Main Memory.]
We will discuss how the current implementation differs from this concept later in this section.
First we will discuss the command processor commands.
7.14.1 Vertex commands
In addition to the traditional commands we need the following commands for the procedural
geometry algorithm.
CALL <addr>: GPU will store the current address and begin processing data at <addr>.
RETURN: GPU will begin executing at the most recently stored CALL address. Multiple levels
of CALL/RETURN might be needed.
JUMP <addr>: GPU will begin executing commands from <addr>.
WRITEBACK <addr> <data>: GPU writes <data> back to <addr>.
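The four commands above can be sketched as a small behavioural model. This is only an
illustration under stated assumptions, not the GPU's actual command encoding: commands are
modelled as Python tuples held in a dict keyed by word address, the name run_commands is
hypothetical, and CALL is assumed to store the address immediately after itself so that
RETURN resumes there.

```python
def run_commands(memory, start_addr, stop_addr):
    """Interpret commands from start_addr until stop_addr is reached.

    memory maps an address to a command tuple (hypothetical encoding).
    Returns the ordinary commands in execution order and the WRITEBACK results.
    """
    call_stack = []   # stored addresses; a stack allows multiple CALL levels
    writebacks = {}   # addr -> data written by WRITEBACK
    trace = []        # non-control commands, in the order they were executed
    pc = start_addr
    while pc != stop_addr:
        cmd = memory[pc]
        op = cmd[0]
        if op == "CALL":
            call_stack.append(pc + 1)  # assumption: resume after the CALL
            pc = cmd[1]
        elif op == "RETURN":
            pc = call_stack.pop()      # most recently stored CALL address
        elif op == "JUMP":
            pc = cmd[1]
        elif op == "WRITEBACK":
            writebacks[cmd[1]] = cmd[2]
            pc += 1
        else:
            trace.append(cmd)          # any traditional command
            pc += 1
    return trace, writebacks
```

A stack is used for the stored addresses because the text notes that multiple levels of
CALL/RETURN might be needed.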
7.14.2 GPU Memory mapped registers
Current_addr[31:0] : Vertex unit is currently reading this address.
Stop_addr[31:0] : Vertex unit will stop processing data when Current_addr == Stop_addr.
Start_vertex_proc[0:0] : This kicks off the vertex unit to start from Current_addr.
At the start of the frame, the CPU will set Current_addr = start of the GPU push buffer and
Stop_addr = start of the GPU push buffer. Then the CPU sets Start_vertex_proc. The vertex unit is
now ready for data and waits until the CPU changes the Stop_addr register. Once it changes, the
vertex unit starts fetching at Current_addr until Current_addr reaches Stop_addr.
In the meantime, the CPU continuously updates Stop_addr as more data is written into
memory.
Note that if the GPU reaches the Stop_addr it just waits there until the CPU updates the Stop_addr.
There is no need for the Start_vertex_proc to be set again.
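The register handshake described above can be sketched as follows. This is a minimal model
under assumptions: the class name VertexUnit and its step method are hypothetical, memory is
a dict keyed by word address, and each step fetches a single word.

```python
class VertexUnit:
    """Behavioural model of the Current_addr / Stop_addr handshake."""

    def __init__(self):
        self.current_addr = 0
        self.stop_addr = 0
        self.start_vertex_proc = False
        self.fetched = []

    def step(self, memory):
        """Fetch one word if work is pending; return True if a fetch happened."""
        if not self.start_vertex_proc:
            return False  # never kicked off
        if self.current_addr == self.stop_addr:
            return False  # caught up: wait for the CPU to advance Stop_addr
        self.fetched.append(memory[self.current_addr])
        self.current_addr += 1
        return True


# Usage: the CPU sets both addresses to the start of the push buffer, kicks
# the unit off once, then only advances Stop_addr as it writes more data.
memory = {}
base = 0x1000
vu = VertexUnit()
vu.current_addr = vu.stop_addr = base
vu.start_vertex_proc = True

memory[base], memory[base + 1] = "v0", "v1"
vu.stop_addr = base + 2
while vu.step(memory):
    pass
```

After the loop the unit has fetched both words and is idle again; moving Stop_addr further
resumes fetching without setting Start_vertex_proc a second time.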
7.14.3 Current Implementation
The current implementation algorithm has been described in “Xenon Procedural Geometry”. This
section describes how the GPU might implement it.
[Figure: block diagram. Labels: GPU Command Processor, The Rest of the GPU Pipeline, Stop_addr,
Curr_addr, Coherency Block, Writeback registers, Mem Cntl, FSB, CPU, CPU Core, L2 Cache,
Cpuc Fifo 0, Cpuc Fifo 1, Cpuc Fifo 2.]
The seven different actions that take place between the CPU, GPU and memory are indicated in the
diagram above. They are:
1) The CPU core writes to a memory-mapped register in the GPU.
2) The GPU writes back to a memory-mapped CPU register. This is caused by the WRITEBACK
command described in an earlier section. This would be used for updating the CPUC FIFO tail
pointer and GPU push buffer tail pointers. For the tail-pointer writebacks the CPU implements a
section of cacheable memory directly on its die. There are memory mapped registers in the
CPU.
3) The GPU issues a read data request to the coherency block. The coherency block decides whether
the data is to be read from main memory or the CPU’s L2 and directs the request accordingly, as
shown in 4) and 6).
4) The coherency block determines that the data is in the CPU’s L2 and makes a request for a
cache line castout.
5) The CPU returns the cache line castout data, which is routed to the vertex unit as a response
to 3).
6) The coherency block determines that the data requested by the vertex unit is in main memory
and makes a request to main memory.
7) Main memory returns the data to the vertex unit in response to 3).
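The routing decision in actions 3) through 7) can be sketched as below. This is not the
coherency algorithm, which is described elsewhere; it only illustrates the two data paths.
The function name gpu_read and the use of dicts for the CPU's dirty L2 lines and for main
memory are assumptions made for illustration.

```python
def gpu_read(addr, l2_dirty_lines, main_memory):
    """Model a GPU read request (action 3) and return (source, data).

    l2_dirty_lines: addr -> cache line currently held dirty in the CPU's L2
    main_memory:    addr -> cache line stored in main memory
    """
    if addr in l2_dirty_lines:
        # Actions 4 and 5: request a castout; the CPU gives the line back
        # and it no longer resides in the L2 (assumed castout/invalidate).
        data = l2_dirty_lines.pop(addr)
        return "L2", data
    # Actions 6 and 7: the line is served from main memory.
    return "MEM", main_memory[addr]
```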
This section does not try to describe the coherency algorithm, which is described elsewhere. The
coherency block is included only to show how the data might flow, depending on the architecture
of the GPU and how it will be modified to include the coherency block.
Note that the only GPU-initiated transaction to which the CPU responds is a coherency
transaction. In 3) above, the coherency block handles the procedural geometry FIFO data
differently. For the procedural geometry FIFOs, the CPU write-no-allocates these in a physical
address range that doesn't exist. When the GPU comes to read these vertices, the coherency
block in the GPU sees that the CPU has these addresses as dirty and issues a castout/invalidate.
The GPU then needs to know not to let that data go out to memory, but to take the return data
directly from the CPU.
For the writebacks described in 2) above, when the GPU determines that the CPU owns the
writeback address (owning meaning that this tag is valid in the CPU's L2), it generates a
castout/reload command that causes the writeback data to update this memory on the
CPU die. The castout/reload also causes the CPU's L1/L2 to refetch this data, which is now
read from this cacheable memory block rather than from main memory.
The reason for the GPU fetching vertex data from the CPU cache is that there is a very high
bandwidth of data (on the order of 16GB) that would damage system performance if it were stored
to memory by the CPU and subsequently read by the vertex processor.
The reason for the GPU to write the values of the FIFO tail pointers into memory mapped in the
CPU is that the algorithm requires the CPUC to know very quickly that the GPU is done processing
a certain block of data in the L2 and that the CPUC can reuse that block. The CPUC thread will be
spin-waiting on the tail pointer, and if that data were in main memory, there would be a latency
issue and an FSB bandwidth issue.
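The block-reuse handshake can be sketched as a ring of blocks guarded by head and tail
indices. This is an illustration under assumptions, not the actual FIFO layout: the class
name CpucFifo, the block count, and the convention of leaving one block empty to tell a full
ring from an empty one are all hypothetical.

```python
class CpucFifo:
    """Ring of L2-resident blocks shared by a CPUC thread and the GPU."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.head = 0  # next block the CPUC thread fills (CPU-owned)
        self.tail = 0  # first block the GPU has not released yet

    def can_write(self):
        # Assumption: one block stays empty so that full != empty.
        return (self.head + 1) % self.num_blocks != self.tail

    def produce(self):
        """Claim the next block, or return None if the thread must keep spinning."""
        if not self.can_write():
            return None
        block = self.head
        self.head = (self.head + 1) % self.num_blocks
        return block

    def gpu_writeback_tail(self, new_tail):
        """Models the GPU's WRITEBACK of the tail pointer into CPU-side memory."""
        self.tail = new_tail
```

In this model the CPUC thread would poll produce (or can_write) in a loop; because the tail
pointer lives in memory on the CPU die, that polling does not have to go out to main memory.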
7.14.4 Requirements
• The output of the procedural geometry would be mostly inline tristrip flexible vertex format
data types or any format that is supported by streaming.
• The following could be a possible proposal for the format of inline vertex data.
All inline vertex data would be split into blocks, with each block beginning with a
single 32-bit DWORD header. The header would encode four possible instructions:
The FSB/CPU responds to the BIU with a response command and 128 bytes of data on the FSB "receive" interface.
Request/responses are linked by a tag as the CPU does not guarantee ordering if multiple reads are outstanding.
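Tag-based matching of out-of-order read responses can be sketched as below. The class name
ReadTracker, the sequential tag allocation and the dict of pending reads are assumptions for
illustration; only the 128-byte response size and the use of a tag come from the text above.

```python
class ReadTracker:
    """Match read responses to outstanding requests by tag."""

    RESPONSE_BYTES = 128  # each response carries 128 bytes of data

    def __init__(self):
        self.next_tag = 0
        self.pending = {}  # tag -> address of the outstanding read

    def issue(self, addr):
        """Record an outstanding read and return the tag sent with it."""
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = addr
        return tag

    def complete(self, tag, data):
        """Resolve a response, in whatever order it arrives, to its address."""
        if len(data) != self.RESPONSE_BYTES:
            raise ValueError("response must carry 128 bytes")
        addr = self.pending.pop(tag)
        return addr, data
```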
7.17.1 BSB
2-6-04 BSB Completions
Unsupported Request (UR) is generated when the GPU receives a request that it does not recognize. Badly formed or
corrupted packets always generate UR (bad CRC, header, etc.), and any request received while the GPU’s
BUS_MASTER_ENABLE bit is cleared always generates UR. Other than that, UR is the ‘else’ in a big if; else if; else
if; else statement. It’s therefore easier to tell you what won’t generate a UR than what will.
I used the tables on pages 365 and 366 of PCI Express System Architecture by Mindshare, Inc. to decide between UR and
Completer Abort (CA) for invalid requests. Specifically, requests that do not reference address space mapped within the
device are UR, and requests that violate programming rules for a device are CA. CA will only happen for the specific
cases listed above in prototype mode.
7.18 Reliability
In general the sources for errors are: radioactive isotopes (in package and PCB materials), cosmic rays and UFOs
8.2 BIOS
Memory
Audio
9 Other
9.1 Not Covered
This section states what is not covered in this document, but should be covered elsewhere
• APIs
• Better-together
• Remoting devices
• Video-on PC
• Media device.
• Network security. All done in application layer. Require system to be secure.
• Peripherals (cameras, etc.)
• Performance abstraction for HDD.
• Where do I put my stuff without having to buy an MU.
• Mass-storage performance abstraction: say what should be included in software specification.