
Xenon System Architecture

Specification

Doc #: H02174

Project: Xenon

Author: Nick Baker

Revision: 0.43

Date: 06/24/04
Version: 1.3 Xenon System Architecture

Proprietary Notice
The information contained herein is confidential, is submitted in confidence, and is proprietary information of
Microsoft Corporation, and shall only be used in the furtherance of the contract of which this document forms
a part, and shall not, without Microsoft Corporation’s prior written approval, be reproduced or in any way
used in whole or in part in connection with services or equipment offered for sale or furnished to others. The
information contained herein may not be disclosed to a third party without consent of Microsoft Corporation,
and then, only pursuant to a Microsoft approved non-disclosure agreement. Microsoft assumes no liability for
incidental or consequential damages arising from the use of this specification contained herein, and reserves
the right to update, revise, or change any information in this document without notice.

Published by
X-box Console Group
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
Telephone (425) 882-8080

© 2002 Microsoft Corporation. All rights reserved. Printed in the USA.


Microsoft and MS are registered trademarks, and Windows is a trademark, of Microsoft Corporation.

Microsoft Proprietary and Confidential. Page 2 of 46



Revision History
| Revision | Changes | Date | By | Status |
| 0.1 | Original | 2/26/03 | Nick Baker | Draft |
| 0.1a | Added Structure | 3/15/03 | Nick Baker | Draft |
| 0.2 | Updated Southbridge description. Added Big Endian notes as they apply to Southbridge | 6/5/03 | Greg Williams | Draft |
| 0.25 | Updated top-level block diagram. Updated Southbridge. Updated memory. | 8/27/03 | Greg Williams | Draft |
| 0.30 | Updated Ana relevant sections | 9/21/03 | John Tardif | Draft |
| 0.35 | Updated Southbridge Endianness section | 12/15/03 | Greg Williams, Stephen Au | Draft |
| 0.36 | Rolled in Andy’s WMA Endianness document | 12/16/03 | Greg Williams, Andy Walters | Draft |
| 0.37 | General clean-up | 1/7/04 | Nick Baker | Draft |
| 0.4 | Added BackSide Bus Spec subdocument | 2/13/04 | Greg Williams | Draft |
| 0.41 | Corrected and updated system information | 3/3/04 | L. Del Castillo | Draft |
| 0.42 | Updated memory, memory interface, system boot, reset, and time sections | 3/9/04 | Harjit Singh | Draft |
| 0.43 | Southbridge update: USB overhaul | 6/24/04 | Greg Williams | Draft |

Conventions Used
Description Represents Examples

Documents Referenced
| Title | Document Number | Author |
| Xenon Product Specification (\\xenon\specs\Xenon Product Spec.doc) | | Todd Roshak (toddro) |
| Xenon Design Specification (\\xenon\specs\Hardware\Xenon Design Specification.doc) | | Harjit Singh (harjits) |


Table of Contents

1 Introduction
2 System Block Diagram
3 Architecture Overview
3.1 Introduction
3.2 Core Digital Components
3.3 Architecture Justification
3.4 System Components
3.5 Distributed Components
3.6 Key Architectural Mechanisms
3.7 Low Level Software Architecture (?)
3.8 Alternate SKU Considerations
3.9 Technical Specifications
3.10 Performance Overview (Sue)
4 Core Digital Components
4.1 CPU
4.1.1 Further Reading
4.2 GPU
4.3 Memory
4.4 Southbridge
4.4.1 Notes
4.5 Ana
4.6 Front Side Bus (Art)
4.6.1 Link Layer
4.7 Back Side Bus
4.8 Memory Bus
4.9 SMBus
5 System Components
5.1 Questions
5.2 DVD
5.3 HDD
5.4 MU
5.5 Game Controllers
5.6 Network
5.7 Expansion


6 Distributed Components
6.1 Audio
6.2 Video
7 Key Architectural Mechanisms
7.1 System Dataflow
7.2 System Memory Map
7.3 System Coherence
7.4 System Interrupt Mechanism
7.5 System Ordering
7.6 System Security
7.7 System Endianness
7.8 System Clocking
7.9 System Boot
7.10 System Reset
7.11 System Time
7.12 Power States & Power Management
7.12.1 Off
7.12.2 Standby
7.12.3 Quiet
7.12.4 Full Power
7.12.5 Power Control Events
7.13 CPU/GPU Synchronization (Nick)
7.14 CPU-GPU Procedural Geometry Communication (Nick)
7.14.1 Vertex commands
7.14.2 GPU Memory mapped registers
7.14.3 Current Implementation
7.14.4 Requirements
7.15 System Debug Facilities (?)
7.15.1 Low Level Debug
7.15.2 Development Systems
7.16 System Bandwidth / Latency Roll-Up (Nick)
8 Low Level Software Architecture (MarcW?)
8.1 Flash Resident Drivers
8.2 BIOS
8.3 Low Level Drivers
8.4 Network Stack
8.5 Procedural Geometry
8.6 Video Mode Selection
9 Alternate SKU Considerations (Nick)


10 Other
10.1 Not Covered


1 Introduction
The purpose of this document is to capture the specification for the Xenon system architecture.

It provides a high-level description of the system and the requirements for interoperability among the core
components, other IO devices and peripherals, and software. It is not a detailed architecture description of individual
components. The intended audience is hardware and software architects and designers who want a high-level overview
of the system.

2 System Block Diagram


Figure 1: Xenon System Block Diagram (Rev 2.6, 06/18/04; annotated with PC SKU (Helium) extras and future HDTV support)

Launch chip summary from the diagram:
| Spec | Ana | SouthBridge | GPU | CPU |
| Launch process | 0.18u | 0.15u | 90 nm bulk (TSMC 90GT) | 90nm enhanced SOI (10KE0) |
| Launch die size | 13.4 mm^2 | 34.7 mm^2 | 177 mm^2 (main), 71 mm^2 (EDRAM) | 168 mm^2 |
| Frequency | 170 MHz | 125 MHz | 500 MHz (core) | 3.0-3.5 GHz |
| Core VDD | 1.8V | 1.8V | 1.1V | APS 1.075V - 1.275V (@ ball) |
| Power | 1.3W | 3.2W | 38W (29W + 9W eDRAM) | 85W |
| Signal I/O | 81 | 183 | 443 | 219 |
| Power I/O | 61 | | 582 (282 I/O, 300 core) | |
| Power/ground | | | | 2113 bumps, 680 balls |
| Package | LQFP144 | 23x23 382 TEBGA | Flip-Chip 35x35 1025 ball BGA | Flip-Chip 899 ball plastic/organic BGA |
| Thermal target | | | | ThetaJc = 0.2-0.5 degrees C/W |

Interface and memory annotations from the diagram: FSB with custom differential signaling at 5.4 Gbps (21.6 GB/sec raw); BSB with PCI-Express signaling at 2.5 Gbps (1 GB/sec raw); 256 MB of GDDR main memory at 1.4-1.6 Gbps (22.4-25.6 GB/sec raw); XDVO at 135MHz DDR; 16 MB flash (kernel 256 kB, system drivers 1 MB, config 256B, Dash etc. ~11 MB); SMC 12 kB.

The diagram shows the main system components. These are described in more detail in the next section. Note that
the latest version of the diagram may be obtained from: \\xenon\specs\Architecture\Xenon System Block Diagram.vsd


3 Architecture Overview
3.1 Introduction

The following sections give a high level view of the system architecture. For further reading see the
corresponding main chapters later in this document.

3.2 Core Digital Components

The system consists of the following main components:


• CPU: The CPU is a custom 4GHz PowerPC CPU designed specifically for Xenon. It
consists of 3 CPU cores running in an SMP configuration. Each core supports 2-way Simultaneous
Multi-Threading (SMT), allowing a total of six simultaneous hardware threads. The
architecture is scalable to accommodate a late-binding decision on the exact number of
cores depending on the final cost model. All cores are identical; each has specially
designed vector floating point acceleration (VMX2), and the cores share a 1MB L2 cache.
See \\xenon\specs\CPU\CPU_One_Pager(IBM).doc for an overview.
• GPU: This is the main system controller hub containing the CPU’s Bus Interface Unit (BIU),
the memory controller, a DX9/10 3D rendering core, system coherency controller and IO
interface. It is broken into two main sections: the Northbridge (BIU, Memory, IO) and the
3DCore. The 3DCore is a 500MHz unified shader architecture based on the R500 and
uses 10MB of embedded DRAM for the render targets and z-buffer to provide more
consistent rendering performance.
See \\xenon\specs\Graphics\GPU Preliminary Specification (1 pager).doc for an overview.
• SouthBridge: This is the IO controller chip that contains interfaces to all the peripherals,
including audio output and decompression, Serial-ATA for DVD and HDD, USB1.1/2.0 for
peripherals and memory units. It also contains the System Management Controller (SMC)
and system FLASH interface.
See \\xenon\specs\Southbridge\Southbridge_One_pager.doc for an overview.

• Ana: Ana is the ANAlog chip that contains the system clock reference, video DACs,
thermal sensors, and the digital encoder for analog video standards
(NTSC/PAL/HDTV/VGA).
See \\xenon\specs\Ana\Ana_ One_Pager.doc for an overview.
• Memory: The system has a unified memory architecture consisting of GDDR memory.
128MB, 256MB, 512MB and 1GB memory configurations are supported, although the plan of record
(POR) is 256MB for the console and 512MB for development systems.
See \\xenon\specs\memory\specs\ati_spec_16mx32_8b_v11.pdf for a sample part spec.
The components communicate over the following interfaces:
• Front Side Bus (FSB): Interface between the CPU and GPU. This is a 5.4Gbps differential
link proprietary to the CPU vendor. It is symmetrical, with a peak of 10.8GB/sec in the write
direction and 10.8GB/sec in the read direction. The high bandwidth (in the write
direction at least) is there to support procedural data generation (XPS) on the CPU, with the
data pushed in a tightly coupled fashion to the GPU.
See \\xenon\specs\CPU\FSB\FSB_BUSSPEC.pdf for the FSB documentation.
• Back Side Bus (BSB): This connects the GPU to the SouthBridge. This is a PCI-Express
x2 link with a peak of 500 MB/sec in each direction.
See \\xenon\HWDev\Electrical\Southbridge\Interface Specs\PCI\pciexpress_base_10a.pdf
for the PCI-Express base specification.

• Memory Bus: The interface between the GPU and main memory. This is GDDR, running at
1.4-1.6Gbps. At 128bit wide, this provides a peak of 22.4GB/sec (@1.4Gbps). The exact
frequency will be determined later based on the availability of parts.
The sample part spec (\\xenon\specs\memory\specs\ati_spec_16mx32_8b_v11.pdf) also
serves as the interface definition.
• Xenon Digital Video Output (XDVO): This is the pixel output bus that interfaces the GPU to
the video encoder portion of Ana. It is a 15bit 135MPix/sec DDR bus (2 cycles to transfer one
pixel) that supports most HDTV and HD Monitor standards.
See (\\xenon\Specs\Ana\XDVO.doc) for the XDVO bus documentation.
• System Management Bus (SMBus): This is a low pin count serial interface (similar to IIC)
that the various chips use to communicate with one another for reset and power
management purposes. There is most likely no direct SMBus connection to the GPU/CPU, other
than indirectly, for example through resets.
See \\xenon\HWDev\Electrical\Industry Standards\SMBus Version 2.0.pdf for the base
specification.
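The peak figures quoted above for the FSB, memory bus and BSB all follow from lanes x per-lane bit rate / 8. The C sketch below reproduces them. The function names are illustrative only and not part of any Xenon API. The FSB width of 16 lanes per direction is an assumption read from the "16 16" annotation on the block diagram. The 0.8 factor reflects the 8b/10b line coding used by first-generation PCI-Express at 2.5 Gbps per lane.

```c
#include <assert.h>

/* Peak bandwidth of a link in GB/sec (1 GB = 10^9 bytes):
 * lanes x per-lane signaling rate (Gbps) x coding efficiency / 8 bits per byte.
 * Use coding_efficiency = 1.0 for links quoted at their raw rate and
 * 0.8 for 8b/10b-coded links such as first-generation PCI-Express. */
static double peak_gb_per_sec(int lanes, double gbps_per_lane,
                              double coding_efficiency)
{
    assert(lanes > 0 && gbps_per_lane > 0.0);
    return lanes * gbps_per_lane * coding_efficiency / 8.0;
}

/* FSB, one direction. ASSUMPTION: 16 lanes per direction, taken from
 * the "16 16" annotation on the block diagram. */
static double fsb_per_direction(void)
{
    return peak_gb_per_sec(16, 5.4, 1.0);
}

/* Memory bus: 128 data bits at the given per-pin GDDR rate (Gbps). */
static double memory_bus(double gbps_per_pin)
{
    return peak_gb_per_sec(128, gbps_per_pin, 1.0);
}

/* BSB, one direction: PCI-Express x2 at 2.5 Gbps per lane, 8b/10b coded. */
static double bsb_per_direction(void)
{
    return peak_gb_per_sec(2, 2.5, 0.8);
}
```

Evaluated, these give 10.8 GB/sec per FSB direction, 22.4 and 25.6 GB/sec for the memory bus at 1.4 and 1.6 Gbps, and 0.5 GB/sec per BSB direction (1 GB/sec for both directions together). These values match the figures quoted in the text and on the diagram.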

3.3 Architecture Justification

The full reasoning behind the architecture is beyond the scope of this document (please read the
“Think Week White Papers” for a more thorough analysis).
At the highest level the system looks like a multi-processor PC with integrated graphics. This was
not necessarily the intention; in fact, the CPU ended up farther from main memory than hoped.
A split memory architecture with local processor memory was desired, but cost
constraints dictated a unified memory architecture. Once that decision was made, placing the
memory next to the highest bandwidth customer (the GPU) was the next logical step. This does
present a memory latency problem to the CPU, so large caches and CPU pre-fetching are required
to compensate.
The next question is the intent behind the multi-processing and multi-threading. Again, this was driven
by cost efficiency. For developers, the easiest architecture is a single high
performance processor. However, the CPU industry has reached the point where extracting more
instruction-level parallelism brings ever-increasing cost and complexity. Forcing parallel
programming with several simpler (cheaper) processors, especially in a closed environment such as
a game console, is a logical step, for which there is also prior art. Furthermore, several areas of a
game (especially at the low levels of physics and rendering) are parallelizable.
To aid parallel programming within the rendering pipeline in particular, the CPU and GPU
have been closely coupled to allow procedural generation of data. This also helps in a
cost-constrained environment, where the system can never afford enough memory to store all
the offline-generated art a developer would want. Even if a developer does not want to
tackle a multi-processing problem, by using Microsoft-supplied APIs they can effectively take
advantage of the extra processing power to perform parallel number crunching as well as
decompression of geometry and, to some extent, textures.
Notably, because parallel programming is hard, we still want significant performance from a
single CPU core, so the processor core chosen still has competitive SpecINT
performance.
The use of Embedded DRAM also requires some discussion here. Graphics processors are
extremely bandwidth hungry, and this is typically solved in PC graphics by using very wide memory
interfaces, render target and z-buffer compression, and on-chip caching. EDRAM was chosen
because going wider than 128bits was not a cost option, and because compression and caching
typically behave unpredictably.


The choice of a DirectX compatible 3DCore should be self explanatory, and the most up to date
version of the standard was chosen (DX10). However, given schedule / cost constraints, not all of
the DX10 spec made it into the hardware. The main rendering features of interest are:
• Unified Shader Core: This allows effective load balancing of vertex and pixel shaders,
thereby achieving better efficiency of the compute resources.
• Multi-Render Target: This allows deferred lighting passes, where per pixel computations
can be performed in a geometry independent fashion.
• High Dynamic Range: Floating point and high precision fixed point formats are supported
to allow HDR effects.

3.4 System Components

The core digital components (CPU, GPU, SB, Memory, Ana) comprise the minimum architecture
components required to boot and run the OS. In addition, several storage and IO devices
are supported, not all of which are present in every product configuration.
- DVD: Used for game content delivery.
- HDD: Optional component used for the alternate SKUs to enhance certain capabilities
such as ripped audio and saved games.
- Ethernet (10/100BaseT): For Live and sharing content with PCs. Will also be used for
development systems.
See \\xenon\HWDev\Electrical\Ethernet PHY\VIA-DS6103110.pdf for example external
PHY specification.
- Audio DACs: Separate component for audio output.
See \\xenon\HWDev\Electrical\Audio\Wolfson\WM8726.pdf for example spec.
- Memory Cards: As in Xbox1, saved game data can be stored on USB-based memory
cards. Because not every configuration includes an HDD, memory cards will be required for
game saving on Xenon.
- Wired Controllers: As with Xbox1, the wired controllers use a modified version of USB
(XUSB).
- Expansion Devices: A couple of standard USB ports will be available for other expansion
devices such as USB HDDs, Cameras, etc.
- IR Input: IR is supported directly by the SB.
See \\xenon\HWDev\Electrical\IR Receiver\Xenon Infrared Receiver Spec.doc for spec.
- Wireless Controllers: Wireless controllers are supported via additional circuitry that
interfaces to one of the SB USB ports.
- AV Packs: Again similar to Xbox1, Xenon supports an Audio Video Interface Port (AVIP) to
break out the audio and video signals depending on the AV components the end customer
has. Planned AVPacks would be: Standard (Analog Stereo Audio+Composite Video), RF
(one each for North America, Japan, Europe), Enhanced (Digital Audio + S-Video added),
SCART (RGB component along with Composite video), Component, and VGA. See
\\xenon\Specs\Peripherals\Xenon_AV_Pack_Design_Spec.doc for design spec.
- HDMI/HDCP: The base architecture also supports routing the XDVO bus to an optional
HDMI chip.
Documentation for the different IO busses can be referenced as follows:
- Serial ATA:
\\xenon\HWDev\Electrical\Southbridge\Interface Specs\ATA\Serial ATA 1.0 gold.pdf

- USB 2.0: \\xenon\HWDev\Electrical\Southbridge\Interface Specs\USB\USB 2_0 Spec.pdf


- USB 1.1: \\xenon\HWDev\Electrical\Industry Standards\usb11.pdf
- MII: \\xenon\HWDev\Electrical\Southbridge\Interface Specs\EMAC\MII.pdf
- I2S: \\xenon\HWDev\Electrical\Industry Standards\i2sbus.pdf
As shown in the diagram, it is possible to extend the system further by using IIC (SMBus) or
additional USB hubs. These will be used to add additional USB ports and other components such as
a Real-Time Clock for the PC SKU.

3.5 Distributed Components

There are a few features for which there is no dedicated hardware; instead, processing is shared
among the different chips and emulated in software. These are called out below:
- Digital Video Processing: There is no dedicated MPEG decoder in the system. The
processing (decode) is expected to be performed completely on the CPU, possibly with
some help from the 3DCore’s shader array if needed.
- Audio: The audio support on the hardware is limited to WMAPro decode and audio out
DMA. All voice generation, mixing, and effects processing is done on the CPU.

3.6 Key Architectural Mechanisms

The salient points about how certain architectural features are implemented are listed below. More
detailed descriptions appear later in this document:
- Endianness: The CPU is Big Endian (byte ordering). All devices on the system are big
endian as well, except for the SouthBridge IO components which are Little Endian (due to
their PC heritage).
- CPU Coherence: Coherence between the CPU cores is maintained by hardware. The
coherence point is the L2 cache. To aid in this the L2 is inclusive of the L1 caches, and the
L1 caches are write-through.
- DMA Coherence: Only IO coherence via snooping is implemented. High-bandwidth
clients, such as rendering, should avoid this mechanism; software should instead use
non-cached write combining, or software-managed coherence, when synchronizing data in
these cases.
- Instruction Ordering: PowerPC is loosely ordered. This requires the use of barrier
operations to force ordering when required, e.g. when accessing hardware registers.
- DMA Ordering: As always, ordering rules are required to guard against race conditions
between DMA transfers, interrupts and Memory Mapped IO (MMIO) operations. At a high
level, the hardware will use interrupts to guarantee that ordering is maintained. There is no
support for simultaneous fine grain (e.g. within a CPU cache line) access to any memory
location or register by different devices.
- Interrupts: Interrupts are message based (there is no interrupt pin on the CPU). Messages
flow from the Southbridge to the main collector in the Northbridge which then forwards the
messages to the interrupt processor on the CPU chip. Emulation for edge triggered and
level sensitive interrupts is supported.
- Memory Map: The CPU can address a 42bit memory range. Only 32 bits are available
outside of the CPU (providing a 4GB main system memory map). This 32bit space is
broken down into a 1GB main memory window (not all of which may be present), 2GB of
reserved space, and 1GB of MMIO and configuration space. Internal to the CPU, the additional
memory range is used to implement a secure boot environment to guard against certain
security attacks; that is, certain structures on the CPU can be addressed only by the CPU
itself, and only in a super-privileged (HyperVisor) mode.
- Security: Secure boot, piracy prevention and DRM are implemented via a security scheme
that relies on a boot sequence that starts on the main CPU die itself, as well as on a
security engine that allows blocks of main memory to be protected.
- Boot Procedure: The OS kernel is booted via a multi-stage boot process. Initially the
CPU’s internal bootloader (BL1) starts up. This fetches and decrypts the stage 2
bootloader (BL2) from external FLASH. The BL2 enables main memory and copies the
kernel from FLASH to main memory before entering the kernel.
- Xenon Procedural Synthesis: There is a collection of features implemented in the CPU and
GPU that allow transient geometry data to be generated by the CPU and absorbed directly
by the GPU without hitting main memory. Briefly, a 128kB portion of the CPU’s L2 can
optionally be locked down for several geometry FIFOs. These FIFOs can be read directly
by the GPU so as to fetch vertex data. A low latency (non-interrupt based) synchronization
scheme is achieved by allowing the GPU to write command processor tail-pointer updates
directly to the CPU. The CPU also allows streaming data past the L2 for reads, and past
the L1 for writes to assist in avoiding cache pollution. Intrinsic VMX2 data pack instructions
are also an important feature.
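Two of the mechanisms above meet whenever low-level code touches a SouthBridge register. The CPU is big endian while the SouthBridge IO registers are little endian, and PowerPC's loose ordering requires barriers around hardware register accesses. The C sketch below combines a byte-order conversion with a barrier. It is hypothetical: `sb_write32`, `sb_read32` and the other helper names are illustrative, not a Xenon API. It assumes registers are accessed as single aligned 32-bit loads and stores. It uses the PowerPC `eieio` instruction on PowerPC builds and falls back to a C11 fence elsewhere, so it also compiles on a little-endian development host.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static inline uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Convert between CPU byte order and the SouthBridge's little-endian order.
 * On a big-endian CPU this swaps; on a little-endian host it is the identity.
 * The conversion is its own inverse, so it serves for reads and writes. */
static inline uint32_t cpu_le32(uint32_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return swap32(v);
#else
    return v;
#endif
}

/* Keep MMIO accesses in program order. PowerPC's eieio orders accesses to
 * caching-inhibited (I/O) storage; other hosts fall back to a C11 fence. */
static inline void io_barrier(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ volatile("eieio" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical register write: one aligned 32-bit store of the
 * little-endian image, then a barrier so later accesses are not
 * performed ahead of it. */
static void sb_write32(volatile uint32_t *reg, uint32_t value)
{
    assert(((uintptr_t)reg & 3u) == 0); /* registers are 32-bit aligned */
    *reg = cpu_le32(value);
    io_barrier();
}

/* Hypothetical register read: barrier first so the load is not performed
 * ahead of earlier MMIO accesses, then convert to CPU byte order. */
static uint32_t sb_read32(const volatile uint32_t *reg)
{
    assert(((uintptr_t)reg & 3u) == 0);
    io_barrier();
    return cpu_le32(*reg);
}
```

On PowerPC the swap and the access can also be fused with the byte-reversed load/store instructions (`lwbrx`/`stwbrx`). The sketch keeps the portable form. A real driver would also need the register mapped through a non-cached MMIO window, which is outside the scope of this sketch.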

3.7 Low Level Software Architecture

3.7.1 Overview
3.7.2 Hardware Resident APIs/Code
This section defines what drivers and/or code should live in FLASH vs. in the title library (and
therefore game media). This is so that hardware can be rev’d over time and still provide backwards
compatibility.
1. Power Management
2. DVE
a. HDMI support may/will require additional code space
b. Closed captioning
c. Wide-screen signalling
3. Video Resize
4. Video Colorspace conversion
5. Video Gamma
6. Temp sensor
a. Calibration parameters need to be stored on a per-box basis
7. Ana (Clocks, etc.)
8. Ethernet Phy
9. Audio codec
10. Flash
11. USB
12. SATA
13. FSB settings

14. Memory settings


15. PCIe settings
16. Basic IR decode (power and eject buttons)
17. SMC code and kernel i/f
18. Fan algorithm
19. Front panel
20. Argon interface
21. CPU
a. Init sequence
b. For some CPU cost reduction items, we need some kernel support: 32bit mode
SLB, use of less than the full launch TLB (still figuring out how that will work), etc.
c. Certain code sequences may trigger hardware bugs; these need to be caught during certification (Cert).
d. Supervisor 42bit MMIO drivers (Interrupt handlers).

3.8 Alternate SKU Considerations

The core architecture must support a few different SKUs.
These requirements boil down to allowing for 2x memory and future use of HDMI/HDCP. Other
expansion issues, such as the RTC and USB for Helium, are arguably out of place in this document,
as they are more implementation-specific.
• Development Systems: The main constraint imposed by the DevKits is the requirement to
double the amount of main memory (512MB) and still maintain the same memory
performance. The challenge here will be the electrical design when doubling the number of memory parts.
The DevKits will also use the serial debugger interface on the SouthBridge for kernel
debugging.
• PRO SKU: This is identical to the game console, except it will come with different
peripherals (including HDD and wireless gamepads) as standard.
• PC SKU: This is the most challenging and to some extent the least well understood.
Nominally Windows (XP or Longhorn) will be run using the Connectix emulation layer.
Additional peripherals including a Real Time Clock (RTC) and USB Hub will be added to
the motherboard. A maximum display resolution of 1280x1024@75Hz has been chosen,
which drives the minimum amount of Embedded DRAM and the maximum pixel display
rate (135MPix/sec).
• HDTV DVD: Not currently POR, but we are planning hooks for future HDTV DVD playback.
Other than the correct choice of media and compression scheme, the main issue for the
core architecture is the copy protection required on the display output. It looks like this will
be HDMI/HDCP which is not supported in the current Ana. A couple of options for design
updates at a later date are possible and discussed later.
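The PC SKU bullet says the 1280x1024@75Hz maximum drives both the display pixel rate and the minimum EDRAM size. The C sketch below shows the arithmetic behind that link. It rests on two labeled assumptions that the text does not state: the VESA DMT timing for 1280x1024@75Hz (1688 x 1066 total pixels per frame, including blanking), and a non-multisampled render target with 32-bit color plus 32-bit depth/stencil. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: VESA DMT timing for 1280x1024@75Hz, totals include blanking. */
enum { H_TOTAL_1280x1024_75 = 1688, V_TOTAL_1280x1024_75 = 1066 };

/* Pixel clock for a display mode: every pixel position in the frame,
 * including horizontal and vertical blanking, is scanned once per refresh. */
static uint64_t pixel_clock_hz(uint32_t h_total, uint32_t v_total,
                               uint32_t refresh_hz)
{
    assert(h_total > 0 && v_total > 0 && refresh_hz > 0);
    return (uint64_t)h_total * v_total * refresh_hz;
}

/* Storage for one non-multisampled render target plus its depth buffer:
 * active pixels x (color bytes + depth/stencil bytes). */
static uint64_t rt_plus_z_bytes(uint32_t width, uint32_t height,
                                uint32_t color_bytes, uint32_t depth_bytes)
{
    return (uint64_t)width * height * (color_bytes + depth_bytes);
}
```

Under these assumptions the pixel clock is 1688 x 1066 x 75 = 134,955,600 pixels/sec, consistent with the 135MPix/sec figure. A 1280x1024 target with 4 bytes of color and 4 bytes of depth needs 10,485,760 bytes (10 x 2^20), consistent with 10MB of Embedded DRAM. Multisampling or deeper formats would multiply the storage requirement.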

3.9 Technical Specifications

| Spec | Value |
| Number of processor cores | 3 |
| XPS | |
| CPU Frequency | 4GHz |
| SpecINT | 1461 (single thread) |
| RAM | 256MB Console, 512MB Dev (128MB-1GB supported) |
| L1 Cache (data) | 32KB, 4 way associative |
| L1 Cache (instruction) | 32KB, 2 way associative |
| L2 Cache | 1 MB shared (8-way) |
| GPU | ATI R500 |
| HDD Storage bus | Serial ATA |
| DVD Storage bus | Serial ATA |
| Front side bus speed | 10.8GB/sec read + 10.8GB/sec write |
| Gamepad interface | XUSB (modified USB 1.1) |
| Memory units | Xenon 64MB MUs (USB 1.1) |
| HDD | TBD |
| DVD Drive | Serial ATA |
| CPU Core Hardware Threads | 2 (per core) |
| CPU Core Instruction Issue | 2 per cycle |
| CPU Core VMX2 Datapath | 128bits |
| CPU Core VMX2 Registers | 128 (x2 threads) |
| CPU Core VMX2 Double Precision | Scalar only |
| System Memory BW | 22.4GB/sec |
| GPU RT+ZB Memory BW | 256GB/sec |
| GPU RT+ZB Memory Capacity | 10MB |
| GPU Frequency | 500MHz |
| GPU Geometry Rate | 500MVtx/sec |
| GPU Shader Rate | 24GInstr/sec (shared vertex and pixel) |
| GPU Shader Datapath | 128bits |
| GPU Texture Rate | 8Gtex/sec (filtered) |
| GPU Pixel Rate | 4GPix/sec (AA, alpha blend, z-test) |
| Display Pixel Rate | 135MPix/sec |
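Dividing the GPU rates in the table by the 500MHz core clock gives the peak work per clock that the rates imply. The C sketch below is plain arithmetic on the table's own figures, taking GB as 10^9 bytes. The results are implied ratios and should not be read as confirmed hardware unit counts. The helper name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* GPU core clock from the table (500MHz). */
#define GPU_CLOCK_HZ 500000000ull

/* Peak operations per GPU clock implied by a rate in operations/sec.
 * The table's rates are exact multiples of the clock, so the division
 * is checked to be exact. */
static uint64_t per_gpu_clock(uint64_t ops_per_sec)
{
    assert(ops_per_sec % GPU_CLOCK_HZ == 0);
    return ops_per_sec / GPU_CLOCK_HZ;
}
```

Applied to the table, the implied peaks per clock are:
- 1 vertex (500MVtx/sec).
- 48 shader instructions (24GInstr/sec).
- 16 filtered texels (8Gtex/sec).
- 8 pixels (4GPix/sec).
- 512 bytes of RT+ZB bandwidth (256GB/sec).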

3.10 Performance Overview (Sue)

2-6-04 Asymmetric Latencies


At the performance meeting today, Jim confirmed that different cores will definitely have different latencies to L2. They
do not yet know what those latencies will be.

4 Core Digital Components


4.1 CPU

[Figure: Waternoose SOC block diagram. CPU side: three HFC+ VMX2 cores with MMUs, CIU, 1 MB shared L2 (L2C), NCU, BIU, MPi bus, security engine, boot ROM, boot RAM, security fuses, interrupt controller, and FSB link and physical layers. GPU side: IBM FSB physical and link layers, coherency block, "MPi Bus"-like internal buses, memory controller, graphic processors and IO controller.]

The CPU is a multi-core SOC arranged in an SMP fashion. All cores are identical and are
optimized for vector floating point, as is common in 3D graphics applications. Because we believe
that certain portions of a game can be parallelized, we provide parallelism at both the core level
(SMP) and the thread level (SMT). To ensure that this system is programmable by a wide range of
developers, the cores are all coherent with each other, and Microsoft intends to provide middleware
libraries that hide the parallelism from developers who do not want to take on this programming
challenge.
To help visualize how this system might be programmed, consider one possible partitioning. One
core is allocated to the game engine; this is in fact the only CPU core that developers program,
and all threads associated with the game run on this “Host” core. A set of API functions
implementing accelerated and optimized routines for physics, animation, collision detection,
audio, etc. runs on a second core. A third core is dedicated to procedural synthesis. This last
case is important because the CPU and GPU have dedicated hardware that allows procedural
synthesis on the CPU to be pushed to the GPU efficiently, in a tightly coupled fashion, without
spilling to memory. For the API functions mentioned, including procedural geometry, developers
can provide their own routines, which the XOS schedules appropriately.
Important features of the CPU:
• ISA: 64bit PowerPC, derived from Power4 architecture.
• SMP: All cores are identical and coherent with one another.
• SMT: All cores support 2 simultaneous threads.
• Vector Floating Point (VMX2, 128bit): The cores each have a dedicated vector floating
point unit that is capable of performing the equivalent of a DP4 each cycle at sustained
throughput. There are also special instructions for data swizzling and compaction so as to
be compatible with common 3D datatypes and operations.
• Advanced Cache Management: The cache control addresses many issues common in multimedia
systems, with optimizations for data streaming, set locking, etc.
• System Coherency: The CPU supports a coherency protocol that allows the L2 and L1
caches to be coherent (if so desired) with any DMA hardware (3D or otherwise). Coherency
is maintained via snoops, so it should be used only for low-bandwidth operations.
• System Security: The CPU implements a confidential scheme that allows the system to be
protected for copy protection, DRM and privacy purposes.
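A DP4 is a four-component dot product, the basic operation behind vertex transforms and lighting. The plain-Python model below only illustrates the arithmetic of a DP4, a 4x4 transform built from DP4s, and a component swizzle; it is not the VMX2 instruction set, whose actual instructions and operand forms are defined in the vmx128 ISA document listed in section 4.1.1.

```python
from typing import Sequence, Tuple

Vec4 = Tuple[float, float, float, float]

def dp4(a: Sequence[float], b: Sequence[float]) -> float:
    """Four-component dot product: 4 multiplies and 3 adds."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def swizzle(v: Vec4, pattern: str) -> Vec4:
    """Reorder or replicate components by name, e.g. 'wzyx' or 'xxxx'."""
    index = {"x": 0, "y": 1, "z": 2, "w": 3}
    return tuple(v[index[c]] for c in pattern)

def transform(m: Sequence[Vec4], v: Vec4) -> Vec4:
    """Row-major 4x4 matrix times a column vector: one DP4 per output row."""
    return tuple(dp4(row, v) for row in m)

identity = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
p = (1.0, 2.0, 3.0, 1.0)
print(transform(identity, p))   # (1.0, 2.0, 3.0, 1.0)
print(swizzle(p, "wzyx"))       # (1.0, 3.0, 2.0, 1.0)
```

At the stated throughput of one DP4 per cycle, the 4x4 transform above corresponds to roughly four cycles of vector work per vertex on one core, ignoring loads, stores and swizzles.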
MMU
Cached write combining and the RC machines
Instruction throughput / latency
Streaming support
Locking support
XPS support
Scalar FP
4.1.1 Further Reading
The following documentation is recommended for a better understanding of PowerPC and this
processor in particular.
Standard PowerPC documentation:
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PowerPC_Architecture_Book_I.pdf
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PowerPC_Architecture_Book_II.pdf
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PowerPC_Architecture_Book_III.pdf
Xenon specific documentation:
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\PPC_WN_Book4.pdf
\\xenon\specs\CPU\CPU_Vendor_data\PPC_Specs\vmx128-isa-000.pdf

4.2 GPU

NorthBridge Description
3D core features
Control / synchronization
Driver implementation issues and architecture
Procedural synthesis
The GPU is the main system controller hub, connecting the FSB, memory interface and BSB. It
also contains the 3D rendering core.
The following diagram shows conceptually how the major components within the GPU are
connected.

[Figure: GPU internal block diagram (ATI/MS Confidential). FSB PHY and link Tx/Rx (16b/5.4GHz each direction, 128b to the bus interface unit); PCI-E PHY and link Tx/Rx (4b/2.5GHz each direction) into the IO Controller (IOC), with 32b BSB master/slave CPU and SB read/write paths; Bus Interface (BIF), RBBM and Command Processor (CP); 3D core with EDRAM (RBCLK) and the vgt, vc, tc, bc/rb and XPS units; display controller and XDVO PHY (16b/270MHz); memory hub (256b paths) feeding Memory Controller 0 and Memory Controller 1 (64b each at 1.6GHz). Clocks: FSB CLK 675MHz, MEM CLK 800MHz, Core CLK 500MHz, PIX CLK 135MHz, PCI-E CLK 2.5GHz, EDRAM CLK 500MHz.]

Note that the GPU contains no DACs; instead, an independent video output bus connects it to Ana,
which contains a digital encoder for analog video as well as the video DACs.
TODO: Add RBClk domain.
GPU Memory Configurations
Configuration  Total System  Memory Device  # of Memory  Ranks   Banks     Row Bits    Column Bits
Name           Memory        Size           Devices      per MC  per Rank  (per bank)  (per bank)
Xenos_128_4    128MB         8Mx32          4            1       4         12          9
Xenos_256_8    256MB         8Mx32          8            2       4         12          9
Xenos_256_4    256MB         16Mx32         4            1       8         12          9
Xenos_512_8    512MB         16Mx32         8            2       8         12          9
Xenos_512_4    512MB         32Mx32         4            1       8         13          9
Xenos_1024_8   1GB           32Mx32         8            2       8         13          9
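Each row of the table can be cross-checked: one x32 device holds banks x 2^row x 2^column words of 32 bits, and the total is that times the number of devices. The sketch also checks the rank count, assuming (from the 64b paths shown in the GPU block diagram) that each of the two memory controllers is 64 bits wide, so one rank uses two x32 devices per controller. The row values are copied from the table.

```python
CONFIGS = {
    # name: (total_bytes, devices, ranks_per_mc, banks, row_bits, col_bits)
    "Xenos_128_4":  (128 << 20, 4, 1, 4, 12, 9),
    "Xenos_256_8":  (256 << 20, 8, 2, 4, 12, 9),
    "Xenos_256_4":  (256 << 20, 4, 1, 8, 12, 9),
    "Xenos_512_8":  (512 << 20, 8, 2, 8, 12, 9),
    "Xenos_512_4":  (512 << 20, 4, 1, 8, 13, 9),
    "Xenos_1024_8": (1024 << 20, 8, 2, 8, 13, 9),
}

DEVICE_WIDTH_BITS = 32      # all configurations use x32 devices
MEMORY_CONTROLLERS = 2
MC_WIDTH_BITS = 64          # assumed from the GPU block diagram

def device_bytes(banks, row_bits, col_bits):
    """Capacity of one x32 device in bytes."""
    words = banks * (1 << row_bits) * (1 << col_bits)
    return words * DEVICE_WIDTH_BITS // 8

def check(name):
    """True if capacity and rank count are consistent with the geometry."""
    total, devices, ranks, banks, rows, cols = CONFIGS[name]
    capacity_ok = devices * device_bytes(banks, rows, cols) == total
    devices_per_rank_per_mc = MC_WIDTH_BITS // DEVICE_WIDTH_BITS
    ranks_ok = ranks * MEMORY_CONTROLLERS * devices_per_rank_per_mc == devices
    return capacity_ok and ranks_ok

for name in CONFIGS:
    print(name, check(name))
```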

4.3 Memory

The memory devices shall conform to the GDDR3 memory specification. The key features of the
devices are:
• 512Mb device density configured as 16Mx32 devices
• Two data accesses per clock cycle with a 4n prefetch
• Differential clock. Clock frequency of 800 MHz
• Single ended, per byte read and write strobes
• Pseudo open drain I/O with calibrated output drive
• On die termination
• Packaging supports mirror function to allow a clamshell memory design
• Packaging supports 1.6Gbps signaling
• Need to add: banks, cycle time
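The clock and prefetch figures above fix the per-device data rate and burst size. A minimal sketch, assuming a burst length equal to the 4n prefetch and a x32 device:

```python
CLOCK_HZ = 800e6          # differential clock frequency from the list above
TRANSFERS_PER_CLOCK = 2   # two data accesses per clock cycle (DDR)
PREFETCH = 4              # 4n prefetch; burst length assumed equal to it
DEVICE_WIDTH_BITS = 32    # x32 organization

pin_rate_bps = CLOCK_HZ * TRANSFERS_PER_CLOCK            # per data pin
device_bw_bytes = pin_rate_bps * DEVICE_WIDTH_BITS / 8   # bytes/sec per device
burst_bytes = PREFETCH * DEVICE_WIDTH_BITS // 8          # bytes per burst
burst_clocks = PREFETCH // TRANSFERS_PER_CLOCK           # clocks per burst

print(pin_rate_bps / 1e9, device_bw_bytes / 1e9, burst_bytes, burst_clocks)
# 1.6 6.4 16 2
```

The 1.6 Gbps per-pin result matches the 1.6Gbps signaling the packaging is required to support.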
Important features of the memory are:
• Given our best estimate of memory pricing and our overall cost target, a total capacity of
256 MB is targeted for the product. This would be based on (4) 512 Mb, x32 parts.
• Given the need for extra memory to accommodate debug and pre-optimized games, a total
capacity of 512 MB is targeted for the development systems.
This will be based on either (8) 512 Mb, x32 parts (2 DRAM loads on the data wires), or (8)
512 Mb, x16 parts. The first of these cases might require special operating conditions as
we do not need to guarantee operation in high volume and over all conditions. The latter
requires special part development by a DRAM vendor, though some vendors have
indicated the possibility of a single design supporting both x16 and x32.
• Because memory pricing is volatile and difficult to forecast, these targets could change and
the device availability over the life of the product must support a range from 128 MB to 1
GB of memory.
• Support for a path to (2) 1 Gb parts is necessary in the controller, though it may never be
cost effective to make this transition. This must be completely seamless.
• The memory devices shall support a boundary scan capability to allow verification of the
connection between the GPU memory controller and the memory devices. The test
coverage shall include shorts between signals, opens on the GPU, the signal trace or
memory device. It may support coverage that allows isolation of the problem to a particular

end of the signal; for example, the open is on the GPU side and not the signal trace or
memory device.
The memory configurations supported by the GPU and memory interface are documented in the
GPU section. See chapter 4.2.
Eventually, we’ll need specifications for the exact timing parameters (on the order of 1 or more
clock cycles) used by the bus.

4.4 Southbridge

[Figure: Southbridge block diagram. External connections: DVD drive and HDD (SATA), Ethernet PHY (MII), Group A and Group B USB devices, audio DACs (S/PDIF, I2S), flash (NAND, SPI), SMBus (x2) to Ana and the AVIP, front panel, power supply unit, DVD tray, IR. Internal blocks: SATA PHYs with DVD drive and HDD interfaces, WMA Pro decoder, Ethernet MAC, JTAG DFT TAP, two USB 2.0/1.1/XUSB PHYs with EHCI and OHCI controllers, PCI-Express x2 PHY and control (PCIECB) for the backside bus to the NB, PCIE2ECB and ECB2HBEDB bridges, on-chip bus control (OCBCB), interrupt collector, kernel debug UART, system flash controller, audio out, and the SMC (8051) with UART, IR, RTC, timer, GPIOs and PWM x2. Bus arbitration/muxing modules for both buses (ECB: 15 units, HBEDB: 6 units) are not shown.]

The SouthBridge chip is akin to those found in traditional PC architectures, but is customized for
the game console, primarily for cost reasons. Some of the more standard (or soon to be standard)
functions and interfaces in the SouthBridge are:
• PCI-Express x2 link as the bus interface to the NB.
• Serial ATA ports for both the DVD drive and an optional HDD.
• 10/100 Ethernet MAC.
• USB 2.0/1.1, which in this chip takes the form of separate EHCI and OHCI host controllers
for two sets of ports: group A (4 ports) and group B (5 ports). All USB ports support the
custom XUSB protocol, which reduces EMI, as outlined in section 4.4.1 below. All USB ports
also support remote wake via direct connection of the transceiver inputs to the SMC.
The more custom aspects of the chip include:
• WMA Pro decode hardware for a portion of the audio processing; the rest is performed on
one of the CPU cores.
• System Management Controller (SMC) hardware, which is essentially an 8051 core that
handles, among other things, power, reset, and thermal management. It is powered
separately within the Southbridge. It includes a low-cost RTC (Real Time Clock), which
functions only when the unit is plugged in, and a programmable interval timer (a low-cost
equivalent of an 8254) that provides a 1 ms interrupt period for the OS scheduler. Finally, it
provides UART ports for both kernel and SMC firmware debug.
• System Flash interface. This is custom in the sense that (1) it allows NAND Flash parts
to be used to reduce cost, and (2) it has the control interfaces needed to DMA the SMC code
into an internal Southbridge SRAM and the kernel into main memory.
• IR interface. A small hardware block samples and decodes the output of an external
demodulator, which allows the SMC to communicate the received commands to the
main CPU.
• Wireless game controller interface. The baseband controller for wireless will live in an
external chip and communicate to the Southbridge over a USB 1.1 interface.
All of the devices within Southbridge communicate over an on-chip bus interface which is custom to
Microsoft, but based on an internal version of the PCI bus.
4.4.1 Notes
XUSB is basically USB 1.1 run in LS mode (1.5 Mbps) but with two key modifications:
• Traffic is not broadcast to all ports as in standard USB 1.1; instead, additional bits
controlled by the driver direct it to the appropriate port. This reduces EMI.
• The payload is 32B instead of the maximum 8B allowed in standard LS mode. This allows
for enough bandwidth in our worst-case scenarios (4 controllers, headsets, etc).

4.5 Ana

This section needs to focus on the only truly architecturally significant functional blocks in the Ana
chip: the DVE and the clock architecture. The thermal sensor, fan control, etc. are design-specific
details that are not relevant to the Xenon console.
The Ana chip consists of four main functional blocks: A Thermal Sensor (TS) block which is used to
monitor system temperatures, a Clock Synthesizer (CS) which is used to generate the required
system clocks for Xenon, a Digital Video Encoder (DVE) which is required to convert a digital pixel
stream from the Northbridge into analog outputs suitable for connection to a television or monitor,
and a System Management Bus Interface which provides the host interface for Ana. A simplified
block diagram of the Ana chip is shown in Figure 4.1.

[Figure: Ana block diagram. A 27 MHz system crystal feeds the Clock Synthesizer, which drives the system clocks, the video clock and the 2x pixel clock output. The SMBus control interface (to/from the SMC) reaches the CSR interface of each block. The Thermal Sensor reads external analog sensors (diodes). The Digital Video Encoder takes the 1x pixel clock, pixel data and digital timing (syncs) from the pixel generator (GPU) and drives DACs A-D to the filters and video interface port. Also shown: power-on reset (standby power in, reset to SMC), power-down signals from the SMC, and the fan driver op-amps (PWM from SB, fan feedback, fan drive output).]

Figure 4.1. Top Level Ana Block Diagram.


The configuration and control of the blocks within Ana is provided by the Control Interface
(CI). The System Management Bus (SMBus) is a two-wire (serial clock and data) bi-directional
interface. An external host connected to the SMBus is required to configure the chip correctly. The
CI acts as a bridge through which the external host can read and write registers that are local to
each of the other internal blocks (DVE, TS, and CS).
Ana can also write DVE control registers using parallel transfers over the pixel data bus, when this
feature is enabled. These control packet transfers are initiated by software using the GPU and
should be performed only when the video encoder is not using that bus for pixel data (i.e. only
when the video encoder is either disabled or blanked). Writes made while the encoder is using
the pixel bus for pixel data, and all reads, must be performed over the SMBus.
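The rule for choosing a path to Ana's DVE registers can be expressed as a small decision function. The function and parameter names are hypothetical; the sketch only encodes the conditions stated above.

```python
from enum import Enum

class Path(Enum):
    SMBUS = "smbus"
    PIXEL_BUS = "pixel_bus"

def dve_register_path(is_write: bool, pixel_bus_writes_enabled: bool,
                      encoder_enabled: bool, encoder_blanked: bool) -> Path:
    """Select the transport for a DVE register access (hypothetical helper).

    Control packets over the pixel data bus are allowed only for writes,
    only when that feature is enabled, and only while the encoder is not
    using the bus for pixels (encoder disabled or blanked). Everything else,
    including every read, goes over SMBus.
    """
    bus_idle = (not encoder_enabled) or encoder_blanked
    if is_write and pixel_bus_writes_enabled and bus_idle:
        return Path.PIXEL_BUS
    return Path.SMBUS
```

Because the pixel-bus path is write-only, any read-back of a register written that way still has to go over SMBus.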
System clocks are generated by the Clock Synthesizer (CS) and driven off chip. These clocks are:
• 25 MHz Ethernet clock output to Ethernet PHY
• 25 MHz Serial ATA reference clock to Southbridge
• 100 MHz (down spread spectrum frequency modulated to -0.5%) for Serial ATA in
Southbridge
• 100 MHz (down spread spectrum frequency modulated to -1.5%) for PCI Express in
Southbridge, Northbridge system clock and CPU system clock
• 48 MHz standby clock for Southbridge (for USB and System Management Controller)
• 24.576 MHz audio clock to Southbridge
• Programmable video clock for DVE (only driven off chip for test)
• Programmable 2x pixel clock for Northbridge
The system crystal oscillator (27 MHz) is the default reference clock for all the PLLs. In order to
provide for A/V sync, the audio, video and 2x pixel clocks must be generated from the same source
and with deterministic ratios such that there is 0 ppm drift between these various clocks. The CS
also has the ability to select an external AV oscillator path to drive the audio, video and 2x pixel
clock PLL reference clocks.
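Zero drift follows when every A/V clock is an exact rational multiple of the same 27 MHz reference. The sketch reduces each target frequency to a multiplier/divider pair using exact fractions. It assumes an idealized PLL that can realize any such integer ratio and says nothing about Ana's actual divider ranges; the 135 MPix/sec and 270 MHz targets correspond to the maximum pixel rate.

```python
from fractions import Fraction

REF_HZ = 27_000_000  # system crystal oscillator

def pll_ratio(target_hz: int, ref_hz: int = REF_HZ) -> Fraction:
    """Exact M/N such that target = ref * M / N (idealized PLL)."""
    return Fraction(target_hz, ref_hz)

targets = {
    "audio (24.576 MHz)": 24_576_000,
    "1x pixel at 135 MPix/s": 135_000_000,
    "2x pixel (270 MHz)": 270_000_000,
}
for name, hz in targets.items():
    r = pll_ratio(hz)
    print(f"{name}: M/N = {r.numerator}/{r.denominator}")
```

Since the audio clock is ref x 1024/1125 and the 2x pixel clock is ref x 10, their ratio is the exact fraction 512/5625; both come from one reference, so they cannot drift relative to each other.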
The video clock is programmable to accommodate the numerous supported video standards. The
video PLL can be bypassed and the clock tree driven by an external clock chip. The 2x pixel
clock, which is driven from a separate PLL, is used by the Northbridge to generate a 1x pixel
clock that is sent back to Ana to clock in pixel data on the Xenon Digital Video Output (XDVO)
bus. The video clock and 1x pixel clock maintain integer ratios determined by the level of
over-sampling the DVE is using. These clocks are programmed via the CI.
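Since the video clock is an integer multiple of the 1x pixel clock, the usable oversampling factor for a mode is bounded by both the 12x encoder limit and the 170MHz video DAC ceiling listed later in this section. The pixel clocks used below (13.5 MHz for 480i, 74.25 MHz for 720p) are standard broadcast values assumed for illustration; they are not stated in this document.

```python
MAX_OVERSAMPLING = 12          # "up to 12x over-sampling"
MAX_VIDEO_CLOCK_HZ = 170e6     # "up to 170MHz video DAC frequency"

def max_oversampling(pixel_clock_hz: float) -> int:
    """Largest integer factor k <= 12 with k * pixel clock <= 170 MHz."""
    for k in range(MAX_OVERSAMPLING, 0, -1):
        if k * pixel_clock_hz <= MAX_VIDEO_CLOCK_HZ:
            return k
    raise ValueError("pixel clock exceeds the video clock ceiling")

# Assumed standard pixel clocks, plus the 135 MPix/sec VGA maximum.
for mode, pclk in [("720x480i", 13.5e6), ("1280x720p", 74.25e6),
                   ("1280x1024 VGA", 135e6)]:
    k = max_oversampling(pclk)
    print(f"{mode}: {k}x -> video clock {k * pclk / 1e6:.2f} MHz")
```

Higher oversampling relaxes the analog reconstruction filter, so a driver would plausibly pick the largest factor that fits; at 135 MPix/sec no oversampling headroom remains.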
The clock synthesizer block includes power down inputs that can be used to turn off some of the
clock drivers (via software) during standby mode.
The Digital Video Encoder (DVE) is used to convert a digital pixel stream from an external device
into analog video suitable for output to a TV or monitor. The DVE supports a variety of analog
video standards including NTSC, PAL, component standard definition and high definition as well as
VGA video standards. The DVE does not support resizing of video data so the Northbridge must
supply video data with the resolution required by the video encoding format. The bulk of the video
processing required in the system is performed by the Northbridge. There is limited video
processing functionality in the video encoder block itself.
The DVE receives a video clock from the CS that is used to generate the output video timing
signals and to clock the digital processing pipelines in the DVE. This video clock differs among the
supported output video standards, so it is programmable in the CS. A 2x pixel clock is output to
the Northbridge and a 1x pixel clock is returned. The returned 1x pixel clock is used to clock in the
digital pixel data received from the Northbridge.
The Thermal Sensor module (TS) receives input from 5 remote diode channels. Four of these
channels will be connected to thermal diodes on system components (CPU, Northbridge, EDRAM
and a voltage regulator temperature sense diode). The fifth channel is dedicated to calibrating the
thermal sensor during the manufacturing board functional test. The TS comprises an analog
switch, an ADC and a current source generator, along with digital logic that delivers temperature
data to a register interface. The TS registers are accessible through the CI. The TS uses a delta
VBE method to measure temperature.
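The delta-VBE method biases a diode-connected transistor at two currents with ratio N and measures the difference in base-emitter voltage, which is proportional to absolute temperature: ΔVBE = η(kT/q)ln N. The unknown saturation current of each diode cancels out of the difference, which is why the method suits remote diodes on other chips. The sketch inverts this relation; the current ratio of 10 and ideality factor of 1.0 are assumed values, since Ana's actual current-source ratio and the ideality of each sensed diode are not given here.

```python
import math

# Boltzmann constant / elementary charge, in volts per kelvin (SI exact values).
K_OVER_Q = 1.380649e-23 / 1.602176634e-19

def delta_vbe(temp_k: float, current_ratio: float = 10.0,
              ideality: float = 1.0) -> float:
    """Expected VBE difference (volts) between two bias currents at temp_k."""
    return ideality * K_OVER_Q * temp_k * math.log(current_ratio)

def temperature_from_delta_vbe(dv: float, current_ratio: float = 10.0,
                               ideality: float = 1.0) -> float:
    """Absolute temperature (kelvin) recovered from a measured delta VBE."""
    return dv / (ideality * K_OVER_Q * math.log(current_ratio))

dv = delta_vbe(273.15 + 85.0)              # a die at 85 C
print(f"{dv * 1e3:.2f} mV")                # about 71 mV for N = 10
print(temperature_from_delta_vbe(dv) - 273.15)
```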
Ana contains a power-on reset cell that monitors the 3.3V standby and 1.8V standby supplies and
generates an ok signal once both voltages cross appropriate thresholds. This ok signal is fed to
logic that generates internal resets for the Ana PLLs as well as the SMC reset for the Southbridge.
The power-on reset cell also has a separate comparator to detect thresholds on an external sense
input, which is divided down from the system board’s 12V supply.
Ana contains two fan driver op-amps, each of which takes two inputs:
• A pulse-width-modulated signal from the Southbridge, converted to a fixed voltage
• A feedback signal from the external fan driver circuitry
The outputs of the op-amps drive the external fan drive circuitry.
Ana contains a JTAG interface for boundary scan of the XDVO bus and for controlling an internal
tap controller which can be used to access analog IP test structures. Parallel scan chains will be
used for ATPG coverage of digital logic.
Interrupt functionality is provided via an interrupt pin (VID_INT). This pin is attached to the closed
caption logic in the DVE and will signify when the hardware is ready to accept more closed caption
data for a specific field.
There are also miscellaneous pins on Ana devoted to various contingency/visibility options (e.g.
bypasses for PLLs, power on reset cell and crystal oscillator, bringing out video clock, oscillator
output and output of power-on reset cell to pins).
The following pertains to the system clock generation:
• All component outputs are observable on Ana pins for debug and measurement purposes.
Similarly, all inputs to components can be supplied directly from pins (either during
standard operation or via bypass inputs) for debug, measurement, and contingency
purposes.
• Power down capability for every PLL (but not the oscillator).
• Single 27 MHz oscillator source with bypass allowing external clock source. Output of
oscillator available on external pin for debug, measurement purposes as well as ability to
slave external clock generation device to internally generated clocks (though phase locking
not supported). Always running as long as box is plugged into wall.
• Programmable video (and 2x pixel) clock generation (up to 170MHz video DAC frequency
or 135 MPix/sec pixel rate) for the video encoder and pixel interface, with bypass capability
allowing the video clock to be supplied from an external clock source. Locked to audio clock.
The 2x pixel clock is output externally via differential outputs. Output enabled only when the
box is powered on.
• 24.576MHz Audio clock generation. Locked to video clock. Output enabled only when box
is powered on.
• Input to audio and video PLLs from either the on-chip oscillator or from external clock
source. Allows for off-chip pullable oscillator.
• 25 MHz clock generation for Ethernet clock and Southbridge. Outputs enabled only when
box is powered on.
• 48 MHz standby clock generation for internal SMBus clock (bypassable with external pin)
and for standby clock to Southbridge. Always on as long as box is plugged into wall.
• 100 MHz clock generation for Serial ATA components and interfaces. Selectable spread
spectrum with -0.5% down-spread triangle modulation. Output via differential outputs.
Output enabled only when box is powered on.
• 100 MHz clock generation for Northbridge, PCI Express and CPU (separate differential
outputs for each). Selectable spread spectrum with up to -1.5% down-spread spectrum
modulation. Outputs enabled only when box is powered on.
The following pertains to the thermal measurement block:
• Four remote temperature sensing channels to monitor the CPU, GPU, EDRAM and board
temperature diodes, in addition to a calibration channel
• Metal fuse window planned for trimming the band gap
• Programmable resolution (≤ 1 °C)
• ±2 °C accuracy in the 80-140 °C range, ±4 °C accuracy in the 0-79 °C range
• < 500 µA operating current
The following pertains to the video encoder:
• Support SDTV formats NTSC-M/J, PAL 60 (640x480I/60Hz, 720x480I/60Hz), PAL-
B/B1/D/G/H/I (640x576I/50Hz, 720x576I/50Hz).
• Support EDTV formats (640x480P/60Hz, 720x480P/60Hz, 640x576P/50Hz,
720x576P/50Hz)
• Support HDTV formats (1280x720P/50Hz, 1280x720P/60Hz, 1920x1080I/50Hz,
1920x1080I/60Hz)
• Support VGA formats (programmable up to 135MHz pixel rate).
• Component Output Support (R/G/B, Y/Pb/Pr), CVBS/Composite Support (CVBS/Y/C), or
SCART Output Support (CVBS/R/G/B) (only one at a time)
• Programmable Color Space Conversion (for SCART RGB input for composite output).
• Support 10:10:10 bit YUV or RGB input data.
• Up to 12x over-sampling supported to reduce reconstruction filter requirements.
• Programmable filters for components, composite luma, composite baseband chroma, and
composite bandpass chroma.
• Programmable sync slew rates.
• Support closed captioning for SDTV formats.
• Support Macrovision 7.1L1, EDTV (525p/625p).
• Support WSS encoded in VBI interval for SDTV, EDTV formats.
• Support sideband WSS signaling for Japan s-video and SCART
• Support sync-on green, digital CSYNC or digital HSYNC/VSYNC for VGA
• Support CGMS-A
• Slave mode timing.
The following pertains to the video DAC:
• Four 10-bit DACs.
• Video Signal to Noise > 75 dB (noise relative to flat DC input)
• Differential nonlinearity < +/- 1.0 LSBs
• Integral nonlinearity < +/- 2.0 LSBs
• 35 mA drive capability per DAC
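The DNL and INL limits above are normally evaluated from the measured output level of every code. A minimal sketch using the endpoint method (one common convention; the document does not say which is used), applied to a hypothetical 3-bit transfer curve:

```python
from typing import List, Tuple

def dnl_inl(levels: List[float]) -> Tuple[List[float], List[float]]:
    """Endpoint-method DNL and INL, in LSBs, from measured output per code.

    levels[i] is the measured output for code i. The ideal LSB is the
    full-scale span divided by (number of codes - 1).
    """
    n = len(levels)
    lsb = (levels[-1] - levels[0]) / (n - 1)
    dnl = [(levels[i + 1] - levels[i]) / lsb - 1.0 for i in range(n - 1)]
    inl = [(levels[i] - (levels[0] + i * lsb)) / lsb for i in range(n)]
    return dnl, inl

# Hypothetical 3-bit DAC measurement (arbitrary units); ideal step is 1.0.
measured = [0.0, 1.0, 2.25, 3.0, 4.0, 4.75, 6.0, 7.0]
dnl, inl = dnl_inl(measured)
print(max(abs(d) for d in dnl), max(abs(x) for x in inl))   # 0.25 0.25
```

For the Ana DACs the same calculation would run over all 1024 codes of each 10-bit DAC, with the results compared against the < ±1.0 LSB DNL and < ±2.0 LSB INL limits.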
The following pertains to the Power Supply:
• 3.3V standby power supplies for analog IP and digital/analog I/Os
• 1.8V standby power supply for digital IP
• 3.3V power supply for DAC

4.6 Front Side Bus (Art)

The FSB is divided into six sections: three layers, each split into Transmit and Receive sections.
The Transport layer communicates with the rest of the CPU/GPU. The Link layer is responsible for
CRC generation and checking, error handling, and packet retransmission (as well as data alignment
in the receive section). The PHY layer converts between a wide, slower interface and the
5.4 Gb/sec PCB lanes; the PHY receive layer is responsible for bit-aligning the data to the
forwarding clock transmitted with the data.
The CPU FSB transport layer also communicates with the Security Unit for encryption and
decryption of some data. It also contains some logic not needed in the GPU version: the ability to
map the 10-bit MPI tags used inside the CPU to the 5-bit Transaction IDs (TIDs) used by the FSB
(and the GPU). Other than these differences, the CPU and GPU versions of the FSB differ primarily
in that they are implemented in different silicon processes and have different primary datapath
widths: 8 bytes in the CPU and 16 bytes in the GPU.
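The CPU-side tag mapping is essentially a translation table: an outstanding request's 10-bit MPI tag is bound to one of the 5-bit TIDs when the request goes out on the FSB, and the binding is released when the response returns. The class below sketches that bookkeeping with a free list; the structure and names are hypothetical, not the CPU's actual implementation. The TID width is a parameter because the packet field table later in this section lists a 4-bit Transaction ID.

```python
from collections import deque
from typing import Dict, Optional

class TagTranslator:
    """Maps 10-bit internal MPI tags to FSB transaction IDs (hypothetical)."""

    def __init__(self, tid_bits: int = 5) -> None:
        self.free = deque(range(1 << tid_bits))   # all TIDs start unused
        self.tid_to_tag: Dict[int, int] = {}

    def issue(self, mpi_tag: int) -> Optional[int]:
        """Bind a free TID to an outgoing request; None means stall."""
        if not 0 <= mpi_tag < 1 << 10:
            raise ValueError("MPI tag must fit in 10 bits")
        if not self.free:
            return None
        tid = self.free.popleft()
        self.tid_to_tag[tid] = mpi_tag
        return tid

    def complete(self, tid: int) -> int:
        """On a response, recover the original MPI tag and recycle the TID."""
        mpi_tag = self.tid_to_tag.pop(tid)
        self.free.append(tid)
        return mpi_tag

t = TagTranslator()
tid = t.issue(0x3A5)
print(tid, hex(t.complete(tid)))   # 0 0x3a5
```

With 5-bit TIDs at most 32 requests can be outstanding on the FSB at once; when the free list is empty, new requests must wait for a response to retire a TID.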
Needs update …
The design of the FSB link and PHY will be provided by the CPU company.
The FSB is a high-performance bus that connects the GPU and the CPU. Like many high-speed
buses, it consists of a link layer and a physical layer. The general physical specifications of the
bus are given in the following table.

Topology             Unidirectional point-to-point
Signaling            Low-voltage differential CML, DDR
Clocking             Clock forwarding, one clock signal per byte
Frequency            3GHz clock for 4Gb/s data rate
Number of lanes      CPU-GPU: 32, GPU-CPU: 16
Byte transfer        Parallel to serial (8:1)
Framing              Frame signal per 16 bits
Transfer efficiency  89%
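The FSB bandwidth figure in the section 3.9 table follows from the lane count and lane rate. This sketch uses the 16 lanes at 5.4 Gb/s per direction shown in the GPU block diagram, together with the 8:1 serialization and 89% transfer efficiency from the table above. The table's frequency and lane-count rows give other values, so the inputs are kept as parameters.

```python
def fsb_bandwidth(lanes: int, lane_gbps: float, serdes_ratio: int = 8,
                  efficiency: float = 1.0):
    """Per-direction FSB figures.

    Returns (raw GB/s, internal datapath bits, internal clock MHz,
    effective GB/s); efficiency models framing and packet overhead.
    """
    raw_gbytes = lanes * lane_gbps / 8
    datapath_bits = lanes * serdes_ratio
    internal_mhz = lane_gbps * 1000 / serdes_ratio
    return raw_gbytes, datapath_bits, internal_mhz, raw_gbytes * efficiency

raw, width, mhz, eff = fsb_bandwidth(16, 5.4, efficiency=0.89)
print(raw, width, mhz, round(eff, 2))
```

The 10.8 GB/s raw result matches the table's per-direction figure, the 128-bit internal width matches the 16-byte GPU datapath described above, and 675 MHz matches the FSB clock in the GPU block diagram.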

4.6.1 Link Layer


• Defines packets for transactions
• Maintains credit-based flow control
• Detects some errors, retries if possible, and reports hard errors
The Link layer protocol defines the packets sent over the link as well as link initialization, flow
control and error handling. Link packets carry transactions (e.g. Read, Write) from the CPU/GPU
while also maintaining flow control for each of the 4 virtual channels supported over the link. Link
packets are designed specifically for the coherent system: their fields and command types support
the requests and responses required by the coherent environment.
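In credit-based flow control, a transmitter may send a packet on a virtual channel only while it holds a credit for receive-buffer space on that channel; the receiver returns credits (carried in the Control packets) as it frees buffers. A minimal sketch of the transmit side for the 4 VCs; the class, method names and buffer depths are hypothetical.

```python
from typing import Dict, List

class CreditTransmitter:
    """Transmit-side credit accounting, one counter per virtual channel."""

    def __init__(self, initial_credits: Dict[int, int]) -> None:
        self.credits = dict(initial_credits)   # granted at link initialization
        self.queues: Dict[int, List[str]] = {vc: [] for vc in initial_credits}

    def send(self, vc: int, packet: str) -> bool:
        """Send if a credit is available; otherwise hold it in the VC queue."""
        if self.credits[vc] > 0:
            self.credits[vc] -= 1
            return True
        self.queues[vc].append(packet)
        return False

    def receive_credits(self, vc: int, count: int) -> List[str]:
        """Apply returned credits and drain packets that were waiting."""
        self.credits[vc] += count
        sent = []
        while self.queues[vc] and self.credits[vc] > 0:
            self.credits[vc] -= 1
            sent.append(self.queues[vc].pop(0))
        return sent

tx = CreditTransmitter({0: 2, 1: 2, 2: 2, 3: 2})   # hypothetical depth of 2
print([tx.send(0, p) for p in ("a", "b", "c")])    # [True, True, False]
print(tx.receive_credits(0, 1))                    # ['c']
```

Because credits are tracked per VC, a channel that runs out of credits stalls only its own traffic while the other channels keep flowing.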
4.6.1.1 Packet Types:
Packet Type               Usage
Control 0 Packet          Xmit Acks and Credits for VC 0, 1; used for link training and link retries in the event of a sequence error
Control 1 Packet          Xmit Acks and Credits for VC 2, 3
Basic Command Packet      8-byte payload packets min.
Extended Command Packet   8-byte payload packets with mask
Response Packet           Completion with or without data

4.6.1.2 Command Types and Packets Used

Command Type                     Packet Type Used  Description
Write (length specified)         Basic command     Write of various lengths up to 128 bytes (count indicated with a field); all bytes valid and to be written
Write                            Extended          Includes a byte mask
Read                             Basic command     Different flavors for Read Line, Read Dword, Read Intent to Modify, Read with No Intent to Cache, Read W/O Claim
Synchronization                  Basic             Used to enforce ordering requirements
Coherence Commands               Basic             Castout, Clean, Deallocate Directory Tag, Dkill, Flush, IKill
Rerun                            Basic
Interrupt                        Basic
IORead                           Basic
IOWrite                          Basic
Data Response                    Response
Completion Response (non-data)   Response

For each of the FSB commands, the packets include the following informational fields:

Field              # Bits  Usage
Command Type       5       Defines type, e.g. Read, Write, Interrupt
Virtual Channel    2       Channel # used; VCs allow flexible ordering models
Sequence Count     4       Per VC; used to track lost packets; up to 16 outstanding
Transaction ID     4       Effectively a tag to identify responses
Address            32      Memory address for transaction
Byte Mask          8       Used for partial updates
Command Modifiers  3       Defines a more specific action for a command, e.g. synchronization types for the Sync command, or Castout vs. Kill for Coherence commands
CRC                16      Error detection; like parity, only better
<Chip to add data to packet>
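The field widths above total 58 bits before the CRC. The sketch packs them into an integer in the order listed and computes a CRC-16 over the padded header. Both the bit order and the choice of CRC-16/CCITT (polynomial 0x1021, initial value 0xFFFF) are assumptions for illustration; this document defines neither the packet layout nor the CRC polynomial.

```python
FIELDS = [  # (name, width in bits), in table order; layout is hypothetical
    ("command_type", 5), ("virtual_channel", 2), ("sequence_count", 4),
    ("transaction_id", 4), ("address", 32), ("byte_mask", 8),
    ("command_modifiers", 3),
]

def pack_header(values: dict) -> int:
    """Concatenate fields MSB-first into one integer."""
    word = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word

def unpack_header(word: int) -> dict:
    """Inverse of pack_header."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out

def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF, no reflection)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

hdr = {"command_type": 3, "virtual_channel": 1, "sequence_count": 7,
       "transaction_id": 9, "address": 0x1000_0080, "byte_mask": 0xFF,
       "command_modifiers": 0}
word = pack_header(hdr)
header_bytes = word.to_bytes(8, "big")   # 58 bits padded to 8 bytes
print(hex(word), hex(crc16_ccitt(header_bytes)))
```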

4.7 Back Side Bus

The Back Side Bus is a two-lane (x2) PCI-Express link connecting the SouthBridge to the GPU.

\\xenon\specs\Architecture\Xenon BackSide Bus Spec.doc
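For scale, each lane of the link signals at the 2.5GHz PCI-E clock shown in the GPU diagram. Assuming the 8b/10b line coding of first-generation PCI Express, the usable rate per direction works out as below, before packet and protocol overhead.

```python
def pcie_gen1_bandwidth(lanes: int, gtps: float = 2.5) -> float:
    """Per-direction bandwidth in MB/s after 8b/10b coding (gen-1 assumption)."""
    bits_per_sec = lanes * gtps * 1e9 * 8 / 10   # 8 data bits per 10 line bits
    return bits_per_sec / 8 / 1e6

print(pcie_gen1_bandwidth(2))   # 500.0 MB/s in each direction
```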

4.8 Memory Bus

The memory bus shall conform to the GDDR3 specification. The key features of the GDDR3 spec.
are:
• Unified memory architecture utilizing high-speed graphics memory devices.
• Total data bus width of 128 bits
• Data signaling at 1.6 Gbps, implying a peak bandwidth of 25.6 GB/sec.
• Because memory pricing is volatile and difficult to forecast, these targets could change and
the interface must support a range from 128 MB to 1 GB of memory.
• To provide better utilization of the memory bus, two independent memory controllers are
likely. The memory interleaving between these controllers, as well as between the internal
banks within the DRAM, is to be determined.
• Support for a path to (2) 1 Gb parts is necessary in the controller, though it may never be
cost effective to make this transition. This must be completely seamless.
• Note that XDK and console performance is identical only when running in the lower half of
the XDK memory. When the upper half of the XDK memory is used, there is an extra cycle
penalty for back-to-back reads from alternating ranks.
• The memory interface shall support a boundary scan capability to allow verification of the
connection between the GPU memory controller and the memory devices. The test
coverage shall include shorts between signals, opens on the GPU, the signal trace or
memory device. It may support coverage that allows isolation of the problem to a particular
end of the signal; for example, the open is on the GPU side and not the signal trace or
memory device.
The memory configurations supported by the GPU and memory interface are documented in the
GPU section. See chapter 4.2.
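The 25.6 GB/sec figure follows from the bus width and per-pin rate, and splitting the bus across two controllers gives each one half of it. The 64-bit-per-controller split is taken from the GPU block diagram.

```python
BUS_WIDTH_BITS = 128
PIN_RATE_GBPS = 1.6
CONTROLLERS = 2          # 64 bits each, per the GPU block diagram

peak_gbytes = BUS_WIDTH_BITS * PIN_RATE_GBPS / 8
per_controller = peak_gbytes / CONTROLLERS
print(round(peak_gbytes, 3), round(per_controller, 3))   # 25.6 12.8
```

The same formula at 1.4 Gbps per pin gives the 22.4GB/sec system memory bandwidth listed in the table in section 3.9.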

4.9 SMBus

I think only one is required: the one that must communicate with the AVIP. The mechanism by which
the DVE is configured is, I believe, implementation-specific.
The system shall support two SMBus V2.0 compliant interfaces. These interfaces may be
contained in the Southbridge and shall be accessible by the CPU and System Management
Controller.
One SMBus interface shall be used for communication internal to the system. In the existing
system architecture, the interface is used to connect the Southbridge to the Ana IC. If other SMBus
devices are added to the system, they shall use this interface.
The second SMBus interface shall be used for communication with external devices over the Audio
Video Interface Port (AVIP). The two uses identified are:
• Video Electronics Standards Association Display Data Channel (VESA DDC) found on
Video Graphics Adapter (VGA) monitors
• Controlling the console in a Kiosk for point of sale demonstration purposes. The controls
include power, DVD and system settings. This application may be extended to include
development and factory test.
Because the SMBus interfaces may be time-multiplexed, the Southbridge is the only supported master.
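SMBus 2.0 defines optional Packet Error Checking (PEC): a CRC-8 with polynomial x^8 + x^2 + x + 1, computed over every byte of the transaction including the address byte. The sketch builds a Write Byte transaction with PEC appended; the 7-bit device address and the register and data values are hypothetical, not Ana's real register map.

```python
def crc8_smbus(data: bytes) -> int:
    """SMBus PEC: CRC-8, polynomial 0x07, initial value 0, MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) if crc & 0x80 else (crc << 1)
            crc &= 0xFF
    return crc

def write_byte_with_pec(addr7: int, command: int, value: int) -> bytes:
    """Bytes on the wire for an SMBus Write Byte, with PEC appended last."""
    frame = bytes([(addr7 << 1) | 0, command, value])   # R/W bit 0 = write
    return frame + bytes([crc8_smbus(frame)])

ANA_ADDR = 0x2A   # hypothetical 7-bit address
print(write_byte_with_pec(ANA_ADDR, 0x10, 0x01).hex())
```

A receiver can validate a frame by running the same CRC over all bytes including the PEC; a result of zero indicates no detected error.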

5 System Components
5.1 Questions

Are we going to encrypt data? Linked on a per-box basis? What level of performance extraction?
Critical levels of performance. Be explicit on this.
How do we bundle things together for online? Will some of the hard drive be internal? Assume
user-level expansion.
Some of these should be pluggable by the user. Have an Xbox LOGO program.

5.2 DVD

The DVD drive is a custom form factor drive built specifically for the Xenon console.
General Specifications:
Form-Factor: Sub-half-height, custom form factor for Xenon console
Interface: SATA 1.0 + sideband tray control and status
Speed: CAV, 12x DVD at outer diameter
Media formats (read-only): CD, CD-R, DVD-5, DVD-9, DVD-X2 (Xbox 1.0), DVD-X3 (Xbox
2.0)
Access time: 115ms average
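With CAV the spindle turns at a fixed rate, so linear velocity (and therefore the read rate) scales with radius. The sketch below estimates what "12x at outer diameter" implies across the disc. The data-zone radii (24 mm to 58 mm) and the 1x DVD rate of 1,385,000 bytes/s are DVD-format values assumed here, not figures from this specification.

```python
# Assumed DVD-format constants (not from this spec).
INNER_RADIUS_MM = 24.0
OUTER_RADIUS_MM = 58.0
DVD_1X_BYTES_PER_S = 1_385_000
OUTER_SPEED_X = 12.0  # from the spec: 12x DVD at outer diameter


def cav_speed_x(radius_mm: float) -> float:
    """Speed multiple at a radius for a CAV drive rated 12x at the outer edge."""
    if not INNER_RADIUS_MM <= radius_mm <= OUTER_RADIUS_MM:
        raise ValueError("radius outside the data zone")
    return OUTER_SPEED_X * radius_mm / OUTER_RADIUS_MM


def cav_rate_bytes_per_s(radius_mm: float) -> float:
    """Approximate user-data rate at a radius."""
    return cav_speed_x(radius_mm) * DVD_1X_BYTES_PER_S


if __name__ == "__main__":
    for r in (INNER_RADIUS_MM, OUTER_RADIUS_MM):
        print(f"r={r} mm: {cav_speed_x(r):.2f}x, {cav_rate_bytes_per_s(r) / 1e6:.2f} MB/s")
```

Under these assumptions the drive reads at roughly 5x near the inner edge and about 16.6 MB/s at the outer edge, so data placed toward the outside of the disc streams fastest.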

5.3 [NB6]HDD

The HDD is based on a cost-optimized small form factor drive.


General Specifications:
Form-Factor: Standard 2.5”[NB7]
Capacity: 20GB or more
Interface: SATA 1.0
Speed: Comparable to industry metrics of 10ms average seek time, 20MB/s transfer rate

5.4 [NB8]MU

The console has two MU-specific slots on the front panel.


General Specifications:
Capacity: 64MB to 1GB[NB9]
Interface: USB 2.0 logical interface, custom slot connector
Durability: 300k write/erase cycles[NB10]
Random Access Time: 25ms
Read Speed: 8MB/s
Write speed: 1MB/s
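The durability and write-speed figures translate into a rough write budget. The sketch assumes ideal wear leveling (every block erased equally often), which real flash controllers only approximate, and treats 1 MB as 10^6 bytes since the spec does not say; the results are upper bounds, not guarantees, and the function names are illustrative.

```python
MB = 1_000_000  # decimal megabyte (assumption)

ERASE_CYCLES = 300_000          # from the spec
WRITE_BYTES_PER_S = 1 * MB      # from the spec: 1MB/s write speed


def endurance_budget_bytes(capacity_bytes: int, cycles: int = ERASE_CYCLES) -> int:
    """Upper bound on total bytes written over the unit's life (ideal wear leveling)."""
    return capacity_bytes * cycles


def full_overwrite_seconds(capacity_bytes: int, rate: int = WRITE_BYTES_PER_S) -> float:
    """Time to rewrite the whole unit once at the specified write speed."""
    return capacity_bytes / rate


if __name__ == "__main__":
    for cap_mb in (64, 1000):
        cap = cap_mb * MB
        print(f"{cap_mb} MB: {endurance_budget_bytes(cap) / 1e12:.1f} TB budget, "
              f"{full_overwrite_seconds(cap):.0f} s per full overwrite")
```

For the smallest 64 MB unit this gives a budget of about 19.2 TB and 64 s to overwrite the unit once.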

5.5 Game Controllers

The console architecture supports connectivity to both wired and wireless game pads designed
specifically for Xenon.
Wired Gamepad:
Interface: X-USB, low-speed USB signaling with expanded payload capability
Power: 4 USB unit loads or less (one USB 2.0 unit load is 100 mA, so at most 400 mA)
Connector: Standard USB connector
[NB11]Wireless Gamepad:
This section should focus on how the wireless radio interconnects to the system via USB. There
should be reference to the remote resume requirement.
Wireless Interface: Custom 2.4GHz spread spectrum, half-duplex transceiver
Wired Interface: Used during recharging, X-USB

5.6 [NB12]Network

The console includes an Ethernet network port:


Connector: RJ-45 with integrated LED indicators for link and activity status.[NB13]
Connection: 10Mbit and 100Mbit Ethernet
Peer-to-peer connectivity: Auto-MDIX supports peer-to-peer connectivity without hub or
crossover cable.

5.7 Peripheral Expansion

Expansion of the base console and connection to peripheral devices is via three USB 2.0 ports, two
mounted on the front panel, one mounted on the rear panel.

6 Distributed Components

6.1 Audio

A description of the distributed Audio processing can be found in the following document.

\\xenon\specs\Architecture\Xenon Audio.doc

6.2 Video

A description of the distributed Video processing can be found in the following document.

\\xenon\specs\Architecture\Xenon Video.doc

7 Key Architectural Mechanisms


7.1 System Dataflow

This document discusses the main producer / consumer models of the system.

\\xenon\specs\Architecture\Xenon System Dataflow.doc

7.2 System Memory Map

The System Memory Map is maintained in 2 separate documents:


\\xenon\specs\Architecture\System Memory Map (32 bit).doc
\\xenon\specs\Architecture\System Memory Map (42 bit).doc

7.3 System Coherence

\\xenon\specs\Architecture\Xenon System Coherence Model.doc

7.4 System Interrupt Mechanism

\\xenon\specs\Architecture\Xenon Interrupt Specification.doc

7.5 System Ordering

\\xenon\specs\Architecture\Xenon System Ordering.doc

2-6-04 Ordering
I’ve extracted the relevant part of Hartog’s response on this (long and tortuous) thread:

I think that the implication of the behavior we're discussing here is that the GPU and the CPU cannot reliably use a flag
in memory to indicate the arrival of some chunk of data from the CPU, since if the GPU polls such a flag, it can see it
set before the data is actually delivered. This occurs because the in-order property for writes is not preserved for
locations that are in different banks (channels), and because there is no concept of a "conflicting read" for reads issued
by the GPU. I think the question for us is whether the GPU ever needs to do such polling. Right?
[SH] Exactly.
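The hazard in the quoted thread can be shown with a toy model: the CPU issues a data write and then a flag write, the two land in different memory channels with different completion latencies, and a poller that is not ordered against them (such as the GPU) sees the flag before the data. The channel assignments and latencies are hypothetical; the only point is that in-order completion within a channel does not give ordering across channels.

```python
from dataclasses import dataclass

# Hypothetical completion latency per memory channel (arbitrary time units).
CHANNEL_LATENCY = {0: 10, 1: 2}


@dataclass
class Write:
    addr: str
    value: int
    channel: int
    issue_time: int

    def completes_at(self) -> int:
        return self.issue_time + CHANNEL_LATENCY[self.channel]


def visible_memory(t: int, writes: list) -> dict:
    """Memory contents an unordered poller would read at time t."""
    mem = {"data": 0, "flag": 0}
    for w in sorted(writes, key=Write.completes_at):
        if w.completes_at() <= t:
            mem[w.addr] = w.value
    return mem


if __name__ == "__main__":
    # Program order on the CPU: write the data, then set the flag.
    writes = [Write("data", 42, channel=0, issue_time=0),
              Write("flag", 1, channel=1, issue_time=1)]
    for t in range(0, 12, 2):
        mem = visible_memory(t, writes)
        hazard = mem["flag"] == 1 and mem["data"] != 42
        print(t, mem, "<- flag set, data not yet delivered" if hazard else "")
```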

7.6 System Security

\\xenon\specs\Architecture\Xenon System Security Scheme.doc

7.7 System Endianness

\\xenon\specs\Architecture\Xenon Southbridge Endianness.doc

7.8 System Clocking

\\xenon\specs\Architecture\Xenon System Clocking.doc

Version: 0.40 Xenon System Architecture

7.9 System Boot

<SMC Boot Sequence + FLASH Swizzle>


How does the kernel boot? What is the boot sequence? What systems come up in what order? What
resets get released? How does hardware get initialized with hardware drive strengths? Everything
up to high.
We are not using a standard BIOS.
The CPU must be able to execute out of flash; i.e., a path must exist that allows the CPU to access code
from the system flash connected to the Southbridge. This implies that, out of reset, the hardware
must be in a default state that (a) enables this path and (b) allows this path to be reliable. This is
required before any internal registers within the CPU, GPU or Southbridge can be set.

1. The SMC boots, then waits for a power-up event


a. When power is applied to the console, the SMC comes out of reset and the SMC
boot loader, which is hard coded in a ROM inside the Southbridge, copies the
SMC code from the system flash into a code RAM located inside the Southbridge.
The System Flash Controller is responsible for correcting errors that occur during
the copy from the system flash to the internal code RAM. Any unrecoverable errors
shall be handled by the SMC boot loader.
b. The SMC boot loader restores the SMC to its reset state and then jumps to the code
RAM.
2. The SMC detects a power-up event and boots the system
a. The SMC enables the system clocks and voltage regulators.
b. The SMC releases the SB, and then the GPU, from reset. It monitors the BSB link
training and, on its completion, releases the CPU from reset.
3. The CPU trains the FSB link and starts executing the 1BL (first boot loader) code from the
internal ROM. The code in the internal ROM shall have the minimum device dependent
settings and information to allow loading and execution of the 2BL from the system flash.

The CPU, GPU and SB shall support a mechanism to allow fetching and execution of the
1BL from the system flash. This mechanism shall be functional without any configuration or
setup.
4. The 2BL sets up the hardware needed to bring up memory, plus anything that improves ROM
execute-in-place (XIP) performance. Once memory is up and running, the 2BL copies the 3BL into
memory and jumps to it.
5. The 3BL uses the System Flash controller’s DMA interface to copy the kernel from the
system flash into memory. It patches and verifies the memory image. Again, the system flash
interface in the SB deals with the idiosyncrasies of NAND flash. Incidentally, slightly
less than 8 MB is actually available to software because parts ship with bad blocks.
6. The CPU jumps to the kernel now in memory.
7. Main system boot phase. Note that none of what was described deals with encryption or
security in the Southbridge or the system flash itself. I believe this is consistent with
Dinarte’s preferred scheme, but I need to parse the latest mail to verify it.

7.10 System Reset

This section should focus on the aspects of the reset sequence that are relevant to the architecture. It is
my opinion that the details of the reset sequence are entirely design-specific.
System Reset is controlled by the System Management Controller. The various reset signals are
configured in a star topology, with the SMC as the hub. The reset sequence follows the power-
supply sequence, and consists of sequentially releasing reset to various components, verifying an
acknowledgement, and proceeding to the next stage.
The detailed power-on reset diagram is located in
\\xenon\Specs\Hardware\Xenon Power on Reset.vsd
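The sequential release-and-acknowledge scheme above (and the order given in step 2 of the boot sequence: SB, then GPU, then wait for link training, then CPU) can be sketched as an ordered list of stages. The stage names, the acknowledgement callbacks and the timeout are hypothetical stand-ins; actual timing is defined in the power-on reset diagram.

```python
import time
from typing import Callable, List, Tuple

# (component to release from reset, check that returns True once it acknowledges)
Stage = Tuple[str, Callable[[], bool]]


def run_reset_sequence(stages: List[Stage],
                       release: Callable[[str], None],
                       timeout_s: float = 0.1,
                       poll_s: float = 0.001) -> List[str]:
    """Release components one at a time, waiting for each acknowledgement.

    Raises TimeoutError if a stage never acknowledges; later components
    are then left in reset.
    """
    released: List[str] = []
    for name, acknowledged in stages:
        release(name)
        released.append(name)
        deadline = time.monotonic() + timeout_s
        while not acknowledged():
            if time.monotonic() > deadline:
                raise TimeoutError(f"no acknowledgement from {name}")
            time.sleep(poll_s)
    return released


if __name__ == "__main__":
    # Stub acknowledgements; for the GPU stage this would stand in for
    # completion of BSB link training.
    order = run_reset_sequence(
        [("SB", lambda: True), ("GPU", lambda: True), ("CPU", lambda: True)],
        release=lambda name: print("release", name),
    )
    print(order)
```

Halting on a missing acknowledgement, rather than continuing, mirrors the star topology: the SMC never releases a downstream component until the upstream one has reported in.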

7.11 System Time

The system time shall be maintained in a forty bit counter.


The time base for the counter shall be one millisecond.
The tolerance shall be below +/- 50 ppm over the life of the system.[NB14]
The counter is reset when the system is in the Off state. The time shall be maintained in the
Standby power state.
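Two consequences of these requirements are easy to check numerically: how long a 40-bit millisecond counter runs before it wraps, and how much error the ±50 ppm tolerance allows. The sketch assumes a 365.25-day year; the function names are illustrative.

```python
COUNTER_BITS = 40
TICK_S = 1e-3            # 1 ms time base
TOLERANCE_PPM = 50
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def wrap_years(bits: int = COUNTER_BITS, tick_s: float = TICK_S) -> float:
    """Years until the counter rolls over."""
    return (2 ** bits) * tick_s / SECONDS_PER_YEAR


def worst_case_drift_s(interval_s: float, ppm: float = TOLERANCE_PPM) -> float:
    """Maximum accumulated time error over an interval at the given tolerance."""
    return interval_s * ppm * 1e-6


if __name__ == "__main__":
    print(f"wraps after {wrap_years():.1f} years")                               # 34.8
    print(f"drift <= {worst_case_drift_s(SECONDS_PER_DAY):.2f} s/day")           # 4.32
    print(f"drift <= {worst_case_drift_s(SECONDS_PER_YEAR) / 60:.1f} min/year")  # 26.3
```

So 40 bits comfortably exceed the life of the console, while the tolerance still permits the clock to drift by up to about 26 minutes per year.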

7.12 Power States & Power Management

The three power [NB16]states for the system are: “Off”, “Standby”, and “Operational”.
To reduce overall system power consumption, power minimization techniques may be employed in
all states.
This section should describe the aspects in which the architecture needs flexibility in order to
support quiet mode operation:
1. CPU cores must be capable of being individually enabled/disabled
2. The GPU must have the capability of disabling shaders
3. The CPU shall have a selectable clock frequency between “Fast” and “Slow”

Software will need to go through a rigid sequence of operations to change power modes gracefully
(there is no automatic hardware power sequencing control).

7.12.1 Off
This is the state of the system when the power supply is not plugged into the wall and/or the power
supply is not plugged into the system.
When the system power supply is plugged in and the power supply is connected to the system, the
system shall transition to the Standby state.
Regardless of how briefly the system is in this state, on application of
power it shall transition to the Standby state without operator intervention, subject to certain ESD
and susceptibility limitations.
7.12.2 Standby
This state shall be entered from either Off, Quiet or Full Power states.

In this state, the functions that shall be powered include: The clock generator for the SMC, the
SMC, the SMC firmware store, the front panel button circuitry, the expansion ports, the power
circuit in the AVIP, the IR receiver and demodulation block, the wired and wireless controller ports.
On detection of any power up event from the front panel button circuitry, the AVIP, the IR
demodulator, the wired and wireless controller ports, or the SMC power cycle timer, the system
shall transition to the Full Power state.
For details on how the system handles power events via the expansion port, see section 7.12.4.2.
In the event of a power interruption that is less than 16 milliseconds in duration, the system shall
remain in this state and continue to operate as if no power interruption occurred. If the power
interruption is longer than 16 milliseconds, the system may transition to the Off state.[NB18]
7.12.3 Operating States
This section should take the place of the Quiet and Full Power states below, and focus on the
configurations of the system to implement various power states. The degrees of freedom are:
• CPU Cores enabled {1, 2, 3}
• GPU Shaders enabled {1, 4, 12}
• ODD spin max speed: {slow, med, fast}
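The degrees of freedom listed above define a small configuration space. The sketch enumerates it and shows how a named power state could map onto one point in it; the specific settings assigned to Quiet and Full Power below are purely illustrative, since this section does not yet define them.

```python
from itertools import product
from typing import NamedTuple

CPU_CORES = (1, 2, 3)
GPU_SHADERS = (1, 4, 12)
ODD_SPEEDS = ("slow", "med", "fast")


class OperatingConfig(NamedTuple):
    cpu_cores: int
    gpu_shaders: int
    odd_max_speed: str


ALL_CONFIGS = [OperatingConfig(*c) for c in product(CPU_CORES, GPU_SHADERS, ODD_SPEEDS)]

# Hypothetical mapping of named power states onto the configuration space.
NAMED_STATES = {
    "Full Power": OperatingConfig(3, 12, "fast"),
    "Quiet": OperatingConfig(1, 1, "slow"),
}

if __name__ == "__main__":
    print(len(ALL_CONFIGS), "configurations")  # 27
    assert all(cfg in ALL_CONFIGS for cfg in NAMED_STATES.values())
```

With three independent three-way choices there are 27 combinations, which is small enough to characterize exhaustively for power and acoustics.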
7.12.3.1 Quiet
This state shall be entered from Full Power state only.
This state is designed for A/V playback and wireless game pad charging. The goal is to minimize
system acoustic level. This shall be accomplished by reducing the system power consumption to a
minimum. The CPU and NB shall include circuitry that allows portions of the chips to be turned off or
slowed down, either under program control or by internal circuit usage determination functions.
The SB may include circuitry that allows portions of the chip to be turned off or slowed down, either
under program control or by internal circuit usage determination functions.
On detection of a power down event, the system shall transition to the Standby state.
On detection of loss of power, the system software shall put the system in a safe state and
immediately transition to the Standby state.
The system software may transition the system to Full Power state based on user input.
The system software shall monitor transitions between this state and Full Power state and ensure
that they occur at a frequency that is not annoying to the user.
7.12.3.2 Full Power
This state shall be entered from Standby or Quiet state.
In this state, the system shall operate at its maximum capabilities.
On detection of a power down event, the system shall transition to Standby state.
On detection of loss of power, the system software shall put the system in a safe state and
immediately transition to the Standby state.
The system software may transition the system to Quiet state based on user input.
The system software shall monitor transitions between this state and Quiet state and ensure that
they occur at a frequency that is not annoying to the user.
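The "shall be entered from" rules in sections 7.12.1 through 7.12.3.2 amount to a small transition table. The sketch below encodes that table plus the 16 ms ride-through rule for Standby. State names follow this document; the function names, and the choice to treat an interruption of 16 ms or more as a definite move to Off (the text only says "may"), are assumptions.

```python
from enum import Enum


class PowerState(Enum):
    OFF = "Off"
    STANDBY = "Standby"
    QUIET = "Quiet"
    FULL_POWER = "Full Power"


# Allowed transitions, derived from the "shall be entered from" rules above.
TRANSITIONS = {
    PowerState.OFF: {PowerState.STANDBY},
    PowerState.STANDBY: {PowerState.FULL_POWER, PowerState.OFF},
    PowerState.QUIET: {PowerState.STANDBY, PowerState.FULL_POWER},
    PowerState.FULL_POWER: {PowerState.STANDBY, PowerState.QUIET},
}

RIDE_THROUGH_MS = 16


def transition(current: PowerState, target: PowerState) -> PowerState:
    """Return the new state, rejecting transitions the spec does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


def standby_after_interruption(duration_ms: float) -> PowerState:
    """Standby rides through interruptions shorter than 16 ms.

    Treating 16 ms or more as a move to Off is an assumption: the text only
    says the system may transition to Off for longer interruptions.
    """
    if duration_ms < RIDE_THROUGH_MS:
        return PowerState.STANDBY
    return transition(PowerState.STANDBY, PowerState.OFF)
```

Note that Quiet is reachable only from Full Power; a Standby-to-Quiet request is rejected.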
7.12.4 Power Control Events[NB19]
This section documents the power control events that can be generated via the front panel button
circuitry, AVIP, the IR receiver and demodulator, the USB expansion port, and the wireless
controller ports[HS20]. The event detection is done by a different subsystem depending on the
current state of the system.
Since there is the possibility of multiple power control events occurring simultaneously, the SMC
shall implement the following behavior:
• On detection of the first power up event, the SMC shall ignore all power control events for
a period of 500 milliseconds.
• The SMC shall report the first power up event.
• Need to work with usability to create a chart that has priorities, multiple sources, etc. For
example, box is in standby, user presses and holds remote control on button and then
presses the power button on the console. Should the console go to full power (remote),
then transition to standby (front panel button), and then go back to full power (remote)?
• I have put verbiage below on power transitions; however, keep in mind that this verbiage
will be modified once we have the full chart from usability.
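The first two bullets describe a simple lockout: the first power-up event wins, and all power control events are ignored for the following 500 ms. The sketch models only that rule (the priority chart from usability is still open); the class, its methods and the millisecond timestamps are hypothetical.

```python
from typing import Optional

LOCKOUT_MS = 500


class PowerEventArbiter:
    """First power-up event wins; all power control events are ignored for 500 ms after it."""

    def __init__(self) -> None:
        self._lockout_until_ms: Optional[float] = None
        self.reported_source: Optional[str] = None

    def on_event(self, source: str, now_ms: float) -> bool:
        """Return True if the event is accepted, False if it falls in the lockout window."""
        if self._lockout_until_ms is not None and now_ms < self._lockout_until_ms:
            return False
        self._lockout_until_ms = now_ms + LOCKOUT_MS
        self.reported_source = source  # the source the SMC later reports
        return True


if __name__ == "__main__":
    arbiter = PowerEventArbiter()
    arbiter.on_event("IR remote", now_ms=0)         # accepted
    arbiter.on_event("power button", now_ms=120)    # ignored: inside 500 ms
    print(arbiter.reported_source)                  # IR remote
```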
7.12.4.1 Front Panel Buttons
The front panel consists of two momentary push buttons: Power Control Button, DVD Tray Control
Button.

7.12.4.1.1 Power Control Button


This button shall be used to toggle the power of the system from Standby to Full Power and from
either Full Power or Quiet to Standby.
When this button is pressed and the system is in Standby mode, the SMC shall note the source of
the power up event is the power control button, transition the system to Full Power state and when
requested by the system software, provide the power event source.
When this button is pressed and the system is in Quiet or Full Power state, the SMC shall note the
source of the power down event is the power control button, send a message to the system
software and wait for a response. If the system software doesn’t respond within 5 seconds, the
SMC shall transition the system to the Standby state.
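The same notify-and-wait pattern recurs for the AVIP and IR power-down events below: the SMC records the source, messages the system software and forces Standby if no response arrives within 5 seconds. In this sketch a `threading.Event` stands in for the software response; the real message path between SMC and software is not specified here, and all names are hypothetical.

```python
import threading
from typing import Callable

SOFTWARE_TIMEOUT_S = 5.0


def handle_power_down(source: str,
                      notify_software: Callable[[str], None],
                      software_ack: threading.Event,
                      enter_standby: Callable[[str], None],
                      timeout_s: float = SOFTWARE_TIMEOUT_S) -> bool:
    """Return True if software responded in time, False if the SMC forced Standby."""
    software_ack.clear()          # clear before notifying so a fast reply is not lost
    notify_software(source)
    if software_ack.wait(timeout_s):
        return True               # software responded; it is assumed to sequence the change
    enter_standby(source)         # no response: the SMC transitions to Standby itself
    return False
```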

7.12.4.1.2 DVD Tray Control Button


Insert DVD tray control info. here.
7.12.4.2 Expansion Port
A device plugged into any of the front or rear USB expansion ports may signal for a power up event
by signaling USB remote wakeup. The SMC shall monitor the state of the DP/DN signals on each
port, and look for either a J to K or a K to J state transition, which indicates a remote wakeup has
been signaled.
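J and K are differential line states whose polarity depends on device speed: for a full-speed device J is D+ high and D- low, for a low-speed device it is the reverse, and both lines low is SE0. The sketch classifies sampled (DP, DN) pairs and reports a J-to-K or K-to-J transition as the SMC is described doing. Because the two speeds use opposite polarities, accepting a transition in either direction works without knowing which kind of device is attached. The sampling interface is hypothetical, and real detection would also need filtering of short glitches.

```python
from typing import Iterable, Optional, Tuple


def line_state(dp: int, dn: int, low_speed: bool = False) -> str:
    """Map sampled D+/D- levels (0 or 1) to a USB line state."""
    if dp == 0 and dn == 0:
        return "SE0"
    if dp == 1 and dn == 1:
        return "SE1"  # not a valid signaling state
    # Full speed: J = D+ high, D- low. Low speed inverts the J/K polarity.
    is_j = (dp == 1) != low_speed
    return "J" if is_j else "K"


def remote_wakeup_seen(samples: Iterable[Tuple[int, int]], low_speed: bool = False) -> bool:
    """True if consecutive samples show a J-to-K or K-to-J transition."""
    prev: Optional[str] = None
    for dp, dn in samples:
        state = line_state(dp, dn, low_speed)
        if prev in ("J", "K") and state in ("J", "K") and state != prev:
            return True
        prev = state
    return False
```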
Standard USB devices only signal remote wakeup if they have been enumerated, have had that feature
enabled, and have subsequently been suspended. The wired game pad shall be designed such that
remote wakeup may be signaled at any time following connection; i.e., the wired game controller
does not need to be preconfigured to allow remote wakeup signaling.
While the system architecture supports transitions from Quiet state or Full Power state to Standby
state, the implementation is heavily dependent on the device, the device protocol and the system
software, and is beyond the scope of this document.
7.12.4.3 AVIP Power Control
There is a dedicated signal from the AV interface port that is used to enable external devices to
issue power control requests as well as issue events to the SMC that require further action. The
PWRON signal is pulled up to 3.3V standby power on the motherboard and is an input to the SMC
and output to the AVIP. When the system is in Standby state, the SMC shall monitor the PWRON
signal for a low value. A power on sequence is a low value for >40 msec. Upon detection of the
power up sequence, the SMC shall note the source of the power up event is the AVIP power up
command, transition the system to Full Power state and when requested by the system software,
provide the power event source.
In Quiet or Full power states, the SMC shall continue to monitor the PWRON signal. A low value
sequence that is ~40msec in length is used to indicate that the SMC needs to issue a transaction
on the DDC interface on AVIP to go read further control/status information. One of the control
modes returned could be a power down request. Also, a low value sequence that is ~200msec or
more is used to indicate a forcible power down state. Upon detection of either of the power down
sequences, the SMC shall note the source of the power down event is the AVIP power down
command, send a message to the system software and wait for a response. If the system software
doesn’t respond within 5 seconds, the SMC shall transition the system to the Standby state.
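The PWRON rules above classify a low pulse by its duration and by the current power state. The sketch uses the durations from the text; because the text gives only approximate values (~40 ms, ~200 ms), the exact boundaries and the decision to ignore shorter pulses are assumptions. Any power-down request returned by the follow-up DDC read is not modelled.

```python
from enum import Enum


class PwronAction(Enum):
    IGNORE = "ignore"
    POWER_UP = "power up"
    DDC_QUERY = "read control/status over DDC"
    FORCED_POWER_DOWN = "forced power down"


# Boundaries taken from the text; the tolerance on "~40" and "~200" is unspecified.
POWER_UP_MIN_MS = 40
DDC_QUERY_MIN_MS = 40
FORCED_DOWN_MIN_MS = 200


def classify_pwron_low(duration_ms: float, in_standby: bool) -> PwronAction:
    """Classify a low pulse on PWRON by its duration and the current state."""
    if in_standby:
        return PwronAction.POWER_UP if duration_ms > POWER_UP_MIN_MS else PwronAction.IGNORE
    # Quiet or Full Power states.
    if duration_ms >= FORCED_DOWN_MIN_MS:
        return PwronAction.FORCED_POWER_DOWN
    if duration_ms >= DDC_QUERY_MIN_MS:
        return PwronAction.DDC_QUERY
    return PwronAction.IGNORE
```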
If auto power-on is required, a controller connected to the AVIP interface could use a relay (wired such that
when power is not applied, the signal is shorted to ground) to ground the signal, which the SMC
samples as a power-on request. +5V then becomes available to the controller, which energizes the relay
to isolate the signal from ground and allow the controller to drive the signal. If no auto power-on is
needed, a momentary switch could be connected to PWRON to pull the signal to ground for power-up.
The SMC maintains the power-on state so that when the machine is commanded to shut down
(and the +5V goes away, which causes the relay to short PWRON to ground), the SMC
ignores the fact that PWRON is pulled low (so there is no endless repetition). Similarly,
because enabling +5V may take some time, the SMC waits to see PWRON go high on a
transition from Standby to power-on. If that does not happen within 5 seconds, the SMC logs an
error event and powers down the machine until it sees the PWRON signal asserted.
The AVIP Power Control protocol is documented in the Xenon Design Specification.
7.12.4.4 IR Receiver and Demodulator
This block is different from other blocks in that the output of this block has to be processed to
determine whether a power control event has occurred.
When the system is in Standby state, this block shall decode the signal from the IR receiver and
output it to the SMC. The SMC shall interpret the command and compare it to the IR power up
command. If the received command matches the IR power up command, the SMC shall note the
source of the power up event is the IR power up command, transition the system to Full Power
state and when requested by the system software, provide the power event source.
In Quiet or Full power states, this block shall continue to decode the signal from the IR receiver and
output it to the SMC. The SMC shall interpret the command and compare it to the IR power down
command. If the received command matches the IR power down command, the SMC shall note the
source of the power down event is the IR receiver, send a message to the system software and
wait for a response. If the system software doesn’t respond within 5 seconds, the SMC shall
transition the system to the Standby state.
7.12.4.5 Wired Controller Ports
In a typical wired system topology there are one or more game pads that are plugged into the wired
controller ports on the console – one game pad per controller port.
Please note that throughout this section the term game pad refers to the device that the user
manipulates and the term controller port refers to the hardware in the console that the game pad
connects to.
The wired game pads, controller ports, the SMC and system software shall be designed to allow
control of the system power states via the XUSB protocol. It shall not require any additional signal
wires or methods. The XUSB protocol is a derivative of the USB protocol in which the data bit time
has been increased and intelligent downstream traffic routing is implemented. In the USB protocol,
downstream traffic is broadcast to all leaf nodes of the USB tree; with intelligent downstream traffic
routing, it is sent only to the specific leaf for which the traffic is intended. Both of these changes result
in lower emissions and allow longer cables.
When the system is in Standby state, the controller port shall be powered and be able to detect
XUSB protocol based connect, disconnect and resume signaling from each wired controller port
and output it to the SMC; the state of the output to the system software is undefined since the
system software is not running in this state.
The SMC shall interpret the command using XUSB protocol connect, disconnect and resume
timing to determine whether a game pad is present and, if so, whether it is requesting that the system be powered up.
On determination that a power up event has occurred via the controller port, the SMC shall note the
particular controller port that is the source of the power up event, transition the system to Full
Power state and, when requested by the system software, provide the power event source.
When the system enters Standby state:
• If there is no game pad plugged into a controller port, the XUSB signal lines shall be at a
single-ended 0 (SE0) state and the output to the SMC shall indicate the controller port
state is disconnected.
• If the user plugs in a game pad to the controller port and the power up button sequence
has not been activated, the game pad shall drive XUSB signal lines to the J state. This
behavior is the same when the controller is plugged into a system that is in Quiet or Full
power states. Furthermore, once the game pad detects that the bus is idle (J state with no
traffic) for the XUSB suspend time duration, it shall transition to a low power state. In this
low power state, the game pad shall be able to monitor its buttons for the power up button
sequence and perform XUSB resume signaling.
• If there is a game pad plugged into a controller port and the power up button sequence has
not been activated, the controller shall drive the XUSB signal lines to the suspend state (J
state) and the output to the SMC shall indicate the controller port state is connected.
• If there is a game pad plugged into a controller port and the power up button sequence is
activated, the game pad shall generate XUSB resume signaling by transitioning the XUSB
signal lines to the resume state (K state) for the XUSB resume time duration. The controller
port shall detect the resume state and the output to the SMC shall indicate the game
controller port state is resume.
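The bullets above reduce to a mapping from the line state on each controller port to the port's output to the SMC (SE0 means disconnected, J means connected and idle, K held for the resume time means resume), after which the SMC notes which port signaled. The sketch assumes a placeholder value for the XUSB resume time duration, which this document does not give; the names are hypothetical.

```python
from enum import Enum
from typing import List, Optional


class PortState(Enum):
    DISCONNECTED = "disconnected"
    CONNECTED = "connected"
    RESUME = "resume"


def port_output(line_state: str, k_duration_ms: float = 0.0,
                resume_min_ms: float = 1.0) -> PortState:
    """Output of one controller port to the SMC while the system is in Standby.

    line_state is "SE0", "J" or "K". resume_min_ms is a placeholder for the
    XUSB resume time duration.
    """
    if line_state == "SE0":
        return PortState.DISCONNECTED
    if line_state == "J":
        return PortState.CONNECTED  # idle or suspended game pad
    if line_state == "K":
        # K held long enough is resume signaling; a shorter K is treated as noise.
        return PortState.RESUME if k_duration_ms >= resume_min_ms else PortState.CONNECTED
    raise ValueError(f"unexpected line state {line_state!r}")


def power_up_source(outputs: List[PortState]) -> Optional[int]:
    """Index of the first controller port signaling resume, if any."""
    for port, state in enumerate(outputs):
        if state is PortState.RESUME:
            return port
    return None
```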
When the system is in Quiet or Full power states, this block shall continue to detect the connect
and disconnect signaling and output it to the system software. While it may continue to output
these states to the SMC, the SMC shall ignore this output.
In the Quiet and Full power states the power sequencing is controlled by the system software in
concert with the game pad. When the user activates the Standby state button sequence, the
controller shall send the button sequence to the system software. The system software shall
process the sequence and this processing may ask the user for confirmation of the power state
change and after confirmation, initiate the process of changing the power state to the Standby
state.
In the event the system software is non-responsive to the Standby state button sequence, the
sequence will be lost and the system power state will not be alterable via this power control event.
Since the button sequence is sent as part of an XUSB data transfer, the controller port shall not
incorporate any logic to detect it.
7.12.4.6 Wireless Controller
In a typical system topology, there are one or more wireless game pad devices that connect via a
bidirectional wireless link to the wireless transceiver in the console. The wireless transceiver
connects to a wireless controller port via a USB interface. One notable difference between wired
and wireless controllers is that in the wired system, there is one controller port per game pad. In the
wireless system, all the game pads share the same wireless link and connect to a single
transceiver which connects to a wireless controller port in the console.
Please note that throughout this section the term game pad refers to the wireless game pad device
that the user manipulates and the term controller port refers to the wireless controller port in the
console.
There are two ways to do this: one is to use USB suspend and resume, as the wired controller does,
and the other is to use a separate wakeup signal from the transceiver.

7.12.4.6.1 USB Power Control


The game pads, link, transceiver, controller port, the SMC and system software shall be designed
to allow control of the system power states via the USB protocol. It shall not require any additional
signal wires or methods.
When the system is in Standby state, the controller port shall be powered and be able to detect
USB protocol based resume signaling from the transceiver and output it to the SMC; the state of
the output to the system software is undefined since the system software is not running in this
state.
The SMC shall interpret the command using USB protocol resume timing to determine if the
transceiver is requesting that the system be powered up. On determination that a power up event
has occurred via the controller port, the SMC shall note that the wireless controller port is the
source of the power up event, transition the system to Full Power state and when requested by the
system software, provide the power event source.
When the system is in Standby state:
• The transceiver shall attempt to establish a link with game pad(s).
• While the transceiver hasn’t established a link with a game pad, it shall drive the USB
signal lines to the J state. Once the transceiver detects that the bus is idle (J state with no
traffic) for the USB suspend time duration, it shall transition to a low power state. In this low
power state, the transceiver shall be able to establish a link with game pad(s) and perform
USB resume signaling.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has not been detected, it shall continue to drive the USB signal lines to the J
state. Once the transceiver detects that the bus is idle (J state with no traffic) for the USB
suspend time duration, it shall transition to a low power state. In this low power state, the
transceiver shall be able to maintain the link with game pad and establish links with other
game pad(s) and perform USB resume signaling.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has been activated, the transceiver shall generate USB resume signaling by
transitioning the USB signal lines to the resume state (K state) for the USB resume time
duration. The controller port shall detect the resume state and the output to the SMC shall
indicate the controller port state is resume.
In the Quiet and Full power states the power sequencing is controlled by the system software in
concert with the game pad. When the user activates the Standby state button sequence, the game
pad shall send the button sequence to the system software. The system software shall process the
sequence and this processing may ask the user for confirmation of the power state change and
after confirmation, initiate the process of changing the power state to the Standby state.
In the event the system software is non-responsive to the Standby state button sequence, the
sequence will be lost and the system power state will not be alterable via this power control event.

7.12.4.6.2 Dedicated Power Control

The game pads, link, transceiver, controller port, the SMC and system software shall be designed
to allow control of the system power states via a separate set of signals between the transceiver
and the SMC instead of using the USB signaling between the transceiver and the controller port.
When the system is in Standby state, the controller port may be powered down and does not need
to detect USB protocol based resume signaling. If the controller port is powered down, the
transceiver connection to the controller port shall also be powered down.
The transceiver and SMC power control signals consist of a console standby power state signal
(TRAN_STANDBY) that goes from the SMC to the transceiver and a system resume signal
(TRAN_RESUME) from the transceiver to the SMC. The TRAN_STANDBY signal indicates to the
transceiver whether it should monitor the links for power up button sequences and relay the status
to the SMC via the TRAN_RESUME signal. On determination that a power up event has occurred
via the TRAN_RESUME signal, the SMC shall note that the wireless controller port is the source of
the power up event, transition the system to Full Power state and when requested by the system
software, provide the power event source.
When the system is in Standby state:
• The SMC shall assert TRAN_STANDBY
• On detection of TRAN_STANDBY, the transceiver shall transition to a low power state. In
this low power state, the transceiver shall be able to establish a link with game pad(s) and
assert TRAN_RESUME as appropriate.
• While the transceiver hasn’t established a link with a game pad, it shall negate
TRAN_RESUME.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has not been detected, it shall negate TRAN_RESUME. In this low power state,
the transceiver shall be able to maintain the link with the game pad, establish links with
other game pad(s) and assert TRAN_RESUME as appropriate.
• If the wireless transceiver establishes a link with a game pad(s) and the power up button
sequence has been activated, the transceiver shall pulse TRAN_RESUME for T_TRANRESUME.
The SMC shall detect the TRAN_RESUME pulse and initiate the transition of the system to
Full Power state.
In the Quiet and Full power states, the SMC shall negate the TRAN_STANDBY signal and ignore
the state of the TRAN_RESUME signal.
In the Quiet and Full power states the power sequencing is controlled by the system software in
concert with the game pad. When the user activates the Standby state button sequence, the game
pad shall send the button sequence to the system software. The system software shall process the
sequence and this processing may ask the user for confirmation of the power state change and
after confirmation, initiate the process of changing the power state to the Standby state.
In the event the system software is non-responsive to the Standby state button sequence, the
sequence will be lost and the system power state will not be alterable via this power control event.
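On the SMC side, the dedicated scheme reduces to watching TRAN_RESUME for a pulse while TRAN_STANDBY is asserted and ignoring it otherwise. The sketch assumes TRAN_RESUME is active high and periodically sampled; neither the polarity nor the sampling method is given here, and the function name is hypothetical. Requiring an observed low-to-high edge keeps a line that is already high (for example, stuck) from registering as a wake event.

```python
from typing import Iterable, Optional


def tran_resume_source(samples: Iterable[int], tran_standby: bool) -> Optional[str]:
    """Return the power-up source if a TRAN_RESUME pulse is seen, else None.

    Assumes TRAN_RESUME is active high and sampled periodically by the SMC.
    With TRAN_STANDBY negated (Quiet or Full Power) the signal is ignored.
    """
    if not tran_standby:
        return None
    prev: Optional[int] = None
    for level in samples:
        if prev == 0 and level == 1:  # rising edge marks the start of a pulse
            return "wireless controller port"
        prev = level
    return None
```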

7.13 CPU/GPU Synchronization (Nick)

This section should describe, at a high level, how the CPU and GPU are expected to communicate and
synchronize with one another. Most of the synchronization mechanisms are fairly standard with respect
to PC graphics, with the exception of the procedural geometry scheme discussed in the next section.

7.14 CPU – GPU Procedural Geometry Communication (Nick)

This section describes what the GPU hardware must do to implement the procedural geometry
algorithm. For further details of this algorithm, refer to the document “Xenon Procedural Geometry”.
Conceptually, the GPU’s main command list is stored in main memory. The CPU kicks off the
command processor in the GPU through a register write that points the command processor to a
memory address. The command processor starts fetching commands and data from this address
until a pre-programmed stop address is reached.

[Figure: GPU command processor (Start_addr, Curr_addr) feeding the rest of the GPU pipeline; CPU register writes to the GPU; GPU vertex data reads and write-back data to main memory; CPU vertex data writes to main memory.]

The current implementation of this concept is discussed later in this section.
First we will discuss the command processor commands.
7.14.1 Vertex commands
In addition to the traditional commands, the procedural geometry algorithm requires the following
commands.
CALL <addr>: GPU stores the current address and begins processing data at <addr>.
RETURN: GPU resumes executing at the address most recently stored by a CALL. Multiple levels
of CALL/RETURN (nesting) might be needed.
JUMP <addr>: GPU begins executing commands at <addr>.
WRITEBACK <addr> <data>: GPU writes <data> back to <addr>.
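The CALL/RETURN/JUMP/WRITEBACK semantics can be illustrated with a small interpreter. This is a hedged sketch only: the cmd_t encoding, the CMD_DRAW stand-in for the traditional commands, the four-entry return stack (CALL_DEPTH), and the choice to save the address following the CALL are all assumptions, since the text does not define packet formats or nesting depth.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command encoding; real packet formats are not defined here. */
typedef enum { CMD_DRAW, CMD_CALL, CMD_RETURN, CMD_JUMP, CMD_WRITEBACK } cmd_op;

typedef struct {
    cmd_op op;
    uint32_t addr;  /* target for CALL/JUMP, destination for WRITEBACK */
    uint32_t data;  /* payload for WRITEBACK */
} cmd_t;

#define CALL_DEPTH 4  /* assumed CALL/RETURN nesting limit */

typedef struct {
    uint32_t pc;                 /* models Current_addr */
    uint32_t stack[CALL_DEPTH];  /* saved return addresses */
    size_t depth;
    uint32_t draws;              /* traditional commands executed */
} cp_t;

/* Execute commands until pc reaches stop_addr. Addresses index cmds[];
 * wb_mem stands in for the memory targeted by WRITEBACK.
 * Returns false on stack overflow/underflow or an out-of-range address. */
static bool cp_run(cp_t *cp, const cmd_t *cmds, size_t ncmds,
                   uint32_t stop_addr, uint32_t *wb_mem, size_t nwb)
{
    while (cp->pc != stop_addr) {
        if (cp->pc >= ncmds)
            return false;
        const cmd_t *c = &cmds[cp->pc];
        switch (c->op) {
        case CMD_DRAW:
            cp->draws++;
            cp->pc++;
            break;
        case CMD_CALL:
            if (cp->depth == CALL_DEPTH)
                return false;
            cp->stack[cp->depth++] = cp->pc + 1;  /* resume after the CALL */
            cp->pc = c->addr;
            break;
        case CMD_RETURN:
            if (cp->depth == 0)
                return false;
            cp->pc = cp->stack[--cp->depth];
            break;
        case CMD_JUMP:
            cp->pc = c->addr;
            break;
        case CMD_WRITEBACK:
            if (c->addr >= nwb)
                return false;
            wb_mem[c->addr] = c->data;
            cp->pc++;
            break;
        default:
            return false;  /* unknown opcode */
        }
    }
    return true;
}
```

A stack (rather than a single saved register) is what makes multiple levels of CALL/RETURN possible.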
7.14.2 GPU Memory mapped registers
Current_addr[31:0]: Address the vertex unit is currently reading.
Stop_addr[31:0]: The vertex unit stops processing data when Current_addr == Stop_addr.
Start_vertex_proc[0:0]: Kicks off the vertex unit, starting from Current_addr.
At the start of the frame, the CPU sets both Current_addr and Stop_addr to the start of the GPU push
buffer. The CPU then sets Start_vertex_proc. The vertex unit is now ready for data and waits until the
CPU changes the Stop_addr register. Once it changes, the vertex unit starts fetching at Current_addr
and continues until Current_addr reaches Stop_addr.
In the meantime, the CPU continuously updates Stop_addr as more data is written into
memory.
Note that if the GPU reaches Stop_addr, it simply waits there until the CPU updates Stop_addr;
Start_vertex_proc does not need to be set again.
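The Current_addr / Stop_addr handshake in 7.14.2 behaves like a single-producer, single-consumer queue. The sketch below models the three registers as a plain C struct in one thread; vu_regs, cpu_start_frame, cpu_publish, and gpu_fetch are hypothetical names, and real hardware would access these as memory-mapped registers concurrently. In a real driver the CPU must also make sure the vertex data is visible in memory before it advances Stop_addr.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the vertex-unit registers. */
typedef struct {
    uint32_t current_addr;   /* Current_addr: next address the vertex unit reads */
    uint32_t stop_addr;      /* Stop_addr: the vertex unit waits when it gets here */
    bool start_vertex_proc;  /* Start_vertex_proc: set once per frame */
} vu_regs;

/* CPU, start of frame: point both registers at the push buffer, then kick. */
static void cpu_start_frame(vu_regs *r, uint32_t push_buffer_base)
{
    r->current_addr = push_buffer_base;
    r->stop_addr = push_buffer_base;  /* nothing to fetch yet */
    r->start_vertex_proc = true;
}

/* CPU: after writing nbytes of new data, advance Stop_addr past it. */
static void cpu_publish(vu_regs *r, uint32_t nbytes)
{
    r->stop_addr += nbytes;
}

/* GPU: fetch in units of at most `unit` bytes until Current_addr == Stop_addr.
 * Returns the number of bytes fetched in this call. */
static uint32_t gpu_fetch(vu_regs *r, uint32_t unit)
{
    uint32_t fetched = 0;
    if (!r->start_vertex_proc || unit == 0)
        return 0;
    while (r->current_addr != r->stop_addr) {
        uint32_t n = r->stop_addr - r->current_addr;
        if (n > unit)
            n = unit;
        r->current_addr += n;  /* stand-in for reading n bytes of vertex data */
        fetched += n;
    }
    return fetched;
}
```

Start_vertex_proc is set once in cpu_start_frame and never again; the GPU simply stops fetching whenever Current_addr catches up with Stop_addr and resumes when the CPU publishes more data.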
7.14.3 Current Implementation
The algorithm used by the current implementation is described in “Xenon Procedural Geometry”. This
section describes how the GPU might implement it.

[Figure: Current implementation. GPU: command processor (Stop_addr, Curr_addr), the rest of the GPU pipeline, coherency block, and memory controller. CPU: CPU core, L2 cache, CPUC FIFOs 0 to 2, and writeback registers. GPU and CPU are linked by the FSB; DRAM holds the GPU push buffer. Numbered arrows mark the seven actions listed below.]

The seven actions that take place between the CPU, GPU, and memory are indicated in the
diagram above. They are:
1) The CPU core writes to a memory-mapped register in the GPU.
2) The GPU writes back to a memory-mapped CPU register. This is caused by the WRITEBACK
command described in 7.14.1, and would be used to update the CPUC FIFO tail pointers and the
GPU push buffer tail pointers. For the tail-pointer writebacks, the CPU implements a section of
cacheable memory directly on its die; this memory appears as memory-mapped registers in the
CPU.
3) The GPU sends a read request to the coherency block. The coherency block decides whether the
data is to be read from main memory or from the CPU's L2 and directs the request accordingly, as
shown in 4) and 6).
4) The coherency block determines that the data is in the CPU's L2 and requests a cache line
castout.
5) The CPU returns the castout cache line data, which is routed to the vertex unit as the response to 3).
6) The coherency block determines that the data requested by the vertex unit is in main memory and
makes a request to main memory.
7) Main memory returns the data to the vertex unit in response to 3).
This section does not attempt to describe the coherency algorithm, which is described elsewhere.
The coherency block is shown only to illustrate how the data might flow, depending on the
architecture of the GPU and how the GPU is modified to include the coherency block.
Note that the only GPU-initiated transaction to which the CPU responds is a coherency
transaction. In 3) above, the coherency block handles the procedural geometry FIFO data
differently. The CPU write-no-allocates the procedural geometry FIFOs in a physical address
range that does not exist in memory. When the GPU reads these vertices, the coherency block in
the GPU sees that the CPU holds these addresses as dirty and issues a castout/invalidate. The GPU
must then not let that data go out to memory, but must take the returned data directly from the
CPU.
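The routing decision in steps 3) to 7), including the special handling of procedural geometry FIFO data, can be sketched as follows. This is an illustration under stated assumptions, not the coherency algorithm (which is described elsewhere): the PG FIFO address bounds, the 128-byte line size, and the linear l2_holds lookup standing in for the coherency block's view of the CPU's dirty lines are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 128u  /* assumed line size, matching the 128-byte PG reads */

/* Hypothetical bounds of the PG FIFO physical range (no memory behind it). */
#define PG_FIFO_BASE 0xC0000000ull
#define PG_FIFO_END  0xC0100000ull  /* exclusive */

typedef struct {
    bool from_cpu_l2;   /* true: castout from CPU L2 (steps 4-5); false: DRAM (steps 6-7) */
    bool forward_only;  /* true: return data goes straight to the vertex unit, never to memory */
} read_plan;

/* Stand-in for the coherency block's knowledge of lines held dirty in the CPU L2. */
static bool l2_holds(const uint64_t *dirty_lines, size_t n, uint64_t line)
{
    for (size_t i = 0; i < n; i++)
        if (dirty_lines[i] == line)
            return true;
    return false;
}

/* Step 3: decide where a GPU read request is serviced.
 * The text assumes PG FIFO lines are dirty in the CPU when the GPU reads them. */
static read_plan plan_gpu_read(uint64_t addr, const uint64_t *dirty_lines, size_t n)
{
    uint64_t line = addr & ~(uint64_t)(LINE_BYTES - 1u);
    bool in_pg_fifo = addr >= PG_FIFO_BASE && addr < PG_FIFO_END;
    read_plan p;
    p.from_cpu_l2 = l2_holds(dirty_lines, n, line);
    /* PG FIFO castout data must not be written out to memory. */
    p.forward_only = in_pg_fifo && p.from_cpu_l2;
    return p;
}
```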
For the writebacks described in 2) above, when the GPU determines that the CPU owns the
writeback address (that is, the tag is valid in the CPU's L2), it generates a castout/reload command
that causes the writeback data to update the on-die memory in the CPU. The castout/reload also
causes the CPU's L1/L2 to refetch this data, but the data is now read from the on-die cacheable
memory block rather than from main memory.
The GPU fetches vertex data directly from the CPU cache because the data bandwidth involved is
very high (on the order of 16GB); if the CPU stored this data to memory and the vertex processor
subsequently read it back, system performance would suffer.
The GPU writes the FIFO tail pointers into memory mapped in the CPU because the algorithm
requires the CPUC to know very quickly that the GPU has finished processing a given block of data
in the L2, so that the CPUC can reuse that block. The CPUC thread spin-waits on the tail pointer; if
that value were in main memory, there would be both a latency issue and an FSB bandwidth issue.
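The tail-pointer scheme can be sketched as a ring of FIFO blocks that a CPUC thread fills and the GPU drains, with the GPU's WRITEBACK modeled as an atomic store of its consumed-block count. The block count (FIFO_BLOCKS), the struct layout, and the function names are hypothetical; C11 atomics stand in for the on-die memory that the GPU updates through the castout/reload path.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_BLOCKS 8u  /* assumed number of L2-resident FIFO blocks */

typedef struct {
    uint32_t head;          /* blocks produced so far (CPU-private) */
    _Atomic uint32_t tail;  /* blocks consumed, written by the GPU writeback */
} pg_fifo;

static void fifo_init(pg_fifo *f)
{
    f->head = 0;
    atomic_init(&f->tail, 0);
}

/* Non-blocking check: is there a block the CPU may overwrite? */
static bool fifo_block_free(pg_fifo *f)
{
    uint32_t tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    return (uint32_t)(f->head - tail) < FIFO_BLOCKS;
}

/* Spin until a block is free, then return its index. Because the tail lives
 * in on-die memory in the scheme above, each poll avoids an FSB round trip. */
static uint32_t fifo_acquire_block(pg_fifo *f)
{
    while (!fifo_block_free(f))
        ;  /* spin-wait on the tail pointer */
    return f->head % FIFO_BLOCKS;
}

/* After filling the block, count it as produced. */
static void fifo_commit_block(pg_fifo *f)
{
    f->head++;
}

/* Model of the GPU's tail-pointer writeback after it finishes blocks. */
static void gpu_writeback_tail(pg_fifo *f, uint32_t consumed)
{
    atomic_store_explicit(&f->tail, consumed, memory_order_release);
}
```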
7.14.4 Requirements
• The output of the procedural geometry would mostly be inline tristrip flexible vertex format
data, or any format that is supported by streaming.
• The following is one possible proposal for the format of inline vertex data.
All inline vertex data would be split into blocks, each beginning with a
single 32-bit DWORD header. The header would encode four possible instructions:
0 – This is the last vertex in the mesh.
1 – n inline vertices follow (where n is encoded in the instruction).
2 – Use the i'th previous vertex from the post-transform vertex cache (where i is
encoded in the instruction). The header for the next vertex immediately follows.
3 – NOP. The header for the next vertex immediately follows.
So, for example, 4 inline 32-byte vertices followed by one re-used vertex would take
4 + 4*32 + 4 = 136 bytes (one header, four vertices, and one header for the re-used vertex).
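The proposed header format and the 136-byte example can be checked with a short sketch. The bit packing (opcode in bits 31:30, n or i in bits 29:0) is an assumption, because the proposal does not say how the instruction and its operand share the DWORD; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The four header instructions proposed above. */
enum { HDR_END = 0, HDR_INLINE = 1, HDR_REUSE = 2, HDR_NOP = 3 };

/* Hypothetical packing: opcode in bits 31:30, operand (n or i) in bits 29:0. */
#define HDR_OPERAND_MASK 0x3FFFFFFFu

static uint32_t hdr_make(uint32_t op, uint32_t operand)
{
    return (op << 30) | (operand & HDR_OPERAND_MASK);
}

static uint32_t hdr_op(uint32_t hdr)      { return hdr >> 30; }
static uint32_t hdr_operand(uint32_t hdr) { return hdr & HDR_OPERAND_MASK; }

/* Size in bytes of a block described by its sequence of headers: every
 * header is one 4-byte DWORD, and an INLINE header is followed by n
 * vertices of vertex_bytes each. REUSE, NOP and END carry no vertex data. */
static size_t block_bytes(const uint32_t *hdrs, size_t count, size_t vertex_bytes)
{
    size_t total = 0;
    for (size_t k = 0; k < count; k++) {
        total += sizeof(uint32_t);
        if (hdr_op(hdrs[k]) == HDR_INLINE)
            total += (size_t)hdr_operand(hdrs[k]) * vertex_bytes;
    }
    return total;
}
```

With an END header appended, the same block takes 140 bytes; the 136-byte figure counts only the two headers and the four inline vertices.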
7.14.5 Notes
GPU XPS Reads / Writes 1-13-04
The BIU processes PG reads requested by the Memory Hub (MH) by issuing a 128-byte aligned read to the
FSB/CPU. This occurs as a request command on the FSB "transmit" interface. It is the only read operation issued to
the FSB/CPU from the BIU.

The FSB/CPU responds to the BIU with a response command and 128 bytes of data on the FSB "receive" interface.

The BIU forwards the response data to the MH.

Requests and responses are linked by a tag, because the CPU does not guarantee ordering when multiple reads are outstanding.

No coherency operation (flush) is issued for the PG reads.
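Because responses can return out of order, the BIU has to match each response to its request by tag rather than by arrival order. A minimal sketch of that bookkeeping follows; the tag count (MAX_TAGS), the biu_tags table, and the function names are assumptions, not the BIU design. Only the 128-byte aligned read size comes from the notes above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_TAGS   8u    /* assumed number of outstanding reads */
#define READ_BYTES 128u  /* each PG read is a 128-byte aligned read */

typedef struct {
    bool busy[MAX_TAGS];
    uint64_t addr[MAX_TAGS];  /* address of the outstanding read for each tag */
} biu_tags;

/* Issue: allocate a free tag for an aligned read.
 * Returns the tag, or -1 if no tag is free or the address is misaligned. */
static int biu_issue(biu_tags *t, uint64_t addr)
{
    if (addr % READ_BYTES != 0)
        return -1;
    for (unsigned i = 0; i < MAX_TAGS; i++) {
        if (!t->busy[i]) {
            t->busy[i] = true;
            t->addr[i] = addr;
            return (int)i;
        }
    }
    return -1;
}

/* Response: the tag, not arrival order, identifies which read completed.
 * On success, *addr_out is the address whose data is forwarded to the MH. */
static bool biu_complete(biu_tags *t, unsigned tag, uint64_t *addr_out)
{
    if (tag >= MAX_TAGS || !t->busy[tag])
        return false;
    t->busy[tag] = false;
    *addr_out = t->addr[tag];
    return true;
}
```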

7.15 System Debug Facilities (?)

7.15.1 Deadbox Recovery


7.15.2 Low Level Debug
Low-level hardware interfaces for diagnosing very hard bugs, developing embedded ROM code, or
programming registers that are not otherwise accessible for security reasons (and how we close those holes later).
CPU
GPU
SMC
JTAG
7.15.3 Development Systems
How we program the system during development.

7.16 System Bandwidth / Latency Roll-Up (Nick)

7.17 Error Conditions

7.17.1 BSB
2-6-04 BSB Completions
An Unsupported Request (UR) is generated when the GPU receives a request that it does not recognize. Badly formed or
corrupted packets (bad CRC, header, etc.) always generate UR, and any request received while the GPU’s
BUS_MASTER_ENABLE bit is cleared always generates UR. Beyond that, UR is the final ‘else’ in a large if/else if/else
chain, so it is easier to list what won’t generate a UR than what will.

In production mode, with the GPU’s BUS_MASTER_ENABLE bit set:
- Memory writes to the interrupt register won’t generate a UR.
- Memory reads and memory writes below top_of_memory won’t generate a UR.

In prototype mode, with the GPU’s BUS_MASTER_ENABLE bit set:
- Memory writes to the interrupt register won’t generate a UR.
- Memory reads and memory writes below top_of_memory won’t generate a UR.
- 4-byte memory reads and 4-byte memory writes to nb or gc MMIO won’t generate a UR.
- Non-4-byte memory reads and non-4-byte memory writes to nb or gc MMIO generate CA (completer abort).
- 4-byte type 0 configuration reads and writes to device 1 or 2, function 0, won’t generate a UR.
- Non-4-byte type 0 configuration reads and writes to device 1 or 2, function 0, generate CA.
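The UR/CA rules above amount to a short decision chain with UR as the final else. The C sketch below encodes them; the request and target enums, the bsb_classify name, and the idea that address decode has already classified the target are assumptions made for illustration. CPL_NORMAL means only "neither UR nor CA"; it does not imply that a completion is returned (memory writes, for example, are posted).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { CPL_NORMAL, CPL_UR, CPL_CA } bsb_result;
typedef enum { REQ_MEM_READ, REQ_MEM_WRITE, REQ_CFG0_READ, REQ_CFG0_WRITE, REQ_OTHER } req_type;
/* Hypothetical address-decode result for the request. */
typedef enum { TGT_MEMORY, TGT_INTERRUPT_REG, TGT_NB_GC_MMIO, TGT_CFG_DEV1_2_FN0, TGT_OTHER } req_target;

typedef struct {
    req_type type;
    req_target target;
    uint64_t addr;
    unsigned bytes;
    bool malformed;  /* bad CRC, bad header, ... */
} bsb_req;

static bsb_result bsb_classify(const bsb_req *r, bool bus_master_enable,
                               bool prototype_mode, uint64_t top_of_memory)
{
    bool mem = r->type == REQ_MEM_READ || r->type == REQ_MEM_WRITE;
    bool cfg0 = r->type == REQ_CFG0_READ || r->type == REQ_CFG0_WRITE;

    if (r->malformed || !bus_master_enable)
        return CPL_UR;
    if (r->type == REQ_MEM_WRITE && r->target == TGT_INTERRUPT_REG)
        return CPL_NORMAL;
    if (mem && r->target == TGT_MEMORY && r->addr < top_of_memory)
        return CPL_NORMAL;
    /* Prototype-only cases: wrong access size violates programming rules (CA). */
    if (prototype_mode && mem && r->target == TGT_NB_GC_MMIO)
        return r->bytes == 4 ? CPL_NORMAL : CPL_CA;
    if (prototype_mode && cfg0 && r->target == TGT_CFG_DEV1_2_FN0)
        return r->bytes == 4 ? CPL_NORMAL : CPL_CA;
    return CPL_UR;  /* the final 'else' */
}
```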

I used the tables on pages 365 and 366 of PCI Express System Architecture by MindShare, Inc. to decide between UR and
CA for invalid requests. Specifically, requests that do not reference address space mapped within the device are UR, and
requests that violate the programming rules for a device are CA. CA occurs only for the specific prototype-mode cases
listed above.

7.18 Reliability

2-6-04 Memory Reliability


The quick answer is that a 256-Mbyte DRAM memory system will experience a single-bit soft error 2 to 4 times
a year, which is why servers still implement ECC. I expect GDDR manufacturers characterize their chips for
susceptibility to soft errors, so the data for new technologies should come from them (Michael?).
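Soft-error rates are usually quoted in FIT (failures per 10^9 device-hours), often per Mbit, so restating the 2 to 4 errors per year estimate in those units makes it easier to compare against GDDR manufacturer data. The conversion below assumes a 365-day year and 1 Mbyte = 8 Mbit; the helper names are hypothetical. It gives roughly 228,000 to 457,000 FIT for the 256-Mbyte system, or about 111 to 223 FIT per Mbit.

```c
/* Convert a system-level soft-error rate into FIT
 * (failures per 10^9 device-hours), in total and per Mbit. */
#define HOURS_PER_YEAR 8760.0  /* 365-day year */

static double fit_total(double errors_per_year)
{
    return errors_per_year / HOURS_PER_YEAR * 1e9;
}

static double fit_per_mbit(double errors_per_year, double mbytes)
{
    return fit_total(errors_per_year) / (mbytes * 8.0);  /* 8 Mbit per Mbyte */
}
```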

In general, the sources of errors are radioactive isotopes (in package and PCB materials), cosmic rays, and UFOs.

8 Low Level Software Architecture (MarcW?)


8.1 Flash Resident Drivers

8.2 BIOS

8.3 Low Level Drivers

Memory
Audio

8.4 Network Stack

8.5 Procedural Geometry

8.6 Video Mode Selection

9 Other
9.1 Not Covered

This section lists what is not covered in this document but should be covered elsewhere:
• APIs
• Better-together
• Remoting devices
• Video-on PC
• Media device
• Network security (all done in the application layer; requires the system to be secure)
• Peripherals (cameras, etc.)
• Performance abstraction for HDD
• Where do I put my stuff without having to buy an MU?
• Mass-storage performance abstraction: say what should be included in the software specification.