
Introduction to Post-Silicon Validation

Nagib Hakim nagib.hakim@intel.com


Acknowledgment: Rand Gray and Monica Martinez Canales

Platform Validation Engineering, Intel Corporation

Agenda

- Validation domains and characteristics
- Post-Si focus areas and methods
- Observability / controllability / survivability
- Ongoing challenges in validation

October 29, 2010

Introduction to Post-Si Validation

Differences between Validation and Test


Validation
- Goal: Does the design meet the intent? Check design correctness.
- Time scale: Weeks / months
- Medium: Platform / tester
- Means: Test patterns, software, logic analyzers, oscilloscopes, DFT, DFV, etc.
- Volume: Small production sample

Test
- Goal: Does the product work as designed? Check manufacturing correctness.
- Time scale: Seconds
- Medium: Tester
- Means: Test patterns, DFT
- Volume: Every manufactured die


Validation Domains & Characteristics


Timeline: pre-Si (simulation) → first silicon samples → post-Si (platform) → qualified Si & platform → volume ramp

Pre-Si (simulation): cycle poor
- Strengths: accurate logic behavior; 98% of logic bugs found; 90% of circuit bugs found; straightforward debugging; inexpensive bug fixing
- Limitations: little platform-level interaction; not real time

Post-Si (platform): cycle rich
- Strengths: actual target platform; 2% of logic bugs found; 10% of circuit bugs found
- Limitations: difficult debugging; expensive bug fixing

Volume ramp
- Launched platform
- Highest-cost bug fixes
- Bugs need a survival strategy

Bugs decline in number over the development cycle but grow in cost!



Where Bugs Are Found

Functional bugs (AKA logic bugs)
- Exist in all manufactured parts (1M DPM: every part is affected)
- 98% found before tape-out, 2% post-silicon

Circuit bugs
- Not all parts exhibit failures (<1M DPM)
- Vary with voltage, temperature, and frequency (V/T/F), process, and component age
- Computational limits of (non-real-time) model simulation restrict how many variation combinations can be covered
- 90% found pre-Si; 10% post-silicon


Post-Si Focus Areas


- Complementary to pre-Si
- Exploit post-Si benefits:
  - Lots of cycles available
  - Platform-level interactions
- Focus areas:
  - Instruction set architecture (ISA) and features
  - Memory subsystem/hierarchy
  - Platform power state transitions
  - I/O concurrency
  - I/O margin characterization
  - Core circuit bug hunting


Platform Validation Infrastructure

- Tests generated in a server farm
- Tests executed on a validation platform
- Failures debugged using a logic analyzer



Functional Bug Hunting

- ISA architecture/micro-architecture testing based on biased random schemes:
  - Random generation of instructions
  - Checking based on architectural simulation
  - CPU-core intensive
  - High throughput of random tests, but typically low I/O stress
  - Random power state transition injection
- Feature-oriented directed/random tests:
  - Paging, TLB
  - Virtualization
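A biased random scheme can be sketched as weighted random selection driven by a recorded seed. The sketch below is illustrative only: the opcode names, the weights in `OPCODE_WEIGHTS`, the 8-register file, and the `generate_test` helper are hypothetical and do not describe any real Intel generator. It shows the two properties the slide relies on: some instruction classes are deliberately favored, and the same seed always regenerates the same test.

```python
import random

# Hypothetical opcode classes and bias weights, chosen for illustration only.
OPCODE_WEIGHTS = {"add": 4, "sub": 4, "mul": 2, "load": 3, "store": 3, "branch": 1}
REGISTERS = [f"r{i}" for i in range(8)]  # toy 8-register file


def generate_test(seed: int, length: int) -> list[tuple[str, str, str]]:
    """Generate a reproducible, biased-random instruction sequence.

    The same seed always yields the same sequence, so a failing test can be
    regenerated from its seed alone.
    """
    rng = random.Random(seed)  # private generator: no global state involved
    opcodes = list(OPCODE_WEIGHTS)
    weights = list(OPCODE_WEIGHTS.values())
    program = []
    for _ in range(length):
        op = rng.choices(opcodes, weights=weights, k=1)[0]  # biased pick
        dst, src = rng.choice(REGISTERS), rng.choice(REGISTERS)
        program.append((op, dst, src))
    return program


if __name__ == "__main__":
    for instr in generate_test(seed=1234, length=5):
        print(*instr)
```

Keeping the generator's state in a private `random.Random(seed)` object, rather than the module-level generator, is what makes a test reproducible from its seed.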


Random Instruction Testing


Flow: generate a sequence of random instructions → architectural simulation → load and run on platform → compare results. If the results match, continue testing; if not, debug the failure.

Strengths
- Wide coverage for CPU / GPU cores
- Finds subtle micro-architecture (uarch) bugs
- Stresses CPU pipeline boundary conditions
- Good at finding microcode bugs
- High-throughput core testing

Limitations
- Low I/O stress; needs to be complemented with memory subsystem tests
- Requires servers to generate instruction seeds
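The generate, simulate, run, and compare loop can be sketched end to end with a toy three-instruction ISA. Everything here is an assumption for illustration: the ISA, the `execute` model, and the injected bug (a self-subtract whose result is never written back) stand in for real silicon, which in practice is the validation platform rather than a Python function. The point is the structure: a golden architectural result, a platform result, a comparison, and a seed that reproduces every mismatch.

```python
import random

MASK = 0xFFFFFFFF           # toy 32-bit registers
OPS = ("add", "sub", "xor")
NREGS = 4


def generate(seed, length=16):
    """Seed -> program; the seed alone is enough to regenerate a failing test."""
    rng = random.Random(seed)
    return [(rng.choice(OPS), rng.randrange(NREGS), rng.randrange(NREGS))
            for _ in range(length)]


def execute(program, inject_bug=False):
    """Architectural model of the toy ISA. With inject_bug=True it stands in for
    silicon carrying a hypothetical bug: 'sub rX, rX' fails to write back."""
    regs = [i + 1 for i in range(NREGS)]  # fixed, known initial state
    for op, dst, src in program:
        if inject_bug and op == "sub" and dst == src:
            continue  # the buggy "silicon" drops this result
        if op == "add":
            regs[dst] = (regs[dst] + regs[src]) & MASK
        elif op == "sub":
            regs[dst] = (regs[dst] - regs[src]) & MASK
        else:
            regs[dst] ^= regs[src]
    return regs


def run_campaign(seeds, inject_bug):
    """Generate -> simulate -> run -> compare; return the seeds that mismatched."""
    failing = []
    for seed in seeds:
        program = generate(seed)
        expected = execute(program)                       # architectural simulation
        actual = execute(program, inject_bug=inject_bug)  # "load and run on platform"
        if actual != expected:
            failing.append(seed)                          # debug: seed reproduces it
    return failing
```

Returning seeds rather than programs mirrors the slide's note that instruction seeds are generated on servers: storing the seed is enough to rebuild the exact failing test for debug.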

Memory Subsystem Validation

- Random and directed/random memory test strategy
- Memory-channel intensive
- Based on multi-core and multi-processor configurations
- Targets standard symmetric multiprocessor (SMP) attributes:
  - Cache coherency, consistency, and synchronization
  - Memory ordering
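Memory ordering checks are often built from small litmus tests whose results are compared against the set of outcomes the memory model allows. The sketch below uses the classic message-passing (MP) test and computes the allowed set by enumerating every sequentially consistent (SC) interleaving. For this particular test, x86-TSO also forbids the outcome (r1, r2) = (1, 0), so the SC set can serve as the allowed set here; a test such as store buffering would need the weaker model's allowed set instead. The thread encoding and helper names are our own illustration.

```python
from itertools import combinations

# Message-passing (MP) litmus test, one operation list per thread.
# Thread 0: x = 1; y = 1        Thread 1: r1 = y; r2 = x
THREAD0 = [("store", "x", 1), ("store", "y", 1)]
THREAD1 = [("load", "y", "r1"), ("load", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slots = set(slots)  # positions taken by thread a
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]


def sc_outcomes(t0, t1):
    """Set of (r1, r2) results reachable under sequential consistency."""
    results = set()
    for order in interleavings(t0, t1):
        mem, regs = {"x": 0, "y": 0}, {}
        for kind, loc, arg in order:
            if kind == "store":
                mem[loc] = arg
            else:
                regs[arg] = mem[loc]
        results.add((regs["r1"], regs["r2"]))
    return results


def check(observed, allowed):
    """Return the observed outcomes that the memory model forbids."""
    return set(observed) - allowed
```

On a platform, each thread would run on a different core many times over; any result returned by `check` is a memory ordering violation to debug.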

I/O Concurrency

- Strategy: simultaneously load all platform buses:
  - QuickPath Interconnect
  - DDR3 memory channels
  - PCI Express Gen2
  - USB, SSD
  - SATA / PATA
  - Display
- Uses directed/random and biased random test generation
- Test cards provide determinism (e.g., in place of a variable-latency peripheral such as a disk), because reproducibility is required for diagnosis

Compatibility & Performance

Use industry-standard operating systems, applications, and peripherals to verify:
- Platform and component behavioral correctness
- Legacy compatibility of O/S, applications, and peripherals
- Mobile and desktop client systems
- Blade and enterprise-level server systems
- Fully integrated platforms: CPU, chipset, BIOS, O/S, applications

Test configurations model end users:
- Highly stressful configurations hunting for both functional and performance bugs

Finding Circuit Bugs


- Circuit bugs appear as DPM: not all dies behave the same way
- Taxonomy:
  - Timing convergence bugs:
    - Speedpath: circuit operates too slowly
    - Min-delay: circuit operates too fast
    - Race: circuit fails due to the timing of multiple converging signals
  - Analog bugs:
    - Primarily occur in I/O buffers, PLLs, and thermal sensors
    - Silicon doesn't operate in accordance with predicted (simulated) circuit behavior
- Fundamentals for circuit bug hunting:
  - Need a sufficiently large population of devices
  - Need to vary environmental conditions
  - Need to stimulate stressful system behavior
  - Stimulus is generally functional; failures look just like functional failures
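Why a large population is needed follows from DPM arithmetic. If a circuit bug affects a fraction p = DPM / 10^6 of parts, and parts fail independently at fixed conditions (a simplifying assumption; real failure rates also shift with V/T/F), the chance that n sampled parts contain at least one failing part is 1 - (1 - p)^n. The helpers below are a sketch of that calculation.

```python
import math


def detection_probability(dpm: float, n_parts: int) -> float:
    """Probability that at least one of n_parts exhibits a bug with the given DPM.

    Assumes each part fails independently with probability dpm / 1e6.
    """
    p = dpm / 1_000_000
    return 1.0 - (1.0 - p) ** n_parts


def parts_needed(dpm: float, confidence: float) -> int:
    """Smallest population showing at least one failing part with the given confidence."""
    p = dpm / 1_000_000
    if p >= 1.0:
        return 1  # a logic bug (1M DPM) shows up on the first part
    # Solve 1 - (1 - p)^n >= confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))
```

For example, a circuit bug at 10,000 DPM (1% of parts) needs `parts_needed(10_000, 0.95)`, which is 299 parts, for a 95% chance of catching at least one failing die, while a 1M DPM functional bug appears on any single part.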

Circuit Bug Root Causes

- On-die signal integrity:
  - Cross-coupling induced noise
  - Droop-event induced noise
  - High dynamic current events, often due to clock gating
- Power delivery integrity
- Clock domain crossing
- Process, voltage, temperature:
  - Power state transitions
  - Silicon process variation


Ideal Operating Range

[Figure: operating volume spanned by voltage, frequency, and temperature axes]

Ideally, silicon operates in a well-defined volume:
- Min and max corners defined in a manufacturer's specification
- Uniform over voltage, frequency, temperature, process, and time

But what really happens is a bit different.

The 2D View of Ideal


[Figure: rectangular passing window, Vcc (min) to Vcc (max) on the vertical axis, F (min) to F (max) on the horizontal axis]

- 2D view for simplicity
- Other factors: temperature, silicon variability, component age
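A Vcc-versus-frequency pass/fail picture like this one is produced by a shmoo: run the same test at every grid point and record pass or fail. The sketch below assumes a hypothetical `toy_speedpath_part` whose maximum frequency rises linearly with Vcc; the model, the voltage list, and the MHz grid are illustrative numbers, not silicon data. On a real platform, `test` would set the supply and clock, run the workload, and check the result.

```python
def shmoo(test, voltages, frequencies):
    """Run test(vcc, f) at every grid point; rows follow voltages, columns frequencies."""
    return [[test(v, f) for f in frequencies] for v in voltages]


def render(grid, voltages):
    """ASCII shmoo: '*' = pass, '.' = fail, one row per Vcc setting."""
    return "\n".join(
        f"{v:4.2f} V | " + "".join("*" if ok else "." for ok in row)
        for v, row in zip(voltages, grid)
    )


def toy_speedpath_part(vcc, f_mhz):
    """Hypothetical part whose max frequency rises linearly with Vcc (illustrative only)."""
    return f_mhz <= 4000 * (vcc - 0.4)


if __name__ == "__main__":
    voltages = [1.2, 1.1, 1.0, 0.9, 0.8]        # Vcc (max) in the top row
    frequencies = list(range(1000, 3001, 250))  # F (min) to F (max), MHz
    print(render(shmoo(toy_speedpath_part, voltages, frequencies), voltages))
```

In the ideal picture every point inside the specified window passes; the following slides show how real silicon departs from that.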


Speedpaths
[Figure: Vcc (min) to Vcc (max) versus F (min) to F (max) window with a speedpath failure region]

- Circuit slows down as Vcc decreases
- Failure disappears as V increases or as F decreases
- Historically the highest percentage of CPU circuit issues



Min-delays
[Figure: Vcc (min) to Vcc (max) versus F (min) to F (max) window with a min-delay failure region]

- Failure when the circuit is too fast
- Failure disappears as V decreases or as F increases
- Hard to fix
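The two slides give opposite signatures: a speedpath failure goes away with higher V or lower F, and a min-delay failure goes away with lower V or higher F. A first-pass classification can therefore probe the neighbors of a failing point. This is a simplified sketch: `classify_failure`, the step sizes, and the two toy part models are hypothetical, and a real investigation would repeat each point to rule out intermittent shmoo holes.

```python
def classify_failure(test, v, f, dv, df):
    """Classify a failing (v, f) point by which direction makes the failure disappear.

    Speedpath: passes when V is raised or F is lowered.
    Min-delay: passes when V is lowered or F is raised.
    """
    if test(v, f):
        raise ValueError("start from a failing point")
    speedpath = test(v + dv, f) or test(v, f - df)
    min_delay = test(v - dv, f) or test(v, f + df)
    if speedpath and not min_delay:
        return "speedpath"
    if min_delay and not speedpath:
        return "min-delay"
    return "unclassified"  # both or neither direction helps, e.g. a shmoo hole


# Hypothetical part models (illustrative numbers, MHz and volts).
def speedpath_part(v, f):
    return f <= 2500 * v            # too slow at low Vcc / high F


def min_delay_part(v, f):
    return f >= 5000 * (v - 1.1)    # too fast at high Vcc / low F
```

For instance, `classify_failure(speedpath_part, 0.8, 2100, 0.1, 100)` returns `"speedpath"`, because raising Vcc by 0.1 V clears the failure while lowering it does not.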


Shmoo Holes/Cracks
[Figure: Vcc (min) to Vcc (max) versus F (min) to F (max) passing window containing voids]

- Voids within the window
- Often intermittent
- Multiple clock domains:
  - Skew within the same domain
  - Skew/jitter across domains
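Holes can be flagged automatically once a shmoo grid exists. The sketch below uses one simple, assumed definition: a failing point is a hole if it has a passing point on both sides of it in the same row or in the same column. Under that rule, the ordinary failing edge of a speedpath or min-delay region is not reported, because the failures there extend to the edge of the grid. The grid layout (rows of booleans, `True` = pass) is our own convention.

```python
def find_holes(grid):
    """Return (row, col) of failing points enclosed by passing points.

    A point counts as a hole if there is a passing point on both sides of it
    in its row, or on both sides of it in its column.
    """
    holes = []
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if grid[r][c]:
                continue  # passing point, not a candidate
            row = grid[r]
            col = [grid[i][c] for i in range(rows)]
            in_row = any(row[:c]) and any(row[c + 1:])
            in_col = any(col[:r]) and any(col[r + 1:])
            if in_row or in_col:
                holes.append((r, c))
    return holes
```

Because holes are often intermittent, a practical flow would rerun each flagged point several times before treating it as real.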


Finding Circuit Marginalities


- Exercise in platform-based silicon characterization
- Method is stress-to-fail (increase FMAX to failure)
- Stimulus is directed/random:
  - Victim/attacker patterns
  - Software-load-driven power variation
  - Injected power state transitions
  - Randomized instructions, memory configurations, architectural events
- Characterize before/after burn-in (to simulate aging)
- Characterize over large populations to understand silicon variability
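A stress-to-fail Fmax measurement at fixed voltage and temperature can be sketched as a search over a frequency grid. This sketch assumes pass/fail is monotonic in frequency (pass below Fmax, fail above), which shmoo holes can violate; under that assumption a binary search finds the boundary in few runs. `find_fmax` and its integer MHz grid are hypothetical; `test` stands in for running the directed/random stimulus at a given frequency.

```python
def find_fmax(test, f_lo, f_hi, step):
    """Highest passing frequency on the grid f_lo + k * step, up to f_hi.

    Assumes test(f) passes at f_lo and that pass/fail is monotonic in f.
    """
    if not test(f_lo):
        raise ValueError("part fails at the starting frequency")
    lo, hi = 0, (f_hi - f_lo) // step  # indices into the frequency grid
    if test(f_lo + hi * step):
        return f_lo + hi * step        # no failure found up to f_hi
    # Invariant: grid index lo passes, grid index hi fails.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test(f_lo + mid * step):
            lo = mid
        else:
            hi = mid
    return f_lo + lo * step
```

Running the same search on each part before and after burn-in, and across a large population, gives the aging and variability data the slide calls for.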


I/O Margin Characterization

Stimulus includes:
- Victim/attacker patterns
- Resonance stimulus
- Other noise generators (e.g., dynamic CPU core loads)

Characterization method:
- VCC and timing margined to fail (find the extents of the eye diagram)
- Incorporates systematic 3D variation (shmooing) of voltage, temperature, and frequency
- Incorporates skewed silicon (varied process parameters) and skewed circuit boards (varied trace impedance)
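Margining to fail can be sketched as stepping the receiver's sampling point away from nominal in each direction until errors appear. Here `link_passes(t_offset, v_offset)` is a hypothetical hook that would run the stimulus at the given timing (ps) and voltage (mV) offsets and report whether the link stayed error-free; the diamond-shaped toy eye used below is illustrative only. Margining along the two axes measures eye width at nominal voltage and eye height at nominal timing, not the full eye contour.

```python
def margin_to_fail(test, step, limit):
    """Step an offset away from nominal (0) until test(offset) fails.

    Returns the last passing offset, or the furthest offset within `limit`
    if no failure was found.
    """
    offset = 0
    while abs(offset + step) <= abs(limit):
        if not test(offset + step):
            return offset
        offset += step
    return offset


def eye_extents(link_passes, t_step, v_step, t_limit, v_limit):
    """Measure eye width and height by margining timing and voltage separately."""
    left = margin_to_fail(lambda t: link_passes(t, 0), -t_step, -t_limit)
    right = margin_to_fail(lambda t: link_passes(t, 0), t_step, t_limit)
    low = margin_to_fail(lambda v: link_passes(0, v), -v_step, -v_limit)
    high = margin_to_fail(lambda v: link_passes(0, v), v_step, v_limit)
    return {"left": left, "right": right, "low": low, "high": high,
            "width": right - left, "height": high - low}


def toy_eye(t_ps, v_mv):
    """Hypothetical diamond-shaped eye: +/-30 ps wide, +/-80 mV tall (integer units)."""
    return 80 * abs(t_ps) + 30 * abs(v_mv) <= 2400
```

Repeating `eye_extents` across the voltage, temperature, and frequency shmoo and on skewed silicon and boards shows how much margin remains at the worst corner.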

Post-Si Debug Challenges

- Basic observability is the package pins:
  - Signal observability (higher-integration SoC)
  - Probing scope
  - Probing signal integrity
- Trend is toward lower observability:
  - Integration increasing toward SoC
- Functional and circuit issues require different solutions


Observability / Control / Survivability Architecture


[Figure: on-die debug architecture block diagram with Registers, Stim, Mux, ubreakpt, Tracer, Trace Buffer, and TAP blocks]
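One plausible reading of how the Tracer, Trace Buffer, and ubreakpt blocks cooperate is a circular buffer that continuously records selected signals and freezes when a breakpoint condition fires, so the history leading up to a failure survives for readout through the TAP. The sketch below models only that idea; the `TraceBuffer` class, the `post_trigger` parameter, and the signal dictionaries are hypothetical and do not describe any specific Intel design.

```python
from collections import deque


class TraceBuffer:
    """Toy on-die trace buffer: keeps the most recent `depth` samples and
    freezes after a micro-breakpoint condition fires (hypothetical model)."""

    def __init__(self, depth, trigger, post_trigger=0):
        self.samples = deque(maxlen=depth)  # oldest samples fall off the front
        self.trigger = trigger              # ubreakpt condition on the signals
        self.post_trigger = post_trigger    # samples still captured after the trigger
        self.remaining = None               # None until the trigger fires

    @property
    def frozen(self):
        return self.remaining == 0

    def record(self, cycle, signals):
        """Called once per sampled cycle with the muxed-in signal values."""
        if self.frozen:
            return
        self.samples.append((cycle, signals))
        if self.remaining is None:
            if self.trigger(signals):
                self.remaining = self.post_trigger
        else:
            self.remaining -= 1
```

For example, a 4-entry buffer triggered on an error flag at cycle 6 with one post-trigger sample ends up holding cycles 4 through 7: three cycles of history before the event, the event itself, and one cycle after.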

Boundary Scan (AKA JTAG)

[Figure: boundary scan cells around the die, accessed through the TAP (Test Access Point)]
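Access through the TAP is sequenced by the 16-state TAP controller defined in IEEE 1149.1, which advances on each TCK edge according to the TMS pin. The state names and transitions below follow that standard; the dictionary encoding and the `walk` helper are our own illustration. A tool-level call such as `irscan` or `drscan` in the fictional example later in this deck would typically drive the controller into Shift-IR or Shift-DR, shift the bits, and return through Update-IR or Update-DR.

```python
# IEEE 1149.1 TAP controller: (next state if TMS = 0, next state if TMS = 1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk(state, tms_bits):
    """Apply one TCK edge per TMS bit (0 or 1) and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state
```

A useful property of this state machine is that five consecutive TMS = 1 edges reach Test-Logic-Reset from any state, which is how a debugger resynchronizes with an unknown TAP.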


Example Boundary Scan Usage


// Fictional example:
// Read a counter in a certain design block
// on a certain boundary scan chain.

// First, select the desired design block.
irscan(MY_CHAIN, SELECT_BLOCK_OPCODE)
// Scan in the block number, stored in 4 bits.
drscan(MY_CHAIN, 4, BLOCK_NUMBER)

// Now select the desired counter out of 100 counters.
// 7 bits are used to select the counter.
// This operation also resets the counter to 0.
irscan(MY_CHAIN, COUNTER_SELECT_OPCODE)
drscan(MY_CHAIN, 7, COUNTER_ID)

// Wait until something triggers.

// Read the counter value.
irscan(MY_CHAIN, READ_COUNTER_OPCODE)
drscan(MY_CHAIN, 32, 0x0, Counter_Value)
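To make the fictional sequence executable, the sketch below models one chain in software. All of it is hypothetical: the opcode values, the `ToyChain` class, and `count_events` (standing in for "wait until something triggers") are invented for illustration, and `irscan` simply latches the opcode instead of shifting IR bits. `drscan` does model a real data-register scan: it captures the selected register, shifts `width` bits in from TDI LSB first while the captured bits leave on TDO, then applies the update. Instead of the original's `Counter_Value` output argument, it returns the bits shifted out.

```python
# Hypothetical opcodes for the fictional example (not from any real design).
SELECT_BLOCK_OPCODE = 0x01
COUNTER_SELECT_OPCODE = 0x02
READ_COUNTER_OPCODE = 0x03


class ToyChain:
    """Toy model of one scan chain with block-selectable event counters."""

    def __init__(self):
        self.ir = None
        self.block = 0
        self.counter_id = 0
        self.counters = {}  # (block, counter_id) -> 32-bit count

    def irscan(self, opcode):
        """Load the instruction register (bit-level IR shifting is omitted)."""
        self.ir = opcode

    def drscan(self, width, value):
        """Capture the selected DR, shift `width` bits in LSB first, then update.

        Returns the captured bits as they appear on TDO.
        """
        captured = self._capture() & ((1 << width) - 1)
        shift_reg, tdo = captured, 0
        for i in range(width):
            tdo |= (shift_reg & 1) << i                   # LSB leaves on TDO first
            tdi = (value >> i) & 1                        # LSB enters from TDI first
            shift_reg = (shift_reg >> 1) | (tdi << (width - 1))
        self._update(shift_reg)  # after `width` shifts, shift_reg holds the new value
        return tdo

    def _capture(self):
        if self.ir == READ_COUNTER_OPCODE:
            return self.counters.get((self.block, self.counter_id), 0)
        return 0

    def _update(self, value):
        if self.ir == SELECT_BLOCK_OPCODE:
            self.block = value
        elif self.ir == COUNTER_SELECT_OPCODE:
            self.counter_id = value
            self.counters[(self.block, value)] = 0  # selecting also resets the counter

    def count_events(self, n):
        """Stand-in for 'wait until something triggers': bump the selected counter."""
        key = (self.block, self.counter_id)
        self.counters[key] = (self.counters.get(key, 0) + n) & 0xFFFFFFFF


def read_counter(chain, block_number, counter_id, events):
    """The fictional example's sequence, one call per line of the original."""
    chain.irscan(SELECT_BLOCK_OPCODE)
    chain.drscan(4, block_number)
    chain.irscan(COUNTER_SELECT_OPCODE)
    chain.drscan(7, counter_id)
    chain.count_events(events)
    chain.irscan(READ_COUNTER_OPCODE)
    return chain.drscan(32, 0x0)
```

For example, `read_counter(ToyChain(), 5, 42, 1000)` selects block 5, selects and resets counter 42, records 1000 events, and reads back 1000 through the 32-bit scan.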


Validation Challenges

Validation flow: specs → test plans → tests → confirmed failures → fault isolation → debug → bugs → fix (silicon revision, code path) → qualification. A "Done?" decision (Y/N) determines whether further tests are needed.

Challenge: consistent, comprehensive test plan development

Challenge: test superset/subset

Challenge: triaging confirmed failures
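Triage usually begins by bucketing confirmed failures so that many test failures caused by one bug are debugged once. A common first step is to normalize failure messages into signatures by masking volatile fields such as addresses and counts. The regular expressions, the `<HEX>`/`<N>` placeholders, and the sample messages below are hypothetical; real signatures typically add failing unit, test family, and platform configuration.

```python
import re
from collections import defaultdict


def signature(message):
    """Mask volatile fields (hex values first, then decimal numbers) in a failure message."""
    s = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)
    return re.sub(r"\d+", "<N>", s)


def triage(failures):
    """Group (test_id, message) pairs by signature, largest bucket first."""
    buckets = defaultdict(list)
    for test_id, message in failures:
        buckets[signature(message)].append(test_id)
    return sorted(buckets.items(), key=lambda kv: len(kv[1]), reverse=True)
```

Masking every number is deliberately coarse: it can merge failures that differ only in, say, bank or core number, so the masking rules themselves become part of the triage strategy.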

Challenge: manual fault isolation and debug

Challenge: when are we done?
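One common, partial input to the "when are we done?" decision is functional coverage: define the combinations of events the test plan cares about, record which ones post-silicon tests actually hit, and report the holes. The sketch below computes cross-coverage over hypothetical dimensions (power states crossed with traffic types); the names are invented, and a coverage number alone does not establish that validation is complete.

```python
from itertools import product


def coverage_report(hits, dimensions):
    """Cross-coverage over the given dimensions.

    hits: iterable of tuples, one value per dimension, observed during testing.
    dimensions: dict mapping dimension name -> list of bin values.
    Returns (fraction of cross bins hit, sorted list of unhit bins).
    """
    all_bins = set(product(*dimensions.values()))
    hit_bins = set(hits) & all_bins  # ignore duplicates and out-of-plan events
    missing = sorted(all_bins - hit_bins)
    return len(hit_bins) / len(all_bins), missing


# Hypothetical coverage plan: power state crossed with concurrent traffic type.
PLAN = {"power_state": ["C0", "C1", "C6"], "traffic": ["idle", "mem", "io"]}
```

Tracking the unhit bins over time, rather than only the percentage, shows where new directed or biased random tests are needed.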

