
Techniques Used to Partition a Complex SoC into a Multi-HAPS-70 System

Sivarama Prasad Valluri & Ramanan Sanjeevi Krishnan
Nvidia

March 2016
Santa Clara

SNUG 2016 1
Agenda

Introduction
Prototyping Flow with Certify
How to address prototyping challenges
Results
Prototyping flow with HAPS ProtoCompiler
Results
Conclusion
Introduction
Project Overview

Introduction
Project overview

[Figure: chart with segments labeled 15%, 40% and 25%; segment names not recoverable]

Introduction
Why FPGA Prototyping?
Only platform that works with real-world devices
Faster software development and SW regression
Validation platform for HW teams
Thorough testing by QA teams
Silicon bring-up preparation by all teams

Introduction
Requirements for Prototyping the Tegra SoC
Accurate prototype
Quicker prototype, to enable SW early
Quicker the first time
Quicker on subsequent RTL drops
Faster prototype, for faster SW development iterations
An order of magnitude faster than emulation
Prototype covering most of the chip
Kernel boot on a multi-processor setup
Support for all low-speed I/Os (I2C, SPI, etc.) and high-speed I/Os (USB-SS mode, HDMI, e-MMC) at actual speeds

Prototyping Flow

Prototyping Flow
History: Old Flow and Issues Faced
Used Nvidia-internal legacy boards with a mix of V5, V6 and V7 FPGAs
Fixed routing on the legacy boards meant the full design (especially the memory controller) could not be prototyped
Had to maintain many complex scripts to check connectivity
Used manual partitioning
Had to create FPGA wrappers manually, which made repartitioning very difficult
Had to maintain complex pin LOC generation scripts
Inserted SerDes and MUXes for TDM manually
Higher effort to run sims
Always in debug mode

Prototyping Flow
Current Flow with Certify & HAPS
[Flow diagram]
Inputs: RTL + FDC + board files
Certify: Pre-Partition -> Partitioning -> CPM -> Trace Assignment -> SLP Generation
Synplify Premier + Xilinx Vivado: Synthesis and P&R, run separately for FPGA1, FPGA2, ..., FPGAn
Synopsys HAPS: Prototyping Platform

Prototyping Flow
Current Flow with Certify & HAPS: Advantages Seen
HAPS-70
Able to prototype the full memory controller, supporting all clients
Simple utilities to check connectivity and connection speed
Used guided partitioning with Certify
Individual FPGA wrappers are created automatically, so different partitions can be explored
FPGA pin LOC generation is abstracted by trace assignment
Used the proven HSTDM feature in Certify for pin multiplexing
Able to replicate setups easily across geographies
Overall
Small improvement in schedule on this project; expected to improve significantly for subsequent projects
Quality of the prototype was better
Prototype was more accurate

Partition Challenges

Initial Partition Approach

First-time partition

Ran area estimation
Partitioned the design based on:
Design hierarchies and IP area/size
External interface proximity
Layout of the multi-HAPS system

Interconnect Problem
Importance of a good partition and TDM in reducing interconnects
Huge number of interconnects (ICs) between FPGAs
@W: CU603 |Actual I/O count(1558) after CPM exceeds the total I/O count(1200) for device <>
@W: CU603 |Actual I/O count(3794) after CPM exceeds the total I/O count(1200) for device <>
@W: CU603 |Actual I/O count(14677) after CPM exceeds the total I/O count(1200) for device <>

@W: CU603 |Actual I/O count(13876) after CPM exceeds the total I/O count(1200) for device <>
@W: CU603 |Actual I/O count(20724) after CPM exceeds the total I/O count(1200) for device <>
@W: CU603 |Actual I/O count(26143) after CPM exceeds the total I/O count(1200) for device <>

Approach
Tweaked the partition to reduce interconnects
Additionally used TDM to address the remaining overflow
Used Certify's HSTDM
Used flop-to-flop (F2F) as the HSTDM qualification criterion
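
The CU603 warnings above already show how far each FPGA is oversubscribed. As a rough sketch (not part of the original flow), the Python below parses warning lines in the format shown on the slide and reports the ratio of required to available I/Os, which roughly indicates the average multiplexing ratio that partition tweaks and TDM together would have to recover. The function name `io_overuse` is made up for illustration, and the regular expression assumes the log text matches the slide exactly.

```python
import re

# Pattern for the CU603 warning text as it appears on the slide (assumed format).
CU603 = re.compile(
    r"Actual I/O count\((\d+)\) after CPM exceeds the total I/O count\((\d+)\)"
)


def io_overuse(log_lines):
    """Return (actual, available, ratio) for each CU603 warning found."""
    results = []
    for line in log_lines:
        match = CU603.search(line)
        if match:
            actual, available = int(match.group(1)), int(match.group(2))
            results.append((actual, available, actual / available))
    return results


# Two of the warning lines from the slide.
log = [
    "@W: CU603 |Actual I/O count(1558) after CPM exceeds the total I/O count(1200) for device <>",
    "@W: CU603 |Actual I/O count(26143) after CPM exceeds the total I/O count(1200) for device <>",
]

for actual, available, ratio in io_overuse(log):
    print(f"{actual} signals on {available} pins: {ratio:.1f}x oversubscribed")
```

Running this after every partition trial gives a quick number to compare trials by, before any TDM is applied.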

Partition Attempts to Reduce the ICs
Moving logic to minimize interconnects
Move blocks whose signals go from one module to another and then come back.

[Figure: modules M1 and M2 placed on FPGA A and FPGA B, with connections to FPGA C; bus widths of 256 and 300 signals are annotated]

Partition Attempts to Reduce the ICs (2)
Minimize interconnects that do not qualify for TDM
Observed a considerable number of non-F2F ICs crossing multiple FPGAs, which cannot use HSTDM

Approach
Got design insight from the IP team
Identified combinational buses running through multiple IPs across FPGAs
Moved all the corresponding logic into a single FPGA

Addressing FPGA Clock Crossings
Identifying clock crossings
Inter-FPGA clock crossings
Introduce clock skew in the destination FPGA
Can affect functionality
Approach
Used automation for the following:
Used HDL Analyst (find/expand commands) to write the list of clock crossings and their loads into logs
Fixed the crossings by replicating the clock generation logic
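
The clock-crossing check described above can be illustrated with a small Python sketch. It does not use HDL Analyst or any Certify command; it assumes the clock generators, their loads, and the block-to-FPGA assignment have already been extracted into plain dictionaries. All names (`find_clock_crossings`, the block and clock names) are hypothetical.

```python
def find_clock_crossings(clock_loads, assignment):
    """List clock loads that sit on a different FPGA than their clock generator.

    clock_loads: {clock_name: (generator_block, [load_blocks])}
    assignment:  {block_name: fpga_name}
    Returns {clock_name: {remote_fpga: [load_blocks]}}.
    """
    crossings = {}
    for clock, (generator, loads) in clock_loads.items():
        home = assignment[generator]
        remote = {}
        for block in loads:
            fpga = assignment[block]
            if fpga != home:
                remote.setdefault(fpga, []).append(block)
        if remote:
            crossings[clock] = remote
    return crossings


# Hypothetical example: IP 1 is split across two FPGAs.
clock_loads = {"CLK_ip1": ("ip1_clkgen", ["ip1_part1", "ip1_part2"])}
assignment = {"ip1_clkgen": "FPGA1", "ip1_part1": "FPGA1", "ip1_part2": "FPGA2"}

for clock, remote in find_clock_crossings(clock_loads, assignment).items():
    for fpga, blocks in remote.items():
        print(f"{clock}: replicate clock generation in {fpga} for {blocks}")
```

Each remote FPGA in the result is a candidate location for a replica of the clock generation logic, so the generated clock no longer has to cross between FPGAs.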

Addressing FPGA Clock Crossings
Fix clock crossings: replicate the clock generation logic
[Figure: in the full design, IP 1 contains a clock generator (IP 1-clkgen) that derives CLK_ip1 from CLK. After partitioning, IP 1 sits in FPGA 1 and IP 1 Part2 in FPGA 2, which creates a clock crossing between the FPGAs. Register groups R1 ... Rm are shown in rows 1 ... n.]

So Many Partition Trials: Any Simpler Way?
Create tools to track/qualify multiple trials
Each trial takes time: make the partitioning change, then check its impact on the ICs and clock crossings
Iterative process: more than one run is needed
Manual runs: inefficient and prone to human error
UI: not the most efficient way, since human intervention is needed
Batch mode: how to check the impact?

So Many Partition Trials: Any Simpler Way?
Create tools to track/qualify multiple trials
Approach
Used automation to do the following:
Apply the partition file to the design
Generate an Excel sheet with the interconnection matrix and the calculated connector counts
Check clock crossings and analyze the SRP file
E-mail the report
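
The interconnection matrix and connector count mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' script: nets are modeled as simple (driver, loads) tuples, the output is CSV text rather than an Excel sheet, and `pins_per_connector` is a made-up parameter that would depend on the actual board and cabling.

```python
import csv
import io
from collections import Counter
from math import ceil


def interconnect_matrix(nets, assignment):
    """Count signals between each ordered pair of FPGAs.

    nets:       iterable of (driver_block, [load_blocks]) tuples
    assignment: {block_name: fpga_name}
    """
    matrix = Counter()
    for driver, loads in nets:
        src = assignment[driver]
        # A net costs one interconnect per distinct destination FPGA.
        for dst in {assignment[block] for block in loads} - {src}:
            matrix[(src, dst)] += 1
    return matrix


def connectors_needed(signal_count, pins_per_connector):
    """Connectors required to carry signal_count signals without TDM."""
    return ceil(signal_count / pins_per_connector)


def to_csv(matrix, pins_per_connector):
    """Render the matrix as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["from", "to", "signals", "connectors"])
    for (src, dst), count in sorted(matrix.items()):
        writer.writerow([src, dst, count, connectors_needed(count, pins_per_connector)])
    return out.getvalue()


# Hypothetical design: three blocks on three FPGAs.
nets = [("M1", ["M2"]), ("M1", ["M2", "M3"]), ("M2", ["M1"])]
assignment = {"M1": "A", "M2": "B", "M3": "C"}
matrix = interconnect_matrix(nets, assignment)
print(to_csv(matrix, pins_per_connector=50))
```

Re-running this for each candidate partition file makes it possible to compare trials in batch mode instead of inspecting them in the UI.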

TDM Challenges

Dynamic Slack-Based HSTDM
Use multiple TDM ratios to optimize performance
Selecting appropriate HSTDM ratios
No single-button flow for slack-based HSTDM selection
Hand-picking HSTDM ratios based on slack is not possible for 50k+ signals

Approach
Developed a script to do the slack-based HSTDM placement
Certify's scripting-friendly flow allowed us to do this
The script applies the HSTDM ratio based on slack
Applies higher HSTDM ratios to slow signals and lower HSTDM ratios to fast signals
Optimizes the number of ICs while keeping timing clean
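
The slack-based selection described above can be illustrated with a short Python sketch. It is not the authors' script and does not call Certify: the ratio/delay table, the function names, and the pin model (one wire per group of `ratio` signals) are all assumptions chosen to show the idea that signals with more slack can tolerate a higher ratio.

```python
# Hypothetical (ratio, added delay in ns) pairs, highest ratio first.
# Real values depend on the HSTDM configuration and the board; these are placeholders.
RATIO_DELAY_NS = [(128, 80.0), (64, 50.0), (32, 35.0), (16, 25.0), (8, 20.0)]


def pick_ratio(slack_ns):
    """Return the highest ratio whose assumed delay fits in the slack, else 1 (no TDM)."""
    for ratio, delay in RATIO_DELAY_NS:
        if slack_ns >= delay:
            return ratio
    return 1


def assign_ratios(slacks):
    """Map each signal to a ratio; slacks is {signal_name: slack in ns}."""
    return {name: pick_ratio(slack) for name, slack in slacks.items()}


def pins_needed(assignment):
    """Count physical wires, assuming ceil(signals / ratio) wires per ratio group."""
    groups = {}
    for ratio in assignment.values():
        groups[ratio] = groups.get(ratio, 0) + 1
    return sum(-(-count // ratio) for ratio, count in groups.items())


# Hypothetical slack values for four inter-FPGA signals.
slacks = {"a": 100.0, "b": 60.0, "c": 22.0, "d": 5.0}
assignment = assign_ratios(slacks)
print(assignment)
print("wires needed:", pins_needed(assignment))
```

Picking the highest ratio that still fits the slack minimizes the wire count for each signal without violating timing, which is the trade-off the slide describes.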

Results

Results
Project results
Kernel booted well ahead of tape-out
Enabled early SW development ~4 months before tape-out
Kernel booted on a multi-processor setup
SW was able to execute inter-cluster tests such as cache-coherency tests
Kernel boot was 10x faster than on emulation
Able to run the interfaces at speed for driver development

Prototyping Flow with HAPS ProtoCompiler
Evaluation

Prototyping Flow with HAPS ProtoCompiler
Overview

[Figure: HAPS ProtoCompiler targeting HAPS-70 and HAPS-80 systems]

Prototyping Flow with ProtoCompiler
HAPS ProtoCompiler vs. Certify in Various Stages
Compiler
RTL diagnostics
Improved compiler runtime

Pre-partition
Distributed processing technology

Partition
Abstract TSS flow
Quick iteration time and reporting
Switches to enable clock-gate replication

Prototyping Flow with ProtoCompiler
HAPS ProtoCompiler vs. Certify in Various Stages
System Route
Integrated slack-based HSTDM (push-button flow for HSTDM placement)

System Generate
RTL lists per partition

Others
New database structure

Results
Evaluation results
Able to improve performance by up to ~15%
Significant reduction in runtime and partitioning effort

Stage                             Certify   HAPS ProtoCompiler
RTL Diagnostics                   7.5 h     30-45 min
Compiler                          7.5 h     3-4 h
Area Estimation                   20+ h     4-5 h
Partition stage (each iteration)  1.5 h     30 min
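
To make the runtime comparison in the table above easier to read, the following Python sketch converts the reported figures to hours and computes the speedup range per stage. The numbers come straight from the table; the only assumptions are that "20+ h" is treated as exactly 20 h (so that row is a lower bound) and that the helper name `speedup_range` is illustrative.

```python
# (Certify hours, (ProtoCompiler best-case hours, worst-case hours)) per stage.
STAGES = {
    "RTL Diagnostics": (7.5, (0.5, 0.75)),
    "Compiler": (7.5, (3.0, 4.0)),
    "Area Estimation": (20.0, (4.0, 5.0)),  # Certify figure is "20+ h": lower bound
    "Partition stage (each iteration)": (1.5, (0.5, 0.5)),
}


def speedup_range(certify_h, protocompiler_h):
    """Return (min speedup, max speedup) of ProtoCompiler over Certify."""
    fastest, slowest = protocompiler_h
    return certify_h / slowest, certify_h / fastest


for stage, (certify_h, protocompiler_h) in STAGES.items():
    low, high = speedup_range(certify_h, protocompiler_h)
    if low == high:
        print(f"{stage}: {low:.1f}x")
    else:
        print(f"{stage}: {low:.1f}x to {high:.1f}x")
```

On these figures the largest gain is in RTL diagnostics, while the per-iteration partition speedup matters most in practice because that stage is repeated for every trial.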

Conclusion

Conclusion
The Synopsys prototyping solution (Certify + HAPS) helped make our project successful

Certify, with its rich feature set, helped partition the multi-million-gate design onto an FPGA prototyping system consisting of multiple HAPS boards
Kernel booted well ahead of silicon arrival
Kernel boot was 10x faster than on emulation
Able to run the interfaces at speed for driver development

HAPS ProtoCompiler adds multiple features and capabilities that can address the complex prototyping challenges we experienced in the past
Able to improve performance by up to 15%
Significant reduction in runtime and partitioning effort

Thanks to Nvidia Management & Synopsys ACs & Support Teams!!


Thank You
