
Designing for Performance

fpga23000-10-wkbf-rev1


Xilinx is disclosing this Document and Intellectual Property (hereinafter the "Design") to
you for use in the development of designs to operate on, or interface with Xilinx FPGAs.
Except as stated herein, none of the Design may be copied, reproduced, distributed,
republished, downloaded, displayed, posted, or transmitted in any form or by any means
including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written consent of Xilinx. Any unauthorized use of the Design may violate
copyright laws, trademark laws, the laws of privacy and publicity, and communications
regulations and statutes.
Xilinx does not assume any liability arising out of the application or use of the Design; nor
does Xilinx convey any license under its patents, copyrights, or any rights of others. You
are responsible for obtaining any rights you may require for your use or implementation of
the Design. Xilinx reserves the right to make changes, at any time, to the Design as
deemed desirable in the sole discretion of Xilinx. Xilinx assumes no obligation to correct
any errors contained herein or to advise you of any correction if such be made. Xilinx will
not assume any liability for the accuracy or correctness of any engineering or technical
support or assistance provided to you in connection with the Design.
THE DESIGN IS PROVIDED "AS IS" WITH ALL FAULTS, AND THE ENTIRE RISK AS
TO ITS FUNCTION AND IMPLEMENTATION IS WITH YOU. YOU ACKNOWLEDGE
AND AGREE THAT YOU HAVE NOT RELIED ON ANY ORAL OR WRITTEN
INFORMATION OR ADVICE, WHETHER GIVEN BY XILINX, OR ITS AGENTS OR
EMPLOYEES. XILINX MAKES NO OTHER WARRANTIES, WHETHER EXPRESS,
IMPLIED, OR STATUTORY, REGARDING THE DESIGN, INCLUDING ANY
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS.
IN NO EVENT WILL XILINX BE LIABLE FOR ANY CONSEQUENTIAL, INDIRECT,
EXEMPLARY, SPECIAL, OR INCIDENTAL DAMAGES, INCLUDING ANY LOST DATA
AND LOST PROFITS, ARISING FROM OR RELATING TO YOUR USE OF THE
DESIGN, EVEN IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES. THE TOTAL CUMULATIVE LIABILITY OF XILINX IN CONNECTION WITH
YOUR USE OF THE DESIGN, WHETHER IN CONTRACT OR TORT OR OTHERWISE,
WILL IN NO EVENT EXCEED THE AMOUNT OF FEES PAID BY YOU TO XILINX
HEREUNDER FOR USE OF THE DESIGN. YOU ACKNOWLEDGE THAT THE FEES, IF
ANY, REFLECT THE ALLOCATION OF RISK SET FORTH IN THIS AGREEMENT AND
THAT XILINX WOULD NOT MAKE AVAILABLE THE DESIGN TO YOU WITHOUT
THESE LIMITATIONS OF LIABILITY.
The Design is not designed or intended for use in the development of on-line control
equipment in hazardous environments requiring fail-safe controls, such as in the
operation of nuclear facilities, aircraft navigation or communications systems, air traffic
control, life support, or weapons systems ("High-Risk Applications"). Xilinx specifically
disclaims any express or implied warranties of fitness for such High-Risk Applications.
You represent that use of the Design in such High-Risk Applications is fully at your risk.
© 2008 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated
brands included herein are trademarks of Xilinx, Inc. PCI, PCIe and PCI Express are
trademarks of PCI-SIG and used under license. The PowerPC name and logo are
registered trademarks of IBM Corp. and used under license. All other trademarks are the
property of their respective owners.

Facilitator Guide

Table of Contents

INTRODUCTORY MATERIAL
Getting Started
  About This Guide
  The Program in Perspective
  Program Preparation
Training At A Glance
Quick Reference Material

MODULES
Course Agenda
  Course Agenda
Review of Fundamentals of FPGA Design
  Review of Fundamentals of FPGA Design
  Apply Your Knowledge Answers
Designing with Virtex-5 FPGA Resources
  Introduction
  Overview
  I/O
  Block RAMs and FIFO
  XtremeDSP Solution Cores
  Other Features
  Summary
  Apply Your Knowledge Answers
CORE Generator Software System
  Introduction
  Overview
  Using the CORE Generator Software System
  CORE Generator Software Design Flows
  Summary
  Apply Your Knowledge Answers
Lab 1: CORE Generator Software System
  Lab
Designing Clock Resources
  Introduction
  Overview
  Clock Management Tile
  Clock Networks
  Summary
  Apply Your Knowledge Answers
Lab 2: Designing Clock Resources
  Lab
FPGA Design Techniques
  Introduction
  Duplicating Flip-Flops
  Pipelining
  I/O Flip-Flops
  Synchronization Circuits
  Summary
  Apply Your Knowledge Answers
Synthesis Techniques
  Introduction
  Achieving Breakthrough Performance
  Synthesis Options
  XST Synthesis Options
  Summary
  Apply Your Knowledge Answers
Lab 3: Synthesis Techniques
  Lab
Day One Summary
  Day One Summary
Course Agenda Day Two
  Course Agenda Day Two
Achieving Timing Closure
  Introduction
  Timing Reports
  Interpreting Timing Reports
  Report Options
  Summary
  Apply Your Knowledge Answers
Lab 4: Review of Global Timing Constraints
  Lab
Timing Groups and OFFSET Constraints
  Introduction
  Overview
  Creating Groups
  OFFSET Constraints
  Summary
  Apply Your Knowledge Answers
Path-Specific Timing Constraints
  Introduction
  Inter-Clock Domain Constraints
  Multicycle Paths
  False Paths
  Miscellaneous Constraints
  Summary
  Apply Your Knowledge Answers
Lab 5: Achieving Timing Closure
  Lab
Advanced Implementation Options
  Introduction
  Overview
  Advanced MAP and Place & Route Options
  Xplorer
  SmartGuide and Partitions
  Power Optimization
  Summary
  Apply Your Knowledge Answers
Lab 6: Designing for Performance
  Lab
Power Estimation
  Introduction
  Overview
  XPower Estimator
  Using the XPower Analyzer Software
  Summary
  Apply Your Knowledge Answers
Lab 7: FPGA Editor Demo
  Lab
ChipScope Pro Software
  Introduction
  Importance of Debug
  ChipScope Pro Software Cores
  Design Flows
  Summary
Lab 8: ChipScope Pro Software
  Lab
Course Summary
  Course Summary

Appendixes
Appendix A: Basic HDL Coding Techniques*
Appendix B: Spartan-3 FPGA HDL Coding Techniques*
Appendix C: Virtex-5 FPGA HDL Coding Techniques*
Appendix D: Synthesis Techniques*
  Inferring Logic and Flip-Flop Resources
  Inferring Memory
  Inferring I/Os and Global Resources
  Inferring DSP48 Resources
Appendix E: Spartan-3E FPGA 1600E MicroBlaze Processor Development Kit Demo Board Introduction*

* Not included in the printed workbook, but available via
ftp://ftp.xilinx.com/pub/documentation/education/fpga23000-10-rev1-xlnx_lab_files.zip

www.xilinx.com
1-877-XLX-CLAS

Getting Started
About This Guide
What's the Purpose of This Guide?
This facilitator guide provides a master reference document to help
you prepare for and deliver the Designing for Performance course.
What Will I Find in the Guide?
This facilitator guide is a comprehensive package that contains:
  The course delivery sequence
  Checklists of any necessary materials and equipment
  Presentation scripts and key points to cover
  Instructions for managing exercises, case studies, and other instructional activities

How Is This Guide Organized?


This section, Getting Started, contains all of the preparation
information for the Designing for Performance course, such as
learning objectives, prework, required materials, and room setup.
Following this section is the Training At A Glance table. This
table can serve as your overview reference, showing the module
names, timings, and process descriptions for the entire program.
Finally, the course itself is divided into modules, each of which is
composed of one or more lessons. A module is a self-contained
portion of the program, usually lasting anywhere from 20 to 90
minutes, while a lesson is a shorter (typically 5-20 minutes) topic
area. Each module begins with a one-page summary showing the
Purpose, Time, Process, and Lessons for the module. Use these
summary pages to get an overview of the module that follows.


How Is the Text Laid Out in This Guide?
Every action in the program is described in this guide by a text
block like this one, with a margin icon, a title line, and the actual
text. The icons are designed to help catch your eye and draw quick
attention to what to do and how to do it. For example, the "To say"
icon indicates that you, the instructor, say something next.
The title line gives a brief description of what to do, and is
followed by the actual script, instruction set, key points, etc. that
are needed to complete the action.
A complete list of the margin icons used in this guide is provided
under Graphic Cues.

TRAINER NOTE
You may also occasionally find trainer notes such as this one in the
text of this guide. These shaded boxes provide particularly
important information in an attention-getting format.


Graphic Cues
This guide uses the following margin icons: Overhead, Participant Workbook, Lab Exercise, Projected Image, Key Points, Time, Transition, Flipchart, Handouts, Summary, Module Process, Module Purpose, Break/Lunch, Group Activity, Role Play, Where Can I Learn More?, Materials Required, Audio Tape, Case Study, Instructional Game, Answers, To say, Video Tape, Assessment Question & Answer, Quiz/Test, Computer/CDROM, Tool, Welcome, Custom 5, and Custom 6.

The Program in Perspective


Why a Designing for Performance Course?

Attending the Designing for Performance class will help you create
more efficient designs. This course can help you fit your design
into a smaller FPGA or a lower speed grade to reduce system
costs. In addition, by mastering the tools and the design
methodologies presented in this course, you will be able to create
your design faster, shorten your development time, and lower
development costs.
Learning Objectives
After completing this comprehensive training, you will have the necessary skills to:
  Describe a flow for obtaining timing closure
  Describe the architectural features of the Virtex-5 FPGA
  Describe the features of the Digital Clock Manager (DCM) and Phase-Locked Loop (PLL) and how they can be used to improve performance
  Increase performance by duplicating registers and pipelining
  Describe different synthesis options and how they can improve performance
  Create and integrate cores into your design flow by using the CORE Generator software system
  Run behavioral simulation on an FPGA design that contains cores
  Pinpoint design bottlenecks by using Timing Analyzer reports
  Apply advanced timing constraints to meet your performance goals
  Use advanced implementation options to increase design performance

Program Timing

2 days

Program Preparation
Prerequisites
  The Fundamentals of FPGA Design course or equivalent knowledge of:
    FPGA architecture features
    The Xilinx implementation software flow and implementation options
    Reading timing reports
    Basic FPGA design techniques
    Global timing constraints and the Constraints Editor
  Intermediate HDL knowledge (VHDL or Verilog)
  Solid digital design background
  The following recorded e-Learning modules are recommended:
    Basic HDL Coding Techniques
    Spartan-3 FPGA HDL Coding Techniques
    Virtex-5 FPGA HDL Coding Techniques
Required Materials
  Designing for Performance facilitator guide
  PowerPoint files
Instructor Preparation
  Read through the trainer notes
  Read the lab setup guide

Lab Setup
Software Requirements
  Xilinx ISE Foundation design tools 10.1 SP1, including ISE Simulator
    www.xilinx.com/support/download
  ChipScope Pro tool 10.1 SP1, if you are running the optional ChipScope Pro Software lab
  Synplicity Synplify software 9.2, if you are running the Synplify version of the Synthesis Techniques lab
  Exemplar is no longer supported as part of the course


Lab Files/Data Installed
  ftp://ftp.xilinx.com/pub/documentation/education/fpga23000-10-rev1-xlnx_lab_files.zip
Hardware Requirements
Note: The demo board is only required for the optional ChipScope Pro Software lab.
  PC running Windows XP Professional (32-bit) with 2 GB RAM
  Spartan-3E FPGA 1600E MicroBlaze processor development board
    User Guide: www.xilinx.com/support/documentation/boards_and_kits/ug257.pdf (included in the lab zip file)
  USB cable for configuration (Type A to Type B, included in the kit)
    Platform Cable USB is NOT required
  Power supply for the Spartan-3E FPGA board (included in the kit)
  Optional:
    Serial cable (DB9 male/female) for computers with serial ports, or a USB-to-RS-232 adapter cable for computers lacking a serial port
    HyperTerminal or equivalent
Special Instructions
None

Training At A Glance
Course Agenda (5 minutes)
  This module covers the agenda for the course.
Review of Fundamentals of FPGA Design (15 minutes)
  This module reviews the Virtex-5 FPGA architecture and some of the primary functions of the ISE tools.
Designing with Virtex-5 FPGA Resources (60 minutes)
  This module describes the latest features of the newest FPGA from Xilinx.
CORE Generator Software System (20 minutes)
  This module describes the basics of designing with the CORE Generator software.
Lab 1: CORE Generator Software System (30 minutes)
  This lab illustrates how to build a block RAM memory with the CORE Generator software.
Designing Clock Resources (45 minutes)
  This module describes how to design a complete FPGA clocking scheme.
Lab 2: Designing Clock Resources (40 minutes)
  This lab illustrates how to build a multiple clock system with the ISE Architecture Wizard tool.
FPGA Design Techniques (40 minutes)
  This module describes how to build a reliable and fast FPGA design.
Synthesis Techniques (40 minutes)
  This module describes how to synthesize a fast and efficient FPGA design by using the advanced capabilities of the synthesis tools.
Lab 3: Synthesis Techniques (30 minutes)
  This lab illustrates how to synthesize a design by taking advantage of some of the advanced synthesis options available in the newest synthesis tools.
Day One Summary (10 minutes)
  This module reviews day one of the course.
Course Agenda Day Two (5 minutes)
  This module covers the day two agenda for the course.
Achieving Timing Closure (45 minutes)
  This module describes how to read the Timing Analyzer reports and use the information to gain timing closure.
Lab 4: Review of Global Timing Constraints (45 minutes)
  This lab illustrates how to use global timing constraints and the Timing Analyzer to find the timing-critical paths of a design and develop a strategy for gaining timing closure.
Timing Groups and OFFSET Constraints (45 minutes)
  This module describes the best ways to group path endpoints to make the most efficient path-specific timing constraints.
Path-Specific Timing Constraints (45 minutes)
  This module describes some of the most common applications for path-specific timing constraints and how to make them with the Xilinx Constraints Editor.
Lab 5: Achieving Timing Closure (45 minutes)
  This lab illustrates how to make path-specific timing constraints on a design and use some of the advanced implementation options in the ISE tools.
Advanced Implementation Options (30 minutes)
  This module describes the advanced implementation options available in the ISE tools.
Lab 6: Designing for Performance (30 minutes)
  This lab illustrates how to improve design performance and maximize results solely with advanced implementation options.
Power Estimation (30 minutes)
  This optional module describes the power estimation capabilities included with the ISE tools.
Lab 7: FPGA Editor Demo (30 minutes)
  This optional demonstration illustrates how to locate logic, view the contents of an FPGA design, and insert a probe with the FPGA Editor.
ChipScope Pro Software (30 minutes)
  This optional module describes how to use the Core Inserter and Core Generator tool flows and plan for debugging with the ChipScope Pro software.
Lab 8: ChipScope Pro Software (60 minutes)
  This optional lab illustrates how to use the ChipScope Pro software to add the Analyzer ILA core and prepare for debugging.
Course Summary (10 minutes)
  This module reviews day two of the course and provides a summary of the course.

Education Services Quick Reference


Designing for Performance
From the Xilinx Education Services Designing for Performance course.
For more information on Xilinx courses, please visit www.xilinx.com/education.
Synthesis
Tip: Reduce fanout
Action: Manually duplicate logic and flip-flops
  Preferred method over letting the synthesis tool perform the duplication
  Set your synthesis tool to keep the redundant logic
  Name duplicate logic _A, _B, not _1, _2
  Replicate the necessary logic in an effort to build logic that is in parallel, not serial
Tip: Timing-driven synthesis
Action: Try not to over-constrain
  Over-constraining can increase the size of the design
Tip: Hierarchy management
Action: Optimization across hierarchical boundaries can make node names change and/or disappear
  Makes simulating and debugging later in the design flow difficult
  Only allow this to be done across as few boundaries as possible and as a last effort to gain timing closure
  Maintain critical nodes with the KEEP attribute (or equivalent)
Tip: Retiming
Action: Move registers forward/backward along a datapath to decrease the number of LUTs in series
  Can make node names change or disappear
  Maintain critical nodes with the KEEP attribute
Tip: FSM extraction
Action: Optimizes your FSM by re-encoding your design based on the number of states and inputs
  Results can be good, but testing each encoding technique manually is not difficult and allows determination of which has the best speed and size
Tip: Verify good HDL coding style was used
Action: Poor HDL coding style can add logic levels to any datapath; make certain that good style was used on your timing-critical paths (see the HDL Coding Style Recorded e-Learning modules)
Tip: Access Verilog and VHDL language templates
Action: From the Project Navigator menu, select Edit > Language Templates
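The fanout tip can be made concrete with a small planning sketch in Python. It is not a Xilinx tool or API; the function name and the per-copy fanout budget are hypothetical. It only shows the bookkeeping of splitting the loads of one high-fanout register across manually duplicated copies named with letter suffixes (_A, _B, ...). In the HDL you would still set your synthesis tool to keep the redundant logic, as the card says.

```python
import math


def plan_duplicates(base_name: str, loads: list[str], max_fanout: int) -> dict[str, list[str]]:
    """Split the loads of one register across manually duplicated copies.

    Hypothetical helper: `max_fanout` is a designer-chosen budget per copy,
    not a value reported by any Xilinx tool.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    copies = max(1, math.ceil(len(loads) / max_fanout))
    if copies > 26:
        raise ValueError("more than 26 copies needed; raise max_fanout")
    # Letter suffixes (_A, _B, ...) rather than _1, _2, as the card recommends.
    names = [f"{base_name}_{chr(ord('A') + i)}" for i in range(copies)]
    plan: dict[str, list[str]] = {name: [] for name in names}
    # Assign loads in contiguous groups of at most max_fanout per copy.
    for index, load in enumerate(loads):
        plan[names[index // max_fanout]].append(load)
    return plan


if __name__ == "__main__":
    plan = plan_duplicates("ce_reg", [f"ff{i}" for i in range(10)], max_fanout=4)
    for name, group in plan.items():
        print(name, len(group), group)
```

With ten loads and a budget of four, the sketch produces ce_reg_A and ce_reg_B with four loads each and ce_reg_C with two. Contiguous grouping is just the simplest choice here; in practice you would group loads by their physical location.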

© 2008 Xilinx, Inc. All rights reserved. All Xilinx trademarks, registered trademarks, patents, and disclaimers are as listed at http://www.xilinx.com/legal.htm.
All other trademarks and registered trademarks are the property of their respective owners. All specifications are subject to change without notice.

www.xilinx.com
1-800-255-7778

FPGA23000-10-QR (v1.0) June 20, 2008


Quick Reference Card Page 1 of 3

Reading Timing Reports
Tip: Look for a single long delay
Action: If it is on a high-fanout net, duplicate the source of the net
  If it is on a low-fanout net, try to obtain a better placement with timing-driven packing or MPPR
  If there is no single long delay, the path probably has too many logic levels (go back to synthesis or pipeline the datapath)
Tip: Use the Timing Improvement Wizard in the Timing Analyzer
Action: Click the Wizard icon when a constraint fails
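The "single long delay" rules above can be written down as a small decision procedure. The Python sketch below is only a study aid, not a model of the Timing Analyzer or its Timing Improvement Wizard: the PathElement type is a hypothetical simplification of a timing-report path, and the two thresholds (a net counts as a "single long delay" when it is at least 40 percent of the path, and "high fanout" means 20 or more loads) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PathElement:
    """One delay line from a failing timing path (simplified, hypothetical model)."""
    name: str
    delay_ns: float
    is_net: bool      # True for a routing (net) delay, False for a logic delay
    fanout: int = 1   # only meaningful for nets


def triage(path: list[PathElement], long_fraction: float = 0.4, high_fanout: int = 20) -> str:
    """Apply the quick reference rules of thumb to one failing path.

    `long_fraction` and `high_fanout` are assumed thresholds, not Xilinx values.
    """
    total = sum(e.delay_ns for e in path)
    nets = [e for e in path if e.is_net]
    if nets:
        worst = max(nets, key=lambda e: e.delay_ns)
        # A "single long delay": one net that dominates the path delay.
        if worst.delay_ns >= long_fraction * total:
            if worst.fanout >= high_fanout:
                return f"high-fanout net {worst.name}: duplicate the source of the net"
            return f"low-fanout net {worst.name}: improve placement (timing-driven packing or MPPR)"
    return "no single long delay: too many logic levels; revisit synthesis or pipeline"


if __name__ == "__main__":
    path = [
        PathElement("clk_en", 3.0, True, fanout=64),
        PathElement("lut_a", 0.1, False),
        PathElement("n_b", 0.5, True, fanout=2),
    ]
    print(triage(path))
```

For the example path, the 3.0 ns clk_en net is more than 40 percent of the 3.6 ns total and drives 64 loads, so the sketch suggests duplicating its source.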

Timing Constraints
Tip: Paths that cross unrelated clock domains are not covered by PERIOD constraints
Action: Use the CLKA and CLKB groups that were created when you entered PERIOD constraints
  Specify a Slow/Fast Path Exception between CLKA and CLKB
  Do not forget to avoid creating a metastability problem; consider using a FIFO or synchronization circuit
Tip: Bidirectional buses usually create false paths
Action: Group logic by component and place a TIG on paths that can be ignored
Tip: Multicycle paths are usually associated with clock enable nets
Action: Create a MULTI_CYCLE group containing the clock enable net
  Specify a multicycle path constraint from MULTI_CYCLE to MULTI_CYCLE
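The arithmetic behind a multicycle constraint is simple: if the clock enable is asserted once every N clock cycles, paths between registers that share that enable have N clock periods to settle instead of one. The Python sketch below computes that relaxed requirement and formats it as a UCF FROM-TO TIMESPEC line, plus a TIG line for the bidirectional-bus case. It assumes the groups already exist (created as taught in the course) and that both launching and capturing registers use the same enable; the timespec names and the bus group names are hypothetical. Check the generated syntax against the Constraints Guide for your ISE version before using it.

```python
def multicycle_timespec(period_ns: float, cycles: int,
                        group: str = "MULTI_CYCLE",
                        ts_name: str = "TS_MULTI_CYCLE") -> tuple[float, str]:
    """Return the relaxed path requirement and a UCF TIMESPEC line for it.

    Assumes the clock enable is asserted once every `cycles` clock periods
    and that `group` already contains the registers driven by that enable.
    """
    if period_ns <= 0 or cycles < 1:
        raise ValueError("period must be positive and cycles at least 1")
    budget_ns = period_ns * cycles
    line = f'TIMESPEC "{ts_name}" = FROM "{group}" TO "{group}" {budget_ns:g} ns;'
    return budget_ns, line


def false_path_timespec(src_group: str, dst_group: str, ts_name: str) -> str:
    """Return a UCF line that places a TIG (timing ignore) on src -> dst paths."""
    return f'TIMESPEC "{ts_name}" = FROM "{src_group}" TO "{dst_group}" TIG;'


if __name__ == "__main__":
    budget, ucf = multicycle_timespec(period_ns=5.0, cycles=2)
    print(budget)  # 10.0
    print(ucf)
    print(false_path_timespec("BUS_IN_FFS", "BUS_OUT_FFS", "TS_BIDIR_TIG"))
```

With a 5 ns clock and an enable every second cycle, the paths inside MULTI_CYCLE get a 10 ns requirement rather than 5 ns. An absolute value in ns is used here for clarity; a constraint written relative to the PERIOD timespec would track later clock changes automatically.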

Advanced Implementation Options
Tip: In MAP, timing-driven packing can improve performance by up to 5 percent
Action: Most effective if unrelated logic has been packed together, which happens when there is high device utilization (over 80 percent)
  See Map Report > Design Summary > Number of Slices Containing Unrelated Logic
Tip: In PAR, increasing the Overall Effort Level can improve performance by up to 5 percent
Action: Runtime can increase by 100 percent or more
Tip: In PAR, Extra Effort can improve performance by up to 3 percent
Action: Runtime can increase by 200 percent or more
Tip: In PAR, Multi-Pass Place & Route (MPPR) can improve speed by up to 3 percent
Action: Runtime per pass is nearly the same, but multiple implementations are run; this is not recommended for Virtex-5 FPGA designs
  Remember not to run Cost Table 1
Tip: Use Xplorer to automatically try different implementation options
Action: Requires several implementations
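The "up to" percentages on this card are best cases, not predictions. The Python sketch below turns them into bounds for a hypothetical baseline (200 MHz and a 30-minute run are made-up inputs), which can help decide whether an option is worth its extra runtime. The option keys are our own labels, not ISE option names; no runtime bound is given for MPPR because its total runtime depends on how many passes you run.

```python
from typing import Optional

# Best-case figures quoted on the quick reference card for each option,
# relative to a default run: (max performance gain %, min runtime increase %).
# None means the card gives no runtime figure for that option.
CARD_FIGURES = {
    "timing_driven_packing": (5, None),
    "par_higher_effort_level": (5, 100),
    "par_extra_effort": (3, 200),
    "mppr": (3, None),
}


def best_case(option: str, fmax_mhz: float, runtime_min: float) -> tuple[float, Optional[float]]:
    """Return (upper bound on Fmax, lower bound on runtime) for one option.

    The card quotes each option separately, so gains are not combined here.
    """
    gain_pct, runtime_pct = CARD_FIGURES[option]
    fmax_upper = fmax_mhz * (1 + gain_pct / 100)
    runtime_lower = None if runtime_pct is None else runtime_min * (1 + runtime_pct / 100)
    return fmax_upper, runtime_lower


def packing_worth_trying(utilization_pct: float, unrelated_logic_slices: int) -> bool:
    """Card rule of thumb: timing-driven packing helps most at high utilization
    (over 80 percent), when MAP has packed unrelated logic into shared slices."""
    return utilization_pct > 80 and unrelated_logic_slices > 0


if __name__ == "__main__":
    for option in CARD_FIGURES:
        fmax, runtime = best_case(option, fmax_mhz=200.0, runtime_min=30.0)
        runtime_text = "not quoted" if runtime is None else f">= {runtime:.0f} min"
        print(f"{option}: <= {fmax:.1f} MHz, runtime {runtime_text}")
```

For example, Extra Effort on the hypothetical 200 MHz design buys at most about 6 MHz while tripling the 30-minute run to at least 90 minutes.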

Timing Closure


Course Agenda
Purpose
This module covers the agenda for the course.
Time
5 minutes
Process
This module covers the agenda for the course.
Lessons
  Course Agenda

Course Agenda
Show Slide 1:

Designing for Performance


Course Agenda

Show Slide 2:

Day One Objectives
After completing this module, you will be able to:
  Describe a flow for obtaining timing closure
  Describe the architectural features of the Virtex-5 FPGA
  Describe the features of the Digital Clock Manager (DCM) and Phase-Locked Loop (PLL) and how they can be used to improve performance
  Increase performance by duplicating registers and pipelining
  Describe different synthesis options and how they can improve performance
  Create and integrate cores into your design flow by using the CORE Generator software system
  Run behavioral simulation on an FPGA design that contains cores

© 2008 Xilinx, Inc. All Rights Reserved

Key Points
  This course builds upon the fundamental techniques of designing into Xilinx FPGAs, which are taught in the Fundamentals of FPGA Design course.
  The modules and labs were developed with version 10.1i of the Xilinx software, with no service packs. If you have installed a different version or service pack level, lab results may differ.

Show Slide 3:

Day Two Objectives


After completing this module, you will be able to:

Pinpoint design bottlenecks by using Timing Analyzer reports


Apply advanced timing constraints to meet your performance goals
Use advanced implementation options to increase design performance

Show Slide 4:

Prerequisites

The Fundamentals of FPGA Design course or equivalent knowledge of

FPGA architecture features


The Xilinx implementation software flow and implementation options
Reading timing reports
Basic FPGA design techniques
Global timing constraints and the Constraints Editor

Intermediate HDL knowledge (VHDL or Verilog)


Solid digital design background
The following recorded e-Learning modules are recommended

Basic HDL Coding Techniques


Spartan-3 FPGA HDL Coding Techniques
Virtex-5 FPGA HDL Coding Techniques


Show Slide 5:

Day One Agenda

Review of Fundamentals of FPGA Design


Designing with Virtex-5 FPGA Resources
CORE Generator Software System
Lab 1: CORE Generator Software System
Designing Clock Resources
Lab 2: Designing Clock Resources
FPGA Design Techniques
Synthesis Techniques
Lab 3: Synthesis Techniques

Show Slide 6:

Day Two Agenda

Achieving Timing Closure


Lab 4: Review of Global Timing Constraints
Timing Groups and OFFSET Constraints
Path-Specific Timing Constraints
Lab 5: Achieving Timing Closure
Advanced Implementation Options
Lab 6: Designing for Performance
Power Estimation (Optional)
Lab 7: FPGA Editor Demo (Optional)
ChipScope Pro Software (includes lab) (Optional)
Course Summary

Key Points
  The day two agenda includes three optional sections. The instructor may skip these sections if students are not interested in the topics, or if time is running short.

Show Slide 7:

Where Are We Going?

What should you know about using Xilinx software right now?

Synchronous design techniques


How to specify global design constraints
The basics of using the Xilinx implementation tools

What will you know by the end of this class?

How to use HDL coding techniques


Software options
Constraints
Systematic design flow to obtain your performance objectives


Show Slide 8:

Appendix

Note that this course also includes the following appendixes

Appendix A: Designing with Virtex-5 FPGA Resources


Appendix B: Designing Clock Resources
Appendix C: Synthesis Techniques

To reduce size, the appendixes are not included in the printed workbook
The appendixes are included in a supplemental folder with the lab files
and are available via
ftp://ftp.xilinx.com/pub/documentation/education/fpga23000-10-rev1-xlnx_lab_files.zip

Show Slide 9:

Latest Product Information


Please visit the following resources for the most current information
on the Xilinx devices described in this course.

For the latest user design information, see the user guides
For the latest characteristics, such as timing, performance, etc., see the
data sheets
For the latest design and software issues or bugs, see the Answer Record
database: Search by FPGA family or software tool

www.xilinx.com/support Answer Browser (under Support Quicklinks)

www.xilinx.com/xlnx/xil_ans_browser.jsp

Note for instructor: Take a moment to click the link above and browse the Records.

TRAINER NOTE

Take a moment to click the link above and browse the Records.

Transition to Review of Fundamentals of FPGA Design

Review of Fundamentals of FPGA Design
Purpose
This module
Time
15 minutes
Process
This module reviews the Virtex-5 FPGA architecture and some of the primary functions of the ISE tools.
Lessons
  Review of Fundamentals of FPGA Design

Review of Fundamentals of FPGA Design


Show Slide 10:

Review of Fundamentals of
FPGA Design

Show Slide 11:

Apply Your Knowledge

1) What is the basic building block of an FPGA?

2) List some Virtex-5 FPGA features

3) List the implementation processes

4) Name the global timing constraints

Apply Your Knowledge Answers


Show Slide 12:
Answer
1) What is the basic building block of an FPGA?
  Slices are the basic building block of FPGAs
  Each slice contains
    6-input LUTs: Combinatorial logic, Shift Register LUT (SRL), distributed memory
    Flip-flops
    Carry logic
    Multiplexers

Answers
1) What is the basic building block of an FPGA?
  Slices are the basic building block of FPGAs.
  Each slice contains:
    6-input LUTs: Combinatorial logic, Shift Register LUT (SRL), distributed memory
    Flip-flops
    Carry logic
    Multiplexers
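A 6-input LUT used as combinatorial logic is a 64-entry truth table: the six inputs form an address, and the stored bit at that address is the output, so any Boolean function of six inputs fits in one LUT. The Python sketch below models only that combinatorial mode (not the SRL or distributed-memory modes). The bit ordering, with input 0 as the least significant address bit, follows our understanding of the INIT convention of the Xilinx LUT6 primitive; treat it as an assumption and confirm it in the Libraries Guide.

```python
from typing import Callable


def init_from_function(fn: Callable[..., int]) -> int:
    """Build a 64-bit truth table: bit k holds fn(inputs) for address k.

    Input 0 is taken as the least significant address bit (assumed convention).
    """
    init = 0
    for address in range(64):
        bits = [(address >> pos) & 1 for pos in range(6)]
        init |= (fn(*bits) & 1) << address
    return init


def lut6(init: int, inputs: tuple[int, int, int, int, int, int]) -> int:
    """Evaluate the LUT: the six inputs form an address into the truth table."""
    if not 0 <= init < (1 << 64):
        raise ValueError("init must be a 64-bit value")
    if len(inputs) != 6 or any(bit not in (0, 1) for bit in inputs):
        raise ValueError("expected six 0/1 inputs")
    address = sum(bit << pos for pos, bit in enumerate(inputs))
    return (init >> address) & 1


if __name__ == "__main__":
    parity = init_from_function(lambda *b: sum(b) % 2)  # 6-input XOR
    and6 = init_from_function(lambda *b: int(all(b)))   # 6-input AND
    print(hex(parity))  # 0x6996966996696996
    print(hex(and6))    # 0x8000000000000000
    print(lut6(parity, (1, 0, 1, 1, 0, 0)))  # 1 (three inputs high)
```

The 6-input AND sets only the bit at address 63, while the 6-input XOR sets every address with an odd number of ones; both cost exactly one LUT, which is why logic depth, not function complexity, is what matters for a 6-input-or-less cone.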

Answers
2) List some Virtex-5 FPGA features.
  Digital Clock Manager (DCM)
  Phase-Locked Loop (PLL)
  Global clock buffers (BUFGCTRL)
  Regional clock resources (BUFIO and BUFR)
  Dedicated DSP blocks (DSP48E)
  Block RAM (RAMB36, each usable as two RAMB18)
  Dedicated FIFOs (FIFO36)
  SERDES interface
  RocketIO multi-gigabit transceivers
  PowerPC embedded processors
  Ethernet MAC
3) List the implementation processes.
  Translate
  MAP
  Place & Route
4) Name the global timing constraints.
  PERIOD
  PAD-TO-PAD
  OFFSET IN and OFFSET OUT

Key Points
  Valid endpoints for timing paths are:
    I/O pins
    Internal synchronous points (flip-flops, latches, and RAM components)

Key Points
  Each global constraint covers a different type of path:
    PERIOD: Begins and ends at internal synchronous points
    PAD-TO-PAD: Begins and ends at I/O pins
    OFFSET IN: Begins at I/O pins; ends at internal synchronous points
    OFFSET OUT: Begins at internal synchronous points; ends at I/O pins
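Because every timing path starts and ends at one of the two valid endpoint types, the four global constraints can be read as a lookup table keyed by (start, end). The Python sketch below is only a study aid; the Endpoint type and covering_constraint function are hypothetical names. It ignores clock relationships: for example, as the quick reference card notes, PERIOD does not cover paths between unrelated clock domains even though both ends are synchronous points.

```python
from enum import Enum


class Endpoint(Enum):
    PAD = "I/O pin"
    SYNC = "internal synchronous point"  # flip-flop, latch, or RAM


# Global constraint that covers a path, keyed by (start point, end point).
GLOBAL_CONSTRAINT = {
    (Endpoint.SYNC, Endpoint.SYNC): "PERIOD",
    (Endpoint.PAD, Endpoint.PAD): "PAD-TO-PAD",
    (Endpoint.PAD, Endpoint.SYNC): "OFFSET IN",
    (Endpoint.SYNC, Endpoint.PAD): "OFFSET OUT",
}


def covering_constraint(start: Endpoint, end: Endpoint) -> str:
    """Return the global constraint type that covers a start -> end path."""
    return GLOBAL_CONSTRAINT[(start, end)]


if __name__ == "__main__":
    for (start, end), name in GLOBAL_CONSTRAINT.items():
        print(f"{start.value} -> {end.value}: {name}")
```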

Transition to Designing with Virtex-5 FPGA Resources

Page 12

www.xilinx.com
1-877-XLX-CLAS

Facilitator Guide

Designing with Virtex-5 FPGA Resources

Designing with Virtex-5 FPGA


Resources
Purpose

After completing this module, you will be able to:


!

Describe the I/O features of the Virtex-5 FPGA

Describe block RAM and FIFO resources

Explain XtremeDSP solution DSP48 resources

List other resources available in Virtex-5 FPGAs

Time

60 minutes
Process

This module describes the latest features of the Virtex-5 FPGA, the newest FPGA family from Xilinx.
Lessons

Introduction

Overview

I/O

Block RAMs and FIFO

XtremeDSP Solution Cores

Other Features

Summary


Introduction
Show Slide 13:

Designing with Virtex-5 FPGA Resources

Show Slide 14:

Objectives
After completing this module, you will be able to:

Describe the I/O features of the Virtex-5 FPGA


Describe block RAM and FIFO resources
Explain XtremeDSP solution DSP48 resources
List other resources available in Virtex-5 FPGAs

Designing with Virtex-5 FPGA Resources - 14


Overview
Show Slide 15:

Lessons

Overview
I/O
Block RAMs and FIFO
XtremeDSP Solution Cores
Other Features
Summary


Show Slide 16:

Virtex Family Product and Process Evolution

Show Slide 17:

Virtex-5 Family
The Ultimate System Integration Platform

[Figure: the four Virtex-5 platforms (Logic, Logic/Serial, DSP/Serial, and Embedded/Serial) compared by their mix of logic, on-chip RAM, DSP capabilities, parallel I/Os, serial I/Os, and PowerPC processors]

Built on the success of ASMBL



Key Points

The Virtex-5 family is architected as a multi-platform FPGA family. It is based on the ASMBL architecture that was introduced in the Virtex-4 FPGA. The ASMBL architecture is a column-based architecture that lets resources (such as logic, on-chip RAM, DSP, and I/O) be mixed in different proportions to better match your design requirements.

This approach provides an optimal mix of resources for your needs and helps you lower your system costs: you pay only for the resource mix that you need.

The Virtex-5 family has four platforms that are optimized for
logic resources, logic with serial I/O, DSP with serial I/O, and
embedded processing with serial I/O.

Show Slide 18:

Virtex-5 FPGA Platform Feature Overview

[Figure: die floorplan showing the columns of CLB, BRAM, I/O, CMT, BUFGMUX, DSP48E, and BUFIO & BUFR resources]

Key Points

Basic topology: Note the column-based architecture. The Advanced Silicon Modular Block (ASMBL) architecture allows Xilinx to assemble multiple programmable platforms with an optimal blend of features for target application domains, meaning that Xilinx has built subfamilies of the Virtex-5 FPGA for particular markets.

There are multiple columns of block RAM dispersed across the device.

IOB columns (a left column, a right column, and a center column) are available via flip-chip technology.

There are two columns of regions. Each region is 20 CLBs tall and half the die wide, as you will see later, so the region width varies with the device; this is described in more detail later.

Note that the LXT, SXT, and FXT platforms have the same basic topology except that the dedicated resources (EMAC, PCI Express, and MGT) are all placed on the right side of the die.

Show Slide 19:

Virtex-5 LXT Devices

Industry's widest and most flexible offering with embedded MGTs

Wide range of options in RAM, DSP slices, and MGTs

LX330T device is 2x as large as any other FPGA with MGTs in the industry

EasyPath technology support

Embedded hard IP: PCI Express core, Ethernet MACs

Device                   LX20T     LX50T     LX85T    LX110T    LX330T
Logic Cells             19,968    46,080    82,944   110,592   331,776
RAM (kb)                   936     2,160     3,888     5,328    11,664
DSP Slices                  24        48        48        64       192
MGTs                         4        12        12        16        24
Transceiver Speeds  500 Mbps to 3.75 Gbps (down to 100 Mbps with integrated over-sampling circuitry)

Key Points

Not all family members are shown in this table. Other device sizes are: LX30T, LX155T, and LX220T.

Show Slide 20:

Virtex-5 SXT Devices

Same features as the LXT platform

More block RAM and DSP resources per logic cell compared to the LXT platform

Device                   SX35T     SX50T     SX95T
Logic Cells             34,816    52,224    94,208
RAM (kb)                 3,024     4,752     8,784
DSP Slices                 192       288       640
MGTs                         8        12        16
Transceiver Speeds  500 Mbps to 3.75 Gbps (down to 100 Mbps with integrated over-sampling circuitry)

Key Points

Notice that the smallest SXT device has about as much block RAM as a mid-sized LXT device (3,024 kb, between the 2,160 kb of the LX50T and the 3,888 kb of the LX85T) and the same number of DSP slices (192) as the largest LXT device.
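The claim that SXT devices carry more block RAM and DSP resources per logic cell can be checked against the LXT and SXT tables with a few lines of Python. This is plain arithmetic on the published figures (taking 110,592 logic cells for the LX110T); the helper names are hypothetical.

```python
# name: (logic_cells, block_ram_kb, dsp_slices), from the LXT and SXT tables
DEVICES = {
    "LX20T": (19_968, 936, 24),
    "LX50T": (46_080, 2_160, 48),
    "LX85T": (82_944, 3_888, 48),
    "LX110T": (110_592, 5_328, 64),
    "LX330T": (331_776, 11_664, 192),
    "SX35T": (34_816, 3_024, 192),
    "SX50T": (52_224, 4_752, 288),
    "SX95T": (94_208, 8_784, 640),
}


def per_thousand_cells(name):
    """Block RAM (kb) and DSP slices per 1,000 logic cells."""
    cells, ram_kb, dsp = DEVICES[name]
    return ram_kb * 1000 / cells, dsp * 1000 / cells


def block_ram_count(name):
    """Number of 36-kb block RAMs implied by the RAM (kb) column."""
    ram_kb = DEVICES[name][1]
    assert ram_kb % 36 == 0, "RAM column should be a multiple of 36 kb"
    return ram_kb // 36


for name in DEVICES:
    ram, dsp = per_thousand_cells(name)
    print(f"{name:7s} {ram:6.1f} kb/1k cells {dsp:5.2f} DSP/1k cells "
          f"{block_ram_count(name):4d} block RAMs")
```

Every SXT device comes out above every LXT device on both ratios, which is the point of the DSP/Serial platform.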

Show Slide 21:

Virtex-5 FXT Devices

Faster GTX transceiver

PowerPC 440 embedded processor

Device                   FX30T     FX70T    FX100T    FX130T    FX200T
Logic Cells             32,768    71,680   102,400   131,072   196,608
RAM (kb)                 2,448     5,328     8,208    10,728    16,416
DSP Slices                  64       128       256       320       384
MGTs (GTX)                   8        16        16        20        24
PPC Processors               1         1         2         2         2
Transceiver Speeds  750 Mbps to *6.5 Gbps (down to 150 Mbps with integrated over-sampling circuitry)

Key Points

*Transceiver speeds up to 6.5 Gbps are possible only with the -3 speed grade. Consult the Virtex-5 FPGA data sheets and switching characteristics documents for more information about GTX transceiver speeds.


I/O
Show Slide 22:

Lessons
Overview
I/O
Block RAMs and FIFO
XtremeDSP Solution Cores
Other Features
Summary


Show Slide 23:

[Figure: I/O bank layouts of the LX30 and LX330 devices, showing regions, I/O banks, Clock Management Tiles (CMT), global clock inputs (GClk), and the configuration bank (CFG)]
Legend: Bank = I/O bank (with 40 I/Os, or with 20 I/Os in the center column); Region = spans halfway across the chip; CFG = dedicated configuration bank; CMT = Clock Management Tile; GClk = global clock inputs

I/O Banking Architecture

Eight to 24 regions per device
Each region has one bank
Each bank has 40 I/Os and four I/O clocks
Additional I/O banks in the center column
Each bank has 20 I/Os
More and smaller banks compared to the Virtex-4 FPGA

Show Slide 24:

SelectIO Interface Versatility

Each pin can be input and (3-stateable) output
Each pin can be individually configured for ChipSync technology, XCITE termination, drive strength, input threshold, and weak pull-up or pull-down
Each input can be 3.3-V tolerant; limited by its VCCO
No 5-V tolerance, unless a current-limiting resistor is used
Each I/O can have the same performance
Up to 700 Mbps single-ended and 1.25 Gbps differential LVDS
Each I/O supports 40-plus voltage and protocol standards, including:
LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V)
LVDS, bus LVDS, extended LVDS
LVPECL
PCI, PCI-X
HyperTransport (LDT)
HSTL (1.8 V, 1.5 V, Classes I, II, III, IV)
HSTL_I_12 (unidirectional only)
DIFF_HSTL_I_18, DIFF_HSTL_I_18_DCI
DIFF_HSTL_I, DIFF_HSTL_I_DCI
RSDS_25 (point-to-point)
SSTL (2.5 V, 1.8 V, Classes I, II)
DIFF_SSTL2_I, DIFF_SSTL2_I_DCI
DIFF_SSTL18_I, DIFF_SSTL18_I_DCI
GTL, GTL+

Versatile, fast, and homogeneous user I/Os

Key Points

Support for the following standards has been removed:

DIFF_*_DCI

LVDS_25_DCI, LVDSEXT_25_DCI, and ULVDS_DCI

CSE complementary single-ended outputs (replaced with differential drivers)

Show Slide 25:

Enhancements
Input and Output Buffers

All I/Os are lower-capacitance I/Os
Design improvements in the output buffer
No differentiation between the center column and other columns
LVDS output buffer is available on all I/Os
Higher performance for single-ended I/O (700 Mbps versus 600 Mbps)
Higher performance for differential I/O (1.25 Gbps versus 1.1 Gbps)
LVDS SDR unchanged at 710 Mbps
Reduction in input differential termination (DIFF_TERM) variation
Inputs from the pad can be optionally inverted
Inverters from the IOB to the fabric are removed

Show Slide 26:

Enhancements
ChipSync Technology Enhancements

New IODELAY
Used for input or output delay (IDELAY and ODELAY)
Flexibility in REFCLK frequency: can be any frequency between 175 MHz and 225 MHz
Fabric access to IDELAY with optional inverter
Includes separate input from the fabric
General use of the delay line
Enables building oscillators

IDELAY improvements
Simplified IDELAYCTRL RESET (now edge triggered)
Separate ISERDES/OSERDES reset control
IDELAYCTRL is auto placed to match the IODELAY instance

Show Slide 27:

Easy Interface to Source-Synchronous Memory

ChipSync technology
Fast regional and I/O clocks
Programmable IDELAY and ODELAY
Integrated I/O SERDES
Embedded ECC logic
Reduces logic resources
Increases performance
Proven memory interfaces
DDR-II DRAM and QDR/QDR-II, for example
XCITE: Internal impedance control

[Figure: data and the forwarded CLK/DQS pass between memory and the Virtex-5 FPGA through the SelectIO interface and ChipSync technology]

Key Points

XCITE: Digitally Controlled Impedance (DCI).

Series, parallel, or differential termination is supported.

Temperature and voltage compensation is digitally controlled.

Fewer resistors on the board result in easier PCB design.

Termination at the source or load is available.

Compatibility with many I/O standards (HSTL and SSTL, for example) is supported.

Show Slide 28:

ISERDES Manages Incoming Data

Frequency division
Data width to 10 bits
Dynamic signal alignment
Bit alignment
Word alignment
Clock alignment
Supports Dynamic Phase Alignment (DPA)

[Figure: serial data enters the ISERDES in the ChipSync technology block and leaves as n-bit parallel data to the FPGA fabric; BUFIO drives CLK and BUFR drives CLKDIV]

Key Points

ChipSync technology provides two major functions: frequency reduction and alignment. Although the term DPA is specifically used in SPI-4.2 (including both bit and word alignment), the term is also used more broadly. Because every signal has this circuitry, including clocks, clocks can be aligned as well, making this the most flexible solution available.
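Frequency division and word alignment can be pictured with a small behavioral model. The Python sketch below groups a serial bit stream into width-bit words and then searches for the bit offset (the effect of a bitslip-style adjustment) at which a repeating training word is recovered. It is a conceptual sketch only: the bit ordering and the training-pattern search are assumptions, not the exact behavior of the ISERDES primitive or its DPA circuitry.

```python
def deserialize(bits, width, slip=0):
    """Group a serial bit list into parallel words of `width` bits.

    `slip` drops that many leading bits first, shifting the word boundaries.
    The first received bit becomes the word MSB (assumed ordering).
    """
    bits = bits[slip:]
    words = []
    for i in range(0, len(bits) - width + 1, width):
        word = 0
        for b in bits[i:i + width]:
            word = (word << 1) | b
        words.append(word)
    return words


def find_word_alignment(bits, width, training_word):
    """Return the slip (0..width-1) at which every recovered word equals
    the repeating training word, or None if no offset works."""
    for slip in range(width):
        words = deserialize(bits, width, slip)
        if words and all(w == training_word for w in words):
            return slip
    return None
```

For example, a stream of a repeating 8-bit training word preceded by three stray bits aligns at a slip of 3.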

Show Slide 29:

OSERDES Simplifies Frequency Multiplication

Two separate SERDES included
Data SERDES: 2, 3, 4, 5, 6, 7, 8, 10 bits
Three-state SERDES: 1, 2, 4 bits
Ideal for memories

[Figure: the FPGA fabric drives n data bits and m three-state control bits into the OSERDES (ChipSync technology), clocked by CLK and CLKDIV from BUFIO/BUFR and DCM/PMCD]

Key Points

The figure shows data leaving the chip. Just as data was divided down upon entering the chip, it must be multiplied up when leaving. The OSERDES performs this function.

The OSERDES block also allows three-state control to be sped up, primarily for memory buses. The 1-bit, 2-bit, and 4-bit settings cover all the various memory configurations.
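The reverse direction can be sketched the same way. The Python below flattens parallel words into a serial bit stream and computes the resulting line rate: if the fabric hands the OSERDES one width-bit word per CLKDIV cycle, the serial rate is the word width times the CLKDIV frequency. The most-significant-bit-first ordering is an assumption for illustration; the allowed widths are the data SERDES widths listed on the slide.

```python
OSERDES_DATA_WIDTHS = {2, 3, 4, 5, 6, 7, 8, 10}  # data SERDES widths from the slide


def serialize(words, width):
    """Flatten parallel words into a serial bit list, MSB first (assumed order)."""
    if width not in OSERDES_DATA_WIDTHS:
        raise ValueError(f"unsupported data width {width}")
    bits = []
    for word in words:
        if not 0 <= word < (1 << width):
            raise ValueError(f"word {word:#x} does not fit in {width} bits")
        bits.extend((word >> (width - 1 - i)) & 1 for i in range(width))
    return bits


def line_rate_mbps(clkdiv_mhz, width):
    """Serial bit rate when one width-bit word is loaded per CLKDIV cycle."""
    return clkdiv_mhz * width
```

For example, 8-bit words at a 125 MHz CLKDIV give 1,000 Mbps on the pin.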

Show Slide 30:

Data Output Alignment

64 delay elements of ~70 to 89 ps
ODELAY can only be used in FIXED mode
The calibration clock can be internal or external
Calibration clock: 175 to 225 MHz

[Figure: CLK and DATA paths from the FPGA fabric through the OSERDES and the ODELAY (ChipSync technology), with the ODELAY CNTRL block, its INC/DEC state machine, and the 175 to 225 MHz calibration clock]

Key Points

TIODELAYRESOLUTION = 1/(64 x FREF x 1e6) seconds, where FREF is the calibration (reference) clock frequency in MHz. For example, at FREF = 200 MHz each delay element is about 78 ps.

The calibration clock range for the IDELAYCTRL has changed from the Virtex-4 FPGA: it can now be any frequency between 175 MHz and 225 MHz.

The IODELAY element can now be used independently with the direct input from the fabric. Also, the delay element can be used for input or output delay. There is only one delay element, shared by the direct input from the fabric, the input logic, and the output logic.
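The tap-resolution formula can be evaluated directly. This Python sketch, assuming the 64-tap delay line and the 175 to 225 MHz calibration-clock range from the slides, reproduces the ~70 to 89 ps tap range and picks the nearest fixed tap setting for a desired delay; taps_for_delay is a hypothetical helper, not a tool feature.

```python
TAPS = 64  # delay elements per IODELAY, per the slide


def tap_resolution_ps(fref_mhz):
    """TIODELAYRESOLUTION = 1 / (64 x FREF x 1e6) seconds, returned in ps."""
    if not 175 <= fref_mhz <= 225:
        raise ValueError("calibration clock must be 175 to 225 MHz")
    return 1e12 / (TAPS * fref_mhz * 1e6)


def taps_for_delay(delay_ps, fref_mhz):
    """Nearest tap setting (0..63) for a requested delay in FIXED mode."""
    taps = round(delay_ps / tap_resolution_ps(fref_mhz))
    if not 0 <= taps < TAPS:
        raise ValueError("requested delay is outside the IODELAY range")
    return taps
```

At 200 MHz a 500 ps target rounds to 6 taps, or about 469 ps, so the achievable delay is always quantized to one tap.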

Show Slide 31:

Use Examples

SDR resources utilizing ILOGIC and OLOGIC resources can be inferred
IDDR can be inferred
ODDR, ISERDES, and OSERDES resources must be instantiated
See Xilinx Answer Record 15776
Instantiate primitives
IP (CORE Generator & Architecture Wizard): ChipSync Wizard
Memory Interface Generator (MIG)
Virtex-5 FPGA support is available in v1.6

Key Points

The ChipSync Wizard configures a group of I/O blocks into an interface for use in memory, networking, or any other type of bus interface. The ChipSync Wizard creates HDL code with these features configured according to your input.

www.xilinx.com
1-877-XLX-CLAS

Facilitator Guide

Designing with Virtex-5 FPGA Resources

I/O
Show Slide 32:

ChipSync Wizard
Memory Applications: General and Data Setup


Key Points
Single data rate:

If the fabric data width is greater than 1, this selection sets the DATA_RATE attribute to SDR for the ISERDES and OSERDES blocks in the resulting configuration.
If the fabric data width is 1, IFD and OFD blocks (flip-flops) are used instead of ISERDES and OSERDES blocks.

Double data rate:

If the fabric data width is greater than 2, this selection sets the DATA_RATE attribute to DDR for the ISERDES and OSERDES blocks in the resulting configuration.
If the fabric data width is 2, IDDR and ODDR blocks are used instead of ISERDES and OSERDES blocks.

Number of data bits per clock/strobe:

Specifies the number of data bits in the bus that are clocked by each clock or strobe.

Key Points
DDR_CLK_EDGE property setting:

This option appears only when the data rate is set to double data rate and the fabric data width is set to 2 in the General Setup dialog box.
For a description of the DDR modes specified by the DDR_CLK_EDGE property, refer to Chapter 7, "SelectIO Logic Resources," in the Virtex-5 User Guide.
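The SDR/DDR rules above amount to a small decision table. The Python sketch below encodes them so that the mapping from wizard selections to primitives is explicit; the function names are hypothetical, and no other wizard option is modeled.

```python
def chipsync_primitives(data_rate, fabric_width):
    """Map the data-rate and fabric-width selections to I/O primitives.

    Returns (input_primitive, output_primitive, DATA_RATE attribute or None).
    """
    if data_rate == "SDR" and fabric_width == 1:
        return ("IFD", "OFD", None)               # plain I/O flip-flops
    if data_rate == "SDR" and fabric_width > 1:
        return ("ISERDES", "OSERDES", "SDR")
    if data_rate == "DDR" and fabric_width == 2:
        return ("IDDR", "ODDR", None)
    if data_rate == "DDR" and fabric_width > 2:
        return ("ISERDES", "OSERDES", "DDR")
    raise ValueError(f"unsupported selection: {data_rate}, width {fabric_width}")


def shows_ddr_clk_edge(data_rate, fabric_width):
    """The DDR_CLK_EDGE setting is offered only for DDR with a fabric width of 2."""
    return data_rate == "DDR" and fabric_width == 2
```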

Show Slide 33:

Memory Interface Generator

Generates a complete memory controller and interface design
Output: RTL, UCF, documentation, and timing analysis
Choose from a predefined catalog of available devices and interfaces
Checks SSO and all pin selection rules
VHDL or Verilog
Included with the CORE Generator software

Key Points
Also available for the CORE Generator software in standalone mode.

Show Slide 34:

Apply Your Knowledge

1) Describe the I/O features of the Virtex-5 FPGA


Block RAMs and FIFO


Show Slide 35:

Lessons

Overview
I/O
Block RAMs and FIFO
XtremeDSP Solution Cores
Other Features
Summary


Show Slide 36:

Virtex-5 FPGA Block RAM and FIFO Enhancements

36-kb size
Performance up to 550 MHz
Multiple configurations
True dual port, simple dual port, single port
64K x 1 integrated cascade logic
Maximum data width = 72
Byte-write enable
One 36-kb block RAM or FIFO
Two independent 18-kb RAMs
One 18-kb RAM and one 18-kb FIFO
Enhances PowerPC processor memory interfacing
Integrated 64-bit error correction
No issues with synchronous clocks on FIFO18/FIFO36
Reduced power

[Figure: a dual-port block RAM or FIFO]



Key Points
Compared to the Virtex-4 FPGA block RAM:

Maximum frequency increased 10 percent
Setup and clock-to-out delays reduced 20 percent
Dynamic power reduced
Capacity doubled
More features added
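The slides list integrated 64-bit error correction but do not give the block RAM's parity-bit mapping, so the Python sketch below uses a generic extended Hamming (SECDED) code that likewise protects 64 data bits with 8 check bits (72 bits in total, matching the 72-bit maximum data width). It corrects any single-bit error and detects double-bit errors. It illustrates the principle only; it is not the Virtex-5 block RAM's actual encoding.

```python
PARITY_POSITIONS = (1, 2, 4, 8, 16, 32, 64)
DATA_POSITIONS = [i for i in range(1, 72) if i & (i - 1)]  # not powers of two
assert len(DATA_POSITIONS) == 64


def ecc_encode(data):
    """Encode 64 data bits into a 72-bit codeword (index 0 = overall parity)."""
    code = [0] * 72
    for n, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> n) & 1
    for p in PARITY_POSITIONS:
        code[p] = sum(code[i] for i in range(1, 72) if i & p and i != p) % 2
    code[0] = sum(code[1:]) % 2
    return code


def ecc_decode(code):
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 72):
        if code[i]:
            syndrome ^= i          # XOR of the positions of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < 72:
        code[syndrome] ^= 1        # syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:
        return None, "uncorrectable"  # for example, a double-bit error
    data = 0
    for n, pos in enumerate(DATA_POSITIONS):
        data |= code[pos] << n
    return data, status
```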

Show Slide 37:

Virtex-5 FPGA Block RAM Architecture

[Figure: a block RAM five CLBs high between CLB columns; it contains two 18-kb RAMs, each split into two 9-kb RAMs, plus ECC & interconnect, I/O + control logic, and FIFO logic]

Key Points

Note that each 18-kb RAM is divided into two 9-kb RAMs. This distinguishing feature helps reduce power and heat in that location.



Show Slide 38:

Independent 18-kb Block RAM and FIFO

Virtex-5 FPGA block RAM and FIFO can operate as:
One 36-kb block RAM or FIFO, or
Two independent 18-kb block RAMs, or one 18-kb block RAM and one independent 18-kb FIFO
Backwards compatible with the Virtex-4 FPGA
One ECC per tile

[Figure: a 36-kb block RAM or FIFO with 36-bit ports, OR two 18-kb halves, one used as block RAM and the other as block RAM or FIFO]

Show Slide 39:

Simple Dual-Port or Single-Port Block RAM

Three different styles
Single-port block RAM: one address driving both ports
Configurations: 32K x 1, 16K x 2, 8K x 4, 4K x 9, 2K x 18, 1K x 36
Simple dual-port block RAM: one read port, one write port
Configurations: 32K x 1, 16K x 2, 8K x 4, 4K x 9, 2K x 18, 1K x 36, 512 x 72
512 x 72 uses both 18-kb block RAMs, each as 512 x 36

[Figure: 36-kb memory array with Port A and Port B, each with its own address, 36-bit write data (Wdata), and 36-bit read data (Rdata)]
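The aspect ratios above all describe the same 36-kb array: the x1, x2, and x4 shapes use 32,768 bits, and the x9 and wider shapes add the parity bits for 36,864 bits. The Python sketch below tabulates them and gives a rough block RAM count for a memory by tiling the best single aspect ratio. It is a planning estimate under that simplifying assumption, not how the synthesis tools or the CORE Generator software actually pack memories.

```python
import math

# (depth, width) aspect ratios of one 36-kb block RAM, from the slide.
SINGLE_PORT = [(32768, 1), (16384, 2), (8192, 4), (4096, 9), (2048, 18), (1024, 36)]
SIMPLE_DUAL_PORT = SINGLE_PORT + [(512, 72)]


def address_bits(depth):
    """Address width needed for a given depth."""
    return (depth - 1).bit_length()


def block_rams_needed(depth, width, configs):
    """Fewest 36-kb block RAMs for a depth x width memory, trying each aspect
    ratio and tiling it ceil(depth / d) deep and ceil(width / w) wide."""
    return min(math.ceil(depth / d) * math.ceil(width / w) for d, w in configs)
```

For instance, a 256K x 1 memory tiled from 32K x 1 blocks needs eight block RAMs, and a 512 x 72 memory fits in one block RAM only in the simple dual-port style.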



Show Slide 40:

True Dual-Port Block RAM

Three different styles
True dual-port block RAM: unrestricted flexibility
Can perform read and write operations simultaneously and independently on Port A and Port B
In one clock cycle, a total of four operations can be performed using both Port A and Port B
Read before write, write before read, or no change
Wide range of configurations
32K x 1, 16K x 2, 8K x 4, 4K x 9, 2K x 18, 1K x 36
Largest width in the Virtex-4 FPGA is 512 x 36

[Figure: 36-kb memory array with Port A and Port B, each with its own address, 36-bit write data (Wdata), and 36-bit read data (Rdata)]
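The three write behaviors (read before write, write before read, no change) determine what a port's data output shows on a write cycle. The Python sketch below models one port, one clock edge at a time; two port objects sharing one list stand in for Port A and Port B. The mode names follow the READ_FIRST, WRITE_FIRST, and NO_CHANGE values of the block RAM WRITE_MODE attribute, but the model ignores timing and same-address collisions between the ports.

```python
WRITE_MODES = ("WRITE_FIRST", "READ_FIRST", "NO_CHANGE")


class BlockRamPort:
    """One synchronous RAM port; two instances may share the same `mem` list."""

    def __init__(self, mem, write_mode="WRITE_FIRST"):
        if write_mode not in WRITE_MODES:
            raise ValueError(f"unknown write mode {write_mode}")
        self.mem = mem
        self.write_mode = write_mode
        self.dout = 0

    def clock(self, addr, we=False, din=0):
        """Apply one rising clock edge and return the new data output."""
        if we:
            old = self.mem[addr]
            self.mem[addr] = din
            if self.write_mode == "WRITE_FIRST":
                self.dout = din      # new data appears on the output
            elif self.write_mode == "READ_FIRST":
                self.dout = old      # previous contents appear on the output
            # NO_CHANGE: output keeps the value from the last read
        else:
            self.dout = self.mem[addr]
        return self.dout
```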

Show Slide 41:

Block RAM is Cascadable

Built-in cascade logic for 64K x 1
Cascade two adjacent 32-kb block RAMs without using external CLB logic or compromising performance
Cascade option for larger arrays using external CLB logic
128 kb, 256 kb, 512 kb, 1 Mb
For depth or width expansion
Example: Cascade eight block RAMs to build a 256-kb memory

[Figure: two block RAMs configured as RAM_EXTENSION share DI and the low address bits; the top address bit drives the WE control (to initiate a write operation) and the output multiplexer that selects which RAM drives DO]
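The built-in 64K x 1 cascade works because one address bit decides which of the two 32K x 1 RAMs is written and which one drives the output. The Python sketch below models that selection. It assumes the most-significant address bit (bit 15 of a 16-bit address) does the selecting, and it is behavioral only; the class name is hypothetical.

```python
class Cascaded64Kx1:
    """Two 32K x 1 arrays acting as one 64K x 1 memory.

    The most-significant address bit (assumed bit 15) selects the upper or
    lower RAM for both the write enable and the output multiplexer; the low
    15 bits address both RAMs.
    """

    HALF = 32 * 1024

    def __init__(self):
        self.lower = [0] * self.HALF
        self.upper = [0] * self.HALF

    def _select(self, addr):
        if not 0 <= addr < 2 * self.HALF:
            raise IndexError(f"address {addr:#x} out of range")
        ram = self.upper if addr >> 15 else self.lower
        return ram, addr & (self.HALF - 1)

    def write(self, addr, bit):
        ram, offset = self._select(addr)
        ram[offset] = bit & 1    # only the selected RAM sees its write enable

    def read(self, addr):
        ram, offset = self._select(addr)
        return ram[offset]       # output mux picks the selected RAM
```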



Show Slide 42:

Output Register Set/Reset

Latch mode (DO_REG = 0)
Operation is the same as in the Virtex-4 FPGA
SSR and EN will set/reset the output latch to SRVAL
REG mode (DO_REG = 1)
SSR and EN will set/reset the output register to SRVAL
Block RAM can be read or written by the other port during SSR

[Figure: 36-kb block RAM memory array with DATA_IN, SSR, EN[A/B], and REGCE[A/B] inputs, an output latch with SSR, and an optional output register with SSR (DO_REG = 1)]

Key Points
Block RAM can also be read or written by the other port during SSR in latch mode (DO_REG = 0).



Show Slide 43:

FIFO18/36 Top-Level View

550-MHz maximum frequency
Full featured
2x performance increase over soft implementations
Synchronous or asynchronous read and write clocks
No phase relationship required
Four flags
Full, empty, programmable almost full, and programmable almost empty
Optional First Word Fall Through (FWFT)
Immediate availability of the first word after empty
FIFO configurations (same width for read and write)
FIFO36: 8K x 4, 4K x 9, 2K x 18, 1K x 36, 512 x 72
FIFO18: 4K x 4, 2K x 9, 1K x 18, 512 x 36
Utilizes RAMB18/36 for memory in simple dual-port style
FIFO write port is block RAM Port B
FIFO read port is block RAM Port A

[Figure: FIFO18/36 symbol with DIN bus, WREN, WRCLK, RDEN, RDCLK, and RESET inputs; DOUT bus, FULL, AFULL, EMPTY, AEMPTY, RDERR, and WRERR outputs; and RDCOUNT<11:0> and WRCOUNT<11:0> counters]

Show Slide 44:

FIFO18/36

18-kb or 36-kb configuration
If used in 18-kb mode, the other 18 kb can only be used as block RAM
Two modes
Multirate or Synchronous
Attribute: EN_SYN
Not supported
Independent read/write port width
Byte write enable
Dedicated cascade logic


Key Points
Reading data from the FIFO is synchronous to the rising edge of RDCLK.

Writing data to the FIFO is synchronous to the rising edge of WRCLK.

The Full and Almost Full flags are synchronous to the write clock (WRCLK).

The Empty and Almost Empty flags are synchronous to the read clock (RDCLK).
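The flag behavior can be sketched in Python. The model below uses a single clock and updates flags immediately, whereas the hardware FIFO has separate read and write clocks and flag latency. The almost_full_offset and almost_empty_offset parameters are hypothetical stand-ins for the programmable thresholds, and in this sketch the error outputs are set on a write to a full FIFO or a read from an empty one, in the spirit of the WRERR and RDERR pins on the FIFO symbol.

```python
from collections import deque


class FifoFlagsModel:
    """Single-clock behavioral sketch of FIFO flags (not cycle-accurate)."""

    def __init__(self, depth, almost_full_offset, almost_empty_offset):
        self.depth = depth
        self.almost_full_offset = almost_full_offset    # hypothetical threshold
        self.almost_empty_offset = almost_empty_offset  # hypothetical threshold
        self.words = deque()
        self.wrerr = False
        self.rderr = False

    def write(self, word):
        self.wrerr = self.full          # writing a full FIFO is an error
        if not self.wrerr:
            self.words.append(word)

    def read(self):
        self.rderr = self.empty         # reading an empty FIFO is an error
        return None if self.rderr else self.words.popleft()

    @property
    def full(self):
        return len(self.words) >= self.depth

    @property
    def empty(self):
        return not self.words

    @property
    def almost_full(self):
        return len(self.words) >= self.depth - self.almost_full_offset

    @property
    def almost_empty(self):
        return len(self.words) <= self.almost_empty_offset
```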

Show Slide 45:

Two Modes

Multirate (asynchronous clocks)
Can be used in Standard or FWFT mode
EN_SYN = FALSE (default)
DO_REG = 1

Synchronous
Can be used in Standard mode only
FIRST_WORD_FALL_THROUGH = FALSE (default)
EN_SYN = TRUE
DO_REG = 0 or 1
If DO_REG = 1, a pipeline stage is added to the flags and output, improving clock-to-out (Tcko)
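The attribute rules for the two modes can be captured as a validation function. This is a hypothetical Python check written from the rules on the slide, not a tool DRC.

```python
def check_fifo_attributes(en_syn, do_reg, first_word_fall_through=False):
    """List problems with a FIFO18/36 attribute combination, per the two-mode
    rules above; an empty list means the combination is allowed."""
    problems = []
    if not en_syn:
        # Multirate mode: Standard or FWFT, and the output register is required.
        if do_reg != 1:
            problems.append("multirate mode (EN_SYN = FALSE) requires DO_REG = 1")
    else:
        # Synchronous mode: Standard mode only; DO_REG may be 0 or 1.
        if do_reg not in (0, 1):
            problems.append("DO_REG must be 0 or 1")
        if first_word_fall_through:
            problems.append("FWFT is not available in synchronous mode (EN_SYN = TRUE)")
    return problems
```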

Show Slide 46:

Virtex-5 FPGA FIFOs are Cascadable

Flexible FIFO configuration
No dedicated cascade logic
Expand width, depth, or both using fabric logic

[Figure: Width cascade: a 1K x 72 FIFO built from two FIFOs carrying DIN<35:0>/DOUT<35:0> and DIN<71:36>/DOUT<71:36>, sharing WREN, RDEN, WRCLK, and RDCLK, with their EMPTY and AFULL flags combined in fabric logic. Depth cascade: a 16K x 4 FIFO built from FIFO #1 feeding FIFO #2 (DIN<3:0>/DOUT<3:0>) through Data_Avail and Data_Taken handshake logic]
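Because there is no dedicated cascade logic, the fabric supplies it. The Python sketch below shows the two patterns on the slide: a width cascade, where the combined empty and full flags are the OR of the member flags, and a depth cascade, where a handshake (Data_Avail and Data_Taken in the figure) moves words from FIFO #1 to FIFO #2. The class names and the one-word-per-step transfer are illustrative assumptions.

```python
from collections import deque


class SimpleFifo:
    """Minimal FIFO used as a building block (hypothetical helper)."""

    def __init__(self, depth):
        self.depth = depth
        self.q = deque()

    @property
    def full(self):
        return len(self.q) == self.depth

    @property
    def empty(self):
        return not self.q

    def push(self, word):
        if self.full:
            raise OverflowError("write while full")
        self.q.append(word)

    def pop(self):
        if self.empty:
            raise IndexError("read while empty")
        return self.q.popleft()


def width_cascade_flags(fifos):
    """Side-by-side FIFOs written and read together: the combined FIFO is
    empty if any member is empty and full if any member is full."""
    return any(f.empty for f in fifos), any(f.full for f in fifos)


class DepthCascade:
    """FIFO #1 drains into FIFO #2 through fabric handshake logic."""

    def __init__(self, depth1, depth2):
        self.first = SimpleFifo(depth1)
        self.second = SimpleFifo(depth2)

    def write(self, word):
        self.first.push(word)

    def step(self):
        # Data_Avail: FIFO #1 has data; Data_Taken: FIFO #2 can accept it.
        if not self.first.empty and not self.second.full:
            self.second.push(self.first.pop())

    def read(self):
        return self.second.pop()
```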

Show Slide 47:

Block RAM and FIFO Use

Inference of block RAM is possible
Specific coding techniques are required
Most block RAM capabilities are available
Dual port, individual clocks, separate read/write ports, output register, set/reset
See the "RAMs and ROMs" section of the XST User Guide
Examples: ftp://ftp.xilinx.com/pub/documentation/misc/examples_v8.zip
Inference of FIFO18/36 is not possible
Xilinx suggests that you use IP (CORE Generator & Architecture Wizard)



Key Points
Xilinx suggests instantiation of memory cores for the following reasons.

Portability: If you change to the latest device, you can swap in new cores to utilize new features. In addition, each family and/or vendor has different memory capabilities.

The cores that are created by the IP (CORE Generator & Architecture Wizard) tool will:
Create nearly any size memory and automatically include any extra logic that is required for connecting or cascading
Specify the required attributes based on the GUI selections
Only bring out the necessary ports, greatly simplifying HDL instantiation

Show Slide 48:

IP (CORE Generator & Architecture Wizard)


Key Points
The memories that are created from the CORE Generator and FIFO Generator software automatically include the necessary constraints and attributes, making it easy to instantiate the resulting core into your code with minimal effort.

Show Slide 49:

Apply Your Knowledge

2) Compare the following I/O resources in the Virtex-5 FPGA to the Virtex-4 FPGA:

Banking

ChipSync technology resources

XtremeDSP Solution Cores


Show Slide 50:

Lessons

Overview
I/O
Block RAMs and FIFO
XtremeDSP Solution Cores
Other Features
Summary


Show Slide 51:

Virtex-4 FPGA DSP48 Slice

18 x 18 two's complement multiplier
48-bit adder/subtractor/accumulator
Dynamic user-controlled operating modes
17-bit right shift for multi-precision multiplies
Optional input/pipeline/output registers
Symmetric rounding support
Cascading 18-bit B bus and 48-bit P bus

[Figure: DSP48 slice datapath with 18-bit A and B inputs, the BCIN/BCOUT and PCIN/PCOUT cascade paths, the multiplier output M, OPMODE-controlled multiplexers with 17-bit shifts, CARRYIN and SUBTRACT controls, and the 48-bit P output]

Key Points

As a reminder, here is the DSP48 slice in the Virtex-4 FPGA. It will be used as a comparison with the Virtex-5 FPGA DSP48E slice.
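The 17-bit shift exists for multi-precision multiplication. In the Python sketch below, a 35-bit signed operand is split into an 18-bit signed upper part and a 17-bit unsigned lower part, so every partial product fits an 18 x 18 two's complement multiplier and four of them give the full 35 x 35 product. A DSP48 cascade reaches the same sum by passing the running result along PCOUT/PCIN shifted right 17 bits and taking the low 17 bits out at each stage; the sketch uses the equivalent left-shift form and shows only the arithmetic, not the slice-by-slice pipelining.

```python
def split17(x):
    """Split a signed integer into (hi, lo) with x == hi * 2**17 + lo.
    lo is the unsigned low 17 bits; hi keeps the sign (arithmetic shift)."""
    return x >> 17, x & ((1 << 17) - 1)


def mult35x35(a, b):
    """Signed 35 x 35 multiply built from four 18 x 18 partial products."""
    for v in (a, b):
        if not -(1 << 34) <= v < (1 << 34):
            raise ValueError("operands must be 35-bit signed values")
    a_hi, a_lo = split17(a)   # a_hi: 18-bit signed, a_lo: 17-bit unsigned
    b_hi, b_lo = split17(b)
    p0 = a_lo * b_lo                 # weight 2**0
    p1 = a_hi * b_lo + a_lo * b_hi   # weight 2**17
    p2 = a_hi * b_hi                 # weight 2**34
    return p0 + (p1 << 17) + (p2 << 34)
```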

Show Slide 52:

Virtex-5 FPGA (25 x 18) Multiplier

More efficient for 25 x 25 applications
35 x 25 in two DSP48E slices (vs. 35 x 35 in four DSP48s)
Single precision floating point multiplication; 24 x 24 unsigned
High-end audio and image processing
More efficient for complex 25 x 18 multipliers
Low power FFTs (4G wireless)

[Figure: DSP48E slice datapath with 25-bit A and 18-bit B multiplier inputs, the 48-bit C input, BCIN/BCOUT and PCIN/PCOUT cascade paths, OpMode-controlled multiplexers with 17-bit shifts, CARRYIN, SUBTRACT, and the 48-bit P output]

Key Points

USE_MULT replaces LEGACY_MODE. USE_MULT = NONE disables the multiplier to lower power.

ADD/ACC/MACC extension; 96-bit ADD/ACC via two cascaded DSP48E slices:

Primarily used in CIC filters
Also useful for single precision floating point addition
Internal CARRYOUT signal (CARRYCASCOUT/CARRYCASCIN) facilitates 96-bit ACC
Fabric output register is needed to align the lower 48-bit P with the upper 48-bit P if pipelining
Two-deep A:B provides upper DSP48E slice input alignment

More headroom for 25 x 18 MACC

Ternary adder used in MACC function (single CARRYOUT not sufficient)
Additional CARRYOUT signal needed to extend the MACC internally (MULTSIGNOUT/MULTSIGNIN)
MULTSIGNOUT not available as fabric output
Special OPMODE[6:0] = 1001000 for upper DSP48E
Lower DSP48E uses normal OPMODE setting for MACC
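The 96-bit extension relies on the carry out of the lower 48-bit adder reaching the upper one (CARRYCASCOUT to CARRYCASCIN). The Python sketch below shows only that arithmetic: two 48-bit additions chained by one carry equal one 96-bit addition, and negative inputs work as two's complement modulo 2**96. Pipeline alignment, which the notes above handle with the fabric output register and the two-deep A:B registers, is not modeled.

```python
MASK48 = (1 << 48) - 1
MASK96 = (1 << 96) - 1


def add96(x, y):
    """Add two 96-bit values with two 48-bit adders chained by one carry."""
    lo_sum = (x & MASK48) + (y & MASK48)                          # lower DSP48E
    carry = lo_sum >> 48                                          # CARRYCASCOUT -> CARRYCASCIN
    hi_sum = ((x >> 48) & MASK48) + ((y >> 48) & MASK48) + carry  # upper DSP48E
    return ((hi_sum & MASK48) << 48) | (lo_sum & MASK48)


class Accumulator96:
    """96-bit accumulator (ACC) built from the same two-slice carry chain."""

    def __init__(self):
        self.value = 0

    def accumulate(self, x):
        self.value = add96(self.value, x & MASK96)
        return self.value
```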

Show Slide 53:

Independent C Input

Virtex-5 FPGA DSP: Independent C input
Eliminates Virtex-4 FPGA issues such as:
MAP problems with DSP48 slices within a tile
Simulation issues in cases where two DSP48s are in a tile and only one uses the C input
Requires DRC checks
Understanding the rules and regulations of using the C input

[Figure: DSP48E slice datapath with 25-bit A, 18-bit B, and 48-bit C inputs, BCIN/BCOUT and PCIN/PCOUT cascades, OpMode, 17-bit shifts, CARRYIN, SUBTRACT, and the 48-bit P output]


Show Slide 54:

SIMD and Logic Unit

[Figure: DSP48E slice block diagram highlighting the SIMD adder and logic unit]

A:B expanded to 48 bits (36 bits in the Virtex-4 FPGA)
SIMD (Single Instruction Multiple Data)
  48-bit adder is splittable into segments
  Quad 12-bit or dual 24-bit configurations
  Common control/instruction: OPMODE and ALUMODE
  CARRYOUTs for each segment (2-input arithmetic)
  CARRYINs only available to the lowest segment
Bit-wise logic operations available
  XOR, XNOR, AND, NAND, OR, NOR, NOT
  Controlled dynamically by ALUMODE
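The segmented adder can be modeled behaviorally. The Python sketch below is an illustration under stated assumptions, not a Xilinx model: it covers only two-input addition of X and Z in one-, dual-, or quad-segment mode, assumes no carry crosses a segment boundary, and applies CARRYIN to the lowest segment only, as the slide states. `simd_add` is a hypothetical helper name.

```python
MASK48 = (1 << 48) - 1


def simd_add(x: int, z: int, segments: int = 4, carryin: int = 0):
    """Behavioral sketch of the DSP48E SIMD adder for two-input addition.

    The 48-bit word is split into `segments` independent lanes (1, 2, or 4,
    i.e. one 48-bit, dual 24-bit, or quad 12-bit). Returns the packed 48-bit
    sum and one carry-out per lane, lowest lane first.
    """
    if segments not in (1, 2, 4):
        raise ValueError("segments must be 1, 2, or 4")
    width = 48 // segments
    lane_mask = (1 << width) - 1
    x &= MASK48
    z &= MASK48
    packed, carryouts = 0, []
    for lane in range(segments):
        shift = lane * width
        xs = (x >> shift) & lane_mask
        zs = (z >> shift) & lane_mask
        cin = carryin if lane == 0 else 0       # carry-in reaches the lowest segment only
        total = xs + zs + cin
        carryouts.append(total >> width)        # per-segment CARRYOUT
        packed |= (total & lane_mask) << shift  # the carry does not enter the next lane
    return packed, carryouts
```

For example, in quad 12-bit mode adding 0xFFF and 0x001 wraps lane 0 to zero and raises only that lane's carry-out, leaving the other three lanes untouched.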

Show Slide 55:

Two-Input Logic Functions

[Figure: X and Z multiplexers feeding the logic unit; selection by OPMODE[3:0] and ALUMODE[3:0]]

ALUMODEs

Logic Unit Mode    OPMODE[3:2]    ALUMODE[3:0]
X XOR Z            00             0100
X XNOR Z           00             0101
X XNOR Z           00             0110
X XOR Z            00             0111
X AND Z            00             1100
X AND (NOT Z)      00             1101
X NAND Z           00             1110
(NOT X) OR Z       00             1111
X XNOR Z           10             0100
X XOR Z            10             0101
X XOR Z            10             0110
X XNOR Z           10             0111
X OR Z             10             1100
X OR (NOT Z)       10             1101
X NOR Z            10             1110
(NOT X) AND Z      10             1111



Key Points

This table shows how the ALU can be configured for two-input logic operations in which the multiplier output is not used. If OPMODE[3:2] is set to 00, the Y multiplexer contributes a value of 0 to the three-input adder; if OPMODE[3:2] is set to 10, it contributes an all-1s value.

48-bit dynamic ALU-like functionality
Limited shift capability:
  1-bit left shift, 17-bit right shift, but no 1-bit right shift
  1-bit barrel shift
Additional logic operations:
  48-bit bitwise XOR, XNOR, AND, NAND, OR, NOR, NOT
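As a cross-check of the two-input logic table, the following Python sketch maps each (OPMODE[3:2], ALUMODE[3:0]) pair to the listed bitwise function of X and Z on 48-bit values. It is a lookup-table restatement of the table, not a gate-level model of how the DSP48E derives these functions from its adder; `logic_unit` is a hypothetical name.

```python
MASK48 = (1 << 48) - 1

# (OPMODE[3:2], ALUMODE[3:0]) -> bitwise function of X and Z, per the two-input logic table
_LOGIC_OPS = {
    (0b00, 0b0100): lambda x, z: x ^ z,
    (0b00, 0b0101): lambda x, z: ~(x ^ z),
    (0b00, 0b0110): lambda x, z: ~(x ^ z),
    (0b00, 0b0111): lambda x, z: x ^ z,
    (0b00, 0b1100): lambda x, z: x & z,
    (0b00, 0b1101): lambda x, z: x & ~z,
    (0b00, 0b1110): lambda x, z: ~(x & z),
    (0b00, 0b1111): lambda x, z: ~x | z,
    (0b10, 0b0100): lambda x, z: ~(x ^ z),
    (0b10, 0b0101): lambda x, z: x ^ z,
    (0b10, 0b0110): lambda x, z: x ^ z,
    (0b10, 0b0111): lambda x, z: ~(x ^ z),
    (0b10, 0b1100): lambda x, z: x | z,
    (0b10, 0b1101): lambda x, z: x | ~z,
    (0b10, 0b1110): lambda x, z: ~(x | z),
    (0b10, 0b1111): lambda x, z: ~x & z,
}


def logic_unit(opmode_3_2: int, alumode: int, x: int, z: int) -> int:
    """Return the 48-bit logic-unit result for the given control settings."""
    try:
        op = _LOGIC_OPS[(opmode_3_2, alumode)]
    except KeyError:
        raise ValueError("not a two-input logic mode in the table") from None
    # Python's ~ yields negative ints, so mask back to 48 bits.
    return op(x & MASK48, z & MASK48) & MASK48
```

One way to read the NOT entry in the slide's operation list: the (NOT X) OR Z row with Z = 0 returns NOT X.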


Show Slide 56:

A Input Cascade

[Figure: DSP48E slice block diagram highlighting the ACIN/ACOUT cascade path]

Lower power consumption
Dedicated routing within the DSP column
Allows efficient adaptive filter implementation
  Loads coefficients serially into a shadow register while the filter is still operating
  New coefficients loaded to the filter register in parallel
Separate 2-deep A/B CE facilitates wave CE
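The adaptive-filter bullets describe a double-buffering scheme. Below is a minimal behavioral sketch in Python, assuming the first A-register stage of each slice acts as the shadow register chained through ACOUT/ACIN and the second stage holds the coefficient the multiplier sees. The class and method names are hypothetical, and clock-enable timing is reduced to one method call per clock.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoefficientCascade:
    """Behavioral sketch: N DSP48E slices whose A registers hold FIR coefficients.

    shadow[i]: first A-register stage of slice i (assumed), chained via ACOUT -> ACIN.
    active[i]: second A-register stage of slice i (assumed), the value the multiplier uses.
    """
    taps: int
    shadow: List[int] = field(init=False)
    active: List[int] = field(init=False)

    def __post_init__(self) -> None:
        self.shadow = [0] * self.taps
        self.active = [0] * self.taps

    def shift_in(self, coeff: int) -> None:
        # Clock with only the first-stage CE high: one coefficient enters slice 0
        # and every shadow value moves one slice down the cascade.
        self.shadow = [coeff] + self.shadow[:-1]

    def load(self) -> None:
        # Clock with only the second-stage CE high: all taps update in parallel.
        self.active = list(self.shadow)

    def output(self, samples: List[int]) -> int:
        # FIR sum computed from the active coefficients only.
        return sum(c * s for c, s in zip(self.active, samples))
```

Coefficients are shifted in last-tap-first so that, after `taps` shifts, slice i holds coefficient i; a single `load()` then swaps the whole set in one step, and until then the filter keeps using the old set.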

Show Slide 57:

Pattern Detector

[Figure: DSP48E slice block diagram highlighting the pattern detector on the P output (PATTERN_DETECT; C or MC input)]

Extend symmetric rounding to multi-precision operations
Support for convergent rounding
  Requires fabric logic; dynamic rounding point
Overflow/underflow implemented in DSP48E
Support for accumulator terminal count
  Counter auto-reset
Support for saturation logic
Pattern detector outputs slower than P
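Convergent rounding (round half to even) avoids the upward bias that round-half-up introduces when many values fall exactly halfway. The Python sketch below shows only the arithmetic result of dropping `drop_bits` LSBs of a two's-complement integer; it does not show how the pattern detector and the fabric logic mentioned on the slide are combined to produce it. `convergent_round` is a hypothetical helper.

```python
def convergent_round(value: int, drop_bits: int) -> int:
    """Drop `drop_bits` LSBs of a two's-complement integer, rounding ties to even."""
    if drop_bits <= 0:
        return value
    q = value >> drop_bits            # floor(value / 2**drop_bits)
    r = value - (q << drop_bits)      # discarded fraction, always in [0, 2**drop_bits)
    half = 1 << (drop_bits - 1)
    if r > half or (r == half and q & 1):
        q += 1                        # round up above half, or at exactly half when q is odd
    return q
```

For instance, with one bit dropped, 5 (2.5) rounds to 2 and 7 (3.5) rounds to 4, so ties alternate direction instead of always rounding up.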


Key Points

The pattern detector at the output of the DSP48E slice provides support for convergent rounding, overflow/underflow detection, block floating point, and accumulator terminal count (counter auto-reset). The pattern detector can detect whether the output of the DSP48E slice matches a pattern, as qualified by a mask. This enables functions such as A:B NAND C == 0 or A:B (bitwise logic) C == Pattern to be implemented.

For more information on pattern detection, refer to the Virtex-5 XtremeDSP Design Considerations User Guide.
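The mask-qualified comparison can be written as one bitwise expression. The Python sketch below assumes the DSP48E convention that a MASK bit of 1 marks a bit to ignore (verify this against the XtremeDSP Design Considerations User Guide), and models the counter auto-reset by clearing the accumulator on the clock after P equals the terminal count. All function names are hypothetical.

```python
MASK48 = (1 << 48) - 1


def pattern_detect(p: int, pattern: int, mask: int) -> bool:
    """P matches PATTERN on every bit where MASK is 0; MASK = 1 marks a don't-care bit."""
    return ((p ^ pattern) & ~mask & MASK48) == 0


def pattern_b_detect(p: int, pattern: int, mask: int) -> bool:
    """Same comparison against the bitwise complement of PATTERN."""
    return pattern_detect(p, ~pattern & MASK48, mask)


def autoreset_counter(terminal: int, cycles: int) -> list:
    """Accumulator that counts up by 1 and clears on the clock after P == terminal."""
    p, trace = 0, []
    for _ in range(cycles):
        trace.append(p)
        p = 0 if pattern_detect(p, terminal, mask=0) else (p + 1) & MASK48
    return trace
```

With a terminal count of 3, the accumulator sequence is 0, 1, 2, 3, 0, 1, ..., so no fabric comparator is needed to wrap the count.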

Show Slide 58:

Multiply (35 x 25)

[Figure: Two cascaded DSP48E slices implementing a 35 x 25 multiply]
DSP48_0: A = A[24:0] (25 bits), B = {0, B[16:0]} (18 bits); OPMODE 0000101, ALUMODE 0000; P[16:0] = OUT[16:0]
DSP48_1: A from DSP48_0 via ACIN, B = B[34:17] (18 bits); PCIN shifted right 17 bits (SHIFT 17); OPMODE 1010101, ALUMODE 0000; P[42:0] = OUT[59:17]
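The arithmetic behind the two-slice 35 x 25 multiply can be checked in software. This Python sketch assumes A is a signed 25-bit value and B a signed 35-bit value. B is split into an unsigned low part {0, B[16:0]} and a signed high part B[34:17]; the low partial product supplies OUT[16:0], and the upper slice adds its own product to the lower product arithmetically shifted right by 17 bits (the Shift(PCIN) path). `mul_35x25` is a hypothetical name.

```python
def mul_35x25(a: int, b: int) -> int:
    """Signed 25 x 35 product built from two DSP48E-sized partial products."""
    if not -(1 << 24) <= a < (1 << 24):
        raise ValueError("a must be a signed 25-bit value")
    if not -(1 << 34) <= b < (1 << 34):
        raise ValueError("b must be a signed 35-bit value")
    low17 = (1 << 17) - 1
    b_lo = b & low17               # {0, B[16:0]}: zero-extended, fits the 18-bit B port
    b_hi = b >> 17                 # B[34:17]: signed 18 bits (arithmetic shift)
    p0 = a * b_lo                  # DSP48_0: multiplier output only
    out_lo = p0 & low17            # DSP48_0 P[16:0] -> OUT[16:0]
    p1 = a * b_hi + (p0 >> 17)     # DSP48_1: multiplier output + Shift(PCIN)
    return (p1 << 17) | out_lo     # DSP48_1 P[42:0] -> OUT[59:17]
```

Because the low part of B is zero-extended to 18 bits, it always fits the signed 18-bit B port, which is why the split is made at bit 17 and the cascade shift is 17 bits.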


Show Slide 59:

Implement or Accelerate DSP Functions

[Table: DSP operations versus implementation in Logic or DSP48E]

DSP Operation
Fast Fourier Transform (FFT)
Finite Impulse Response (FIR)
Infinite Impulse Response (IIR)
Cascaded Integrator-Comb (CIC)
Quadrature Filter
Decimating Filter
Interpolating Filter
Linear Phase Filter
CORDIC Functions
Butterworth Function
Chebyshev Function
Bessel Function
Forward Error Correction (FEC)
Pre-distortion
Encoding
Encryption
Compression

Show Slide 60:

IP Support

IP (COREGen & Architecture Wizard)
  IP is currently supported in the IP (COREGen and Architecture Wizard) tool for the ISE 10.1i software
A sampling of cores to be supported:
  Multiplier
  Adder
  Multiply and Accumulate (MAC)
  Dynamic Control
  MAC FIR
  MAD
  Serial Divider
  CORDIC
  FFT
  SIN COS LUT
  DDS
  Multiplier Generator


Show Slide 61:

High-Precision, High-Bandwidth Virtex-5 FPGA Solution

High-Precision Functions                   Resources          Number of Function Instances in V5LX330    Maximum Bandwidth @ 550 MHz
25x18 MACC                                 1 DSP48E slice     192                                        105 GMACCs/sec
25x18 Multiply plus Addition/Subtraction   1 DSP48E slice     192                                        210 GOPs/sec
48+48 Addition/Subtraction                 1 DSP48E slice     192                                        105 GOPs/sec
35x25 Complex Multiplication               4 DSP48E slices    48                                         26 GOPs/sec
24x24 Single Precision Floating Point      2 DSP48E slices    96                                         53 GOPs/sec
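The bandwidth column appears to follow from instances x operations per clock x clock rate. A small Python sketch of that arithmetic follows. The 550 MHz clock rate and the count of two operations per clock for the multiply plus add/subtract row are assumptions for this sketch, chosen because they reproduce the table's figures; `peak_bandwidth_g` and `ROWS` are hypothetical names.

```python
def peak_bandwidth_g(instances: int, ops_per_clock: int, clock_hz: float) -> float:
    """Peak throughput in billions of operations per second."""
    return instances * ops_per_clock * clock_hz / 1e9


# (function, instances in the V5LX330, operations per clock per instance)
ROWS = [
    ("25x18 MACC", 192, 1),
    ("25x18 multiply plus add/subtract", 192, 2),
    ("48+48 add/subtract", 192, 1),
    ("35x25 complex multiplication", 48, 1),
    ("24x24 single-precision floating point", 96, 1),
]

if __name__ == "__main__":
    for name, n, ops in ROWS:
        print(f"{name}: {peak_bandwidth_g(n, ops, 550e6):.1f} G/s")
```

This gives about 105.6, 211.2, 105.6, 26.4, and 52.8 G operations per second for the five rows, which the table lists as 105, 210, 105, 26, and 53.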

Show Slide 62:

Apply Your Knowledge

Dynamically Reconfigurable DSP OPMODEs

OPMODE[1:0]    X Select            Notes
00             0                   Default
01             M                   Must select with OPMODE[3:2]=01
10             P
11             A:B

OPMODE[3:2]    Y Select            Notes
00             0                   Default
01             M                   Must select with OPMODE[1:0]=01
10             48'hffffffffffff    Used mainly for ALU bitwise operations
11             C

OPMODE[6:4]    Z Select            Notes
000            0                   Default
001            PCIN
010            P
011            C
100            P                   Used for MACC extend only
101            Shift(PCIN)
110            Shift(P)
111                                Illegal selection

Add/Subtract Output: Z + (X + Y + CIN) or Z - (X + Y + CIN), or -Z + (X + Y + CIN) - 1

3) Given this OPMODE table, what is the OPMODE for the following functions?
  C + A:B
  (A x B) + C
  P + C + PCIN



Key Points

There are over 40 different modes. Each DSP48E slice is individually controllable. Logic-driven or memory-driven operation can be changed in a single clock cycle, enabling resource sharing for maximum utilization.

Note: The add/subtract functionality also depends on the selected ALUMODE. For example, with ALUMODE = 0001, CARRYIN = 1, and Y = 0, the output is -Z + (X + 0 + 1) - 1, that is, X minus Z.

M = multiplier output
P = P registers
C = C input
A = A input
B = B input
A:B = A concatenated with B
PCIN = Cascaded PCOUT from previous DSP48E slice
Shift (PCIN) = PCIN shifted right by 17 bits
Shift (P) = P shifted right by 17 bits
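The add/subtract forms above can be modeled directly on 48-bit two's-complement values. In this Python sketch, the ALUMODE code 0001 for -Z + (X + Y + CIN) - 1 comes from the note above; the codes 0000 (plain add) and 0011 (Z minus) are not given in this guide and are taken from the Virtex-5 XtremeDSP user guide as we understand it, so check them against that guide. `alu_add_sub` is a hypothetical name.

```python
MASK48 = (1 << 48) - 1


def alu_add_sub(alumode: int, x: int, y: int, z: int, carryin: int = 0) -> int:
    """Sketch of three DSP48E add/subtract modes, wrapped to the 48-bit P output."""
    t = x + y + carryin
    if alumode == 0b0000:
        result = z + t            # Z + (X + Y + CIN)
    elif alumode == 0b0011:
        result = z - t            # Z - (X + Y + CIN)
    elif alumode == 0b0001:
        result = -z + t - 1       # -Z + (X + Y + CIN) - 1
    else:
        raise ValueError("only ALUMODE 0000, 0011, and 0001 are modeled here")
    return result & MASK48        # two's-complement wrap to 48 bits
```

With ALUMODE = 0001, CARRYIN = 1, Y = 0, X = 10, and Z = 3, the result is 7, which is X minus Z as described in the note.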


Other Features
Show Slide 63:

Lessons

Overview
I/O
Block RAMs and FIFO
XtremeDSP Solution Cores
Other Features
Summary


Show Slide 64:

Virtex-5 FPGA Tri-Mode EMAC Description

Second-generation Tri-Mode 10/100/1000 Mbps Ethernet MAC blocks
  UNH compliance tested
Four integrated TEMACs in every Virtex-5 LXT and SXT device
  Can be used with the RocketIO GTP transceivers to build a fully integrated 1000BASE-X interface
  Saves programmable logic resources
Full or half duplex

[Figure: four EMAC blocks]
Key Points

Fully integrated 10/100/1000 Mbps Ethernet Media Access Controller:
The TEMAC supports configurable full-duplex operation at 10/100/1000 Mbps. It also supports configurable half-duplex operation at 10/100 Mbps.
Each one is dedicated in the silicon, so it is proven technology. These were built from experience with popular IP from Xilinx: the EMAC from the CORE Generator software. The Tri-Mode EMAC is IEEE 802.3 compliant.
Originally, in the Virtex-4 FPGA, the EMAC block was part of the PPC block. In the Virtex-5 family, it is an independent resource, located as part of a block RAM column. There are two EMAC blocks; each EMAC block has two independent EMACs with a shared host interface.
As always, remember that this is a dedicated resource that, if not used, will be wasted.
The CORE Generator software provides an example design that shows a programmable PHY interface and a client side that connects to the FPGA resources via a FIFO. For more information, see the Virtex-5 data sheet and the sample design referenced in the EMAC User Guide.

Show Slide 65:

Full-Featured Ethernet Functionality

IEEE 802.3 compliant
Programmable PHY interface support
Supports VLAN and jumbo frames
  Receive address filter
  Network traffic monitoring and filtering
Use RocketIO transceiver or SelectIO technology
  MII, GMII, RGMII, SGMII
  PCS/PMA for 1000BASE-X
Real-time statistics for TX/RX
Fewer clocks needed than previous generation

Key Points

The TEMAC is fully featured:
The client side of the embedded Ethernet MAC can be connected to a Direct Memory Access (DMA) controller, or DMA engine. The DMA engine is then connected to the processor bus, which allows an embedded processor to access the Ethernet port. The TEMAC also supports a hardware-selectable Device Control Register (DCR) bus or generic host bus interface.
The client side of the embedded Ethernet MAC is connected to a FIFO to complete a single Ethernet port. This port is connected to a switch or routing matrix, which can contain several ports and be directly connected to the FPGA logic resources.

Key Points
The CORE Generator software provides an example design
for the embedded Tri-Mode Ethernet MAC in the Virtex-5
FPGA for any of the supported physical interfaces. The
supported PHY interfaces include GMII, MII, RGMII, and
SGMII. These interfaces are implemented inside the FPGA
by using programmable logic; they are not dedicated.
However, the CORE Generator software makes creating
these interfaces relatively easy.
The TEMAC resides in the same column as the dedicated
PCI core.
Show Slide 66:

Virtex-5 FPGA PCI Express Integrated Endpoint Block

Full featured and compliant to base specification 1.1
Saves FPGA resources
Integrated in all T devices
  Adjacent to high-speed serial transceivers
Meets all key requirements
  Electrical signaling
  Protocol (CRC, automatic retry)
  Quality of Service (QoS)
  Hot pluggable
Highly configurable endpoint solution
  Supports 1-, 2-, 4-, or 8-lane implementations
  Uses transceiver blocks to provide fully integrated PCIe core endpoint

[Figure: Virtex-5 FPGA embedded PCI core (PHY layer, data layer, transaction layer) connected to GTP transceivers over 1, 2, 4, or 8 lanes]

Key Points

The PCIe integrated Endpoint block is highly complex and customizable. The PCIe Wizard is provided to customize and generate a PCIe standard subsystem via a simple set of menu options. The PCIe standard subsystem contains the PCIe integrated Endpoint block, GTP transceiver tiles, block RAMs, clock module, and a reset module, which are all automatically configured and connected. The options available in the wizard determine the correct attribute settings and tie off any unneeded ports. Selecting the desired options in the wizard generates a completely customized wrapper.

The PCIe cores are placed in a column of block RAM on the right side of the die.

Show Slide 67:

GTP Transceiver

Industry's lowest power MGTs
  Now available in all LXT and SXT devices
Advanced features and capabilities
  Flexible TX and RX equalization
Ease-of-design with new design and debug tools
  Shortening design cycles and reducing time to market
Enhanced standards support
  Covering serial standards between 100 Mbps and 3.2 Gbps
  Embedded hard cores: PCI Express core and Ethernet

[Figure: Virtex-5 LXT FPGA die has a column of GTP transceivers]

Key Points

GTP transceivers are placed as dual-transceiver GTP_DUAL tiles in the Virtex-5 LXT devices. This configuration allows two transceivers to share a single PLL with the TX and RX functions of both, reducing size and power consumption. The GTP transceivers are placed on the right edge of the die.

Show Slide 68:

GTP Transceiver Standards Coverage

Market                     Standard                                         Speed (bits per second per channel)
Datacom                    1G Ethernet                                      1.25 G
                           XAUI                                             3.125 G
                           10G Base CX-4                                    3.125 G (x4)
Telecom                    OC-3, OC-12, OC-48 / SDH STM-1, STM-4, STM-16    155 M, 622 M, 2.488 G
                           OBSAI                                            768 M, 1.536 G, 3.072 G
                           CPRI                                             614 M, 1.228 G, 2.457 G
                           SFI-5                                            2.488 - 3.125 G
Computing/Communication    PCI Express Standard                             2.5 G
                           Serial RapidIO                                   3.125 G
                           InfiniBand                                       2.5 G
Storage                    Fibre Channel                                    1.0625 G, 2.125 G
                           SATA                                             1.5 G, 3.0 G
                           SAS                                              1.5 G, 3.0 G
Video                      SDI                                              270 M
                           DVB-ASI                                          270 M
                           HD-SDI                                           1.485 G, 1.4835 G, 2.97 G

Key Points

The RocketIO Wizard automatically configures GTP and GTX transceivers to support one of these protocols or performs custom configurations.

Show Slide 69:

GTX Transceiver

High-performance MGTs
Available in all FXT devices
Additional serial standards up to 6.5 Gbps
  PCI Express standard Gen2 (5.0 Gbps)
  Interlaken (3.125, 6.25 Gbps)
  OIF-CEI-6G (SR) and (LR) (6.25 Gbps)
  FC-4 (4.25 Gbps)
  SATA Gen 3 (6.0 Gbps)
  SAS Rev 5 (6.0 Gbps)
  Serial RapidIO standard (6.25 Gbps)
Advanced features and capabilities
  Flexible gearbox to support 64B/66B and 64B/67B encoding

Show Slide 70:

GTP and GTX Transceiver Tool Support

Xilinx standard tools and design flow
  ISE software 10.1
  RocketIO Wizard in the CORE Generator tool
  IBERT in the ChipScope Pro tools (IBERT: Integrated Bit Error Rate Tester)
SmartModel simulations on industry-leading platforms
  Cadence NC Verilog
  Mentor ModelSim
  Synopsys VCS
HSPICE models for signal integrity simulation and analysis

Show Slide 71:

Integrated PowerPC 440 Processor Core

High performance
  >1100 DMIPS @ 550 MHz
  7-stage execution pipeline
Third-generation FPGA with the PowerPC processor
Enhanced CoreConnect bus architecture
  Processor Local Bus (PLB v4.6) interface

Key Points

The PowerPC processor is available in FXT platform devices only.

PowerPC processor development is covered in the Embedded Systems Development course.

Show Slide 72:

Enhanced PowerPC 440 Processor Block

Enhanced CoreConnect bus architecture
  128-bit PLB v4.6
  Non-blocking crossbar for higher bandwidth and low latency
  Dedicated interface for connection to block RAM and external memory
  Auto-synchronized for non-integer PLB-to-CPU clock ratios
  All IP cores have been updated to support PLB v4.6
  MicroBlaze processor v7 also supports PLB v4.6
32-KB level 1 instruction and data caches

Summary
Show Slide 73:

Lessons

Overview
I/O
Block RAMs and FIFO
XtremeDSP Solution Cores
Other Features
Summary


Show Slide 74:

Apply Your Knowledge

4) What is the easiest method for building resources such as I/O, memory, and DSP48 functions?

Show Slide 75:

Summary

All I/O are lower-cap I/O
  Center column I/O banks have 20 pins
  Each region has one I/O bank with 40 pins
The I/O blocks contain I/O register resources as well as I/OSERDES
The I/OSERDES block provides source-synchronous capabilities utilizing dedicated resources
The XtremeDSP solution block provides maximum performance and low power for DSP applications
Block RAMs are now configurable (for smaller memory applications) and cascadable (for larger memory applications)
The FIFO16 resources are implemented with dedicated FIFO logic and are also cascadable

Where Can I Learn More?

Virtex-5 FPGA data sheets
Virtex-5 FPGA user guides
  Virtex-5 FPGA User Guide
  Virtex-5 FPGA XtremeDSP Design Considerations User Guide
DSP, I/O, block RAM, and FIFO primitives: Software Documentation Libraries Guide
Virtex-5 FPGA home page
  www.xilinx.com/virtex5
  Links to everything related to the Virtex-5 FPGA: white papers, boards, training, data sheets, and user guides
Virtex-5 FPGA memory application notes
  Memory interface data capture, DDR-2 controllers, QDR II SRAM, and DDR SDRAM controller
  Application Note XAPP802: Memory Interface Application Notes Overview
Memory Corner: www.xilinx.com > Technology Solutions > Memory
  Includes the Memory Interface Generator
Software manuals
Xilinx Education Services courses

Apply Your Knowledge Answers


Answers

1) Describe the I/O features of the Virtex-4 FPGA.

ILOGIC includes SDR and DDR input register resources
  DDR: Added same edge and same edge pipelined
OLOGIC includes SDR and DDR output register resources
  DDR: Added same edge
ISERDES includes 1-to-10 serial-to-parallel converter, BITSLIP, IDELAY
  Up to 1-to-10 serial-to-parallel conversion utilizing a master/slave ISERDES pair
OSERDES includes 10-to-1 parallel-to-serial converter
  Up to 10-to-1 parallel-to-serial conversion utilizing a master/slave OSERDES pair

Answers

2) Compare the following I/O resources in the Virtex-5 FPGA to the Virtex-4 FPGA.
  Banking
  ChipSync technology resources

                        Virtex-4 FPGA                Virtex-5 FPGA
Electrical Standards    >30                          >40
Banking Architecture    (with clock-capable I/O)     (all I/Os same, homogeneous)
                        64 I/Os per bank             40 I/Os per bank
                        9 to 17 banks                13 to 35 banks
ChipSync Technology     First generation             Added output delay (ODELAY)


3) Given this OPMODE table, what is the OPMODE for the following functions?

C + A:B
  OPMODE = 011 00 11 or 000 11 11
(A x B) + C
  OPMODE = 011 01 01
P + C + PCIN
  OPMODE = 001 11 10
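These answers can be checked mechanically by packing the Z, Y, and X selections from the OPMODE table into OPMODE[6:4], OPMODE[3:2], and OPMODE[1:0]. The Python sketch below does that and enforces the rule that M must be selected on X and Y together. The selection labels (for example "ALL1" for 48'hffffffffffff and "SHIFT_PCIN") are hypothetical names, and the MACC-extend-only (100) and illegal (111) Z codes are deliberately left out.

```python
# Hypothetical selection labels -> OPMODE field values, taken from the OPMODE table.
X_SEL = {"0": 0b00, "M": 0b01, "P": 0b10, "A:B": 0b11}
Y_SEL = {"0": 0b00, "M": 0b01, "ALL1": 0b10, "C": 0b11}
Z_SEL = {"0": 0b000, "PCIN": 0b001, "P": 0b010, "C": 0b011,
         "SHIFT_PCIN": 0b101, "SHIFT_P": 0b110}


def encode_opmode(z: str, y: str, x: str) -> str:
    """Return OPMODE[6:0] as a 7-character bit string (Z, then Y, then X fields)."""
    try:
        xs, ys, zs = X_SEL[x], Y_SEL[y], Z_SEL[z]
    except KeyError as err:
        raise ValueError(f"unknown selection: {err}") from None
    # M is only valid when both X and Y select it (OPMODE[3:0] = 0101).
    if (xs == 0b01) != (ys == 0b01):
        raise ValueError("M must be selected on both X and Y")
    return format((zs << 4) | (ys << 2) | xs, "07b")


if __name__ == "__main__":
    print(encode_opmode("C", "0", "A:B"))     # C + A:B
    print(encode_opmode("C", "M", "M"))       # (A x B) + C
    print(encode_opmode("PCIN", "C", "P"))    # P + C + PCIN
```

For C + A:B, routing C through Z gives 0110011, while routing it through Y instead gives the alternative 0001111.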

4) What is the easiest method for building resources such as I/O, memory, and DSP48 functions?

Inference
  Basic I/O (single-ended)
  Single block RAMs
  Multipliers
Use of CORE Generator and Architecture Wizard software
  Larger block RAM memories
  FIFOs
  DSP functions, arithmetic functions, MACCs, FIR filters, etc. (see the High-Precision, High-Bandwidth table on page 50)
ChipSync Wizard
  DDR
  SERDES
Memory Interface Generator
  Memory controllers
  Configure I/O for memory interface

Transition to CORE Generator Software System


CORE Generator Software System


Purpose

After completing this module, you will be able to:
  Describe the differences between LogiCORE and AllianceCORE solutions
  Identify two benefits of using cores in your designs
  Create customized cores by using the CORE Generator software system GUI
  Instantiate cores into your schematic or HDL design
  Run behavioral simulation on a design that contains cores

Time

20 minutes

Process

This module describes the basics of designing with the CORE Generator software.

Lessons

  Introduction
  Overview
  Using the CORE Generator Software System
  CORE Generator Software Design Flows
  Summary


Introduction
Show Slide 76:

CORE Generator Software System

Show Slide 77:

Objectives

After completing this module, you will be able to:
  Describe the differences between LogiCORE and AllianceCORE solutions
  Identify two benefits of using cores in your designs
  Create customized cores by using the CORE Generator software system GUI
  Instantiate cores into your schematic or HDL design
  Run behavioral simulation on a design that contains cores

CORE Generator Software System - 77


Overview
Show Slide 78:

Lessons

Overview
Using the CORE Generator Software System
CORE Generator Software Design Flows
Summary

Show Slide 79:

What Are Cores?

A core is a ready-made function that you can instantiate into your design as a black box
Cores can range in complexity
  Simple arithmetic operators, such as adders, accumulators, and multipliers
  System-level building blocks, such as filters, transforms, and memories
  Specialized functions, such as bus interfaces, controllers, and microprocessors
Some cores can be customized

Key Points

Note: The terms "function" and "core" are sometimes used interchangeably in this module to indicate a design entity, such as a multiplier or Finite Impulse Response (FIR) filter, that the CORE Generator software is able to create.

Intellectual Property (IP) is another term that is often used in association with cores. Cores are one type of IP.

Show Slide 80:

Benefits of Using Cores

Save design time
  Cores are created by expert designers who have in-depth knowledge of Xilinx FPGA architecture
  Guaranteed functionality saves time during simulation
Increase design performance
  Cores that contain mapping and placement information have predictable performance that is constant over device size and utilization
  The data sheet for each core provides performance expectations
  Use timing constraints to achieve maximum performance

Show Slide 81:

Types of Cores

LogiCORE solutions
AllianceCORE solutions

Show Slide 82:

LogiCORE Solutions

Typically customizable
Fully tested, documented, and supported by Xilinx
Many are pre-placed for predictable timing
Many are unlicensed and provided for free with Xilinx software
  More complex LogiCORE solution products are licensed
VHDL and Verilog flow support for several EDA tools
Schematic flow support for most cores

Show Slide 83:

AllianceCORE Solutions

Point-solution cores
Sold and supported by Xilinx AllianceCORE solution partners
Typically not customizable (some HDL versions are customizable)
  Partners can be contacted directly to provide customized cores
  A free evaluation version of the module is available
You will need to contact the IP Center for licensing and ordering information
All cores are optimized for Xilinx; some are pre-placed
  Typically supplied as an Electronic Design Interchange Format (EDIF) netlist
  VHDL and Verilog flow support; some schematic support

Show Slide 84:

Sample Functions

LogiCORE solutions
  DSP functions: time skew buffers, Finite Impulse Response (FIR) filters, and correlators
  Math functions: accumulators, adders, multipliers, integrators, and square root
  Memories: pipelined delay elements, single- and dual-port RAM
  Synchronous FIFOs
  PCI master and slave interfaces, PCI bridge
AllianceCORE solutions
  Peripherals: DMA controllers, programmable interrupt controllers, UARTs
  Communications and networking: ATM, Reed-Solomon encoders and decoders, T1 framers
  Standard bus interfaces: PCMCIA, USB


Using the CORE Generator Software System


Show Slide 85:

Lessons

Overview
Using the CORE Generator Software System
CORE Generator Software Design Flows
Summary

Show Slide 86:

CORE Generator Software System

A Graphical User Interface (GUI) allows central access to LogiCORE IP products, as well as:
  Data sheets
  Customizable parameters (available for some cores)
Interfaces with design entry tools
  Creates graphical symbols for schematic-based designs
  Creates instantiation templates for HDL-based designs
Web Links tab provides access to the Xilinx website and the IP Center
  The IP Center contains new cores to download and install
  You always have access to the latest cores



Key Points

The CORE Generator software is an application that manages information about each available core, organizes cores for easy browsing, and (for unlicensed LogiCORE solution products) creates the actual files needed to integrate a core into your design.

To view information about AllianceCORE products, visit the IP Center on the Web at www.xilinx.com/ipcenter.

Show Slide 87:

Invoking the CORE Generator System

From the Project Navigator, select Project > New Source
Select IP (CORE Generator & Architecture Wizard) and enter a filename
Click Next and then select the type of core

Key Points

To learn more about the Architecture Wizard, refer to the Architecture Wizard and the Floorplan Editor REL module in the Fundamentals of FPGA Design course.

If you are not using the Project Navigator, enter coregen at a command prompt (UNIX shell or DOS box).



TRAINER NOTE

Demo Instructions:
1. To open an existing project: Select File > Open Project.
2. Browse to one of the lab project directories.
3. Select an ISE software file and click Open.
4. Follow the instructions in the slide above to open the CORE
Generator software.
5. Enter file name: test_core.

Show Slide 88:

Core Customize Window

[Figure: Core customize window with callouts: version information; schematic symbol (unused ports grayed out); customizable parameters spread over several pages; data sheet access]

TRAINER NOTE

Demo Instructions:
1. Enter parameters for the core you selected.
2. Click Next to show additional pages of parameters.



Show Slide 89:

Core Data Sheets

[Figure: Core data sheet with callouts: features; resource utilization; performance expectations (not shown); also functionality and pinout (next page)]

TRAINER NOTE

Demo Instructions:
In the customize GUI, click the View Data Sheet button.


CORE Generator Software Design Flows


Show Slide 90:

Lessons

Overview
Using the CORE Generator Software System
CORE Generator Software Design Flows
Summary

Show Slide 91:

Schematic Design Flow

Generate a core
  Creates an NGC file and schematic symbol
Instantiate the symbol onto your schematic
  When a schematic is added to your design, a symbol is automatically created
  Treated as a black box; no underlying schematic
Proceed with normal schematic flow

[Figure: Generate Core (.xco) produces the .NGC file and symbol, followed by Instantiate, Implement, and Simulate]



Key Points

The XCO file is a log of the options used to create the core. You can use this file to confirm that the correct options were used during core generation. You can also use this file to create another core with the same options. This file can also be used in batch mode.

An NGC file is a Xilinx intermediate file for a core. It is merged with the other netlists in your design during the Translate phase. It is a Xilinx proprietary netlist format used to maintain Xilinx IP.

Show Slide 92:

HDL Design Flow


(Figure: HDL design flow. Compile library for behavioral simulation (one time only): compxlib.exe compiles the XilinxCoreLib library. Core generation and integration: Generate Core (.xco) produces the .VHO/.VEO instantiation templates, the .NGC netlist, and the .VHD/.V wrapper files; then Instantiate, Implement, and Simulate.)

Key Points

The next few slides describe each step in the HDL flow in more
detail.


In Project Navigator, the XCO file is automatically added to the project.

The instantiation templates are also added to the Language Templates. To see them, select Edit > Language Templates or click the Language Templates icon in the horizontal toolbar.

Show Slide 93:

HDL Design Flow


Compile Simulation Library

Before your first behavioral simulation, you must run compxlib.exe to compile the XilinxCoreLib simulation library

Located in the $XILINX\bin\<platform> directory

Supports Mentor Graphics ModelSim and SpeedWave, Cadence NC-Verilog, and Synopsys VCS and Scirocco simulation tools

If you download new or updated cores, additional simulation models will be automatically extracted during installation


Key Points

If you are using a simulator that the compxlib script does not support, refer to the CORE Generator Guide and your simulator documentation for information on how to compile the XilinxCoreLib library.



Show Slide 94:

HDL Design Flow


Core Generation and Integration

Generate or purchase a core
Netlist file (NGC)
Instantiation template files (VHO or VEO)
Behavioral simulation wrapper files (VHD or V)

Instantiate the core into your HDL source
Cut and paste from the templates provided in the VEO or VHO file

The design is ready for synthesis and implementation

Use the wrapper files for behavioral simulation
The ISE software automatically uses wrapper files when cores are present in the design
VHDL: Analyze the wrapper file for each core before analyzing the file that instantiates the core


Key Points

Instantiation template files provide a template with all of the correct port declarations for the core.

Simply cut and paste the template into your source file, change
the instance name, if desired, and replace the dummy signal
names with your own signal names.

During synthesis, the core is treated as a black box. During the first stage of implementation (Translate), the Xilinx tools read in the netlist file that the CORE Generator software system created.

Many VHDL simulators require lower-level files to be analyzed before the file that references them. Remember to analyze the wrapper files for your cores before you analyze the file that references them.

Most Verilog simulators do not have this order dependency.
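The VHDL analysis-order rule is a dependency ordering: every file must come after the files it references. The minimal Python sketch below assumes you have already worked out which files reference which. The file names are hypothetical. It uses the standard-library `graphlib` module (Python 3.9 or later) to produce one valid analysis order.

```python
from graphlib import TopologicalSorter

def analysis_order(references):
    """Return source files ordered so each file follows every file it references.

    `references` maps a file name to the set of files it instantiates
    (for example, the wrapper files generated for each core).
    """
    # TopologicalSorter treats the referenced files as predecessors,
    # so core wrappers are emitted before the files that instantiate them.
    return list(TopologicalSorter(references).static_order())

# Hypothetical project: top.vhd instantiates two core wrappers.
order = analysis_order({
    "top.vhd": {"my_bram.vhd", "my_fifo.vhd"},
})
print(order)  # both wrappers come before top.vhd; their relative order may vary
```

`static_order()` raises `graphlib.CycleError` if the references form a loop, which would also make the design impossible to analyze.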


Summary
Show Slide 95:

Lessons

Overview
Using the CORE Generator Software System
CORE Generator Software Design Flows
Summary


Show Slide 96:

Apply Your Knowledge

1) What is the main difference between LogiCORE and AllianceCORE solution products?

2) What is the purpose of compxlib.exe?

3) What is the difference between the VHO/VEO files and the VHD/V files
that are created by the CORE Generator software?

Show Slide 97:

Summary

A core is a ready-made function that you can insert into your design
LogiCORE solution products are sold and supported by Xilinx
AllianceCORE solution products are sold and supported by AllianceCORE solution partners
Using cores can save design time and provide increased performance
Cores can be used in schematic or HDL design flows


Where Can I Learn More?



Xilinx IP Center: www.xilinx.com/ipcenter


Software updates
Download new cores as they are released
Get core licensing help
IP evaluation

TRAINER NOTE

Demo Instructions:
1. Open a browser and go to www.xilinx.com/ipcenter.
2. Explore a few of the links on this page to see what is
available.


Apply Your Knowledge Answers


Answers

1) What is the main difference between LogiCORE and AllianceCORE solution products?

LogiCORE solution products are sold and supported by Xilinx.

AllianceCORE solution products are sold and supported by AllianceCORE solution partners.

2) What is the purpose of compxlib.exe?

compxlib.exe makes it easy to compile the XilinxCoreLib library before your first behavioral simulation.

3) What is the difference between the VHO/VEO files and the VHD/V files that are created by the CORE Generator software?

VHO/VEO files contain instantiation templates.

VHD/V files are wrappers for behavioral simulation that reference the XilinxCoreLib library.

Transition to Lab 1: CORE Generator Software System


Lab 1: CORE Generator Software System
Purpose

After completing this lab, you will be able to:


Create a custom memory component made of block RAM by using the CORE Generator tool

Create a custom asynchronous FIFO by using the CORE Generator tool

Time

30 minutes
Process

This lab illustrates how to build a block RAM memory with the
CORE Generator software.
General Flow

Step 1: Build the block RAM memory

Step 2: Build the asynchronous FIFO

Step 3: Implement the design


Lab
Designing for Performance Lab Workbook

Refer to the separate lab workbook for the CORE Generator Software System lab.

TRAINER NOTE

Remind students that the lab workbook contains two versions of each lab: a version with only general instructions and a version that includes detailed steps following each general instruction. The labs with only general instructions make up the first section of the lab workbook, and the detailed versions make up the second section.

Transition to Designing Clock Resources


Designing Clock Resources


Purpose

After completing this module, you will be able to:


Specify the resources available in the Clock Management Tile (CMT)

Describe the basics of the PLL capabilities

Detail the clocking resources available in the Virtex-5 FPGA

Time

45 minutes
Process

This module describes how to design a complete FPGA clocking scheme.
Lessons

Introduction

Overview

Clock Management Tile

Clock Networks

Summary


Introduction
Show Slide 98:

Designing Clock Resources

Show Slide 99:

Objectives
After completing this module, you will be able to:

Specify the resources available in the Clock Management Tile (CMT)
Describe the basics of the PLL capabilities
Detail the clocking resources available in the Virtex-5 FPGA


Overview
Show Slide 100:

Lessons

Overview
Clock Management Tile
Clock Networks
Summary


Show Slide 101:

Virtex-5 FPGA Delivers Powerful Clock Management

Combination of digital and analog technology

Optimized clocking resource mix

Highest performance
Both DCMs and PLLs
PLL can accept an input clock up to 710 MHz
More than 2x jitter filtering

Simple design creation through cores
Select by function or component
Automatic HDL code

(Figure: PLL, DCM, and clock buffers, with the label "Up to 550 MHz")
Show Slide 102:

Three Types of Clock Resources

(Figure: Virtex-5 FPGA clock resources: the I/O column with I/O clocks, global clocks driven by the global muxes, and regional clocks)

Clock region height: 20 CLBs, 40 I/Os (1 bank)
Clock region width: one half the chip
8 to 24 clock regions per device

Performance matched to application needs
710-MHz I/O clocks
550-MHz global clocks
300-MHz regional clocks


Clock Management Tile


Show Slide 103:

Lessons

Overview
Clock Management Tile
Clock Networks
Summary


Show Slide 104:

Virtex-5 FPGA Clock Management Tile

Up to six CMTs per device
Each with two DCMs and one PLL
No external PWR/GND pins

DCM
Fifth-generation, all-digital technology
Provides the most clocking functions
Same functionality as in the Virtex-4 FPGA

PLL
Reduces internal clock jitter
Supports higher jitter on reference clock inputs
Replaces discrete PLLs and Voltage Controlled Oscillators (VCOs)

PMCD removed
Functionality ported to PLL

Powerful combination of flexibility and precision


Key Points

Using the PLL only to implement the PMCD function is not a wise use of the PLL capabilities.

Show Slide 105:

Standard CMT Configurations


(Figure: three CMT configurations, each driving the global clocks)

Use each DCM and PLL individually: InClk 1 to a DCM, InClk 2 to the PLL, InClk 3 to the other DCM

Filter DCM output clock jitter: InClk 1 to a DCM, then to the PLL

Filter high clock jitter before reaching the DCM: InClk 1 to the PLL, then to a DCM

Key Points

Dedicated connections exist from the DCM outputs to the PLL inputs, as well as from the PLL outputs to the DCM inputs.

Note that the grouping of the three elements in each Clock Management Tile (CMT) is meaningful. Within each CMT, DCMs and PLLs can be cascaded together through direct local connections.

There are three options:
Option 1: All three clocking elements (two DCMs and one PLL) can be used independently.
Option 2: The PLL can be used to filter high input clock jitter before passing the clock to one or both DCMs for clock generation functions.
Option 3: The PLL can take a single DCM output clock and create an ultra-low jitter version for global clock distribution.

The second option is expected to be especially useful. In the past, external PLLs sometimes had to be placed on the PCB to filter the jitter from a noisy clock source before the clock was sent into the FPGA. That function can now be pulled inside the FPGA, saving PCB space and cost.

Show Slide 106:

CMT General Use Model

Get the Best of Both Worlds

In Order To                                       Use
Remove clock insertion delay                      DCM
Phase shift clocks                                DCM
Correct clock duty cycles                         DCM
Synthesize Fout = Fin * M/D                       DCM or PLL*
Filter clock jitter                               PLL
Switch between input clock sources dynamically    PLL
Implement the Virtex-4 FPGA PMCD function         PLL

* See the Virtex-5 FPGA data sheet to evaluate performance trade-offs between DCM and PLL usage

The Virtex-5 FPGA delivers advanced DCM and PLL technology for superior clocking capability

Key Points

The DCM provides finer resolution for phase-shifting functions than the PLL does.



Show Slide 107:

DCM Features

Functionally equivalent to the Virtex-4 FPGA
Operate from 19 MHz(1) to 550 MHz

Remove clock insertion delay
Zero delay clock buffer

Correct clock duty cycles

Synthesize FOUT = FIN * M/D
M, D values up to 32

DRP address space is different

Each DCM can be invoked with either the DCM_BASE or DCM_ADV primitive

Additional DCM_ADV features
Dynamically phase shift clocks in increments of period/256 or with direct delay line control
Use the Dynamic Reconfiguration Port (DRP) to adjust parameters without reconfiguring

(Figure: DCM_BASE and DCM_ADV primitives. Both have CLKIN, CLKFB, and RST inputs and CLK0, CLK90, CLK180, CLK270, CLK2X, CLK2X180, CLKDV, CLKFX, CLKFX180, and LOCKED outputs; DCM_ADV adds phase shift and DRP ports.)

Note: (1) As low as 1 MHz for some frequency synthesis

Key Points

The DCM DRP address mapping in the Virtex-5 FPGA has changed from the Virtex-4 FPGA. Otherwise, DCM functionality is the same as in the Virtex-4 FPGA.
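The DCM synthesizes FOUT = FIN * M/D with M and D values up to 32. The minimal Python sketch below assumes integer M and D from 1 to 32. The data sheet gives the exact lower limits, and the result must still fall within the DCM operating range. The sketch searches for the pair that comes closest to a target frequency. `best_dcm_md` is a hypothetical helper, not part of any Xilinx tool.

```python
from fractions import Fraction

def best_dcm_md(f_in_mhz, f_target_mhz, max_value=32):
    """Brute-force search for DCM M and D (hypothetical helper).

    Assumes integer M and D from 1 to max_value and FOUT = FIN * M / D.
    Exact fractions avoid floating-point ties; when two pairs give the same
    error, the first one found (smallest D, then smallest M) is kept.
    """
    f_in = Fraction(f_in_mhz)
    target = Fraction(f_target_mhz)
    best = None  # (error, M, D)
    for d in range(1, max_value + 1):
        for m in range(1, max_value + 1):
            error = abs(f_in * m / d - target)
            if best is None or error < best[0]:
                best = (error, m, d)
    _, m, d = best
    return m, d

m, d = best_dcm_md(100, 125)
print(m, d)  # 5 4, since 100 MHz * 5 / 4 = 125 MHz
```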



Show Slide 108:

PLL Features

Used as a frequency synthesizer and jitter filter for either external or internal clocks in conjunction with the DCMs of the CMT

Operate from 19 MHz(1) to 550 MHz
Inputs up to 710 MHz
VCO up to 1.1 GHz for more flexible frequency synthesis

Filter clock jitter
2x reduction in input clock jitter

Synthesize Fout = Fin * M/(D*O)
M: 1 to 64, D: 1 to 52, O: 1 to 128

Each PLL can be invoked with either the PLL_BASE or PLL_ADV primitive

Additional PLL_ADV features
Dynamically switch between clock sources without global clock buffers
Cascade clocks to and from DCMs
Use the DRP to adjust parameters without reconfiguring

Port existing PLL designs into the FPGA

(Figure: PLL_BASE primitive with CLKIN1, CLKFBIN, and RST inputs and CLKOUT<5:0>, CLKFBOUT, and LOCKED outputs; PLL_ADV primitive adding CLKIN2, CLKINSEL, REL, DRP, CLKOUTDCM<5:0>, and CLKFBDCM. Jitter example: a PLL input with > 400-ps jitter and a PLL output with < 100-ps jitter; example measurement with a 400-MHz clock in a quiet XC5VLX30 device.)

Key Points

Note: As low as 1 MHz for some frequency synthesis.

Clock switching: Assert reset, switch clocks, deassert reset.



Show Slide 109:

PLL Primitives
(Figure: PLL_BASE has inputs CLKIN1, CLKFBIN, and RST and outputs CLKOUT<5:0>, CLKFBOUT, and LOCKED. PLL_ADV adds inputs CLKIN2, CLKINSEL, and REL; outputs CLKOUTDCM<5:0> and CLKFBDCM; and the DRP ports DADDR[4:0], DI[15:0], DWE, DEN, DCLK, DO[15:0], and DRDY.)

Key Points

CLKIN1/CLKIN2: Clock inputs to the PLL.

CLKFBIN: Feedback clock input to the PLL. The PLL aligns the CLKIN1/CLKIN2 signal to the CLKFBIN signal.

CLKINSEL: Controls whether CLKIN1 or CLKIN2 is routed to the PLL. Switching is asynchronous, so the PLL must be held in reset during switching.
CLKINSEL = 1: CLKIN1 selected
CLKINSEL = 0: CLKIN2 selected

RST: Asynchronous reset; must be released to re-enable the PLL.

DADDR[4:0]: Address select signals for the Dynamic Reconfiguration Port (DRP), which allows dynamic reprogramming of the PLL.

DI[15:0]: Data input to the DRP.

DWE: Write enable to the DRP.

DEN: Enable signal for the DRP.

DCLK: Clock signal for the DRP.

CLKOUT[5:0]: Clock outputs from the PLL. Each one is individually controllable, but all are derived from the same VCO.

CLKFBOUT: PLL feedback output. This signal is used to configure how the PLL de-skews.

CLKOUTDCM[5:0]: Specially buffered versions of the CLKOUT[5:0] signals that can be used to connect to the DCM. Otherwise identical to CLKOUT[5:0].

CLKFBDCM: Feedback output used for de-skew when the PLL and DCM are cascaded (PLL2DCM or DCM2PLL). CLKFBDCM is the same as CLKFBOUT.

LOCKED: Indicates that the PLL has locked onto the reference clock and is tracking the phase.

DO[15:0]: DRP output signals. These allow an application to read the stored PLL configuration values.

DRDY: Ready output indicating that the DRP interface is ready for the next sequence of reads or writes.

Show Slide 110:

PLL Basics
(Figure: PLL block diagram. CLKINSEL selects CLKIN1 or CLKIN2; the selected clock passes through the D counter into the phase frequency detector (PFD), charge pump (CP), loop filter (LF), and VCO. The VCO's 8-phase taps feed output counters O0 to O5, which drive CLKOUT0 to CLKOUT5, and the M counter in the CLKFBIN feedback path closes the loop. Lock detect and lock monitor logic drive LOCKED.)

FVCO = FIN * M / D
FOUT = FVCO / O = FIN * M / D / O


Key Points

The PLL multiplexes two clock input signals. The selected clock then goes into the D counter, which is used to divide down the input clock. At the output of the VCO are eight clocks with differing phases. Any of these eight phase-shifted clocks can feed any of the six outputs (O0 to O5).

Each output counter O can further divide the output clock.

One of the output clocks is used as a feedback clock. One of the phase-shifted clocks is used to detect alignment of the rising edges of the input clock with the VCO rising edges. This feedback clock goes through the M counter, which can be used to multiply the clock frequency.

The final output clock frequencies are determined by the following calculation: FOUT = FIN * M / D / O, where O is the divide value at the O0 to O5 output stage.
Eight phases: 0, 45, 90, 135, 180, 225, 270, 315 degrees
D: Programmable counter that divides the input clock.
PFD: The Phase Frequency Detector compares both the phase and the frequency of the input (reference) clock (from the D counter) and the feedback clock (from the M counter). Only the rising edges are considered because, as long as a minimum High/Low pulse width is maintained, the duty cycle is not important. The PFD generates a signal proportional to the phase and frequency difference between the two clocks. This signal drives the Charge Pump (CP) and Loop Filter (LF) to generate a reference voltage for the VCO. The PFD produces an up or down signal to the CP and LF to determine whether the VCO should operate at a higher or lower frequency. When the VCO operates at too high a frequency, the PFD activates a down signal, reducing the control voltage and decreasing the VCO operating frequency. When the VCO operates at too low a frequency, an up signal increases the voltage.
Loop filter: The loop filter determines the dynamic characteristics of the PLL. The loop-filtered signal controls the VCO. The loop filter is designed to match the characteristics required by the application of the PLL in an FPGA: primarily a large input clock bandwidth and the ability to track and maintain lock.
VCO: Voltage Controlled Oscillator. The VCO generates eight output phases. Each output phase can be selected as the reference clock to the output counters.
O: Output counter. Each of the six counters can be independently programmed to generate up to six output clocks, each using a different phase.
M: M counter, which controls the feedback clock of the PLL, allowing a wide range of frequency synthesis.
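The deliberately simplified Python sketch below illustrates the PFD up/down behavior. It is a toy under stated assumptions, not a model of the Virtex-5 charge pump or loop filter. Each iteration stands in for one PFD decision, and the VCO frequency moves by a fixed, hypothetical `step` until the feedback clock (FVCO / M) matches the reference at the PFD (FIN / D).

```python
def toy_lock(f_in, m, d, f_vco_start, step=1.0, max_iters=100_000):
    """Toy bang-bang frequency loop; all frequencies in MHz.

    The reference at the PFD is FIN / D (after the D counter) and the
    feedback is FVCO / M (after the M counter). Each iteration applies a
    "down" step if the feedback is fast and an "up" step if it is slow,
    stopping once the VCO is within half a step of FIN * M / D.
    """
    f_ref = f_in / d
    f_vco = f_vco_start
    for _ in range(max_iters):
        f_fb = f_vco / m
        vco_error = (f_fb - f_ref) * m   # error expressed at the VCO
        if abs(vco_error) < step / 2:
            return f_vco                  # treated as "locked"
        f_vco += -step if vco_error > 0 else step
    raise RuntimeError("toy loop did not lock")

print(toy_lock(250, 4, 1, f_vco_start=800.0))  # 1000.0
print(toy_lock(87, 20, 3, f_vco_start=700.0))  # 580.0
```

A real PLL adjusts continuously and also aligns phase, so treat this only as a picture of the direction of each correction.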
Show Slide 111:

PLL Equations
Calculating the VCO frequency (FVCO)
FVCO = FIN * M / D
For example
FIN = 250 MHz, M = 4, D = 1
FVCO = 250 * 4 / 1 = 1000 MHz
FIN = 87 MHz, M = 20, D = 3
FVCO = 87 * 20 / 3 = 580 MHz

Calculating the output frequency (FOUT)
FOUT = FVCO / O = FIN * M / D / O
For example
FIN = 250 MHz, M = 4, D = 1, O = 2
FOUT = 250 * 4 / 1 / 2 = 500 MHz
FIN = 87 MHz, M = 20, D = 3, O = 3
FOUT = 87 * 20 / 3 / 3 = 193.33 MHz

As a general rule:
High FVCO equates to lower jitter and more power
Low FVCO equates to higher jitter but lower power

Key Points

At the Phase Frequency Detector (PFD), FIN / D = FVCO / M.

All outputs operate off a common VCO frequency. This puts constraints on what the output frequencies can be.
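The Python sketch below evaluates the PLL equations, including the PFD relationship FIN / D = FVCO / M, and reproduces the slide's worked examples. It uses exact fractions so that a result such as 193.33 MHz is not rounded along the way. The function name is hypothetical.

```python
from fractions import Fraction

def pll_frequencies(f_in, m, d, o):
    """Evaluate the PLL equations (MHz); returns (FPFD, FVCO, FOUT) as Fractions."""
    f_in = Fraction(f_in)
    f_pfd = f_in / d          # reference frequency seen by the PFD
    f_vco = f_in * m / d      # FVCO = FIN * M / D
    f_out = f_vco / o         # FOUT = FVCO / O
    # Sanity check: at lock, the feedback (FVCO / M) equals the reference (FIN / D).
    assert f_vco / m == f_pfd
    return f_pfd, f_vco, f_out

for f_in, m, d, o in [(250, 4, 1, 2), (87, 20, 3, 3)]:
    _, f_vco, f_out = pll_frequencies(f_in, m, d, o)
    print(f"FIN={f_in} M={m} D={d} O={o}: FVCO={float(f_vco):.2f} FOUT={float(f_out):.2f}")
# FIN=250 M=4 D=1 O=2: FVCO=1000.00 FOUT=500.00
# FIN=87 M=20 D=3 O=3: FVCO=580.00 FOUT=193.33
```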


For example:
FIN = 100 MHz, M = 5, D = 1
FVCO = 100 * 5 / 1 = 500 MHz
Possible output clocks are 500 MHz (O = 1), 250 MHz (O = 2), 166.67 MHz (O = 3), and so on.

For example:
FIN = 100 MHz, M = 10, D = 1
FVCO = 100 * 10 / 1 = 1000 MHz
Possible output clocks are 1000 MHz (O = 1; too fast for the clock networks), 500 MHz (O = 2), 333.33 MHz (O = 3), and so on.

As a general rule, run the VCO frequency as high as possible. Running the VCO at a higher frequency allows more frequencies to be synthesized and results in lower jitter. The tradeoff is increased power consumption.
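All six outputs share one VCO, so every output is FVCO divided by an integer O. The short Python sketch below lists those candidates. The 550-MHz ceiling is an assumption taken from the global clock rating on the earlier clock-resources slide. O is limited to the CLKOUT[0:5]_DIVIDE range of 1 to 128.

```python
def candidate_outputs(f_vco_mhz, max_out_mhz=550.0, max_divide=128, limit=5):
    """Return the first few (O, FOUT) pairs with FOUT = FVCO / O <= max_out_mhz."""
    pairs = []
    for o in range(1, max_divide + 1):
        f_out = f_vco_mhz / o
        if f_out <= max_out_mhz:          # skip outputs too fast for the clock network
            pairs.append((o, round(f_out, 2)))
        if len(pairs) == limit:
            break
    return pairs

print(candidate_outputs(500))   # [(1, 500.0), (2, 250.0), (3, 166.67), (4, 125.0), (5, 100.0)]
print(candidate_outputs(1000))  # [(2, 500.0), (3, 333.33), (4, 250.0), (5, 200.0), (6, 166.67)]
```

Doubling FVCO from 500 MHz to 1000 MHz adds intermediate frequencies such as 333.33 MHz and 200 MHz. This is the "more frequencies can be synthesized" point, at the cost of the higher power noted above.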

Show Slide 112:

PLL Counter Attributes

Want FPFD as high as possible
Make D as small as possible

Want FVCO as high as possible
Better jitter performance
Higher power
Larger range of output frequencies available
Caveat
Making M too large can increase jitter
More characterization is required

Determining M and D values*
DMIN = FIN/FPFDMAX
DMAX = FIN/FPFDMIN
MMIN = FVCOMIN/FIN
MMAX = (DMAX * FVCOMAX)/FIN
MIDEAL = (DMIN * FVCOMAX)/FIN

Counter attributes
O Divide: CLKOUT[0:5]_DIVIDE = {1 to 128}
D Divide: DIVCLK_DIVIDE = {1 to 52}
M Multiply: CLKFBOUT_MULT = {1 to 64}

*Relevant minimum and maximum numbers are shown in the Key Points section


Key Points

More information on attributes can be found in the Appendix.


For the most up-to-date information, go to www.xilinx.com and
refer to the Virtex-5 FPGA data sheet or the Virtex-5 FPGA User
Guide.

FINMIN = 19 MHz

FINMAX = 710 MHz

FPFDMIN = 19 MHz

FPFDMAX = 550 MHz in the -3 speed grade (500 MHz in -2, 450 MHz in -1)

FVCOMIN = 400 MHz

FVCOMAX = 1.1 GHz

O, D, and M attributes are all integer values.

Smallest D counter value: DMIN = int(roundup(FIN/FPFDMAX))

Largest D counter value: DMAX = int(rounddown(FIN/FPFDMIN))

Smallest M counter value: MMIN = int(roundup(FVCOMIN/(FIN/DMIN)))

Largest M counter value: MMAX = int(rounddown(FVCOMAX/(FIN/DMAX)))
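The counter bounds above translate directly into code. The minimal Python sketch below assumes the -3 speed-grade limits listed in these Key Points and uses exact fractions for the round-up and round-down steps. The slide does not say how to round MIDEAL. Rounding it down, so that FVCO does not exceed FVCOMAX, is an assumption made here. `counter_ranges` is a hypothetical helper.

```python
import math
from fractions import Fraction

# -3 speed-grade limits from the Key Points (MHz); check the data sheet for your part.
F_IN_MIN, F_IN_MAX = 19, 710
F_PFD_MIN, F_PFD_MAX = 19, 550
F_VCO_MIN, F_VCO_MAX = 400, 1100

def counter_ranges(f_in_mhz):
    """Return the D and M counter bounds for a given PLL input frequency."""
    f_in = Fraction(f_in_mhz)
    if not F_IN_MIN <= f_in <= F_IN_MAX:
        raise ValueError("input frequency outside the PLL input range")
    d_min = math.ceil(f_in / F_PFD_MAX)              # roundup(FIN / FPFDMAX)
    d_max = math.floor(f_in / F_PFD_MIN)             # rounddown(FIN / FPFDMIN)
    m_min = math.ceil(F_VCO_MIN / (f_in / d_min))    # roundup(FVCOMIN / (FIN / DMIN))
    m_max = math.floor(F_VCO_MAX / (f_in / d_max))   # rounddown(FVCOMAX / (FIN / DMAX))
    m_ideal = math.floor(d_min * F_VCO_MAX / f_in)   # highest FVCO with the smallest D
    return {"DMIN": d_min, "DMAX": d_max, "MMIN": m_min,
            "MMAX": m_max, "MIDEAL": m_ideal}

print(counter_ranges(100))
# {'DMIN': 1, 'DMAX': 5, 'MMIN': 4, 'MMAX': 55, 'MIDEAL': 11}
```

With FIN = 100 MHz, D = 1 and M = 11 give FVCO = 1100 MHz, the top of the VCO range. That is the "make D small, run FVCO high" guidance from the slide.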



Show Slide 113:

PLL Attributes

Phase shift
Not all values are available; see the Key Points section for more information
The higher the VCO frequency, the more options that are available
Directly related to having a higher CLKOUT[0:5]_DIVIDE number
CLKOUT[0:5]_PHASE = {0.0 to 360.0}
Phase shifts that are always possible: 0, 45, 90, 135, 180, 225, 270, 315

Duty cycle
The higher the VCO frequency, the more options that are available
Directly related to having a higher CLKOUT[0:5]_DIVIDE number
CLKOUT[0:5]_DUTY_CYCLE = {0.01 to 0.99}
Default = 0.5

*Relevant minimum and maximum numbers are shown in the Key Points section

Key Points

The phase shift value is specified as a real value representing the degrees of phase shift.

The duty cycle value is specified as a real value representing the duty cycle as a fraction of the period (0.5 = 50 percent).

Phase shift calculation:

Phase shift step size:
phase_step = 360 / (8 * CLKOUTn_DIVIDE)

Maximum phase shift:
If CLKOUTn_DIVIDE <= 64, phase_shift_max = 360
If CLKOUTn_DIVIDE > 64, phase_shift_max = (64 / CLKOUTn_DIVIDE) * 360 + 7 * phase_step
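The Python sketch below applies these phase rules. The step-size formula implies that a requested phase is reachable only if it is a whole multiple of phase_step and does not exceed phase_shift_max. That reachability rule is an assumption here, and the helper names are hypothetical.

```python
def phase_step(divide):
    """Phase-shift resolution in degrees for a given CLKOUTn_DIVIDE."""
    return 360.0 / (8 * divide)

def phase_shift_max(divide):
    """Largest phase shift in degrees for a given CLKOUTn_DIVIDE."""
    if divide <= 64:
        return 360.0
    return (64 / divide) * 360.0 + 7 * phase_step(divide)

def phase_is_reachable(phase_deg, divide, tol=1e-9):
    """True if phase_deg is a whole number of steps and within the maximum."""
    steps = phase_deg / phase_step(divide)
    return phase_deg <= phase_shift_max(divide) + tol and abs(steps - round(steps)) < tol

print(phase_step(1), phase_step(4))   # 45.0 11.25
print(phase_is_reachable(45.0, 4))    # True
print(phase_is_reachable(50.0, 4))    # False
print(phase_shift_max(128))           # 182.4609375
```

With CLKOUTn_DIVIDE = 1, the step is 45 degrees. This matches the eight phases that the slide lists as always possible.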


Duty cycle calculation:

Minimum duty cycle:
If CLKOUTn_DIVIDE > 64, min_duty_cycle = (CLKOUTn_DIVIDE - 64) / CLKOUTn_DIVIDE
If CLKOUTn_DIVIDE <= 64, min_duty_cycle = 1 / CLKOUTn_DIVIDE

Duty cycle step = 0.5 / CLKOUTn_DIVIDE

Maximum duty cycle:
If CLKOUTn_DIVIDE > 64, max_duty_cycle = 64.5 / CLKOUTn_DIVIDE
If CLKOUTn_DIVIDE <= 64, max_duty_cycle = (CLKOUTn_DIVIDE - 0.5) / CLKOUTn_DIVIDE
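The duty cycle rules can be checked the same way. The Python sketch below assumes, as the step formula implies, that a duty cycle is reachable when it lies between the minimum and maximum and is a whole multiple of the step. The helper names are hypothetical. For CLKOUTn_DIVIDE = 1 the two formulas cross (minimum 1.0, maximum 0.5), so treat that case separately and consult the Virtex-5 FPGA User Guide.

```python
def duty_cycle_limits(divide):
    """Return (min, max, step) duty cycle for a given CLKOUTn_DIVIDE."""
    step = 0.5 / divide
    if divide > 64:
        return (divide - 64) / divide, 64.5 / divide, step
    return 1 / divide, (divide - 0.5) / divide, step

def duty_is_reachable(duty, divide, tol=1e-9):
    """True if duty lies within the limits and is a whole number of steps."""
    lo, hi, step = duty_cycle_limits(divide)
    steps = duty / step
    return lo - tol <= duty <= hi + tol and abs(steps - round(steps)) < tol

print(duty_cycle_limits(16))        # (0.0625, 0.96875, 0.03125)
print(duty_is_reachable(0.25, 16))  # True
print(duty_is_reachable(0.3, 16))   # False
```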

Show Slide 114:

Apply Your Knowledge

1) Given

You need the PLL to do the following:

Input clock frequency = 133 MHz, targeting a Virtex-5 LX50 -3 FPGA

Output clocks
266 MHz, 0 degrees phase shift, 50 percent duty cycle
266 MHz, 45 degrees phase shift, 50 percent duty cycle
66 MHz, 90 degrees phase shift, 25 percent duty cycle

Specify the optimal settings for the PLL

DIVCLK_DIVIDE =
CLKFBOUT_MULT =
CLKOUT1_DIVIDE =
CLKOUT2_DIVIDE =
CLKOUT3_DIVIDE =
CLKOUT1_PHASE =
CLKOUT2_PHASE =
CLKOUT3_PHASE =
CLKOUT1_DUTY_CYCLE =
CLKOUT2_DUTY_CYCLE =
CLKOUT3_DUTY_CYCLE =

FINMIN = 19 MHz

FINMAX = 710 MHz

FPFDMIN = 19 MHz

FPFDMAX = 550 MHz in the -3 speed grade (500 MHz in -2, 450 MHz in -1)

FVCOMIN = 400 MHz

FVCOMAX = 1.1 GHz

*********** Workspace ***********

DMIN = FIN/FPFDMAX =

DMAX = FIN/FPFDMIN =

MMIN = FVCOMIN/FIN =

MMAX = (DMAX * FVCOMAX)/FIN =

MIDEAL = (DMIN * FVCOMAX)/FIN =



Show Slide 115:

PLL Use Example

Frequency Synthesizer and Jitter Filter

(Figure: An IBUFG drives CLKIN1, the CLKOUT outputs drive logic through a BUFG, and CLKFBOUT connects directly back to CLKFBIN. Callouts: "This path could come from BUFG" and "Nothing in this feedback path keys the software that INTERNAL feedback is desired".)

Use: Used when maintaining the phase relationship between the input and output clocks is not required
PLL attribute

Compensation = Internal

Key Points

Used when the input clock and output clock do not need to
have any phase relationship; that is, if the PLL is used strictly as
a frequency synthesizer or jitter filter.

The COMPENSATION attribute specifies the PLL phase compensation for the incoming clock. For example, the SYSTEM_SYNCHRONOUS setting attempts to compensate all clock delay for 0 hold time. SOURCE_SYNCHRONOUS is used when a clock is aligned with data.



Show Slide 116:

PLL Use Example

Clock Network De-Skew

(Figure: An IBUFG drives CLKIN1; a CLKOUT output drives logic through a BUFG, and CLKFBOUT returns to CLKFBIN through a second BUFG. Callout: "This line could be used to clock logic".)

Use: Used when maintaining the phase relationship between the input and output clocks is desired
PLL attribute

Compensation = Source Synchronous or System Synchronous

Key Points

There are two reasons that clock de-skew from a PLL requires two global clock buffers:

1. The CLKFBOUT feedback path should match the delay on the CLKOUT path. The global clock networks are balanced clock networks; each inserts an equal amount of delay.
2. CLKFBOUT should be used to provide the feedback clock because both input frequencies to the PFD block must be identical. That is, the CLKIN1 input frequency divided by D equals the CLKFBIN frequency: FIN/D = FFB = FVCO/M. CLKFBOUT therefore provides a frequency equal to FIN/D. In most cases, the CLKOUTn signals do not match the FIN/D frequency going into the PFD block of the PLL.



Show Slide 117:

PLL Use Example

Zero Delay Buffer

(Figure: Inside the FPGA, an IBUFG drives CLKIN1 and a second IBUFG drives CLKFBIN; the CLKOUT0 and CLKFBOUT paths each go through a BUFG and an OBUF. Callout: "Route outside the part; there will be a maximum delay that can be introduced".)

Use: Used to create an external clock buffer (clock mirror) when maintaining the phase relationship between the input and external output clock is desired
PLL attribute

Compensation = External

Key Points
The delay line on the CLKFBOUT trace should match the delay
on the trace for the CLKOUT0 path; that is, the edges should be
aligned.



Show Slide 118:

PLL Use Example

DCM2PLL

(Figure: An IBUFG drives the DCM CLKIN, a DCM output clock drives the PLL, and BUFGs carry the clocks to logic. Callouts: "Feedback path CANNOT include both the DCM and PLL" and "DCM LOCKs first, then the PLL LOCKs".)

PLL Attribute
Compensation = DCM2PLL

Key Points

You must dedicate a BUFG to eliminate the DCM clock insertion delay; the PLL does not automatically compensate for this delay. The feedback path cannot include both the DCM and the PLL.

Use: This example combines DCM frequency synthesis (CLK90 or CLKDV, for example, could also be used) and possibly dynamic phase shift (DPS) capabilities (for greater phase-shift resolution than the PLL provides), and then uses the PLL for jitter filtering. CLK0 can be used in the design with delay compensation. There is one dedicated connection from the DCM to the PLL within the same CMT. If more than one connection is needed, use a BUFG to route the additional clocks from the DCM to the PLL.

This is just one possible example; a DCM can drive the PLL in many other configurations.



Show Slide 119:

PLL Use Example

PLL2DCM

(Figure: An IBUFG drives the PLL CLKIN1; a PLL output clock drives the DCM CLKIN, and the DCM outputs go to logic through a BUFG.)

PLL Attribute
Compensation = PLL2DCM

Use the PLL to filter reference clock jitter before going to the DCM

Key Points

In this example, the PLL is used to filter clock jitter before forwarding the clock to the DCM.


Clock Networks
Show Slide 120:

Lessons

Overview
Clock Management Tile
Clock Networks
Summary


Show Slide 121:

Virtex-5 FPGA Clock Regions and I/O Banks
Four differential or single-ended BUFIOs
All clock regions are 20 CLBs tall versus 16 in the Virtex-4 FPGA
Clock regions match I/O banks
40 I/Os per bank and clock region
Clock regions span one half the die
4 RCLKs per region
2 BUFRs per region
10 GCLKs per region

Key Points

There are four BUFIOs per clock region. Because BUFIOs can no longer span regions, the number per region has increased. They are still implemented differentially.

There are two BUFRs per clock region. However, there are now four regional clock tracks, which allows the BUFRs in vertically adjacent regions to drive the other two tracks or all four.

Clock regions are slightly larger, and they now also match the I/O banks. In the Virtex-4 FPGA, each I/O bank crossed two clock regions.

You now have access to 10 global clocks per region (versus eight in the Virtex-4 FPGA).

Show Slide 122:

Virtex-5 FPGA Global Clocking
10 global clocks per region (full crossbar)
Global resources for all devices
20 global clock inputs
32 global clock multiplexers
2 or 6 CMTs
Figure: die diagram showing the CMT columns, the global clock multiplexers, and the IBUFGs.

Key Points

BUFGCTRL is the same as in the Virtex-4 FPGA.
Show Slide 123:

Global Clocking Features
Global Clock Inputs (IBUFG or IBUFGDS)
Flexibility: 20 total; 20 differential (40 pins) or 20 single-ended (20 pins); span two I/O banks
Global Clock Multiplexers (BUFGCTRL)
Flexibility: 32 total; optional clock enable; guaranteed glitch-less switching; can also be used as an asynchronous multiplexer
Performance
Up to 550 MHz
Differential for maximum performance
High fanout (access to all clock loads in the FPGA)
Low skew
Short clock insertion delay

Key Points

In the Virtex-4 FPGA, GCLK pins spanned four I/O banks. In the Virtex-5 FPGA, they span only two banks: a lower and an upper bank.
Show Slide 124:

Virtex-5 FPGA I/O Clocking
I/O column, per region:
Four clock-capable I/Os
Four I/O clock buffers
Four I/O clock nets
BUFIOs cannot drive the IOCLK track in an adjacent region
Figure legend: Clock-Capable I/O; I/O Clock Buffer (BUFIO); I/O Clock Net (IOCLK)
Ideal for source-synchronous interfaces

Key Points

The four BUFIOs in each clock region can no longer drive an IOCLK net in vertically adjacent regions. This change allows the IOCLK track to run at up to 710 MHz internally. (Crossing into another region caused too much delay.)
Show Slide 125:

Virtex-5 FPGA Regional Clocking
Per region:
Four clock-capable I/Os
Two regional clock buffers
Four regional clock nets
Figure legend: Clock-capable I/O; Regional Clock Buffer (BUFR); Regional Clock Net (RCLK)
Easily create many clock domains per FPGA

Key Points

The two regional clock buffers can drive any of the four regional clock nets in an adjacent region. This provides more flexibility for regional clocks than the Virtex-4 FPGA, which had only two regional clock buffers (BUFRs) and two regional clock tracks (RCLKs) per clock region.

300-MHz performance is achieved in the highest speed grade.

The BUFR supports divide-by-1, 2, 3, 4, 5, 6, 7, or 8 to support the ISERDES.
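The BUFR divide options can be checked with a quick calculation. The Python sketch below is illustrative only. The names `bufr_output_mhz` and `fits_regional_clock` are hypothetical. It treats the `"BYPASS"` setting (used in the BUFR instantiation examples in this module) as a pass-through, and it compares the result against the 300-MHz regional clock figure quoted above for the highest speed grade. Consult the data sheet for the actual limits of your device and speed grade.

```python
# Hypothetical helper: treats the BUFR as an ideal frequency divider.
# Valid BUFR_DIVIDE strings follow the slides: "BYPASS" or "1" through "8".
RCLK_MAX_MHZ = 300.0  # regional clock figure quoted for the highest speed grade

DIVIDE_FACTORS = {"BYPASS": 1, **{str(n): n for n in range(1, 9)}}


def bufr_output_mhz(f_in_mhz: float, bufr_divide: str) -> float:
    """Return the BUFR output frequency (MHz) for a BUFR_DIVIDE setting."""
    if bufr_divide not in DIVIDE_FACTORS:
        raise ValueError(f"unsupported BUFR_DIVIDE value: {bufr_divide!r}")
    return f_in_mhz / DIVIDE_FACTORS[bufr_divide]


def fits_regional_clock(f_out_mhz: float) -> bool:
    """True if the divided clock is at or below the assumed 300-MHz RCLK figure."""
    return f_out_mhz <= RCLK_MAX_MHZ


# Example: divide a 600-MHz I/O clock by 4 for slower parallel-side logic.
f_div = bufr_output_mhz(600.0, "4")
print(f_div, fits_regional_clock(f_div))  # 150.0 True
```

The same check shows why an undivided 710-MHz I/O clock cannot simply be placed on a regional clock net. Its frequency exceeds the 300-MHz regional clock figure, so the BUFR divider must reduce it first.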

Show Slide 126:

I/O and Regional Clocking Features
Clock-Capable I/Os
Flexibility: exist in all I/O columns; four CCIOs per region (four differential, 8 pins, or four single-ended, 4 pins); adjacent to the HCLK row (two CCIOs above and two CCIOs below)
Performance: 710-MHz differential
I/O Clocks (BUFIO, IOCLK)
Flexibility: exist in all I/O columns; four BUFIO drivers per region; four IOCLKs per region; span a single region
Performance: 710-MHz differential
Regional Clocks (BUFR, RCLK)
Flexibility: exist in non-center I/O columns; two BUFR drivers per region; four RCLKs per region; span up to three regions (one above and one below); clock divider range from 1 to 8
Performance: 300 MHz

Show Slide 127:

Use
PLL and DCMs must be instantiated
Direct primitive instantiation: place attributes in the UCF or HDL
IP (CORE Generator & Architecture Wizard): attributes placed in the netlist; can include global clock buffers
BUFGs can be inferred
Xilinx suggests that you instantiate all clock resources
Direct primitive instantiation: place attributes in the UCF or HDL
IP (CORE Generator Tool & Architecture Wizard): attributes placed in the netlist
BUFIOs and BUFRs must be instantiated
Currently, there is no support for regional clocking resources in the Architecture Wizard

Key Points

In the future, creating and customizing regional clocking resources will be possible.

Show Slide 128:

Clock Wizard
Choose function: the optimal DCM/PLL flow is automatically selected
- or -
Choose component: program as desired
Hidden Slide 129:

Select Your Options
Xilinx Clocking Wizard
Use the GUI to instantiate and program your clocking components
Wizard generates ready-to-use VHDL or Verilog

Key Points

The Xilinx Clocking Wizard automates the setting of attributes for the primitive.

TRAINER NOTE

This slide (hidden in the PowerPoint presentation) is a screenshot of the Clocking Wizard.
Hidden Slide 130:

BUFR and BUFIO Instantiation

VHDL

library IEEE;
use IEEE.std_logic_1164.all;
library UNISIM;
use UNISIM.vcomponents.all;
-- . . .
component BUFIO
  port (I : in  std_logic;
        O : out std_logic);
end component;

component BUFR
  generic (BUFR_DIVIDE : string);
  port (I   : in  std_logic;
        CE  : in  std_logic;
        CLR : in  std_logic;
        O   : out std_logic);
end component;
-- . . .
BUFIO_inst : BUFIO
  port map (I => input_clk,
            O => clk_bufio);

-- BUFR_DIVIDE: "BYPASS", "1", "2", "3", "4", "5", "6", "7", "8"
BUFR_inst : BUFR
  generic map (BUFR_DIVIDE => "BYPASS")
  port map (I   => clk_bufio,
            CE  => clk_enable,
            CLR => async_rst,
            O   => clk_bufr);

Verilog

BUFIO bufio_inst (
  .I(input_clk),
  .O(clk_bufio));

// BUFR_DIVIDE: "BYPASS", "1", "2", "3", "4", "5", "6", "7", "8"
BUFR #(.BUFR_DIVIDE("BYPASS")) bufr_inst (
  .I(clk_bufio),
  .CE(clock_enable),
  .CLR(async_rst),
  .O(clk_bufr));

TRAINER NOTE

This slide (hidden in the PowerPoint presentation) shows HDL examples of clock buffer instantiation.

Summary
Show Slide 131:

Lessons


Overview
Clock Management Tile
Clock Networks
Summary


Show Slide 132:

Apply Your Knowledge

2) Compare the following resources in the Virtex-5 FPGA to the Virtex-4 FPGA:

Clock region size
Global clock inputs
Number of global clock buffers
Number of global clock buffers per region
Clock-capable inputs per clock region
BUFIO buffers per region
I/O clock nets per region
I/O clock region span
BUFR buffers per region
Regional clock (RCLK) nets per region
Regional clock span
Show Slide 133:

Apply Your Knowledge

3) To perform the following, which should you use: the DCM or PLL?

Remove clock insertion delay
Phase shift clocks
Correct clock duty cycles
Synthesize frequency
Filter clock jitter
Switch between input clocks dynamically
Implement Virtex-4 FPGA PMCD functionality

Show Slide 134:

Summary

The new Clock Management Tile (CMT) includes two DCMs and one PLL
The new PLL includes jitter filtering and frequency synthesis capabilities
Clock region = 20 CLBs, 40 IOBs, and 1 I/O bank
Twenty global input clock buffers (differential)
Thirty-two global clock buffers (differential)
Ten global clocks per region
Four BUFIOs per region (differential); BUFIOs cannot drive into adjacent regions
Two BUFRs per region; BUFRs can drive into adjacent regions
Four regional clock tracks per region

Where Can I Learn More?

Virtex-5 FPGA data sheets
Virtex-5 FPGA user guides:
Virtex-5 FPGA User Guide
Virtex-5 FPGA XtremeDSP Design Considerations User Guide
Virtex-5 FPGA Configuration User Guide
Virtex-5 FPGA Packaging and Pinout Specification
Virtex-5 FPGA home page: www.xilinx.com/virtex5
Links to everything related to the Virtex-5 FPGA: white papers, boards, training, data sheets, and user guides

Apply Your Knowledge Answers


Answers

1) Given:

Input clock frequency = 133 MHz, targeting a Virtex-5 LX50 -3 FPGA

You need the PLL to do the following:

Output clocks
266 MHz, 0 degrees phase shift, 50 percent duty cycle
266 MHz, 45 degrees phase shift, 50 percent duty cycle
66 MHz, 90 degrees phase shift, 25 percent duty cycle

Specify the optimal settings for the PLL:

DIVCLK_DIVIDE = 1
CLKFBOUT_MULT = 8
CLKOUT1_PHASE = 0.0
CLKOUT1_DUTY_CYCLE = 0.5
CLKOUT1_DIVIDE = 4
CLKOUT2_PHASE = 45.0
CLKOUT2_DUTY_CYCLE = 0.5
CLKOUT2_DIVIDE = 4
CLKOUT3_PHASE = 90.0
CLKOUT3_DUTY_CYCLE = 0.25
CLKOUT3_DIVIDE = 16

F_IN_MIN = 19 MHz
F_IN_MAX = 710 MHz
F_PFD_MIN = 19 MHz
F_PFD_MAX = 550 MHz in the -3 speed grade (500 MHz in -2, 450 MHz in -1)
F_VCO_MIN = 400 MHz
F_VCO_MAX = 1.1 GHz

*********** Workspace ***********

D_MIN = F_IN / F_PFD_MAX = 133/550 = 0.241; D must be a whole number of at least 1, so the minimum D value is 1
D_MAX = F_IN / F_PFD_MIN = 133/19 = 7
M_MIN = F_VCO_MIN / F_IN = 400/133 = 3.01, rounded up to 4 (M = 3 would give a 399-MHz VCO, below F_VCO_MIN)
M_MAX = (D_MAX * F_VCO_MAX) / F_IN = (7 * 1100)/133 = 57.9, truncated to 57
M_IDEAL = (D_MIN * F_VCO_MAX) / F_IN = (1 * 1100)/133 = 8.27, truncated to 8

With D = 1 and M = 8, F_VCO = 133 * 8/1 = 1064 MHz. An output divider of 4 then gives 266 MHz, and a divider of 16 gives 66.5 MHz.
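The workspace arithmetic can be reproduced with a short script. The Python sketch below assumes the -3 speed grade limits listed in this answer: a maximum phase-detector frequency of 550 MHz and a VCO range of 400 MHz to 1.1 GHz. The function names are hypothetical. The script follows the same procedure as the workspace: take the smallest legal D, then the largest M that keeps the VCO at or below its maximum, then round each output divider to the nearest integer. It is not a Xilinx tool, and it does not compute phase or duty-cycle settings.

```python
import math

# PLL limits from the answer key above (-3 speed grade), all in MHz.
F_PFD_MIN, F_PFD_MAX = 19.0, 550.0
F_VCO_MIN, F_VCO_MAX = 400.0, 1100.0


def pick_m_d(f_in: float) -> tuple:
    """Return (M, D) following the workspace: use the smallest legal D, then the
    largest M that keeps F_VCO = f_in * M / D at or below F_VCO_MAX."""
    d_min = max(1, math.ceil(f_in / F_PFD_MAX))
    d_max = math.floor(f_in / F_PFD_MIN)
    m_min = math.ceil(F_VCO_MIN * d_min / f_in)
    m_ideal = math.floor(d_min * F_VCO_MAX / f_in)
    if d_min > d_max or m_ideal < m_min:
        raise ValueError("no legal M/D pair for this input frequency")
    return m_ideal, d_min


def output_divide(f_vco: float, f_target: float) -> int:
    """Nearest integer CLKOUTn_DIVIDE for a requested output frequency."""
    return max(1, round(f_vco / f_target))


f_in = 133.0
m, d = pick_m_d(f_in)
f_vco = f_in * m / d
print(f"DIVCLK_DIVIDE = {d}, CLKFBOUT_MULT = {m}, F_VCO = {f_vco} MHz")
for target in (266.0, 266.0, 66.0):
    div = output_divide(f_vco, target)
    print(f"target {target} MHz -> divide {div} -> {f_vco / div} MHz")
```

For a 133-MHz input, the script reproduces the answer above: D = 1, M = 8, a 1064-MHz VCO, and output dividers of 4, 4, and 16.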


2) Compare the following resources in the Virtex-5 FPGA to the Virtex-4 FPGA.

Clock region size
Virtex-4 FPGA: 16 CLBs, 32 I/Os (half an I/O bank)
Virtex-5 FPGA: 20 CLBs, 40 I/Os (full I/O bank)

Global clock inputs
Virtex-4 FPGA: up to 32 differential (64 pins) or 32 single-ended (32 pins)
Virtex-5 FPGA: up to 20 differential (40 pins) or 20 single-ended (20 pins)

Global clock buffers
Virtex-4 FPGA: 32
Virtex-5 FPGA: 32

Global clock buffers per region
Virtex-4 FPGA: 8
Virtex-5 FPGA: 10

Clock-capable inputs per region
Virtex-5 FPGA: 4

BUFIO buffers per region
Virtex-5 FPGA: 4

I/O clock nets per region
Virtex-5 FPGA: 4

BUFIO clock region span
Virtex-4 FPGA: 3 regions (1 above and 1 below)
Virtex-5 FPGA: 1 region

BUFR buffers per region
Virtex-4 FPGA: 2
Virtex-5 FPGA: 2

Regional clock nets per region
Virtex-4 FPGA: 2
Virtex-5 FPGA: 4

Regional clock span
Virtex-4 FPGA: 3 regions (1 above and 1 below)
Virtex-5 FPGA: 3 regions (1 above and 1 below)

3) To perform the following, which should you use: the DCM or PLL?
In Order To: Use
Remove clock insertion delay: DCM
Phase shift clocks: DCM
Correct clock duty cycles: DCM
Synthesize F_OUT = F_IN * M/D: DCM or PLL*
Filter clock jitter: PLL
Switch between input clock sources dynamically: PLL
Implement Virtex-4 FPGA PMCD function: PLL

The DCM provides finer phase-shift resolution than the PLL.

* See the Virtex-5 FPGA data sheet to evaluate performance trade-offs between DCM and PLL usage.
Transition to Lab 2: Designing Clock Resources


Lab 2: Designing Clock Resources


Purpose

After completing this lab, you will be able to:

Customize the DCM components by using the Clocking Wizard

Connect the global clock buffers to the DCM outputs by using the Clocking Wizard

Time

40 minutes
Process

This lab illustrates how to build a multiple clock system with the
ISE Architecture Wizard tool.
General Flow

Step 1: Create the DCM_divider_V5 core

Step 2: Create the DCM_divide_and_phase_shift_V5 core

Step 3: Implement the design

Lab
Designing for Performance Lab Workbook

Refer to the separate lab workbook for the Designing Clock Resources lab.

Transition to FPGA Design Techniques


FPGA Design Techniques


Purpose

After completing this module, you will be able to:

Increase design performance by duplicating flip-flops

Increase design performance by adding pipeline stages

Increase board performance by using I/O flip-flops

Build reliable synchronization circuits

Time

40 minutes
Process

This module describes how to build a reliable and fast FPGA design.
Lessons

Introduction

Duplicating Flip-Flops

Pipelining

I/O Flip-Flops

Synchronization Circuits

Summary


Introduction
Show Slide 135:

FPGA Design Techniques

Show Slide 136:

Objectives
After completing this module, you will be able to:

Increase design performance by duplicating flip-flops


Increase design performance by adding pipeline stages
Increase board performance by using I/O flip-flops
Build reliable synchronization circuits

Duplicating Flip-Flops
Show Slide 137:

Lessons


Duplicating Flip-Flops
Pipelining
I/O Flip-Flops
Synchronization Circuits
Summary


Show Slide 138:

Duplicating Flip-Flops

High-fanout nets can be slow and hard to route
Duplicating flip-flops can fix both problems
Reduced fanout shortens net delays
Each flip-flop can fan out to a different physical region of the chip to reduce routing congestion
Design trade-offs
Gain routability and performance
Increase design area
Increase fanout of other nets (for example, the net that drives the duplicated flip-flops)
Show Slide 139:

Duplicating Flip-Flops Example
The source flip-flop drives two register banks that are constrained to different regions of the chip
The source flip-flop and pad are not constrained
PERIOD = 5 ns timing constraint
Implemented with default options
Longest path = 6.806 ns
Fails to meet timing constraint

Key Points

In this simple design, the source flip-flop is trapped between the two sets of loads. Moving the source flip-flop closer to one register bank moves it farther away from the other. As a result, timing cannot be met.
Show Slide 140:

Duplicating Flip-Flops Example
The source flip-flop has been duplicated
Each flip-flop drives a region of the chip
Each flip-flop can be placed closer to the register that it is driving
Shorter routing delays
Longest path = 4.666 ns
Meets timing constraint

Key Points

By duplicating the source flip-flop, the tools are able to move each flip-flop closer to its set of loads.

The trade-off is that the paths from the input pad to the duplicated flip-flops become longer. This design does not contain an OFFSET IN constraint. If you have an OFFSET IN requirement, consider how much slack you have on the OFFSET before deciding to duplicate the flip-flop.

You can also consider duplicating the input pad that is connected to the flip-flops. This method allows the implementation tools to keep the input setup time short while improving the internal clock frequency. The trade-off is that an additional I/O pin is used, and you must route the external signal to two I/O pins.
Show Slide 141:

Tips on Duplicating Flip-Flops

Name duplicated flip-flops _a, _b; NOT _1, _2
Numbered flip-flops are mapped into the same slice by default
Duplicated flip-flops should be separated
Most synthesis tools have automatic fanout-control features
However, they do not always pick the best division of loads
Also, duplicated flip-flops will be named _1, _2
Many synthesis tools will optimize out duplicated flip-flops
Especially if the loads are spread across the chip
Explicitly create duplicate flip-flops in your HDL code
Set your synthesis tool to keep redundant logic
Do not duplicate flip-flops that are sourced by asynchronous signals
Synchronize the signal first
Feed the synchronized signal to multiple flip-flops

Key Points

If duplicated flip-flops are named numerically (for example, signal_rep0 and signal_rep1), the implementation tools treat them as a bus and map the flip-flops into the same slice. If this happens, routing congestion will still be a problem.

You can work around this problem by using the timing-driven packing option, which is covered in the Advanced Implementation Options module.

Synchronize the signal first: Synchronization circuits will be discussed later in this module.
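To see why an asynchronous signal should be synchronized before it is fanned out to duplicated flip-flops, consider two copies that sample the same input through different routing delays. The Python model below is a deliberately simplified sketch. The delay values and function name are hypothetical, and the model ignores setup/hold windows and metastability. It only shows that when the input edge arrives between the two copies' effective sampling times, the copies capture different values on the same clock edge.

```python
def sample(edge_time_ns: float, route_delay_ns: float, clock_edge_ns: float) -> int:
    """Value a flip-flop captures at clock_edge_ns for an input that rises
    from 0 to 1 at edge_time_ns and reaches this flip-flop after route_delay_ns.
    Setup/hold windows and metastability are ignored in this idealized model."""
    arrival = edge_time_ns + route_delay_ns
    return 1 if arrival <= clock_edge_ns else 0


# Hypothetical routing delays from the input pad to two duplicated flip-flops
DELAY_A, DELAY_B = 0.4, 1.6
CLOCK_EDGE = 10.0

for edge in (5.0, 9.0, 9.9):
    a = sample(edge, DELAY_A, CLOCK_EDGE)
    b = sample(edge, DELAY_B, CLOCK_EDGE)
    status = "agree" if a == b else "DISAGREE"
    print(f"input edge at {edge} ns: ff_a={a}, ff_b={b} ({status})")
```

With the edge at 9.0 ns, ff_a captures 1 while ff_b captures 0, so the two halves of the design see different values in the same cycle. A synchronized signal changes only relative to the clock. If timing constraints are met, every duplicated flip-flop then sees the same value.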


Pipelining
Show Slide 142:

Lessons
Duplicating Flip-Flops
Pipelining
I/O Flip-Flops
Synchronization Circuits
Summary


Show Slide 143:

Pipelining Concept
Figure: a path with two logic levels between flip-flops runs at fMAX = n MHz; adding a pipeline register leaves one logic level in each path and raises fMAX to approximately 2n MHz.

Key Points

Inserting flip-flops into a datapath is called pipelining.

Pipelining increases performance by reducing the number of logic levels (LUTs) between flip-flops.

All Xilinx FPGA device families support pipelining. The basic slice structure is a logic level (a LUT) followed by a flip-flop. The LUT is a six-input LUT in the Virtex-5 FPGA and a four-input LUT in the Virtex-4 FPGA and earlier families.

Adding a pipeline stage, as shown in this example, will not exactly double fMAX. The added flip-flop has an input setup time and a clock-to-Q time, so the pipelined circuit runs at less than double the original frequency.

You will see a more detailed example of increasing performance by pipelining later in this lesson.

Show Slide 144:

Pipelining Considerations
Are enough flip-flops available?
Refer to the Synthesis or MAP Report
In general, you will not run out of flip-flops (except for the Virtex-5 FPGA)
Are there multiple logic levels between flip-flops?
If there is only one logic level between flip-flops, pipelining will not improve performance
Refer to the Post-Map Static Timing Report or Post-Place & Route Static Timing Report
Can the system tolerate latency?

Key Points

Available flip-flops: The Design Summary section of the MAP Report contains resource utilization information. In most cases, you will have enough flip-flops available to add pipeline stages.

Logic levels: Timing reports show which paths are the longest and how many logic levels are in each path. Look at the detailed path analysis section of the report and count the number of look-up table delays (Tilo) to determine the number of logic levels in the path.

Show Slide 145:

Latency in Pipelines
Each pipeline stage adds one clock cycle of delay before the first output will be available
Also called filling the pipeline
After the pipeline is filled, a new output is available every clock cycle

Key Points

This is an example of a pipelined circuit. Follow the data as it goes through the multiplication and then the addition.

Latency in pipelines can be visualized as a factory assembly line. Raw materials enter the assembly line and then go through several stations. At each station, one production step is performed. When you start the assembly line, you have to wait before the first finished product is produced. After that waiting period, a constant stream of products comes off the assembly line.
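The fill-then-stream behavior can be seen in a cycle-level Python model of a multiply-then-add pipeline like the one on the slide. This is a behavioral sketch with hypothetical names, not synthesizable HDL. It assumes two pipeline stages: one register after the multiplier and one after the adder.

```python
from typing import List, Optional, Tuple


def run_pipeline(inputs: List[Tuple[int, int, int]]) -> List[Optional[int]]:
    """Cycle-by-cycle model of a two-stage pipeline computing a * b + c.

    Stage 1 registers the product a * b together with a delayed copy of c, so
    that c stays aligned with its product; stage 2 registers the sum.
    Returns the stage-2 output after each clock edge (None = no valid data yet).
    """
    stage1: Optional[Tuple[int, int]] = None  # (product, c) pipeline register
    stage2: Optional[int] = None              # output register
    outputs: List[Optional[int]] = []
    # Two extra idle cycles let the last input drain out of the two stages.
    for item in list(inputs) + [None, None]:
        # All registers update on the same clock edge: stage 2 takes the old
        # stage-1 contents while stage 1 takes the new input.
        stage2 = None if stage1 is None else stage1[0] + stage1[1]
        stage1 = None if item is None else (item[0] * item[1], item[2])
        outputs.append(stage2)
    return outputs


print(run_pipeline([(1, 2, 3), (4, 5, 6), (7, 8, 9)]))  # [None, 5, 26, 65, None]
```

The first result appears after the second clock edge (two stages, two cycles of latency). After that, one result comes out per clock cycle. The register that delays `c` is what keeps each addend aligned with its own product.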

Show Slide 146:

Pipelining Example
Original circuit
Two logic levels between SOURCE_FFS and DEST_FF
fMAX = ~233 MHz
Figure: SOURCE_FFS feed a column of three LUTs, which feed a single LUT that drives DEST_FF.

Key Points

This path has two logic levels between SOURCE_FFS and DEST_FF. The first logic level is the column of three LUTs in parallel. The second logic level is the single LUT on the right.

The delays in this path are:
SOURCE_FFS clock-to-Q
Net delay
LUT delay
Net delay
LUT delay
DEST_FF setup time

Assuming no clock skew and routing delays of approximately 1.5 ns each, this circuit could run at about 233 MHz in a Virtex-5 device (slowest speed grade).
Show Slide 147:

Pipelining Example
Pipelined circuit
One logic level between each set of flip-flops
fMAX = ~385 MHz
Figure: SOURCE_FFS feed three LUTs, which feed PIPE_FFS; PIPE_FFS feed a single LUT that drives DEST_FF.

Key Points

After adding a pipeline stage, the circuit has been split into two paths. The first path is from SOURCE_FFS through one logic level to PIPE_FFS. The second path is from PIPE_FFS through one logic level to DEST_FF.

The delays in either path are:
Starting FF clock-to-Q
Net delay
LUT delay
Ending FF setup time

Estimating each routing delay to be approximately 1.5 ns, as in the original circuit, this circuit could run at about 385 MHz (a 65 percent increase).
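The delay lists for the two pipelining examples can be turned into a rough fMAX estimate. In the Python sketch below, only the 1.5-ns routing estimate comes from the slides. The clock-to-Q, setup, and LUT delays are hypothetical placeholders, chosen so that the totals land near the slides' ~233 MHz and ~385 MHz. Substitute values from the Virtex-5 FPGA data sheet or your timing report before drawing conclusions.

```python
def fmax_mhz(levels: int, t_cko: float, t_net: float, t_lut: float, t_su: float) -> float:
    """Estimate fMAX (MHz) for a register-to-register path, delays in ns.

    Each logic level contributes one net delay followed by one LUT delay,
    matching the delay lists above; clock skew is assumed to be zero.
    """
    period_ns = t_cko + levels * (t_net + t_lut) + t_su
    return 1000.0 / period_ns


# Hypothetical delays (ns); only T_NET = 1.5 ns comes from the slides.
T_CKO, T_NET, T_LUT, T_SU = 0.5, 1.5, 0.2, 0.4

original = fmax_mhz(2, T_CKO, T_NET, T_LUT, T_SU)   # two logic levels
pipelined = fmax_mhz(1, T_CKO, T_NET, T_LUT, T_SU)  # one logic level per path
print(f"original  ~{original:.0f} MHz")
print(f"pipelined ~{pipelined:.0f} MHz ({(pipelined / original - 1) * 100:.0f}% faster)")
```

The clock-to-Q and setup overhead is paid once per path, so the pipelined version falls well short of 2x. This is the point made on the Pipelining Concept slide.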

Show Slide 148:

Apply Your Knowledge

1) Given the original circuit, what is wrong with the pipelined circuit?
2) How can the problem be corrected?

Key Points

This simple circuit is a multiplexed adder in which the SELECT signal determines whether the output is (A + B) or (C + D).

The designer decides to add a pipeline stage to the circuit to improve performance, but something is not quite right. What is the correct way to pipeline this circuit?

Figure: Original Circuit

Figure: Pipelined Circuit

Show Slide 149:

Answers

1) What is wrong with the pipelined circuit?
Latency mismatch
Older data is mixed with newer data
Circuit output is incorrect

2) How can the problem be corrected?
Add a flip-flop on SELECT
All data inputs now experience the same amount of latency
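The latency mismatch can be reproduced with a small cycle-level model. The Python sketch below is a behavioral illustration, not synthesizable HDL. The class and function names are hypothetical, and it assumes the pipeline register sits directly after the two adders, as in the exercise. Without the extra SELECT flip-flop, the mux combines the current SELECT with sums from the previous cycle. With the flip-flop, the output matches the unpipelined result one cycle later.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    a: int
    b: int
    c: int
    d: int
    select: int  # 1 selects A + B, 0 selects C + D


def reference(samples: List[Sample]) -> List[int]:
    """Unpipelined behavior: each result uses one sample's data and SELECT."""
    return [s.a + s.b if s.select else s.c + s.d for s in samples]


def pipelined(samples: List[Sample], register_select: bool) -> List[Optional[int]]:
    """Mux output per cycle when the adder outputs pass through a pipeline
    register; SELECT is optionally delayed by a matching flip-flop.
    None means the pipeline registers do not hold valid data yet."""
    sum_ab_reg: Optional[int] = None
    sum_cd_reg: Optional[int] = None
    select_reg: Optional[int] = None
    out: List[Optional[int]] = []
    for s in samples:
        # Combinational mux during this cycle, fed by the pipeline registers.
        sel = select_reg if register_select else s.select
        if sum_ab_reg is None or sel is None:
            out.append(None)
        else:
            out.append(sum_ab_reg if sel else sum_cd_reg)
        # Clock edge: capture this cycle's sums (and SELECT, for the fix).
        sum_ab_reg, sum_cd_reg = s.a + s.b, s.c + s.d
        select_reg = s.select
    return out


samples = [Sample(1, 1, 10, 10, 1), Sample(2, 2, 20, 20, 0), Sample(3, 3, 30, 30, 1)]
print(reference(samples))                         # [2, 40, 6]
print(pipelined(samples, register_select=False))  # [None, 20, 4]: old and new data mixed
print(pipelined(samples, register_select=True))   # [None, 2, 40]: reference, one cycle late
```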

Figure: Answer (pipelined circuit with a flip-flop added on SELECT)

I/O Flip-Flops
Show Slide 150:

Lessons


Duplicating Flip-Flops
Pipelining
I/O Flip-Flops
Synchronization Circuits
Summary


Show Slide 151:

I/O Flip-Flop Overview

Each IOB tile in the Virtex-5 FPGA contains flip-flops
Located in the ILOGIC and OLOGIC blocks
Single data rate or double data rate support
SERDES support
I/O flip-flops provide guaranteed setup, hold, and clock-to-out times when the clock signal comes from a BUFG

Key Points

Spartan-3 FPGA I/O blocks contain two registers on the input, output, and output 3-state enable to support single and double data rate.

Show Slide 152:

Accessing I/O Flip-Flops
During synthesis
Timing-driven synthesis can force flip-flops into Input/Output Blocks (IOBs)
Some tools support attributes or synthesis directives to mark flip-flops for placement in an IOB
Xilinx Constraints Editor
Select the Misc tab and specify registers that should be placed into IOBs
You need to know the instance name for each register
During the MAP phase of implementation
In the Map Properties dialog box, the Pack I/O Registers/Latches into IOBs option is selected by default
Timing-driven packing will also move registers into IOBs for critical paths
Check the MAP Report to confirm that IOB flip-flops have been used
IOB Properties section

Key Points

Refer to your synthesis tool documentation for details on how to access IOB flip-flops.

The Constraints Editor Misc tab is covered in the Path-Specific Timing Constraints module.

Timing-driven packing is covered in the Advanced Implementation Options module.

The ChipSync Wizard also accesses I/O flip-flops.

The following I/O flip-flop resources must be instantiated:
IDDR flip-flops using SAME_EDGE or SAME_EDGE_PIPELINED mode
ODDR flip-flops
ISERDES and OSERDES components


Synchronization Circuits
Show Slide 153:

Lessons


Duplicating Flip-Flops
Pipelining
I/O Flip-Flops
Synchronization Circuits
Summary


Show Slide 154:

Synchronization Circuits

What is a synchronization circuit?
Captures an asynchronous input signal and outputs it on a clock edge
Why do you need synchronization circuits?
To prevent setup and hold time violations
To ensure a more reliable design
When do you need synchronization circuits?
When signals cross between unrelated clock domains
For chip inputs that are asynchronous
Between related clock domains, relative PERIOD constraints are sufficient

Key Points

For clock domains that have a clearly defined and constant phase relationship, proper timing constraints ensure that there are no setup or hold time violations. For more information on constraining paths between clock domains, refer to the Path-Specific Timing Constraints module.

Show Slide 155:

Setup and Hold Time Violations
Violations occur when the flip-flop input changes too close to a clock edge
Three possible results
Flip-flop clocks in an old data value
Flip-flop clocks in a new data value
Flip-flop output becomes metastable
Show Slide 156:

Metastability

Flip-flop output enters a transitory state
Neither a valid 0 nor a valid 1
Can be interpreted as 0 by some loads and as 1 by others
Remains in this state for an unpredictable length of time before settling to a valid 0 or 1
Because of their statistical nature, metastable events can only be made less likely, not eliminated
Mean Time Between Failures (MTBF) is exponentially related to the length of time the flip-flop is given to recover
A few extra ns of recovery time can dramatically reduce the chances of a metastable event
The circuits shown in this section allow maximum time for metastable recovery

Key Points

When a signal is at a metastable value, some parts of the circuit can interpret it as a logic 0 and other parts as a logic 1. This inconsistency never appears in simulation and can be very hard to track down during board-level testing.

When the flip-flop leaves the metastable state, it can go to either a logic 0 or 1. There is no known correct state when the flip-flop input changes so close to the clock edge. Still, having an incorrect value propagating through your circuit is better than having a metastable value.

<recovery time> = <time before the data is used> - <datapath delay>

Example: If CLK1 has a period of 50 ns, and the datapath delay is 45 ns, then the recovery time of FF1 is 50 - 45 = 5 ns.
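The recovery-time formula and the exponential MTBF relationship can be explored numerically. The Python sketch below uses the commonly cited synchronizer model MTBF = e^(t_rec / tau) / (T0 * f_clk * f_data). That model is not stated in this module. The tau, T0, and data-rate values are hypothetical placeholders, not Virtex-5 characterization data, so only the trend is meaningful, not the absolute MTBF.

```python
import math


def recovery_time_ns(time_before_use_ns: float, datapath_delay_ns: float) -> float:
    """<recovery time> = <time before the data is used> - <datapath delay>."""
    return time_before_use_ns - datapath_delay_ns


def mtbf_seconds(t_rec_ns: float, f_clk_hz: float, f_data_hz: float,
                 tau_ns: float, t0_s: float) -> float:
    """Commonly used model: MTBF = e^(t_rec / tau) / (T0 * f_clk * f_data).

    tau and T0 are device-specific constants; the values used below are
    hypothetical placeholders chosen only to show the exponential trend.
    """
    return math.exp(t_rec_ns / tau_ns) / (t0_s * f_clk_hz * f_data_hz)


# Worked example from the text: 50-ns period, 45-ns datapath delay -> 5 ns
t_rec = recovery_time_ns(50.0, 45.0)
print(f"recovery time = {t_rec} ns")

# Hypothetical constants
TAU_NS, T0_S = 0.2, 1e-9
F_CLK, F_DATA = 20e6, 1e6  # 20-MHz clock (50-ns period), 1-MHz data toggle rate
for extra_ns in (0.0, 1.0, 2.0):
    mtbf = mtbf_seconds(t_rec + extra_ns, F_CLK, F_DATA, TAU_NS, T0_S)
    print(f"recovery {t_rec + extra_ns:.0f} ns -> MTBF ~ {mtbf:.3g} s")
```

With these placeholder constants, each extra nanosecond of recovery time multiplies the MTBF by e^5 (about 148). This illustrates why a few extra ns of recovery time matter so much.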

Show Slide 157:

Synchronization Circuit 1
Use when input pulses will always be at least one clock period wide
The extra flip-flops guard against metastability
Figure: the asynchronous input drives FF1, and FF1 drives FF2, which outputs the synchronized signal; both flip-flops are clocked by CLK, and a label notes that the extra flip-flop guards against metastability.

Key Points

This circuit is a simple 2-bit shift register.

The recovery time for FF1 is: <CLK period> - <datapath delay>

<datapath delay> = <FF1 CLK-to-Q> + <net delay> + <FF2 setup>

If the flip-flops are placed in the same slice, the net will use a fast-feedback routing connection to give FF1 the maximum possible recovery time.
Show Slide 158:

Synchronization Circuit 2

Use when input pulses may be less than one clock period wide

[Figure: FF1, with D tied to VCC and an asynchronous CLR, is clocked by the asynchronous input and captures short pulses. FF2 and FF3, clocked by CLK, guard against metastability and produce the synchronized signal.]

Key Points

To obtain this circuit, add a flip-flop clocked by the asynchronous input, plus an AND gate, to the front of Synchronization Circuit 1.

FF1 is a flip-flop with asynchronous clear.

The AND gate prevents FF1 from being reset if the input to the
circuit is still HIGH. This allows for long input pulses as well as
short ones. If multiple short pulses occur on the input within a
space of three clock cycles, only the first pulse will be seen by
this circuit. This is always a danger when passing data from a
fast clock domain into a slower clock domain.

FF2 and FF3 act in the same way as FF1 and FF2 in
Synchronization Circuit 1.

To avoid using this circuit, which uses the input as a clock signal (probably not on a global buffer), you can use a faster clock signal to synchronize inputs (allowing you to use Synchronization Circuit 1). You can then use a divided version of the same clock signal in the rest of the design.
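To see how FF1 catches a pulse that falls between CLK edges, the circuit can be simulated on a time grid finer than the clock. This Python sketch is a behavioral model only, and it assumes, from the description of the AND gate above, that FF1's clear is driven by FF3's output ANDed with the inverted asynchronous input; check that against the slide before relying on it. The function and variable names are hypothetical.

```python
def pulse_synchronizer(inp, period):
    """Time-stepped model of Synchronization Circuit 2.

    inp[t] is the asynchronous input at fine time step t; CLK rises every
    `period` steps (t = 0, period, 2*period, ...). Returns FF3's output after
    each CLK edge. Assumes FF1 is cleared when FF3 is HIGH and the input is LOW.
    """
    ff1 = ff2 = ff3 = 0
    prev = 0
    out = []
    for t, x in enumerate(inp):
        if x and not prev:
            ff1 = 1  # FF1 is clocked by the input's rising edge; D is tied to VCC
        prev = x
        if t % period == 0:
            ff2, ff3 = ff1, ff2  # FF2 and FF3 act like Synchronization Circuit 1
            out.append(ff3)
        if ff3 and not x:
            ff1 = 0  # asynchronous clear, blocked while the input is still HIGH
    return out


short = [1 if t == 3 else 0 for t in range(100)]       # pulse narrower than one CLK period
long_ = [1 if 3 <= t < 60 else 0 for t in range(100)]  # pulse several periods wide
print(pulse_synchronizer(short, 10))  # [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
print(pulse_synchronizer(long_, 10))  # [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
```

A plain two-flip-flop synchronizer sampling the short pulse at t = 0, 10, 20, ... would see only zeros, while this circuit turns it into a two-cycle synchronized output. The long pulse shows the AND gate holding off the clear until the input goes LOW.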

Key Points

Because the faster clock and its divided version have a fixed-phase relationship, you do not need to resynchronize signals as they cross between the two clock domains; however, you do need timing constraints to prevent setup or hold violations.

Use the CLK2X output of a DLL or the CLKFX output of a DCM to get a faster clock signal for synchronizing inputs.
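The setup budget that a timing constraint must check between two related clocks can be found by listing their edges over one common period. The Python sketch below is a simplified model under stated assumptions: integer periods in picoseconds, rising edges only, and no jitter, duty-cycle, or skew effects. The function name is hypothetical. It covers the cases mentioned in this module: a clock and its CLK2X, and two same-frequency clocks 180 degrees out of phase.

```python
from math import gcd


def min_setup_window(launch_period, capture_period, capture_offset=0):
    """Tightest launch-to-capture edge separation between two related clocks.

    Periods and offset are integers in the same unit (ps here). Both clocks
    have a rising edge at t = 0, except that the capture clock is shifted by
    capture_offset (0 <= capture_offset < capture_period).
    """
    common = launch_period * capture_period // gcd(launch_period, capture_period)
    captures = [capture_offset + k * capture_period
                for k in range(common // capture_period + 2)]
    worst = None
    for launch in range(0, common, launch_period):
        # first capture edge strictly after this launch edge
        nxt = min(c for c in captures if c > launch)
        sep = nxt - launch
        worst = sep if worst is None else min(worst, sep)
    return worst


print(min_setup_window(10_000, 5_000))          # CLK to CLK2X: 5000
print(min_setup_window(10_000, 10_000, 5_000))  # same frequency, 180 degrees: 5000
print(min_setup_window(10_000, 10_000))         # same clock: 10000
```

In the first two cases the path gets only half of the slower clock's period, which is why timing constraints are still required even when no resynchronization circuit is needed.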

Show Slide 159:

Capturing a Bus

Leading edge detector

Input pulses must be at least one CLK period wide

[Figure: The n-bit bus is registered by the asynchronous input clock. FF1 and FF2, clocked by CLK, sample that clock and generate the one-shot enable that drives the CE of Sync_Reg, which registers the bus to produce the synchronized bus inputs.]

Key Points


First, the data bus is registered by the asynchronous clock.


Second, the one-shot enable (synchronized to CLK) signals to
the internal circuit that data is captured (via CE).

Note: Now there is a level of logic between the one-shot enable generator and the synchronization register. If FF1 becomes metastable, it has less time to recover before its output is used to enable Sync_Reg.

Recovery time = <CLK period> - <datapath delay>

<datapath delay> = <FF1 CLK-to-Q> + net delay + <LUT delay> + <Sync_Reg setup>
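The capture sequence can be modeled cycle by cycle in Python. This sketch assumes the AND gate inverts FF2's output (so the one-shot is FF1 AND NOT FF2) and that the bus register clocked by the asynchronous clock has already settled when Sync_Reg loads; both are modeling assumptions, and the function name is hypothetical.

```python
def capture_bus(strobe_at_clk, bus_at_clk):
    """Cycle-level model of bus capture with a leading-edge detector.

    strobe_at_clk[i]: the asynchronous input clock as sampled at the i-th CLK edge.
    bus_at_clk[i]: the value held in the asynchronously clocked bus register
    at the i-th CLK edge.
    Returns Sync_Reg's contents after each CLK edge (None until the first load).
    """
    ff1 = ff2 = 0
    sync_reg = None
    history = []
    for strobe, bus in zip(strobe_at_clk, bus_at_clk):
        one_shot = ff1 and not ff2  # AND gate output during the cycle before this edge
        if one_shot:
            sync_reg = bus          # CE of Sync_Reg is asserted for exactly one cycle
        ff1, ff2 = strobe, ff1      # FF1 and FF2 update on the same CLK edge
        history.append(sync_reg)
    return history


strobe = [0, 1, 1, 1, 0, 0, 1, 1, 0]
bus = [10, 10, 10, 10, 10, 20, 20, 20, 20]
print(capture_bus(strobe, bus))  # [None, None, 10, 10, 10, 10, 10, 20, 20]
```

Sync_Reg loads once per strobe pulse, one edge after FF1 first captures the strobe HIGH, however many cycles the strobe stays HIGH.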
Key Points

A falling-edge detector can be designed in a couple of different ways:

1. Invert the asynchronous input clock.

2. Move the inversion bubble on the AND gate from the bottom input to the top input.
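Both detector flavors come down to which AND input carries the inversion. A minimal Python sketch, assuming the rising-edge detector computes FF1 AND NOT FF2, so that moving the bubble to the other input gives FF2 AND NOT FF1; the function name is hypothetical.

```python
def one_shot(samples, edge="rising"):
    """One-shot enables for a two-flip-flop edge detector.

    samples[i] is the input as captured by FF1 at the i-th CLK edge.
    enables[i] is the AND-gate output just before the i-th edge.
    """
    if edge not in ("rising", "falling"):
        raise ValueError("edge must be 'rising' or 'falling'")
    ff1 = ff2 = 0
    enables = []
    for s in samples:
        if edge == "rising":
            enables.append(bool(ff1 and not ff2))  # inversion on FF2's output
        else:
            enables.append(bool(ff2 and not ff1))  # inversion moved to FF1's output
        ff1, ff2 = s, ff1
    return enables


print(one_shot([0, 1, 1, 0, 0], "rising"))   # [False, False, True, False, False]
print(one_shot([0, 1, 1, 0, 0], "falling"))  # [False, False, False, False, True]
```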

Show Slide 160:

Capturing a Bus

Leading edge detector

Input pulses may be less than one CLK period wide


[Figure: FF1, with D tied to VCC and an asynchronous CLR, is clocked by the asynchronous input clock. FF2 and FF3, clocked by CLK, generate the one-shot enable that drives the CE of Sync_Reg, which registers the n-bit bus to produce the synchronized bus inputs.]

Key Points

First, the data bus is registered by the asynchronous clock.


Second, the one-shot enable (synchronized to CLK) signals to
the internal circuit that data is captured (via CE).

Note: Now there is a level of logic between the one-shot enable generator and the synchronization register. If FF2 becomes metastable, it has less time to recover before its output is used to enable Sync_Reg.

Recovery time = <CLK period> - <datapath delay>

<datapath delay> = <FF2 CLK-to-Q> + net delay + <LUT delay> + <Sync_Reg setup>

Key Points

A falling-edge detector can be designed in a couple of different ways:

1. Invert the asynchronous input clock.

2. Move the bubble on the one-shot enable AND gate from the bottom input to the top input, remove the bubble from the reset AND gate, and change VCC to GND.

Show Slide 161:

Synchronization Circuit 3

Use a FIFO to cross domains

Key Points

You must still synchronize the FIFO status flags (FULL, ALMOST_FULL, EMPTY, ALMOST_EMPTY), which are based on read and write operations synchronized to the read and write clocks, respectively. In general, this can be done by synchronizing the slower enable (read or write) to the faster clock with Synchronization Circuit 1. If synchronizing to the slower clock, use Synchronization Circuit 2.


Summary
Show Slide 162:

Lessons

Duplicating Flip-Flops
Pipelining
I/O Flip-Flops
Synchronization Circuits
Summary

Show Slide 163:

Apply Your Knowledge

3) High fanout is one reason to duplicate a flip-flop. What is another reason?

4) Provide an example of when you do not need to resynchronize a signal that crosses between clock domains.

5) What is the purpose of the extra flip-flop in the synchronization circuits shown in this module?

Show Slide 164:

Summary

You can increase circuit performance by
Duplicating flip-flops
Adding pipeline stages
Using I/O flip-flops

Some trade-offs
Duplicating flip-flops increases circuit area
Pipelining introduces latency and increases circuit area

Synchronization circuits increase reliability


Where Can I Learn More?


User Guides: www.xilinx.com > Documentation > Doc Type > User Guides
Switching Characteristics
Detailed Functional Description: Input/Output Blocks (IOBs)

Application notes: www.xilinx.com > Documentation > Doc Type > Application Notes
Application Note XAPP094: Metastability Recovery
Application Note XAPP225: Data-to-Clock Phase Alignment


Apply Your Knowledge Answers


Answers

1) What is wrong with the pipelined circuit?

Latency mismatch

Older data is mixed with newer data

Circuit output is incorrect

2) How can the problem be corrected?

Add a flip-flop on SELECT

All data inputs now experience the same amount of latency

3) High fanout is one reason to duplicate a flip-flop. What is another reason?

Loads are divided among multiple locations on the chip

4) Provide an example of when you do not need to resynchronize a signal that crosses between clock domains.

Well-defined phase relationship between the clocks

Example: Clocks are the same frequency, 180 degrees out of phase

Use related PERIOD constraints to ensure that datapaths will meet timing

5) What is the purpose of the extra flip-flop in the synchronization circuits shown in this module?

To allow the first flip-flop time to recover from metastability

Transition to Synthesis Techniques


Synthesis Techniques
Purpose

After completing this module, you will be able to:


Specify Xilinx resources that need to be instantiated for various FPGA synthesis tools

Identify synthesis tool options that can be used to increase performance

Describe an approach to using your synthesis tool to obtain higher performance

Time

40 minutes
Process

This module describes how to synthesize a fast and efficient FPGA design by using the advanced capabilities of the synthesis tools.
Lessons


Introduction

Achieving Breakthrough Performance

Synthesis Options

XST Synthesis Options

Summary


Introduction
Show Slide 165:

Synthesis Techniques

Show Slide 166:

Objectives
After completing this module, you will be able to:

Specify Xilinx resources that need to be instantiated for various FPGA synthesis tools
Identify synthesis tool options that can be used to increase performance
Describe an approach to using your synthesis tool to obtain higher
performance

Show Slide 167:

Recommended REL Modules

Three recorded e-Learning modules are available for you to improve your
HDL coding style

Basic HDL Coding Techniques

Design guidelines (good design practices)


Best ways to pipeline your design
Finite State Machine design

Spartan-3 FPGA HDL Coding Techniques

Coding for hardware resources


SRL, multiplexers, carry logic
Coding to reduce your design size
Managing your control signals (sets, resets, clocks, clock enables)
Block RAM


Show Slide 168:

Recommended REL Modules

Three recorded e-Learning modules are available for you to improve your
HDL coding style

Virtex-5 FPGA HDL Coding Techniques

Managing your control sets


Control signal recommendations
How to build a fast and efficient Virtex-5 FPGA design
How to migrate an older design to a Virtex-5 FPGA

All of these RELs are available at no charge at www.xilinx.com/support/training/free-courses.htm

Show Slide 169:

Timing Closure

[Figure: Timing closure flow chart]

Achieving Breakthrough Performance


Show Slide 170:

Lessons

Achieving Breakthrough Performance
Synthesis Options
XST Synthesis Options
Summary

Show Slide 171:

Breakthrough Performance

Three steps to achieve breakthrough performance

1. Utilize embedded (dedicated) resources
Performance by construction
DSP48, FIFO, block RAM, ISERDES, OSERDES, PowerPC processor, EMAC, and MGT, for example

2. Write code for performance
Use synchronous design methodology
Ensure the code is written optimally for critical paths
Pipeline
Xilinx FPGAs have abundant registers: one register per LUT

3. Drive your synthesis and Place & Route tools
Try different optimization techniques
Add critical timing constraints in synthesis
Preserve hierarchy
Apply full and correct constraints
Use High effort

[Figure: Virtex-4 FPGA performance meter]


Key Points
Applying full and correct constraints refers to applying constraints for all clocks in the design. Additionally, false paths and multicycle paths should be correctly constrained, as should the I/O.

The timing closure flow chart was created to help achieve breakthrough performance.

Show Slide 172:

500-MHz Fabric Guidelines

[Figure: two LUT-to-flip-flop stages (LUT inputs I0-I3; flip-flop pins SET, CE, D, Q, RST), labeled "One Level of Logic Only"]

For the fabric to achieve maximum performance, note the following important considerations:

1. Do not use more than one level of logic between registers. That is why the registers are there.
2. Carry chains should not exceed 14* before being registered
3. You may need placement constraints to keep functions together



Show Slide 173:

Use Embedded Blocks

Embedded block timing is correct by construction
Not dependent on programmable routing
Offers as much as 3x the performance of soft implementations

Examples
FIFO at 500 MHz
DSP slices at 500 MHz
PowerPC processor at up to 550 MHz

[Figure: XtremeDSP Solution slice, Smart RAM FIFO, and PowerPC processor]

Show Slide 174:

Simple Coding Steps Yield 3x Performance

Use pipeline stages: more bandwidth
Use synchronous reset: better system control
Use Finite State Machine (FSM) optimizations
Use inferable resources
Multiplexer
Shift Register LUT (SRL)
Block RAM, LUT RAM
Cascade DSP

Avoid high-level constructs (loops, for example) in code
Many synthesis tools produce slow implementations

See the Synthesis and Simulation Design Guide: Help > Software Manuals > Synthesis and Simulation Design Guide
Key Points

These are just the most obvious suggestions. For every design,
there may be more tricks or other clever things that can
improve performance.

Pipelining helps the most, and for most systems today pipelining is always an option, because bandwidth, not latency, is what defines the system. Latency can be important, but when it is, the latency that matters is usually of a different order of magnitude than the latency that pipelining introduces.

FPGAs have lots of registers, so retiming and clever use of arithmetic functions can yield tremendous performance. If designers need to balance the latency among different paths in the system, SRLs can compensate efficiently for delay differences.
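The latency-balancing idea can be shown with a behavioral model of an N-stage shift register, which is the function an SRL provides. This Python sketch is illustrative only: the class name is hypothetical, and it models behavior, not how synthesis maps the chain onto an SRL.

```python
from collections import deque


class DelayLine:
    """Behavioral model of an N-stage shift register (the function an SRL implements).

    Each clock() call shifts in one value and returns the value shifted in
    N clocks earlier (0 until the chain fills).
    """

    def __init__(self, stages):
        if stages < 1:
            raise ValueError("stages must be at least 1")
        self.regs = deque([0] * stages, maxlen=stages)

    def clock(self, value):
        out = self.regs[-1]          # oldest value leaves the chain
        self.regs.appendleft(value)  # maxlen drops the oldest entry automatically
        return out


# Hypothetical use: delay a control signal by the 3 stages its datapath takes.
ctrl = DelayLine(3)
print([ctrl.clock(v) for v in [1, 2, 3, 4, 5, 6]])  # [0, 0, 0, 1, 2, 3]
```

Delaying a control signal, such as a multiplexer select, by the same number of stages as its datapath keeps older data from being mixed with newer data.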

Show Slide 175:

Synthesis Guidelines

Use timing constraints

Define tight but realistic individual clock constraints
Put unrelated clocks into different clock groups

Use these synthesis options to start (they don't always work best on every design)

Turn off resource sharing
Move flip-flops from IOBs closer to logic
Turn on FSM optimization
Use the retiming option

Key Points

Resource sharing is a technique used by synthesis tools to decrease circuit area, usually resulting in lower performance.

Key Points

The decision to move flip-flops into and out of IOBs can also be made by the MAP process during implementation, if timing-driven packing is used. This option is discussed in the Advanced Implementation Options module at the end of this course.

Show Slide 176:

Synplicity Example

Use constraints
Synplify and Synplify Pro software
stop optimizing when the constraints
are met
Use SCOPE to enter all timing constraints
Define real, individual clock
constraints
If the clocks are unrelated, always
put them into different clock groups
Using the global frequency field can
deteriorate results

[Graph: estimated (green) and actual (red) performance. (*) Synplicity's data]

Key Points

As shown on the graph on the right, the green line (estimated performance) and the red line (actual performance) are very close to each other, which means that the synthesis tools do a fairly good job estimating the performance. The important thing to note is that the performance increases significantly when the right set of constraints is used (in this example, ~55 percent of the maximum circuit performance). Keep in mind that these are only the synthesis constraints. There is no change in the code.

Key Points

The XST and Precision software tools produce similar results when clock constraints are applied. For XST software, period and input/output delays can be specified via the XST Constraints File (XCF). For Precision software, this information can be specified individually for each clock via the Design Hierarchy window, by right-clicking the Clocks folder and selecting Set Clock Constraints.

For example, in a design that has two clocks (one a low-frequency clock and the other a high-frequency clock), the logic for each domain is optimized to meet its own constraint. For the low-frequency clock, the logic can be optimized for area, saving resources, while the logic in the high-frequency clock domain can grow in area to meet its constraint. It is very important that this information is provided to the XST, Synplify, or Precision software, as all are constraint-driven tools.

Show Slide 177:

Impact of Constraints

Non-timing-constrained designs can be optimized for area rather than performance

[Figure: LUT networks for the two implementations below]

Non-Timing Driven: Total LUTs: 5; Clock Freq: 423.7 MHz
Timing Driven (Bigger but Faster!!!): Total LUTs: 6; Clock Freq: 591.7 MHz (+40%)

Key Points
This example shows what happens when constraints are used properly. If there is no performance requirement, the tools generate a design that is as small as possible. The same is true when there are several solutions that all meet the requirements; the smallest implementation will be used.
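The slide's numbers are easier to read as clock periods. This short Python calculation uses only the values reported on the slide; the helper names are hypothetical.

```python
def period_ns(freq_mhz):
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100.0


area_mhz, timing_mhz = 423.7, 591.7  # values reported on the slide
print(round(period_ns(area_mhz), 2), round(period_ns(timing_mhz), 2))  # 2.36 1.69
print(round(percent_gain(timing_mhz, area_mhz)))  # 40
print(round(percent_gain(6, 5)))                  # 20 (LUT count, 5 to 6)
```

In other words, 20 percent more LUTs buys roughly 40 percent more clock frequency, a period reduction from about 2.36 ns to about 1.69 ns.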

Show Slide 178:

Place & Route Guidelines

Timing constraints
Use tight, realistic constraints

Recommended options
High-effort Place & Route (by default, effort is set to Standard)
Timing-driven MAP
Xplorer

Tools to help meet timing
Floorplanning (use the PACE and PlanAhead software tools)
Physical synthesis tools

Other available options
Incremental design
Modular design flows

Using the correct Place & Route options can have a dramatic impact on design performance


Show Slide 179:

Impact of Constraints in Tools

[Chart: relative performance of a Reed-Solomon design from www.opencores.org; the bars show values of 1.0, 1.4, 1.6, and 2.1 across the following tool flows]
No constraints; Standard effort
No constraints in synthesis; Place & Route with High effort and constraints
Constraints in synthesis and Place & Route (High effort)
Constraints in synthesis and Place & Route; retiming in synthesis; High effort in PAR


Synthesis Options
Show Slide 180:

Lessons

Achieving Breakthrough Performance


Synthesis Options
XST Synthesis Options
Summary


Show Slide 181:

Synthesis Options

There are many synthesis options that can help you obtain your
performance and area objectives

Timing-driven synthesis
FSM extraction
Retiming
Register duplication
Hierarchy management
Resource sharing
Physical optimization

Show Slide 182:

Timing-Driven Synthesis

Synplify, Precision, and XST software


Timing-driven synthesis uses performance objectives to drive the
optimization of the design

Based on your performance objectives, the tools will try several algorithms
to attempt to meet performance while keeping the amount of resources in
mind
Performance objectives are provided to the synthesis tool via timing
constraints

Key Points

Synplify software: Communicate constraints via SCOPE.

Precision software: Communicate constraints by entering them in a constraint file (SDC file) or by entering them individually for each clock from the hierarchy window.

XST software: Communicate constraints via the XCF. For more information, see the XST User Guide in the online software documents (Help > Software Manuals > XST User Guide).

Show Slide 183:

Timing Constraints Editor

Synplify and Precision software


The timing constraints editor allows you to apply timing constraints for
your tool

These constraints will be used to drive synthesis optimization (for those tools that use constraint-driven synthesis)
These constraints will also be passed (by default) on to the Xilinx implementation tools via a Netlist Constraints File (NCF)

XST constraints

Communicated via the XCF

See the XST User Guide (Help > Software Manuals > XST User Guide > XST Design Constraints)

Key Points

XST constraints are entered into a text file. For more information on timing and non-timing XST constraints, see the XST User Guide.

Show Slide 184:

FSM Extraction

Synplify, Precision, and XST software


Finite State Machine (FSM) extraction optimizes your state machine by
re-encoding and optimizing your design based on the number of states
and inputs

By default, the tools will use FSM extraction

Safe state machines

By default, the synthesis tools will remove all decoding for illegal states, even if you include VHDL "when others" or Verilog "default" cases

Safe FSM implementation must be turned on explicitly

See Notes for more information

Key Points

For more information on the specifics of how your synthesis tool will re-encode your FSM, see the user guide provided by each vendor.

To change FSM settings when the tool is run standalone:

Precision: Tools > Set Options, check Use Safe FSM
Synplify: In SCOPE, specify syn_encoding <state_registers> = safe

To change FSM settings from the ISE software:

XST: Synthesize - XST > HDL Options: Safe Implementation = Yes
Synplify: Set syn_encoding = safe in the SCOPE constraint file, and include the file from the Synthesis Properties menu
Precision: Synthesis Properties > Input Options: Use Safe FSM = checked

Show Slide 185:

Retiming

Synplify, Precision, and XST software


Retiming: The synthesis tool automatically tries to move register stages to
balance combinatorial delay on each side of the registers
[Figure: the same circuit before retiming and after retiming]

Key Points

To access retiming:
Synplify software: Enable under Implementation Options, or use the Retiming option in the Run window in the Synplify Pro software (Synplify > Options > Configure VHDL or Verilog Compiler).
Precision software: Check the box in the Setup Design dialog box.
XST: Enable under the Properties dialog box for Synthesize - XST > Xilinx Specific Options > Register Balancing.

Retiming results are design dependent. In some situations (for example, highly pipelined designs), retiming may not provide any benefit; however, it may improve performance for other designs.
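Retiming changes where registers sit, not what the circuit computes. The Python sketch below checks that for a hypothetical two-stage pipeline built from four made-up combinational functions f, g, h, and k: before retiming the stages hold 1 and 3 levels of logic, and after moving the register forward across g they hold 2 and 2. It checks functional equivalence and latency only; the timing gain is not modeled, and real tools must also handle register initial values, which here are simply empty (None).

```python
def run_pipeline(stages, inputs):
    """Clock a pipeline of (combinational logic -> register) stages.

    stages[i] is the logic feeding register i; registers start empty (None).
    Returns the last register's value after each clock edge.
    """
    regs = [None] * len(stages)
    outputs = []
    for x in inputs:
        # every register updates on the same edge from the previous register (or the input)
        srcs = [x] + regs[:-1]
        regs = [None if s is None else logic(s) for logic, s in zip(stages, srcs)]
        outputs.append(regs[-1])
    return outputs


# Hypothetical combinational functions, each standing in for one level of logic.
def f(v):
    return v + 3

def g(v):
    return v * 5

def h(v):
    return v ^ 42

def k(v):
    return v - 1


before = [f, lambda v: k(h(g(v)))]              # 1 level | 3 levels
after = [lambda v: g(f(v)), lambda v: k(h(v))]  # register moved across g: 2 | 2

data = [1, 2, 3, 4]
print(run_pipeline(before, data))  # [None, 61, 50, 51]
print(run_pipeline(after, data))   # [None, 61, 50, 51]
```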

Show Slide 186:

Register Duplication

Synplify, Precision, and XST software


Register duplication is used to reduce fanout on registers (to improve
delays)

Xilinx recommends manual register duplication

Most synthesis vendors create signals <signal_name>_rep0, _rep1, etc.

Implementation tools pack logic with related names into the same slice, which
can prohibit a register from being moved closer to its destination

When manually duplicating registers, do not use a number at the end of the name

Example: <signal_name>_0dup, <signal_name>_1dup

Use synthesis options to prevent duplicate registers from being re-merged

Key Points

Register duplication of the output 3-state register is used so that the IOB 3-state register can be moved inside the IOB (to reduce clk-to-output delays). Note that for the 3-state register to be placed in the IOB, its fanout must be one.
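The naming rule for manual duplicates and the fanout split can be sketched together. This Python helper is hypothetical: it splits the load list in order, whereas a real design would group loads by their physical location on the chip, and it does not cover the synthesis options that keep duplicates from being re-merged.

```python
def duplicate_register(signal_name, loads, max_fanout):
    """Split a register's loads across manually duplicated copies.

    Copy names follow the <signal_name>_<n>dup pattern, so no name ends in a
    number. Returns {copy_name: [loads driven by that copy]}.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    groups = {}
    for i in range(0, len(loads), max_fanout):
        name = f"{signal_name}_{i // max_fanout}dup"
        groups[name] = loads[i:i + max_fanout]
    return groups


loads = [f"u{n}" for n in range(5)]  # hypothetical load instance names
print(duplicate_register("en", loads, 2))
# {'en_0dup': ['u0', 'u1'], 'en_1dup': ['u2', 'u3'], 'en_2dup': ['u4']}
```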

Show Slide 187:

Hierarchy Management

Synplify, Precision, and XST software


The basic settings are

Flatten the design: Allows total combinatorial optimization across all boundaries (XST default)
Maintain hierarchy: Preserves hierarchy without allowing optimization of combinatorial logic across boundaries (Xilinx recommended)

If you have followed the synchronous design guidelines, use the Maintain hierarchy setting
If you have not followed the synchronous design guidelines, use the Flatten the design setting
Your synthesis tool may have additional settings

Refer to your synthesis documentation for details on these settings

Key Points

To access hierarchy control:

Synplify software: SCOPE Constraints Editor
Synplify also has an additional setting: Maintain hierarchy but allow optimization. This setting allows combinatorial logic to be optimized while maintaining hierarchy in the netlist (in Synplify, this setting is called firm).
Precision software: After compiling the design, right-click Modules in the Design Hierarchy window and select Preserve Hierarchy or Flatten Hierarchy.
XST: Synthesize - XST > Synthesis Options > Keep Hierarchy. Note that the default is NO.

Show Slide 188:

Hierarchy Preservation Benefits

Easily locate problems in the code based on the hierarchical instance names contained within static timing analysis reports
Enables floorplanning and incremental design flow
The primary advantage of flattening is to optimize combinatorial logic across hierarchical boundaries

If the outputs of leaf-level blocks are registered, there is generally no need to flatten

However, preserving hierarchy can limit register retiming (balancing) and register duplication

Key Points

Registering the outputs of each leaf-level block is part of the synchronous design methodology. Registering the output boundaries helps because you know the delays from one block to the next; that is, the delays do not vary with combinatorial outputs. Logic cannot be optimized across a registered boundary. Therefore, if you register outputs, you know the delay from one hierarchical or functional block to the next is minimized, and you also know that no logic optimization could have occurred across those hierarchical boundaries anyway.

In addition to the benefits listed above, preserving hierarchy has the added benefit of limiting name changes to registers; thus, the element names used in a UCF will generally not change. If you flatten the design, the register and element names, hierarchical paths, and references can change from one iteration to the next. In this case, maintaining the UCF can be quite a burden.

However, preserving hierarchy can prevent register balancing (retiming) and register duplication. Nevertheless, the benefits of preserving hierarchy generally outweigh the benefits of flattening, except when you have combinatorial outputs.
Key Points

In general, preserve hierarchy for large designs. For smaller designs, preserve the hierarchy if you registered leaf-level outputs; otherwise, you might consider flattening the design. If you flatten the design, remember the extra burden of name changes (UCF and static timing analysis) from one iteration to the next and the limits on floorplanning.

Show Slide 189:

Schematic Viewers

Synplify, Precision, and XST software


Allows you to view synthesis results graphically

Check the number of logic levels between flip-flops


Locate net and instance names quickly
View the design as generic RTL or technology-specific components

Works best when hierarchy has been preserved during synthesis

Show Slide 190:

Cross-Probing

Cross-probing: Synplify and Precision software

From the Timing Analyzer, click a reported worst-case path and that path
will be highlighted in the synthesis schematic viewer
Cross-probe to the code

Review the code to determine whether or not it can be rewritten to improve performance
Apply timing constraints in your synthesis tool to optimize this path better

You may need to set some environment variables for this to work

For more information, see Application Note XAPP406: Cross-Probing to Synplicity and Exemplar

Key Points

To find a particular application note, it is easiest to search for the application note number. For a list of all application notes, go to www.xilinx.com > Documentation > Doc Type > Application Notes > See all Application Notes.

Show Slide 191:

Physical Optimization

Synplicity Amplify FPGA Physical Optimizer or Mentor Precision Physical software (add-on tools)
Based on the critical paths in the design, the tools will attempt to optimize
and physically locate the associated logic closely together to minimize the
routing delays
Essentially, this is a way to provide critical path information to the
synthesis tool so that it can attempt to optimize those paths further


XST Synthesis Options


Show Slide 192:

Lessons

Achieving Breakthrough Performance
Synthesis Options
XST Synthesis Options
Summary

Show Slide 193:

New XST Switches

LUT combining (Virtex-5 FPGA only)

Recall that the 6-input LUT is actually two 5-input LUTs
This allows XST to map to this configuration
Area can save LUTs (can be significant)
Auto tries to balance size with speed

Reduce control sets (Virtex-5 FPGA only)

XST will assign synchronous set/reset and CEs to LUT inputs
Low-fanout control signals assigned this way can reduce the number of control sets, which improves device utilization
Can be controlled with HDL coding style
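Moving a synchronous reset and clock enable onto LUT inputs is legal because the flip-flop's next-state function does not change; only its implementation does. The Python sketch below checks this exhaustively for one flip-flop, assuming the reset has priority over the clock enable. Both function names are hypothetical, and the model ignores timing and placement.

```python
from itertools import product


def ff_with_control_pins(q, d, ce, srst):
    """Next state of a flip-flop using dedicated sync-reset and CE pins (reset has priority)."""
    if srst:
        return 0
    return d if ce else q


def lut_then_plain_ff(q, d, ce, srst):
    """Same function with reset and CE folded into LUT logic driving a plain D input.

    q is fed back from the flip-flop output as an extra LUT input.
    """
    return int((not srst) and ((ce and d) or ((not ce) and q)))


for q, d, ce, srst in product((0, 1), repeat=4):
    assert ff_with_control_pins(q, d, ce, srst) == lut_then_plain_ff(q, d, ce, srst)
print("equivalent for all 16 input combinations")
```

The cost shows up in the second function: the LUT now needs the control signals and the flip-flop's own output as inputs, a trade that is most attractive for low-fanout control signals, as the slide notes.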



Show Slide 194:

New XST Features

Inference of SRL for shift register with set/reset

XST uses SRL resources if the HDL description contains a single asynchronous set or reset, synchronous set, or synchronous reset signal
This requires extra logic, because the SRL does not support set or reset functionality
Inference is done if the shift register has at least four stages


Summary
Show Slide 195:

Lessons

Achieving Breakthrough Performance
Synthesis Options
XST Synthesis Options
Summary

Show Slide 196:

Apply Your Knowledge

1) List a few of the options in the synthesis tools that help you increase performance

2) What is the approach presented here for obtaining breakthrough performance?

Show Slide 197:

Summary

Your HDL coding style can affect synthesis results
Infer resources whenever possible
Most resources are inferable, either directly or with an attribute
If you cannot infer the resource you need, instantiate it
Take advantage of the synthesis options provided to help you meet your timing objectives
Use synchronous design techniques and timing-driven synthesis to achieve higher performance


Where Can I Learn More?


Synthesis & Simulation Design Guide, XST User Guide, and Constraints Guide
Help > Software Manuals

User guides
www.xilinx.com > Documentation > Doc Type > User Guides

Virtex-5 FPGA data sheets and user guides
www.xilinx.com > Documentation > Devices > FPGA Device Family > Virtex-5


Apply Your Knowledge Answers


Answers

1) List a few of the options in the synthesis tools that help you increase performance.
Timing-driven synthesis
FSM extraction
Retiming
Register duplication
Physical optimization

2) What is the approach presented here for obtaining breakthrough performance?
Three steps to achieve breakthrough performance:
1. Utilize embedded (dedicated) resources.
Performance by construction
DSP48, FIFO, block RAM, ISERDES, OSERDES, PowerPC processor, EMAC, and MGT, for example
2. Write code for performance.
Pipeline
Xilinx FPGAs have abundant registers: one register per LUT
3. Drive your synthesis and Place & Route tools.
Apply full and correct timing constraints
Utilize optional settings
Use High effort

Transition to Lab 3: Synthesis Techniques


Lab 3: Synthesis Techniques


Purpose

After completing this lab, you will be able to:


Access synthesis options for the targeted software
Read the Synthesis Report in the targeted software to find performance estimates
Modify the synthesis timing constraints and, in the case of XST, access and modify the contents of the Xilinx Constraints File (XCF) for synthesis

Time

30 minutes
Process

This lab illustrates how to synthesize a design by taking advantage of some of the advanced synthesis options available in the newest synthesis tools.
General Flow


Step 1: Review the existing design

Step 2: Apply a PERIOD constraint

Step 3: Apply a tighter PERIOD constraint

Step 4: Apply an even tighter PERIOD constraint

Step 5: Apply a still tighter PERIOD constraint
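If you want to script the successive PERIOD values used in Steps 2 through 5, a small Python helper like the one below can print each period with its target frequency. The net name `clk`, the `_grp` group name, and the period values are hypothetical, and the emitted lines use UCF-style `TNM_NET`/`TIMESPEC` syntax; check the XST User Guide for the exact form to use in an XCF file.

```python
def period_constraint(net: str, period_ns: float) -> str:
    """Build a UCF-style PERIOD constraint for a clock net (50% duty cycle assumed)."""
    group = f"{net}_grp"  # hypothetical timing group name
    return (
        f'NET "{net}" TNM_NET = "{group}";\n'
        f'TIMESPEC "TS_{net}" = PERIOD "{group}" {period_ns:g} ns HIGH 50%;'
    )


def target_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


# Hypothetical sequence of progressively tighter periods, in ns.
for period in (10.0, 8.0, 6.0, 5.0):
    print(f"# {period:g} ns -> {target_mhz(period):.1f} MHz")
    print(period_constraint("clk", period))
```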


Lab
Designing for Performance Lab Workbook
Refer to the separate lab workbook for the Synthesis Techniques lab.

Transition to Day One Summary


Day One Summary


Purpose

This module reviews day one of the course.


Time

10 minutes
Process

This module reviews day one of the course.


Lessons

Day One Summary


Day One Summary


Show Slide 198:

Day One Summary

Show Slide 199:

Day One Review

Describe the flow for obtaining your performance objectives
Describe the available clocking resources in the Virtex-5 FPGA
Explain the architectural features of the Virtex-5 FPGA
Name one technique for building synchronization circuits that can provide maximum recovery time
Describe a few of the synthesis options that can boost performance
Describe the synthesis approach that helps obtain higher performance
How can CORE Generator software system cores improve performance?



Show Slide 200:

Timing Closure


Show Slide 201:

Day One Review Answers

Describe the available clocking resources in the Virtex-5 FPGA

BUFGCTRL
32 global clock buffers
Provides glitch-free switching among clocks
Drives differential global clock trees
Any 10 global clocks can access any clock region

CMT
Up to 6 Clock Management Tiles
Each CMT has 2 DCMs and 1 PLL

DCM
DLL: Eliminates clock skew
DFS: Generates new clock frequencies
DPS: Phase shifts a clock signal

PLL
Provides a large range of multiply and divide-by values
Filters clock jitter

BUFIO
4 BUFIOs per region
Drives I/O and BUFRs

BUFR
2 BUFRs per region
4 regional clock tracks



Show Slide 202:

Day One Review Answers

Explain the architectural features of the Virtex-5 FPGA

IOSERDES: Input/output 10-bit parallel/serial converters in the I/O tile
Can be used in parallel for up to 10 bits

DSP48
18x18 two's complement, optionally pipelined multiplier
Dynamic user-controlled operating modes (OPMODEs)
48-bit adder, subtractor, and accumulator options
Symmetric rounding support
Optionally registered inputs and outputs
Expandable for 25x18 and 35x25 (using 2 DSP48s) multiplier applications
C-input is completely independent
A:B is expandable to 48 bits (suitable for SIMD applications)
A input cascade (efficient filter implementation)

Show Slide 203:

Day One Review Answers

Explain the architectural features of the Virtex-5 FPGA

Block RAM
36-kb block memory that can be segmented into two 18-kb block memories or one 18-kb block memory and one 18-kb FIFO
Optional output register for performance up to 550 MHz
Cascade mode for 64K x 1

FIFO16
Uses block RAM for storage
Dedicated flag and status logic
Two modes (multirate and synchronous)

LXT
MGTs, PCI Express integrated Endpoint block, and tri-mode EMAC block features

SXT
Same as LXT, but with more block RAM and DSP resources

FXT
Same as LXT, plus GTX transceivers (instead of MGTs) and the PowerPC 440 processor



Show Slide 204:

Day One Review Answers

Name one technique for building synchronization circuits that can provide maximum recovery time
Use an extra flip-flop (two-bit shift register) to provide maximum recovery time to the first flip-flop

Describe a few of the synthesis options that can boost performance
Timing-driven optimization, retiming, register replication, Finite State Machine (FSM) extraction, timing constraints entry, hierarchy management, and physical optimization


Show Slide 205:

Day One Review Answers

Describe the synthesis approach that helps obtain higher performance
1. Utilize embedded (dedicated) resources
Performance by construction (use all the dedicated hardware you can)
DSP48, PowerPC processor, EMAC, MGT, FIFO, block RAM, ISERDES, and OSERDES, for example
2. Write code for performance
Use synchronous design methodology
Ensure the code is written optimally for critical paths
Pipeline (not as necessary with the Virtex-5 FPGA)
3. Drive your synthesis and Place & Route tools
Try different optimization techniques
Add critical timing constraints in synthesis
Preserve hierarchy
Apply full and correct constraints
Utilize optional settings
Use High effort



Show Slide 206:

Day One Review Answers

How can CORE Generator software system cores improve performance?

These cores are pre-optimized for the Xilinx architecture


Show Slide 207:

Day One Summary

A flow for achieving timing closure was presented
The Virtex-5 FPGA architecture has many dedicated resources that can improve performance and lower power
The DCM and PLL have many features that can increase design performance
There are many clock features available for high-speed design
You can increase design performance by duplicating flip-flops, pipelining, and using I/O flip-flops
Synthesis tools have many different options to improve synthesis results
CORE Generator software system cores can be used to take full advantage of the Xilinx FPGA architecture



Transition to Course Agenda Day Two


Course Agenda Day Two


Purpose

This module covers the day two agenda for the course.
Time

5 minutes
Process

This module covers the day two agenda for the course.
Lessons

Course Agenda Day Two



Show Slide 208:

Designing for Performance


Course Agenda Day Two

Show Slide 209:

Day One Objectives


Yesterday you learned how to:

Describe a flow for obtaining timing closure
Describe the architectural features of the Virtex-5 FPGA
Describe the features of the Digital Clock Manager (DCM) and Phase-Locked Loop (PLL) and how they can be used to improve performance
Increase performance by duplicating registers and pipelining
Describe different synthesis options and how they can improve performance
Create and integrate cores into your design flow by using the CORE Generator software system
Run behavioral simulation on an FPGA design that contains cores


Show Slide 210:

Day Two Objectives


After completing this course, you will be able to:

Pinpoint design bottlenecks by using Timing Analyzer reports
Apply advanced timing constraints to meet your performance goals
Use advanced implementation options to increase design performance


Show Slide 211:

Day One Agenda

Review of Fundamentals of FPGA Design
Designing with Virtex-5 FPGA Resources
CORE Generator Software System
Lab 1: CORE Generator Software System
Designing Clock Resources
Lab 2: Designing Clock Resources
FPGA Design Techniques
Synthesis Techniques
Lab 3: Synthesis Techniques



Show Slide 212:

Day Two Agenda

Achieving Timing Closure
Lab 4: Review of Global Timing Constraints
Timing Groups and OFFSET Constraints
Path-Specific Timing Constraints
Lab 5: Achieving Timing Closure
Advanced Implementation Options
Lab 6: Designing for Performance
Power Estimation (Optional)
Lab 7: FPGA Editor Demo (Optional)
ChipScope Pro Software (includes lab) (Optional)
Course Summary

Key Points
The ChipScope Pro Software lab is not available for classes that use Toolwire for labs.

Transition to Achieving Timing Closure


Achieving Timing Closure


Purpose

After completing this module, you will be able to:


Interpret a timing report and determine the cause of timing errors
Apply Timing Analyzer report options to create customized timing reports

Time

45 minutes
Process

This module describes how to read the Timing Analyzer reports and use the information to achieve timing closure.
Lessons
Introduction
Timing Reports
Interpreting Timing Reports
Report Options
Summary


Introduction
Show Slide 213:

Achieving Timing Closure

Show Slide 214:

Objectives
After completing this module, you will be able to:

Interpret a timing report and determine the cause of timing errors
Apply Timing Analyzer report options to create customized timing reports

Show Slide 215:

Timing Closure


Timing Reports
Show Slide 216:

Lessons


Timing Reports
Interpreting Timing Reports
Report Options
Summary


Show Slide 217:

Timing Reports

Timing reports enable you to determine how and why constraints were not met
Reports contain detailed descriptions of paths that fail their constraints
The Project Navigator can create timing reports at two points in the design flow
Post-Map Static Timing Report
Post-Place & Route Static Timing Report
The Timing Analyzer is a utility for creating and reading timing reports

Key Points
After implementing a design, use timing reports to determine overall design performance.

You should review the details for each failed constraint to determine why the design does not meet performance objectives.

Show Slide 218:

Using the Timing Analyzer

Double-click Analyze Post-Place & Route Static Timing
Opens the Post-Place & Route Static Timing Report
Allows you to create custom reports
Open a plain text version by clicking Static Timing Report in the Design Summary screen


Key Points
Although the plain text timing report contains the same information as the Timing Analyzer version, it does not contain hyperlinks to other tools.

TRAINER NOTE

Demo Instructions:
1. Launch the Project Navigator and open the Timing Closure lab project.
2. Expand the Implement, Place & Route, and Generate Post-Place & Route Static Timing processes.
3. Double-click Analyze Post-Place & Route Static Timing.

Show Slide 219:

Timing Analyzer GUI

Hierarchical browser
Timing objects window
Timing tab
Quickly navigate to specific report sections
Summarizes the path displayed in the path detail window
Report text
Links to the Timing Improvement Wizard and interactive data sheet
Logic highlighted in blue can be cross-probed


Key Points
Clicking a link in the Delay Type column opens the interactive data sheet on the Web. Customized for your target device and speed grade, this data sheet includes timing model drawings that clearly define each incremental delay in the timing path.

TRAINER NOTE

Demo Instructions:
1. Click the Timing Improvement Wizard link to show the pop-up dialog box.
2. Click a Tilo delay to open the interactive data sheet.

Show Slide 220:

Cross-Probing

Shows the placement of logic in a delay path
Floorplan Implemented view shows the actual placement and routing used
Technology view shows the logical path through components


Key Points
This enables you to quickly view the placement of logic in critical paths.

Show Slide 221:

Timing Report Structure

Timing constraints
Number of paths covered and number of paths that failed for each constraint
Detailed descriptions of the longest paths
Data sheet report
Setup, hold, and clock-to-out times for each I/O pin
Timing summary
Number of errors (number of failing paths)
Timing score (total number of picoseconds by which all constraints were missed)
Timing report description
Allows you to easily duplicate the report


Key Points

Timing reports also contain headers with information such as design name, device targeted, and software version.

The timing score is a key indicator of overall design performance. The timing score represents the total number of picoseconds by which the design fails to meet constraints. A design that meets all constraints has a timing score of 0.
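As a rough illustration of how the score accumulates, the Python sketch below sums the negative slack of every failing path and converts it to picoseconds. It assumes slack is simply the constraint minus the path delay, ignoring clock skew and the other clock-path adjustments the tools apply, and the path list is hypothetical (the first two delays match Case 1 and Case 2 in the next lesson).

```python
from typing import Iterable, Tuple


def slack_ns(constraint_ns: float, delay_ns: float) -> float:
    """Positive slack means the path meets its constraint; negative means it fails."""
    return constraint_ns - delay_ns


def timing_score_ps(paths: Iterable[Tuple[float, float]]) -> int:
    """Sum, in picoseconds, the amount by which each failing path misses its constraint.

    Simplified model: clock skew and clock-path delays are not included.
    """
    total_ns = sum(-s for s in (slack_ns(c, d) for c, d in paths) if s < 0)
    return round(total_ns * 1000)


# Hypothetical (constraint, path delay) pairs in ns; the third path meets timing.
paths = [(3.0, 3.072), (3.0, 3.872), (5.0, 4.850)]
print(timing_score_ps(paths))  # 944
```

A design whose paths all have zero or positive slack scores 0, matching the Key Point above.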

Show Slide 222:

Report Example

Constraint summary
Total delay
Number of paths covered
Number of timing errors
Length of critical path
Clock and data breakdown
Detailed path description
Delay types are described in the data sheet
Worst-case conditions are assumed, unless prorated


Key Points
The Timing Constraint Report lists each constraint as well as the longest delay paths for each constraint. The report also breaks down the delay paths into incremental delays.

Use the detailed path description to locate the logic in the design that is causing the path to fail. If you do not label nets or choose descriptive instance names, or if your synthesis tool has created default net names, analyzing this report may be difficult.

The tools account for clock distribution delay on input and output paths as well as clock skew on internal flip-flop-to-flip-flop paths.

All delays reported are for worst-case temperature and voltage. You can prorate delays by specifying the worst-case temperature and voltage that you expect your device to encounter. Prorating will be discussed later in this module and also in the Path-Specific Timing Constraints module.

Key Points
In the far right column, the instance name in black text is the physical resource associated with each delay. Use this instance name to locate the logic in the FPGA Editor. The blue names are logical resources, which can be used to locate the logic in the floorplanner or in the RTL viewer of your synthesis tool.

Logic and routing breakdown can be useful during post-MAP timing analysis to determine whether the constraints are reasonable. This will be discussed further in the next lesson.

Show Slide 223:

Timing Improvement Wizard

Makes intelligent design suggestions

When a path fails to meet a timing constraint, the Timing Analyzer shows the Timing Improvement Wizard icon
The Wizard asks questions and provides useful suggestions
Answers range from design change guidance to implementation tool options


Interpreting Timing Reports


Show Slide 224:

Lessons


Timing Reports
Interpreting Timing Reports
Report Options
Summary


Show Slide 225:

Estimating Design Performance

Performance estimates are available before implementation is complete
Synthesis Report
Logic delays are accurate
Routing delays are estimated based on fanout
Reported performance is generally accurate to within 20 percent
Post-Map Static Timing Report
Logic delays are accurate
Routing delays are listed as 0 ns*
Use the 60/40 rule to obtain a more realistic performance estimate



Key Points
The Synthesis Report is the first place where performance estimates are given. The estimate is not very accurate this early in the implementation process, but it can indicate whether synthesis results are good enough to proceed to the next step.

The Post-Map Static Timing Report is useful because it is based on the Xilinx timing constraints and shows detailed descriptions of the longest paths covered by each constraint. The routing delays are not accurate, but performance can be estimated by using the logic delays and the 60/40 rule (covered next).

* If MAP is run with the timing-driven packing option, routing delays will be estimated based on logic placement and fanout.

Show Slide 226:

60/40 Rule

A rule of thumb to determine whether timing constraints are reasonable
Open the Post-Map Static Timing Report
Look at the percentage of the timing constraint that is used up by logic delays
Under 60 percent: Good chance that the design will meet timing
60 to 80 percent: Design may meet timing if advanced options are used
Over 80 percent: Design will probably not meet timing (go back to improve synthesis results)


Key Points
The 60/40 rule is a long-standing rule of thumb used by Xilinx designers. It states that logic delays should not exceed 60 percent of the timing budget, leaving the remaining 40 percent for routing delays.
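The 60/40 thresholds translate directly into a small classifier. The Python sketch below is a hypothetical helper, not a Xilinx tool; its `estimated_routed_delay` function additionally assumes, in the spirit of the rule, that logic ends up as roughly 60 percent of the final routed path delay.

```python
def logic_percent(logic_ns: float, constraint_ns: float) -> float:
    """Share of the timing constraint consumed by logic delay, in percent."""
    return 100.0 * logic_ns / constraint_ns


def sixty_forty_verdict(logic_ns: float, constraint_ns: float) -> str:
    """Apply the 60/40 rule of thumb to a post-MAP logic delay."""
    pct = logic_percent(logic_ns, constraint_ns)
    if pct < 60.0:
        return "good chance of meeting timing"
    if pct <= 80.0:
        return "may meet timing with advanced options"
    return "probably will not meet timing; improve synthesis results"


def estimated_routed_delay(logic_ns: float, logic_share: float = 0.6) -> float:
    """Rough post-route estimate, assuming logic is `logic_share` of the total delay."""
    return logic_ns / logic_share


# Hypothetical post-MAP logic delays checked against a 3 ns constraint.
for logic in (1.5, 2.1, 2.7):
    print(f"{logic} ns logic: {sixty_forty_verdict(logic, 3.0)}")
```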



Show Slide 227:

Analyzing Post-Place & Route Timing

There are many factors that contribute to timing errors, including:
Neglecting synchronous design rules or using incorrect HDL coding style
Poor synthesis results (too many logic levels in the path)
Inaccurate or incomplete timing constraints
Poor logic mapping or placement
Each root cause has a different solution
Rewrite HDL code
Add path-specific timing constraints
Resynthesize or reimplement with different software options
Correct interpretation of timing reports can reveal the most likely cause and, therefore, the most likely solution


Key Points
The next several slides show examples of timing errors and how to identify the root cause of the failure.



Show Slide 228:

Case 1
Data Path: source to dest
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.325  net_1
Tilo                    0.146  lut_1
net (fanout=1)          1.500  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.174  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.204  net_4
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.072ns  (0.869ns logic, 2.203ns route)
                               (28.3% logic, 71.7% route)

This path is constrained to 3 ns


What is the primary cause of the timing failure?


Key Points
Tas is the setup time of a slice flip-flop relative to the LUT inputs to the slice (that is, this delay includes a Tilo delay).
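The summary line at the bottom of the report is simple arithmetic over the delay column. The Python sketch below recomputes it for Case 1, assuming (as in these examples) that every row not labeled `net` counts as logic, including Tcko and Tas. The slack line ignores clock skew, so it is only an approximation of what the tools report.

```python
from typing import List, Tuple

# Case 1 delay column from the report: (delay type, delay in ns).
CASE_1: List[Tuple[str, float]] = [
    ("Tcko", 0.272), ("net", 0.325), ("Tilo", 0.146), ("net", 1.500),
    ("Tilo", 0.146), ("net", 0.174), ("Tilo", 0.146), ("net", 0.204),
    ("Tas", 0.159),
]


def breakdown(path: List[Tuple[str, float]]) -> Tuple[float, float, float, float]:
    """Return (logic ns, route ns, total ns, logic percent), rounded like the report."""
    logic = sum(d for kind, d in path if kind != "net")
    route = sum(d for kind, d in path if kind == "net")
    total = logic + route
    return round(logic, 3), round(route, 3), round(total, 3), round(100 * logic / total, 1)


logic, route, total, pct = breakdown(CASE_1)
print(f"{total:.3f}ns ({logic:.3f}ns logic, {route:.3f}ns route) ({pct}% logic)")
# Approximate slack against the 3 ns constraint (clock skew ignored).
print(f"slack = {3.0 - total:.3f} ns")
```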



Show Slide 229:

Case 1 Answer
Data Path: source to dest
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.325  net_1
Tilo                    0.146  lut_1
net (fanout=1)          1.500  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.174  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.204  net_4
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.072ns  (0.869ns logic, 2.203ns route)
                               (28.3% logic, 71.7% route)

What is the primary cause of the timing failure?

The net_2 signal has a long delay and low fanout
Most likely cause is poor placement


Show Slide 230:

Poor Placement: Solutions

Increase placement effort level (or overall effort level)
Timing-driven packing, if the poor placement is caused by packing unrelated logic together
Cross-probe to the floorplanner to see what has been packed together
This option is covered in the Advanced Implementation Options module
PAR extra effort or Xplorer
Covered in the Advanced Implementation Options module
Area constraints with the PlanAhead tool or PACE
Covered in the Designing with the PlanAhead Analysis and Design Tool course



Show Slide 231:

Case 2
Data Path: source to dest
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.125  net_1
Tilo                    0.146  lut_1
net (fanout=187)        2.500  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.174  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.204  net_4
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.872ns  (0.869ns logic, 3.003ns route)
                               (22.4% logic, 77.6% route)

This path is also constrained to 3 ns


What is the primary cause of the timing failure?

Show Slide 232:

Case 2 Answer
Data Path: source to dest
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.125  net_1
Tilo                    0.146  lut_1
net (fanout=187)        2.500  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.174  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.204  net_4
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.872ns  (0.869ns logic, 3.003ns route)
                               (22.4% logic, 77.6% route)

What is the primary cause of the timing failure?

The signal net_2 has a long delay, but the fanout is not low
Most likely cause is high fanout



Show Slide 233:

High Fanout: Solutions

Most likely solution is to duplicate the source of the high-fanout net
If the net is the output of a flip-flop, the solution is to duplicate the flip-flop
Use manual duplication (recommended) or synthesis options
If the net is driven by combinatorial logic, locating the source of the net in the HDL code may be more difficult
Use synthesis options to duplicate the source


Key Points
For more information about duplicating flip-flops, see the FPGA Design Techniques module.
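When planning manual duplication, a quick first step is to work out how many copies of the driving flip-flop keep each copy under a target fanout. The Python sketch below does this for a net like net_2 in Case 2 (187 loads). The fanout limit of 50 and the load names are hypothetical, and the round-robin split balances counts but ignores placement; in a real design you would group loads that sit physically close together. XST's MAX_FANOUT constraint is the synthesis-option alternative.

```python
import math
from typing import List, Sequence


def duplicate_driver(loads: Sequence[str], max_fanout: int) -> List[List[str]]:
    """Split loads across the fewest duplicate drivers so none exceeds max_fanout.

    Loads are assigned round-robin, which balances counts but ignores placement.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    copies = max(1, math.ceil(len(loads) / max_fanout))
    return [list(loads[i::copies]) for i in range(copies)]


# Hypothetical names for the 187 loads driven by net_2.
loads = [f"load_{i}" for i in range(187)]
groups = duplicate_driver(loads, max_fanout=50)
print(len(groups), [len(g) for g in groups])  # 4 [47, 47, 47, 46]
```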



Show Slide 234:

Case 3
Data Path: source to dest
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.521  net_1
Tilo                    0.146  lut_1
net (fanout=1)          0.180  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.223  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.123  net_4
Tilo                    0.146  lut_4
net (fanout=1)          0.310  net_5
Tilo                    0.146  lut_5
net (fanout=1)          0.233  net_6
Tilo                    0.146  lut_6
net (fanout=1)          0.308  net_7
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.205ns  (1.307ns logic, 1.898ns route)
                               (40.8% logic, 59.2% route)

This path is also constrained to 3 ns


What is the primary cause of the timing failure?

Show Slide 235:

Case 3 Answer
Data Path: source to dest
Delay type          Delay(ns)  Logical Resource(s)
------------------  ---------  -------------------
Tcko                    0.272  source
net (fanout=7)          0.521  net_1
Tilo                    0.146  lut_1
net (fanout=1)          0.180  net_2
Tilo                    0.146  lut_2
net (fanout=1)          0.223  net_3
Tilo                    0.146  lut_3
net (fanout=1)          0.123  net_4
Tilo                    0.146  lut_4
net (fanout=1)          0.310  net_5
Tilo                    0.146  lut_5
net (fanout=1)          0.233  net_6
Tilo                    0.146  lut_6
net (fanout=1)          0.308  net_7
Tas                     0.159  dest
------------------  ---------  -------------------
Total                 3.205ns  (1.307ns logic, 1.898ns route)
                               (40.8% logic, 59.2% route)

What is the primary cause of the timing failure?

There are no really long delays, but there are a lot of logic levels (7)



Key Points
The seven logic levels in this path include the six Tilo delays plus the Tas delay (setup time going through a LUT).
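The reasoning in Cases 1 through 3 can be captured as a rough triage heuristic. In the Python sketch below, the thresholds (a 1 ns "long" net, a fanout of 50, more than five logic levels) are hypothetical values chosen to reproduce the three answers, not Xilinx-published limits. Logic levels are counted as Tilo plus Tas rows, as described in the Key Point above.

```python
from typing import List, Tuple

# One report row: (delay type, delay in ns, fanout). Fanout is 0 for logic rows.
Row = Tuple[str, float, int]

# Tcko, Tilo, and Tas values taken from the Case 1-3 reports.
T_CKO, T_ILO, T_AS = 0.272, 0.146, 0.159


def lut_chain(nets: List[Tuple[float, int]]) -> List[Row]:
    """Build Tcko, net, (Tilo, net)..., Tas rows from (net delay, fanout) pairs."""
    rows: List[Row] = [("Tcko", T_CKO, 0)]
    for i, (delay, fanout) in enumerate(nets):
        if i > 0:
            rows.append(("Tilo", T_ILO, 0))
        rows.append(("net", delay, fanout))
    rows.append(("Tas", T_AS, 0))
    return rows


def diagnose(path: List[Row], long_net_ns: float = 1.0,
             high_fanout: int = 50, max_levels: int = 5) -> str:
    """Rough triage of a failing path; all thresholds are hypothetical."""
    nets = [(delay, fanout) for kind, delay, fanout in path if kind == "net"]
    levels = sum(1 for kind, _, _ in path if kind in ("Tilo", "Tas"))
    worst_delay, worst_fanout = max(nets)  # longest net; ties broken by fanout
    if worst_delay >= long_net_ns:
        return "high fanout" if worst_fanout >= high_fanout else "poor placement"
    if levels > max_levels:
        return "too many logic levels"
    return "no single dominant cause"


CASE_1 = lut_chain([(0.325, 7), (1.500, 1), (0.174, 1), (0.204, 1)])
CASE_2 = lut_chain([(0.125, 7), (2.500, 187), (0.174, 1), (0.204, 1)])
CASE_3 = lut_chain([(0.521, 7), (0.180, 1), (0.223, 1), (0.123, 1),
                    (0.310, 1), (0.233, 1), (0.308, 1)])

for name, case in (("Case 1", CASE_1), ("Case 2", CASE_2), ("Case 3", CASE_3)):
    print(name, "->", diagnose(case))
```

A heuristic like this only narrows the search; the detailed path description is still what tells you which logic to change.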

Show Slide 236:

Too Many Logic Levels: Solutions

The implementation tools cannot do much to improve performance
The netlist must be altered to reduce the amount of logic between flip-flops
Possible solutions
Check whether the path is a multicycle path
If yes, add a multicycle path constraint
Use the retiming option during synthesis to distribute logic more evenly among flip-flops
Confirm that good coding techniques were used to build this logic (no nested if or case statements)
Add a pipeline stage
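To see where a single pipeline register would help most, the Python sketch below tries placing it at each LUT output in the Case 3 path and keeps the split with the smallest worst-stage delay. It is an idealized estimate: it reuses the report's delays, adds one Tcko (0.272 ns, taken from the report) to the second stage for the new register, assumes the routing delays do not change, and ignores the new register's setup time. Adding a stage also adds a cycle of latency, which the surrounding logic must account for.

```python
from typing import List, Optional, Tuple

# Case 3 delay column: (delay type, delay in ns, logical resource).
CASE_3: List[Tuple[str, float, str]] = [
    ("Tcko", 0.272, "source"), ("net", 0.521, "net_1"),
    ("Tilo", 0.146, "lut_1"), ("net", 0.180, "net_2"),
    ("Tilo", 0.146, "lut_2"), ("net", 0.223, "net_3"),
    ("Tilo", 0.146, "lut_3"), ("net", 0.123, "net_4"),
    ("Tilo", 0.146, "lut_4"), ("net", 0.310, "net_5"),
    ("Tilo", 0.146, "lut_5"), ("net", 0.233, "net_6"),
    ("Tilo", 0.146, "lut_6"), ("net", 0.308, "net_7"),
    ("Tas", 0.159, "dest"),
]


def best_register_site(path: List[Tuple[str, float, str]],
                       t_cko_ns: float = 0.272) -> Tuple[str, float]:
    """Return (LUT whose output gets the register, worst stage delay in ns)."""
    best: Optional[Tuple[str, float]] = None
    for i, (kind, _, name) in enumerate(path):
        if kind != "Tilo":
            continue  # only consider a register placed directly on a LUT output
        stage1 = sum(d for _, d, _ in path[: i + 1])
        stage2 = t_cko_ns + sum(d for _, d, _ in path[i + 1:])  # new register's Tcko
        worst = max(stage1, stage2)
        if best is None or worst < best[1]:
            best = (name, worst)
    if best is None:
        raise ValueError("path contains no LUTs")
    return best


site, worst = best_register_site(CASE_3)
print(f"register after {site}: worst stage {worst:.3f} ns")
```

Under these assumptions the best site is the output of lut_3, which splits the 3.205 ns path into stages of about 1.63 ns and 1.84 ns.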


Report Options
Show Slide 237:

Lessons


Timing Reports
Interpreting Timing Reports
Report Options
Summary


Show Slide 238:

Types of Timing Reports

Analyze Against Timing Constraints
Compares design performance with timing constraints
Most commonly used report format
Used for Post-Map and Post-Place & Route Static Timing Reports if the design contains constraints

Analyze Against Auto-Generated Design Constraints
Determines the longest paths in each clock domain
Use with designs that have no constraints defined
Used for Post-Map and Post-Place & Route Static Timing Reports if the design contains no constraints

Key Points
Choosing which report to create depends on whether you used timing constraints.

The Analyze Against Timing Constraints Report is the most useful report if your design contains timing constraints. This report provides you with information on each of your constraints.

The Analyze Against Auto-Generated Design Constraints Report is used only with designs that do not contain timing constraints.

Show Slide 239:

Types of Timing Reports

Analyze Against User-Specified Paths by Defining Endpoints
Custom report for selecting sources and destinations

Analyze Against User-Specified Paths by Defining Clock and I/O Timing
Allows you to define PERIOD and OFFSET constraints on the fly
Use with designs that have no constraints defined


Key Points
The Analyze Against User-Specified Paths by Defining Endpoints Report allows you to create custom reports that focus on specific paths in the design.

The Analyze Against User-Specified Paths by Defining Clock and I/O Timing Report is only used with designs that do not contain timing constraints.

Key Points

Clicking the icons in the toolbar will create a report using the
currently defined options. To access the report options shown
next, you must select a report type from the Analyze menu.

Show Slide 240:

Timing Constraints Tab

After selecting a Timing Analyzer report, you can select from various report options
Report failing paths: Lists only the paths that fail to meet your specified timing constraints
Report unconstrained paths: Allows you to list some or all of the unconstrained paths in your design
You can also select which constraints you want reported
Key Points

Selecting a report from the Analyze menu displays an options dialog box.

Select the Report paths option to create reports after MAP but before Place & Route. This format has detailed path information on the longest paths for each constraint, even if they are not timing errors (default format for the Post-Map Static Timing Report).

Select the Report failing paths option to create reports after Place & Route. This format has detailed path information on just the paths that fail to meet timing (default format for the Post-Place & Route Static Timing Report).

Reporting unconstrained paths is useful when you are not certain which paths were covered by your timing constraints.
Key Points
You can select which constraints you want to apply to the design during report creation. If a constraint is not selected, the tools act as if the constraint did not exist. For example, if you disable a multicycle path constraint, those paths will be analyzed and reported under the global PERIOD constraint (probably as timing errors).

TRAINER NOTE

Demo Instructions:
Viewing the report options:

Select Analyze Against Timing Constraints.

Note: There may not be timing constraints for this design. If there
are timing constraints, they are listed in the Timing Constraints tab
as in the figure above.

Show Slide 241:

Options Tab

Speed grade
Constraint details
Specify the number of detailed paths reported per constraint
Report details of hold violations
Timing report contents
Generate new timing information without reimplementing
Include or exclude report sections
Prorating
Specify your own worst-case environment

Key Points
The Speed Grade option lets you easily determine whether moving to a faster or slower speed-grade device will meet your timing needs.

Select the Report fastest paths/verbose hold paths option if you have clock signals on non-global routing resources. (Skew analysis is automatically performed on all global clocks.)

Prorating values are best entered in the Constraints Editor because those prorated delays will then be used during Place & Route. Prorating entered in the Timing Analyzer only updates the generated timing report for the environmental conditions you specify.

If prorating gets you close to timing closure, try entering the prorated values in the Constraints Editor and reimplementing.

Prorating is not always available for the newest device families.
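As a rough sketch of what prorating entered as constraints (rather than only in the Timing Analyzer) might look like, the UCF fragment below uses the TEMPERATURE and VOLTAGE constraints. The values are hypothetical placeholders, not recommended operating conditions; check the Constraints Guide for the units and ranges your device family supports.

```ucf
# Hypothetical worst-case environment used for prorating.
# Because these statements live in the UCF, Place & Route and the
# static timing reports use them, unlike prorating entered only
# on the Timing Analyzer Options tab.
TEMPERATURE = 85 C;
VOLTAGE = 1.14 V;
```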

Show Slide 242:

Filter Paths by Net Tab

Restrict which paths are reported by selecting specific nets
Each net is assigned to be included by default
Net filter values
Exclude paths containing this net
Include Only paths containing this net
Default

Key Points
Filtering is a method for reducing the number of reported paths by including or excluding paths that contain a particular net.

Key Points
If you specify some nets as Exclude, then all nets marked Default will be included. If you specify some nets as Include Only, then all nets marked Default will be excluded.

If you have disabled some constraints on the Timing Constraints tab, this tab can be used to filter out false paths from the report. For example, if you disabled a multicycle path constraint, you can Exclude the associated clock enable net to prevent those paths from being analyzed against the global PERIOD constraint.

Show Slide 243:

Path Tracing Tab

Restrict which paths are reported by selecting path endpoints or path types

Key Points
This allows you to reduce the size of a report by specifying the path endpoints to be reported.

You can also enable or disable analysis on specific types of paths.

The Asynchronous Set/Reset to Output and Recovery option enables path tracing through CLB flip-flop asynchronous set or reset inputs to the Q output.

Summary
Show Slide 244:

Lessons

Timing Reports
Interpreting Timing Reports
Report Options
Summary

Show Slide 245:

Apply Your Knowledge

1) To which resources is the timing report linked?

2) List the possible causes of timing errors

Show Slide 246:

Summary

Timing reports enable you to determine how and why constraints were not met
Use the Synthesis Report and Post-Map Static Timing Report to estimate performance before running Place & Route
The detailed path description offers clues to the cause of timing failures
Cross-probe to see the placement and a technology view of a timing path
The Timing Analyzer can generate various types of reports for specific circumstances

Where Can I Learn More?


Timing Analyzer Overview/Online Help

Help → Help Topics

Apply Your Knowledge Answers


Answers

1) To which resources is the timing report linked?



Timing Improvement Wizard

Interactive data sheet on the Web

Floorplanner-implemented view for cross-probing

Technology view for cross-probing

2) List the possible causes of timing errors.


Neglecting synchronous design rules or using incorrect HDL coding style

Poor synthesis results (too many levels of logic)

Inaccurate or incomplete path-specific timing constraints

Poor logic mapping or placement

Transition to Lab 4: Review of Global Timing Constraints


Lab 4: Review of Global Timing Constraints
Purpose

After completing this lab, you will be able to:


Enter global timing constraints in the Constraints Editor

Read reports to determine whether constraints were met

Analyze the failing paths in the timing report to determine the cause

Describe possible solutions to the failing paths

Time

45 minutes
Process

This lab illustrates how to use global timing constraints and the
Timing Analyzer to find the timing-critical paths of a design and
develop a strategy for gaining timing closure.
General Flow
Step 1: Enter global timing constraints

Step 2: Implement the design and analyze the timing

Step 3: Implement the design and analyze the timing with Offset In and Offset Out constraints

Lab
Designing for Performance Lab Workbook
Refer to the separate lab workbook for the Review of Global Timing Constraints lab.

Transition to Timing Groups and OFFSET Constraints

Timing Groups and OFFSET Constraints
Purpose

After completing this module, you will be able to:


Use the Constraints Editor to create groups of path endpoints

Use the Constraints Editor to create path-specific OFFSET constraints

Time

45 minutes
Process

This module describes the best ways to group path endpoints to make the most efficient path-specific timing constraints.
Lessons

Introduction

Overview

Creating Groups

OFFSET Constraints

Summary

Introduction
Show Slide 247:

Timing Groups and OFFSET Constraints

Show Slide 248:

Objectives
After completing this module, you will be able to:

Use the Constraints Editor to create groups of path endpoints


Use the Constraints Editor to create path-specific OFFSET constraints

Overview
Show Slide 249:

Lessons

Overview
Creating Groups
OFFSET Constraints
Summary

Show Slide 250:

Path-Specific Timing Constraints

Using global timing constraints (PERIOD, OFFSET, and PAD-TO-PAD) will constrain your entire design
Using only global constraints often leads to over-constrained designs
Constraints are too tight
Increases compile time and can prevent timing objectives from being met
Review performance estimates provided by your synthesis tool or the Post-Map Static Timing Report
Path-specific constraints override the global constraints on specified paths
This allows you to loosen the timing requirements on specific paths

Key Points
The key to effective constraining is applying only the constraints that are required to communicate your performance objectives. If you specify requirements that are tighter than you actually need, your compile time will increase, and you may have difficulty getting your design to complete the Place & Route phase of implementation.

Path-specific constraints provide an accurate method of communicating design performance objectives. Global constraints are very powerful and can constrain every delay path in your design. Path-specific constraints allow you to define critical timing paths that require further optimization, multicycle paths that are not required to be constrained as tightly, and false paths that are not required to be constrained at all. Path-specific timing constraints give the implementation tools the greatest flexibility to meet your system timing objectives and are a critical part of high-performance design.

Show Slide 251:

More About Path-Specific Timing Constraints

Areas of your design that can benefit from path-specific constraints
Multicycle paths
Paths that cross between clock domains
Bidirectional buses
I/O timing

Path-specific timing constraints should be used to define your performance objectives and should not be placed indiscriminately

Key Points
Implementing path-specific constraints on designs that contain multicycle paths or bidirectional buses is very important. Constraints placed on these designs often loosen or remove a large number of constrained paths, which gives the implementation tools a great deal of flexibility in meeting your system timing objectives.
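For illustration only, a multicycle path controlled by a clock enable might be written in the UCF along these lines. The net name slow_ce, the group name MC_GRP, and the 10 ns period are hypothetical; the idea is that flip-flops enabled only every second clock cycle can be given twice the PERIOD requirement.

```ucf
# Global PERIOD on the system clock (hypothetical net name and value)
NET "CLK" TNM_NET = "CLK";
TIMESPEC "TS_CLK" = PERIOD "CLK" 10 ns HIGH 50%;

# Group the flip-flops driven by a hypothetical clock enable net
NET "slow_ce" TNM_NET = FFS "MC_GRP";

# Multicycle path: paths between these flip-flops get 2 x TS_CLK (20 ns)
TIMESPEC "TS_MC" = FROM "MC_GRP" TO "MC_GRP" TS_CLK * 2;
```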

Show Slide 252:

Global Constraints Review


Using the global PERIOD, OFFSET IN, and OFFSET OUT constraints
constrains all of these paths
This makes it easy to control the overall performance of your design

[Figure: example design with inputs ADATA, CDATA, and BUS [7..0]; flip-flops FLOP1 through FLOP5; clock CLK with a BUFG; outputs OUT1 and OUT2]
Key Points
In this example, three global constraints cover most of the paths in the design. Because most of the delay paths are covered, controlling design performance by adjusting the constraints is easy.

The caveat to using global constraints is that they constrain many delay paths to the same timing requirement and do not allow you to constrain specific paths to a separate delay.
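The three global constraints for the example design on the slide could be sketched in UCF as follows. Only the clock name CLK comes from the slide; the 10 ns period and the 4 ns and 6 ns OFFSET values are hypothetical.

```ucf
# PERIOD covers the flip-flop-to-flip-flop paths clocked by CLK
NET "CLK" TNM_NET = "CLK";
TIMESPEC "TS_CLK" = PERIOD "CLK" 10 ns HIGH 50%;

# Global OFFSET IN: every input pad (ADATA, CDATA, BUS[7..0]) to its flip-flop
OFFSET = IN 4 ns BEFORE "CLK";

# Global OFFSET OUT: every flip-flop to its output pad (OUT1, OUT2)
OFFSET = OUT 6 ns AFTER "CLK";
```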

Show Slide 253:

Path-Specific Constraint Example

A path-specific constraint can optimize as little as one path
This provides you greater control over the performance of your design and allows the implementation tools the greatest flexibility in meeting your performance and utilization needs
[Figure: the same example design (ADATA, CDATA, BUS [7..0], FLOP1 through FLOP5, CLK with a BUFG, OUT1, OUT2), used to illustrate a path-specific constraint]

Key Points
While global constraints are powerful because of their wide scope, path-specific constraints are also powerful because of their precision. By loosening or tightening the constraints on specific paths, you provide the implementation tools more flexibility and a greater chance of meeting all of your timing goals.

Show Slide 254:

The Constraints Editor

Creating path-specific constraints requires two steps
Step 1: Create groups of path endpoints
Step 2: Communicate the timing objective between the groups

Key Points
Creating path-specific timing constraints is a two-step process: grouping path endpoints and defining the constraint length. Groups of path endpoints can contain flip-flops, RAMs, latches, or pads. The most commonly used path-specific timing constraints are Slow/Fast Path Exceptions and Multicycle Paths.
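The two steps map onto two kinds of UCF statements: one that defines the groups and one that states the requirement between them. A minimal sketch of a slow path exception, assuming instances named FLOP1 and FLOP2 as on the earlier slides and a hypothetical 20 ns requirement:

```ucf
# Step 1: create groups of path endpoints (instance names assumed from the slide)
INST "FLOP1" TNM = "SRC_GRP";
INST "FLOP2" TNM = "DST_GRP";

# Step 2: communicate the timing objective between the groups.
# This FROM:TO constraint overrides the global PERIOD for these paths only.
TIMESPEC "TS_SRC_TO_DST" = FROM "SRC_GRP" TO "DST_GRP" 20 ns;
```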

TRAINER NOTE

Demo Instructions:
Opening a project and launching the Constraints Editor:
1. Open the ISE software.
2. Select File → Open Project.
3. Browse to the Review lab.
4. Select tc_review_lab.npl and click Open.
5. In the Source window, select the correlate_and_accumulate.ucf file.
6. In the Process window, double-click Create Timing Constraints.

Creating Groups
Show Slide 255:

Lessons

Overview
Creating Groups
OFFSET Constraints
Summary

Show Slide 256:

Creating Groups of Endpoints

Path-specific timing constraints will only be effective if path endpoints can be easily grouped together
Otherwise, constraining a large design would be time consuming and painstaking
The Constraints Editor makes this easy by allowing you to define groups of path endpoints (pads, flip-flops, latches, and RAMs)
Specific delay paths can then be constrained with advanced timing constraints

Key Points
Global constraints use predefined groups of path endpoints. Before you can create path-specific constraints, you must create your own groups of path endpoints.

Creating path-specific constraints requires grouping path endpoints and creating constraints between those groups. The best thing about the Constraints Editor is that it allows large quantities of paths to be constrained by grouping only a few components. It also allows you to constrain one or several paths with a single constraint.

The challenge when creating path-specific timing constraints is grouping path endpoints. This is sometimes difficult because synthesis tools do not always maintain instance or net names.

Show Slide 257:

Creating Groups of Endpoints

With the Constraints Editor, grouping path endpoints is made easy with the following options

By Nets
By Instance Name
By Hierarchy
By Element Type
By Clock Edge
Through Points
By DCM Output

Key Points

Group elements associated by nets: Group all path endpoints that are driven by a specific net (such as a clock enable).

Group elements by instance name: Group path endpoints by name (wildcards are allowed).
Key Points
Group elements by hierarchy: Group all path endpoints in a specific level of hierarchy.

Group elements by element type: Group synchronous endpoints (not pads) by output net name (wildcards are allowed). This method is mainly used by schematic designers, but it can also be used with HDL designs if you know the net names of interest.

Timing THRU Points: Group nets or 3-state buffers to be used as THRU points in path-specific constraints. Remember that 3-state buffers are not path endpoints.

Group elements by clock edge: Group synchronous elements that are clocked by the same edge of the same clock signal. This option is useful for designs that use DDR output flip-flops.

Group elements by DCM Output pins: Group together the output clock signals from one DCM component.

Show Slide 258:

Grouping by Nets or Output Net Name

Step 1: Enter a group name
Step 2: Select the type of net to search for
Clock or enable net
Optional filter string
Matching nets appear in the Available list
Step 3: Select nets and click Add
Nets appear in the Time Name Targets window

Key Points
The Group elements associated by Nets option, shown in the figure above, is used to select a control signal that connects to a group of registers, latches, or RAMs. All components driven by the control signal become a group of path endpoints and can be given a reference name, such as Control_Registers.

Most synthesis tools have schematic viewers that allow you to easily determine the clock enable signal name generated by the synthesis tool. The Xilinx Floorplanner or the FPGA Editor can also be used. Most designers find that the net names chosen by their synthesis tool are recognizable.

The Group elements by output net name option (not shown, but the dialog box is similar) is commonly used with designs that use schematic design flows; however, if your synthesis tool maintains the names of nets connected to the outputs of your synchronous elements, it can be useful.

TRAINER NOTE

Demo Instructions:
Creating a group of path endpoints:
1. In the Constraints Editor, click Create next to Group elements associated by Nets (TNM_NET).
2. Enter MY_CLKEN_GRP in the Time Name field.
3. Select Enable Nets from the drop-down list.
4. Click Add All to move the nets into the Time Name Targets window.
5. Click OK to create the group.

Show Slide 259:

Grouping by Instance Name or Hierarchy

Steps are the same
Design element types are different
Instance name: FFs, pads, latches, RAMs, CPUs, HSIOs
Hierarchy: User levels, levels created by Xilinx

Key Points
The Grouping by instance name option requires you to know the name of each resource that is to be a part of a group. If you preserve hierarchy during synthesis, finding the logic that you want to group can be easier. Instance names can be found in your synthesis tool's Schematic Viewer or the Xilinx Floorplanner.

The Grouping by hierarchy option (not shown) is not a commonly used method. The best use of this option is in a schematic flow or when your design contains cores. Both of these types of designs will contain both user and Xilinx levels of hierarchy. You can apply constraints to all of the cores in your design by searching for Xilinx levels of hierarchy.
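As a sketch, assuming hypothetical instance names, grouping by instance name with a wildcard and grouping everything under one hierarchical block could look like this in the UCF:

```ucf
# By instance name: every instance in u_ctrl whose name starts with state_reg
# (u_ctrl and state_reg are hypothetical names)
INST "u_ctrl/state_reg*" TNM = "CTRL_STATE";

# By hierarchy: a TNM on a hierarchical instance (here a hypothetical core)
# applies to the valid path endpoints contained in that block
INST "u_fifo_core" TNM = "FIFO_CORE";
```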

Show Slide 260:

Grouping by Clock Edge

Step 1: Enter a group name
Step 2: Select a previously defined group
Optional filter to help find the group
Step 3: Select a clock edge

Key Points
The Grouping by Clock Edge option is used to take an existing group of flip-flops and create a subgroup that is clocked on a specific clock edge.
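A minimal sketch of the same idea in UCF, assuming a hypothetical existing group DDR_OUT_FFS for the DDR output flip-flops, uses the RISING and FALLING keywords of TIMEGRP to split that group by clock edge:

```ucf
# Existing group of DDR output flip-flops (hypothetical instance names)
INST "ddr_out_reg*" TNM = "DDR_OUT_FFS";

# Subgroups clocked on each edge of the clock
TIMEGRP "DDR_OUT_RISE" = RISING "DDR_OUT_FFS";
TIMEGRP "DDR_OUT_FALL" = FALLING "DDR_OUT_FFS";
```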

Show Slide 261:

Grouping by DCM Outputs

Step 1: Enter a group name
Step 2: Select a DCM instance
Optional filter to help find the group
Step 3: Select outputs and click Add

Show Slide 262:

Timing THRU Points

Allows you to optimize paths through specific nets and 3-state buffers
In this example, a group of nets was named TEOUTS. A constraint can now reference this group so that only the delay paths through the TEOUTS nets are optimized
TPTHRU = TEOUTS
[Figure: registers, including MYCTR, with paths passing through the TEOUTS nets]

Key Points
THRU points allow you to select particular paths through nets and 3-state buffers.

If you use HDL, you can have difficulty determining what the net names are in your design. You can use your synthesis tool's Schematic Viewer or the Xilinx Floorplanner to find net names.

Show Slide 263:

Timing THRU Points

Step 1: Enter a TPTHRU name
Step 2: Select nets or 3-state buffers
Optional filter string
Step 3: Select items and click Add

Key Points

Remember that THRU points allow you to identify nets and 3-state buffers so that particular paths that use those resources can be specifically constrained.

This option is most often used for identifying false paths, which can occur when bidirectional buses are a part of your design.
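For example, a false path through a bidirectional bus might be expressed with a TPTHRU group and a TIG (timing ignore) constraint. This is a sketch only: the net names te_out<*> are hypothetical, the group name TEOUTS is borrowed from the earlier slide, and FFS is the predefined group of all flip-flops.

```ucf
# Identify the (hypothetical) bus nets as a THRU point
NET "te_out<*>" TPTHRU = "TEOUTS";

# Ignore timing on flip-flop-to-flip-flop paths that pass through those nets
TIMESPEC "TS_TEOUTS_FALSE" = FROM "FFS" THRU "TEOUTS" TO "FFS" TIG;
```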

TRAINER NOTE

Demo Instructions:
Creating a group of nets:
1. In the Advanced tab, click the Create button next to Timing THRU Points (TPTHRU).
2. Enter MY_MID_PT in the TPTHRU Name field.
3. Select tensout<0> through tensout<6> to be in this group. You can use the Filter field to help find the nets. Enter tens* in the Filter field and click Find to help narrow the search.
4. Click OK to create the group.

Show Slide 264:

Managing Groups

Groups that you have defined are written into the UCF
INST <element_name> TNM = <group_name>; OR
NET <net_name> TNM_NET = <group_name>; OR
TIMEGRP <group_name> = <elements>;
To add items to an existing group, click one of the grouping buttons and use the same time name
To delete a group, delete it with a text editor
You cannot remove items from a group with the Constraints Editor
Edit the UCF with a text editor

Key Points
The INST constraint is used when you create groups by instance name or level of hierarchy.
The NET constraint is used when you create groups by net.

The TIMEGRP constraint is used when you create groups by output net name.
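The sketch below, with hypothetical instance and net names, shows how repeating a group name adds members to a single group, how a TIMEGRP can select elements by type and output net name, and how existing groups can be combined into a larger one:

```ucf
# Two TNM statements with the same name build one group (hypothetical names)
INST "u_bus_if/addr_reg*" TNM = "BUS_IF_GRP";
INST "u_bus_if/data_reg*" TNM = "BUS_IF_GRP";

# Flip-flops whose output net names match status* (hypothetical pattern)
TIMEGRP "STATUS_GRP" = FFS("status*");

# Combine existing groups
TIMEGRP "IF_AND_STATUS" = "BUS_IF_GRP" "STATUS_GRP";
```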

OFFSET Constraints
Show Slide 265:

Lessons

Overview
Creating Groups
OFFSET constraints
Summary

Show Slide 266:

Review of Global OFFSET Constraints

Use the Pad to Setup and Clock to Pad columns to specify OFFSETs for all I/O paths on each clock domain
Easiest way to constrain most I/O paths
However, this can lead to an over-constrained design

Key Points
If you have large numbers of I/O pins with similar timing requirements, set the global OFFSET constraints to that requirement.

Then use path-specific OFFSET constraints to override the global constraints for I/O pins that have different requirements.
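Assuming hypothetical values, and borrowing the clock name CLK and output pin OUT1 from the earlier example design, a global OFFSET OUT with a pin-specific override might look like this in the UCF:

```ucf
# Global requirement shared by most output pins (hypothetical value)
OFFSET = OUT 12 ns AFTER "CLK";

# Pin-specific override for one output pin with a tighter requirement
NET "OUT1" OFFSET = OUT 8 ns AFTER "CLK";
```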

Show Slide 267:

Pin-Specific OFFSET Constraints

Use the Pad to Setup and Clock to Pad columns to specify OFFSET constraints for each I/O pin
Use this type of constraint when only a few I/O pins need different timing

Key Points

Pin-specific OFFSET In/Out constraints can be entered in the Ports tab of the Constraints Editor.

You can select a large number of I/O paths by holding down the Shift or Ctrl key and clicking each I/O pin under the appropriate column heading. After selecting the pads, right-click and select Clock to Pad or Pad to Setup.

Creating a large number of pin-specific constraints usually requires the implementation tools to take more time during Place & Route. To reduce compile time, creating group OFFSET In/Out constraints (the next few pages) is recommended.

TRAINER NOTE

Demo Instructions:

Click the Ports tab to view where you can enter OFFSET
constraints for specific inputs and outputs.

Show Slide 268:

Creating Groups of Pads

Groups of I/O pads can be made in the Ports section
Use Shift-click or Ctrl-click to select multiple pads
Right-click and select Pad to Setup or Clock to Pad and enter the length of the constraint

Key Points
This option constrains multiple paths to I/O pads at once.

For example, a global constraint of 20 ns on inputs is sufficient for most paths, but a single bus may need to be constrained to 10 ns. The global OFFSET can be 20 ns, and the path-specific OFFSET for the group of bus pins can be 10 ns. Group OFFSETs are easier to enter than pin-specific OFFSETs and are faster to compile.
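The 20 ns / 10 ns example can be sketched in UCF as below. The clock name CLK and the bus pad name pattern bus* are hypothetical; the pad group is created by element type (the predefined PADS group with a name pattern) and then given its own OFFSET IN.

```ucf
# Global OFFSET IN: sufficient for most input paths
OFFSET = IN 20 ns BEFORE "CLK";

# Group the bus pads by element type (hypothetical name pattern)
TIMEGRP "BUS_PADS" = PADS("bus*");

# Group OFFSET for the bus, overriding the global value for these pads
TIMEGRP "BUS_PADS" OFFSET = IN 10 ns BEFORE "CLK";
```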

You can also create groups of I/O pads by using the buttons in
the Advanced tab; however, I/O pads do not always have
common names for easy grouping. The Ports tab allows you to
easily create groups of pads with arbitrary names.
Key Points
This also makes it easy to create a group OFFSET constraint if your design is a double data rate (DDR) application.

Show Slide 269:

Creating Group OFFSET Constraints

OFFSET IN/OUT constraints can also be entered in the Advanced tab
The Pad to Setup and Clock to Pad options allow you to enter OFFSET IN/OUT constraints on specific groups of pads
Just group your pads by element type first

Key Points

Instead of using the Ports tab, you may find it easier to use the Advanced tab in some cases.

For example, if you want to constrain input paths that end at a specific group of registers called critical_inputs, specifying an OFFSET IN from All Pads to critical_inputs may be easier than grouping all the input pads that feed those registers.
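A sketch of an OFFSET IN from all pads that applies only to paths ending at the critical_inputs registers, assuming a hypothetical instance name pattern for those registers, a hypothetical clock name CLK, and a hypothetical 3 ns requirement:

```ucf
# Group the destination registers (hypothetical instance names)
INST "u_core/in_reg*" TNM = "critical_inputs";

# OFFSET IN from all input pads, restricted to paths ending at critical_inputs
OFFSET = IN 3 ns BEFORE "CLK" TIMEGRP "critical_inputs";
```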

Show Slide 270:

Group OFFSET Constraints

Select a group of pads
Enter a timing requirement
Optional: Change clock domain
Optional: Select a group of synchronous elements

Key Points

This dialog box appears when you select the Pad to Setup or
Clock to Pad buttons in the Ports tab or the Advanced tab.

Show Slide 271:

Source-Synchronous OFFSET Constraints

For source-synchronous inputs, you can specify the width of the valid data window by specifying a rising edge constraint and a falling edge constraint
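As an illustration, a DDR source-synchronous input could describe its data-valid windows with the VALID keyword, one OFFSET constraint per clock edge. The clock name RX_CLK and all timing values here are hypothetical.

```ucf
NET "RX_CLK" TNM_NET = "RX_CLK";
TIMESPEC "TS_RX_CLK" = PERIOD "RX_CLK" 5 ns HIGH 50%;

# Window for the rising edge: opens 1.25 ns before the edge and lasts 2.5 ns
OFFSET = IN 1.25 ns VALID 2.5 ns BEFORE "RX_CLK" RISING;

# Same window, relative to the falling edge
OFFSET = IN 1.25 ns VALID 2.5 ns BEFORE "RX_CLK" FALLING;
```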

Show Slide 272:

OFFSET Constraints with Two-Phase Clocks

OFFSET constraints define the relationship between the data and the reference clock edge at the pins of the FPGA
The reference clock edge is defined in the global PERIOD constraint with the HIGH or LOW keyword
If all I/Os are clocked on a single edge, use the HIGH or LOW keyword in the PERIOD constraint to define which edge is used
If both clock edges are used, use the opposite keyword in the OFFSET constraint
Relative to Clock Edge option

Key Points
If the HIGH keyword is used in the PERIOD constraint, then the reference clock edge is a rising edge. This is the default.

If the LOW keyword is used, then the reference clock edge is a falling edge.

If you need the OFFSET constraint to be able to reference both clock edges, use one keyword in the PERIOD constraint, and the opposite keyword when defining the OFFSET constraint.
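A minimal sketch of the opposite-keyword rule, assuming a hypothetical clock net CLK and hypothetical values: here the PERIOD uses LOW, so OFFSET constraints reference the falling edge by default, and the HIGH keyword on the OFFSET OUT makes that one constraint reference the rising edge instead.

```ucf
NET "CLK" TNM_NET = "CLK";
# LOW: the reference clock edge for OFFSETs is the falling edge
TIMESPEC "TS_CLK" = PERIOD "CLK" 10 ns LOW 50%;

# Inputs referenced to the default (falling) edge
OFFSET = IN 3 ns BEFORE "CLK";

# Opposite keyword: this output OFFSET references the rising edge
OFFSET = OUT 4 ns AFTER "CLK" HIGH;
```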

Summary
Show Slide 273:

Lessons
Overview
Creating Groups
OFFSET Constraints
Summary

Show Slide 274:

Apply Your Knowledge

1) How do path-specific timing constraints improve the performance of your design?
2) How would you constrain this design to obtain a maximum internal clock frequency of 100 MHz?

The input will be valid at least 3 ns before the rising edge of CLK. The output must be valid 4 ns after the falling edge of CLK.

3) Write the appropriate OFFSET constraints


[Figure: circuit with input IN, four D flip-flops, output OUT, clock CLK, and resets RESET_A and RESET_B]

Show Slide 275:

Summary

Path-specific constraints are used to override global constraints
Keeps your design from becoming over-constrained
Allows the software to make intelligent trade-offs to meet all of your performance goals
Creating path-specific constraints is a two-step process
Create groups of path endpoints
Communicate the timing objective between the groups
Path-specific OFFSET constraints can be entered on either the Ports tab or the Advanced tab
When using both clock edges for I/O, write separate OFFSET constraints for each clock edge

Where Can I Learn More?


Constraints Guide
Help → Software Manuals

Apply Your Knowledge Answers


Answers

1) How do path-specific timing constraints improve the performance of your design?
  Path-specific timing constraints provide more flexibility to the implementation tools for meeting all of your timing objectives.

2) How would you constrain this design to obtain a maximum internal clock frequency of 100 MHz?
  Enter a global PERIOD constraint of 10 ns on the CLK signal.

3) Write the appropriate OFFSET constraints.
  Assuming that the PERIOD constraint uses the HIGH keyword and a 50-percent duty cycle:
  OFFSET = IN 3 ns BEFORE CLK;
  OFFSET = OUT 4 ns AFTER CLK FALLING;
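The arithmetic behind answers 2 and 3 can be checked with a few lines of Python. This is a minimal sketch that assumes an ideal clock, a PERIOD constraint using the HIGH keyword, and a 50-percent duty cycle. The helper names are hypothetical and are not part of any Xilinx tool.

```python
def period_ns(freq_mhz: float) -> float:
    """Convert a clock frequency in MHz to a period in ns."""
    return 1000.0 / freq_mhz


def falling_edge_ns(period: float, duty_high_pct: float = 50.0) -> float:
    """Time of the falling edge, measured from the rising edge (HIGH keyword assumed)."""
    return period * duty_high_pct / 100.0


period = period_ns(100.0)           # 100 MHz -> 10 ns PERIOD constraint
fall = falling_edge_ns(period)      # the falling edge occurs 5 ns after the rising edge
# OFFSET = OUT 4 ns AFTER CLK FALLING, restated relative to the rising edge:
out_after_rise = fall + 4.0         # 9 ns after the rising edge
print(period, fall, out_after_rise)
```

Restating the output requirement relative to the rising edge shows why the falling-edge OFFSET is the tighter way to express it. The output must be valid 9 ns into a 10-ns cycle, not 4 ns after the start of the cycle.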

Transition to Path-Specific Timing Constraints

Path-Specific Timing Constraints


Purpose

After completing this module, you will be able to:
Constrain paths that cross between clock domains by using the Constraints Editor
Constrain multicycle paths by using the Constraints Editor
Define false paths by using the Constraints Editor
Describe how constraints are prioritized

Time

45 minutes
Process

This module describes some of the most common applications for path-specific timing constraints and how to make them with the Xilinx Constraints Editor.
Lessons

Introduction

Inter-Clock Domain Constraints

Multicycle Paths

False Paths

Miscellaneous Constraints

Summary


Introduction
Show Slide 276:

Path-Specific Timing
Constraints

Show Slide 277:

Objectives
After completing this module, you will be able to:

Constrain paths that cross between clock domains by using the Constraints Editor
Constrain multicycle paths by using the Constraints Editor
Define false paths by using the Constraints Editor
Describe how constraints are prioritized

Show Slide 278:

Timing Closure


Inter-Clock Domain Constraints


Show Slide 279:

Outline

Inter-Clock Domain Constraints


Multicycle Paths
False Paths
Miscellaneous Constraints
Summary


Show Slide 280:

Constraining Between Rising and Falling Clock Edges

The PERIOD constraint automatically accounts for two-phase clocks
  Includes adjustments for non-50-percent duty cycle clocks
Example: A PERIOD constraint of 10 ns on CLK will apply a 5-ns constraint between these two flip-flops
  No path-specific constraints are required for this case

[Figure: two flip-flops clocked on opposite edges of CLK, driving OUT]



Key Points
Recall that the PERIOD constraint allows you to specify the clock duty cycle. The implementation tools automatically reduce the length of the constraint when some flip-flops are triggered off the negative edge of the same clock.

If your HDL code contains some processes that are triggered on a rising edge and other processes that are triggered on a falling edge, your synthesis tool will create a circuit like the one shown.

If you manually create an inverted clock and use that clock in your HDL code, your synthesis tool can create logic different from what is shown. This can prevent the Xilinx software from correctly constraining these paths.
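The duty-cycle adjustment described in these Key Points can be sketched as simple arithmetic. This minimal Python sketch assumes a PERIOD constraint with the HIGH keyword, where the duty-cycle percentage gives the length of the high pulse. The function name is hypothetical, and the sketch does not model how the Xilinx tools compute the constraint internally.

```python
def edge_budgets(period_ns: float, high_pct: float = 50.0) -> dict:
    """Timing budgets between clock edges for a PERIOD ... HIGH clock.

    rise_to_fall: path launched on the rising edge, captured on the falling edge
    fall_to_rise: path launched on the falling edge, captured on the next rising edge
    rise_to_rise: ordinary single-edge path (the full period)
    """
    high = period_ns * high_pct / 100.0
    return {
        "rise_to_fall": high,
        "fall_to_rise": period_ns - high,
        "rise_to_rise": period_ns,
    }


print(edge_budgets(10.0))        # 50% duty cycle: 5 ns between opposite edges
print(edge_budgets(10.0, 40.0))  # 40% duty cycle: 4 ns rising-to-falling, 6 ns falling-to-rising
```

The 10-ns, 50-percent case reproduces the 5-ns constraint from the slide. A non-50-percent duty cycle makes the two half-cycle budgets unequal.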

Show Slide 281:

Constraining Between Related Clock Domains

Create a PERIOD constraint for one clock
Define all related clocks in terms of this PERIOD constraint
  The implementation tools will use the relationships to determine how to cross between clock domains
DCM with multiple outputs
  Define a PERIOD constraint on the input to the DCM
  The implementation tools will push the constraint onto each output
  All constraints will be defined relative to the original PERIOD constraint

Key Points
If your design contains clocks that have a fixed relationship in frequency and/or phase, you should define your global PERIOD constraints relative to each other.

To learn more about creating related PERIOD constraints, refer to the Global Timing Constraints module in the Fundamentals of FPGA Design course.

Show Slide 282:

Constraining Between Unrelated Clock Domains

In this example, the delay path between the two clock domains is not covered by either of the PERIOD constraints
  This is the default behavior
  A constraint is not technically needed, but you may want to constrain the path for completeness
You must add a synchronization circuit when crossing between unrelated clock domains

[Figure: registers in the CLK_A domain (PERIOD CLK_A) feeding registers in the CLK_B domain (PERIOD CLK_B), which drive OUT1]

Key Points
If the two clocks are asynchronous (no known phase relationship), then you should also insert a synchronization circuit between the clock domains.


Show Slide 283:

Constraining Between Unrelated Clock Domains

To constrain the path between the two clock domains (highlighted in gray)
  Define groups of registers CLK_A and CLK_B with the Group by Nets option
    Automatically done if you have specified a PERIOD constraint for both clock domains
  Place a Slow/Fast Exception between the two groups of registers

[Figure: the CLK_A and CLK_B register groups, each covered by its PERIOD constraint, with a 5-ns constraint on the path that crosses between them; OUT1 is the output]

Key Points
When crossing between unrelated clock domains, there should be a synchronization circuit to handle setup or hold violations; however, a timing constraint is useful to ensure that signals cross between the clock domains in a reasonable amount of time (for example, one clock period).

Show Slide 284:

Constraining Between Unrelated Clock Domains

Step 1: Create the groups by using the Group by Nets option
  Group by clock net
  Skip this step if PERIOD constraints are defined
Step 2: Create the constraint by clicking Slow/Fast Exceptions

Show Slide 285:

Constraining Between Unrelated Clock Domains

Enter a name for this constraint
  Must begin with TS
Select the groups that define the constraint
Specify the value of the constraint

Key Points

Creating the groups is not shown.
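The two-step procedure (group the endpoints by clock net, then place a constraint between the groups) can be modeled in a few lines of Python. This sketch is illustrative only. The register names, the TS_A2B name, and the 5-ns value are hypothetical. The emitted line follows the FROM TO form of the TIMESPEC example shown later in this module, not the Constraints Editor's exact output.

```python
from collections import defaultdict


def group_by_clock(reg_clock: dict[str, str]) -> dict[str, set[str]]:
    """Step 1: group registers (path endpoints) by the clock net that drives them."""
    groups: dict[str, set[str]] = defaultdict(set)
    for reg, clk in reg_clock.items():
        groups[clk].add(reg)
    return dict(groups)


def crossing_paths(paths, groups, src_grp, dst_grp):
    """Paths whose source is in src_grp and whose destination is in dst_grp."""
    return [(s, d) for s, d in paths if s in groups[src_grp] and d in groups[dst_grp]]


def from_to_timespec(name: str, src_grp: str, dst_grp: str, value_ns: float) -> str:
    """Step 2: express the Slow/Fast Exception as a FROM TO TIMESPEC line."""
    if not name.startswith("TS"):
        raise ValueError("TIMESPEC names must begin with TS")
    return f"TIMESPEC {name} = FROM {src_grp} TO {dst_grp} {value_ns:g} ns;"


# Hypothetical design: one register in the CLK_A domain, a two-register
# synchronizer in the CLK_B domain
regs = {"a_reg": "CLK_A", "b_sync1": "CLK_B", "b_sync2": "CLK_B"}
paths = [("a_reg", "b_sync1"), ("b_sync1", "b_sync2"), ("a_reg", "a_reg")]

groups = group_by_clock(regs)
print(crossing_paths(paths, groups, "CLK_A", "CLK_B"))  # only the domain-crossing path
print(from_to_timespec("TS_A2B", "CLK_A", "CLK_B", 5))
```

Only the path that actually crosses from CLK_A to CLK_B falls under the new constraint. Paths that stay inside either domain remain covered by their own PERIOD constraints.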


Multicycle Paths
Show Slide 286:

Outline

Inter-Clock Domain Constraints


Multicycle Paths
False Paths
Miscellaneous Constraints
Summary


Show Slide 287:

Multicycle Path Constraints

Multicycle paths occur when registers are not updated on consecutive clock cycles
  Always at least one clock cycle between updates
  Typically, the registers are controlled by a clock enable
A prescaled counter is one example
  Registers in COUT14 are updated every four clock cycles
  Paths between these registers are multicycle paths

[Figure: prescaled counter; block PRE2 (Q0, Q1) runs at 200 MHz on CLK, and its terminal count TC drives the CE of block COUT14 (Q2 through Q15), which updates at 50 MHz]

Key Points
Another common place to find multicycle paths is in state machines. A request for data may be sent in one state, but the state machine may need to go through multiple states before the data is used. The path from the data request back into the state machine would be a multicycle path, even though there is no clock-enable signal involved.

Show Slide 288:

Creating Multicycle Path Constraints

Step 1: Create a global PERIOD constraint (not shown)
Step 2: Create groups by using the Group by Nets option
  Group by enable net
Step 3: Click Multicycle Paths

Key Points
Before you can create a multicycle path constraint, you must first create a PERIOD constraint on the clock net. You then group the synchronous elements that contain the multicycle paths. Finally, you create the multicycle path constraint.
Show Slide 289:

Creating Multicycle Path Constraints

Enter a TIMESPEC name
Select the groups that were previously defined
Define the constraint relative to the PERIOD constraint

Key Points

When defining new constraints relative to existing constraints, the Xilinx software simply multiplies or divides the reference constraint value by the specified factor, keeping the units the same.

For example, if the PERIOD constraint ts_clk has been defined as a period length of 5 ns, you should multiply the constraint by four to define a multicycle path constraint (5 ns x 4 = 20 ns).

If the PERIOD constraint had been defined as a frequency of 200 MHz, you would divide the constraint by four in order to define the multicycle path constraint (200 MHz / 4 = 50 MHz).
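The unit rule above can be captured in a small helper. This is a hypothetical Python sketch, not a Xilinx API. It assumes that constraints are expressed only in ns or MHz.

```python
def relative_constraint(value: float, unit: str, factor: float) -> float:
    """Derive a multicycle constraint from a PERIOD constraint, keeping the units.

    A period (ns) is multiplied by the factor; a frequency (MHz) is divided by it.
    """
    if unit == "ns":
        return value * factor
    if unit == "MHz":
        return value / factor
    raise ValueError(f"unsupported unit: {unit}")


print(relative_constraint(5.0, "ns", 4))     # 20.0 (ns)
print(relative_constraint(200.0, "MHz", 4))  # 50.0 (MHz)
```

Both forms describe the same requirement: a 20-ns path budget corresponds to 1000 / 20 = 50 MHz.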

Show Slide 290:

Apply Your Knowledge

Background Information
  Prescaled 16-bit counter is created in two blocks
    Q0 and Q1 in block PRE2 toggle at 200 MHz
    Q[15:2] toggle every fourth clock edge (50 MHz)
  The design is fully synchronous because all registers share the same clock
  However, COUT14 registers are disabled 3/4 of the time so they do not have to meet a 200-MHz PERIOD constraint

[Figure: the prescaled counter; PRE2 (Q0, Q1) at 200 MHz drives the CE of COUT14 (Q2 through Q15) through TC]

Key Points
A prescaled counter uses a 2-bit counter (the prescaler) to generate a clock-enable signal that allows the most significant bits to toggle at less than the full clock rate. Designers sometimes use this type of counter because of its extremely high performance (often faster than traditional carry logic implementations).

Prescaled counters are faster because only the Least Significant Bits (LSBs) must toggle at the full clock rate. The Most Significant Bits (MSBs) are disabled three out of every four clock cycles because the prescaler counts to four and enables the MSBs only one out of every four clock cycles.

Because only the LSBs contain critical paths, the implementation tools have the greatest placement flexibility for the MSBs, and the counter can easily be placed to obtain peak performance.
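The one-in-four enable can be illustrated with a cycle-level model. This Python sketch is a behavioral model under simplifying assumptions: ideal registers, no reset logic, and names chosen to match the PRE2, TC, and COUT14 blocks on the slide. It is not the HDL used in the course.

```python
def simulate_prescaled_counter(cycles: int, width: int = 16):
    """Cycle-level model of a prescaled counter.

    PRE2 holds Q[1:0] and counts on every clock. Its terminal count (TC) drives the
    clock enable of COUT14, which holds Q[width-1:2] and advances only when TC is high.
    Returns the full count after each clock edge and the cycles on which COUT14 updated.
    """
    pre = 0                                # Q[1:0], runs at the full clock rate
    msb = 0                                # Q[width-1:2], clock-enabled
    msb_mask = (1 << (width - 2)) - 1
    counts, msb_update_cycles = [], []
    for cycle in range(cycles):
        tc = pre == 3                      # TC is evaluated before the clock edge
        if tc:                             # CE is high one cycle out of four
            msb = (msb + 1) & msb_mask
            msb_update_cycles.append(cycle)
        pre = (pre + 1) & 0b11
        counts.append((msb << 2) | pre)
    return counts, msb_update_cycles


counts, updates = simulate_prescaled_counter(12)
print(counts)    # 1 through 12, identical to an ordinary binary counter
print(updates)   # COUT14 changes only on cycles 3, 7, and 11
```

The full count matches an ordinary binary counter, while the COUT14 registers change state only every fourth cycle. This is why paths between them can be given a four-cycle (20-ns) budget.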


Show Slide 291:

Apply Your Knowledge

1) What constraints need to be placed on this design to ensure it will meet the performance objectives?
2) How would you enter these constraints through the Constraints Editor?
3) How do multicycle path constraints improve the performance of your design?

[Figure: the prescaled counter; PRE2 (Q0, Q1) at 200 MHz drives the CE of COUT14 (Q2 through Q15) through TC]

False Paths
Show Slide 292:

Outline

Inter-Clock Domain Constraints


Multicycle Paths
False Paths
Miscellaneous Constraints
Summary


Show Slide 293:

False Paths

The False Paths option prevents constraints from being applied to specific paths
Use the False Paths option to reduce the number of constrained paths in your design

Key Points

False paths are useful when your design has paths that are not
required to be constrained. Most commonly, these paths are
bidirectional paths that are not exercised during normal
operation; however, any path that you know will meet your
timing objectives can be defined as a false path.

Show Slide 294:

Defining False Paths

Use the False Paths (FROM:TO:TIG) option to define false paths between groups of path endpoints
  TIG = Timing IGnore
  Prevents any constraints from being applied to the paths
Paths through specific nets or 3-state buffers can be defined with the THRU Points option

What is wrong with this example?

Key Points

Use the False Paths option to identify paths between groups of path endpoints that should not have any constraints covering them. While this will not remove the constraint, it will remove these paths from the scope of the constraint.

In this example, all paths between flip-flops will be marked as false paths. This effectively negates the PERIOD constraint. To fix this problem, either change the groups of selected endpoints or define and select a THRU point.

Show Slide 295:

Defining False Paths by Nets

The False Paths by Nets option allows you to ignore timing constraints on a specific net
  Any delay path containing the RESET net will not be constrained
The Ignored TIMESPECs option allows specific constraints to be ignored

Key Points
This option prevents any constraint from being applied to paths that contain a specific net.

The Ignored TIMESPECs option prevents specific timing constraints from being applied to the selected nets. By default, all constraints are ignored for the selected nets. Select this option to ignore specific constraints, while still allowing the tools to apply other constraints to paths containing the selected nets.
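The difference between ignoring all constraints on a net and ignoring only selected TIMESPECs can be modeled as a filter over delay paths. This is a hypothetical Python sketch, not a description of how the Xilinx timing engine is implemented. The path list and the names ctrl_fb, RESET_FF, TS_CLK, and TS_OTHER are assumptions loosely based on the BIDIR_BUS and RESET examples in this lesson.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Path:
    src: str                  # launching register
    dst: str                  # capturing register
    nets: FrozenSet[str]      # nets the delay path passes through


def paths_covered(paths: List[Path], timespec: str,
                  tig_nets: Dict[str, Optional[FrozenSet[str]]]) -> List[Path]:
    """Return the paths that `timespec` still covers after False Paths by Nets.

    tig_nets maps a net to the TIMESPECs ignored on it. None means every
    constraint is ignored on that net (the default); a set models the
    Ignored TIMESPECs option.
    """
    covered = []
    for p in paths:
        ignored = any(
            net in tig_nets
            and (tig_nets[net] is None or timespec in tig_nets[net])
            for net in p.nets
        )
        if not ignored:
            covered.append(p)
    return covered


# Hypothetical paths loosely based on the BIDIR_BUS and RESET examples
paths = [
    Path("CONTROL", "STATUS", frozenset({"BIDIR_BUS"})),
    Path("CONTROL", "CONTROL", frozenset({"ctrl_fb"})),
    Path("RESET_FF", "STATUS", frozenset({"RESET"})),
]
tig = {
    "BIDIR_BUS": frozenset({"TS_CLK"}),  # ignore only the PERIOD constraint TS_CLK
    "RESET": None,                       # ignore all constraints on RESET
}

print([(p.src, p.dst) for p in paths_covered(paths, "TS_CLK", tig)])
print([(p.src, p.dst) for p in paths_covered(paths, "TS_OTHER", tig)])
```

The bus path drops out of TS_CLK but remains visible to any other constraint. The RESET path is excluded from every constraint, which matches the default behavior described above.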

Show Slide 296:

Apply Your Knowledge

4) If a PERIOD constraint were placed on this design, what delay paths would be constrained?
5) If the goal is to optimize the input and output times without constraining the paths between registers, what constraints are needed?
  Assume that a global PERIOD constraint is already defined

[Figure: a Control Register and a Status Register, with enables Control_Enable and Status_Enable, sharing the internal bus BIDIR_BUS(7:0), which connects to the pads BIDIR_PAD(7:0)]
Show Slide 297:

Answer

4) If a PERIOD constraint were placed on this design, what delay paths would be constrained?
  Paths between the control registers and the status registers would be constrained
  Paths from each register feeding back to itself are also constrained

[Figure: the Control Register and Status Register design, with BIDIR_BUS(7:0) and BIDIR_PAD(7:0)]

Key Points

Because 3-state buffers are not path endpoints, all delay paths
through 3-state buffers can be unnecessarily constrained when
you use only global constraints. In this case, removing
constraints between the registers that drive the bus can be
useful.

Show Slide 298:

Answer

5) If the goal is to optimize the input and output times without constraining the paths between registers, what constraints are needed?
  Enter OFFSET constraints in the Global tab
  Define False Paths by Nets
    Select the BIDIR_BUS[7:0] nets
    Select the global PERIOD constraint to be ignored

[Figure: the Control Register and Status Register design, with BIDIR_BUS(7:0) and BIDIR_PAD(7:0)]

Key Points

There are other ways to define the false paths:
  Create a THRU point (BIDIR_BUS) that includes the BIDIR_BUS[7:0] nets and define a path from All Flip-Flops through BIDIR_BUS to All Flip-Flops.
  Create a group (CONTROL) for the Control Register and a group (STATUS) for the Status Register and define false paths from CONTROL to STATUS and from STATUS to CONTROL. This option only works if there are no other paths between the registers that need to be covered by the PERIOD constraint.


Miscellaneous Constraints
Show Slide 299:

Outline

Inter-Clock Domain Constraints


Multicycle Paths
False Paths
Miscellaneous Constraints
Summary


Show Slide 300:

Miscellaneous Tab

Create area groups from TGs
  Good way to group logic without area constraints
Select nets to be routed on low-skew resources
  Use for high-fanout control signals
Mark asynchronous registers
  Prevents X propagation during simulation
Assign individual registers to IOBs
Define initial values for storage elements

Key Points

The Map Process Properties option allows you to globally merge registers into the IOBs. This option also allows you to identify specific registers to be merged.

Marking a register as asynchronous will prevent an X value from propagating during simulation when a setup/hold violation occurs. This constraint has no effect on implementation.

You do not need to mark clock nets with the USELOWSKEWLINES constraint. Clock signals that do not use global buffers automatically use the low-skew resources.

For more information on the constraints shown here, see the Constraints Guide in the online documentation (Help > Software Manuals).

Show Slide 301:

Prorating Constraints

Prorating allows the tools to use the most accurate information
  The implementation tools use the worst-case operating temperature and voltage for your chosen device package (85°C for Commercial, 100°C for Industrial)
Specify your own worst-case conditions
  This will prorate the device delay characteristics to accurately reflect your worst-case system conditions

Key Points
Prorating gives the implementation tools more accurate timing information. The new worst-case timing delays are then applied when timing constraints are used.

If you prorate your constraints, make sure that you enter the
worst-case temperature and VCC that your device might ever
encounter.

Timing reports contain only the worst-case operating condition delays. The Timing Analyzer can also create customized timing reports for different worst-case operating conditions.

Show Slide 302:

Timing Constraint Priority

Highest
False paths
  Must be allowed to override any timing constraint
FROM THRU TO
FROM TO
Pin-specific OFFSETs
Group OFFSETs
  Groups of pads or registers
Global PERIOD and OFFSETs
  Lowest priority constraints
Lowest

Key Points

This is the way constraints are prioritized. Priority also explains how the same path can be covered by multiple constraints: the tools apply the highest-priority constraint that covers the path. The more specific the constraint, the higher the priority. Also note that the value of the constraint has no effect on its priority.

Show Slide 303:

Timing Constraint Interaction

Whenever a path is covered by more than one constraint, the tools must choose which constraint to use for timing analysis
  If the constraints are of different types, the highest priority constraint is applied
  If the constraints are of the same type (example: FROM TO), the decision is more complex
Priority can be dictated with the PRIORITY keyword in the UCF
  Values from -1000 to 1000
  Lower number is higher priority
  Example: TIMESPEC TS_01 = FROM src TO dest 7 ns PRIORITY 1;

Key Points
If two constraints cover the same paths and have the same priority level, the software follows a set of rules to determine which constraint will be applied.
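The selection rule from the two priority slides can be sketched as follows. This Python model is a simplification under stated assumptions. It uses only the type ranking from the Timing Constraint Priority slide and the PRIORITY keyword, and it returns None rather than guessing at the tools' remaining tie-break rules. The constraint names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Constraint types from highest to lowest priority (Timing Constraint Priority slide)
TYPE_ORDER = ["TIG", "FROM_THRU_TO", "FROM_TO", "PIN_OFFSET", "GROUP_OFFSET", "GLOBAL"]


@dataclass(frozen=True)
class Constraint:
    name: str
    kind: str                        # one of TYPE_ORDER
    priority: Optional[int] = None   # UCF PRIORITY keyword; lower number wins


def governing_constraint(covering: List[Constraint]) -> Optional[Constraint]:
    """Pick the constraint applied to a path covered by several constraints.

    Returns None when the tie cannot be broken here; the tools apply their own
    additional rules in that case, which this sketch does not model.
    """
    best = min(TYPE_ORDER.index(c.kind) for c in covering)
    same_type = [c for c in covering if TYPE_ORDER.index(c.kind) == best]
    if len(same_type) == 1:
        return same_type[0]                        # different types: highest type wins
    if all(c.priority is not None for c in same_type):
        top = min(c.priority for c in same_type)   # lower PRIORITY value wins
        winners = [c for c in same_type if c.priority == top]
        if len(winners) == 1:
            return winners[0]
    return None


period = Constraint("TS_CLK", "GLOBAL")
ts01 = Constraint("TS_01", "FROM_TO", priority=1)
ts02 = Constraint("TS_02", "FROM_TO", priority=5)
tig = Constraint("TS_TIG", "TIG")

print(governing_constraint([period, ts02]).name)       # TS_02: FROM TO beats global PERIOD
print(governing_constraint([period, ts01, ts02]).name)  # TS_01: PRIORITY 1 beats PRIORITY 5
print(governing_constraint([ts01, ts02, tig]).name)     # TS_TIG: false paths override all
```

The constraint values never enter the decision, which is consistent with the Key Points note that a constraint's value has no effect on its priority.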


Summary
Show Slide 304:

Outline

Inter-Clock Domain Constraints


Multicycle Paths
False Paths
Miscellaneous Constraints
Summary


Show Slide 305:

Summary

Use a Slow/Fast Exception to constrain paths that cross between clock domains
Identifying multicycle and false paths allows the implementation tools to make appropriate trade-offs
  These paths will use slower routing resources, which frees up fast routing for critical signals
Prorating your operating conditions gives the tools the most accurate picture of your design environment
In general, more specific constraints have a higher priority than less specific constraints
Where Can I Learn More?

Constraints Guide
Help > Software Manuals

Apply Your Knowledge Answers


Answers

1) What constraints need to be placed on this design to ensure it will meet the performance objectives?
  Global PERIOD constraint of 5 ns (or 200 MHz)
  Multicycle path constraint of 5 x 4 = 20 ns (or 200 / 4 = 50 MHz)

2) How would you enter these constraints through the Constraints Editor?
  PERIOD constraint: Use the Global tab
  Multicycle path constraint
    Group the flip-flops in COUT14 by clock enable net (group name: MSB)
    Constrain from MSB to MSB

3) How do multicycle path constraints improve the performance of your design?
  They allow the implementation tools to place some logic further apart and use slower routing resources.

4) If a PERIOD constraint were placed on this design, what delay paths would be constrained?
  Paths between the control registers and the status registers would be constrained.
  Paths from each register feeding back to itself are also constrained.


5) If the goal is to optimize the input and output times without constraining the paths between registers, what constraints are needed?
  Enter OFFSET constraints in the Global tab.
  Define False Paths by Nets:
    Select the BIDIR_BUS[7:0] nets
    Select the global PERIOD constraint to be ignored

Transition to Lab 5: Achieving Timing Closure


Lab 5: Achieving Timing Closure


Purpose

After completing this lab, you will be able to:
Use timing reports more effectively
Enter false path timing constraints (TIGs) by using the Constraints Editor

Time

45 minutes
Process

This lab illustrates how to make path-specific timing constraints on a design and use some of the advanced implementation options in the ISE tools.
General Flow

Step 1: Evaluate the design performance with TIGs

Step 2: Remove the TIGs and implement the design

Step 3: Analyze the timing

Lab
Designing for Performance Lab Workbook
Refer to the separate lab workbook for the Achieving Timing Closure lab.

Transition to Advanced Implementation Options


Advanced Implementation Options


Purpose

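After completing this module, you will be able to:
Increase design performance by using advanced MAP and Place & Route options
Increase design performance by using the Xplorer tool
Save implementation time by using SmartGuide and partitions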

Time

30 minutes
Process

This module describes the advanced implementation options available in the ISE tools.
Lessons

Introduction

Overview

Advanced MAP and Place & Route Options

Xplorer

SmartGuide and Partitions

Power Optimization

Summary


Introduction
Show Slide 306:

Advanced Implementation
Options

Show Slide 307:

Objectives
After completing this module, you will be able to:

Increase design performance by using advanced MAP and Place & Route options
Increase design performance by using the Xplorer tool
Save implementation time by using SmartGuide and partitions

Show Slide 308:

Timing Closure


Overview
Show Slide 309:

Lessons

Overview
Advanced MAP and Place &
Route Options
Xplorer
SmartGuide and Partitions
Power Optimization
Summary


Show Slide 310:

Introduction

Xilinx recommends using the default options and global timing constraints the first time you implement a design
If your design does not meet timing goals, follow the recommended flow presented earlier
Early in the design cycle, examine ways of changing your HDL code
  Confirm that good coding styles were used
  Try synthesis options, such as retiming or adding pipeline stages, to reduce logic levels
  If you are early in the design cycle, you do not want to run a full implementation every time a change is made; this will be time consuming and frustrating
Increase the Place & Route effort level
Apply path-specific timing constraints for synthesis and implementation

Key Points

To check whether your timing constraints are reasonable before running the Place & Route process, examine the Post-Map Static Timing Report. The logic portion of your delay paths should consume no more than 60 to 70 percent of the timing budget.
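The 60-to-70-percent guideline amounts to a simple ratio check on each path in the Post-Map Static Timing Report. This Python sketch is hypothetical: the delay values are invented for illustration, and in practice the logic delay and the constraint value come from the report itself.

```python
def logic_fraction(logic_delay_ns: float, requirement_ns: float) -> float:
    """Fraction of the timing budget consumed by logic delay (Post-Map report)."""
    return logic_delay_ns / requirement_ns


def budget_is_reasonable(logic_delay_ns: float, requirement_ns: float,
                         limit: float = 0.70) -> bool:
    """Apply the 60-to-70-percent guideline; 0.70 is the upper end of that range."""
    return logic_fraction(logic_delay_ns, requirement_ns) <= limit


# Hypothetical paths against a 5-ns (200-MHz) PERIOD constraint
print(budget_is_reasonable(3.0, 5.0))   # True: logic uses 60 percent of the budget
print(budget_is_reasonable(4.2, 5.0))   # False: logic uses 84 percent, leaving little for routing
```

Checking after MAP catches unrealistic constraints before you spend time in Place & Route, because the remaining budget must cover routing delay.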

Show Slide 311:

When to Use Advanced Options

If timing is still not met, consider using advanced MAP or Place & Route (PAR) options
  MAP: Perform timing-driven packing
    Uses timing constraints to pack critical paths
  PAR: Extra Effort
Xplorer is an automated method for trying different combinations of implementation options
These options will increase the software runtime
  This module discusses the expected trade-offs and benefits of each option

Advanced MAP and Place & Route Options


Show Slide 312:

Lessons

Overview
Advanced MAP and Place & Route
Options
Xplorer
SmartGuide and Partitions
Power Optimization
Summary


Show Slide 313:

Timing-Driven Packing

Timing constraints are used to optimize which pieces of logic are packed into each slice
  Normal (standard) packing is performed
  PAR is run through the placement phase
  Timing analysis evaluates the amount of slack in constrained paths
  If necessary, packing changes are made to allow better placement
The output of MAP contains both mapping and placement information
  The Post-Map Static Timing Report contains more realistic net delays
  Place & Route runtime is reduced because some placement is already performed


Show Slide 314:

Example

Originally, the flip-flops were packed together into a slice
After placement and timing analysis, the flip-flops are packed into different slices to allow independent movement

[Figure: Standard Pack (FF1 and FF2 in one slice) versus Timing-Driven Pack (FF1 and FF2 in separate slices)]

Key Points

In this simple example, two flip-flops were originally packed into one slice. They may share common inputs or the packing may be necessary to fit the design into the target device.

During placement, it becomes clear that FF1 should move to the top of the die and FF2 should move to the bottom (in order to meet timing constraints).

If timing-driven packing is enabled, the design goes back into the MAP process with this knowledge. The flip-flops will be packed into two separate slices to allow independent movement.



Show Slide 315:

Turning on Timing-Driven Packing

Set the Property Display Level to Advanced
Check Perform Timing-Driven Packing and Placement
Set other options if needed

Key Points

To set the Property Display Level, open the Map Properties dialog box and select Advanced from the Property Display Level drop-down list at the bottom.

After you select Perform Timing-Driven Packing and Placement, other options become available. You can set the effort level that will be used (Map Effort Level), whether to use Extra Effort (for placement, covered in the next lesson), which Placer Cost Table to use (covered later), whether to use register duplication to improve timing, and whether global optimization routines should be run.

Register Duplication: Duplicates registers to improve timing when running timing-driven packing.

Global Optimization: This option directs MAP to perform global optimization routines on the fully assembled netlist before mapping the design. Global optimization includes logic remapping and trimming, logic and register replication and optimization, and logic replacement of 3-states.



Show Slide 316:

Trade-Offs

Typical performance improvement: five to eight percent
Has the greatest effect on high-density designs when unrelated packing has occurred
Look in the Map Report, Design Summary section
Number of slices containing unrelated logic
If no unrelated packing has occurred, performance improvement will be minimal
Density improvements are also seen
Runtime for the MAP process always increases
Up to 200 percent
But you recover some of this increased runtime by saving runtime during Place & Route


Key Points
Unrelated packing occurs when the software puts unrelated logic into the same slice to fit the design into the target device. This can affect performance because the pieces of logic in the slice may need to be placed in different locations to meet timing.

Timing-driven packing can fix this situation: timing analysis will show that the unrelated logic needs to be separated to meet timing.

If no unrelated packing has occurred, the only change that timing-driven packing can make to the design is to merge flip-flops into IOBs to meet OFFSET constraints.



Show Slide 317:

PAR Extra Effort

Only available when the Place & Route effort level is set to High
Two settings: Normal and Continue on Impossible
Use the Normal setting only
Continue on Impossible will run until user break (Ctrl-C)
Typical performance improvement: four percent
Runtime for the Place & Route process always increases
Potential 200-percent increase or more


Show Slide 318:

Setting Extra Effort

Set the Place & Route Property Display Level to Advanced
Set Place & Route Effort Level (Overall) to High
Set Extra Effort



Key Points

You can also set the Placer Effort and Router Effort separately.
If you set the Placer Effort to High, but leave the Router Effort
at Standard, the Extra Effort option will only be used during
placement. This trick can increase your productivity by
decreasing software runtime.


Xplorer
Show Slide 319:

Lessons

Overview
Advanced MAP and Place & Route Options
Xplorer
SmartGuide and Partitions
Power Optimization
Summary


Show Slide 320:

Xplorer

Iterates through the implementation process, trying different combinations of properties
Automatically stops when all timing constraints are met
Options used include
Overall Effort Level (MAP and PAR)
Timing-Driven Map (MAP)
Extra Effort Level (MAP and PAR)
Multi-Pass Place and Route (PAR)
Global Optimization (MAP)
Retiming (MAP)
Register Duplication (MAP)
Logic Optimization (MAP)
Optimization Strategy/Cover Mode (MAP)
Allow Logic Optimization Across Hierarchy (MAP)

Key Points
The combination of synthesis, MAP, and PAR options varies by device family. For the Virtex-4 FPGA, Xplorer uses all of the options listed. For the Virtex-5 FPGA, Xplorer uses all but Multi-Pass Place and Route.

Because your design can be effectively resynthesized when using Xplorer, it is important to understand that some nodes may be eliminated or renamed in the process, which makes debugging and timing simulation more difficult. You can still use the KEEP attribute in your HDL to maintain nodes for testing or timing simulation; KEEP prevents any of these options from changing the originally synthesized node.

Xplorer has 20 pre-assigned combinations of options for each device family. These combinations have been shown to be among the best of the combinations tested. Xilinx recommends that you try the default number of iterations for your device family to determine which combination of options is best for your design. Results vary by design, so there is no way to determine in advance which set of options will work best. If you have enough time, you may wish to run all 20 combinations of options.
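The Xplorer behavior described above (try up to 20 pre-assigned option combinations, stop as soon as every timing constraint is met, and otherwise keep the best result) can be modeled as a simple search loop. The following Python sketch is illustrative only and is not how Xplorer is implemented: `run_implementation` is a hypothetical callback standing in for one MAP/PAR run that returns a timing score, and the option names are made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Options = Dict[str, object]


def explore(
    combinations: List[Options],
    run_implementation: Callable[[Options], int],
    max_iterations: int = 20,
) -> Tuple[Optional[Options], Optional[int], int]:
    """Try option combinations in order and keep the best (lowest) timing score.

    run_implementation is a hypothetical callback that implements the design
    with the given options and returns its timing score (0 = all constraints met).
    Returns (best_options, best_score, iterations_run).
    """
    best_options: Optional[Options] = None
    best_score: Optional[int] = None
    iterations = 0
    for options in combinations[:max_iterations]:
        iterations += 1
        score = run_implementation(options)
        if best_score is None or score < best_score:
            best_options, best_score = options, score
        if score == 0:  # timing closure achieved: stop early
            break
    return best_options, best_score, iterations


if __name__ == "__main__":
    # Hypothetical option sets and made-up timing scores, for illustration only.
    combos = [
        {"effort": "high"},
        {"effort": "high", "timing_driven_map": True},
        {"effort": "high", "timing_driven_map": True, "retiming": True},
    ]
    scores = iter([1200, 350, 0])
    print(explore(combos, lambda opts: next(scores)))
```

With the made-up scores shown, the loop stops after the third combination because its timing score is 0; if no combination reaches 0, the lowest-scoring one is returned after all iterations have run.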

Show Slide 321:

Xplorer Options

Overall Effort Level (MAP and PAR): Enables PAR to work longer and harder
Timing-Driven Map (MAP): Enables MAP to group timing-critical logic in the same slice or CLB
Extra Effort Level (MAP and PAR): Even longer and harder
Multi-Pass Place and Route (PAR): Enables you to generate different results with cost tables (not recommended for the Virtex-5 FPGA)
Global Optimization (MAP): Enables re-mapping, logic trimming, logic and register duplication, and logic optimization
Retiming (MAP): Enables register migration
Register Duplication (MAP): Duplicates registers to reduce fanout
Logic Optimization (MAP): Duplicates logic to reduce logic levels
Optimization Strategy/Cover Mode (MAP): Controls how MAP assigns logic to LUTs
Allow Logic Optimization Across Hierarchy (MAP): Last effort to reduce logic levels
Key Points

Timing-Driven MAP: Allows the router to use the fastest routing resources available around each CLB.

Extra Effort Level: Uses the Normal setting. (The Continue on Impossible setting runs continuously and requires Ctrl+C to stop.)

Multi-Pass Place and Route (MPPAR): Uses a different algorithm to place and route the design. The caveat is that there is no way to determine in advance which of the 100 cost tables will work best for your design, so when you try MPPAR, Xilinx recommends saving all or most of your iterations so that you can compare them. Be aware that this essentially runs additional PAR iterations: if your original PAR run took four hours and you try 10 cost tables, you are setting your computer to work for approximately 40 hours.

Global Optimization: Enables logic remapping (grouping of nodes into LUTs), logic trimming (removal), logic and register replication (for high-fanout nets), and logic optimization (besides Boolean optimization, XST can duplicate logic to reduce logic levels).

Retiming: Enables register migration forward or backward on a timing-critical path with the intention of balancing that path.

Register Duplication: Duplicates registers in an effort to reduce the fanout of nets that are on a timing-critical path.

Logic Optimization: Besides the standard Boolean optimization techniques that are common in synthesis, XST can duplicate logic to reduce logic levels.

Optimization Strategy/Cover Mode: Controls how MAP assigns logic to LUTs. The Area option reduces the overall number of LUTs in the design. The Speed option reduces the number of logic levels. The Balanced option blends the two modes.

Logic Optimization Across Hierarchy/Ignore Keep Hierarchy: Although Xilinx recommends that you maintain design hierarchy (so that more node names are kept for debugging and timing simulation), Xplorer can, as a last resort, selectively remove a design's hierarchy.
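Register duplication can be pictured as splitting one high-fanout register into several copies, each driving a subset of the original loads. This Python sketch assumes a simple fanout limit and groups loads contiguously; the real tools decide when and how to duplicate based on timing and placement, and the `_dupN` copy names are hypothetical.

```python
import math
from typing import Dict, List


def duplicate_register(reg: str, loads: List[str], max_fanout: int) -> Dict[str, List[str]]:
    """Split the loads of one register across enough copies that no copy
    drives more than max_fanout loads. Copy names are illustrative only."""
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    copies = max(1, math.ceil(len(loads) / max_fanout))
    result: Dict[str, List[str]] = {}
    for i in range(copies):
        name = reg if i == 0 else f"{reg}_dup{i}"
        # Each copy takes the next contiguous group of loads; a real tool
        # would group loads by physical location instead.
        result[name] = loads[i * max_fanout:(i + 1) * max_fanout]
    return result


if __name__ == "__main__":
    loads = [f"ff{i}" for i in range(10)]  # made-up load names
    for copy, driven in duplicate_register("en_reg", loads, 4).items():
        print(copy, len(driven))
```

Every copy must still share the original register's D input and clock so that all copies carry the same value; only the load on each output net shrinks.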

Show Slide 322:

Running Xplorer

Right-click Implement Design and select Properties
Select Xplorer Properties
Select Timing Closure from the Xplorer Mode drop-down list
Set other options and click OK
Double-click Implement Design


Key Points

Xplorer properties:
Xplorer Mode: Select Timing Closure to enable Xplorer.
Turn Off Xplorer After Run Completes: By default, after
Xplorer completes, the mode is set back to Off so that the
next implementation will not use Xplorer. Select No to
ensure that Xplorer is used every time the implementation
process is run.
Maximum Number of Iterations: Up to 20 iterations can be
run. Xplorer will stop when timing closure is achieved or
after the maximum number of iterations.
Enable Retiming: Available for Virtex-4 and Virtex-5 FPGA
designs only. This option allows Xplorer to use the retiming
option during the MAP process to move registers forward
or backward to balance the delays between timing paths.
Macro Search Path: This is the same as the Translate option.

Other Xplorer Command Line Options: Xplorer can also be
run from the command line. Xplorer also has an additional
mode called Best Performance Mode where you are able to
specify the name of a clock signal. This option does not
allow you to specify more than one clock and it does not
allow you to optimize the entire design, just the logic on one
clock domain. Use this option at your own discretion.
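Retiming (register migration), as used by the Enable Retiming option above, can be illustrated on a single path: a register placed within a chain of logic elements is moved forward or backward so that the two resulting stages have more nearly equal delays. The sketch below assumes one path, integer delays in picoseconds, and no setup or clock-to-out overhead; it is a toy model of the idea, not MAP's retiming algorithm.

```python
from typing import List, Tuple


def stage_delays(delays_ps: List[int], cut: int) -> Tuple[int, int]:
    """Delays of the two stages when the register sits after the first `cut` elements."""
    return sum(delays_ps[:cut]), sum(delays_ps[cut:])


def retime(delays_ps: List[int], cut: int) -> Tuple[int, int]:
    """Move the register forward or backward along the path to the position
    that minimizes the worst stage delay.

    Returns (new_cut, worst_stage_delay_ps); the original position is kept on ties.
    """
    best_cut, best_worst = cut, max(stage_delays(delays_ps, cut))
    for k in range(len(delays_ps) + 1):
        worst = max(stage_delays(delays_ps, k))
        if worst < best_worst:
            best_cut, best_worst = k, worst
    return best_cut, best_worst


if __name__ == "__main__":
    path = [1200, 800, 1500, 900, 600]  # made-up logic-element delays (ps)
    print(stage_delays(path, 1))  # register after the first element: unbalanced
    print(retime(path, 1))        # register moved to the best-balanced position
```

For this made-up path, moving the register one element later reduces the worst stage from 3800 ps to 3000 ps, which is the kind of balancing the retiming option aims for on real timing-critical paths.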
Show Slide 323:

Xplorer Results

Xplorer compares the results of all iterations
Best result is saved to the project directory
All other results are deleted (unless you run from the command line)
Information on all iterations is available in the Design Summary screen
Xplorer also allows you to set the best options (found after running Xplorer) and then run Multi-Pass Place and Route (MPPR)


Key Points

Xplorer uses the timing score to compare results. The timing score is the total number of picoseconds (ps) by which all missed constraints fail to meet their requirements. A timing score of 0 indicates that all timing constraints were met.

Running MPPAR is not recommended for the Virtex-5 FPGA.
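As a rough model of the timing score, the sketch below sums, in picoseconds, how far each failing path misses its requirement. It assumes per-path slack values are available (for example, transcribed from a timing report) and that the score is a plain sum of negative slack; the tools' exact accounting may differ, and the constraint names are hypothetical.

```python
from typing import Dict, List


def timing_score(slacks_ps: Dict[str, List[int]]) -> int:
    """Total negative slack, in ps, over all failing paths of all constraints.

    slacks_ps maps a constraint name to the slack of each analyzed path;
    negative slack means that path misses its requirement.
    A result of 0 means every constraint is met.
    """
    return sum(-slack for paths in slacks_ps.values() for slack in paths if slack < 0)


if __name__ == "__main__":
    report = {"TS_clk": [120, -85, -40], "TS_OFFSET_OUT": [15]}  # made-up slacks
    print(timing_score(report))
```

In this made-up case two paths of TS_clk fail by 85 ps and 40 ps, giving a score of 125; a run that met every constraint would score 0.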


SmartGuide and Partitions


Show Slide 324:

Lessons

Overview
Advanced MAP and Place & Route Options
Xplorer
SmartGuide and Partitions
Power Optimization
Summary


Show Slide 325:

SmartCompile

Two strategies for maintaining some PAR results while still making some changes to a design
Partitions are used to maintain implementation results while still making design changes
This is an instance-based method for preserving hierarchical blocks in a design
You do not have to re-verify a preserved partition
SmartGuide is used to maintain timing results while still making design changes
This is a timing-based method for preserving parts of a design that have not changed
Implementation tools have the flexibility to not maintain logic if it helps other paths to meet timing constraints
Why do you care?
Saves verification time
Faster implementation time



Key Points

One of the biggest challenges is preserving timing when a design is modified. For example, part of a design may contain a critical timing path that was met only with a great deal of effort (multiple implementations with different options and/or detailed timing constraints). When another part of the design is then modified, that critical timing path may fail to meet timing. This would normally require you to modify your constraints and/or reimplement the design because of the new changes.

Verification is reduced with both of these flows because, if a block of the design is exactly preserved, it does not need to be re-verified. Both of these flows allow you to maintain parts of your design.

Preserving a block of the design rather than reimplementing it is generally faster. There are edge cases where this is not true because of the interaction between the preserved portion of the design and the new or modified portion being implemented.

These two design preservation techniques are not compatible with each other. A design can use one or the other, but not both at the same time.



Show Slide 326:

SmartGuide
Timing Preservation in the Midst of Changes
Use SmartGuide when you want to minimize the impact of a small change
Turn on SmartGuide by right-clicking the top level of your design hierarchy and selecting Use SmartGuide
Also supported with Tcl and command line scripts
[Figure: physical layout before and after a small design change implemented with SmartGuide]


Key Points
This flow requires that you have already implemented a successful design, which means that your original design should meet your timing constraints. Your good timing results will be maintained, and paths that fail to meet your timing constraints will not be maintained.

This flow allows the implementation tools to preserve as much as possible of what meets your timing constraints, but it does not guarantee that all of the paths that meet your timing constraints will be preserved. Some paths may be changed to help other failing paths meet their timing constraints.

This flow will save a significant amount of implementation time.

SmartGuide information will be included in the MAP and PAR reports generated by the implementation tools.


Note that a LUT can still be guided even if its equation varies between iterations. After the new logic is added, the tools complete a clean-up phase in which critical paths from the new and the old logic may be re-placed and re-routed to help meet timing constraints. This phase greatly improves the chances that the tools will meet all of your timing objectives.

SmartGuide can be used after a first implementation has been completed.

Show Slide 327:

Partitions
Implementation Preservation
Partitions guarantee exact preservation of implementation results
Provides control over what is preserved
Assert partitions by right-clicking each level of hierarchy to be maintained and selecting New Partition
[Figure: (1) Set partitions on the logical design (HDL): Top, A1, A2; (2) Implement Design: original physical layout; (3) Make changes, for example C modified while A and B are preserved: physical layout after change]


Key Points

This flow requires that you have already implemented a successful design, which means that your original design should meet your timing constraints. Any block that you want preserved (by defining a partition) will be preserved exactly.

The remaining logic that is not preserved is then reimplemented, which saves implementation time.


Good hierarchical design practices must still be used. This flow is also supported with Tcl and command line scripts. You do not have to re-verify a preserved partition. Partitions are set on hierarchical blocks.

Partitions can also be set when using Synplify Pro software. Just set a compile point for each level of hierarchy that will be a partition.

Partition information is included in the XST, MAP, and PAR reports generated by the implementation tools.

Partitions must be set before the first synthesis of your design to ensure that the original synthesis output can be matched.

By placing a partition on a hierarchical boundary, you guarantee that the interface on that boundary will not change. A timing-critical path can still cross a partition boundary, however; when this occurs, registering the outputs at the partition boundary is recommended.


Power Optimization
Show Slide 328:

Lessons

Overview
Advanced MAP and Place & Route Options
Xplorer
SmartGuide and Partitions
Power Optimization
Summary


Show Slide 329:

Power Optimization

PAR has a power reduction option
Optimizes routing to reduce power consumption
Tries to reduce the overall routing used at the expense of timing and implementation time
XST has a power optimization switch
Helps map logic to block RAMs and DSP slices, which use less power (Virtex-4 and Virtex-5 FPGAs)

Key Points
The PAR power reduction switch tries to route a high-fanout net so that as much of the common wire as possible is shared.


Summary
Show Slide 330:

Lessons

Overview
Advanced MAP and Place & Route Options
Xplorer
SmartGuide and Partitions
Power Optimization
Summary


Show Slide 331:

Apply Your Knowledge

1) Under what conditions will timing-driven packing have the most impact on
design performance?

2) What is the trade-off when using PAR with the Extra Effort option?

3) How does Xplorer help to improve design performance?

Show Slide 332:

Summary

The timing closure flow still applies
Make certain that you have tried the options included in the timing closure flow diagram if you have timing problems
Xplorer has a number of MAP and PAR options it can run for you
SmartGuide and partitions enable you to save successful results and reduce your implementation time


Where Can I Learn More?


Online help
Click the Help button in the Process Properties window
Development System Reference Guide: MAP and PAR chapters
Help > Software Manuals
Documentation can also be installed on your local machine
Application Notes
Help > Xilinx on the Web > Application Notes
Application Note XAPP918: Incremental Design Reuse and Partitions


Apply Your Knowledge Answers


Answers

1) Under what conditions will timing-driven packing have the most impact on design performance?

When unrelated logic is packed together into the same slice, which typically occurs at high device utilization (usually over 70%).

2) What is the trade-off when using PAR with the Extra Effort option?

PAR runtime can increase by a factor of two or more.

3) How does Xplorer help to improve design performance?

By automatically iterating through different implementation options.

Transition to Lab 6: Designing for Performance


Lab 6: Designing for Performance


Purpose

After completing this lab, you will be able to:


Utilize the Overall Effort Level, Timing-Driven Packing, and Extra Effort Level implementation options to improve design performance
Utilize Multi-Pass Place & Route (MPPR) to try to achieve timing closure

Time
30 minutes

Process
This lab illustrates how to improve design performance and maximize results solely with advanced implementation options.

General Flow
Step 1: Implement with higher effort levels
Step 2: Implement with MPPR
Step 3: Analyze the MPPR timing


Lab
Designing for Performance Lab Workbook
Refer to the separate lab workbook for the Designing for Performance lab.

Transition to Power Estimation


Power Estimation
Purpose

After completing this module, you will be able to:


List the three phases of the design cycle where power calculations can be performed
Estimate power consumption by using the XPower Estimator spreadsheet
Estimate power consumption by using the XPower Analyzer software

Time
30 minutes

Process
This optional module describes the power estimation capabilities included with the ISE tools.

Lessons
Introduction
Overview
XPower Estimator
Using the XPower Analyzer Software
Summary


Introduction
Show Slide 333:

Power Estimation

Show Slide 334:

Objectives
After completing this module, you will be able to:

List the three phases of the design cycle where power calculations can be performed
Estimate power consumption by using the XPower Estimator spreadsheet
Estimate power consumption by using the XPower Analyzer software


Overview
Show Slide 335:

Lessons

Overview
XPower Estimator
Using the XPower Analyzer Software
Summary


Show Slide 336:

Power Consumption Overview
As devices become larger and faster, power consumption goes up
First-generation FPGAs had
Lower performance
Lower power requirements
No package power concerns
Today's FPGAs have
Much higher performance
Higher power requirements
Package power limit concerns
[Figure: power consumption vs. performance (MHz) for low-density and high-density real-world designs, compared with the package power limit (PMAX)]

Key Points
The first generation of FPGAs was relatively small in size and slow in performance, and power consumption rarely exceeded the operating envelope of commonly available packages. Given the density and performance levels of the new generation of FPGA devices, however, power consumption issues can no longer be ignored.

Selecting the correct package (in particular, one that can dissipate heat efficiently away from the silicon) is now also an important design issue.

The Virtex family of FPGAs has gone one step further by incorporating thermal management on chip, allowing for active monitoring of the silicon via dedicated pins.

Show Slide 337:

Power Consumption Concerns
High-speed and high-density designs require more power, leading to higher junction temperatures
Package thermal limits exist
125°C for plastic
150°C for ceramic
Power directly limits
System performance
Design density
Package options
Device reliability


Key Points

Junction temperature within an FPGA device is a function of the power consumption and the thermal resistance of the selected package.

Several factors determine power consumption within an FPGA, including supply voltage, system speed, and device utilization.

As devices get bigger and faster, power consumption can become a limiting factor in determining device utilization and performance. For example, you may not be able to use all of the resources of a device, or run the FPGA as fast as possible, without risking reliability problems caused by overheating.

Show Slide 338:

Estimating Power Consumption
Estimating power consumption is a complex calculation
Power consumption of an FPGA is almost exclusively dynamic
Power consumption is dependent on design and is affected by
Output loading
System performance (switching frequency)
Design density (number of interconnects)
Design activity (percent of interconnects switching)
Logic block and interconnect structure
Supply voltage
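The factors listed above all feed the textbook CMOS switching-power relation: each net dissipates roughly one-half C·V² of energy per transition, so power grows with switched capacitance (output loading and interconnect), with the square of the supply voltage, and with the toggle rate. This sketch applies that generic formula to a few nets; it is not XPower's internal model, and the net names, capacitances, and toggle rates are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Net:
    name: str
    capacitance_f: float  # switched capacitance in farads (routing + loads)
    toggle_hz: float      # transitions per second (rising and falling edges)


def dynamic_power_w(nets: Iterable[Net], vdd: float) -> float:
    """Sum of 0.5 * C * V^2 * transitions-per-second over all nets (watts)."""
    return sum(0.5 * n.capacitance_f * vdd ** 2 * n.toggle_hz for n in nets)


if __name__ == "__main__":
    nets = [
        Net("data_bus0", 10e-12, 50e6),  # made-up values
        Net("ctrl", 4e-12, 5e6),
    ]
    print(f"{dynamic_power_w(nets, 1.2) * 1e3:.3f} mW")
```

Doubling a net's toggle rate doubles its contribution, while doubling the supply voltage quadruples it, which is why supply voltage and switching frequency appear in the list above.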

Show Slide 339:

Estimating Power Consumption
Power calculations can be performed at three distinct phases of the design cycle
Concept phase: A rough estimate of power can be calculated based on estimates of logic capacity and activity rates
Use the XPower Estimator spreadsheet
Design phase: Power can be calculated more accurately based on detailed information about how the design is implemented in the FPGA
Use the XPower Analyzer software
System integration phase: Power is calculated in a lab environment
Use actual instrumentation
Accurate power calculation at an early stage in the design cycle will result in fewer problems later

Key Points

Estimating power consumption usually has one of two goals: thermal reliability evaluation or power-supply sizing.

The XPower Analyzer software bridges the gap between the XPower Estimator and lab measurements by using the implemented design files to estimate power consumption more accurately.

Show Slide 340:

Activity Rates
Accurate activity rates (also known as toggle rates) are required for meaningful power calculations
Clocks and input signals have an absolute frequency
Synchronous logic nets use a percentage activity rate
One hundred percent indicates that a net is expected to change state on every clock cycle
Allows you to adjust the primary clock frequency and see the effect on power consumption
Can be set globally to an average activity rate, or on groups or individual nets
Logic elements also use a percentage activity rate
Based on the activity rate of output signals of the logic element
Logic elements have capacitance
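A percentage activity rate translates into an absolute toggle rate relative to the clock, which is why changing the primary clock frequency rescales the power of every synchronous net. The following sketch models a global default rate that is overridden first by group settings and then by per-net settings; that precedence order and all names are assumptions for illustration, not XPower's settings interface.

```python
from typing import Dict, List, Optional, Tuple


def toggle_rate_hz(clock_hz: float, activity_pct: float) -> float:
    """Convert a percentage activity rate into transitions per second.

    100 percent means the net changes state on every clock cycle.
    """
    return clock_hz * activity_pct / 100.0


def resolve_activity(
    nets: List[str],
    default_pct: float,
    groups: Optional[Dict[str, Tuple[List[str], float]]] = None,
    net_pct: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Apply the global default, then group settings, then per-net settings
    (later settings override earlier ones). Unknown net names are ignored."""
    pct = {net: default_pct for net in nets}
    for members, value in (groups or {}).values():
        for net in members:
            if net in pct:
                pct[net] = value
    for net, value in (net_pct or {}).items():
        if net in pct:
            pct[net] = value
    return pct


if __name__ == "__main__":
    nets = ["addr0", "addr1", "cnt_lsb", "ctrl"]  # hypothetical net names
    pct = resolve_activity(
        nets, 12.5,
        groups={"addr_bus": (["addr0", "addr1"], 25.0)},
        net_pct={"cnt_lsb": 100.0},
    )
    for clock_hz in (100e6, 200e6):  # doubling the clock doubles every toggle rate
        print(clock_hz, {n: toggle_rate_hz(clock_hz, p) for n, p in pct.items()})
```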


XPower Estimator
Show Slide 341:

Lessons

Overview
XPower Estimator
Using the XPower Analyzer Software
Summary


Show Slide 342:

XPower Estimator
www.xilinx.com/power
Excel spreadsheets with power estimation formulas built in
Enter design data in white boxes
Power estimates are shown in gray boxes
Sheets
Summary (device totals)
Logic and I/O
Block RAMs and FIFOs
DCMs and PLLs
DSP48
PPC and MGT

Key Points
XPower Estimators are a set of Excel spreadsheets that can be found on the Web at www.xilinx.com/power. The spreadsheets currently support Virtex-4, Virtex-5, and Spartan-3E FPGAs.

This Web page also provides access to power estimation tools for older Xilinx device families. Some tools are Web-based, and some are Excel spreadsheets using a different format than the XPower Estimators.

Show Slide 343:

Web Power Tool: Summary and Quiescent

Show Slide 344:

Web Power Tool: Logic, Memory, and DSP48


Show Slide 345:

Web Power Tool: DCM and I/O


Using the XPower Analyzer Software


Show Slide 346:

Lessons

Overview
XPower Estimator
Using the XPower Analyzer Software
Summary


Show Slide 347:

What is XPower Software?
A utility for estimating the power consumption and junction temperature of FPGA and CPLD devices
Reads an implemented design (NCD file) and timing constraint data
You supply activity rates
Clock frequencies
Activity rates for nets, logic elements, and output pins
Capacitive loading on output pins
Power supply data and ambient temperature
Detailed design activity data from simulation (VCD file)
The XPower tool calculates the total average power consumption and generates a report

Key Points
The XPower tool is accurate to within +/-10 percent, given accurate activity rates. The XPower software can calculate only average power; it cannot predict the power spikes that can occur.

Supported device families:
FPGAs: Virtex-5, Virtex-4, Spartan-3E, Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Spartan-II, Spartan-IIE, and Spartan-3 devices
CPLDs: CoolRunner XPLA3, CoolRunner-II, and CoolRunner-IIS devices

A Value Change Dump (VCD) file is created by a simulation tool (Mentor Graphics ModelSim, for example). Defined by the Verilog standard (IEEE 1364), the VCD file contains information about signal or variable value changes. The Xilinx XPower tool can read the VCD file to provide more accurate power estimation.

For the XPower tool to match instance and net names from the VCD file to items in the NCD file, the VCD file must come from a post-Place & Route simulation.
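To show how simulation activity becomes activity rates, the sketch below reads a small VCD file, counts value changes per signal, and expresses them as a percentage of clock cycles (100 percent meaning one change per cycle). It handles only a subset of the IEEE 1364 VCD format (no scope-qualified names, and a multi-bit change counts as one toggle), and the clock signal name is an assumption you supply; XPower's own VCD processing is more complete.

```python
from collections import defaultdict
from typing import Dict


def vcd_activity(vcd_text: str, clock: str) -> Dict[str, float]:
    """Return each signal's activity rate in percent of `clock` cycles.

    Any value change (including to or from x/z) counts as one toggle.
    The initial value of a signal is not counted as a toggle.
    """
    tokens = vcd_text.split()
    code_of: Dict[str, str] = {}  # signal name -> VCD identifier code
    last: Dict[str, str] = {}  # identifier code -> last value seen
    toggles: Dict[str, int] = defaultdict(int)
    rising: Dict[str, int] = defaultdict(int)  # 0 -> 1 transitions
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "$var":
            # $var <type> <size> <code> <name> [range] $end
            code_of[tokens[i + 4]] = tokens[i + 3]
            i = tokens.index("$end", i) + 1
        elif tok in ("$dumpvars", "$dumpall", "$dumpon", "$dumpoff", "$end"):
            i += 1  # value changes inside these blocks are read normally
        elif tok.startswith("$"):
            i = tokens.index("$end", i) + 1  # skip other header sections
        elif tok.startswith("#"):
            i += 1  # simulation timestamp
        else:
            if tok[0] in "bBrR":  # vector or real value: "b1010 <code>"
                value, code = tok[1:], tokens[i + 1]
                i += 2
            else:  # scalar value: "1<code>"
                value, code = tok[0], tok[1:]
                i += 1
            prev = last.get(code)
            if prev is not None and prev != value:
                toggles[code] += 1
                if prev == "0" and value == "1":
                    rising[code] += 1
            last[code] = value
    cycles = rising[code_of[clock]]
    if cycles == 0:
        raise ValueError(f"no rising edges found on clock {clock!r}")
    return {name: 100.0 * toggles[code] / cycles for name, code in code_of.items()}
```

In practice the input would be the post-Place & Route VCD file mentioned above. A clock itself reports 200 percent under this definition (two changes per cycle), which is one reason clocks are specified by absolute frequency rather than by percentage.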

Show Slide 348:

Running XPower Software
Expand Implement Design and Place & Route
Double-click XPower Analyzer to launch the XPower tool in interactive mode
Use the Generate Power Data process to create reports using VCD files or Tcl scripts



Show Slide 349:

XPower Software GUI: Summary

Key Points
To launch the XPower tool from the Project Navigator in the ISE software, expand the Implement Design process, expand Place & Route, and double-click XPower Analyzer.

The upper-left portion of the GUI contains the Summary bar, which displays the estimated junction temperature, quiescent power, and dynamic power.

On the left is the View window, which allows you to browse the thermal information, power supply information, and settings for your design.

The Thermal Information window allows you to modify the airflow, the ambient temperature, and the theta-JA of your device package, which is useful for what-if analysis. The Voltage Source Information window allows you to customize the internal voltage for your component. The Settings window allows you to modify your average toggle rates.

The By Type option allows you to categorize your design's power consumption according to the device resources your design uses. This allows you to segment your power consumption by signals, I/O, or hierarchy.

The Main View window on the right displays power calculation data for the currently selected data view. Reports are also displayed in this window.

At the bottom is the History window, which displays text messages.

Show Slide 350:

XPower Software Options: Settings
Report type
Default activity rate



Show Slide 351:

XPower Software Summary Report

To obtain this summary report, select Tools > Generate Summary Report, or click the icon in the horizontal toolbar.

Power summary                          | I(mA) | P(mW) |
---------------------------------------------------------
Total estimated power consumption      |       |   206 |
---
Total Vccint 1.20V                     |    69 |    83 |
Total Vccaux 2.50V                     |    45 |   113 |
Total Vcco33 3.30V                     |     3 |    10 |
---
Inputs                                 |     0 |     0 |
Outputs                                |       |       |
  Vcco33                               |     0 |     0 |
Signals                                |     0 |     0 |
---
Quiescent Vccint 1.20V                 |    69 |    83 |
Quiescent Vccaux 2.50V                 |    45 |   113 |
Quiescent Vcco33 3.30V                 |     3 |    10 |

Thermal summary
---------------------------------------------------------
Estimated junction temperature         |   29C
Ambient temp                           |   25C
Case temp                              |   28C
Theta J-A                              | 21C/W
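The numbers in this report are linked by two simple relationships. Each rail's power is its voltage times its current. The junction temperature is the ambient temperature plus theta-J-A times the total power. The Python sketch below recomputes both from the report values as a sanity check. It is not how XPower computes its estimates internally, and it does not model the case temperature. Because the report rounds the currents, the recomputed total (about 205 mW) differs slightly from the reported 206 mW.

```python
# Values copied from the example XPower summary report above.
# Each entry: rail name -> (supply voltage in V, reported current in mA).
RAILS = {
    "Vccint": (1.20, 69),
    "Vccaux": (2.50, 45),
    "Vcco33": (3.30, 3),
}
AMBIENT_C = 25.0          # "Ambient temp" in the thermal summary
THETA_JA_C_PER_W = 21.0   # "Theta J-A" in the thermal summary


def rail_power_mw(volts: float, milliamps: float) -> float:
    """P (mW) = V (V) x I (mA)."""
    return volts * milliamps


def junction_temp_c(ambient_c: float, theta_ja: float, power_mw: float) -> float:
    """Tj = Ta + theta_JA x P, converting P from mW to W."""
    return ambient_c + theta_ja * power_mw / 1000.0


per_rail = {name: rail_power_mw(v, i) for name, (v, i) in RAILS.items()}
total_mw = sum(per_rail.values())
tj = junction_temp_c(AMBIENT_C, THETA_JA_C_PER_W, total_mw)

for name, p in per_rail.items():
    print(f"{name}: {p:.1f} mW")
print(f"Total: {total_mw:.1f} mW")
print(f"Estimated Tj: {tj:.1f} C")
```

Raising the power or theta-J-A in this sketch shows the same what-if trend as editing the Thermal Information window: Tj rises linearly with both.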


Summary
Show Slide 352:

Lessons

Overview
XPower Estimator
Using the XPower Analyzer Software
Summary

Show Slide 353:

Apply Your Knowledge

1) Compare the total estimated power created by the XPower Analyzer software and XPower Estimator tools. Are they close to one another?

2) Power estimations are typically made during which three phases of the
design cycle?

3) What methods can be used to enter activity rates into the XPower
Analyzer software?

Show Slide 354:

Summary

Power calculations can be performed at three distinct phases of the design cycle:
Concept phase (XPower Estimator spreadsheet)
Design phase (XPower Analyzer software)
System integration phase (lab measurements)

Accurate power calculation at an early stage in the design cycle will result in fewer problems later
The XPower Analyzer software is a utility for estimating the power consumption and the junction temperature of FPGA and CPLD devices
The XPower Analyzer software uses activity rates to calculate total average power consumption
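The sketch below shows how activity rates drive average power. It uses the common textbook CMOS approximation: each transition on a net dissipates about ½CV², so a net's average dynamic power is ½·C·V²·(toggle rate × clock frequency), summed over all nets. This is a simplified model, not the characterized per-resource model that XPower Analyzer uses. The net name, capacitance, voltage, and frequency are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Net:
    name: str
    cap_pf: float       # switched capacitance in pF (hypothetical value)
    toggle_rate: float  # transitions per clock cycle, e.g. 0.125 for 12.5 %


def dynamic_power_mw(nets: Iterable[Net], vcc: float, f_clk_mhz: float) -> float:
    """Average dynamic power in mW, assuming each transition dissipates C*V^2/2."""
    total_w = 0.0
    for net in nets:
        transitions_per_s = net.toggle_rate * f_clk_mhz * 1e6
        total_w += 0.5 * (net.cap_pf * 1e-12) * vcc ** 2 * transitions_per_s
    return total_w * 1e3


nets = [Net("data_bus", cap_pf=10.0, toggle_rate=0.5)]
print(dynamic_power_mw(nets, vcc=1.2, f_clk_mhz=100.0))  # about 0.36 mW
```

Because power is linear in the toggle rate, the default activity rate you choose in the Settings window scales the dynamic estimate for every net it applies to. This is why loading a VCD file with real activity usually gives a better estimate than a single default.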


Where Can I Learn More?


XPower Analyzer help
Help > Help Topics
Documentation can also be installed on your local machine

XPower Estimator spreadsheets, Application Notes, XPower FAQ
Help > Xilinx On the Web > Xilinx Power Tools Web Page

IC Packaging recorded e-learning module
www.xilinx.com/support/training/rel/packaging.htm


Apply Your Knowledge Answers


Answers

1) Compare the total estimated power created by the XPower Analyzer software and XPower Estimator tools. Are they close to one another?

Yes

2) Power estimations are typically made during which three phases of the design cycle?

Concept phase: A rough estimate based on estimated logic capacity and activity rates

Design phase: A more accurate estimate based on information about how the design is implemented in the FPGA

System integration phase: Actual power usage is measured in a lab environment

3) What methods can be used to enter activity rates into the XPower Analyzer software?

Load a VCD file

Manually enter activity rates

Specify default activity rates

Transition to Lab 7: FPGA Editor Demo


Lab 7: FPGA Editor Demo


Purpose

After participating in this demonstration, you will be able to:



Locate logic and nets in the FPGA Editor

View the contents of slices and Input/Output Blocks (IOBs)

Add a probe

Time

30 minutes
Process

This optional demonstration illustrates how to locate logic, view the contents of an FPGA design, and insert a probe with the FPGA Editor.
General Flow

Step 1: Open the design in the FPGA Editor

Step 2: View the slice and IOB contents

Step 3: Add a probe


Lab
Refer to the separate Designing for Performance Lab Workbook for the FPGA Editor demo/lab.

Transition to ChipScope Pro Software


ChipScope Pro Software


Purpose

After completing this module, you will be able to:


Describe the value of the ChipScope Pro software

Describe how the ChipScope Pro software works

List what cores are available

Use the Core Generator and Core Inserter tools

Plan for and perform debugging with the ChipScope Pro software

Time

30 minutes
Process

This optional module describes how to use the Core Inserter and
Core Generator tool flows and plan for debugging with the
ChipScope Pro software.
Lessons

Introduction

Importance of Debug

ChipScope Pro Software Cores

Design Flows

Summary


Introduction
Show Slide 355:

ChipScope Pro Software

Show Slide 356:

ChipScope Pro Software Lab Logistics

To participate in the lab, you must have the following:
Spartan-3E FPGA starter kit
ChipScope Pro software 10.1 installed
ISE 10.1 software installed
Spartan-3E FPGA starter board, power supply, and configuration cable

Take out and set up your Spartan-3E FPGA board

Verify that you have power
"SPARTAN-3E STARTER KIT" and "www.xilinx.com/s3estarter" should scroll across the LCD

Show Slide 357:

Objectives
After completing this module, you will be able to:

Describe the value of the ChipScope Pro software
Describe how the ChipScope Pro software works
List what cores are available
Use the Core Generator and Core Inserter tools
Plan for and perform debugging with the ChipScope Pro software


Importance of Debug
Show Slide 358:

Lessons


Importance of Debug
ChipScope Pro Software Cores
Design Flows
Summary


Show Slide 359:

What Engineers are Saying

FPGA designs are getting more complex
Designs are getting faster
Design times are getting shorter
Debug and verification are more challenging
Debug and verification consume a significant portion* of FPGA design time
Debug and verification need to be easier and integrated into the FPGA design flow

*An FPGA design survey conducted by Xilinx indicates that FPGA debug and verification account for nearly half of FPGA design time
Show Slide 360:

Logic of Debug

Engineers are trained to solve problems
Debug is problem solving
Break a problem into basic parts
Remove or reduce variables and variation
Predict and verify

Debug is an iterative process
Verification is a component of debug
Verification confirms that no problems remain

The reconfigurable nature of FPGAs enables an iterative debug process

[Figure: debug loop of Create Design, Probe Design, Analyze Debug Data, Identify Fix, and Modify Design, exiting to Verify Design]


Show Slide 361:

Xilinx ChipScope Pro Software Dramatically Shortens Debug and Verification

Works the way you solve problems
Breaks a problem into basic parts
Removes variation introduced by external debug solutions
Enables a very fast, iterative process of prediction and verification

Provides what you have requested
Reduction of debug and verification time
A powerful tool that is easy to use
Focus on solving the problem, not on learning the tool
Integrated part of the Xilinx FPGA design flow

[Figure: design-time chart with the stages Design Specification, Design Implementation, On-Chip Verification and Debug, and Final Device; labels read "40% of Design Time", "ChipScope Pro On-Chip Verification and Debug Tool: 20% Time", and "Shrink overall design time by 25%"]


ChipScope Pro Software Cores


Show Slide 362:

Lessons


Importance of Debug
ChipScope Pro Software Cores
Design Flows
Summary


Show Slide 363:

What Is the ChipScope Pro Software?

Tailored debug and verification cores
Efficient core generation and insertion tools
Total control via JTAG



Show Slide 364:

Multiple Debug Cores to Address Different Debug Challenges

Integrated Logic Analyzer (ILA) core
Access internal nodes and signals
Debug and verify signal behavior
Define detailed trigger conditions

Virtual Input/Output (VIO) core
Virtual inputs and outputs
Stimulate logic with pulse trains

Integrated Bus Analyzer (IBA) core
IBA/PLBv46-specific bus analysis core integrated with EDK
IBA/OPB and IBA/PLB still supported
Protocol detection
Debug and verify control, address, and data buses

Agilent Trace Core 2 (ATC2)
Agilent-created core enabling on-chip debug of Xilinx FPGAs via Agilent FPGA Dynamic Probing

View cores as virtual test headers placed anywhere in the design

[Figure: example embedded system (PLB bus, OPB bus, bridge, arbiter, OPB GPIO, OPB SDRAM, Aurora, and user logic) showing where the debug cores attach]

Show Slide 365:

Core Resources

ChipScope Pro software cores utilize FPGA resources. For what?
Block RAM: trigger and data storage
Slice logic: trigger comparisons

You must leave room for the ChipScope Pro software cores in the FPGA
This may require using a larger part in the same package as the one you will use in production

ChipScope Pro software 10.1 includes a built-in resource estimator



Core Resources

Block RAMs | Depth 256 | Depth 512 | Depth 1024 | Depth 2048 | Depth 4096
-----------+-----------+-----------+------------+------------+-----------
1          | 15        |           |            |            |
2          | 31        | 15        |            |            |
4          | 63        | 31        | 15         |            |
8          | 127       | 63        | 31         | 15         |
16         | 255       | 127       | 63         | 31         | 15
32         |           | 255       | 127        | 63         | 31
64         |           |           | 255        | 127        | 63
128        |           |           |            | 255        | 127
256        |           |           |            |            | 255
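The table follows a simple pattern. For every entry, (value + 1) × depth / number of block RAMs equals 4,096. If you read the cell values as capture widths in bits (an interpretation, since the slide does not label them), the pattern lets you extrapolate block RAM needs for a width and depth the table does not list. The sketch below encodes that inferred pattern. The 4,096 constant and the function names are assumptions drawn from the table, not ChipScope Pro parameters. For real planning, use the ChipScope Pro 10.1 resource estimator.

```python
# Inferred from the Core Resources table: (width + 1) * depth / block_rams == 4096
# for every listed entry. This is an assumption, not a datasheet value.
BITS_PER_BLOCK_RAM = 4096


def max_width(block_rams: int, depth: int) -> int:
    """Widest entry the table pattern allows for a block RAM count and depth."""
    return block_rams * BITS_PER_BLOCK_RAM // depth - 1


def block_rams_needed(width: int, depth: int) -> int:
    """Block RAMs implied by the table pattern, rounded up to whole block RAMs."""
    bits = (width + 1) * depth
    return (bits + BITS_PER_BLOCK_RAM - 1) // BITS_PER_BLOCK_RAM


print(max_width(16, 1024))         # table entry for 16 block RAMs at depth 1024
print(block_rams_needed(63, 2048)) # table entry for width 63 at depth 2048
```

Both functions show the same trade-off as the table. Doubling the depth halves the width you can fit in the same block RAMs, so deep captures of wide buses consume block RAM quickly.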

Show Slide 366:

Using ChipScope Pro Software

Generate the ChipScope Pro software cores by using the ChipScope Pro Core Generator or Core Inserter tools
Place ChipScope Pro software cores into the design
Attach internal nodes for viewing to the ChipScope Pro software core
Place and route the design with the Xilinx ISE implementation tools
Download the bitstream to the device under test and analyze the design with the ChipScope Pro software

[Figure: the two ChipScope Pro design flows. Core Generator flow: ChipScope Pro Core Generator, Instantiate Cores into Source HDL, Connect Internal Signals to Core (in Source HDL), Synthesize, Implement, Download and Debug Using ChipScope Pro Software. Core Inserter flow: Synthesize, ChipScope Pro Core Inserter (into netlist), Implement, Download and Debug Using ChipScope Pro Software.]



Show Slide 367:

ChipScope Pro Software ICON Core

ICON (Integrated Controller) core: controls up to 15 capture cores
The ICON core interfaces between the JTAG interface and the capture cores

Capture cores: customizable cores for creating triggers and data storage
Customizable number, width, and storage of trigger ports
ILA (Integrated Logic Analyzer) core: capture core for HDL designs
ILA/ATC (Integrated Logic Analyzer with Agilent Trace) core: similar to the ILA core, except that data is captured off-chip by the Agilent Trace Port Analyzer
IBA/OPB (Integrated Bus Analyzer for CoreConnect On-Chip Peripheral Bus) core: capture core for debugging CoreConnect OPB buses
IBA/PLB (Integrated Bus Analyzer for CoreConnect Processor Local Bus) core: similar to the IBA/OPB core, except for the PLB bus
IBA/PLBv46 supported through EDK
VIO (Virtual Input/Output) core: defines and generates virtual I/O ports

Show Slide 368:

ChipScope Pro Software ILA Core

User-selectable, one to four trigger ports
Up to 256 channels per trigger port

Multiple match units on the same trigger port
Up to 16 match units
For example, 4 trigger ports with 4 match units each = 16 match conditions

Trigger condition sequencer
Defines complex trigger sequences that include up to 16 states or levels
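The trigger condition sequencer works like a small state machine. Each level waits for its match condition, and the capture triggers only when the final level is reached. The Python sketch below models that idea in software to show how a multi-level trigger behaves. It is a simplified, hypothetical model: each level must match on a later sample than the previous one, and there is no reset or counting condition. It is not ChipScope Pro's implementation. The signal names `addr` and `we` are illustrative only.

```python
from typing import Callable, Dict, Optional, Sequence

Sample = Dict[str, int]               # one captured clock cycle: signal name -> value
MatchUnit = Callable[[Sample], bool]  # hypothetical: one comparison on a trigger port

MAX_LEVELS = 16  # the slide's limit on sequencer states or levels


def find_trigger(samples: Sequence[Sample], levels: Sequence[MatchUnit]) -> Optional[int]:
    """Return the index of the sample that satisfies the final level, or None."""
    if not 1 <= len(levels) <= MAX_LEVELS:
        raise ValueError("the sequencer supports 1 to 16 levels")
    level = 0
    for index, sample in enumerate(samples):
        if levels[level](sample):
            level += 1                 # advance to the next level of the sequence
            if level == len(levels):
                return index           # every level matched in order: trigger
    return None


# Trigger on a write to address 0x10 that is later followed by a read of 0x10.
trace = [
    {"addr": 0x00, "we": 0},
    {"addr": 0x10, "we": 1},  # level 0 matches here
    {"addr": 0x20, "we": 0},
    {"addr": 0x10, "we": 0},  # level 1 matches here
]
levels = [
    lambda s: s["addr"] == 0x10 and s["we"] == 1,
    lambda s: s["addr"] == 0x10 and s["we"] == 0,
]
print(find_trigger(trace, levels))  # 3
```

Swapping the order of the two levels makes the same trace return None. This is the point of a sequencer: it triggers on an ordered series of events, not on any one match.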



Show Slide 369:

Things to Know About ILA Cores

Integrated Logic Analyzer (ILA) cores can be added with either the Core Generator or Core Inserter tools
A design can contain up to 16 ILA cores
Maximum speed of the ILA core:

Device      | 7.1.01i (H.39)              | 6.3.03i (G.38)
            | Slowest | Middle  | Fastest | Slowest | Middle  | Fastest
------------+---------+---------+---------+---------+---------+--------
2v1000 (a)  | 176 MHz | 202 MHz | 240 MHz | 232 MHz | 267 MHz | 310 MHz
2vp7 (a)    | 247 MHz | 276 MHz | 311 MHz | 267 MHz | 307 MHz | 343 MHz
3s400 (b)   | 155 MHz | 177 MHz | N/A     | 163 MHz | 187 MHz | N/A
3s500e      | 152 MHz | 177 MHz | N/A     | 154 MHz | 177 MHz | N/A
4vlx25 (c)  | 275 MHz | 322 MHz | 374 MHz | 246 MHz | 289 MHz | N/A

(Slowest, Middle, and Fastest refer to the device speed grade.)

a) Performance degradation due to non-optimal path chosen by ISE software tools (Map CR205561)
b) Performance degradation due to new Spartan-3 FPGA speed files and minor path routing differences
c) Performance improvement due to new Virtex-4 FPGA speed files (including new -12 speed grade)
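The sketch below shows how large the changes described in footnotes a to c are. It copies the table into a dictionary and computes the percent change from 6.3.03i to 7.1.01i for each speed grade. This is plain arithmetic on the table values, with N/A entries represented as None.

```python
from typing import Dict, List, Optional, Tuple

Fmax = Tuple[Optional[int], Optional[int], Optional[int]]  # slowest, middle, fastest (MHz)

# Copied from the table above; None marks an N/A entry.
ILA_FMAX: Dict[str, Dict[str, Fmax]] = {
    "2v1000": {"7.1.01i": (176, 202, 240), "6.3.03i": (232, 267, 310)},
    "2vp7":   {"7.1.01i": (247, 276, 311), "6.3.03i": (267, 307, 343)},
    "3s400":  {"7.1.01i": (155, 177, None), "6.3.03i": (163, 187, None)},
    "3s500e": {"7.1.01i": (152, 177, None), "6.3.03i": (154, 177, None)},
    "4vlx25": {"7.1.01i": (275, 322, 374), "6.3.03i": (246, 289, None)},
}


def percent_change(device: str) -> List[Optional[float]]:
    """Percent change in ILA fmax from 6.3.03i to 7.1.01i, per speed grade."""
    new = ILA_FMAX[device]["7.1.01i"]
    old = ILA_FMAX[device]["6.3.03i"]
    return [
        None if n is None or o is None else round(100.0 * (n - o) / o, 1)
        for n, o in zip(new, old)
    ]


for dev in ILA_FMAX:
    print(dev, percent_change(dev))
```

The output makes the footnotes concrete. The Virtex-II devices (a) drop by roughly 7 to 24 percent, the Spartan-3 device (b) drops by about 5 percent, and the Virtex-4 device (c) gains about 11 to 12 percent.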

Show Slide 370:

ChipScope Pro Software VIO Core

Insert virtual pins into your design
Input or output
Synchronous or asynchronous
Up to 256 bits each
System clock or JTAG clock

Inputs are virtual LEDs
Outputs are virtual DIP switches
Different refresh rates are available
Force a value or pulse train into the FPGA


Show Slide 371:

Things to Know About VIO Cores

Can only be added with the ChipScope Pro Core Generator tool
Uses no block RAM, only logic
Inputs are like LEDs, for examining signals
Outputs are switches or pushbuttons, for driving signals


Design Flows
Show Slide 372:

Lessons


Importance of Debug
ChipScope Pro Software Cores
Design Flows
Summary


Show Slide 373:

Core Inserter Flow

The Core Inserter inserts cores directly into the netlist
HDL code is untouched
Only post-synthesis nodes are available
Bypass this tool to remove cores
The Core Inserter must perform the first portion of Translate
Core generation and insertion are done together
The ChipScope Pro Core Inserter tool is run from within Project Navigator

[Figure: ChipScope Pro design-flow diagram with the Core Inserter path highlighted: Synthesize, ChipScope Pro Core Inserter (into netlist), Implement, Download and Debug Using ChipScope Pro Software]

Show Slide 374:

Core Generator Flow

Generate cores that are instantiated directly into the HDL
Allows access to all HDL nodes
Requires changes to the code
Must comment out cores to remove them
Uses the standard implementation flow
Core generation and insertion are done separately

[Figure: ChipScope Pro design-flow diagram with the Core Generator path highlighted: ChipScope Pro Core Generator, Instantiate Cores into Source HDL, Connect Internal Signals to Core (in Source HDL), Synthesize, Implement, Download and Debug Using ChipScope Pro Software]


Summary
Show Slide 375:

Lessons

Importance of Debug
ChipScope Pro Software Cores
Design Flows
Summary


Show Slide 376:

Summary

Shorten debug time by up to 50 percent

Break the problem into manageable parts


ChipScope Pro software enables rapid iteration

Add ChipScope Pro software cores at any time

Specialized cores allow you to focus on solving problems

Debug in three simple steps


ILA for viewing results
VIO for driving changes

Minimal impact to FPGA design

Design at system speed


Optimized cores consume minimal FPGA resources

Where Can I Learn More?
www.xilinx.com/chipscopepro
View recorded ChipScope Pro software product demos
Access a 60-day free evaluation version of the ChipScope Pro tools
Access ChipScope Pro software documentation (user guide, at-a-glance summary of features)
Obtain information on Agilent FPGA Dynamic Probe technology (combine on-chip debug with the power of a logic analyzer)

Transition to Lab 8: ChipScope Pro Software


Lab 8: ChipScope Pro Software


Purpose

After completing this lab, you will be able to:


Use the Core Inserter tool to add ChipScope Pro software cores to an existing design

Use the ChipScope Pro Analyzer tool to configure an FPGA, set trigger conditions, analyze, and debug a design

Time

60 minutes
Process

This optional lab illustrates how to use the ChipScope Pro software
to add the Analyzer ILA core and prepare for debugging.
General Flow


Step 1: Download the non-working design

Step 2: Create ChipScope Pro software cores

Step 3: Debug the design

Step 4: Examine resource utilization


Lab
Refer to the separate Designing for Performance Lab Workbook for the ChipScope Pro Software lab.

Transition to Course Summary


Course Summary
Purpose

This module reviews day two of the course and provides a summary of the course.
Time

10 minutes
Process

This module reviews day two of the course and provides a summary of the course.
Lessons

Course Summary
Show Slide 377:

Designing for Performance


Course Summary

Show Slide 378:

Day Two Review

How can you use the Timing Analyzer to improve design performance?

How do path-specific timing constraints help you to meet your performance objectives?

What advanced software settings can you use to increase performance?

Show Slide 379:

Day Two Review Answers

How can you use the Timing Analyzer to improve design performance?
Use the detailed path descriptions to find the root cause of timing errors
Cross-probe to the Floorplan Editor to view the placement of logic

How do path-specific timing constraints help you to meet your performance objectives?
Multicycle and false paths provide the Xilinx implementation tools greater flexibility in meeting your timing objectives
Path-specific (critical path) constraints have a higher priority in the implementation tools
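Simple slack arithmetic illustrates why multicycle and false-path constraints give the tools more flexibility. Under a multicycle constraint of N cycles, a path may use N clock periods. A false path is excluded from timing analysis. The sketch below is a simplified model that ignores clock skew, jitter, clock-to-out, and setup time. The 10 ns period, the 15 ns path delay, and the function names are hypothetical.

```python
from typing import Optional


def required_time_ns(period_ns: float, cycles: int = 1, false_path: bool = False) -> Optional[float]:
    """Allowed data-path delay: `cycles` clock periods, or None if the path is not timed."""
    if false_path:
        return None  # false paths are excluded from timing analysis
    if cycles < 1:
        raise ValueError("a multicycle constraint spans at least one clock period")
    return period_ns * cycles


def slack_ns(path_delay_ns: float, period_ns: float, cycles: int = 1,
             false_path: bool = False) -> Optional[float]:
    """Positive slack meets timing; negative slack is a timing failure."""
    budget = required_time_ns(period_ns, cycles, false_path)
    return None if budget is None else budget - path_delay_ns


print(slack_ns(15.0, 10.0))                   # -5.0: fails as a single-cycle path
print(slack_ns(15.0, 10.0, cycles=2))         # 5.0: meets timing as a 2-cycle path
print(slack_ns(15.0, 10.0, false_path=True))  # None: not analyzed
```

When a path only needs a result every second clock, declaring it multicycle removes a false failure. The placer and router can then spend their effort on the paths that are truly single-cycle critical.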


Show Slide 380:

Day Two Review Answers

What advanced software settings can you use to increase performance?

MAP: Timing-driven packing
PAR: Extra effort level
Xplorer

Show Slide 381:

Day One Summary

A flow for achieving timing closure was presented
The Virtex-5 FPGA architecture has many dedicated resources that can improve performance and lower power
The DCM and PLL have many features that can increase design performance
There are many clock features available for high-speed design
You can increase design performance by duplicating flip-flops, pipelining, and using I/O flip-flops
Synthesis tools have many different options to improve synthesis results
CORE Generator software system cores can be used to take full advantage of the Xilinx FPGA architecture


Show Slide 382:

Day Two Summary

Timing reports are used to identify critical paths and analyze the cause of
timing failures
Multicycle, false path, and critical path timing constraints can be easily
specified via the Advanced tab in the Xilinx Constraints Editor
Advanced implementation options, such as timing-driven packing, extra
effort level, and Xplorer can help increase performance
