
Introspective 3D Chips

S. Mysore, B. Agrawal, N. Srivastava, S. Lin, K. Banerjee, T. Sherwood (UCSB), ASPLOS 2006

Shimin Chen
(LBA Reading Group Presentation)

Motivation

Focus: run-time monitoring for development


Tool overhead limits the amount of analysis possible at test time
Previous research: specialized on-chip h/w modules
At odds with the economics of consumer microprocessors

May require a significant amount of area
Often introduce interconnect congestion
Replicated on every processor, whether used or not

Challenge: enabling these techniques with minimal impact on typical end-user systems

Solution: Add-On using 3D

Optionally add a layer to the processor specifically for analysis
Developers: processors with this layer
End users: processors without this layer

Outline

Introduction
Benefits of Introspection in 3D
Quantifying the Technology (Methodology)
Architectural Ramifications (Evaluation)
Conclusion

Benefits of Introspection in 3D

Cutting interconnect impact
Reducing cost for commodity parts
Enabling more powerful software analysis

Cutting Interconnect Impact

Previous: gathering data from all over the chip for centralized analysis
Global interconnect
Cross almost every design block
Consume significant top-metal-layer resources
Run at high speed
Require wire buffering and even pipeline latches
Reserve silicon area for buffers

Cutting Interconnect Impact

Previous: global interconnect
3D: area for inter-layer vias is localized to the positions of the taps
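A toy sketch of the contrast above, assuming hypothetical tap coordinates (in mm) and a hypothetical hub location for the centralized analysis block; none of these numbers come from the paper. It compares the total rectilinear wire length needed to route every tap to one hub with the 3D case, where each tap's via sits directly above it.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position on the die, in mm (hypothetical)

def manhattan(a: Point, b: Point) -> float:
    """Rectilinear distance, a common first-order estimate of on-chip wire length."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def lateral_wire_2d(taps: List[Point], hub: Point) -> float:
    """Total wire length when every tap is routed to one central analysis block."""
    return sum(manhattan(t, hub) for t in taps)

def lateral_wire_3d(taps: List[Point]) -> float:
    """Total lateral wire length when each tap's via sits directly above it,
    so the signal leaves the processor layer without crossing other blocks."""
    return sum(manhattan(t, t) for t in taps)

if __name__ == "__main__":
    taps = [(1.0, 1.0), (9.0, 2.0), (5.0, 8.0), (2.0, 7.0)]  # hypothetical tap sites
    hub = (5.0, 5.0)  # hypothetical location of the centralized analysis HW
    print("2D total wire (mm):", lateral_wire_2d(taps, hub))      # 23.0
    print("3D total lateral wire (mm):", lateral_wire_3d(taps))  # 0.0
```

The model ignores the vias' own area and any routing blockages; it only shows why the 2D cost grows with the distance from each tap to the analysis logic while the 3D cost does not.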

Reducing Cost for Commodity Parts

225 million PCs in use vs. 0.7 million programmers
Need to consider two costs:
Cost of a consumer system: the circuit that drives the post and the vertical column of vias
Cost of a developer system: adding an extra layer
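A back-of-the-envelope sketch of the two costs, using the slide's unit counts (about 321 PCs per programmer) and assuming each programmer uses one of the 225 million PCs. The per-unit dollar figures are hypothetical placeholders, not values from the paper; the point is only that a cost carried by every chip is multiplied by the full consumer volume, while the layer cost is paid by developer chips alone.

```python
TOTAL_CHIPS = 225_000_000   # PCs in use (slide)
DEVELOPER_CHIPS = 700_000   # programmers (slide); assumed to be a subset of TOTAL_CHIPS

def integrated_cost(hw_cost_per_chip: float) -> float:
    """Integrated profiling HW: every chip carries the full hardware."""
    return TOTAL_CHIPS * hw_cost_per_chip

def stacked_cost(stub_cost_per_chip: float, layer_cost: float) -> float:
    """Stacked: every chip pays for the stubs (post drivers and via column);
    only developer chips pay for the extra analysis layer."""
    return TOTAL_CHIPS * stub_cost_per_chip + DEVELOPER_CHIPS * layer_cost

if __name__ == "__main__":
    print(f"PCs per programmer: {TOTAL_CHIPS / DEVELOPER_CHIPS:.0f}")  # 321
    # Hypothetical unit costs in dollars, for illustration only.
    print(f"integrated: ${integrated_cost(2.00):,.0f}")
    print(f"stacked:    ${stacked_cost(0.02, 50.00):,.0f}")
```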

Enabling More Powerful SW Analysis

More h/w resources allocated to analysis
Area
Power

Outline

Introduction
Introspection in 3D
Quantifying the Technology
Architectural Ramifications
Conclusion

Cross Section of 3D Chip

Posts:
5 µm × 5 µm cross-section
30-40 µm high
(compared with a normal metal wire: 1 µm × 1 µm)
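Quick arithmetic on the post dimensions: a 5 µm × 5 µm post has 25× the cross-section of a 1 µm × 1 µm wire. The sketch also sums the raw cross-sections of 1024 posts, assuming one post per profile bit (the per-cycle bit count estimated on the Number of Vertical Posts slide). It ignores via pitch and keep-out regions, so treat the result as a lower bound on footprint, not a layout estimate.

```python
POST_SIDE_UM = 5.0  # vertical post cross-section side (slide)
WIRE_SIDE_UM = 1.0  # normal metal wire cross-section side (slide)
NUM_POSTS = 1024    # assumption: one post per profile bit per cycle

def cross_section_um2(side_um: float) -> float:
    """Area of a square cross-section in square micrometres."""
    return side_um * side_um

post_to_wire_ratio = cross_section_um2(POST_SIDE_UM) / cross_section_um2(WIRE_SIDE_UM)
# Raw sum of post cross-sections; ignores pitch and keep-out (1 mm^2 = 1e6 um^2).
posts_footprint_mm2 = NUM_POSTS * cross_section_um2(POST_SIDE_UM) / 1e6

if __name__ == "__main__":
    print(post_to_wire_ratio)   # 25.0
    print(posts_footprint_mm2)  # 0.0256
```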

Estimating Interconnect Overhead

Optimal buffer size and inter-buffer separation
2D interconnect overhead
3D interconnect overhead
Metallization area
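The slide does not say which buffering model the estimate uses. As an assumption, the sketch below applies the classic Bakoglu repeater-insertion formulas: optimal repeater size h_opt = sqrt(R0·c / (r·C0)) and optimal inter-repeater spacing sqrt(0.7·R0·C0 / (0.4·r·c)), which does not depend on wire length. Here r and c are per-mm wire resistance and capacitance, and R0 and C0 are the output resistance and input capacitance of a minimum-sized inverter. The numeric values in the usage example are hypothetical, not 130 nm data from the paper.

```python
import math
from typing import Tuple

def optimal_buffering(r: float, c: float, r0: float, c0: float) -> Tuple[float, float]:
    """Bakoglu-style repeater sizing (assumed model, not confirmed by the slides).

    r, c   : wire resistance (ohm/mm) and capacitance (F/mm)
    r0, c0 : output resistance (ohm) and input capacitance (F) of a minimum inverter
    Returns (h_opt, spacing_mm): repeater size as a multiple of the minimum inverter,
    and the optimal distance between repeaters.
    """
    h_opt = math.sqrt(r0 * c / (r * c0))
    spacing_mm = math.sqrt(0.7 * r0 * c0 / (0.4 * r * c))
    return h_opt, spacing_mm

def repeaters_for_wire(length_mm: float, spacing_mm: float) -> int:
    """Repeater stages needed so that no segment exceeds the optimal spacing."""
    return math.ceil(length_mm / spacing_mm)

if __name__ == "__main__":
    # Hypothetical parameters, chosen only to make the arithmetic easy to follow.
    h, s = optimal_buffering(r=100.0, c=2e-13, r0=1e4, c0=1e-15)
    print(f"h_opt = {h:.1f}x minimum, spacing = {s:.3f} mm")  # 141.4x, 0.935 mm
    print("stages for a 10 mm wire:", repeaters_for_wire(10.0, s))  # 11
```

Because the spacing is length-independent, total buffer count for a design scales roughly with total global wire length, which is why long centralized routing is expensive.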

Number of Vertical Posts

Estimate that 1024 bits of profile data will be generated per cycle (?)
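To put the 1024-bit estimate in perspective, the sketch converts bits per cycle into a raw data rate. The clock frequency is a hypothetical parameter (the slide gives none), and the post count assumes one bit per post per cycle, matching the one-buffer-per-post figure on the Routability slide.

```python
BITS_PER_CYCLE = 1024  # slide estimate, marked "(?)" by the presenter

def profile_rate_gb_per_s(bits_per_cycle: int, clock_ghz: float) -> float:
    """Raw profile-data rate in GB/s (1 GB = 1e9 bytes) at the given clock."""
    return bits_per_cycle / 8 * clock_ghz

def posts_needed(bits_per_cycle: int, bits_per_post: int = 1) -> int:
    """Vertical posts required if each post carries bits_per_post bits per cycle."""
    return -(-bits_per_cycle // bits_per_post)  # ceiling division on integers

if __name__ == "__main__":
    print(profile_rate_gb_per_s(BITS_PER_CYCLE, clock_ghz=3.0))  # 384.0 at a hypothetical 3 GHz
    print(posts_needed(BITS_PER_CYCLE))                         # 1024
```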

Gathering Profile Data on Pentium 4

Example HW Monitor

[Figure: RISC ARM-based monitor with blocks labeled 32KB, 32KB, 16KB, 32KB, and 16KB]

130 nm technology, area: 16 mm², power: 2.7 W
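The monitor's figures imply an average power density, which is relevant to the thermal slides that follow. The sketch only divides the stated power by the stated area; it says nothing about hot spots within the layer.

```python
def power_density_w_per_cm2(power_w: float, area_mm2: float) -> float:
    """Average power density; 1 cm^2 = 100 mm^2."""
    return power_w / area_mm2 * 100.0

if __name__ == "__main__":
    # Example HW monitor from the slide: 2.7 W over 16 mm^2 at 130 nm.
    print(f"{power_density_w_per_cm2(2.7, 16.0):.3f} W/cm^2")  # 16.875 W/cm^2
```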

Outline

Introduction
Introspection in 3D
Quantifying the Technology
Architectural Ramifications
Conclusion

Four Types of Systems to Compare

Basic System (Sbase)
System with integrated profiling HW (Sintegrated)
System with profiling HW stacked (Sstacked)
System with profiling stubs (Sstubs)

Routability

Based on Pentium 4 analysis

Sintegrated:
Total wire length = 5682.3 mm
Total buffers ≈ 20,000

Sstacked:
Total buffers = 1024 (one per post)
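The routability numbers can be compared directly. The sketch derives the implied average spacing between buffers in Sintegrated (total wire length divided by buffer count, an averaging assumption since real spacing varies from net to net) and the ratio of buffer counts between the two systems.

```python
SINTEGRATED_WIRE_MM = 5682.3
SINTEGRATED_BUFFERS = 20_000   # approximate ("~20,000" on the slide)
SSTACKED_BUFFERS = 1024        # one per post

def average_spacing_mm(total_wire_mm: float, buffers: int) -> float:
    """Mean wire length per buffer; real spacing varies from net to net."""
    return total_wire_mm / buffers

if __name__ == "__main__":
    spacing = average_spacing_mm(SINTEGRATED_WIRE_MM, SINTEGRATED_BUFFERS)
    print(f"avg spacing in Sintegrated: {spacing:.3f} mm")                    # 0.284 mm
    print(f"buffer count ratio: {SINTEGRATED_BUFFERS / SSTACKED_BUFFERS:.1f}x")  # 19.5x
```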

Area for Wires and Buffers

Power

Thermal

Thermal

Conclusion

Economic argument: the cost of specialized H/W is decoupled from the consumer market
H/W stubs add only 0.021 mm² of area and 0.9% power

Thank you!
