Cache Coherence Protocols in Shared Memory Multiprocessors

Mehmet Şenvar

Cache Coherence Protocols 1


Outline
• Introduction
• Background Information
    • The cache coherence problem
    • Cache enforcement strategies
    • Consistency models
• Simple Solutions
• Hardware Protocols
    • Snooping protocols
    • Directory-based protocols
• Compiler and Software protocols
• Future work and conclusions

The Cache Coherence Problem
• Caches improve performance by keeping frequently used data in faster memory
• Because all processors share the same address space, more than one processor can cache an address (or data item) at a time
• If one processor updates the data item without informing the others, the copies become inconsistent and execution can be incorrect
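The problem can be reproduced with a toy model. The sketch below is a minimal illustration, not a description of any real machine: the `Cache` class, the address `0x10`, and the write-through behaviour are all assumptions made for the example. Two private caches read the same address, one writes, and the other keeps serving its old copy.

```python
class Cache:
    """A private cache with no coherence mechanism (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                      # address -> cached value

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from shared memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value             # update the local copy
        self.memory[addr] = value            # write through to memory


memory = {0x10: 1}
p0, p1 = Cache(memory), Cache(memory)
p0.read(0x10)                                # both processors cache the item
p1.read(0x10)
p1.write(0x10, 2)                            # p1 updates, p0 is not informed
stale = p0.read(0x10)                        # p0 still sees the old value
fresh = p1.read(0x10)
```

Here `stale` is 1 while `fresh` and memory hold 2, which is exactly the inconsistency a coherence protocol has to prevent.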

Cache Coherence Problem

Cache Coherence (cont.)
• For correct execution, coherence must be enforced between the caches
• Two major factors are:
    • performance
    • implementation cost
• Four primary design issues are:
    • coherence detection strategy
    • coherence enforcement strategy
    • precision of block-sharing information
    • cache block size

Cache Enforcement Strategies
• A cache enforcement strategy is the mechanism that makes the caches consistent:
    • write-update (WU)
    • write-invalidate (WI)
    • hybrid protocols such as competitive-update (CU)
• The performance of WU and WI varies with the application and the number of writes
• Hybrid protocols switch between WU and WI based on the number of writes to a block
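The trade-off between the three strategies can be seen by counting bus messages for a single block. The following is a rough sketch under simplifying assumptions: one message per miss, one broadcast per write that reaches other copies, and a competitive-update rule that drops a remote copy after `threshold` updates with no local access in between. The function name `simulate`, the trace format, and the threshold are illustrative choices, not part of any specific protocol definition.

```python
def simulate(trace, policy, threshold=2):
    """Count bus messages for accesses to one block.

    trace:  list of (op, cpu) pairs with op in {"r", "w"}
    policy: "WI" (write-invalidate), "WU" (write-update) or "CU" (competitive)
    """
    valid = set()      # CPUs that hold a valid copy
    unused = {}        # CU only: updates received since the last local access
    messages = 0
    for op, cpu in trace:
        if cpu not in valid:               # miss: fetch the block
            messages += 1
            valid.add(cpu)
        unused[cpu] = 0                    # a local access resets the counter
        if op == "w":
            others = valid - {cpu}
            if others:
                messages += 1              # one broadcast: invalidate or update
            if policy == "WI":
                valid = {cpu}              # all other copies are invalidated
            elif policy == "CU":
                for other in others:
                    unused[other] += 1
                    if unused[other] >= threshold:
                        valid.discard(other)   # drop a copy nobody is reading
    return messages


producer_consumer = [("w", 0), ("r", 1)] * 4      # every write is read remotely
write_run = [("r", 1)] + [("w", 0)] * 6           # a long run of local writes
```

With these two traces, `simulate` gives 8 (WI), 5 (WU) and 5 (CU) messages for `producer_consumer`, and 3 (WI), 8 (WU) and 4 (CU) for `write_run`. In this toy model WU wins when each write is consumed remotely, WI wins on long write runs, and CU stays close to the better of the two in both cases.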

Consistency Models
• A consistency model defines how the consistency of data values is maintained
• Some consistency models are:
    • sequential consistency
    • weak consistency
    • release consistency
• Weak consistency models are more efficient to implement and require fewer coherence messages
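What sequential consistency permits can be checked by brute force on a small litmus test. The sketch below is illustrative only: the two-thread program, the tuple encoding of operations, and the helper names are assumptions for the example. It enumerates every interleaving that preserves each thread's program order and collects the possible results.

```python
def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
T0 = [("store", "x", 1), ("load", "y", "r0")]
T1 = [("store", "y", 1), ("load", "x", "r1")]


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "store":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r0"], regs["r1"]


sc_outcomes = {run(s) for s in interleavings(T0, T1)}
```

The set `sc_outcomes` contains (0, 1), (1, 0) and (1, 1) but never (0, 0). Weaker models that allow a load to complete before an earlier store becomes visible can also produce (0, 0), which is the price paid for needing less ordering and fewer coherence messages.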

Shared Caches (1)
Processors share a single cache, which essentially sidesteps the problem.
• Useful for very small machines.
• E.g., DPC in the Encore, Alliant FX/8.
• Problems are limited cache bandwidth and cache interference.
• Benefits are fine-grain sharing and prefetch effects.

Non-cacheable Items (2)
• Make shared data non-cacheable
• One of the simplest software solutions
• Can also be enforced in hardware by making cache locations unreachable

Broadcast Writes (3)
• Every cache write request is sent to all other caches
• Each receiving cache must first check whether it holds the data
• Other copies are then either updated or invalidated
• This causes significant additional memory transactions

Hardware Protocols
• Snoop Bus Mechanism
• Directory Based Methods
    • Full Directory
    • Limited Directory
    • Chained Directory

Snoop Bus Protocol
• Snooping protocols rely on a shared bus between the processors for coherence
• On a processor write, the write is passed through the cache to main memory on the bus
• Every other cache holding the address snoops the write and updates or invalidates its entry as appropriate
• Snooping protocols do not scale well beyond about 32 processors because the shared bus becomes a bottleneck
• The choice between WU, WI, and CU is especially important to reduce communication
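The mechanism above can be sketched in a few lines. This is a simplified write-through model, not a real bus implementation: the class names `Bus` and `SnoopingCache`, the `policy` argument, and the transaction counter are assumptions made for illustration.

```python
class Bus:
    """Shared bus: every write is visible to all attached caches."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.transactions = 0

    def read(self, addr):
        self.transactions += 1
        return self.memory[addr]

    def write(self, source, addr, value):
        self.transactions += 1
        self.memory[addr] = value            # write goes through to memory
        for cache in self.caches:            # all other caches snoop the write
            if cache is not source:
                cache.snoop_write(addr, value)


class SnoopingCache:
    def __init__(self, bus, policy="invalidate"):
        self.bus = bus
        self.policy = policy                 # "invalidate" (WI) or "update" (WU)
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch over the bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr, value):
        if addr in self.lines:
            if self.policy == "update":
                self.lines[addr] = value     # refresh the local copy
            else:
                del self.lines[addr]         # drop the stale copy


def demo(policy):
    """Two caches share one address; return (value c0 sees, bus transactions)."""
    memory = {0x10: 1}
    bus = Bus(memory)
    c0, c1 = SnoopingCache(bus, policy), SnoopingCache(bus, policy)
    c0.read(0x10)
    c1.read(0x10)
    c1.write(0x10, 2)
    return c0.read(0x10), bus.transactions
```

Both policies return the new value 2 to the first cache. In this trace the invalidate variant needs 4 bus transactions (the last read misses) and the update variant needs 3, which shows on a small scale why the choice of policy affects bus traffic.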

MESI (4-state) Invalidation Protocol
• Each line in the cache can be in one of 4 states:
    • Modified (exclusive): only in one cache, modified
    • Exclusive (unmodified): only in one cache, unmodified
    • Shared (unmodified): may be in more than one cache
    • Invalid

MESI State Transition Diagram

MESI Example
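A minimal sketch of the four-state behaviour for a single block is given here as a stand-in for a worked trace. It follows the commonly described MESI transitions. The function name `mesi_access` and the bus transaction labels `BusRd`, `BusRdX` and `BusUpgr` are naming assumptions for this example, and write-backs of modified data are implied rather than modelled.

```python
def mesi_access(states, cpu, op):
    """Apply one processor access ("r" or "w") to a single block.

    states: list with one state per cache, each "M", "E", "S" or "I".
    Returns the bus transaction issued, or None for a silent hit.
    """
    me = states[cpu]
    others = [i for i in range(len(states)) if i != cpu]
    if op == "r":
        if me != "I":
            return None                       # read hit in M, E or S
        shared = any(states[i] != "I" for i in others)
        for i in others:
            if states[i] in ("M", "E"):       # snoopers downgrade to Shared
                states[i] = "S"               # (an M copy is written back first)
        states[cpu] = "S" if shared else "E"
        return "BusRd"
    if me == "M":
        return None                           # write hit, already exclusive
    if me == "E":
        states[cpu] = "M"                     # silent upgrade, no bus traffic
        return None
    for i in others:                          # from S or I: invalidate the rest
        states[i] = "I"
    states[cpu] = "M"
    return "BusUpgr" if me == "S" else "BusRdX"


states = ["I", "I", "I"]
trace = [(0, "r"), (0, "w"), (1, "r"), (1, "w"), (2, "w")]
log = [mesi_access(states, cpu, op) for cpu, op in trace]
```

For this trace the block moves through E and M in cache 0, becomes S in caches 0 and 1, then M in cache 1, and finally M in cache 2. The logged transactions are `BusRd`, none, `BusRd`, `BusUpgr`, `BusRdX`; the second access is free because the Exclusive state allows a write without any bus traffic.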

Directory-Based Protocols
• Directory-based protocols do not rely on a shared bus to exchange coherence information; they use point-to-point connections
    • more scalable (can have hundreds of processors)
    • each processor can have its own memory
    • implement weak consistency for efficiency

Directory-Based Protocols (cont.)
• Each node maintains a directory storing cache information and memory information
• A processor communicates with the directory to access memory
    • If a processor requests a non-local memory page, the directory uses its information to find the page
    • It then uses messages to retrieve the page and ensure that all other processors have consistent information
    • Because the directory records which processors are caching the page, it only needs to send messages to those processors
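The key property, messages sent only to the actual sharers, can be sketched as follows. This is a simplified model under stated assumptions: a single home directory, a `sharers` set per block, and a message log. The class and method names are hypothetical, and ownership and acknowledgement messages are left out.

```python
class Directory:
    """Home-node directory that records which nodes cache each block."""

    def __init__(self, memory):
        self.memory = memory
        self.sharers = {}        # block -> set of node ids caching it
        self.messages = []       # log of (kind, destination node)

    def read(self, node, block):
        self.sharers.setdefault(block, set()).add(node)
        self.messages.append(("data", node))
        return self.memory[block]

    def write(self, node, block, value):
        # Invalidate only the nodes the directory knows are caching the block.
        for other in sorted(self.sharers.get(block, set()) - {node}):
            self.messages.append(("invalidate", other))
        self.sharers[block] = {node}
        self.memory[block] = value


directory = Directory({"page0": 0})
directory.read(1, "page0")
directory.read(5, "page0")
directory.write(2, "page0", 42)      # only nodes 1 and 5 are contacted
```

However many nodes the machine has, the write by node 2 produces exactly two invalidations, one each for nodes 1 and 5. A snooping bus would have exposed the same write to every cache.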

Directory-Based Protocols (cont.)
• Designing a directory requires defining:
    • cache block granularity
    • cache controller design
    • directory structure
• Cache block granularity covers both the size of the cache and the size of a cache line
    • CC-NUMA machines have a cache that is separate from, and smaller than, main memory
    • COMA machines use a node's entire memory as a cache for remote pages
    • Block size affects performance (false sharing)

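The effect of block size on false sharing can be illustrated by counting invalidations. The sketch below assumes a write-invalidate policy in which the writer ends up as the only holder of the block. The function name, the byte addresses, and the two block sizes are illustrative choices.

```python
def count_invalidations(writes, block_size):
    """Count copies invalidated under a simple write-invalidate policy.

    writes: list of (cpu, byte_address) pairs, in program order.
    """
    holders = {}                       # block number -> set of CPUs with a copy
    invalidations = 0
    for cpu, addr in writes:
        block = addr // block_size     # which block the address falls into
        current = holders.get(block, set())
        invalidations += len(current - {cpu})
        holders[block] = {cpu}         # the writer becomes the sole holder
    return invalidations


# CPU 0 updates a variable at byte 0, CPU 1 a different variable at byte 8.
writes = [(0, 0), (1, 8)] * 4
large_blocks = count_invalidations(writes, block_size=64)
small_blocks = count_invalidations(writes, block_size=8)
```

With 64-byte blocks both variables fall into the same block and `large_blocks` is 7, even though the two processors never touch the same data. With 8-byte blocks `small_blocks` is 0.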


Directory-Based Protocols (cont.)
• The cache controller is the hardware that maintains the directory and processes memory requests
    • custom hardware
    • programmable protocol processor
• The directory structure is how the cache and memory information is organized
    • (p+1)-bit full directory
    • linked-list directories
    • tagged directories

Directory Models
• Full Directory
    • Links to all caches for all shared locations
• Limited Directory
    • Links to at most n of the N caches that hold the shared data (n < N)
• Chained (linked) Directory
    • A link to one cache, and from this cache to the others (single or double links)
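The storage cost of the three models can be compared with a short calculation. This is a back-of-the-envelope sketch: the full directory uses the p+1 bits per memory block mentioned earlier (p presence bits plus one state bit), while the single state bit added to the limited and chained entries, and the pointer width of ceil(log2 p) bits, are assumptions made for the comparison.

```python
def pointer_bits(p):
    """Bits needed to name one of p caches."""
    return (p - 1).bit_length()


def full_directory_bits(p):
    return p + 1                          # one presence bit per cache + 1 state bit


def limited_directory_bits(p, n):
    return n * pointer_bits(p) + 1        # n pointers + 1 state bit (assumed)


def chained_directory_bits(p):
    return pointer_bits(p) + 1            # head pointer only; the rest of the
                                          # list is stored in the caches
```

For p = 64 caches this gives 65 bits per memory block for the full directory, 25 bits for a limited directory with n = 4 pointers, and 7 bits for the head of a chained directory. The full directory grows linearly with the number of caches, while the other two grow only logarithmically per entry.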

Directory Sample (full)

Lock-Based Protocols
• New work that promises to be more scalable than directory protocols
• Implements scope consistency, which is similar to lazy release consistency
• Coherence information is exchanged by reading and writing notices on the lock that protects the shared memory
• Currently implemented in software, similar to DSM, but may move to hardware if performance gains can be realized
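The idea of carrying coherence information on the lock can be sketched roughly as follows. This is a loose illustration and not an actual scope-consistency implementation: the `NoticeLock` and `Node` classes, the block-level notices, and the flush-on-release behaviour are all assumptions, and mutual exclusion itself is not modelled.

```python
class NoticeLock:
    """A lock that also stores write notices for the data it protects."""

    def __init__(self):
        self.notices = set()         # blocks written while the lock was held


class Node:
    def __init__(self, memory):
        self.memory = memory
        self.cache = {}
        self.dirty = set()

    def acquire(self, lock):
        for block in lock.notices:   # read the notices: drop stale copies
            self.cache.pop(block, None)

    def read(self, block):
        if block not in self.cache:
            self.cache[block] = self.memory[block]
        return self.cache[block]

    def write(self, block, value):
        self.cache[block] = value
        self.dirty.add(block)

    def release(self, lock):
        for block in self.dirty:     # flush, then leave notices on the lock
            self.memory[block] = self.cache[block]
        lock.notices |= self.dirty
        self.dirty = set()


memory = {"a": 0}
lock = NoticeLock()
n0, n1 = Node(memory), Node(memory)
n1.read("a")                          # n1 caches the old value
n0.acquire(lock)
n0.write("a", 7)
n0.release(lock)
before_acquire = n1.read("a")         # still the old copy: no lock held
n1.acquire(lock)
after_acquire = n1.read("a")          # notices on the lock force a refetch
```

In this sketch `before_acquire` is 0 and `after_acquire` is 7: a node is only guaranteed to see the writes made under a lock once it has acquired that same lock, and no other node receives any message.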

Software Protocols
• Software protocols enforce consistency with limited hardware support by relying either on the compiler or on specialized software handlers
• Similar to distributed shared memory (DSM) systems but at a lower level
    • sharing is usually in blocks, not pages
    • needs to be more efficient for better performance
    • architecture support for sharing

Classification of Software Protocols
• Several criteria distinguish software protocols:
    • dynamism - compile-time or run-time analysis
    • selectivity - level of coherence actions
    • restrictiveness - conservative or as-needed consistency enforcement
    • adaptivity - whether the protocol can adapt to access patterns
    • granularity - size and structure of coherence data
    • blocking - program block on which coherence is enforced
    • positioning - position of coherence instructions
    • updating - how memory is updated after a write
    • checking - how incoherence is detected

Software Coherence with Limited Hardware Support
• The compiler must generate consistent code because the hardware provides no coherence
• Hardware maintains time tags, which are updated on every write
• For a read, the compiler generates a coherence read that checks the time tags to ensure the data is consistent
• Relies on the compiler to detect reads that may be inconsistent, and on the hardware to maintain the time tags
• Using the tags, it is also possible to perform dynamic self-invalidation of blocks
• Many techniques are based on these time tags

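A time-tag check can be sketched as below. This is a deliberately simplified model rather than any published scheme: the `TaggedMemory` and `TagCache` classes are hypothetical, and the coherence read compares against a tag kept next to memory, whereas practical designs arrange for the comparison to be made locally.

```python
class TaggedMemory:
    def __init__(self, data):
        self.data = dict(data)
        self.tags = {addr: 0 for addr in data}    # one time tag per block

    def write(self, addr, value):
        self.data[addr] = value
        self.tags[addr] += 1                      # tag updated on every write


class TagCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                           # addr -> (value, tag at fill)

    def _fill(self, addr):
        self.lines[addr] = (self.memory.data[addr], self.memory.tags[addr])

    def plain_read(self, addr):
        """Ordinary read: trusts whatever copy is cached."""
        if addr not in self.lines:
            self._fill(addr)
        return self.lines[addr][0]

    def coherence_read(self, addr):
        """Read the compiler emits where the data may be inconsistent."""
        line = self.lines.get(addr)
        if line is None or line[1] != self.memory.tags[addr]:
            self._fill(addr)                      # self-invalidate and refetch
        return self.lines[addr][0]


mem = TaggedMemory({"x": 1})
cache = TagCache(mem)
first = cache.plain_read("x")
mem.write("x", 2)                                 # another processor writes x
stale = cache.plain_read("x")
checked = cache.coherence_read("x")
```

Here `stale` is still 1 while `checked` is 2. The compiler's job is to use the more expensive coherence read only at those reads that may observe out-of-date data.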


Limited Hardware Support (cont.)
• For hardware without time tags, Petersen and Li developed an algorithm that uses only page translation hardware and page status tables
• Sharing information is maintained by a software handler at the page level
• On a page access or fault, the software handler checks the sharing information, updates the page tables, and performs coherence actions
• Slower than hardware because software handlers involve the OS and sit on the critical memory access path

Enforcing Coherence by Restricting Parallelism
• Compilers can also guarantee coherence by structuring the language to limit parallelism
    • easier to enforce coherence
    • limits the programmer and the potential parallelism
    • simplifies compiler design
    • good performance can be achieved with no hardware support
• Parallel language restrictions include:
    • doall parallel loops
    • master/slave processes

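The structure of a doall loop can be sketched with a thread pool. This only illustrates the programming model, not coherence hardware or real parallel speed-up: the `doall` helper and the arrays are hypothetical, and Python threads are used purely to show independent iterations followed by a barrier.

```python
from concurrent.futures import ThreadPoolExecutor


def doall(n, body, workers=4):
    """Run body(i) for i in range(n) in parallel; return when all are done."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, range(n)))    # acts as a barrier at the loop end


a = list(range(8))
b = [0] * 8


def body(i):
    b[i] = 2 * a[i]       # iteration i touches only a[i] and b[i]


doall(8, body)
```

Because no iteration reads what another iteration writes, nothing needs to be kept coherent while the loop runs. Coherence actions are only needed at the loop boundaries, which is what makes this restriction attractive for compiler-enforced coherence.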


Optimizing Compilers
• Optimizing compilers are designed to maintain coherence with limited hardware support without overly restricting the programmer
    • rely on detecting data dependencies
    • may use synchronization variables (locks, barriers)
    • can provide the hardware with hints
    • can detect when coherence is not needed
    • may have problems with dynamic sharing
    • offer good performance, but are hard to design

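Detecting data dependencies can be illustrated with a brute-force check on a simple loop. This sketch assumes array subscripts of the affine form a*i + b and a known iteration count, and it only looks for an element written in one iteration and read in a different one. Real compilers use symbolic dependence tests instead; the function name and the encoding are hypothetical.

```python
def has_cross_iteration_dependence(write, read, n):
    """Check a loop `for i in range(n): A[wa*i + wb] = f(A[ra*i + rb])`.

    write, read: (coefficient, offset) pairs describing the two subscripts.
    Returns True if one iteration writes an element that another reads.
    """
    wa, wb = write
    ra, rb = read
    return any(
        wa * i + wb == ra * j + rb
        for i in range(n)
        for j in range(n)
        if i != j
    )


independent = has_cross_iteration_dependence((1, 0), (1, 0), 8)    # A[i] = f(A[i])
carried = has_cross_iteration_dependence((1, 0), (1, -1), 8)       # A[i] = f(A[i-1])
disjoint = has_cross_iteration_dependence((2, 0), (2, 1), 8)       # A[2i] = f(A[2i+1])
```

Only `carried` is true. For the other two loops the compiler could run the iterations in parallel and conclude that no coherence action is needed between them.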


Future Work
• Hardware protocols are well defined, and the directory structure is near optimal
• Cost improvements can be obtained by mass-producing cache controller chips
• Software protocols are a good area for future research because they are also applicable at higher levels of sharing (DSM, databases, ...)
• Optimizing compilers need to be improved to detect data dependencies and optimize code for the parallel environment

Conclusions
• Hardware protocols offer the best performance but at high hardware cost
• Software protocols can be used when there is no hardware support, at a slight performance penalty
• Optimizing compilers can enforce coherence or provide hints to the hardware
• A combination of hardware and compiler optimizations works best
