Lecture 12.
Multi-Processors
(P&H 5.10, 6.5, H&P 5.1-5.5)
Hyeran Jeon
                      Page            Segment
Addressing fields     One (offset)    Two (segment base and offset)
Programmer visible?   Invisible       May be visible
• Page table
– accessed to get the physical address of a given virtual address
• Page table register
– contains the base address of the page table
• Table entry address =
Page table register value + Virtual page number x Page table entry size in bytes
• Meta data
– RWX : Read/Write/eXecute
– V : Valid (If valid, the page is in memory. Otherwise, page fault exception!)
– M : Modified
– R : Referenced (used for page replacement)
[Figure: the virtual address is split into a virtual page number and a page offset; the page table register plus (virtual page number x table entry size in bytes) selects a page table entry holding RWX, V, M, R, .. and the physical page number; the physical address is the physical page number (bits 29-12) followed by the page offset (bits 11-0).]
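The entry-address formula and the valid-bit check above can be sketched in a few lines of Python. This is only an illustration: the 12-bit page offset follows the slide's address layout, while the 4-byte entry size, the dictionary-based page table, and the function names are assumptions made for this sketch.

```python
PAGE_OFFSET_BITS = 12   # page offset is bits 11..0 in the slide's layout
PTE_SIZE_BYTES = 4      # assumed page table entry size for this sketch


def pte_address(page_table_register, virtual_address):
    """Table entry address = page table register + VPN x entry size."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    return page_table_register + vpn * PTE_SIZE_BYTES


def translate(page_table, virtual_address):
    """Translate using a hypothetical page table: dict of VPN -> (valid, PPN)."""
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    valid, ppn = page_table.get(vpn, (False, 0))
    if not valid:
        # V bit clear: the page is not in memory
        raise LookupError("page fault")
    # Physical address = physical page number followed by the page offset
    return (ppn << PAGE_OFFSET_BITS) | offset
```

For example, with the page table register at 0x1000, virtual address 0x3ABC has virtual page number 3, so its entry sits at 0x1000 + 3 x 4 = 0x100C.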
SJSU SAN JOSÉ STATE
4 UNIVERSITY
Translating Using a Hierarchical Page Table
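[Figure: the physical address is 30 bits in this example (physical page number in bits 29-12, page offset in bits 11-0); for the cache lookup it is split into Tag (bits 29-8), Index (bits 7-5), and Block offset (bits 4-0). A tag match signals a hit and selects the data.]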
Multiprocessor
• ILP is an optimization within a processor (or core)
• Parallelism in processor unit → Multiprocessor!
– Multiple CPUs or Multiple Cores within a CPU
• Two representative types:
– Shared-memory (SMP/CMP): shared data management is a key issue
– Message-passing: explicit exchange of data among the processors that
use their own memory
[Figure: multiple CPUs; Moore’s Law]
• Write propagation
– Writes are visible to other processors
• Write serialization
– All writes to the same location are seen in the same order by all
processors
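[Figure: state diagram of a two-state snoopy protocol with states V (valid) and I (invalid); transitions are labeled PrRd/--, PrWr/BusWr, BusRd/--, and BusWr/--.]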
• Why is this a problem?
– Whenever a cache that has the block wants to write to the block, the
cache needs to broadcast “invalidate” even though it is the only cache
that holds a copy of the block.
– How can we reduce unnecessary broadcasting overhead?
• In the next class...
• A block is placed into the E state if no other cache holds the
same block
– The “Shared” signal on the bus detects whether the copy is unique; snooping caches assert
the signal if they also have a copy
– On a read miss, go to E or S depending on the value returned by the shared line
– Silent transition from E to M is possible on a write
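The E-state rules above can be tried out with a small simulation. The following is a minimal sketch, not a full protocol model: it tracks only the state of a single block X in each cache, and the class name `MesiBus`, its methods, and the returned action strings are hypothetical names chosen for this illustration.

```python
class MesiBus:
    """Tracks the MESI state of one block X in each cache on a shared bus."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches

    def read(self, p):
        """Processor p reads X; returns the processor action / bus request."""
        if self.state[p] != "I":
            return "PrRd/--"                  # hit in M, E, or S
        shared = False
        for q in range(len(self.state)):
            if q != p and self.state[q] != "I":
                shared = True                 # snooper asserts the shared line
                self.state[q] = "S"           # M (after flush) and E drop to S
        self.state[p] = "S" if shared else "E"
        return "PrRd(S)/BusRd" if shared else "PrRd(not S)/BusRd"

    def write(self, p):
        """Processor p writes X; returns the processor action / bus request."""
        old = self.state[p]
        if old in ("M", "E"):
            self.state[p] = "M"               # E -> M is silent: no bus traffic
            return "PrWr/--"
        for q in range(len(self.state)):
            if q != p:
                self.state[q] = "I"           # other copies are invalidated
        self.state[p] = "M"
        return "PrWr/BusUpgr" if old == "S" else "PrWr/BusRdX"
```

Calling `read(0)` and then `write(0)` on a fresh two-cache bus leaves cache 0 in M without any bus request for the write, which is the saving that the E state provides over a protocol without it.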
Example 1
Example:
Data block X is only in memory.
Processor A reads block X.
Then, Processor B reads block X.
[Processor A] Read on X
Action: PrRd(S̄)/BusRd (S̄: the shared signal is not asserted)
Transition: I (initial) → E
[Processor B] Read on X
Action(B): PrRd(S)/BusRd
Transition(B): I (initial) → S
Action(A): BusRd/--
Transition(A): E → S
[Figure: MESI state diagrams for the caches of Processor A and Processor B, which share a bus to memory; after both reads, X is in both caches and in memory.]
Example 2
Example:
Data block X is only in memory.
Processor A reads then writes block X.
Then, Processor B reads block X.
[Processor A] Read on X
Action: PrRd(S̄)/BusRd (S̄: the shared signal is not asserted)
Transition: I (initial) → E
[Processor A] Write on X
Action: PrWr/--
Transition: E → M (silent transition)
[Processor B] Read on X
Action(B): PrRd(S)/BusRd
Transition(B): I (initial) → S
Action(A): BusRd/Flush
Transition(A): M → S
[Figure: MESI state diagrams for the caches of Processor A and Processor B, which share a bus to memory; after the last read, X is in both caches and the flushed copy is in memory.]
Snoopy Invalidation Tradeoffs
• Cache-to-cache vs. Memory-to-cache transfer
– On a BusRd, should data come from another cache or memory?
– Another cache
• might be faster if memory is slow or highly contended
• what if several caches share the same block? Which one will provide the
block?
– Memory
• would be simpler because there is no need to identify which cache (if several caches
share the same block) will provide the block
• requires writeback on M→S transition
• Writeback on M→S
– Is this necessary? What if the block is updated multiple times by several
caches for a while? Do we need to update memory whenever the block is
updated?
• S state in MOESI is “Shared and potentially dirty”
• AMD Opteron uses MOESI
[Figure: MOESI state table. Valid states: M (modified, not shared), O (modified, shared), E (clean, not shared), S (clean, shared); I is invalid. Annotation: dirty until one shared copy is evicted.]
MOESI
• Red-colored transitions: Unique transitions in MOESI compared to MESI
• Flush here does not update memory unless the block is evicted → most block communication is via cache-to-cache
[Figure: MOESI state diagram with states M, O, E, S, I. Annotations: any request on this block is answered by this cache; this cache loses ownership.]
Multi-level Cache Hierarchy
• Processors typically have multi-level cache hierarchy
(i.e. L1, L2..)
• When a coherence request arrives, caches in all levels
should be checked → long latency
P P P
L1 L1 L1
L2 L2 L2
Interconnection Network
Memory
Coherence Miss Example
Example:
Data block X is only in memory.
Processor A reads block X.
Processor B writes block X.
Lastly, Processor A reads block X.
[Processor A] Read on X
[Processor B] Write on X
Transition(B): I (initial) → M
Action(A): BusRdX/--
Transition(A): S → I
[Processor A] Read on X (encounters a miss as X is out-dated)
Action(A): PrRd/BusRd
Transition(A): I → S
Action(B): BusRd/Flush
Transition(B): M → S
[Figure: state diagrams for the caches of Processor A and Processor B, which share a bus to memory holding X.]
Types of Coherence Misses
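• Two types
– True sharing: a cache miss caused by an update to a word
in a cache block that your processor actually uses
– False sharing: a cache miss caused by an update to a word in the cache block that your processor does not use
Example (Word1 and Word2 are in the same cache block; ’ marks an updated value):
Access                      Cache A              Cache B
A: Write on Word1 (hit)     Word1’ Word2  M      Word1  Word2  I
B: Read Word1 (miss)        Word1’ Word2  S      Word1’ Word2  S
A: Write on Word2 (hit)     Word1’ Word2’ M      Word1’ Word2  I
B: Read Word1 (miss)        Word1’ Word2’ S      Word1’ Word2’ S
The last miss is a cache miss due to false sharing (B never reads Word2 but encounters a miss due to the updated Word2)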
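The difference between the two miss types can be made concrete with a simplified classifier. This is a sketch under stated assumptions: it models a single cache block under an invalidation protocol, assumes every cache starts with a copy of the block (as in the example above), and the function name `classify_accesses` and its labels are hypothetical.

```python
def classify_accesses(accesses, n_caches=2, initially_cached=True):
    """Label each (processor, op, word) access to one shared cache block.

    op is "R" or "W". A miss on an invalidated block is a true sharing miss
    if the accessed word was written by another processor since this cache
    lost the block, and a false sharing miss otherwise.
    """
    valid = [initially_cached] * n_caches      # holds a valid copy?
    seen = [initially_cached] * n_caches       # ever held the block?
    stale = [set() for _ in range(n_caches)]   # words updated while invalid
    kinds = []
    for proc, op, word in accesses:
        if valid[proc]:
            kinds.append("hit")
        elif not seen[proc]:
            kinds.append("cold miss")
        elif word in stale[proc]:
            kinds.append("true sharing miss")
        else:
            kinds.append("false sharing miss")
        valid[proc] = seen[proc] = True        # block is (re)fetched
        stale[proc].clear()
        if op == "W":
            for other in range(n_caches):
                if other != proc:
                    valid[other] = False       # write invalidates other copies
                    stale[other].add(word)
    return kinds
```

Running it on the four accesses of the example (A writes Word1, B reads Word1, A writes Word2, B reads Word1) labels the second access a true sharing miss and the fourth a false sharing miss.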
• Format
– Similar to Midterm1 and 2
– Calculator/pen/eraser are allowed
• Review
– Lecture slides
– Homework 1 to 6 solutions
– Quiz solutions
– Midterm solutions
• Amdahl’s law
– Make the common case fast
– Overall Speedup = 1 / ((1 – Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
– I-type
LD/ST: sign-extension
Branch: sign-extension + shift-left-2
• Hazards
– Structural
• HW Organization (i.e. unified i- and d-memory)
– Data : stall pipeline stages until operand value becomes ready
• Data dependency
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 CC10
LW:        IF ID EXE MEM WB
Next inst: IF IF ID EXE MEM WB
[Figure: pipeline timing (CC1 to CC10) for BEQ $a0,$a1,L1 (NT), L2: ADD $s1,$t1,$t2, and LW $s2,0($s5); BEQ runs IF ID EXE MEM WB and the LW fetched after it is flushed (IF nop nop).]
[Figure: datapath with the branch decision made early: PC, I-Cache / I-MEM, register file, sign extend (16 to 32 bits), shift-left-2 and adder for the branch target, equality comparator (==), ALU, D-Cache / D-Mem. Annotation: the operand value is needed in the ID stage (not in the EXE stage).]
Lecture 5
• Early Branch Determination w/ predicted NT
Actual Branch Outcome
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 CC10
BEQ $a0,$a1,L1 (NT)
L2: ADD $s1,$t1,$t2 BEQ IF ID EXE MEM WB
SUB IF ID EXE
Instruction in the
target address (L2)
Lecture 5
• Early Branch Determination w/ predicted NT
– If BNE has a dependency on its preceding instruction, OR, a one-cycle stall is required to get the OR’s result value
[Figure: pipeline timing (CC1 to CC10) with the actual branch outcome for BEQ $a0,$a1,L1, SW $t5,0($s1), LW $s2,0($s5), and L2: ADD $s1,$t1,$t2: BEQ runs IF ID EXE MEM WB, the AND fetched after it repeats IF and becomes nops, and ADD and SUB, the instructions at the target address (L2), are fetched next.]
Lecture 6
• Dynamic Branch Predictor
– 1-bit and 2-bit Saturating Counter in each entry of a branch prediction buffer
• Could have more than two bits but two bits cover most patterns (i.e. loops)
[Figure: branch prediction buffer with one prediction bit per entry (entries 0 to 4: T, NT, T, T, NT); FSM for last-outcome prediction (states 0 and 1) and FSM for the 2-bit saturating counter (states 00, 01, 10, 11), with predict-NT and predict-T states and transitions on T and NT outcomes. T: Taken, NT: Not Taken.]
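A 2-bit saturating counter is short enough to simulate directly. The sketch below assumes the usual encoding in which counter values 00 and 01 predict not taken and 10 and 11 predict taken; the function names are hypothetical.

```python
def predict_and_update(counter, taken):
    """Return (prediction, new_counter) for one branch outcome.

    counter is 0..3 (binary 00..11); values >= 2 predict taken.
    """
    prediction = counter >= 2
    if taken:
        counter = min(counter + 1, 3)   # saturate at 11
    else:
        counter = max(counter - 1, 0)   # saturate at 00
    return prediction, counter


def count_correct(outcomes, counter=0):
    """Run the predictor over a list of outcomes (True = taken)."""
    correct = 0
    for taken in outcomes:
        prediction, counter = predict_and_update(counter, taken)
        correct += prediction == taken
    return correct, counter
```

For a loop branch that is taken nine times and then falls through, a second execution of the same loop mispredicts only the final not-taken outcome, because one not-taken outcome moves the counter from 11 to 10 and the prediction stays taken.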
Lecture 6
• Two-bit predictor is good for branches in loops
• How to improve prediction rate for the branches
other than the branches in loops?
Example: 2-bit history register and 2-bit predictor with the following initial values.
There are four branches in the code and their actual outcomes are T, T, NT, NT, respectively
History: 00
Predictor: entry 0 = 00, entry 1 = 10, entry 2 = 00, entry 3 = 01
Code: BNE .. BEQ .. BEQ .. BNE
First branch (BNE): history 00 selects predictor entry 0 (00) → predicted NT; actual outcome: T; then update predictor and update history
Lecture 6
• Global history and global predictor
– The outcomes of the last N branches are used to index the global branch predictor buffer
– All branches share the same branch predictor buffer
– Suited to cases where all branches are strongly correlated with each other
Example: 2-bit history register and 2-bit predictor with the following initial values.
There are four branches in the code and their actual outcomes are T, T, NT, NT, respectively
History: 00 → 01
Predictor: entry 0 = 00, entry 1 = 10, entry 2 = 00, entry 3 = 01
Code: BNE .. BEQ .. BEQ .. BNE
First branch (BNE): predicted NT, actual outcome: T → update predictor, update history
History update: shift left by 1 regardless of the branch outcome, add the most recent branch outcome to the LSB of the history register, and discard the shifted-out MSB to keep 2 bits only.
Lecture 6
• Global history and global predictor (continued)
Example state after the first branch (BNE, actual outcome T):
History: 01
Predictor: entry 0 = 01, entry 1 = 10, entry 2 = 00, entry 3 = 01
Code: BNE .. BEQ .. BEQ .. BNE
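The worked example can be replayed with a short simulation of a global-history predictor. This is a sketch under assumptions: the predictor entries are treated as 2-bit saturating counters in which values 10 and 11 predict taken, and the function name `run_global_predictor` is hypothetical.

```python
def run_global_predictor(history, table, outcomes, history_bits=2):
    """Simulate a global history register indexing a shared table of
    2-bit saturating counters. outcomes is a list of booleans (True = taken).

    Returns (predictions, final_history, final_table).
    """
    mask = (1 << history_bits) - 1
    table = list(table)                      # do not modify the caller's list
    predictions = []
    for taken in outcomes:
        counter = table[history]             # history selects the entry
        predictions.append(counter >= 2)     # 10 and 11 predict taken
        # Update the selected counter with the actual outcome
        table[history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift left by 1, put the outcome in the LSB, drop the old MSB
        history = ((history << 1) | int(taken)) & mask
    return predictions, history, table
```

With the initial values from the example (history 00, entries 00, 10, 00, 01) and a single taken outcome, the sketch gives history 01 and entry 0 = 01, matching the state shown after the first branch.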
[Figure: integer pipeline IF ID EX ME WB and floating-point pipeline IF ID FP1 FP2 FP3 FP4 FP5 WB; annotation: exception handler code fetched.]
Lecture 7
• Now let’s see the timing in 2-way superscalar
[Figure: integer pipeline IF ID EX ME WB and floating-point pipeline IF ID FP1 FP2 FP3 FP4 FP5 WB]
CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8 CC9 CC10 CC11 CC12 CC13 CC14 CC15 CC16 CC17
ld.s $f0, 0($r1) IF ID EXE MEM WB
ld.s $f1, 0($t2) IF ID EXE MEM WB
subi $t3, $t3, #1 IF ID EXE MEM WB
add.s $f2, $f1, $f0 IF ID ID FP1 FP2 FP3 FP4 FP5 WB
addi $t1, $t1, #4 IF ID EXE MEM WB
addi $t2, $t2, #4 IF ID EXE MEM WB
st.s $f2, -4($t1) IF ID ID ID ID EXE MEM WB
bnez $t3, Loop IF IF IF IF ID EXE MEM WB
Not enough floating point instructions and dependencies → no perf improvement
If reg is already in reg file, fill the operand field with reg id
[Figure: reservation stations AD1, AD2, AD3, ML1, ML2 and the register status (Qi) table over F0, F2, F4, F6, F8, F10, F12 with entries LD2, LD1, …]
Lecture 8
Assume we have:
- 1 MUL/DIV unit, 1 LD/ST unit, 1 Arithmetic unit
Instruction takes:
- Load: 2 cycles, Add/Sub: 2 cycles
- Mult: 10 cycles, Divide: 40 cycles
no.  Instruction            ISSUE  EXE  WB
I1   ld.s $f6, 34($t2)      1      2-3
I2   ld.s $f2, 45($t3)      2
I3   mul.s $f0, $f2, $f4    3
I4   sub.s $f8, $f2, $f6
I5   div.s $f10, $f0, $f6
I6   add.s $f6, $f8, $f2
Reservation Stations
Busy Op Vj Vk Qj Qk A
LD1 1 I1 - $t2 34 + Regs[$t2]
• Commit
– When an inst is the oldest in the ROB
• i.e. ROB-head points to it
– Write result (if ready/finished bit is set)
• If register producing instruction: write to architected
register file
• If store: write to memory
– Advance ROB-head to next instruction
• Conservative approach
– A ready load must wait until addresses of all preceding stores are known
• AMAT in multi-level cache organization
= Thit(L1) + Miss_rate(L1) x
[ Thit(L2) + Miss_rate(L2) x
{ Thit(L3) + Miss_rate(L3) x T(memory) } ]
[Figure: cache hierarchy with L3 on the processor die and main memory off-chip.]
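The nested AMAT expression can be evaluated from the innermost level outward. The sketch below is illustrative only: the function name `amat` and the hit times and miss rates in the usage note are made-up numbers, not values from the lecture.

```python
def amat(levels, memory_time):
    """Average memory access time for a multi-level cache hierarchy.

    levels is a list of (hit_time, miss_rate) pairs ordered from L1 outward;
    memory_time is the main memory access time, in the same time unit.
    """
    penalty = memory_time
    # Work from the last cache level back to L1:
    # time(Li) = Thit(Li) + Miss_rate(Li) x time(next level)
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty
```

With hypothetical values L1 = (1 cycle, 10%), L2 = (10 cycles, 50%), L3 = (40 cycles, 50%), and a 200-cycle memory, the levels evaluate to 140 cycles at L3, 80 cycles at L2, and an AMAT of 9 cycles at L1.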
[Figure: cache data array with one row per block location (1, 2, 3, … up to the number of blocks); a row holds bytes Byte 63 … Byte 33, Byte 32, and the highlighted 32-byte data contains 0x77FF1C68.]
Lecture 10
• Given a 2MB, direct-mapped cache, line (block) size = 64 bytes
• Data address is 52 bits
• Tag size?
– block offset: 6 bits
– # blocks = 2^21 / 2^6 = 2^15 → # bits in index: 15 bits
– # bits in an address: 52 bits
– # bits in tag = # bits in an address - # bits in index - # bits in block offset
= 52 – 15 – 6 = 31 bits
[Figure: 52-bit address split into Tag (? bit), Index (? bit), and Block (? bit) fields]
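The field-width arithmetic above generalizes to other cache sizes. The following sketch assumes power-of-two cache and block sizes; the function names and the `ways` parameter are hypothetical additions for illustration.

```python
def cache_fields(cache_bytes, block_bytes, address_bits, ways=1):
    """Return (tag_bits, index_bits, offset_bits) for a cache.

    Assumes cache_bytes, block_bytes, and ways are powers of two;
    ways=1 is a direct-mapped cache.
    """
    offset_bits = block_bytes.bit_length() - 1      # log2(block size)
    sets = cache_bytes // (block_bytes * ways)
    index_bits = sets.bit_length() - 1              # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(address, index_bits, offset_bits):
    """Split an address into its (tag, index, block offset) fields."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset
```

`cache_fields(2 * 1024 * 1024, 64, 52)` reproduces the 31-bit tag, 15-bit index, and 6-bit block offset computed above.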
• Random
– Replace a randomly chosen line
• FIFO
– Replace the oldest line
• LRU (Least Recently Used)
– Replace the least recently used line
• pseudo-LRU
– LRU but with less overhead
[Figure: pseudo-LRU tree over lines A, B, C, D: one bit selects the LRU group and one bit per group selects the LRU line within it.]
[Figure: CPU, L1, and L2; all accesses go to L1, L1 misses go to L2, and all stores go through a write buffer.]
• Write back
– The value is written only to the cache line. The modified cache line is written to
main memory only when it has to be replaced.
– A dirty bit is used to distinguish modified cache lines.
[Figure: timeline comparing a blocking cache with a non-blocking cache with 2 MSHRs. Blocking cache: the 1st miss (200 cycles) must complete before the 2nd miss (200 cycles) can start. Non-blocking cache: the 1st miss is a primary miss that allocates MSHR-1; the 2nd miss is a primary miss that allocates MSHR-2 and overlaps with the 1st; a secondary miss is also shown; while both MSHRs are occupied no further accesses are accepted; once an MSHR is released, the 3rd miss (a primary miss, 200 cycles) allocates MSHR-1.]
• Page replacement
• Size of the page table?
– 32-bit virtual address
– 4 KB page
– page table entry: 4 B
[Figure: address translation through the TLB: virtual address (virtual page number in bits 31-12, page offset in bits 11-0), page table entry (RWX, V, M, R, .., physical page number), physical address (physical page number in bits 29-12, page offset in bits 11-0).]
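The page table size question can be worked out with a short calculation. The sketch assumes a flat, single-level page table with one entry per virtual page and a power-of-two page size; the function name is hypothetical.

```python
def page_table_size_bytes(virtual_address_bits, page_bytes, pte_bytes):
    """Size of a flat (single-level) page table with one entry per virtual page.

    Assumes page_bytes is a power of two.
    """
    offset_bits = page_bytes.bit_length() - 1            # log2(page size)
    entries = 1 << (virtual_address_bits - offset_bits)  # number of virtual pages
    return entries * pte_bytes
```

For the parameters above, a 4 KB page gives a 12-bit offset, so there are 2^20 virtual pages and the table occupies 2^20 x 4 B = 4 MB per process.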
• Red-colored transitions: Unique transitions in MOESI compared to MESI
• Flush here does not update memory unless the block is evicted → most block communication is via cache-to-cache
[Figure: MOESI state diagram with states M, O, E, S, I. Annotations: any request on this block is answered by this cache; this cache loses ownership.]