

The traditional von Neumann machine is SISD: it has one instruction stream, one
CPU, and one memory.
SIMD machines operate on multiple data sets in parallel.
Typical examples are vector machines and array processors.
SIMD Array Processor
This architecture consists of a square grid of processor/memory elements.
A single control unit broadcasts instructions, which are carried out in lockstep by
all the processors, each one using its own data from its own memory. The array
processor is well suited to calculations on matrices.
Single Instruction stream, Multiple Data stream: a computer that performs one operation
on multiple sets of data. It is typically used to add or multiply eight or more sets of
numbers at the same time, for multimedia encoding and rendering as well as scientific
applications. Hardware registers are loaded with numbers, and the mathematical
operation is performed on all registers simultaneously.
SIMD capability was added to the Pentium CPU starting with the Pentium MMX chip
and enhanced in subsequent Pentiums. Array processors are machines specialized for
SIMD operations.
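The register-level idea can be illustrated with NumPy, which applies one arithmetic operation across many packed values much as an MMX-style register does (a sketch of the concept, not MMX code):

```python
import numpy as np

# Eight pairs of values, like the packed lanes of an MMX-style register.
a = np.array([1, 2, 3, 4, 5, 6, 7, 8])
b = np.array([10, 20, 30, 40, 50, 60, 70, 80])

# One operation is applied to all eight lanes simultaneously.
c = a + b
print(c)  # [11 22 33 44 55 66 77 88]
```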
SIMD Computer organizations
Con"i$uration - is structured 'ith N synchroni,ed P.s,all o" 'hich are under the control
o" one CU..ach P.i is essentially an !/U 'ith attached 'or(in$ re$isters and local
memory P.Mi "or the stora$e o" distri&uted data.The CU also has its o'n main memory
"or stora$e o" pro$rams.The "unction o" CU is to decode all instruction and determine
'here the decoded instructions should &e executed.Scalar or control type instructions are
directly executed inside the CU.Vector instructions are &roadcasted to the P.s "or
distri&uted execution.
Con"i$uration II di""ers "rom con"i$uration I in t'o aspects.0irst the local memries
attached to the P.s are replaced &y parallel memory modules shared &y all the P.s
throu$h an alio$nment net'or(.Second the inter P. permutation net'or( is replaced &y
inter P. memory ali$nment net'or(.! $ood example o" con"i$uration II SIMD machine
is 1urrou$hs scienti"ic processor.
Formally, an SIMD computer is characterized by the following set of parameters:
N = Number of PEs in the system.
F = A set of data-routing functions.
I = Set of machine instructions.
M = Set of masking schemes.
Masking and data routing mechanisms
In an array processor, vector operands can be specified by the registers to be used or by
the memory addresses to be referenced. For memory-reference instructions, each PEi
accesses the local PEMi, offset by its own index register Ii. The Ii register modifies the
global memory address broadcast from the CU. Thus, different locations in different
PEMs can be accessed simultaneously with the same global address specified by the CU.
The following example shows how indexing can be used to address the local memories
in parallel at different local addresses.
Example 5.1: Consider an array of n x n data elements:

A = {A(i, j), 0 <= i, j <= n - 1}

Elements in the jth column of A are stored in n consecutive locations of PEMj, say from
location 100 to location 100 + n - 1 (assume n <= N). If the programmer wishes to access
the principal diagonal elements A(j, j) for j = 0, 1, ..., n - 1 of the array A, then the CU
must generate and broadcast an effective memory address 100 (after offset by the global
index register I in the CU, if there is a base address of A involved). The local index
registers must be set to Ij = j for j = 0, 1, ..., n - 1 in order to convert the global address
100 to the local addresses 100 + Ij = 100 + j for each PEMj. Within each PE, there should
be a separate memory address register for holding these local addresses. However, if one
wishes to address a row of the array A, say the ith row A(i, j) for j = 0, 1, 2, ..., n - 1, all
the Ij registers must instead be set to i for all j = 0, 1, 2, ..., n - 1 in order to ensure
parallel access to the entire row.
Example 5.2: To illustrate the necessity of data routing in an array processor, we show
the execution details of the following vector instruction in an array of N PEs. The sum
S(k) of the first k components in a vector A is desired for each k from 0 to n - 1. Let A =
(A0, A1, ..., An-1). We need to compute the following n summations:

S(k) = A0 + A1 + ... + Ak    for k = 0, 1, ..., n - 1

These n vector summations can be computed recursively by going through the following
n - 1 iterations, defined by:

S(0) = A0
S(k) = S(k - 1) + Ak    for k = 1, 2, ..., n - 1    (5.4)
The above recursive summations for the case of n = 8 are implemented in an array
processor with N = 8 PEs in log2 n = 3 steps. Both data routing and PE masking are used
in the implementation. Initially, each Ai, residing in PEMi, is moved to the Ri register in
PEi for i = 0, 1, ..., n - 1 (n = N = 8 is assumed here). In the first step, Ai is routed from Ri
to Ri+1 and added to Ai+1, with the resulting sum Ai + Ai+1 in Ri+1 for i = 0, 1, ..., 6.
The arrows in Figure 5.3 show the routing operations, and the shorthand notation i-j is
used to refer to the intermediate sum Ai + Ai+1 + ... + Aj. In step 2, the intermediate sums
in Ri are routed to Ri+2 for i = 0 to 5. In the final step, the intermediate sums in Ri are
routed to Ri+4 for i = 0 to 3. Consequently, PEk holds the final value of S(k) for k = 0,
1, 2, ..., 7.
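The route-and-add recursion above is what is now called a parallel prefix sum by recursive doubling; a Python sketch, with a plain list standing in for the R registers of the N PEs:

```python
# Illustrative sketch of the log2(n)-step route-and-add computation;
# the list R stands for the registers R_0 .. R_{n-1} of the n PEs.
def simd_prefix_sum(A):
    n = len(A)              # assume n = N is a power of two
    R = list(A)             # initially each A_i is moved into R_i
    d = 1
    while d < n:            # each step routes R_i to R_{i+d} and adds;
        # PEs 0 .. d-1 are masked off during the addition
        R = [R[i] + R[i - d] if i >= d else R[i] for i in range(n)]
        d *= 2
    return R                # R[k] now holds S(k) = A_0 + ... + A_k

print(simd_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8]))
# [1, 3, 6, 10, 15, 21, 28, 36]
```

For n = 8 this takes exactly the three route-and-add steps described in the text.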
!s "ar as the data-routin$ operations are concerned, P.E is not inoled 8receiin$ &ut
not transmittin$) in step -. P.E and P.D are not inoled in step >. !lso P.E, P.D, P.6,
and P.A are not inoled in step C. These un- 'anted P.s are mas(ed o"" durin$ the
correspondin$ steps. Durin$ the addition operations, P.: is disa&led in step -F P.: and
P.l are made inactie in step >F and P.:, P.l, P.>, and P.C are mas(ed o"" in step C.
The PEs that are masked off in each step depend on the operation (data-routing or
arithmetic-addition) to be performed. Therefore, the masking patterns keep changing in
the different operation cycles, as demonstrated by the example. Note that the masking and
routing operations will be much more complicated when the vector length n > N.
Array processors are special-purpose computers for limited scientific applications. The
array of PEs is a set of passive arithmetic units waiting to be called for parallel-computation
duties. The permutation network among the PEs is under program control from the CU.
However, the principles of PE masking, global versus local indexing, and data
permutation are not much changed in the different machines.
Inter-PE communications
There are fundamental decisions to be made in designing an appropriate architecture of
an interconnection network for an SIMD machine. The decisions concern the
operation modes, control strategies, switching methodologies, and network topologies.
Operation Mode:
Two types of communication can be identified: synchronous and asynchronous.
Control strategy:
The control-setting functions can be managed by a centralized controller or by the
individual switching elements. The latter strategy is called distributed control, and the
former corresponds to centralized control.
Switching Methodology:
The two major switching methodologies are circuit switching and packet switching.
Network topology:
The topologies can be grouped into two categories: static and dynamic. In a static
topology, dedicated buses cannot be reconfigured, but the links in the dynamic category
can be reconfigured.
SIMD Interconnection Networks
Various interconnection networks have been suggested for SIMD computers. The
classification includes static versus dynamic networks, the mesh-connected Illiac
network, cube interconnection networks, the barrel shifter and data manipulator, and
shuffle-exchange and omega networks. Here we will discuss the first three.
Static versus dynamic networks
The topological structure of an array processor is mainly characterized by the data-routing
network used in interconnecting the processing elements. Such a network can be
specified by a set of data-routing functions.
Static networks
Topologies in the static category can be classified according to the number of dimensions
required for layout. Examples of one-dimensional topologies include the linear array.
Two-dimensional topologies include the ring, star, tree, mesh, and systolic array.
Three-dimensional topologies include the completely connected network, the chordal
ring, the 3-cube, and the 3-cube-connected cycle.
Dynamic networks
There are two classes of dynamic networks: single-stage and multistage.
Single stage networks
A single-stage network is a switching network with N input selectors (IS) and N output
selectors (OS). Each IS is essentially a 1-to-D demultiplexer and each OS is an M-to-1
multiplexer, where 1 <= D <= N and 1 <= M <= N. A single-stage network with D = M = N
is a crossbar switching network. To establish a desired connecting path, different
path-control signals are applied to all the IS and OS selectors.
A single-stage network is also called a recirculating network. Data items may have to
recirculate through the single stage several times before reaching their final destination.
The number of recirculations needed depends on the connectivity of the single-stage
network. In general, the higher the hardware connectivity, the smaller the number of
recirculations required.
Multi stage networks
Many stages of interconnected switches form a multistage network. Multistage networks
are described by three characterizing features: the switch box, the network topology, and
the control structure. Many switch boxes are used in a multistage network. Each box is
essentially an interchange device with two inputs and two outputs. The four states of a
switch box are: straight, exchange, upper broadcast, and lower broadcast.
Mesh-Connected Illiac Network
A single-stage recirculating network has been implemented in the Illiac IV array
processor with 64 PEs. Each PEi is allowed to send data to PEi+1, PEi-1, PEi+r, and
PEi-r, where r = sqrt(N).
Formally, the Illiac network is characterized by the following four routing functions:
R+1(i) = (i + 1) mod N
R-1(i) = (i - 1) mod N
R+r(i) = (i + r) mod N
R-r(i) = (i - r) mod N
A reduced Illiac network is illustrated in the figure with N = 16 and r = 4:
R+1 = (0 1 2 ... N-1)
R-1 = (N-1 ... 2 1 0)
R+4 = (0 4 8 12)(1 5 9 13)(2 6 10 14)(3 7 11 15)
R-4 = (12 8 4 0)(13 9 5 1)(14 10 6 2)(15 11 7 3)
The figure shows that four PEs can be reached from any PE in one step, seven PEs in two
steps, and eleven PEs in three steps. In general, it takes I steps to route data from PEi to
any other PEj in an Illiac network of size N, where I is upper-bounded by I <= sqrt(N) - 1.
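These routing functions are easy to model directly; the breadth-first `min_steps` helper below is an illustrative way to check the step bound for the reduced N = 16, r = 4 network:

```python
# Illustrative sketch of the four Illiac routing functions for N = 16, r = 4.
N, r = 16, 4

routes = [
    lambda i: (i + 1) % N,   # R+1
    lambda i: (i - 1) % N,   # R-1
    lambda i: (i + r) % N,   # R+r
    lambda i: (i - r) % N,   # R-r
]

# The four PEs reachable from PE0 in a single routing step:
print(sorted(f(0) for f in routes))  # [1, 4, 12, 15]

def min_steps(src, dst):
    """Breadth-first search: minimum number of routing steps from src to dst."""
    frontier, seen, steps = {src}, {src}, 0
    while dst not in seen:
        frontier = {f(i) for i in frontier for f in routes} - seen
        seen |= frontier
        steps += 1
    return steps

print(min_steps(0, 8))  # 2
```

Enumerating `min_steps(0, j)` over all j confirms that no PE is more than sqrt(N) - 1 = 3 steps away.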
Cu"e Interconnection Networks
The cu&e net'or( can &e implemented as multi sta$e net'or( "or SIMD
machines.0ormally an n dimensional net'or( o" N pes is speci"ied &y "ollo'in$ n routin$
"unctions. Vertical lines connect ertices 'hose address di""er in the most si$ni"icant &it
position.Vertices at &oth ends o" the dia$onal lines di""er in the middle &it
position.*ori,ontal lines di""er in the least si$ni"icant &it position.
Ci8an--IIai;-ai ai--III..a:) "or i2:,-,>II..n--.
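Since complementing bit i of an address is a single XOR on the integer address, the cube routing functions can be sketched as:

```python
# The cube routing function C_i complements bit i of the PE address,
# which is a single XOR on the integer address.
n = 3                        # a 3-cube: N = 2**n = 8 PEs

def C(i, addr):
    return addr ^ (1 << i)   # flip address bit i

# The n neighbors of PE 0b010 = 2, one per dimension:
print([C(i, 0b010) for i in range(n)])  # [3, 0, 6]
```

Routing between any two PEs amounts to applying C_i for each bit position in which their addresses differ.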
PARALLEL ALGORITHMS FOR ARRAY PROCESSORS
The original motivation for developing SIMD array processors was to perform parallel
computations on vector or matrix types of data. Parallel processing algorithms have been
developed by many computer scientists for SIMD computers. Important SIMD
algorithms can be used to perform matrix multiplication, fast Fourier transform (FFT),
matrix transposition, summation of vector elements, matrix inversion, parallel sorting,
linear recurrence, Boolean matrix operations, and the solution of partial differential
equations. We study below several representative SIMD algorithms for matrix
multiplication, parallel sorting, and parallel FFT. We shall analyze the speedups of these
parallel algorithms over the sequential algorithms on SISD computers. The
implementation of these parallel algorithms on SIMD machines is described in
concurrent ALGOL. The physical memory allocations and program implementation
depend on the specific architecture of a given SIMD machine.
SIMD Matrix Multiplication
Many numerical problems suitable for parallel processing can be formulated as matrix
computations. Matrix manipulation is frequently needed in solving linear systems of
equations. Important matrix operations include matrix multiplication, L-U
decomposition, and matrix inversion. We present below two parallel algorithms for
matrix multiplication. The differences between the SISD and SIMD matrix algorithms
are pointed out in their program structures and speed performances. In general, the inner
loop of a multilevel SISD program can be replaced by one or more SIMD vector
instructions.
Let A = [aik] and B = [bkj] be n x n matrices. The multiplication of A and B generates a
product matrix C = A x B = [cij] of dimension n x n. The elements of the product matrix
C are related to the elements of A and B by:

cij = ai1 x b1j + ai2 x b2j + ... + ain x bnj    (5.22)

There are n^3 cumulative multiplications to be performed in Eq. 5.22. A cumulative
multiplication refers to the linked multiply-add operation c = c + a x b. The addition is
merged into the multiplication because the multiply is equivalent to multioperand
addition. Therefore, we can consider the unit time as the time required to perform one
cumulative multiplication, since add and multiply are performed simultaneously.
In a conventional SISD uniprocessor system, the n^3 cumulative multiplications are
carried out by a serially coded program with three levels of DO loops corresponding to
the three indices used. The time complexity of this sequential program is proportional to
n^3, as specified in the following SISD algorithm for matrix multiplication.
An O(n^3) algorithm for SISD matrix multiplication
For i = 1 to n Do
  For j = 1 to n Do
    cij = 0 (initialization)
    For k = 1 to n Do
      cij = cij + aik x bkj (scalar additive multiply)
    End of k loop
  End of j loop
End of i loop
Now we want to implement the matrix multiplication on an SIMD computer with n PEs.
The algorithm construct depends heavily on the memory allocations of the A, B, and C
matrices in the PEMs. Column vectors are stored within the same PEM. This
memory-allocation scheme allows parallel access to all the elements in each row vector
of the matrices. Based on this data distribution, we obtain the following parallel
algorithm. The two parallel-do operations correspond to vector load for initialization and
vector multiply for the inner loop of additive multiplications. The time complexity has
been reduced to O(n^2).
Therefore, the SIMD algorithm is n times faster than the SISD algorithm for matrix
multiplication.
An O(n^2) algorithm for SIMD matrix multiplication
For i = 1 to n Do
  Par for k = 1 to n Do
    cik = 0 (vector load)
  For j = 1 to n Do
    Par for k = 1 to n Do
      cik = cik + aij x bjk (vector multiply)
  End of j loop
End of i loop
It should be noted that the vector load operation is performed to initialize the row vectors
of matrix C one row at a time. In the vector multiply operation, the
same multiplier aij is broadcast from the CU to all PEs to multiply all n elements {bjk for
k = 1, 2, ..., n} of the jth row vector of B. In total, n^2 vector multiply operations are
needed in the double loops.
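A Python sketch of the O(n^2) algorithm, with each `Par for k` modelled as one whole-row list operation (illustrative only; on a real SIMD machine each row update is a single vector step across the n PEs):

```python
# Illustrative sketch: each "Par for k" of the algorithm is modelled as one
# whole-row operation, i.e. one vector instruction across the n PEs.
def simd_matmul(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        C[i] = [0] * n                               # vector load of row i of C
        for j in range(n):
            # the CU broadcasts the scalar a_ij; PE_k forms c_ik + a_ij * b_jk
            C[i] = [C[i][k] + A[i][j] * B[j][k] for k in range(n)]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(simd_matmul(A, B))  # [[19, 22], [43, 50]]
```

Only the two outer loops remain sequential, which is where the O(n^2) count of vector operations comes from.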
I" 'e increase the num&er o" P.s used in an array processor to n
an @8n lo$>n) can &e
deised to multiply t'o n xn matrices a and &./et n2>
.Consider an array processor
'hose n
pes are located at the >
ertices o" a >m cu&e net'or(.! >m cu&e net'or(
can &e considered as t'o 8>m--) cu&e net'or(s lin(ed to$ether &y 2' extra ed$es. In
0i$ure a A-cu&e net'or( is constructed "rom t'o C-cu&e net'or(s &y usin$ B extra ed$es
&et'een correspondin$ ertices at the corner positions. 0or clarity, 'e simpli"y the A-
cu&e,dra'in$ &y sho'in$ only one o" the ei$ht "ourth dimension connections. The
remainin$ connections are implied.
Let (p2m-1 p2m-2 ... pm pm-1 ... p1 p0) be the PE address in the 2m-cube. We can
achieve the O(n log2 n) compute time only if initially the matrix elements are favorably
distributed in the PE vertices. The n rows of matrix A are distributed over n distinct PEs
whose addresses satisfy the condition

p2m-1 p2m-2 ... pm = pm-1 pm-2 ... p0

as demonstrated in Figure 5.20a for the initial distribution of the four rows of the matrix
A in a 4 x 4 matrix multiplication (n = 4, m = 2). The four rows of A are then broadcast
over the fourth dimension and front-to-back edges, as marked by the row numbers in
Figure 5.20a. The n columns of matrix B (or the n rows of the transposed matrix Bt) are
evenly distributed over the PEs of the 2m-cube, as illustrated in Figure 5.20c. The four
rows of Bt are then broadcast over the front and back faces, as shown in Figure 5.20d.
Figure 5.21 shows the combined results of the A and Bt broadcasts, with the inner
products ready to be computed. The n-way broadcast depicted in Figures 5.20b and 5.20d
takes log2 n steps, as illustrated in Figure 5.21, in m = log2 n = log2 4 = 2 steps.
The matrix multiplication on a 2m-cube network is formally specified below:
1. Transpose B to form Bt over the m-cubes.
2. N-way broadcast each row of Bt to all PEs in the m-cube.
3. N-way broadcast each row of A.
4. Each PE now contains a row of A and a column of B.
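A functional sketch of these four steps, collapsing the cube broadcasts into plain Python copies (the grid-of-PEs model is illustrative; on the cube, the broadcasts take log2 n steps each):

```python
# Illustrative sketch: after the broadcasts, "PE" (i, j) holds row i of A and
# column j of B and forms the inner product c_ij.
def cube_matmul(A, B):
    n = len(A)
    Bt = [list(col) for col in zip(*B)]                     # step 1: transpose B
    cols_B = [[Bt[j] for j in range(n)] for i in range(n)]  # step 2: broadcast Bt rows
    rows_A = [[A[i] for j in range(n)] for i in range(n)]   # step 3: broadcast A rows
    # step 4: every PE computes one inner product (all in parallel on the cube)
    return [[sum(a * b for a, b in zip(rows_A[i][j], cols_B[i][j]))
             for j in range(n)] for i in range(n)]

print(cube_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

With n^2 PEs, the n-term inner products dominate, giving the O(n log2 n) time once the broadcasts are counted.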
Parallel Sorting on Array Processors
An SIMD algorithm is presented below for sorting n^2 elements on a mesh-connected
(Illiac-IV-like) processor array in O(n) routing and comparison steps. This is a
substantial speedup over the best sequential sorting algorithms, which take O(n^2 log2 n)
steps on a uniprocessor system to sort n^2 elements. We assume an array processor with
N = n^2 identical PEs interconnected by a mesh network similar to that of the Illiac-IV,
except that the PEs at the perimeter have two or three rather than four neighbors. In other
words, there are no wraparound connections in this simplified mesh network.
Eliminating the wraparound connections simplifies the array-sorting algorithm. The time
complexity of the array-sorting algorithm would be affected by, at most, a factor of two if
the wraparound connections were included.
Two time measures are needed to estimate the time complexity of the parallel sorting
algorithm. Let tR be the routing time required to move one item from a PE to one of its
neighbors, and tC be the comparison time required for one comparison step. Concurrent
data routing is allowed. Up to N comparisons may be performed simultaneously. This
means that a comparison-interchange step between two items in adjacent PEs can be done
in 2tR + tC time units (route left, compare, and route right). A mixture of horizontal and
vertical comparison interchanges requires at least 4tR + tC time units.
The sorting problem depends on the indexing scheme of the PEs. The PEs may be
indexed by a bijection from {1, 2, ..., n} x {1, 2, ..., n} to {0, 1, ..., N - 1}, where N =
n^2. The sorting problem can then be formulated as moving the jth smallest element to
the PE indexed by j, for all j = 0, 1, 2, ..., N - 1. Illustrated in the figure are three indexing
patterns formed after sorting the given array in part a with respect to three different ways
of indexing the PEs. The pattern in part b corresponds to a row-major indexing, part c
corresponds to a shuffled row-major indexing, and part d is based on a snake-like
row-major indexing. The choice of a particular indexing scheme depends upon how the
sorted elements will be used. We are interested in designing sorting algorithms which
minimize the total number of routing and comparison steps.
The longest routing path on the mesh in a sorting process is the transposition of two
elements initially loaded at opposite corner PEs, as illustrated in Figure 5.24. This
transposition needs at least 4(n - 1) routing steps. This means that no algorithm can sort
the n^2 elements in less than O(n) time. In other words, an O(n) sorting algorithm is
considered optimal on a mesh of n^2 PEs. Before we show one such optimal sorting
algorithm on the mesh-connected PEs, let us review Batcher's odd-even merge sort of
two sorted sequences on a set of linearly connected PEs, shown in the figure. The shuffle
and unshuffle operations can each be implemented with a sequence of interchange
operations (marked by the double arrows in the figure). Both the perfect shuffle and its
inverse (unshuffle) can be done in k - 1 interchanges, or 2(k - 1) routing steps, on a linear
array of 2k PEs.
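The permutation that those 2(k - 1) adjacent-interchange routing steps realize is the perfect shuffle; a small sketch of its effect (the function computes the result directly rather than simulating the interchanges):

```python
# Illustrative sketch: the perfect shuffle on a linear array of 2k PEs
# interleaves the two halves of the sequence.
def perfect_shuffle(a):
    k = len(a) // 2
    out = []
    for lo, hi in zip(a[:k], a[k:]):   # pair element i with element i + k
        out += [lo, hi]
    return out

print(perfect_shuffle([0, 1, 2, 3, 4, 5, 6, 7]))  # [0, 4, 1, 5, 2, 6, 3, 7]
```

The unshuffle is simply the inverse permutation, realized on the array by running the interchanges in the opposite order.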
Batcher's odd-even merge sort on a linear array has been generalized by Thompson and
Kung to a square array of PEs. Let M(j, k) be a sorting algorithm for merging two sorted
j-by-k/2 subarrays to form a sorted j-by-k array, where j and k are powers of 2 and k > 1.
The snake-like row-major ordering is assumed in all the arrays. In the degenerate case of
M(1, 2), a single comparison-interchange step is sufficient to sort two unit subarrays.
Given two sorted columns of length j >= 2, the M(j, 2) algorithm consists of the following
steps:
Example 5.6: The M(j, 2) sorting algorithm
1. Move all odds to the left column and all evens to the right in 2tR time.
2. Use the odd-even transposition sort to sort each column in 2jtR + jtC time.
3. Interchange on even rows in 2tR time.
4. Perform one comparison-interchange in 2tR + tC time.
The M(j, k) algorithm for k > 2
1. Perform a single interchange step on the even rows.
2. Unshuffle each row.
3. Merge by calling algorithm M(j, k/2).
4. Shuffle each row.
5. Interchange on even rows.
6. Perform a comparison-interchange.
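The odd-even transposition sort invoked in step 2 of M(j, 2) can be sketched as follows (a sequential simulation; on the array, all the comparison-interchanges of a phase happen in parallel, so j phases cost 2jtR + jtC):

```python
# Illustrative sketch of odd-even transposition sort on a linear array of j PEs:
# j alternating phases of pairwise comparison-interchanges sort the array.
def odd_even_transposition_sort(a):
    a = list(a)
    j = len(a)
    for phase in range(j):
        start = phase % 2                 # alternate even and odd pairs
        for i in range(start, j - 1, 2):  # these compare-interchanges are
            if a[i] > a[i + 1]:           # concurrent on real hardware
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([7, 3, 8, 1, 6, 2]))  # [1, 2, 3, 6, 7, 8]
```

Each phase uses only neighbor-to-neighbor interchanges, which is exactly what the mesh and linear-array connectivity provide.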
Associative array processing
In this section, we describe the functional organization of an associative array processor
and various parallel processing functions that can be performed on an associative
processor. We classify associative processors based on associative-memory
organizations. Finally, we identify the major searching applications of associative
memories and associative processors. Associative processors have been built only as
special-purpose computers for dedicated applications in the past.
Associative Memory Organizations
Data stored in an associative memory are addressed by their contents. In this sense,
associative memories have been known as content-addressable memories.
Parallel search memory and multiaccess memory. The major advantage of associative
memory over RAM is its capability of performing parallel search and parallel
comparison operations. These are frequently needed in many important applications,
such as the storage and retrieval of rapidly changing databases, radar-signal tracking,
image processing, computer vision, and artificial intelligence. The major shortcoming of
associative memory is its much-increased hardware cost. Currently, the cost of
associative memory is much higher than that of RAM.
The structure o" !M is modeled in "i$.The associatuie memory array consists o" n 'ords
'ith m&its per 'ord..ach cell in the array consists o" a "lip "lop associated 'ith some
comparison lo$ic $ates "or pattern match and read 'rite control.! &it slice is a ertical
column o" &it cells o" all the 'ords at the same position.
.ach &it cell 1i5 can &e 'ritten in,read out,or compared 'ith an external interi$atin$
si$nal.The comparand re$ister C28C-,C>,IIII..Cm) is used to hold the (ey operand
&ein$ searched "or .The mas(in$ re$isterM28M-,M>,III..Mm) is used to ena&le the &it
slices to &e inoled in the parallel comparison operations across all the 'ord in the
associatie memory.
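The comparand/mask search can be sketched in Python with integers standing in for words and bit slices (an illustrative model, not the hardware match logic):

```python
# Illustrative model: integers stand in for n words of m bits each.
def associative_search(words, comparand, mask):
    """Compare every word with the comparand in parallel, restricted to the
    bit slices enabled in the masking register; return the word-match tags."""
    return [1 if (w ^ comparand) & mask == 0 else 0 for w in words]

words = [0b1010, 0b1001, 0b0010, 0b1110]
C = 0b1010   # comparand register: the key being searched for
M = 0b0011   # masking register: compare only the two low-order bit slices
print(associative_search(words, C, M))  # [1, 0, 1, 1]
```

Every word is compared in the same cycle; the result is one match tag per word, which is the word-parallel behavior described below.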
In practice, most associative memories have the capability of word-parallel operations;
that is, all words in the associative memory array are involved in the parallel search
operations. This differs drastically from the word-serial operations encountered in RAMs.
Based on how the bit slices are involved in the operation, we consider below two
different associative memory organizations:
Bit-parallel organization: In a bit-parallel organization, the comparison process is
performed in a parallel-by-word and parallel-by-bit fashion. All bit slices which are not
masked off by the masking pattern are involved in the comparison process. In this
organization, word-match tags for all words are used (Figure 5.34a). Each cross point in
the array is a bit cell. Essentially, the entire array of cells is involved in a search operation.
Bit-serial organization: The memory organization in Figure 5.34b operates with one bit
slice at a time across all the words. The particular bit slice is selected by an extra logic
and control unit. The bit-cell readouts are used in subsequent bit-slice operations. The
associative processor STARAN has the bit-serial memory organization, and the PEPE has
been installed with the bit-parallel organization.
Associative memories are used mainly for the search and retrieval of non-numeric
information. The bit-serial organization requires less hardware but is slower in speed. The
bit-parallel organization requires additional word-match detection logic but is faster in
speed. We present below an example to illustrate the search operation in a bit-parallel
associative memory. Bit-serial associative memory will be presented in Section 5.4.3
with various associative search and retrieval algorithms.