THE ART OF
COMPUTER PROGRAMMING
VOLUME 4 PRE-FASCICLE 1A
ADDISON-WESLEY
PREFACE
These unforeseen stoppages, which I own I had no conception of when I first set out;
but which, I am convinced now, will rather increase than diminish as I advance,
have struck out a hint which I am resolved to follow;
and that is, not to be in a hurry;
but to go on leisurely, writing and publishing two volumes of my life every year;
which, if I am suffered to go on quietly, and can make a tolerable bargain
with my bookseller, I shall continue to do as long as I live.
-- LAURENCE STERNE, The Life and Opinions of Tristram Shandy, Gentleman (1760)
This booklet contains draft material that I'm circulating to experts in the field, in hopes that they can help remove its most egregious errors before too many other people see it. I am also, however, posting it on the Internet for courageous and/or random readers who don't mind the risk of reading a few pages that have not yet reached a very mature state. Beware: This material has not yet been proofread as thoroughly as the manuscripts of Volumes 1, 2, and 3 were at the time of their first printings. And those carefully-checked volumes, alas, were subsequently found to contain thousands of mistakes.

Given this caveat, I hope that my errors this time will not be so numerous and/or obtrusive that you will be discouraged from reading the material carefully. I did try to make the text both interesting and authoritative, as far as it goes. But the field is vast; I cannot hope to have surrounded it enough to corral it completely. So I beg you to let me know about any deficiencies that you discover.

To put the material in context, this pre-fascicle contains Section 7.1.3 of a long, long chapter on combinatorial algorithms. Chapter 7 will eventually fill at least three volumes (namely Volumes 4A, 4B, and 4C), assuming that I'm able to remain healthy. It will begin with a short review of graph theory, with emphasis on some highlights of significant graphs in the Stanford GraphBase, from which I will be drawing many examples. Then comes Section 7.1: Zeros and Ones, beginning with basic material about Boolean operations in Section 7.1.1 and Boolean evaluation in Section 7.1.2. Section 7.1.3, which you're about to read here, applies these ideas to make computer programs run fast. Section 7.1.4 will then discuss the representation of Boolean functions.

The next part, 7.2, is about generating all possibilities, and it begins with Section 7.2.1: Generating Basic Combinatorial Patterns. Fascicles for this section have already appeared on the Web and/or in print. Section 7.2.2 will deal with backtracking in general. And so it will continue, if all goes well; an outline of the entire Chapter 7 as currently envisaged appears on the taocp webpage that is cited on page ii.

This part of The Art of Computer Programming has probably been more fun to write than any other so far. Indeed, I've spent more than 30 years collecting material for Section 7.1.3; finally I'm able to assemble these goodies together and segue through them.

Most of Volume 4 will deal with abstract concepts, and there will be little or no need to say much about a computer's machine language. Volumes 1-3 have already dealt with most of the important ideas about programming at that level. But Section 7.1.3 is a notable exception: Here we often want to see the very pulse of the machine. Therefore I strongly recommend that readers become familiar with the basics of the MMIX computer, explained in Volume 1 Fascicle 1, in order to fully appreciate the bitwise tricks and techniques described here. Cross-references to Sections 1.3.1 and 1.3.2 in the present booklet refer to that fascicle. I've reprinted the basic MMIX opcode-and-timing chart, Table 1.3.1-1, at the end of this booklet for convenience, together with a list of ASCII codes.

The topic of Boolean functions and bit manipulation can of course be interpreted so broadly that it encompasses the entire subject of computer programming. The real goal of this fascicle is to focus on concepts that appear at the lowest levels, concepts on which we can erect significant superstructures. And even these apparently lowly notions turn out to be surprisingly rich, with explicit ties to sections 1.2.4, 1.2.5, 1.2.8, 2.3.1, 2.3.3, 2.3.4.2, 2.3.5, 3.1, 3.2.2, 4.1, 4.4, 4.5.3, 4.5.4, 4.6.1, 4.6.2, 4.6.3, 4.6.4, 5, 5.2.2, 5.2.3, 5.2.5, and 5.3.4 of the first three volumes. I strongly believe in building up a firm foundation, so I have discussed Boolean topics much more thoroughly than I will be able to do with material that is newer or less basic. Section 7.1.3 presented me with an extreme embarrassment of riches: After typing the manuscript I was astonished to discover that I had come up with 215 exercises, even though (believe it or not) I had to eliminate quite a lot of the interesting material that appears in my files.

My notes on combinatorial algorithms have been accumulating for more than forty years, so I fear that in several respects my knowledge is woefully behind the times. Please look, for example, at the exercises that I've classed as research problems (rated with difficulty level 46 or higher), namely exercises 61, 76, 112, 117, 126, 128, 129, 130, and 174; I've also implicitly mentioned or posed additional unsolved questions in the answers to exercises 21, 140, 141, 156, and 165. Are those problems still open? Please inform me if you know of a solution to any of these intriguing questions. And of course if no solution is known today but you do make progress on any of them in the future, I hope you'll let me know.

I urgently need your help also with respect to some exercises that I made up as I was preparing this material. I certainly don't like to receive credit for things that have already been published by others, and most of these results are quite natural "fruits" that were just waiting to be "plucked." Therefore please tell me if you know who deserves to be credited, with respect to the ideas found in exercises 5, 6, 20, 26, 34, 39, 49, 50, 53, 57, 58(d,e), 59, 60, 72, 78, 80, 81, 82, 83, 84, 86, 90, 95, 110, 115, 116, 120, 121, 127, 146, 154, 155, 159, 168, 184, 194, and 199, and/or the answers to exercises 17, 18, and 139. Furthermore I've credited exercises 45 and 54 to unpublished work of Tom Rokicki and Bill Gosper. Have either of those results ever appeared in print, to your knowledge?

Special thanks are due to Guy Steele and Hank Warren for their comments on my early attempts at exposition, as well as to numerous other correspondents who have contributed crucial corrections.

I shall happily pay a finder's fee of $2.56 for each error in this draft when it is first reported to me, whether that error be typographical, technical, or historical. The same reward holds for items that I forgot to put in the index. And valuable suggestions for improvements to the text are worth 32¢ each. (Furthermore, if you find a better solution to an exercise, I'll actually reward you with immortal glory instead of mere money, by publishing your name in the eventual book :-)

Cross references to yet-unwritten material sometimes appear as '00'; this impossible value is a placeholder for the actual numbers to be supplied later.

Happy reading!

Stanford, California                D. E. K.
16 December 2006

[These techniques] are instances of general mathematical principles
waiting to be discovered, if an appropriate setting is created.
Such a setting would be a calculus of bitmap operations, so one can learn
to use these operations just as naturally as arithmetic operations on numbers.
-- L. J. GUIBAS and J. STOLFI, ACM Transactions on Graphics (1982)

A note on notation. Several formulas in Section 7.1.3 use the notation \(\langle xyz\rangle\) for the median function (aka majority function) that is discussed extensively in Section 7.1.1. Other formulas use the notation \(x \mathbin{\dot{-}} y\) for the monus function (aka dot-minus or saturated subtraction), which was defined in Section 1.3.1. Hexadecimal constants are preceded by a sharp sign: #123 means \((123)_{16}\). If you run across other notations that may be unfamiliar, please look at the Index to Notations at the end of Volumes 1, 2, or 3, and/or the entries under "Notation" in the index to the present booklet. Of course Volume 4 will some day contain its own Index to Notations.

7.1.3. BITWISE TRICKS AND TECHNIQUES
Now comes the fun part: We get to use Boolean operations in our programs.

People are more familiar with arithmetic operations like addition, subtraction, and multiplication than they are with bitwise operations such as "and," "exclusive-or," and so on, because arithmetic has a very long history. But we will see that Boolean operations on binary numbers deserve to be much better known. Indeed, they're an important component of every good programmer's toolkit.

Early machine designers provided fullword bitwise operations in their computers primarily because such instructions could be included in a machine's repertoire almost for free. Binary logic seemed to be potentially useful, although only a few applications were originally foreseen. For example, the EDSAC computer, completed in 1949, included a "collate" command that essentially performed the operation \(z \leftarrow z + (x \mathbin{\&} y)\), where z was the accumulator, x was the multiplier register, and y was a specified word in memory; it was used for unpacking data. The Manchester Mark I computer, built at about the same time, included not only bitwise AND, but also OR and XOR. When Alan Turing wrote the first programming manual for the Mark I in 1950, he remarked that bitwise NOT can be obtained by using XOR (denoted '\(\not\equiv\)') in combination with a row of 1s. R. A. Brooker, who extended Turing's manual in 1952 when the Mark II computer was being designed, remarked further that OR could be used "to round off a number by forcing 1 into its least significant digit position." By this time the Mark II, which was to become the prototype of the Ferranti Mercury, had also acquired new instructions for sideways addition and for the position of the most significant 1.

Keith Tocher published an unusual application of AND and OR in 1954, which has subsequently been reinvented frequently (see exercise 85). And during the ensuing decades, programmers have gradually discovered that bitwise operations can be amazingly useful. Many of these tricks have remained part of the folklore; the time is now ripe to take advantage of what has been learned.

A trick is a clever idea that can be used once, while a technique is a trick that can be used at least twice. We will see in this section that tricks tend to evolve naturally into techniques.

Enriched arithmetic. Let's begin by officially defining bitwise operations on integers so that, if \(x = (\ldots x_2 x_1 x_0)_2\), \(y = (\ldots y_2 y_1 y_0)_2\), and \(z = (\ldots z_2 z_1 z_0)_2\) in binary notation, we have
\[ x \mathbin{\&} y = z \iff x_k \land y_k = z_k, \quad \text{for all } k \ge 0; \tag{1} \]
\[ x \mid y = z \iff x_k \lor y_k = z_k, \quad \text{for all } k \ge 0; \tag{2} \]
\[ x \oplus y = z \iff x_k \oplus y_k = z_k, \quad \text{for all } k \ge 0. \tag{3} \]
(It would be tempting to write '\(x \land y\)' instead of \(x \mathbin{\&} y\), and '\(x \lor y\)' instead of \(x \mid y\); but when we study optimization problems we'll find it better to reserve the notations \(x \land y\) and \(x \lor y\) for \(\min(x, y)\) and \(\max(x, y)\), respectively.) Thus, for example,
\[ 5 \mathbin{\&} 11 = 1, \qquad 5 \mid 11 = 15, \qquad\text{and}\qquad 5 \oplus 11 = 14, \]
since \(5 = (0101)_2\), \(11 = (1011)_2\), \(1 = (0001)_2\), \(15 = (1111)_2\), and \(14 = (1110)_2\). Negative integers are to be thought of in this connection as infinite-precision numbers in two's complement notation, having infinitely many 1s at the left; for example, \(-5\) is \((\ldots 1111011)_2\). Such infinite-precision numbers are a special case of 2-adic integers, which are discussed in exercise 4.1-31, and in fact the operators \(\&\), \(\mid\), \(\oplus\) make perfect sense when they are applied to arbitrary 2-adic numbers.

Mathematicians have never paid much attention to the properties of \(\&\) and \(\mid\) as operations on integers. But the third operation, \(\oplus\), has a venerable history, because it describes a winning strategy in the game of nim (see exercises 8-16). For this reason \(x \oplus y\) has often been called the "nim sum" of the integers x and y.
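
The worked example can be checked mechanically. The following is a minimal sketch in Python, whose built-in integers happen to behave like the infinite-precision two's complement numbers described above; the helper name `low_bits` is made up for this illustration and is not part of the text.

```python
def low_bits(x, n=8):
    """Return the n low-order bits of x as a string (two's complement)."""
    return format(x & ((1 << n) - 1), "0{}b".format(n))

# The worked example: 5 = (0101)_2 and 11 = (1011)_2.
assert 5 & 11 == 1
assert 5 | 11 == 15
assert 5 ^ 11 == 14

# -5 has infinitely many 1s at the left; its low 7 bits are 1111011.
assert low_bits(-5, 7) == "1111011"

# The nim sum of two equal numbers is zero.
assert 6 ^ 6 == 0
```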
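
All three of the basic bitwise operations turn out to have many useful properties. For example, every relation between \(\land\), \(\lor\), and \(\oplus\) that we studied in Section 7.1.1 is automatically inherited by \(\&\), \(\mid\), and \(\oplus\) on integers, since the relation holds in every bit position. We might as well recap the main identities here:
\[ x \mathbin{\&} y = y \mathbin{\&} x, \qquad x \mid y = y \mid x, \qquad x \oplus y = y \oplus x; \tag{4} \]
\[ (x \mathbin{\&} y) \mathbin{\&} z = x \mathbin{\&} (y \mathbin{\&} z), \qquad (x \mid y) \mid z = x \mid (y \mid z), \qquad (x \oplus y) \oplus z = x \oplus (y \oplus z); \tag{5} \]
\[ (x \mid y) \mathbin{\&} z = (x \mathbin{\&} z) \mid (y \mathbin{\&} z), \qquad (x \mathbin{\&} y) \mid z = (x \mid z) \mathbin{\&} (y \mid z); \tag{6} \]
\[ (x \oplus y) \mathbin{\&} z = (x \mathbin{\&} z) \oplus (y \mathbin{\&} z); \tag{7} \]
\[ (x \mathbin{\&} y) \mid x = x, \qquad (x \mid y) \mathbin{\&} x = x; \tag{8} \]
\[ (x \mathbin{\&} y) \oplus (x \mid y) = x \oplus y; \tag{9} \]
\[ x \mathbin{\&} 0 = 0, \qquad x \mid 0 = x, \qquad x \oplus 0 = x; \tag{10} \]
\[ x \mathbin{\&} x = x, \qquad x \mid x = x, \qquad x \oplus x = 0; \tag{11} \]
\[ x \mathbin{\&} {-1} = x, \qquad x \mid {-1} = -1, \qquad x \oplus {-1} = \bar{x}; \tag{12} \]
\[ x \mathbin{\&} \bar{x} = 0, \qquad x \mid \bar{x} = -1, \qquad x \oplus \bar{x} = -1; \tag{13} \]
\[ \overline{x \mathbin{\&} y} = \bar{x} \mid \bar{y}, \qquad \overline{x \mid y} = \bar{x} \mathbin{\&} \bar{y}, \qquad \overline{x \oplus y} = \bar{x} \oplus y = x \oplus \bar{y}. \tag{14} \]
The notation \(\bar{x}\) in (12), (13), and (14) stands for bitwise complementation of x, namely \((\ldots \bar{x}_2 \bar{x}_1 \bar{x}_0)_2\), also written \(\sim x\). Notice that (12) and (13) aren't quite the same as 7.1.1-(10) and 7.1.1-(18); we must now use \(-1 = (\ldots 1111)_2\) instead of \(1 = (\ldots 0001)_2\) in order to make the formulas bitwise correct.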
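
As a sanity check rather than a proof, several of these identities can be tested on random integers. This sketch assumes Python's `~x` as the complement \(\bar{x}\); the function name `check_identities` and the random seed are arbitrary choices made for the illustration.

```python
import random

def check_identities(x, y, z):
    """Check identities (6)-(9) and (12)-(14) for one triple of integers."""
    assert (x | y) & z == (x & z) | (y & z)                 # (6)
    assert (x & y) | z == (x | z) & (y | z)                 # (6)
    assert (x ^ y) & z == (x & z) ^ (y & z)                 # (7)
    assert (x & y) | x == x and (x | y) & x == x            # (8)
    assert (x & y) ^ (x | y) == x ^ y                       # (9)
    assert x & -1 == x and x | -1 == -1 and x ^ -1 == ~x    # (12)
    assert x & ~x == 0 and x | ~x == -1 and x ^ ~x == -1    # (13)
    assert ~(x & y) == ~x | ~y and ~(x | y) == ~x & ~y      # (14)
    assert ~(x ^ y) == ~x ^ y == x ^ ~y                     # (14)
    return True

rng = random.Random(713)
for _ in range(1000):
    check_identities(*(rng.randint(-2**70, 2**70) for _ in range(3)))
```

Negative operands are included on purpose, since (12) and (13) depend on \(-1\) being the all-ones value.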

We say that x is contained in y, written \(x \subseteq y\) or \(y \supseteq x\), if the individual bits of x and y satisfy \(x_k \le y_k\) for all \(k \ge 0\). Thus
\[ x \subseteq y \iff x \mathbin{\&} y = x \iff x \mid y = y \iff x \mathbin{\&} \bar{y} = 0. \tag{15} \]
Of course we needn't use bitwise operations only in connection with each other; we can combine them with all the ordinary operations of arithmetic. For example, from the relation \(x + \bar{x} = (\ldots 1111)_2 = -1\) we can deduce the formula
\[ -x = \bar{x} + 1, \tag{16} \]
which turns out to be extremely important. Replacing x by \(x - 1\) gives also
\[ -x = \overline{x - 1}; \tag{17} \]
and in general we can reduce subtraction to complementation and addition:
\[ \overline{x - y} = \bar{x} + y. \tag{18} \]
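
Relations (15) through (18) are easy to confirm by brute force over a small range. The sketch below is illustrative only; `contained` is a hypothetical helper name, and Python's `~` again plays the role of the overbar.

```python
def contained(x, y):
    """True if every 1 bit of x is also a 1 bit of y, as in (15)."""
    return x & ~y == 0

for x in range(-64, 64):
    assert -x == ~x + 1              # (16)
    assert -x == ~(x - 1)            # (17)
    for y in range(-64, 64):
        assert ~(x - y) == ~x + y    # (18)
        # The three tests in (15) agree with one another.
        assert contained(x, y) == (x & y == x) == (x | y == y)
```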

We often want to shift binary numbers to the left or right. These operations are equivalent to multiplication and division by powers of 2, with appropriate rounding, but it is convenient to have special notations for them:
\[ x \ll k = x \text{ shifted left } k \text{ bits} = \lfloor 2^k x \rfloor; \tag{19} \]
\[ x \gg k = x \text{ shifted right } k \text{ bits} = \lfloor 2^{-k} x \rfloor. \tag{20} \]
Here k can be any integer, possibly negative. In particular we have
\[ x \ll (-k) = x \gg k \qquad\text{and}\qquad x \gg (-k) = x \ll k \tag{21} \]
for every infinite-precision number x. Also \((x \mathbin{\&} y) \ll k = (x \ll k) \mathbin{\&} (y \ll k)\), etc.
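
Many programming languages reject negative shift counts (Python's built-in `<<` and `>>` raise `ValueError`, for instance), so definitions (19) through (21) need a thin wrapper there. A minimal sketch, with the hypothetical names `shl` and `shr`:

```python
def shl(x, k):
    """x << k in the sense of (19): floor(2**k * x), for any integer k."""
    return x << k if k >= 0 else x >> -k

def shr(x, k):
    """x >> k in the sense of (20): floor(x / 2**k), for any integer k."""
    return shl(x, -k)

# Right shifts round toward minus infinity, also for negative x.
assert shr(-5, 1) == -3
# Shifting distributes over &, as noted after (21).
assert shl(12 & 10, 3) == shl(12, 3) & shl(10, 3)
```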

When bitwise operations are combined with addition, subtraction, multiplication, and/or shifting, extremely intricate results can arise, even when the formulas are quite short. A taste of the possibilities can be seen, for example, in Fig. 11. Furthermore, such formulas do not merely produce purposeless, chaotic behavior: A famous chain of operations known as "Gosper's hack," first published in 1972, opened people's eyes to the fact that a large number of useful and nontrivial functions can be computed rapidly (see exercise 20). Our goal in this section is to explore how such efficient constructions might be discovered.

Packing and unpacking. We studied algorithms for multiple-precision arithmetic in Section 4.3.1, dealing with situations where integers are too large to fit in a single word of memory or a single computer register. But the opposite situation, when integers are significantly smaller than the capacity of one computer word, is actually much more common; D. H. Lehmer called this "fractional precision." We can often deal with several integers at once, by packing them into a single word.

For example, a date x that consists of a year number y, a month number m, and a day number d, can be represented by using 4 bits for m and 5 bits for d:
\[ x = (((y \ll 4) + m) \ll 5) + d. \tag{22} \]
We'll see below that many operations can be performed directly on dates in this packed form. For example, \(x < x'\) when date x precedes date \(x'\). But if necessary the individual components \((y, m, d)\) can readily be unpacked when x is given:
\[ d = x \bmod 32, \qquad m = (x \gg 5) \bmod 16, \qquad y = x \gg 9. \tag{23} \]
And these "mod" operations do not require division, because of the important law
\[ x \bmod 2^n = x \mathbin{\&} (2^n - 1) \tag{24} \]
for any integer \(n \ge 0\). We have, for instance, \(d = x \mathbin{\&} 31\) in (22) and (23).

Such packing of data obviously saves space in memory, and it also saves time: We can more quickly move or copy items of data from one place to another when they've been packed together. Moreover, computers run considerably faster when they operate on numbers that fit into a cache memory of limited size.

The ultimate packing density is achieved when we have 1-bit items, because we can then cram 64 of them into a single 64-bit word. Suppose, for example, that we want a table of all odd prime numbers less than 1024, so that we can easily decide the primality of a small integer. No problem; only eight 64-bit numbers are required:
P0 = 0111011011010011001011010010011001011001010010001011011010000001;
P1 = 0100110000110010010100100110000110110000010000010110100110000100;
P2 = 1001001100101100001000000101101000000100100001101001000100100101;
P3 = 0010001010001000011000011001010010001011010000010001010001010010;
P4 = 0000110000000010010000100100110010000100100110010010110000010000;
P5 = 1101001001100000101001000100001000100001000100100101000100101000;
P6 = 1010000001000010000011000011011000010000001011010000001011010000;
P7 = 0000010100010000100010100100100000010100100100010010000010100110:
To test whether \(2k + 1\) is prime, for \(0 \le k < 512\), we simply compute
\[ P_{\lfloor k/64 \rfloor} \ll (k \mathbin{\&} 63) \tag{25} \]
in a 64-bit register, and see if the leftmost bit is 1. For example, the following MMIX instructions will do the job, if register pbase holds the address of \(P_0\):

    SRU  $0,k,3         $0 ← ⌊k/8⌋ (i.e., k ≫ 3).
    LDOU $1,pbase,$0    $1 ← P⌊$0/8⌋ (i.e., P⌊k/64⌋).
    AND  $0,k,#3f       $0 ← k mod 64 (i.e., k & #3f).          (26)
    SLU  $1,$1,$0       $1 ← ($1 ≪ $0) mod 2^64.
    BN   $1,PRIME       Branch to PRIME if s($1) < 0.

Notice that the leftmost bit of a register is 1 if and only if the register contents are negative.
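
For readers without an MMIX simulator at hand, the same lookup can be imitated in a higher-level language. The sketch below builds the table with a plain sieve and then applies (25); the names `build_prime_table`, `is_odd_prime`, and `MASK64` are invented for this illustration, and the masking step merely imitates a 64-bit register.

```python
def build_prime_table(limit=1024):
    """Return the words P[j]; bit (k mod 64) from the left of P[k // 64]
    is 1 exactly when 2k+1 is prime."""
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, limit, p):
                sieve[q] = False
    table = [0] * (limit // 128)
    for k in range(limit // 2):
        if sieve[2 * k + 1]:
            table[k // 64] |= 1 << (63 - k % 64)
    return table

MASK64 = (1 << 64) - 1

def is_odd_prime(k, table):
    """Test whether 2k+1 is prime by the shift in (25)."""
    word = (table[k >> 6] << (k & 63)) & MASK64   # imitate a 64-bit register
    return word >> 63 == 1                        # inspect the leftmost bit

P = build_prime_table()
```

Printing `format(P[0], "064b")` reproduces the bit string shown for \(P_0\) above.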

We could equally well pack the bits from right to left in each word:
Q0 = 1000000101101101000100101001101001100100101101001100101101101110;
Q1 = 0010000110010110100000100000110110000110010010100100110000110010;
Q2 = 1010010010001001011000010010000001011010000001000011010011001001;
Q3 = 0100101000101000100000101101000100101001100001100001000101000100;
Q4 = 0000100000110100100110010010000100110010010000100100000000110000;
Q5 = 0001010010001010010010001000010001000010001001010000011001001011;
Q6 = 0000101101000000101101000000100001101100001100000100001000000101;
Q7 = 0110010100000100100010010010100000010010010100010000100010100000;
here \(Q_j = P_j^R\). Instead of shifting left as in (25), we now shift right,
\[ Q_{\lfloor k/64 \rfloor} \gg (k \mathbin{\&} 63), \tag{27} \]
and look at the rightmost bit of the result. The last two lines of (26) become

    SRU  $1,$1,$0       $1 ← $1 ≫ $0.                           (28)
    BOD  $1,PRIME       Branch to PRIME if $1 is odd.

(And of course we use qbase instead of pbase.) Either way, the classic sieve of Eratosthenes will readily set up the basic table entries \(P_j\) or \(Q_j\) (see exercise 24).

Table 1
THE BIG-ENDIAN VIEW OF A 32-BYTE MEMORY

octa 0
  tetras: tetra 0 | tetra 4
  wydes:  wyde 0 | wyde 2 | wyde 4 | wyde 6
  bytes:  byte 0 | byte 1 | byte 2 | byte 3 | byte 4 | byte 5 | byte 6 | byte 7
  bits:   a0…a7 | a8…a15 | a16…a23 | a24…a31 | a32…a39 | a40…a47 | a48…a55 | a56…a63
octa 8
  tetras: tetra 8 | tetra 12
  wydes:  wyde 8 | wyde 10 | wyde 12 | wyde 14
  bytes:  byte 8 | byte 9 | byte 10 | byte 11 | byte 12 | byte 13 | byte 14 | byte 15
  bits:   a64…a71 | a72…a79 | a80…a87 | a88…a95 | a96…a103 | a104…a111 | a112…a119 | a120…a127
octa 16
  tetras: tetra 16 | tetra 20
  wydes:  wyde 16 | wyde 18 | wyde 20 | wyde 22
  bytes:  byte 16 | byte 17 | byte 18 | byte 19 | byte 20 | byte 21 | byte 22 | byte 23
  bits:   a128…a135 | a136…a143 | a144…a151 | a152…a159 | a160…a167 | a168…a175 | a176…a183 | a184…a191
octa 24
  tetras: tetra 24 | tetra 28
  wydes:  wyde 24 | wyde 26 | wyde 28 | wyde 30
  bytes:  byte 24 | byte 25 | byte 26 | byte 27 | byte 28 | byte 29 | byte 30 | byte 31
  bits:   a192…a199 | a200…a207 | a208…a215 | a216…a223 | a224…a231 | a232…a239 | a240…a247 | a248…a255

Table 2 (the little-endian view)

octa 16
  tetras: tetra 20 | tetra 16
  wydes:  wyde 22 | wyde 20 | wyde 18 | wyde 16
  bytes:  byte 23 | byte 22 | byte 21 | byte 20 | byte 19 | byte 18 | byte 17 | byte 16
  bits:   a191…a184 | a183…a176 | a175…a168 | a167…a160 | a159…a152 | a151…a144 | a143…a136 | a135…a128
octa 8
  tetras: tetra 12 | tetra 8
  wydes:  wyde 14 | wyde 12 | wyde 10 | wyde 8
  bytes:  byte 15 | byte 14 | byte 13 | byte 12 | byte 11 | byte 10 | byte 9 | byte 8
  bits:   a127…a120 | a119…a112 | a111…a104 | a103…a96 | a95…a88 | a87…a80 | a79…a72 | a71…a64
octa 0
  tetras: tetra 4 | tetra 0
  wydes:  wyde 6 | wyde 4 | wyde 2 | wyde 0
  bytes:  byte 7 | byte 6 | byte 5 | byte 4 | byte 3 | byte 2 | byte 1 | byte 0
  bits:   a63…a56 | a55…a48 | a47…a40 | a39…a32 | a31…a24 | a23…a16 | a15…a8 | a7…a0

Notice, however, that we used \((Q_7 \ldots Q_1 Q_0)_{2^{64}}\) to get this simple result, not \((Q_0 Q_1 \ldots Q_7)_{2^{64}}\). The other number,
\[ (Q_0 Q_1 \ldots Q_7)_{2^{64}} = (a_{63} \ldots a_1 a_0\, a_{127} \ldots a_{65} a_{64}\, a_{191} \ldots a_{385} a_{384}\, a_{511} \ldots a_{449} a_{448})_2 \]
is in fact quite weird, and it has no really nice formula. (See exercise 25.)

Endianness has important consequences, because most computers allow individual bytes of the memory to be addressed as well as register-sized units. MMIX has a big-endian architecture; therefore if register x contains the 64-bit number #0123456789abcdef, and if we use the commands 'STOU x,0; LDBU y,1' to store x into octabyte location 0 and read back the byte in location 1, the result in register y will be #23. On machines with a little-endian architecture, the analogous commands would set y ← #cd instead; #23 would be byte 6.

Tables 1 and 2 illustrate the competing "world views" of big-endian and little-endian aficionados. The big-endian approach is basically top-down, with bit 0 and byte 0 at the top left; the little-endian approach is basically bottom-up, with bit 0 and byte 0 at the bottom right. Because of this difference, great care is necessary when transmitting data from one kind of computer to another, or when writing programs that are supposed to give equivalent results in both cases. On the other hand, our example of the Q table for primes shows that we can perfectly well use a little-endian packing convention on a big-endian computer like MMIX, or vice versa. The difference is noticeable only when data is loaded and stored in different-sized chunks, or passed between machines.
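
The store-then-load experiment can be imitated without any particular hardware. This small sketch uses Python's `int.to_bytes` to lay out the same octabyte under both conventions; it only models the byte order and is not a simulation of MMIX.

```python
x = 0x0123456789abcdef

big = x.to_bytes(8, "big")        # byte 0 holds the most significant byte
little = x.to_bytes(8, "little")  # byte 0 holds the least significant byte

assert big[1] == 0x23             # what 'LDBU y,1' fetches on a big-endian machine
assert little[1] == 0xcd          # the little-endian counterpart
assert little[6] == 0x23          # #23 is byte 6 in the little-endian view
```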

Working with the rightmost bits. Big-endian and little-endian approaches aren't readily interchangeable in general, because the laws of arithmetic send signals leftward from the bits that are "least significant." Some of the most important bitwise manipulation techniques are based on this fact.

If x is almost any nonzero 2-adic integer, we can write its bits in the form
\[ x = (\alpha\, 0\, 1^a\, 1\, 0^b)_2; \tag{32} \]
in other words, x consists of some arbitrary (but infinite) binary string \(\alpha\), followed by a 0, which is followed by \(a + 1\) ones, and followed by b zeros, for some \(a \ge 0\) and \(b \ge 0\). (The exceptions occur when \(x = -2^b\); then \(a = \infty\).) Consequently
\[ \bar{x} = (\bar\alpha\, 1\, 0^a\, 0\, 1^b)_2, \tag{33} \]
\[ x - 1 = (\alpha\, 0\, 1^a\, 0\, 1^b)_2, \tag{34} \]
\[ -x = (\bar\alpha\, 1\, 0^a\, 1\, 0^b)_2; \tag{35} \]
and we see that \(\bar{x} + 1 = -x = \overline{x - 1}\), in agreement with (16) and (17). With two operations we can therefore compute relatives of x in several useful ways:
\[ x \mathbin{\&} (x - 1) = (\alpha\, 0\, 1^a\, 0\, 0^b)_2 \quad \text{[remove the rightmost 1]}; \tag{36} \]
\[ x \mathbin{\&} {-x} = (0^\infty\, 0\, 0^a\, 1\, 0^b)_2 \quad \text{[extract the rightmost 1]}; \tag{37} \]
\[ x \mid {-x} = (1^\infty\, 1\, 1^a\, 1\, 0^b)_2 \quad \text{[smear the rightmost 1 to the left]}; \tag{38} \]
\[ x \oplus {-x} = (1^\infty\, 1\, 1^a\, 0\, 0^b)_2 \quad \text{[remove and smear it to the left]}; \tag{39} \]
\[ x \mid (x - 1) = (\alpha\, 0\, 1^a\, 1\, 1^b)_2 \quad \text{[smear the rightmost 1 to the right]}; \tag{40} \]
\[ x \oplus (x - 1) = (0^\infty\, 0\, 0^a\, 1\, 1^b)_2 \quad \text{[extract and smear it to the right]}; \tag{41} \]
\[ \bar{x} \mathbin{\&} (x - 1) = (0^\infty\, 0\, 0^a\, 0\, 1^b)_2 \quad \text{[extract, remove, and smear it to the right]}. \tag{42} \]
And two further operations produce yet another variant:
\[ ((x \mid (x - 1)) + 1) \mathbin{\&} x = (\alpha\, 0\, 0^a\, 0\, 0^b)_2 \quad \text{[remove the rightmost run of 1s]}. \tag{43} \]
When \(x = 0\), five of these formulas produce 0, the other three give \(-1\). [Formula (36) is due to Peter Wegner, CACM 3 (1960), 322; and (43) is due to H. Tim Gladwin, CACM 14 (1971), 407-408. See also Henry S. Warren, Jr., CACM 20 (1977), 439-441.]
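
To see formulas (36) through (43) side by side on a concrete value, one can tabulate them. The sketch below does this in Python for \(x = (1011000)_2\), where \(a = 1\) and \(b = 3\); the function name `rightmost_bit_tricks` and the dictionary layout are choices made only for this illustration.

```python
def rightmost_bit_tricks(x):
    """Return the results of formulas (36)-(43) for the integer x."""
    return {
        36: x & (x - 1),              # remove the rightmost 1
        37: x & -x,                   # extract the rightmost 1
        38: x | -x,                   # smear the rightmost 1 to the left
        39: x ^ -x,                   # remove and smear it to the left
        40: x | (x - 1),              # smear the rightmost 1 to the right
        41: x ^ (x - 1),              # extract and smear it to the right
        42: ~x & (x - 1),             # extract, remove, and smear it to the right
        43: ((x | (x - 1)) + 1) & x,  # remove the rightmost run of 1s
    }

r = rightmost_bit_tricks(0b1011000)
assert r[36] == 0b1010000 and r[37] == 0b0001000
assert r[40] == 0b1011111 and r[43] == 0b1000000
```

Results with infinitely many leading 1s, such as (38) and (39), appear as negative Python integers.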

The quantity b in these formulas, which specifies the number of trailing zeros in x, is called the ruler function of x and written \(\rho x\), because it is related to the lengths of the tick marks that are often used to indicate fractions of an inch. In general, \(\rho x\) is the largest integer k such that \(2^k\) divides x, when \(x \ne 0\); and we define \(\rho 0 = \infty\). The recurrence relations
\[ \rho(2x + 1) = 0, \qquad \rho(2x) = \rho(x) + 1 \tag{44} \]
also serve to define \(\rho x\) for nonzero x. Another handy relation is worthy of note,
\[ \rho(x - y) = \rho(x \oplus y). \tag{45} \]
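
As a quick check of (44) and (45), the ruler function can be written directly in terms of the "extract the rightmost 1" formula \(x \mathbin{\&} {-x}\). This is a sketch under the assumption that an integer bit-length primitive is available (Python's `int.bit_length` here); it is not one of the methods discussed in the text.

```python
def rho(x):
    """Ruler function: the number of trailing zeros of a nonzero integer x."""
    if x == 0:
        raise ValueError("rho(0) is infinite")
    return (x & -x).bit_length() - 1

for x in range(1, 200):
    assert rho(2 * x + 1) == 0 and rho(2 * x) == rho(x) + 1   # (44)
    for y in range(1, 200):
        if x != y:
            assert rho(x - y) == rho(x ^ y)                   # (45)
```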

The elegant formula \(x \mathbin{\&} {-x}\) in (37) allows us to extract the rightmost 1 bit very nicely, but we often want to identify exactly which bit it is. The ruler function can be computed in many ways, and the best method often depends heavily on the computer that is being used. For example, a two-instruction sequence due to J. Dallos does the job quickly and easily on MMIX (see (42)):

    SUBU t,x,1;  SADD rho,t,x.                                  (46)

(See exercise 30 for the case \(x = 0\).) We shall discuss here two approaches that do not rely on exotic commands like SADD; and later, after learning a few more techniques, we'll consider a third way.

The first general-purpose method makes use of "magic mask" constants \(\mu_k\) that prove to be useful in many other applications, namely
\[ \mu_0 = (\ldots 101010101010101010101010101010101)_2 = -1/3, \]
\[ \mu_1 = (\ldots 100110011001100110011001100110011)_2 = -1/5, \tag{47} \]
\[ \mu_2 = (\ldots 100001111000011110000111100001111)_2 = -1/17, \]
and so on. In general \(\mu_k\) is the infinite 2-adic fraction \(-1/(2^{2^k} + 1)\), because \((2^{2^k} + 1)\mu_k = (\mu_k \ll 2^k) + \mu_k = (\ldots 11111)_2 = -1\). On a computer that has \(2^d\)-bit registers we don't need infinite precision, of course, so we use the truncated constants
\[ \mu_{d,k} = (2^{2^d} - 1)/(2^{2^k} + 1) \qquad \text{for } 0 \le k < d. \tag{48} \]
These constants are familiar from our study of Boolean evaluation, because they are the truth tables of the projection functions \(x_{n-k}\) (see, for example, 7.1.2-(7)).

When x is a power of 2, we can use these masks to compute
\[ \rho x = [x \mathbin{\&} \mu_0 = 0] + 2[x \mathbin{\&} \mu_1 = 0] + 4[x \mathbin{\&} \mu_2 = 0] + 8[x \mathbin{\&} \mu_3 = 0] + \cdots, \tag{49} \]
because \([2^j \mathbin{\&} \mu_k = 0] = j_k\) when \(j = (\ldots j_3 j_2 j_1 j_0)_2\). Thus, on a \(2^d\)-bit computer, we can start with \(\rho \leftarrow 0\) and \(y \leftarrow x \mathbin{\&} {-x}\); then set \(\rho \leftarrow \rho + 2^k\) if \(y \mathbin{\&} \mu_{d,k} = 0\), for \(0 \le k < d\). This procedure gives \(\rho = \rho x\) when \(x \ne 0\). (It also gives \(\rho 0 = 2^d - 1\), an anomalous value that may need to be corrected; see exercise 30.)

For example, the corresponding MMIX program might look like this:

    m0 GREG #5555555555555555;  m1 GREG #3333333333333333;
    m2 GREG #0f0f0f0f0f0f0f0f;  m3 GREG #00ff00ff00ff00ff;
    m4 GREG #0000ffff0000ffff;  m5 GREG #00000000ffffffff;
    NEGU y,x;    AND y,x,y;     AND q,y,m5;    ZSZ rho,q,32;
    AND q,y,m4;  ADD t,rho,16;  CSZ rho,q,t;                    (50)
    AND q,y,m3;  ADD t,rho,8;   CSZ rho,q,t;
    AND q,y,m2;  ADD t,rho,4;   CSZ rho,q,t;
    AND q,y,m1;  ADD t,rho,2;   CSZ rho,q,t;
    AND q,y,m0;  ADD t,rho,1;   CSZ rho,q,t;

total time = 19υ. Or we could replace the last three lines by

    SRU y,y,rho;  LDB t,rhotab,y;  ADD rho,rho,t                (51)

where rhotab points to the beginning of an appropriate 129-byte table (only eight of whose entries are actually used). The total time would then be μ + 13υ.

The second general-purpose approach to the computation of \(\rho x\) is quite different. On a 64-bit machine it starts as before, with \(y \leftarrow x \mathbin{\&} {-x}\); but then it simply sets
\[ \rho \leftarrow \mathit{decode}\bigl[((a \cdot y) \bmod 2^{64}) \gg 58\bigr], \tag{52} \]
where a is a suitable multiplier and decode is a suitable 64-byte table. The constant \(a = (a_{63} \ldots a_1 a_0)_2\) must have the property that its 64 substrings
\[ a_{63} a_{62} \ldots a_{58},\; a_{62} a_{61} \ldots a_{57},\; \ldots,\; a_5 a_4 \ldots a_0,\; a_4 a_3 a_2 a_1 a_0 0,\; \ldots,\; a_0 00000 \]
are distinct. Exercise 2.3.4.2-23 shows that many such "de Bruijn cycles" exist; for example, we can use M. H. Martin's constant #03f79d71b4ca8b09, which is discussed in exercise 3.2.2-17. The decoding table decode[0], ..., decode[63] is then

    00, 01, 56, 02, 57, 49, 28, 03, 61, 58, 42, 50, 38, 29, 17, 04,
    62, 47, 59, 36, 45, 43, 51, 22, 53, 39, 33, 30, 24, 18, 12, 05,     (53)
    63, 55, 48, 27, 60, 41, 37, 16, 46, 35, 44, 21, 52, 32, 23, 11,
    54, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 09, 13, 08, 07, 06.

[This technique was devised in 1997 by M. Läuter, and independently by C. E. Leiserson, H. Prokop, and K. H. Randall a few months later (unpublished). David Seal had used a similar method in 1994, with a larger decoding table.]

Working with the leftmost bits. The function \(\lambda x = \lfloor \lg x \rfloor\), which is dual to \(\rho x\) because it locates the leftmost 1 when \(x > 0\), was introduced in Eq. 4.6.3-(6). It satisfies the recurrence
\[ \lambda 1 = 0; \qquad \lambda(2x) = \lambda(2x + 1) = \lambda(x) + 1 \quad \text{for } x > 0; \tag{54} \]
and it is undefined when \(x \le 0\). What is a good way to compute it? Once again MMIX provides a quick-but-tricky solution:

    FLOTU y,ROUND_DOWN,x;  SUB y,y,fone;  SR lam,y,52           (55)

where fone = #3ff0000000000000 is the floating-point representation of 1.0. (Total time 6υ.) This code floats x, then extracts the exponent.
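
The same exponent-extraction idea can be tried wherever IEEE double-precision numbers are available. The sketch below is restricted to \(0 < x < 2^{53}\), an assumption made here because Python converts integers to floats by rounding to nearest rather than rounding down, so larger values could round up to the next power of 2. The names `lam_float` and `FONE` are hypothetical.

```python
import struct

FONE = 0x3ff0000000000000   # bit pattern of the double-precision value 1.0

def lam_float(x):
    """floor(lg x) for 0 < x < 2**53, read off the floating-point exponent."""
    if not 0 < x < 1 << 53:
        raise ValueError("this sketch only handles 0 < x < 2**53")
    # Reinterpret the 8 bytes of float(x) as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", float(x)))
    return (bits - FONE) >> 52

assert lam_float(1) == 0 and lam_float(255) == 7 and lam_float(256) == 8
```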

But if floating-point conversion is not readily available, a binary reduction strategy works fairly well on a \(2^d\)-bit machine. We can start with \(\lambda \leftarrow 0\) and \(y \leftarrow x\); then we set \(\lambda \leftarrow \lambda + 2^k\) and \(y \leftarrow y \gg 2^k\) if \(y \mathbin{\&} \bar\mu_k \ne 0\), for \(k = d - 1\), ..., 1, 0 (or until k is reduced to the point where a short table can be used to finish up). The MMIX code analogous to (50) and (51) is now

    ANDN q,x,m5;  SRU z,x,32;  SET y,x;       CSNZ y,q,z;  ZSNZ lam,q,32;
    ANDN q,y,m4;  SRU z,y,16;  ADD t,lam,16;  CSNZ y,q,z;  CSNZ lam,q,t;
    ANDN q,y,m3;  SRU z,y,8;   ADD t,lam,8;   CSNZ y,q,z;  CSNZ lam,q,t;
    LDB t,lamtab,y;  ADD lam,lam,t;                             (56)

and the total time is μ + 17υ. In this case table lamtab has 256 entries, namely \(\lambda x\) for \(0 \le x < 256\). Notice that the "conditional set" (CS) and "zero or set" (ZS) instructions have been used here instead of branch instructions. They tend to save time, even though they've made the program slightly longer.
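
A loop version of the binary reduction may make the idea easier to follow than the unrolled machine code. In this sketch the test for a 1 in the upper half of the current window is written as a shift; it runs all the way down to \(k = 0\) instead of finishing with a table. The name `lam_reduce` is an assumption of the illustration.

```python
def lam_reduce(x, d=6):
    """floor(lg x) for 0 < x < 2**(2**d), by binary reduction."""
    if not 0 < x < 1 << (1 << d):
        raise ValueError("x out of range")
    lam, y = 0, x
    for k in range(d - 1, -1, -1):
        if y >> (1 << k) != 0:     # some 1 lies in the upper half
            lam += 1 << k
            y >>= 1 << k
    return lam

assert lam_reduce(1) == 0 and lam_reduce(0b1011000) == 6
```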
7.1.3 BITWISE TRICKS AND TECHNIQUES 11
There appears to be no simple way to extract the leftmost 1 bit that appears in a register, analogous to the trick by which we extracted the rightmost 1 in (37). For this purpose we could compute $y \leftarrow \lambda x$ and then $1 \ll y$, if $x \ne 0$; but a binary "smearing right" method is somewhat shorter and faster:
$$\text{Set } y \leftarrow x, \text{ then } y \leftarrow y \mid (y \gg 2^k) \text{ for } 0 \le k < d. \tag{57}$$
The leftmost 1 bit of x is then $y - (y \gg 1)$.
[These non-floating-point methods have been suggested by H. S. Warren, Jr.]
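As a minimal sketch of the smearing method (57), assuming a $2^d$-bit word with `d = 6` by default and a function name chosen here for illustration:

```python
def leftmost_one(x, d=6):
    """Isolate the leftmost 1 bit of x (0 <= x < 2**(2**d)) by smearing right."""
    y = x
    for k in range(d):                  # y <- y | (y >> 2**k), as in (57)
        y |= y >> (1 << k)
    return y - (y >> 1)                 # only the leading 1 survives
```

When x = 0 the result is 0, since nothing gets smeared.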
Other operations at the left of a register, like removing the leftmost run of 1s, are harder yet; see exercise 39. But there is a remarkably simple, machine-independent way to determine whether or not $\lambda x = \lambda y$, given unsigned integers x and y, in spite of the fact that we can't compute $\lambda x$ or $\lambda y$ quickly:
$$\lambda x = \lambda y \quad \text{if and only if} \quad x \oplus y \le x \mathbin{\&} y. \tag{58}$$
[See exercise 40. This elegant relation was discovered by W. C. Lynch in 2006.] We will use (58) below, to devise another way to compute $\lambda x$.
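Relation (58) is easy to try out. The following sketch (the function name is an assumption made here) compares it against Python's built-in `int.bit_length`, which equals $\lambda x + 1$ for positive x:

```python
def same_lambda(x, y):
    """True if and only if x and y have their leftmost 1 in the same place (x, y > 0)."""
    return (x ^ y) <= (x & y)
```

If the leading bits agree, they cancel in `x ^ y` but survive in `x & y`; otherwise the larger leading bit survives only in `x ^ y`.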
Sideways addition. Binary n-bit numbers $x = (x_{n-1} \ldots x_1 x_0)_2$ are often used to represent subsets X of the n-element universe $\{0, 1, \ldots, n-1\}$, with $k \in X$ if and only if $2^k \subseteq x$. The functions $\lambda x$ and $\rho x$ then represent the largest and smallest elements of X. The function
$$\nu x = x_{n-1} + \cdots + x_1 + x_0, \tag{59}$$
which is called the "sideways sum" or "population count" of x, also has obvious importance in this connection, because it represents the cardinality $|X|$, namely the number of elements in X. This function, which we considered in 4.6.3–(7), satisfies the recurrence
$$\nu 0 = 0; \qquad \nu(2x) = \nu(x) \text{ and } \nu(2x+1) = \nu(x) + 1, \quad \text{for } x \ge 0. \tag{60}$$
It also has an interesting connection with the ruler function (exercise 1.2.5–11),
$$\rho x = 1 + \nu(x-1) - \nu x; \qquad \text{equivalently,} \quad \sum_{k=1}^{n} \rho k = n - \nu n. \tag{61}$$
The first textbook on programming, The Preparation of Programs for an Electronic Digital Computer by Wilkes, Wheeler, and Gill, second edition (Reading, Mass.: Addison–Wesley, 1957), 155, 191–193, presented an interesting subroutine for sideways addition due to D. B. Gillies and J. C. P. Miller. Their method was devised for the 35-bit numbers of the EDSAC, but it is readily converted to the following 64-bit procedure for $\nu x$ when $x = (x_{63} \ldots x_1 x_0)_2$:
Set $y \leftarrow x - ((x \gg 1) \mathbin{\&} \mu_0)$. (Now $y = (u_{31} \ldots u_1 u_0)_4$, where $u_j = x_{2j+1} + x_{2j}$.)
Set $y \leftarrow (y \mathbin{\&} \mu_1) + ((y \gg 2) \mathbin{\&} \mu_1)$. (Now $y = (v_{15} \ldots v_1 v_0)_{16}$, $v_j = u_{2j+1} + u_{2j}$.)
Set $y \leftarrow (y + (y \gg 4)) \mathbin{\&} \mu_2$. (Now $y = (w_7 \ldots w_1 w_0)_{256}$, $w_j = v_{2j+1} + v_{2j}$.)
Finally $\nu \leftarrow ((a \cdot y) \bmod 2^{64}) \gg 56$, where $a = (11111111)_{256}$. (62)
The last step cleverly computes $y \bmod 255 = w_7 + \cdots + w_1 + w_0$ via multiplication, using the fact that the sum fits comfortably in eight bits. [David Muller had programmed a similar method for the ILLIAC I machine in 1954.]
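Procedure (62) translates almost line for line. In this sketch the constant names `MU0`, `MU1`, `MU2`, `A` and the function name are choices made here; the masks are the 64-bit magic masks $\mu_0$, $\mu_1$, $\mu_2$.

```python
MASK64 = (1 << 64) - 1
MU0 = 0x5555555555555555
MU1 = 0x3333333333333333
MU2 = 0x0f0f0f0f0f0f0f0f
A = 0x0101010101010101                  # a = (11111111) in radix 256

def sideways_sum(x):
    """Population count of a 64-bit x, following the four steps of (62)."""
    y = x - ((x >> 1) & MU0)            # 2-bit fields hold pair sums
    y = (y & MU1) + ((y >> 2) & MU1)    # 4-bit fields hold nybble sums
    y = (y + (y >> 4)) & MU2            # bytes hold byte sums
    return ((A * y) & MASK64) >> 56     # add the eight bytes by multiplication
```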
If x is expected to be "sparse," having at most a few 1 bits, we can use a faster method [P. Wegner, CACM 3 (1960), 322]:
$$\text{Set } \nu \leftarrow 0,\ y \leftarrow x. \text{ Then while } y \ne 0, \text{ set } \nu \leftarrow \nu + 1,\ y \leftarrow y \mathbin{\&} (y - 1). \tag{63}$$
A similar approach, using $y \leftarrow y \mid (y + 1)$, works when x is expected to be "dense."
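Both loops can be sketched as follows. The function names and the explicit word width `n` in the dense variant are assumptions made for this illustration; the dense loop counts the 0 bits and subtracts from n.

```python
def count_sparse(x):
    """Wegner's loop (63): one iteration per 1 bit of x."""
    nu, y = 0, x
    while y != 0:
        nu += 1
        y &= y - 1                      # delete the rightmost 1
    return nu

def count_dense(x, n=64):
    """Dual loop: one iteration per 0 bit of the n-bit word x."""
    full = (1 << n) - 1
    zeros, y = 0, x
    while y != full:
        zeros += 1
        y |= y + 1                      # fill in the rightmost 0
    return n - zeros
```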
Bit reversal. For our next trick, let's change $x = (x_{63} \ldots x_1 x_0)_2$ to its left-right mirror image, $x^R = (x_0 x_1 \ldots x_{63})_2$. Anybody who has been following the developments so far, seeing methods like (50), (56), (57), and (62), will probably think, "Aha, once again we can divide by 2 and conquer! If we've already discovered how to reverse 32-bit numbers, we can reverse 64-bit numbers almost as fast, because $(xy)^R = y^R x^R$. All we have to do is apply the 32-bit method in parallel to both halves of the register, then swap the left half with the right half."
Right. For example, we can reverse an 8-bit string in three easy steps:
Given x7 x6 x5 x4 x3 x2 x1 x0
Swap bits x6 x7 x4 x5 x2 x3 x0 x1 (64)
Swap nyps x4 x5 x6 x7 x0 x1 x2 x3
Swap nybbles x0 x1 x2 x3 x4 x5 x6 x7
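And six such easy steps will reverse 64 bits. Fortunately, each of the swapping operations turns out to be quite simple with the help of the magic masks $\mu_k$:
$y \leftarrow (x \gg 1) \mathbin{\&} \mu_0$, $z \leftarrow (x \mathbin{\&} \mu_0) \ll 1$, $x \leftarrow y \mid z$;
$y \leftarrow (x \gg 2) \mathbin{\&} \mu_1$, $z \leftarrow (x \mathbin{\&} \mu_1) \ll 2$, $x \leftarrow y \mid z$;
$y \leftarrow (x \gg 4) \mathbin{\&} \mu_2$, $z \leftarrow (x \mathbin{\&} \mu_2) \ll 4$, $x \leftarrow y \mid z$;
$y \leftarrow (x \gg 8) \mathbin{\&} \mu_3$, $z \leftarrow (x \mathbin{\&} \mu_3) \ll 8$, $x \leftarrow y \mid z$; (65)
$y \leftarrow (x \gg 16) \mathbin{\&} \mu_4$, $z \leftarrow (x \mathbin{\&} \mu_4) \ll 16$, $x \leftarrow y \mid z$;
$x \leftarrow (x \gg 32) \mid ((x \ll 32) \bmod 2^{64})$.
[Christopher Strachey foresaw some aspects of this construction in CACM 4 (1961), 146, and a similar ternary method was devised in 1973 by Bruce Baumgart (see exercise 49). The mature algorithm (65) was presented by Henry S. Warren, Jr., in Hacker's Delight (Addison–Wesley, 2002), 102.]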
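The six steps of (65) fit naturally into a loop. This is a sketch under the assumption that the list `MU` holds the 64-bit magic masks $\mu_0, \ldots, \mu_4$; the names are chosen here.

```python
MASK64 = (1 << 64) - 1
MU = [0x5555555555555555, 0x3333333333333333, 0x0f0f0f0f0f0f0f0f,
      0x00ff00ff00ff00ff, 0x0000ffff0000ffff]

def reverse64(x):
    """Mirror the 64 bits of x by swapping bits, nyps, nybbles, bytes, wydes, halves."""
    for k, mu in enumerate(MU):
        s = 1 << k
        y = (x >> s) & mu
        z = (x & mu) << s
        x = y | z
    return (x >> 32) | ((x << 32) & MASK64)
```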
But MMIX is once again able to trump this general-purpose technique with less traditional commands that do the job much faster. Consider
rev GREG #0102040810204080; MOR x,x,rev; MOR x,rev,x; (66)
the first MOR instruction reverses the bytes of x from big-endian to little-endian or vice versa, while the second reverses the bits within each byte.
Bit swapping. Suppose we only want to interchange two bits within a register, $x_i \leftrightarrow x_j$, where $i > j$. What would be a good way to proceed? (Dear reader, please pause for a moment and solve this problem in your head, or with pencil and paper, without looking at the answer below.)
Let $\delta = i - j$. Here is one solution (but don't peek until you're ready):
$$y \leftarrow (x \gg \delta) \mathbin{\&} 2^j, \quad z \leftarrow (x \mathbin{\&} 2^j) \ll \delta, \quad x \leftarrow (x \mathbin{\&} m) \mid y \mid z, \quad \text{where } m = \overline{2^i \mid 2^j}. \tag{67}$$
It uses two shifts and five bitwise Boolean operations, assuming that i and j are given constants. It is like each of the first lines of (65), except that a new mask m is needed because y and z don't account for all of the bits of x.
We can, however, do better, saving one operation and one constant:
$$y \leftarrow (x \oplus (x \gg \delta)) \mathbin{\&} 2^j, \quad x \leftarrow x \oplus y \oplus (y \ll \delta). \tag{68}$$
The first assignment now puts $x_i \oplus x_j$ into position j; the second changes $x_i$ to $x_i \oplus (x_i \oplus x_j)$ and $x_j$ to $x_j \oplus (x_i \oplus x_j)$, as desired. In general it's often wise to convert a problem of the form "change x to f(x)" into a problem of the form "change x to $x \oplus g(x)$," since the bit-difference g(x) might be easy to calculate.
On the other hand, there's a sense in which (67) might be preferable to (68), because the assignments to y and z in (67) can sometimes be performed simultaneously. When expressed as a circuit, (67) has a depth of 4 while (68) has depth 5.
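The two solutions (67) and (68) can be compared directly. In this sketch the function names are assumptions; both take the bit positions i > j as arguments rather than as built-in constants.

```python
def swap_bits_67(x, i, j):
    """Interchange bits i and j of x (i > j) with two shifts and a mask m, as in (67)."""
    delta = i - j
    y = (x >> delta) & (1 << j)         # old bit i, moved down to position j
    z = (x & (1 << j)) << delta         # old bit j, moved up to position i
    m = ~((1 << i) | (1 << j))          # keeps every other bit of x
    return (x & m) | y | z

def swap_bits_68(x, i, j):
    """Same interchange via the bit-difference x_i XOR x_j, as in (68)."""
    delta = i - j
    y = (x ^ (x >> delta)) & (1 << j)
    return x ^ y ^ (y << delta)
```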
Operation (68) can of course be used to swap several pairs of bits simultaneously, when we use a mask $\theta$ that's more general than $2^j$:
$$y \leftarrow (x \oplus (x \gg \delta)) \mathbin{\&} \theta, \quad x \leftarrow x \oplus y \oplus (y \ll \delta). \tag{69}$$
Let us call this operation a "$\delta$-swap," because it allows us to swap any non-overlapping pairs of bits that are $\delta$ places apart. The mask $\theta$ has a 1 in the rightmost position of each pair that's supposed to be swapped. For example, (69) will swap the leftmost 25 bits of a 64-bit word with the rightmost 25 bits, while leaving the 14 middle bits untouched, if we let $\delta = 39$ and $\theta = 2^{25} - 1$ = #1ffffff.
Indeed, there's an astonishing way to reverse 64 bits using $\delta$-swaps, namely
$y \leftarrow (x \gg 1) \mathbin{\&} \mu_0$, $z \leftarrow (x \mathbin{\&} \mu_0) \ll 1$, $x \leftarrow y \mid z$;
$y \leftarrow (x \oplus (x \gg 4)) \mathbin{\&}$ #0300c0303030c303, $x \leftarrow x \oplus y \oplus (y \ll 4)$;
$y \leftarrow (x \oplus (x \gg 8)) \mathbin{\&}$ #00c0300c03f0003f, $x \leftarrow x \oplus y \oplus (y \ll 8)$; (70)
$y \leftarrow (x \oplus (x \gg 20)) \mathbin{\&}$ #00000ffc00003fff, $x \leftarrow x \oplus y \oplus (y \ll 20)$;
$x \leftarrow (x \gg 34) \mid ((x \ll 30) \bmod 2^{64})$;
saving two of the bitwise operations in (65) even though (65) looks "optimum."
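A $\delta$-swap is a three-line function, and (70) can then be checked mechanically. The sketch below assumes 64-bit words; the function names are chosen here, while the hexadecimal masks are those of (70).

```python
MASK64 = (1 << 64) - 1
MU0 = 0x5555555555555555

def delta_swap(x, delta, theta):
    """Swap bit j with bit j+delta for every position j where theta has a 1, as in (69)."""
    y = (x ^ (x >> delta)) & theta
    return x ^ y ^ (y << delta)

def reverse64_via_swaps(x):
    """Reverse 64 bits with the five steps of (70)."""
    x = ((x >> 1) & MU0) | ((x & MU0) << 1)
    x = delta_swap(x, 4, 0x0300c0303030c303)
    x = delta_swap(x, 8, 0x00c0300c03f0003f)
    x = delta_swap(x, 20, 0x00000ffc00003fff)
    return (x >> 34) | ((x << 30) & MASK64)   # cyclic shift left by 30
```

For instance, `delta_swap(x, 39, (1 << 25) - 1)` exchanges the leftmost and rightmost 25 bits of x, as described in the text.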
*Bit permutation in general. The methods we've just seen can be extended to obtain an arbitrary permutation of the bits in a register. In fact, there always exist masks $\theta_0, \ldots, \theta_5, \hat\theta_4, \ldots, \hat\theta_0$ such that the following operations transform $x = (x_{63} \ldots x_1 x_0)_2$ into any desired rearrangement $x^\pi = (x_{63\pi} \ldots x_{1\pi} x_{0\pi})_2$ of its bits:
$x \leftarrow 2^k$-swap of x with mask $\theta_k$, for k = 0, 1, 2, 3, 4, 5;
$x \leftarrow 2^k$-swap of x with mask $\hat\theta_k$, for k = 4, 3, 2, 1, 0. (71)
In general, a permutation of $2^d$ bits can be achieved with $2d - 1$ such steps, using appropriate masks $\theta_k$, $\hat\theta_k$, where the swap distances are respectively $2^0, 2^1, \ldots, 2^{d-1}, \ldots, 2^1, 2^0$.
To prove this fact, we can use a special case of the permutation networks discovered independently by A. M. Duguid and J. Le Corre in 1959, based on earlier work of D. Slepian [see V. E. Beneš, Mathematical Theory of Connecting Networks and Telephone Traffic (New York: Academic Press, 1965), Section 3.3].
Figure 12 shows a permutation network P(2n) for 2n elements constructed from two permutation networks for n elements, when n = 4. Each vertical connection drawn between two lines represents a crossbar module that either leaves the line contents unaltered or interchanges them, as the data flows from left to right. Every setting of the individual crossbars therefore causes P(2n) to produce a permutation of its inputs; conversely, we wish to show that any permutation of the 2n inputs can be achieved by some setting of the crossbars.
The construction of Fig. 12 is best understood by considering an example. Suppose we want to route the inputs (0, 1, 2, 3, 4, 5, 6, 7) to (3, 2, 4, 1, 6, 0, 5, 7), respectively. The first job is to determine the contents of the lines just after the first column of crossbars and just before the last column, since we can then use a similar method to set the crossbars in the inner P(4)'s. Thus, in the network
0 a A 3
1 b B 2
2 c C 4
3 d D 1 (72)
4 e E 6
5 f F 0
6 g G 5
7 h H 7
we want to find permutations abcdefgh and ABCDEFGH such that $\{a, b\} = \{0, 1\}$, $\{c, d\} = \{2, 3\}$, ..., $\{g, h\} = \{6, 7\}$, $\{a, c, e, g\} = \{A, C, E, G\}$, $\{b, d, f, h\} = \{B, D, F, H\}$, $\{A, B\} = \{3, 2\}$, $\{C, D\} = \{4, 1\}$, ..., $\{G, H\} = \{5, 7\}$. Starting at the bottom, let us choose h = 7, because we don't wish to disturb the contents of that line unless necessary. Then the following choices are forced:
H = 7; G = 5; e = 5; f = 4; D = 4; C = 1; a = 1; b = 0; F = 0; E = 6; g = 6. (73)
If we had chosen h = 6, the forcing pattern would have been similar but reversed,
F = 6; E = 0; a = 0; b = 1; D = 1; C = 4; e = 4; f = 5; H = 5; G = 7; g = 7. (74)
Options (73) and (74) can both be completed by choosing either d = 3 (hence B = 3, A = 2, c = 2) or d = 2 (hence B = 2, A = 3, c = 3).
In general the forcing pattern will go in cycles, no matter what permutation we begin with. To see this, consider the graph on eight vertices {ab, cd, ef, gh, AB, CD, EF, GH} that has an edge from uv to UV whenever the pair of inputs connected to uv has an element in common with the pair of outputs connected to UV. Thus, in our example the edges are ab – EF, ab – CD, cd – AB, cd – AB, ef – CD, ef – GH, gh – EF, gh – GH. We have a "double bond" between cd and AB, since the inputs connected to c and d are exactly the outputs connected to A and B; subject to this slight bending of the strict definition of a graph, we see that each vertex is adjacent to exactly two other vertices, and lowercase vertices are always adjacent to uppercase ones. Therefore the graph always consists of disjoint cycles of even length. In our example, the cycles are
ab – EF – gh – GH – ef – CD – ab   and   cd = AB, (75)
where the longer cycle corresponds to (73) and (74). If there are k different cycles, there will be $2^k$ different ways to specify the behavior of the first and last columns of crossbars.

[Fig. 12. The inside of a black box P(2n) that permutes 2n elements in all possible ways, when n > 1. (Illustrated for n = 4.) The drawing shows 2n inputs, a column of crossbars, two inner P(n) networks, a final column of crossbars, and 2n outputs.]
To complete the network, we can process the inner 4-element permutations in the same way; and any $2^d$-element permutation is achievable in this same recursive fashion. The resulting crossbar settings determine the masks $\theta_j$ and $\hat\theta_j$ of (71). Some choices of crossbars may lead to a mask that is entirely zero; then we can eliminate the corresponding stage of the computation.
If the input and output are identical on the bottom lines of the network, our construction shows how to ensure that none of the crossbars touching those lines are active. For example, the 64-bit algorithm in (71) could be used also with a 60-bit register, without needing the four extra bits for any intermediate results.
Of course we can often beat the general procedure of (71) in special cases. For example, exercise 52 shows that method (71) needs nine swapping steps to transpose an 8 × 8 matrix, but in fact three swaps suffice:
Given                   | 7-swap                  | 14-swap                 | 28-swap
00 01 02 03 04 05 06 07 | 00 10 02 12 04 14 06 16 | 00 10 20 30 04 14 24 34 | 00 10 20 30 40 50 60 70
10 11 12 13 14 15 16 17 | 01 11 03 13 05 15 07 17 | 01 11 21 31 05 15 25 35 | 01 11 21 31 41 51 61 71
20 21 22 23 24 25 26 27 | 20 30 22 32 24 34 26 36 | 02 12 22 32 06 16 26 36 | 02 12 22 32 42 52 62 72
30 31 32 33 34 35 36 37 | 21 31 23 33 25 35 27 37 | 03 13 23 33 07 17 27 37 | 03 13 23 33 43 53 63 73
40 41 42 43 44 45 46 47 | 40 50 42 52 44 54 46 56 | 40 50 60 70 44 54 64 74 | 04 14 24 34 44 54 64 74
50 51 52 53 54 55 56 57 | 41 51 43 53 45 55 47 57 | 41 51 61 71 45 55 65 75 | 05 15 25 35 45 55 65 75
60 61 62 63 64 65 66 67 | 60 70 62 72 64 74 66 76 | 42 52 62 72 46 56 66 76 | 06 16 26 36 46 56 66 76
70 71 72 73 74 75 76 77 | 61 71 63 73 65 75 67 77 | 43 53 63 73 47 57 67 77 | 07 17 27 37 47 57 67 77
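The three swaps in the table can be sketched as follows. The text does not list the masks; the values used below are the commonly cited ones for 8 × 8 bit-matrix transposition and should be read as an assumption of this example, as should the convention that the matrix entry in row r, column c is bit 8r + c of x.

```python
def transpose8x8(x):
    """Transpose an 8x8 bit matrix held in a 64-bit integer, using three delta-swaps."""
    for delta, theta in ((7, 0x00aa00aa00aa00aa),     # transpose inside 2x2 blocks
                         (14, 0x0000cccc0000cccc),    # swap 2x2 blocks inside 4x4 blocks
                         (28, 0x00000000f0f0f0f0)):   # swap the off-diagonal 4x4 blocks
        y = (x ^ (x >> delta)) & theta
        x ^= y ^ (y << delta)
    return x

def transpose_reference(x):
    """Bit-by-bit transposition, for comparison only."""
    out = 0
    for r in range(8):
        for c in range(8):
            if (x >> (8 * r + c)) & 1:
                out |= 1 << (8 * c + r)
    return out
```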
The "perfect shuffle" is another bit permutation that arises frequently in practice. If $x = (\ldots x_2 x_1 x_0)_2$ and $y = (\ldots y_2 y_1 y_0)_2$ are any 2-adic integers, we define $x \ddagger y$ ("x zip y," the zipper function of x and y) by interleaving their bits:
$$x \ddagger y = (\ldots x_2 y_2 x_1 y_1 x_0 y_0)_2. \tag{76}$$
This operation has important applications to the representation of 2-dimensional data, because a small change in either x or y usually causes only a small change in $x \ddagger y$ (see exercise 86). Notice also that the magic mask constants (47) satisfy
$$\mu_k \ddagger \mu_k = \mu_{k+1}. \tag{77}$$
If x appears in the left half of a register and y appears in the right half, a perfect shuffle is the permutation that changes the register contents to $x \ddagger y$.
A sequence of $d - 1$ swapping steps will perfectly shuffle a $2^d$-bit register; in fact, exercise 53 shows that there are several ways to achieve this. Once again, therefore, we are able to improve on the $(2d - 1)$-step method of (71) and Fig. 12.
Conversely, suppose we're given the shuffled value $z = x \ddagger y$ in a $2^d$-bit register; is there an efficient way to extract the original value of y? Sure: If the $d - 1$ swaps that do a perfect shuffle are performed in reverse order, they'll undo the shuffle and recover both x and y. But if only y is wanted, we can save half of the work: Start with $y \leftarrow z \mathbin{\&} \mu_0$; then set $y \leftarrow (y + (y \gg 2^{k-1})) \mathbin{\&} \mu_k$ for $k = 1, \ldots, d - 1$. For example, when d = 3 this procedure goes $(0 y_3 0 y_2 0 y_1 0 y_0)_2 \mapsto (00 y_3 y_2 00 y_1 y_0)_2 \mapsto (0000 y_3 y_2 y_1 y_0)_2$. "Divide and conquer" conquers again.
Consider now a more general problem, where we want to extract and compress an arbitrary subset of a register's bits. Suppose we're given a $2^d$-bit word $z = (z_{2^d-1} \ldots z_1 z_0)_2$ and a mask $\chi = (\chi_{2^d-1} \ldots \chi_1 \chi_0)_2$ that has s 1-bits; thus $\nu\chi = s$. The problem is to assemble the compact subword
$$y = (y_{s-1} \ldots y_1 y_0)_2 = (z_{j_{s-1}} \ldots z_{j_1} z_{j_0})_2, \tag{78}$$
where $j_{s-1} > \cdots > j_1 > j_0$ are the indices where $\chi_j = 1$. For example, if d = 3 and $\chi = (10110010)_2$, we want to transform $z = (y_3 x_3 y_2 y_1 x_2 x_1 y_0 x_0)_2$ into $y = (y_3 y_2 y_1 y_0)_2$. (The problem of going from $x \ddagger y$ to y, considered above, is the special case $\chi = \mu_0$.) We know from (71) that y can be found by $\delta$-swapping, at most $2d - 1$ times; but in this problem the relevant data always moves to the right, so we can speed things up by doing shifts instead of swaps.
Let's say that a $\delta$-shift of x with mask $\theta$ is the operation
$$x \leftarrow x \oplus ((x \oplus (x \gg \delta)) \mathbin{\&} \theta), \tag{79}$$
which changes bit $x_j$ to $x_{j+\delta}$ if $\theta$ has 1 in position j, otherwise it leaves $x_j$ unchanged. Guy Steele discovered that there always exist masks $\theta_0, \theta_1, \ldots, \theta_{d-1}$ so that the general extraction problem (78) can be solved with a few $\delta$-shifts:
Start with $x \leftarrow z$; then do a $2^k$-shift of x with mask $\theta_k$, for $k = 0, 1, \ldots, d - 1$; finally set $y \leftarrow x$. (80)
In fact, the idea for finding appropriate masks is surprisingly simple. Every bit that wants to move a total of exactly $l = (l_{d-1} \ldots l_1 l_0)_2$ places to the right should be transported in the $2^k$-shifts for which $l_k = 1$.
For example, suppose d = 3 and $\chi = (10110010)_2$. (We must assume that $\chi \ne 0$.) Remembering that some 0s need to be shifted in from the left, we can set $\theta_0 = (00011001)_2$, $\theta_1 = (00000110)_2$, $\theta_2 = (11111000)_2$; then (80) maps
$$(y_3 x_3 y_2 y_1 x_2 x_1 y_0 x_0)_2 \mapsto (y_3 x_3 y_2 y_2 y_1 x_1 y_0 y_0)_2 \mapsto (y_3 x_3 y_2 y_2 y_1 y_2 y_1 y_0)_2 \mapsto (0000 y_3 y_2 y_1 y_0)_2.$$
Exercise 69 proves that the bits being extracted will never interfere with each other during their journey. Furthermore, there's a slick way to compute the necessary masks $\theta_k$ dynamically from $\chi$, in $O(d^2)$ steps (see exercise 70).
A "sheep-and-goats" operation has been suggested for computer hardware, extending (78) to produce the general unshuffled word
$$(x_{r-1} \ldots x_1 x_0 \, y_{s-1} \ldots y_1 y_0)_2 = (z_{i_{r-1}} \ldots z_{i_1} z_{i_0} \, z_{j_{s-1}} \ldots z_{j_1} z_{j_0})_2; \tag{81}$$
here $i_{r-1} > \cdots > i_1 > i_0$ are the indices where $\chi_i = 0$. Any permutation of $2^d$ bits is achievable via at most d sheep-and-goats operations (see exercise 73).
Shifting also allows us to go beyond permutations, to arbitrary mappings of bits within a register. Suppose we want to transform
$$x = (x_{2^d-1} \ldots x_1 x_0)_2 \mapsto x^\varphi = (x_{(2^d-1)\varphi} \ldots x_{1\varphi} x_{0\varphi})_2, \tag{82}$$
where $\varphi$ is any of the $(2^d)^{2^d}$ functions from the set $\{0, 1, \ldots, 2^d - 1\}$ into itself. K. M. Chung and C. K. Wong [IEEE Transactions C-29 (1980), 1029–1032] discovered an attractive way to do this in O(d) steps by using cyclic $\delta$-shifts, which are like (79) except that we set
$$x \leftarrow x \oplus ((x \oplus (x \gg \delta) \oplus (x \ll (2^d - \delta))) \mathbin{\&} \theta). \tag{83}$$
Their idea is to let $c_l$ be the number of indices j such that $j\varphi = l$, for $0 \le l < 2^d$. Then they find masks $\theta_0, \theta_1, \ldots, \theta_{d-1}$ with the property that a cyclic $2^k$-shift of x with mask $\theta_k$, done successively for $0 \le k < d$, will transform x into a number $x'$ that contains exactly $c_l$ copies of bit $x_l$ for each l. Finally the general permutation procedure (71) can be used to change $x' \mapsto x^\varphi$.
For example, suppose d = 3 and $x^\varphi = (x_3 x_1 x_1 x_0 x_3 x_7 x_5 x_5)_2$. Then we have $(c_0, c_1, c_2, c_3, c_4, c_5, c_6, c_7) = (1, 2, 0, 2, 0, 2, 0, 1)$. Using masks $\theta_0 = (00011100)_2$, $\theta_1 = (01001001)_2$, and $\theta_2 = (00100000)_2$, three cyclic $2^k$-shifts now take $x = (x_7 x_6 x_5 x_4 x_3 x_2 x_1 x_0)_2 \mapsto (x_7 x_6 x_5 x_5 x_4 x_3 x_1 x_0)_2 \mapsto (x_7 x_0 x_5 x_5 x_5 x_3 x_1 x_3)_2 \mapsto (x_7 x_0 x_1 x_5 x_5 x_3 x_1 x_3)_2 = x'$. Then, five $\delta$-swaps: $x' \mapsto (x_0 x_7 x_5 x_1 x_3 x_5 x_3 x_1)_2 \mapsto (x_0 x_7 x_5 x_1 x_3 x_1 x_3 x_5)_2 \mapsto (x_0 x_1 x_3 x_1 x_3 x_7 x_5 x_5)_2 \mapsto (x_3 x_1 x_0 x_1 x_3 x_7 x_5 x_5)_2 \mapsto (x_3 x_1 x_1 x_0 x_3 x_7 x_5 x_5)_2 = x^\varphi$; we're done! Of course any 8-bit mapping can be achieved more quickly by brute force, one bit at a time; the method of Chung and Wong becomes much more impressive in a 256-bit register. Even with MMIX's 64-bit registers it's pretty good, needing at most 96 cycles in the worst case.
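The first half of this example, the three cyclic shifts that produce $x'$, can be replayed with the sketch below. The `width` parameter and both function names are assumptions of the example; the masks are the three quoted in the text, and the subsequent $\delta$-swaps are not reproduced.

```python
def cyclic_delta_shift(x, delta, theta, width):
    """Cyclic delta-shift (83) on a width-bit word: bit j <- bit (j+delta) mod width."""
    full = (1 << width) - 1
    wrapped = (x << (width - delta)) & full
    return x ^ ((x ^ (x >> delta) ^ wrapped) & theta)

def chung_wong_example(x):
    """Apply the three cyclic 2**k-shifts of the d = 3 example to an 8-bit x."""
    for k, theta in enumerate((0b00011100, 0b01001001, 0b00100000)):
        x = cyclic_delta_shift(x, 1 << k, theta, 8)
    return x
```

The result should hold the bits $(x_7 x_0 x_1 x_5 x_5 x_3 x_1 x_3)_2$, which matches the counts $(1, 2, 0, 2, 0, 2, 0, 1)$.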
To find $\theta_0$, we use the fact that $\sum c_l = 2^d$, and we look at $\Sigma_{\rm even} = \sum c_{2l}$ and $\Sigma_{\rm odd} = \sum c_{2l+1}$. If $\Sigma_{\rm even} = \Sigma_{\rm odd} = 2^{d-1}$, we can set $\theta_0 = 0$ and omit the cyclic 1-shift. But if, say, $\Sigma_{\rm even} < \Sigma_{\rm odd}$, we find an even l with $c_l = 0$. Cyclically shifting the bits l, l+1, ..., l+t (modulo $2^d$) for some t will produce new counts $(c'_0, \ldots, c'_{2^d-1})$ for which $\Sigma'_{\rm even} = \Sigma'_{\rm odd} = 2^{d-1}$; so $\theta_0 = 2^l + \cdots + 2^{(l+t) \bmod 2^d}$. Then we can deal with the bits in even and odd positions separately, using the same method, until getting down to 1-bit subwords. Exercise 74 has the details.
Working with fragmented fields. Instead of extracting bits from various parts of a word and gathering them together, we can often manipulate those bits directly in their original positions.
For example, suppose we want to run through all subsets of a given set U, where (as usual) the set is specified by a mask $\chi$ such that $[k \in U] = (\chi \gg k) \mathbin{\&} 1$. If $x \subseteq \chi$ and $x \ne \chi$, there's an easy way to calculate the next largest subset of U in lexicographic order, namely the smallest integer $x' > x$ such that $x' \subseteq \chi$:
$$x' = (x - \chi) \mathbin{\&} \chi. \tag{84}$$
In the special case when x = 0 and $\chi \ne 0$, we've already seen in (37) that this formula produces the rightmost bit of $\chi$, which corresponds to the lexicographically smallest nonempty subset of U.
Why does formula (84) work? Imagine adding 1 to the number $x \mid \bar\chi$, which has 1s wherever $\chi$ is 0. A carry will propagate through those 1s until it reaches the rightmost bit position where x has a 0 and $\chi$ has a 1; furthermore all bits to the right of that position will become zero. Therefore $x' = ((x \mid \bar\chi) + 1) \mathbin{\&} \chi$. But we have $(x \mid \bar\chi) + 1 = (x + \bar\chi) + 1 = x + (\bar\chi + 1) = x - \chi$ when $x \subseteq \chi$. QED.
Notice further that $x' = 0$ if and only if $x = \chi$. So we'll know when we've found the largest subset. Exercise 79 shows how to go back to x, given $x'$.
We might also want to run through all elements of a subcube, for example, to find all bit patterns that match a specification like *10*1*01, consisting of 0s, 1s, and *s (don't-cares). Such a specification can be represented by asterisk codes $a = (a_{n-1} \ldots a_0)_2$ and bit codes $b = (b_{n-1} \ldots b_0)_2$, as in exercise 7.1.1–30; our example corresponds to $a = (10010100)_2$, $b = (01001001)_2$. The problem of enumerating all subsets of a set is the special case where $a = \chi$ and $b = 0$. In the more general subcube problem, the successor of a given bit pattern x is
$$x' = ((x - (a + b)) \mathbin{\&} a) + b. \tag{85}$$
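The successor rule (85) can be exercised on the pattern *10*1*01. The generator below is a sketch with a hypothetical name; it starts from the smallest element b and stops when the successor wraps around to b again.

```python
def subcube(a, b):
    """Yield all x whose bits agree with b outside the asterisk positions a, using (85)."""
    x = b
    while True:
        yield x
        x = ((x - (a + b)) & a) + b
        if x == b:                      # wrapped around after the largest element a | b
            break
```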
Suppose the bits of $z = (z_{n-1} \ldots z_0)_2$ have been stitched together from two subwords $x = (x_{r-1} \ldots x_0)_2$ and $y = (y_{s-1} \ldots y_0)_2$, where $r + s = n$, using an arbitrary mask $\chi$ for which $\nu\chi = s$ to govern the stitching. For example, $z = (y_2 x_4 x_3 y_1 x_2 y_0 x_1 x_0)_2$ when n = 8 and $\chi = (10010100)_2$. We can think of z as a "scattered accumulator," in which alien bits $x_i$ lurk among friendly bits $y_j$. From this viewpoint the problem of finding successive elements of a subcube is essentially the problem of computing $y + 1$ inside a scattered accumulator z, without changing the value of x. The sheep-and-goats operation (81) would untangle x and y; but it's expensive, and (85) shows that we can solve the problem without it. We can, in fact, compute $y + y'$ when $y' = (y'_{s-1} \ldots y'_0)_2$ is any value inside a scattered accumulator $z'$, if y and $y'$ both appear in the positions specified by $\chi$: Consider $t = z \mathbin{\&} \chi$ and $t' = z' \mathbin{\&} \chi$. If we form the sum $(t \mid \bar\chi) + t'$, all carries that occur in a normal addition $y + y'$ will propagate through the blocks of 1s in $\bar\chi$, just as if the scattered bits were adjacent. Thus
$$((z \mathbin{\&} \chi) + (z' \mid \bar\chi)) \mathbin{\&} \chi \tag{86}$$
is the sum of y and $y'$, modulo $2^s$, scattered according to the mask $\chi$.
Tweaking several bytes at once. Instead of concentrating on the data in one field within a word, we often want to deal simultaneously with two or more subwords, performing calculations on each of them in parallel. For example, many applications need to process long sequences of bytes, and we can gain speed by acting on eight bytes at a time; we might as well use all 64 bits that our machine provides. General multibyte techniques were introduced by Leslie Lamport in CACM 18 (1975), 471–475, and subsequently extended by many programmers.
Suppose first that we simply wish to take two sequences of bytes and find their sum, regarding them as coordinates of vectors, doing arithmetic modulo 256 in each byte. Algebraically speaking, we're given 8-byte vectors $x = (x_7 \ldots x_1 x_0)_{256}$ and $y = (y_7 \ldots y_1 y_0)_{256}$; we want to compute $z = (z_7 \ldots z_1 z_0)_{256}$, where $z_j = (x_j + y_j) \bmod 256$ for $0 \le j < 8$. Ordinary addition of x to y doesn't quite work, because we need to prevent carries from propagating between bytes. So we separate out the high-order bits and deal with them separately:
$z \leftarrow (x \oplus y) \mathbin{\&} h$, where h = #8080808080808080;
$z \leftarrow ((x \mathbin{\&} \bar h) + (y \mathbin{\&} \bar h)) \oplus z$. (87)
The total time for MMIX to do this is $6\upsilon$, plus $3\mu + 3\upsilon$ if we also count the time to load x, load y, and store z. By contrast, eight one-byte additions (LDBU, LDBU, ADDU, and STBU, repeated eight times) would cost $8 \times (3\mu + 4\upsilon) = 24\mu + 32\upsilon$. Parallel subtraction of bytes is just as easy (see exercise 88).
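Formula (87) in executable form, as a sketch: `pack` is a hypothetical helper that builds a 64-bit word from eight byte values (leftmost first), and the function name `add_bytes` is likewise chosen here.

```python
H = 0x8080808080808080
MASK64 = (1 << 64) - 1

def pack(byte_values):
    """Pack eight byte values, leftmost first, into one 64-bit integer."""
    return int.from_bytes(bytes(byte_values), "big")

def add_bytes(x, y):
    """Bytewise sum modulo 256 of two 8-byte vectors, following (87)."""
    z = (x ^ y) & H                                   # sum of the high bits, without carries
    low = (x & ~H & MASK64) + (y & ~H & MASK64)       # 7-bit sums cannot cross byte boundaries
    return low ^ z
```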
We can also compute bytewise averages, with $z_j = \lfloor (x_j + y_j)/2 \rfloor$ for each j:
$z \leftarrow ((x \oplus y) \mathbin{\&} \bar l) \gg 1$, where l = #0101010101010101;
$z \leftarrow (x \mathbin{\&} y) + z$. (88)
This elegant trick, suggested by H. G. Dietz, is based on the well-known formula
$$x + y = (x \oplus y) + ((x \mathbin{\&} y) \ll 1) \tag{89}$$
for radix-2 addition. (We can implement (88) with four MMIX instructions, not five, because a single MOR operation will change $x \oplus y$ to $((x \oplus y) \mathbin{\&} \bar l) \gg 1$.)
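The averaging trick (88) can be sketched the same way; again `pack` and the function name are assumptions made for illustration. Clearing the low bit of each byte before the shift keeps it from leaking into the byte on its right.

```python
L = 0x0101010101010101
MASK64 = (1 << 64) - 1

def pack(byte_values):
    """Pack eight byte values, leftmost first, into one 64-bit integer."""
    return int.from_bytes(bytes(byte_values), "big")

def average_bytes(x, y):
    """Bytewise floor((x_j + y_j) / 2) for two 8-byte vectors, following (88)."""
    z = ((x ^ y) & ~L & MASK64) >> 1    # half of the bits that differ
    return (x & y) + z                  # plus the bits that agree
```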
Exercises 88–93 and 100–104 develop these ideas further, showing how to do mixed-radix arithmetic, as well as such things as the addition and subtraction of vectors whose components are treated modulo m when m needn't be a power of 2.
In essence, we can regard the bits, bytes, or other subfields of a register as if they were elements of an array of independent microprocessors, acting independently on their own subproblems yet tightly synchronized, and communicating with each other via shift instructions and carry bits. Computer designers have been interested for many years in the development of parallel processors with a so-called SIMD architecture, namely a "Single Instruction stream with Multiple Data streams"; see, for example, S. H. Unger, Proc. IRE 46 (1958), 1744–1750. The increased availability of 64-bit registers has meant that programmers of ordinary sequential computers are now able to get a taste of SIMD processing. Indeed, computations such as (87), (88), and (89) are called SWAR methods ("SIMD Within A Register"), a name coined by R. J. Fisher and H. G. Dietz [see Lecture Notes in Computer Science 1656 (1999), 290–305].
Of course bytes often contain alphabetic data as well as numbers, and one of the most common programming tasks is to search through a long string of characters in order to find the first appearance of some particular byte value. For example, strings are often represented as a sequence of nonzero bytes terminated by 0. In order to locate the end of a string quickly, we need a fast way to determine whether all eight bytes of a given word x are nonzero (because they usually are). Several fairly good solutions to this problem were found by Lamport and others; but Alan Mycroft discovered in 1987 that three instructions actually suffice:
$$t \leftarrow h \mathbin{\&} (x - l) \mathbin{\&} \bar x, \tag{90}$$
where h and l appear in (87) and (88). If each byte $x_j$ is nonzero, t will be zero; for $(x_j - 1) \mathbin{\&} \bar x_j$ will be $2^{\rho x_j} - 1$, which is always less than #80 = $2^7$. But if $x_j = 0$, while its right neighbors $x_{j-1}, \ldots, x_0$ (if any) are all nonzero, the subtraction $x - l$ will produce #ff in byte j, and t will be nonzero. In fact, $\rho t$ will be $8j + 7$.
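Mycroft's test (90) can be tried on a few words. The function name is an assumption; Python's unbounded integers make the intermediate values `x - l` and `~x` possibly negative, but the final `&` with h brings the result back to a 64-bit pattern.

```python
H = 0x8080808080808080
L = 0x0101010101010101

def mycroft_flags(x):
    """Three-operation zero-byte test (90): nonzero iff some byte of x is zero."""
    return H & (x - L) & ~x
```

In the word #1122330055667788 the only zero byte is byte 4, so the rightmost 1 of the result sits at position 8·4 + 7 = 39. The word #0202020202020100 shows that bytes to the left of the rightmost zero byte may be flagged spuriously.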
Caution: Although the computation in (90) pinpoints the rightmost zero byte of x, we cannot deduce the position of the leftmost zero byte from the value of t alone. (See exercise 94.) In this respect the little-endian convention proves to be preferable to the corresponding big-endian behavior. An application that needs to locate the leftmost zero byte can use (90) to skip quickly over nonzeros, but then it must fall back on a slower method when the search has been narrowed down to eight finalists. The following 4-operation formula produces a completely precise test value $t = (t_7 \ldots t_1 t_0)_{256}$, in which $t_j = 128[x_j = 0]$ for each j:
$$t \leftarrow h \mathbin{\&} \overline{x \mid ((x \mid h) - l)}. \tag{91}$$
The leftmost zero byte of x is now $x_j$, where $\lambda t = 8j + 7$.
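The precise test (91) differs from (90) only slightly in code. In this sketch (function name assumed) the subtraction never borrows across bytes, because every byte of `x | H` is at least #80.

```python
H = 0x8080808080808080
L = 0x0101010101010101

def zero_byte_flags(x):
    """Exact zero-byte test (91): byte j of the result is 0x80 iff byte j of x is 0."""
    return H & ~(x | ((x | H) - L))
```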
Incidentally, the single MMIX instruction 'BDIF t,l,x' solves the zero-byte problem immediately by setting each byte $t_j$ of t to $[x_j = 0]$, because $1 \mathbin{\dot{-}} x = [x = 0]$. But we are primarily interested here in fairly universal techniques that don't rely on exotic hardware; MMIX's special features will be discussed later.
Now that we know a fast way to find the first 0, we can use the same ideas to search for any desired byte value. For example, to test if any byte of x is the newline character (#a), we simply look for a zero byte in $x \oplus$ #0a0a0a0a0a0a0a0a.
And these techniques also open up many other doors. Suppose, for instance, that we want to compute $z = (z_7 \ldots z_1 z_0)_{256}$ from x and y, where $z_j = x_j$ when $x_j = y_j$ but $z_j$ = '*' when $x_j \ne y_j$. (Thus if x = "beaching" and y = "belching", we're supposed to set z ← "be*ching".) It's easy:
$t \leftarrow h \mathbin{\&} ((x \oplus y) \mid (((x \oplus y) \mid h) - l))$;
$m \leftarrow (t \ll 1) - (t \gg 7)$; (92)
$z \leftarrow x \oplus ((x \oplus \text{"********"}) \mathbin{\&} m)$.
The first step uses (91) to flag the high-order bits in each byte where $x_j \ne y_j$. The next step creates a mask that highlights those bytes; the mask is #00 if $x_j = y_j$ and #ff otherwise. And the last step, which could also be written $z \leftarrow (x \mathbin{\&} \bar m) \mid (\text{"********"} \mathbin{\&} m)$, sets $z_j \leftarrow x_j$ or $z_j \leftarrow$ '*', depending on the mask.
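The three steps of (92) can be run on the very strings mentioned in the text. This sketch assumes big-endian packing of the eight characters into a 64-bit integer; the names are chosen here, and the result of the second step is reduced modulo $2^{64}$.

```python
H = 0x8080808080808080
L = 0x0101010101010101
MASK64 = (1 << 64) - 1
STARS = int.from_bytes(b"********", "big")

def star_differences(x, y):
    """Copy x, but put '*' into every byte position where x and y differ, as in (92)."""
    w = x ^ y
    t = H & (w | ((w | H) - L))             # 0x80 in each byte where w is nonzero
    m = ((t << 1) - (t >> 7)) & MASK64      # widen each flag to a full 0xff byte
    return x ^ ((x ^ STARS) & m)
```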
Operations (90) and (91) were originally designed as tests for bytes that are zero; but a closer look reveals that we can more wisely regard them as tests for bytes that are less than 1. Indeed, if we replace l by $c \cdot l = (cccccccc)_{256}$ in either formula, where c is any positive constant $\le 128$, we can use (90) or (91) to see if x contains any bytes that are less than c. Furthermore the comparison values c need not be the same in every byte position; and with a bit more work we can also do bytewise comparison in the cases where $c > 128$. Here's an 8-step formula that sets $t_j \leftarrow 128[x_j < y_j]$ for each byte position j in the test word t:
$$t \leftarrow h \mathbin{\&} \overline{\langle x \bar y z \rangle}, \quad \text{where } z = (x \mid h) - (y \mathbin{\&} \bar h). \tag{93}$$
(See exercise 96.) The median operation in this general formula can often be simplified; for example, (93) reduces to (91) when y = l, because $\langle x 1 z \rangle = x \mid z$.
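Formula (93) can be checked with a short sketch. The bitwise median is written out as a majority of three terms; `pack` is a hypothetical helper for building test words, and all names are assumptions of this example.

```python
H = 0x8080808080808080
MASK64 = (1 << 64) - 1

def pack(byte_values):
    """Pack eight byte values, leftmost first, into one 64-bit integer."""
    return int.from_bytes(bytes(byte_values), "big")

def bytes_less(x, y):
    """Byte j of the result is 0x80 if x_j < y_j (unsigned), else 0, following (93)."""
    not_y = ~y & MASK64
    z = (x | H) - (y & ~H & MASK64)                   # high bit of z_j: low 7 bits of x_j >= those of y_j
    median = (x & not_y) | (x & z) | (not_y & z)      # bitwise majority of x, complement of y, z
    return H & ~median
```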
Once we've found a nonzero t in (90) or (91) or (93), we might want to compute $\rho t$ or $\lambda t$ in order to discover the index j of the rightmost or leftmost byte that has been flagged. The problem of calculating $\rho$ or $\lambda$ is now simpler than before, since t can take on only 256 different values. Indeed, the operation
$$j \leftarrow \text{table}[((a \cdot t) \bmod 2^{64}) \gg 56], \quad \text{where } a = \frac{2^{56} - 1}{2^7 - 1}, \tag{94}$$
now suffices to compute j, given an appropriate 256-byte table. And the multiplication here can often be performed faster by doing three shift-and-add operations, "$t \leftarrow t + (t \ll 7)$, $t \leftarrow t + (t \ll 14)$, $t \leftarrow t + (t \ll 28)$," instead.
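The multiplication in (94) gathers the eight flag bits into the top byte, with flag j landing in bit j of that byte. The sketch below shows both the multiplication and the shift-and-add variant; the table `RHO_TABLE`, which maps the gathered byte to the index of its rightmost 1, is one possible choice of "appropriate table" and is an assumption of this example.

```python
MASK64 = (1 << 64) - 1
A = (2**56 - 1) // (2**7 - 1)           # 1 + 2**7 + 2**14 + ... + 2**49

def gather_flags(t):
    """Collect the flag bits 8j+7 of t into one byte, by multiplication as in (94)."""
    return ((A * t) & MASK64) >> 56

def gather_flags_shift_add(t):
    """Same result using three shift-and-add steps instead of a multiplication."""
    t = (t + (t << 7)) & MASK64
    t = (t + (t << 14)) & MASK64
    t = (t + (t << 28)) & MASK64
    return t >> 56

# Index of the rightmost flagged byte, for each nonzero gathered value.
RHO_TABLE = [None] + [(v & -v).bit_length() - 1 for v in range(1, 256)]
```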
Broadword computing. We've now seen more than a dozen ways in which a computer's bitwise operations can produce astonishing results at high speed, and the exercises below contain many more such surprises.
Elwyn Berlekamp has remarked that computer chips containing N flip-flops continue to be built with ever larger values of N, yet in practice only O(log N) of those components are flipping or flopping at any given moment. The surprising effectiveness of bitwise operations suggests that computers of the future might make use of this untapped potential by having enhanced memory units that are able to do efficient n-bit computations for fairly large values of n. To prepare for that day, we ought to have a good name for the concept of manipulating "wide words." Lyle Ramshaw has suggested the pleasant term broadword, so that we can speak of n-bit quantities as broadwords of width n.
Many of the methods we've discussed are 2-adic, in the sense that they work correctly with binary numbers that have arbitrary (even infinite) precision. For example, the operation $x \mathbin{\&} -x$ always extracts $2^{\rho x}$, the least significant 1 bit of any nonzero 2-adic integer x. But other methods have an inherently broadword nature, such as the methods that use O(d) steps to perform sideways addition or bit permutation of $2^d$-bit words. Broadword computing is the art of dealing with n-bit words, when n is a parameter that is not extremely small.
Some broadword algorithms are of theoretical interest only, because they are efficient only in an asymptotic sense when n exceeds the size of the universe. But others are eminently practical even when n = 64. And in general, a broadword mindset often suggests good techniques.
One fascinating-but-impractical fact about broadword operations is the discovery by M. L. Fredman and D. E. Willard that $O(1)$ broadword steps suffice to evaluate the function $\lambda x = \lfloor \lg x \rfloor$ for any nonzero $n$-bit number $x$, no matter how big $n$ is. Here is their remarkable scheme, when $n = g^2$ and $g$ is a power of 2:

$$\begin{aligned}
t_1 &\gets h \mathbin{\&} (x \mid ((x \mid h) - l)), &&\text{where } h = 2^{g-1}l \text{ and } l = (2^n-1)/(2^g-1);\\
y &\gets (((a \cdot t_1) \bmod 2^n) \gg (n-g)) \cdot l, &&\text{where } a = (2^{n-g}-1)/(2^{g-1}-1);\\
t_2 &\gets h \mathbin{\&} (y \mid ((y \mid h) - b)), &&\text{where } b = (2^{n+g}-1)/(2^{g+1}-1);\\
m &\gets (t_2 \ll 1) - (t_2 \gg (g-1)), \quad m \gets m \oplus (m \gg g);\\
z &\gets (((l \cdot (x \mathbin{\&} m)) \bmod 2^n) \gg (n-g)) \cdot l;\\
t_3 &\gets h \mathbin{\&} (z \mid ((z \mid h) - b));\\
\lambda &\gets ((l \cdot ((t_2 \gg (2g - \lg g - 1)) + (t_3 \gg (2g-1)))) \bmod 2^n) \gg (n-g).
\end{aligned}\tag{95}$$
(See exercise 106.) The method fails to be practical because five of these 29 steps are multiplications, so they aren't really "bitwise" operations. In fact, we'll prove later that multiplication by a constant requires at least $\Omega(\log n)$ bitwise steps.
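Scheme (95) can be tried out with a short Python sketch. This is only an illustration under stated assumptions: the function name `fw_lambda` is hypothetical, $n$-bit arithmetic is simulated by masking, and $g \ge 2$ is assumed so that the constant $a$ is defined.

```python
def fw_lambda(x, g):
    """Evaluate floor(lg x) by scheme (95), for 0 < x < 2^n with n = g*g, g a power of 2, g >= 2."""
    n = g * g
    mask = (1 << n) - 1                      # all arithmetic is taken mod 2^n
    lg_g = g.bit_length() - 1
    l = mask // ((1 << g) - 1)               # a 1 in the low bit of every g-bit field
    h = l << (g - 1)                         # a 1 in the high bit of every field
    a = ((1 << (n - g)) - 1) // ((1 << (g - 1)) - 1)
    b = ((1 << (n + g)) - 1) // ((1 << (g + 1)) - 1)

    t1 = h & (x | ((x | h) - l))             # flag the nonzero fields of x
    y = (((a * t1) & mask) >> (n - g)) * l   # gather the g flags, replicate them in every field
    t2 = h & (y | ((y | h) - b))             # field i is flagged when the gathered flags are >= 2^i
    m = ((t2 << 1) - (t2 >> (g - 1))) & mask
    m ^= m >> g                              # mask selecting the leading nonzero field of x
    z = (((l * (x & m)) & mask) >> (n - g)) * l
    t3 = h & (z | ((z | h) - b))             # same comparison, now inside the leading field
    s = (t2 >> (2 * g - lg_g - 1)) + (t3 >> (2 * g - 1))
    return ((l * s) & mask) >> (n - g)       # sum all fields of s into the top field
```

For example, `fw_lambda(x, 8)` treats `x` as a 64-bit word made of eight 8-bit fields. The five multiplications in the sketch are the ones the text counts against the method.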
A multiplication-free way to find $\lambda x$, with only $O(\log\log n)$ bitwise broadword operations, was discovered in 1997 by Gerth Brodal, whose method is even more remarkable than (95). It is based on a formula analogous to (49),
    $\lambda x = [\lambda x = \lambda(x \mathbin{\&} \bar\mu_0)] + 2[\lambda x = \lambda(x \mathbin{\&} \bar\mu_1)] + 4[\lambda x = \lambda(x \mathbin{\&} \bar\mu_2)] + \cdots$,    (96)
and the fact that the relation $\lambda x = \lambda y$ is easily tested (see (58)):
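Algorithm B (Binary logarithm). This algorithm uses $n$-bit operations to compute $\lambda x = \lfloor \lg x \rfloor$, assuming that $0 < x < 2^n$ and $n = d \cdot 2^d$.
B1. [Scale down.] Set $\lambda \gets 0$. Then set $\lambda \gets \lambda + 2^k$ and $x \gets x \gg 2^k$ if $x \ge 2^{2^k}$, for $\lceil \lg n \rceil > k \ge d$ (taking $k$ in decreasing order).
B2. [Replicate.] (At this point $0 < x < 2^{2^d}$; the remaining task is to increase $\lambda$ by $\lfloor \lg x \rfloor$. We will replace $x$ by $d$ copies of itself, in $2^d$-bit fields.) Set $x \gets x \mid (x \ll 2^{d+k})$ for $0 \le k < \lceil \lg d \rceil$.
B3. [Change leading bits.] Set $y \gets x \mathbin{\&} (\bar\mu_{d,d-1} \ldots \bar\mu_{d,1}\,\bar\mu_{d,0})_{2^{2^d}}$. (See (48).)
B4. [Compare all fields.] Set $t \gets h \mathbin{\&} (y \mid ((y \mid h) - (x \oplus y)))$, where $h = (2^{2^d-1} \ldots 2^{2^d-1}\, 2^{2^d-1})_{2^{2^d}}$.
B5. [Compress bits.] Set $t \gets (t + (t \ll (2^{d+k} - 2^k))) \bmod 2^n$ for $0 \le k < \lceil \lg d \rceil$.
B6. [Finish.] Finally, set $\lambda \gets \lambda + (t \gg (n - d))$.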
This algorithm is actually competitive with (56) when $n = 64$ (see exercise 107).
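The steps of Algorithm B can be sketched in Python as follows. This is an illustrative transcription rather than a tuned implementation: the name `algorithm_b` is hypothetical, the constants of steps B3 and B4 are rebuilt on every call, and the mask $\mu_{d,k}$ is assumed to be $(2^{2^d}-1)/(2^{2^k}+1)$, complemented within each $2^d$-bit field.

```python
def algorithm_b(x, d):
    """Binary logarithm of x by Algorithm B, for 0 < x < 2^n with n = d * 2^d."""
    w = 1 << d                                   # field width 2^d
    n = d * w
    mask = (1 << n) - 1
    lam = 0
    # B1. Scale down until x fits in a single 2^d-bit field.
    for k in range((n - 1).bit_length() - 1, d - 1, -1):
        if x >> (1 << k):                        # x >= 2^(2^k)
            lam += 1 << k
            x >>= 1 << k
    # B2. Replicate x into d fields.
    for k in range((d - 1).bit_length()):
        x = (x | (x << (1 << (d + k)))) & mask
    # B3. Field k keeps the bits of x selected by the complement of mu_{d,k}.
    keep = 0
    for k in range(d):
        mu_dk = ((1 << w) - 1) // ((1 << (1 << k)) + 1)
        keep |= (((1 << w) - 1) ^ mu_dk) << (k * w)
    y = x & keep
    # B4. Flag field k when the leading 1 of x survives in field k of y.
    h = sum(1 << (k * w + w - 1) for k in range(d))
    t = h & (y | ((y | h) - (x ^ y)))
    # B5. Compress the d flag bits into the d highest bits of the word.
    for k in range((d - 1).bit_length()):
        t = (t + (t << ((1 << (d + k)) - (1 << k)))) & mask
    # B6. Finish.
    return lam + (t >> (n - d))
```

With `d = 4` the word size is $n = 64$, the case mentioned in the text; `d = 2` and `d = 3` give 8-bit and 24-bit words that are convenient for exhaustive checks.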
Another surprisingly efficient broadword algorithm was discovered in 2006 by M. S. Paterson and the author, who considered the problem of identifying all occurrences of the pattern $01^r$ in a given $n$-bit binary string. This problem, which is related to storage allocation, is equivalent to computing
    $q = \bar{x} \mathbin{\&} (x \ll 1) \mathbin{\&} (x \ll 2) \mathbin{\&} (x \ll 3) \mathbin{\&} \cdots \mathbin{\&} (x \ll r)$    (97)
when $x = (x_{n-1} \ldots x_1 x_0)_2$ is given. For example, when $n = 16$, $r = 3$, and $x = (1110111101100111)_2$, we have $q = (0001000000001000)_2$. One might expect intuitively that $\Omega(\log r)$ bitwise operations would be needed. But in fact the following 20-step computation does the job for all $n > r > 0$: Let $s = \lceil r/2 \rceil$, $l = \sum_{k \ge 0} 2^{ks} \bmod 2^n$, $h = (2^{s-1}\, l) \bmod 2^n$, and $a = \sum_{k \ge 0} (-1)^{k+1} 2^{2ks} \bmod 2^n$.

$$\begin{aligned}
y &\gets h \mathbin{\&} x \mathbin{\&} ((x \mathbin{\&} \bar{h}) + l);\\
t &\gets (x + y) \mathbin{\&} \bar{x} \mathbin{\&} -2^r;\\
u &\gets t \mathbin{\&} a, \quad v \gets t \mathbin{\&} \bar{a};\\
m &\gets (u - (u \gg r)) \mid (v - (v \gg r));\\
q &\gets t \mathbin{\&} ((x \mathbin{\&} m) + ((t \gg r) \mathbin{\&} \overline{(m \ll 1)})).
\end{aligned}\tag{98}$$
Exercise 111 explains why these machinations are valid. The method has little or no practical value; there's an easy way to evaluate (97) in $2\lceil \lg r \rceil + 2$ steps, so (98) is not advantageous until $r > 512$. But (98) is another indication of the unexpected power of broadword methods.
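For comparison, here is a Python sketch of (97) evaluated directly and by repeated doubling. The doubling routine is an assumption about what the "easy way" might look like (the text does not spell it out); it uses two operations per doubling, plus a final shift and mask. The function names are illustrative.

```python
def q_direct(x, r, n):
    """Evaluate (97) term by term in n-bit arithmetic."""
    mask = (1 << n) - 1
    q = ~x & mask
    for j in range(1, r + 1):
        q &= (x << j) & mask
    return q

def q_doubling(x, r, n):
    """Evaluate (97) with O(log r) shift/AND pairs (assumed doubling scheme)."""
    mask = (1 << n) - 1
    p, length = x, 1              # invariant: p = AND of (x << j) for 0 <= j < length
    while 2 * length <= r:
        p &= p << length
        length *= 2
    if length < r:                # top up to exactly r terms; the overlap is harmless
        p &= p << (r - length)
    return (p << 1) & ~x & mask   # shift to cover j = 1..r, then require a 0 bit in x
```

On the example from the text, both functions map `0b1110111101100111` with `r = 3`, `n = 16` to `0b0001000000001000`.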
*Lower bounds. Indeed, the existence of so many tricks and techniques makes it natural to wonder whether we've only been scratching the surface. Are there many more incredibly fast methods, still waiting to be discovered? A few theoretical results are known by which we can derive certain limitations on what is possible, although such studies are still in their infancy.
Let's say that a 2-adic chain is a sequence $(x_0, x_1, \ldots, x_r)$ of 2-adic integers in which each element $x_i$ for $i > 0$ is obtained from its predecessors via bitwise manipulation. More precisely, we want the steps of the chain to be defined by binary operations
    $x_i = x_{j(i)} \circ_i x_{k(i)}$ or $c_i \circ_i x_{k(i)}$ or $x_{j(i)} \circ_i c_i$,    (99)
where each $\circ_i$ is one of the operators $\{+, -, \&, \mid, \oplus, \equiv, \subset, \supset, \bar\subset, \bar\supset, \bar\wedge, \bar\vee, \ll, \gg\}$ and each $c_i$ is a constant. Furthermore, when the operator $\circ_i$ is a left shift or right shift, the amount of shift must be a positive integer constant; operations such as $x_{j(i)} \ll x_{k(i)}$ or $c_i \gg x_{k(i)}$ are not permitted. (Without the latter restriction we couldn't derive meaningful lower bounds, because every 0–1 valued function of a nonnegative integer $x$ would be computable in two steps as "$(c \gg x) \mathbin{\&} 1$" for some constant $c$.)
Similarly, a broadword chain of width $n$, also called an $n$-bit broadword chain, is a sequence $(x_0, x_1, \ldots, x_r)$ of $n$-bit numbers subject to essentially the same restrictions, where $n$ is a parameter and all operations are performed modulo $2^n$. Broadword chains behave like 2-adic chains in many ways, but subtle differences can arise because of the information loss that occurs at the left of $n$-bit computations (see exercise 113).
Both types of chains compute a function $f(x) = x_r$ when we start them out with a given value $x = x_0$. Exercise 114 shows that an $mn$-bit broadword chain is able to do $m$ essentially simultaneous evaluations of any function that is computable with an $n$-bit chain. Our goal is to study the shortest chains that are able to evaluate a given function $f$.
Any 2-adic or broadword chain $(x_0, x_1, \ldots, x_r)$ has a sequence of "shift sets" $(S_0, S_1, \ldots, S_r)$ and "bounds" $(B_0, B_1, \ldots, B_r)$, defined as follows: Start with $S_0 = \{0\}$ and $B_0 = 1$; then for $i \ge 1$, let

$$S_i = \begin{cases} S_{j(i)} \cup S_{k(i)},\\ S_{k(i)},\\ S_{j(i)},\\ S_{j(i)} + c_i,\\ S_{j(i)} - c_i,\end{cases}
\quad\text{and}\quad
B_i = \begin{cases} M_i B_{j(i)} B_{k(i)}, & \text{if } x_i = x_{j(i)} \circ_i x_{k(i)},\\ M_i B_{k(i)}, & \text{if } x_i = c_i \circ_i x_{k(i)},\\ M_i B_{j(i)}, & \text{if } x_i = x_{j(i)} \circ_i c_i,\\ B_{j(i)}, & \text{if } x_i = x_{j(i)} \gg c_i,\\ B_{j(i)}, & \text{if } x_i = x_{j(i)} \ll c_i,\end{cases}\tag{100}$$

where $M_i = 2$ if $\circ_i \in \{+, -\}$ and $M_i = 1$ otherwise, and the first three cases assume that $\circ_i \notin \{\ll, \gg\}$. For example, consider the following 7-step chain:
    x_i                    S_i             B_i
    x_0 = x                {0}               1
    x_1 = x_0 & -2         {0}               1
    x_2 = x_1 + 2          {0}               2
    x_3 = x_2 >> 1         {1}               2        (101)
    x_4 = x_2 + x_3        {0, 1}            8
    x_5 = x_4 >> 4         {4, 5}            8
    x_6 = x_4 + x_5        {0, 1, 4, 5}    128
    x_7 = x_6 >> 4         {4, 5, 8, 9}    128
(We encountered this chain in exercise 4.4–9, which proved that these operations will yield $x_7 = \lfloor x/10 \rfloor$ for $0 \le x < 160$ when performed with 8-bit arithmetic.)
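The chain (101) and the bookkeeping of (100) are easy to replay in Python. The sketch below is illustrative: the tuple encoding of steps in `STEPS` and the function names are assumptions made for this example, with `None` marking a constant operand and the third component of a shift step holding the shift amount.

```python
def chain_101(x):
    """Run the 7-step chain (101) in 8-bit arithmetic; return [x0, ..., x7]."""
    m = 0xFF
    x0 = x & m
    x1 = x0 & (-2 & m)          # clear the low-order bit
    x2 = (x1 + 2) & m
    x3 = x2 >> 1
    x4 = (x2 + x3) & m
    x5 = x4 >> 4
    x6 = (x4 + x5) & m
    x7 = x6 >> 4
    return [x0, x1, x2, x3, x4, x5, x6, x7]

# Steps 1..7 as (operator, j, k): j and k index earlier chain elements.
STEPS = [('&', 0, None), ('+', 1, None), ('>>', 2, 1), ('+', 2, 3),
         ('>>', 4, 4), ('+', 4, 5), ('>>', 6, 4)]

def shift_sets_and_bounds(steps):
    """Compute (S_0..S_r) and (B_0..B_r) by the rules of (100)."""
    S, B = [{0}], [1]
    for op, j, k in steps:
        if op == '>>':
            S.append({s + k for s in S[j]})
            B.append(B[j])
        elif op == '<<':
            S.append({s - k for s in S[j]})
            B.append(B[j])
        else:
            operands = [i for i in (j, k) if i is not None]
            bound = 2 if op in ('+', '-') else 1       # the factor M_i
            for i in operands:
                bound *= B[i]
            S.append(set().union(*(S[i] for i in operands)))
            B.append(bound)
    return S, B
```

Running `shift_sets_and_bounds(STEPS)` reproduces the two right-hand columns of (101), and `chain_101(x)[7]` agrees with `x // 10` throughout the stated range.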
To begin a theory of lower bounds, let's notice first that the high-order bits of $x = x_0$ cannot influence any low-order bits unless we shift them to the right.
Lemma A. Given a 2-adic or broadword chain, let the binary representation of $x_i$ be $(\ldots x_{i2} x_{i1} x_{i0})_2$. Then bit $x_{ip}$ can depend on bit $x_{0q}$ only if $q \le p + \max S_i$.
Proof. By induction on $i$ we can in fact show that, if $B_i = 1$, bit $x_{ip}$ can depend on bit $x_{0q}$ only if $q - p \in S_i$. Addition and subtraction, which force $B_i > 1$, allow any particular bit of their operands to affect all bits that lie to the left in the sum or difference, but not those that lie to the right.
Corollary I. The function $x \mathbin{\dot{-}} 1$ cannot be computed by a 2-adic chain, nor can any function for which at least one bit of $f(x)$ depends on an unbounded number of bits of $x$.
Corollary W. An $n$-bit function $f(x)$ can be computed by an $n$-bit broadword chain without shifts if and only if $x \equiv y$ (modulo $2^p$) implies $f(x) \equiv f(y)$ (modulo $2^p$) for $0 \le p < n$.
Proof. If there are no shifts we have $S_i = \{0\}$ for all $i$. Thus bit $x_{rp}$ cannot depend on bit $x_{0q}$ unless $q \le p$. In other words we must have $x_r \equiv y_r$ (modulo $2^p$) whenever $x_0 \equiv y_0$ (modulo $2^p$).
Conversely, all such functions are achievable by a sufficiently long chain. Exercise 119 gives shift-free $n$-bit chains for the functions
    $f_{py}(x) = 2^p\,[\,x \bmod 2^{p+1} = y\,]$, when $0 \le p < n$ and $0 \le y < 2^{p+1}$,    (102)
from which all the relevant functions arise by addition. [H. S. Warren, Jr., generalized this result to functions of $m$ variables in CACM 20 (1977), 439–441.]
Shift sets $S_i$ and bounds $B_i$ are important chiefly because of a fundamental lemma that is our principal tool for proving lower bounds:
Lemma B. Let $X_{pqr} = \{x_r \mathbin{\&} \lfloor 2^p - 2^q \rfloor \mid x_0 \in V_{pqr}\}$ in an $n$-bit broadword chain, where
    $V_{pqr} = \{x \mid x \mathbin{\&} \lfloor 2^{p+s} - 2^{q+s} \rfloor = 0 \text{ for all } s \in S_r\}$    (103)
and $p > q$. Then $|X_{pqr}| \le B_r$. (Here $p$ and $q$ are integers, possibly negative.)
This lemma states that at most $B_r$ different bit patterns $x_{r(p-1)} \ldots x_{rq}$ can occur within $f(x)$, when certain intervals of bits in $x$ are constrained to be zero.
Proof. The result certainly holds when $r = 0$. Otherwise if, for example, $x_r = x_j + x_k$, we know by induction that $|X_{pqj}| \le B_j$ and $|X_{pqk}| \le B_k$. Furthermore $V_{pqr} = V_{pqj} \cap V_{pqk}$, since $S_r = S_j \cup S_k$. Thus at most $B_j B_k$ possibilities for $(x_j + x_k) \mathbin{\&} \lfloor 2^p - 2^q \rfloor$ arise when there's no carry into position $q$, and at most $B_j B_k$ when there is a carry, making a grand total of at most $B_r = 2 B_j B_k$ possibilities altogether. Exercise 122 considers the other cases.
We now can prove that the ruler function needs $\Omega(\log\log n)$ steps.
Theorem R. If $n = d \cdot 2^d$, every $n$-bit broadword chain that computes $\rho x$ for $0 < x < 2^n$ has more than $\lg d$ steps that are not shifts.
Proof. If there are $l$ nonshift steps, we have $|S_r| \le 2^l$ and $B_r \le 2^{2^l - 1}$. Apply Lemma B with $p = d$ and $q = 0$, and suppose $|X_{d0r}| = 2^d - t$. Then there are $t$ values of $k < 2^d$ such that
    $\{2^k, 2^{k+2^d}, 2^{k+2\cdot 2^d}, \ldots, 2^{k+(d-1)2^d}\} \cap V_{d0r} = \emptyset$.
But $V_{d0r}$ excludes at most $2^l d$ of the $n$ possible powers of 2; so $t \le 2^l$. If $l \le \lg d$, Lemma B tells us that $2^d - t \le B_r \le 2^{d-1}$; hence $2^{d-1} \le t \le 2^l \le d$. But this is impossible unless $d \le 2$, when the theorem clearly holds.
The same proof works also for the binary logarithm function:
Corollary L. If $n = d \cdot 2^d > 2$, every $n$-bit broadword chain that computes $\lambda x$ for $0 < x < 2^n$ has more than $\lg d$ steps that are not shifts.
By using Lemma B with $q > 0$ we can derive the stronger lower bound $\Omega(\log n)$ for bit reversal, and hence for bit permutation in general.
Theorem P. If $2 \le g \le n$, every $n$-bit broadword chain that computes the $g$-bit reversal $x^R$ for $0 \le x < 2^g$ has at least $\frac{1}{3}\lg g$ steps that are not shifts.
Proof. Assume as above that there are $l$ nonshifts. Let $h = \lfloor \sqrt[3]{g} \rfloor$ and suppose that $l < \lfloor \lg(h+1) \rfloor$. Then $S_r$ is a set of at most $2^l \le \frac{1}{2}(h+1)$ shift amounts $s$. We shall apply Lemma B with $p = q + h$, where $p \le g$ and $q \ge 0$, thus in $g - h + 1$ cases altogether. The key observation is that $x^R \mathbin{\&} \lfloor 2^p - 2^q \rfloor$ is independent of $x \mathbin{\&} \lfloor 2^{p+s} - 2^{q+s} \rfloor$ whenever there are no indices $j$ and $k$ such that $0 \le j, k < h$ and $g - 1 - q - j = q + s + k$. The number of "bad" choices of $q$ for which such indices exist is at most $\frac{1}{2}(h+1)h^2 \le g - h$; therefore at least one "good" choice of $q$ yields $|X_{pqr}| = 2^h$. But then Lemma B leads to a contradiction, because we obviously cannot have $2^h \le B_r \le 2^{(h-1)/2}$.
Corollary M. Multiplication by certain constants, modulo $2^n$, requires $\Omega(\log n)$ steps in an $n$-bit broadword chain.
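    x∧y | y=0  ∗  1       x∨y | y=0  ∗  1       x⇒y | y=0  ∗  1
    ----+-----------      ----+-----------      ----+-----------
    x=0 |   0  0  0       x=0 |   0  ∗  1       x=0 |   1  1  1
    x=∗ |   0  ∗  ∗       x=∗ |   ∗  ∗  1       x=∗ |   ∗  1  1        (127)
    x=1 |   0  ∗  1       x=1 |   1  1  1       x=1 |   0  ∗  1

For these operations the methods above show that the binary representation
    $0 \mapsto 00$, $\ast \mapsto 01$, $1 \mapsto 11$    (128)
works well, because we can compute the logical operations thus:
    $x_l x_r \wedge y_l y_r = (x_l \wedge y_l)(x_r \wedge y_r)$, $\quad x_l x_r \vee y_l y_r = (x_l \vee y_l)(x_r \vee y_r)$,
    $x_l x_r \Rightarrow y_l y_r = ((\bar{x}_l \vee y_l) \wedge (\bar{x}_r \vee y_r))(\bar{x}_l \vee y_r)$.    (129)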
Of course $x$ need not be an isolated ternary value in this discussion; we often want to deal with ternary vectors $x = x_1 x_2 \ldots x_n$, where each $x_j$ is either $a$, $b$, or $c$. Such ternary vectors are conveniently represented by two binary vectors
    $x_l = x_{1l} x_{2l} \ldots x_{nl}$ and $x_r = x_{1r} x_{2r} \ldots x_{nr}$,    (130)
where $x_j \mapsto x_{jl} x_{jr}$ as above. We could also pack the ternary values into two-bit fields of a single vector,
    $x = x_{1l} x_{1r} x_{2l} x_{2r} \ldots x_{nl} x_{nr}$;    (131)
that would work fine if, say, we're doing Łukasiewicz logic with the operations $\wedge$ and $\vee$ but not $\Rightarrow$. Usually, however, the two-vector approach of (130) is better, because it lets us do bitwise calculations without shifting and masking.
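The two-vector representation can be exercised with a small Python sketch. Several things here are assumptions made for illustration: the three truth values 0, ∗, 1 are modeled as the numbers 0, 1, 2 (so that ∧ is `min`, ∨ is `max`, and Łukasiewicz implication is `min(2, 2 - x + y)`), the encoding is the one in (128), and the implication uses the componentwise test "$x \le y$" for the left bit and "not ($x = 1$ and $y = 0$)" for the right bit.

```python
# Truth values 0, *, 1 are modeled as 0, 1, 2; ENCODE gives (left bit, right bit) as in (128).
ENCODE = {0: (0, 0), 1: (0, 1), 2: (1, 1)}

def pack(values):
    """Pack a list of ternary values into the two bit vectors (x_l, x_r) of (130)."""
    xl = xr = 0
    for i, v in enumerate(values):
        l, r = ENCODE[v]
        xl |= l << i
        xr |= r << i
    return xl, xr

def unpack(xl, xr, n):
    """Inverse of pack for vectors of length n."""
    decode = {bits: v for v, bits in ENCODE.items()}
    return [decode[(xl >> i) & 1, (xr >> i) & 1] for i in range(n)]

def t_and(x, y):
    return (x[0] & y[0], x[1] & y[1])

def t_or(x, y):
    return (x[0] | y[0], x[1] | y[1])

def t_implies(x, y, n):
    mask = (1 << n) - 1
    nxl, nxr = ~x[0] & mask, ~x[1] & mask      # complements within n bits
    return ((nxl | y[0]) & (nxr | y[1]), nxl | y[1])
```

Each operation handles all $n$ positions at once with a handful of bitwise instructions and no shifting, which is the advantage claimed for (130) over the interleaved form (131).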
Applications to data structures. Bitwise operations offer many efficient ways to represent elements of data and the relationships between them. For example, chess-playing programs often use a "bit board" to represent the positions of pieces (see exercise 143).
In Chapter 8 we shall discuss an important data structure developed by Peter van Emde Boas for representing a dynamically changing subset of integers between 0 and $N$. Insertions, deletions, and other operations such as "find the largest element less than $x$" can be done in $O(\log\log N)$ steps with his methods; the general idea is to organize the full structure recursively as $\sqrt{N}$ substructures for subsets of intervals of size $\sqrt{N}$, together with an auxiliary structure that tells which of those intervals are occupied. [See Information Processing Letters 6 (1977), 80–82; also P. van Emde Boas, R. Kaas, and E. Zijlstra, Math. Systems Theory 10 (1977), 99–127.] Bitwise operations make those computations fast.
Hierarchical data can sometimes be arranged so that the links between elements are implicit rather than explicit. For example, we studied "heaps" in Section 5.2.3, where $n$ elements of a sequential array implicitly have a binary tree structure like

                   1                                       0001
           2               3                     0010                0011
       4       5       6       7       =     0100      0101      0110    0111        (132)
     8   9   10                            1000 1001 1010

when, say, $n = 10$. (Node numbers are shown here both in decimal and binary notation.) There is no need to store pointers in memory to relate node $j$ of a heap to its parent (which is node $j \gg 1$ if $j \ne 1$), or to its sibling (which is node $j \oplus 1$ if $j \ne 1$), or to its children (which are nodes $j \ll 1$ and $(j \ll 1) + 1$ if those numbers don't exceed $n$), because a simple calculation leads directly from $j$ to any desired neighbor.
Similarly, a sideways heap provides implicit links for another useful family of $n$-node binary tree structures, typified by

                   8                                       1000
           4              12             =       0100                1100            (133)
       2       6      10                     0010      0110      1010
     1   3   5   7   9                     0001 0011 0101 0111 1001

when $n = 10$. (We sometimes need to go beyond $n$ when moving from a node to its parent, as in the path from 10 to 12 to 8 shown here.) Heaps and sideways heaps can both be regarded as nodes 1 to $n$ of infinite binary tree structures: The heap with $n = \infty$ is rooted at node 1 and has no leaves; by contrast, the sideways heap with $n = \infty$ has infinitely many leaves 1, 3, 5, ..., but no root(!).
The leaves of a sideways heap are the odd numbers, and their parents are the odd multiples of 2. The grandparents of leaves, similarly, are the odd multiples of 4; and so on. Thus the ruler function $\rho j$ tells how high node $j$ is above leaf level.
The parent of node $j$ in the infinite sideways heap is easily seen to be node
    $(j - k) \mid (k \ll 1)$, where $k = j \mathbin{\&} -j$;    (134)
this quantity is $j$ rounded to the nearest odd multiple of $2^{1+\rho j}$. And the children are
    $j - (k \gg 1)$ and $j + (k \gg 1)$    (135)
when $j$ is even. In general the descendants of node $j$ form a closed interval
    $[\,j - 2^{\rho j} + 1 \,..\, j + 2^{\rho j} - 1\,]$,    (136)
arranged as a complete binary tree of $2^{1+\rho j} - 1$ nodes. The ancestor of node $j$ at height $h$ is node
    $(j \mid (1 \ll h)) \mathbin{\&} -(1 \ll h) = ((j \gg h) \mid 1) \ll h$    (137)
when $h \ge \rho j$. Notice that the symmetric order of the nodes, also called inorder, is just the natural order 1, 2, 3, ....
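Formulas (134)–(137) translate almost verbatim into Python, whose integers behave like 2-adic numbers under `&` and unary minus. The function names below are illustrative only.

```python
def rho(j):
    """Ruler function: the number of trailing zero bits of j > 0."""
    return (j & -j).bit_length() - 1

def parent(j):
    """Parent of node j in the infinite sideways heap, by (134)."""
    k = j & -j
    return (j - k) | (k << 1)

def children(j):
    """The two children of an even node j, by (135)."""
    k = j & -j
    return (j - (k >> 1), j + (k >> 1))

def ancestor(j, h):
    """Ancestor of node j at height h >= rho(j), by the right-hand side of (137)."""
    return ((j >> h) | 1) << h

def descendants(j):
    """The closed interval (136), as a range of node numbers (it includes j itself)."""
    r = 1 << rho(j)
    return range(j - r + 1, j + r)
```

For instance, `parent(10)` is 12 and `parent(12)` is 8, matching the path mentioned for (133), and `list(descendants(4))` is `[1, 2, 3, 4, 5, 6, 7]`.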
Dov Harel noted these properties in his Ph.D. thesis (U. of California, Irvine, 1980), and observed that the nearest common ancestor of any two nodes of a sideways heap can also be easily calculated. Indeed, if node $l$ is the nearest common ancestor of nodes $i$ and $j$, where $i \le j$, there is a remarkable identity
    $\rho l = \max\{\rho x \mid i \le x \le j\} = \lambda(j \mathbin{\&} -i)$,    (138)
which relates the $\rho$ and $\lambda$ functions. (See exercise 146.) We can therefore use formula (137) with $h = \lambda(j \mathbin{\&} -i)$ to calculate $l$.
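Identity (138) combined with (137) gives a three-operation nearest-common-ancestor routine for the sideways heap. The sketch below uses illustrative names, and `nca_by_walking` is a slow reference version included only so that the fast one can be cross-checked.

```python
def lam(x):
    """Binary logarithm floor(lg x) for x > 0."""
    return x.bit_length() - 1

def nca(i, j):
    """Nearest common ancestor of nodes i and j in the sideways heap, via (138) and (137)."""
    if i > j:
        i, j = j, i
    h = lam(j & -i)               # height of the answer, by (138)
    return ((j >> h) | 1) << h    # ancestor of j at height h, by (137)

def nca_by_walking(i, j, limit=40):
    """Reference version: climb parent links (134) from i, then from j."""
    def up(v):
        k = v & -v
        return (v - k) | (k << 1)
    seen = []
    for _ in range(limit):
        seen.append(i)
        i = up(i)
    while j not in seen:
        j = up(j)
    return j
```

For example, `nca(5, 7)` is 6 and `nca(9, 10)` is 10 in the tree (133).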
Subtle extensions of this approach lead to an asymptotically efficient algorithm that finds nearest common ancestors in any oriented forest whose arcs grow dynamically [D. Harel and R. E. Tarjan, SICOMP 13 (1984), 338–355]. Baruch Schieber and Uzi Vishkin [SICOMP 17 (1988), 1253–1262] subsequently discovered a much simpler way to compute nearest common ancestors in an arbitrary (but fixed) oriented forest, using an attractive and instructive blend of bitwise and algorithmic techniques that we shall consider next.
Recall that an oriented forest with $m$ trees and $n$ vertices is an acyclic digraph with $n - m$ arcs. There is at most one arc from each vertex; the vertices with out-degree zero are the roots of the trees. We say that $v$ is the parent of $u$ when $u \to v$, and $v$ is an ancestor of $u$ when $u \to^{*} v$. Two vertices have a common ancestor if and only if they belong to the same tree. Vertex $w$ is called the nearest common ancestor of $u$ and $v$ when we have
    $u \to^{*} z$ and $v \to^{*} z$ if and only if $w \to^{*} z$.    (139)
Schieber and Vishkin preprocess the given forest, mapping its vertices into a sideways heap $S$ of size $n$ by computing three quantities for each vertex $v$:
    $\pi v$, the rank of $v$ in preorder ($1 \le \pi v \le n$);
    $\beta v$, a node of the sideways heap $S$ ($1 \le \beta v \le n$);
    $\alpha v$, a $(1 + \lambda n)$-bit routing code ($1 \le \alpha v < 2^{1+\lambda n}$).
If $u \to v$ we have $\pi u > \pi v$ by the definition of preorder. Node $\beta v$ is defined to be the nearest common ancestor of all sideways-heap nodes $\pi u$ such that $v$ is an ancestor of vertex $u$. And we define
    $\alpha v = \sum \{2^{\rho\beta w} \mid v \to^{*} w\}$.    (140)
For example, here's an oriented forest with ten vertices and two trees:

              A 1                    B 8
         C 2       D 4               E 9
                                                        (141)
         F 3    G 5   H 7            I 10
                J 6

(Here C and D are the children of A; F is the child of C; G and H are the children of D; J is the child of G; E is the child of B; and I is the child of E.) Each node has been labeled with its preorder rank, from which we can compute the $\beta$ and $\alpha$ codes:

      v  =   A     B     C     D     E     F     G     H     I     J
     πv  =  0001  1000  0010  0100  1001  0011  0101  0111  1010  0110
     βv  =  0100  1000  0010  0100  1010  0011  0110  0111  1010  0110
     αv  =  0100  1000  0110  0100  1010  0111  0110  0101  1010  0110
Notice that, for instance, $\beta A = 4 = 0100$ because the preorder ranks of the descendants of A are $\{1, 2, 3, 4, 5, 6, 7\}$. And $\alpha H = 0101$ because the ancestors of H have $\beta$ codes $\{\beta H, \beta D, \beta A\} = \{0111, 0100\}$. One can prove without difficulty that the mapping $v \mapsto \beta v$ satisfies the following key properties:
i) If $u \to v$ in the forest, then $\beta u$ is a descendant of $\beta v$ in $S$.
ii) If several vertices have the same value of $\beta v$, they form a path in the forest.
Property (ii) holds because exactly one child $u$ of $v$ has $\beta u = \beta v$ when $\beta v \ne \pi v$.
Now let's imagine placing every vertex $v$ of the forest into node $\beta v$ of $S$:

                                  1000  B→Λ
              0100  D→A→Λ                              1100
       0010  C→A        0110  J→G→D          1010  I→E→B                (142)
    0001    0011  F→C   0101    0111  H→D    1001

If $k$ vertices map into node $j$, we can arrange them into a path
    $v_0 \to v_1 \to \cdots \to v_{k-1} \to v_k$, where $\beta v_0 = \beta v_1 = \cdots = \beta v_{k-1} = j$.    (143)
These paths are illustrated in (142); for example, J → G → D is a path in (141), and 'J→G→D' appears with node $0110 = \beta J = \beta G$.
The preprocessing algorithm also computes a table $\tau j$ for all nodes $j$ of $S$, containing pointers to the vertices $v_k$ at the tail ends of (143):

      j  =  0001  0010  0011  0100  0101  0110  0111  1000  1001  1010
     τj  =   Λ     A     C     Λ     Λ     D     D     Λ     Λ     B
Exercise 149 shows that all four tables $\pi v$, $\beta v$, $\alpha v$, and $\tau j$ can be prepared in $O(n)$ steps. And once those tables are ready, they contain just enough information to identify the nearest common ancestor of any two given vertices quickly:
Algorithm V (Nearest common ancestors). Suppose $\pi v$, $\beta v$, $\alpha v$, and $\tau j$ are known for all $n$ vertices $v$ of an oriented forest, and for $1 \le j \le n$. A dummy vertex $\Lambda$ is also assumed to be present, with $\pi\Lambda = \beta\Lambda = \alpha\Lambda = 0$. This algorithm computes the nearest common ancestor $z$ of any given vertices $x$ and $y$, returning $z = \Lambda$ if $x$ and $y$ belong to different trees. We assume that the values $\lambda j = \lfloor \lg j \rfloor$ have been precomputed for $1 \le j \le n$, and that $\lambda 0 = \lambda n$.
V1. [Find common height.] If $\beta x \le \beta y$, set $h \gets \lambda(\beta y \mathbin{\&} -\beta x)$; otherwise set $h \gets \lambda(\beta x \mathbin{\&} -\beta y)$. (See (138).)
V2. [Find true height.] Set $k \gets \alpha x \mathbin{\&} \alpha y \mathbin{\&} -(1 \ll h)$, then $h \gets \lambda(k \mathbin{\&} -k)$.
V3. [Find $\beta z$.] Set $j \gets ((\beta x \gg h) \mid 1) \ll h$. (Now $j = \beta z$, if $z \ne \Lambda$.)
V4. [Find $\hat{x}$ and $\hat{y}$.] (We now seek the lowest ancestors of $x$ and $y$ in node $j$.) If $j = \beta x$, set $\hat{x} \gets x$; otherwise set $l \gets \lambda(\alpha x \mathbin{\&} ((1 \ll h) - 1))$ and $\hat{x} \gets \tau(((\beta x \gg l) \mid 1) \ll l)$. Similarly, if $j = \beta y$, set $\hat{y} \gets y$; otherwise set $l \gets \lambda(\alpha y \mathbin{\&} ((1 \ll h) - 1))$ and $\hat{y} \gets \tau(((\beta y \gg l) \mid 1) \ll l)$.
V5. [Find $z$.] Set $z \gets \hat{x}$ if $\pi\hat{x} \le \pi\hat{y}$, otherwise $z \gets \hat{y}$.
These artful dodges obviously exploit (137); exercise 152 explains why they work.
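Algorithm V can be tried on the example forest (141) with the following Python sketch. The tables are copied from the text rather than computed (the preprocessing of exercise 149 is not shown), the string `'L'` stands in for the dummy vertex Λ, and the `PARENT` map and `nca_by_walking` are assumptions added here purely as a cross-check.

```python
N = 10
# Tables for the forest (141); 'L' plays the role of the dummy vertex Lambda.
PI    = dict(A=1, B=8, C=2, D=4, E=9,  F=3, G=5, H=7, I=10, J=6, L=0)
BETA  = dict(A=4, B=8, C=2, D=4, E=10, F=3, G=6, H=7, I=10, J=6, L=0)
ALPHA = dict(A=4, B=8, C=6, D=4, E=10, F=7, G=6, H=5, I=10, J=6, L=0)
TAU   = {1: 'L', 2: 'A', 3: 'C', 4: 'L', 5: 'L', 6: 'D', 7: 'D', 8: 'L', 9: 'L', 10: 'B'}
PARENT = dict(C='A', D='A', F='C', G='D', H='D', J='G', E='B', I='E')  # arcs of (141)

def lam(j):
    """floor(lg j), with the convention lambda(0) = lambda(n)."""
    return (j if j else N).bit_length() - 1

def nca(x, y):
    bx, by = BETA[x], BETA[y]
    h = lam(by & -bx) if bx <= by else lam(bx & -by)   # V1
    k = ALPHA[x] & ALPHA[y] & -(1 << h)                # V2
    h = lam(k & -k)
    j = ((bx >> h) | 1) << h                           # V3

    def lowest(v):                                     # V4
        if BETA[v] == j:
            return v
        l = lam(ALPHA[v] & ((1 << h) - 1))
        return TAU[((BETA[v] >> l) | 1) << l]

    xh, yh = lowest(x), lowest(y)
    return xh if PI[xh] <= PI[yh] else yh              # V5

def nca_by_walking(x, y):
    """Reference version that simply follows parent pointers."""
    ancestors = []
    while x != 'L':
        ancestors.append(x)
        x = PARENT.get(x, 'L')
    while y != 'L' and y not in ancestors:
        y = PARENT.get(y, 'L')
    return y
```

For example, `nca('J', 'H')` returns `'D'`, `nca('F', 'J')` returns `'A'`, and `nca('A', 'E')` returns `'L'` because A and E lie in different trees.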
Sideways heaps can also be used to implement an interesting type of priority queue that J. Katajainen and F. Vitale call a "navigation pile," illustrated here for $n = 10$:

                                   16
                    8                               24
            4              12               20                       (144)
        2       6      10      14       18
     503 087 512 061 908 170 275 897 653 426
      1   3   5   7   9  11  13  15  17  19
standing for $N(\alpha) = \sum_k a_k F_{-k}$, with no two 1s in a row. For example, here are the negaFibonacci representation codes of all integers between $-14$ and $+15$:

    -14 = 10010100     -8 = 100000     -2 = 1001     4 = 10010      10 = 1001000
    -13 = 10010101     -7 = 100001     -1 = 10       5 = 10000      11 = 1001001
    -12 = 101010       -6 = 100100      0 = 0        6 = 10001      12 = 1000010
    -11 = 101000       -5 = 100101      1 = 1        7 = 10100      13 = 1000000
    -10 = 101001       -4 = 1010        2 = 100      8 = 10101      14 = 1000001
     -9 = 100010       -3 = 1000        3 = 101      9 = 1001010    15 = 1000100
As in the negadecimal system (see 4.1–(6) and (7)), we can tell whether $x$ is negative or not by seeing if its representation has an even or odd number of digits.
The predecessor $\alpha{-}$ and successor $\alpha{+}$ of any negaFibonacci binary code $\alpha$ can be computed recursively by using the rules
    $(\alpha 01){-} = \alpha 00$, $(\alpha 000){-} = \alpha 010$, $(\alpha 100){-} = \alpha 001$, $(\alpha 10){-} = (\alpha{-})01$,
    $(\alpha 10){+} = \alpha 00$, $(\alpha 00){+} = \alpha 01$, $(\alpha 1){+} = (\alpha{-})0$.    (148)
(See exercise 157.) But ten elegant 2-adic steps do the calculation directly:
    $y \gets x \oplus \bar\mu_0$, $z \gets y \oplus (y \pm 1)$, where $x = (\alpha)_2$;
    $z \gets z \mid (x \mathbin{\&} (z \ll 1))$;    (149)
    $w \gets x \oplus z \oplus ((z + 1) \gg 2)$; then $w = (\alpha{\pm})_2$.
We just use $y - 1$ in the top line to get the predecessor, $y + 1$ to get the successor.
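The ten steps of (149) can be checked against the table of codes with a Python sketch. Assumptions made here: the 2-adic constant $\bar\mu_0 = (\ldots 10101010)_2$ is truncated to a 64-bit working width `W`, which is ample as long as the codes stay far below $2^{64}$, and the names `step`, `succ`, `pred`, and `value` are illustrative.

```python
W = 64                           # working width; codes are assumed to stay well below 2^W
MASK = (1 << W) - 1
MU0_BAR = MASK // 3 * 2          # ...10101010, the complement of mu_0 = ...01010101

def step(x, delta):
    """Apply (149) to the code x; delta = +1 gives the successor, -1 the predecessor."""
    y = x ^ MU0_BAR
    z = y ^ ((y + delta) & MASK)
    z |= x & (z << 1)
    return (x ^ z ^ ((z + 1) >> 2)) & MASK

def succ(x):
    return step(x, +1)

def pred(x):
    return step(x, -1)

def value(x):
    """N(alpha) = sum of a_k F_{-k}, where bit k-1 of x is a_k and F_{-k} = (-1)^(k+1) F_k."""
    total, f_prev, f_cur, k = 0, 0, 1, 1      # f_cur holds F_k
    while x:
        if x & 1:
            total += f_cur if k % 2 else -f_cur
        f_prev, f_cur = f_cur, f_prev + f_cur
        x >>= 1
        k += 1
    return total
```

Starting from code 0 and applying `succ` repeatedly walks through the right half of the table, while `pred` walks through the left half.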
And now here's the point: A negaFibonacci code can be assigned to each cell of the pentagrid in such a way that the codes of its five neighbors are easy to compute. Let's call the neighbors n, s, e, w, and o, for "north," "south," "east," "west," and "other." If $\alpha$ is the code assigned to a given cell, we define
    $\alpha_n = \alpha \gg 2$, $\alpha_s = \alpha \ll 2$, $\alpha_e = \alpha_s{+}$, $\alpha_w = \alpha_s{-}$;    (150)
thus $\alpha_{sn} = \alpha$, and also $\alpha_{en} = (\alpha 01)_n = \alpha$. The "other" direction is trickier:
    $\alpha_o = \begin{cases} \alpha_n{+}, & \text{if } \alpha \mathbin{\&} 1 = 1;\\ \alpha_w{-}, & \text{if } \alpha \mathbin{\&} 1 = 0.\end{cases}$    (151)
For example, $1000_o = 101001$ and $101001_o = 1000$. This mysterious interloper lies between north and east when $\alpha$ ends with 1, but between north and west when $\alpha$ ends with 0.
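Rules (150) and (151) can be sketched in Python on top of the successor and predecessor steps of (149). As before, the 64-bit truncation of $\bar\mu_0$ and all function names are assumptions made for this illustration.

```python
W = 64
MASK = (1 << W) - 1
MU0_BAR = MASK // 3 * 2          # ...10101010, truncated to W bits

def _step(x, delta):
    """Successor (delta = +1) or predecessor (delta = -1) of a negaFibonacci code, as in (149)."""
    y = x ^ MU0_BAR
    z = y ^ ((y + delta) & MASK)
    z |= x & (z << 1)
    return (x ^ z ^ ((z + 1) >> 2)) & MASK

def north(a):
    return a >> 2

def south(a):
    return a << 2

def east(a):
    return _step(south(a), +1)

def west(a):
    return _step(south(a), -1)

def other(a):
    """The fifth neighbor, by (151)."""
    return _step(north(a), +1) if a & 1 else _step(west(a), -1)
```

For the cell labeled `0b1000`, these functions give the neighbor codes 10, 100000, 100001, 100010, and 101001 for n, s, e, w, and o respectively.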
If we choose any cell and label it with code 0, and if we also choose an orientation so that its neighbors are n, e, s, w, and o in clockwise order, rules (150) and (151) will assign consistent labels to every cell of the pentagrid. (See exercise 160.) For example, the vicinity of a cell labeled 1000 will look like this:

[Illustration (152): the cell labeled 1000 and the pentagrid cells around it, with each cell's edges marked n, e, s, w, o. The labels shown are 1010, 101001, 10, 10100101, 1001, 1000, 100010, 100001, 10001001, 100000, and 10000001.]
The code labels do not, however, identify cells uniquely, because infinitely many cells receive the same label. (Indeed, we clearly have $0_n = 0_s = 0$ and $1_w = 1_o = 1$.) To get a unique identifier, we attach a second coordinate so that each cell's full name has the form $(\alpha, y)$, where $y$ is an integer. When $y$ is constant and $\alpha$ ranges over all negaFibonacci codes, the cells $(\alpha, y)$ form a more-or-less hook-shaped strip whose edges take a 90° turn next to cell $(0, y)$. In general, the five neighbors of $(\alpha, y)$ are $(\alpha, y)_n = (\alpha_n, y + \delta_n(\alpha))$, $(\alpha, y)_s = (\alpha_s, y + \delta_s(\alpha))$, $(\alpha, y)_e = (\alpha_e, y + \delta_e(\alpha))$, $(\alpha, y)_w = (\alpha_w, y + \delta_w(\alpha))$, and $(\alpha, y)_o = (\alpha_o, y + \delta_o(\alpha))$, where
    $\delta_n(\alpha) = [\alpha = 0]$, $\delta_s(\alpha) = -[\alpha = 0]$, $\delta_e(\alpha) = 0$, $\delta_w(\alpha) = -[\alpha = 1]$;
    $\delta_o(\alpha) = \begin{cases} \operatorname{sign}(\alpha_o - \alpha_n)\,[\alpha_o \mathbin{\&} \alpha_n = 0], & \text{if } \alpha \mathbin{\&} 1 = 1;\\ \operatorname{sign}(\alpha_o - \alpha_w)\,[\alpha_o \mathbin{\&} \alpha_w = 0], & \text{if } \alpha \mathbin{\&} 1 = 0.\end{cases}$    (153)
(See the illustration below.) Bitwise operations now allow us to surf the entire hyperbolic plane with ease. On the other hand, we could also ignore the $y$ coordinates as we move, thereby wrapping around a "hyperbolic cylinder" of pentagons; the $\alpha$ coordinates define an interesting multigraph on the set of all negaFibonacci codes, in which every vertex has degree 5.
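[Illustration (154): a portion of the pentagrid with cells named $(\alpha, y)$, including (100001,1), (1001,2), (1001,1), (0,1), (100101,1), (1,1), (1010,0), (0,0), (101,0), (10,0), (1,0), (1000,0), (100,0), (0,−1), (1001,0), (1,−1), (10,−1), and (0,−2).]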
Bitmap graphics. It's fun to write programs that deal with pictures and shapes, because they involve our left and right brains simultaneously. When image data is involved, the results can be engrossing even if there are bugs in our code.
The book you are now reading was typeset by software that treated each page as a gigantic matrix of 0s and 1s, called a "raster" or "bitmap," containing millions of square picture elements called "pixels." The rasters were transmitted to printing machines, causing tiny dots of ink to be placed wherever a 1 appeared in the matrix. Physical properties of ink and paper caused those small clusters of dots to look like smooth curves; but each pixel's basic squareness becomes evident if we enlarge the images tenfold, as in the letter 'A' shown in Fig. 15(a).
With bitwise operations we can achieve special effects like "custering," in which the black pixels disappear when they are surrounded on all sides:

[Illustration: a bitmap before and after custering, with a legend for black and white pixels.]

where the two vertical-shift expressions stand respectively for the result of shifting the bitmap $X$ […] for the 1-pixel shifts of a bitmap $X$. Then, for example, the symbolic expression '$X_N \mathbin{\&} (X_S \mid \bar{X}_E)$' evaluates to 1 in those pixel positions whose northern neighbor is black, and which also have either a black neighbor on the south side or a white neighbor to the east. With these abbreviations, (155) takes the form
    $\mathrm{custer}(X) = X \mathbin{\&} \overline{(X_N \mathbin{\&} X_W \mathbin{\&} X_E \mathbin{\&} X_S)}$,    (157)
which can also be expressed as $X \mathbin{\&} (\bar{X}_N \mid \bar{X}_W \mid \bar{X}_E \mid \bar{X}_S)$.
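Formula (157) can be sketched in Python by storing a bitmap as a list of row integers. The representation is an assumption made for this example: rows run from north to south, the leftmost (western) pixel of a row is its most significant bit, pixels outside the bitmap count as white, and the function names are illustrative.

```python
def custer(rows, width):
    """Apply (157): erase every black pixel whose four rook-neighbors are all black."""
    mask = (1 << width) - 1
    out = []
    for i, x in enumerate(rows):
        xn = rows[i - 1] if i > 0 else 0               # X_N: northern neighbors
        xs = rows[i + 1] if i + 1 < len(rows) else 0   # X_S: southern neighbors
        xw = x >> 1                                    # X_W: western neighbors
        xe = (x << 1) & mask                           # X_E: eastern neighbors
        out.append(x & ~(xn & xw & xe & xs) & mask)
    return out

def custer_alt(rows, width):
    """The equivalent form X & (~X_N | ~X_W | ~X_E | ~X_S)."""
    mask = (1 << width) - 1
    out = []
    for i, x in enumerate(rows):
        xn = rows[i - 1] if i > 0 else 0
        xs = rows[i + 1] if i + 1 < len(rows) else 0
        xw, xe = x >> 1, (x << 1) & mask
        out.append(x & (~xn | ~xw | ~xe | ~xs) & mask)
    return out
```

A solid 3 × 3 block `[0b111, 0b111, 0b111]` becomes `[0b111, 0b101, 0b111]`: only the center pixel is surrounded on all sides, so only it disappears.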
Every pixel has four "rook-neighbors," with which it shares an edge at the top, left, right, or bottom. It also has eight "king-neighbors," with which it shares at least one corner point. For example, the king-neighbors that lie to the northeast of all pixels in a bitmap $X$ can be denoted by $X_{NE}$, which is equivalent to $(X_N)_E$ in pixel algebra. Notice that we also have $X_{NE} = (X_E)_N$.
A 3 × 3 cellular automaton is an array of pixels that changes dynamically via a sequence of local transformations, all performed simultaneously: The state of each pixel at time $t + 1$ depends entirely on its state at time $t$ and the states of its king-neighbors at that time. Thus the automaton defines a sequence of bitmaps $X^{(0)}, X^{(1)}, X^{(2)}, \ldots$ that lead from any given initial state $X^{(0)}$, where
    $X^{(t+1)} = f(X_{NW}^{(t)}, X_{N}^{(t)}, X_{NE}^{(t)}, X_{W}^{(t)}, X^{(t)}, X_{E}^{(t)}, X_{SW}^{(t)}, X_{S}^{(t)}, X_{SE}^{(t)})$    (158)
and $f$ is any bitwise Boolean function of nine variables. Fascinating patterns often emerge in this way. For example, after Martin Gardner introduced John Conway's game of Life to the world in 1970, more computer time was probably devoted to studying its implications than to any other computational task during the next several years, although the people paying the computer bills were rarely told! (See exercise 167.)
There are $2^{512}$ Boolean functions of nine variables, so there are $2^{512}$ different 3 × 3 cellular automata. Many of them are trivial, but most of them probably have such complicated behavior that they are humanly impossible to understand. Fortunately there also are many cases that do turn out to be useful in practice, and much easier to justify on economic grounds than the simulation of a game.
For example, algorithms for recognizing alphabetic characters, fingerprints, or similar patterns often make use of a "thinning" process, which removes excess black pixels and reduces each component of the image to an underlying skeleton that is comparatively simple to analyze. Several authors have proposed cellular automata for this problem, beginning with D. Rutovitz [J. Royal Stat. Society A129 (1966), 512–513] who suggested a 4 × 4 scheme. But parallel algorithms are notoriously subtle, and flaws tended to turn up after various methods had been published. For example, at least two of the black pixels in a component like […] should be removed, yet a symmetrical scheme will erroneously erase all four.

Fig. 16. Example results of Guo and Hall's 3 × 3 automaton for thinning the components of a bitmap. ("Hollow" pixels were originally black.)
A satisfactory solution to the thinning problem was finally found by Z. Guo and R. W. Hall [CACM 32 (1989), 359–373, 759], using a 3 × 3 automaton that invokes alternate rules on odd and even steps. Consider the function
    $f(x_{NW}, x_N, x_{NE}, x_W, x, x_E, x_{SW}, x_S, x_{SE}) = x \wedge \neg g(x_{NW}, \ldots, x_W, x_E, \ldots, x_{SE})$,    (159)
where $g = 1$ only in the following 37 configurations surrounding a black pixel:
Then we use (158), but with $f(x_{NW}, x_N, x_{NE}, x_W, x, x_E, x_{SW}, x_S, x_{SE})$ replaced by its 180° rotation $f(x_{SE}, x_S, x_{SW}, x_E, x, x_W, x_{NE}, x_N, x_{NW})$ on even-numbered steps. The process stops when two consecutive cycles make no change.
With this rule Guo and Hall proved that the 3 × 3 automaton will preserve the connectivity structure of the image, in a strong sense that we will discuss below. Furthermore their algorithm obviously leaves an image intact if it is already so thin that it contains no three pixels that are king-neighbors of each other. On the other hand it usually succeeds in "removing the meat off the bones" of each black component, as shown in Fig. 16. Slightly thinner thinning is obtained in certain cases if we add four additional configurations
    (160)
to the 37 listed above. In either case the function $g$ can be evaluated with a Boolean chain of length 25. (See exercises 170–172.)
In general, the black pixels of an image can be grouped into segments or components that are kingwise connected, in the sense that any black pixel can be reached from any other pixel of its component by a sequence of king moves through black pixels. The white pixels also form components, which are rookwise connected: Any two white cells of a component are mutually reachable via rook moves that touch nothing black. It's best to use different kinds of connectedness for white and black, in order to preserve the topological concepts of "inside" and "outside" that are familiar from continuous geometry [see A. Rosenfeld, JACM 17 (1970), 146–160]. If we imagine that the corner points of a raster are black, an infinitely thin black curve can cross between pixels at a corner, but a white curve cannot. (We could also imagine white corner points, which would lead to rookwise connectivity for black and kingwise connectivity for white.)
Fig. 17. The shrinking of a Cheshire cat: (a) time = 0; (b) time = 1; (c) time = 3.
An amusing algorithm for shrinking a picture while preserving its connectivity, except that isolated black or white pixels disappear, was presented by S. Levialdi in CACM 15 (1972), 7–10; an equivalent algorithm, but with black and white reversed, had also appeared in T. Beyer's Ph.D. thesis (M.I.T., 1969). The idea is to use a cellular automaton with the simple transition function
    $f(x_{NW}, x_N, x_{NE}, x_W, x, x_E, x_{SW}, x_S, x_{SE}) = (x \wedge (x_W \vee x_{SW} \vee x_S)) \vee (x_W \wedge x_S)$    (161)
at each step. This formula is actually a 2 × 2 rule, but we still need a 3 × 3 window if we want to keep track of the cases when a one-pixel component goes away.
For example, the 25 × 30 picture of a Cheshire cat in Fig. 17(a) has seven kingwise black components: the outline of its head, the two earholes, the two eyes, the nose, and the smile. The result after one application of (161) is shown in Fig. 17(b): Seven components remain, but there's an isolated point in one ear, and the other earhole will become isolated after the next step. Hence Fig. 17(c) has only five components. After six steps the cat loses its nose, and even the smile will be gone at time 14. Sadly, the last bit of cat will vanish during step 46.
At most M + N 1 transitions will wipe out any M N pi
ture, be
ause
the lowest visible northwesttosoutheast diagonal line moves relentlessly upward
ea
h time. Exer
ises 176 and 177 prove that dierent
omponents will never
merge together and interfere with ea
h other.
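
The transition rule (161) is easy to try out. The following Python sketch is only an illustration, not code from the text: it assumes that the picture is a list of rows of 0s (white) and 1s (black) with row 0 at the top, so that "south" means the next row and "west" means the previous column, and it assumes that every pixel outside the array is white. The names `levialdi_step` and `steps_to_vanish` are invented for this example.

```python
def levialdi_step(grid):
    """Apply rule (161) once to a rectangular 0/1 picture (row 0 is the top row)."""
    rows, cols = len(grid), len(grid[0])

    def at(r, c):
        # Assumption: pixels outside the array are white (0).
        return grid[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [[(at(r, c) & (at(r, c - 1) | at(r + 1, c - 1) | at(r + 1, c)))
             | (at(r, c - 1) & at(r + 1, c))
             for c in range(cols)]
            for r in range(rows)]


def steps_to_vanish(grid):
    """Count how many applications of (161) are needed to erase every black pixel."""
    steps = 0
    while any(any(row) for row in grid):
        grid = levialdi_step(grid)
        steps += 1
    return steps
```

For instance, `steps_to_vanish([[1, 1], [1, 1]])` needs three steps, which agrees with the bound $M + N - 1$ for a $2\times2$ picture, while an isolated pixel disappears after a single step.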
Of course this cubic-time cellular method isn't the fastest way to count or identify the components of a picture. We can actually do that job "online," while looking at a large image one row at a time, not bothering to keep all of the previously seen rows in memory if we don't wish to look at them again.

While we're analyzing the components we might as well also record the relationships between them. Let's assume that only finitely many black pixels are present. Then there's an infinite component of white pixels called the background. Black components adjacent to the background constitute the main objects of the image. And these objects may in turn have holes, which may serve as a background for another level of objects, and so on. Thus the connected components of any finite picture form a hierarchy: an oriented tree, rooted at the background. Black components appear at the odd-numbered levels of this tree, and white components at the even-numbered levels, alternating between
[Figure panels: time = 5; time = 10; time = 20.]
During the shrinking process of Fig. 17, components disappear in the order C, {B, 2, 3} (all at time 3), F, E, D, G, 1, A.

Suppose we want to analyze the components of such a picture by reading one row at a time. After we've seen four rows the result-so-far will be

    00000000000000000000000A000000
    0000BBB000000000000000AA000000
    0000B1BB0000000000000A22A00000          (163)
    000B111BB000CCCCCCC00A22A00000

together with the current tree, in which the root 0 has the children B, C, and A, while 1 is the child of B and 2 is the child of A; and we'll be ready to scan row five. A comparison of rows four and five will then show that B and C should merge into A, but that new components B and 3 should also be launched. Exercise 179 contains full details about an instructive algorithm that properly updates the current tree as new rows are input. Additional information can also be computed on the fly: For example, we could determine the area of each component, the locations of its first and last pixels, the smallest enclosing rectangle, and/or its center of gravity.
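
To make the row-at-a-time idea concrete, here is a much simpler Python sketch than the algorithm of exercise 179. It is an assumption-laden illustration only: it counts the kingwise-connected black components and does not build the tree of objects and holes. It keeps just the labels of the previous row, plus a union-find table (`parent`) that, for simplicity, is never pruned. The function name and the 0/1 row format are invented for this example.

```python
def count_black_components(rows):
    """Count kingwise (8-connected) black components, scanning one row at a time."""
    parent = {}                      # union-find forest over component labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return 0
        parent[ra] = rb
        return 1                     # one fewer component

    count = 0
    next_label = 0
    prev = []                        # labels of the previous row (None = white)
    for row in rows:
        cur = [None] * len(row)
        for c, pixel in enumerate(row):
            if not pixel:
                continue
            neighbors = []
            if c > 0 and cur[c - 1] is not None:        # west neighbor
                neighbors.append(cur[c - 1])
            for d in (-1, 0, 1):                        # NW, N, NE neighbors
                if 0 <= c + d < len(prev) and prev[c + d] is not None:
                    neighbors.append(prev[c + d])
            if not neighbors:
                parent[next_label] = next_label         # launch a new component
                cur[c] = next_label
                next_label += 1
                count += 1
            else:
                cur[c] = neighbors[0]
                for n in neighbors[1:]:
                    count -= union(cur[c], n)           # merge touching components
        prev = cur
    return count
```

Two pixels that touch only at a corner are counted as one component here, in keeping with the kingwise convention for black.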
*Filling. Let's complete our quick tour of raster graphics by considering how to fill regions that are bounded by straight lines and/or simple curves. Particularly efficient algorithms are available when the curves are built up from "conic sections" (circles, ellipses, parabolas, or hyperbolas), as in classical geometry.

In keeping with geometric tradition, we shall adopt Cartesian coordinates $(x, y)$ in the following discussion, instead of speaking about rows or columns of pixels: An increase of $x$ will signify a move to the right, while an increase of $y$ will move upward. More significantly, we will focus on the edges between square pixels, instead of on the pixels themselves. Edges run between integer points $(x, y)$ and $(x', y')$ of the plane when $|x - x'| + |y - y'| = 1$. Each pixel is bounded by the four edges $(x, y)$ -- $(x-1, y)$ -- $(x-1, y-1)$ -- $(x, y-1)$ -- $(x, y)$. Experience has shown that algorithms for filling contours become simpler and faster when we concentrate on the edge transitions between white and black, instead of on the black pixels of a custerized boundary. (See, for example, the discussion by B. D. Ackland and N. Weste in IEEE Trans. C-30 (1981), 41–47.)
Consider a continuous curve $z(t) = (x(t), y(t))$ that is traced out as $t$ varies from 0 to 1. We assume that the curve doesn't intersect itself for $0 \le t < 1$, and that $z(0) = z(1)$. The famous Jordan curve theorem [C. Jordan, Cours d'analyse 3 (1887), 587–594; O. Veblen, Trans. Amer. Math. Soc. 6 (1905), 83–98] states that every such curve divides the plane into two regions, called the inside and the outside. We can "digitize" $z(t)$ by forcing it to travel along edges between pixels; then we obtain an approximation in which the inside pixels are black and the outside pixels are white. This digitization process essentially replaces the original curve by the sequence of integer points

$$\mathrm{round}(z(t)) = \bigl(\lfloor x(t) + \tfrac12\rfloor,\ \lfloor y(t) + \tfrac12\rfloor\bigr), \qquad\text{for } 0 \le t \le 1. \eqno(164)$$

The curve can be perturbed slightly, if necessary, so that $z(t)$ never passes exactly through the center of a pixel. Then the digitized curve takes discrete steps along pixel edges as $t$ grows; and a pixel lies inside the digitization if and only if its center lies inside the original continuous curve $\{z(t) \mid 0 \le t \le 1\}$.
For example, the equations $x(t) = 20\cos 2\pi t$ and $y(t) = 10\sin 2\pi t$ define an ellipse. Its digitization, $\mathrm{round}(z(t))$, starts at $(20, 0)$ when $t = 0$, then jumps to $(20, 1)$ when $t \approx .008$ and $10\sin 2\pi t = 0.5$. Then it proceeds to the points $(20, 2)$, $(19, 2)$, $(19, 3)$, $(19, 4)$, $(18, 4)$, ..., $(20, -1)$, $(20, 0)$, as $t$ increases through the values .024, .036, .040, .057, .062, ..., .976, .992:

[Illustration of the digitized ellipse.]          (165)
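
The sequence of points just described can be reproduced numerically. The Python sketch below is an illustration only: it samples $t$ at equally spaced values (the sample count is an arbitrary assumption) and applies (164) directly, recording a point each time the rounded position changes. The function name is invented for this example.

```python
import math


def digitize_ellipse(samples=100000):
    """List the successive values of round(z(t)) for x = 20 cos 2*pi*t, y = 10 sin 2*pi*t."""
    points = []
    for i in range(samples + 1):
        t = i / samples
        x = 20 * math.cos(2 * math.pi * t)
        y = 10 * math.sin(2 * math.pi * t)
        p = (math.floor(x + 0.5), math.floor(y + 0.5))   # formula (164)
        if not points or points[-1] != p:
            points.append(p)
    return points
```

The first few entries of `digitize_ellipse()` are (20, 0), (20, 1), (20, 2), (19, 2), (19, 3), (19, 4), (18, 4), and the last two are (20, −1), (20, 0), in agreement with the text.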
Conversely, it's easy to obtain $B$ when the $H$ vectors are given:

$$B(y) = H(y_{\max}) \oplus H(y_{\max}-1) \oplus \cdots \oplus H(y+1) = H(y_{\min}) \oplus H(y_{\min}+1) \oplus \cdots \oplus H(y). \eqno(167)$$

Notice that $H(y_{\min}) \oplus H(y_{\min}+1) \oplus \cdots \oplus H(y_{\max})$ is the zero vector, because each bitmap is white at both top and bottom. Notice further that the analogous vertical edge vectors $V(x)$ are redundant: They satisfy the formulas $V = B \oplus (B \ll 1)$ and $B = V^{\oplus}$ (see exercise 36), but we need not bother to keep track of them.
Conic sections are easier to deal with than most other curves, because we can readily eliminate the parameter $t$. For example, the ellipse that led to (165) can be defined by the equation $(x/20)^2 + (y/10)^2 = 1$, instead of using sines and cosines. Therefore pixel $(x, y)$ should be black if and only if its center point $(x - \frac12, y - \frac12)$ lies inside the ellipse, if and only if $(x - \frac12)^2/400 + (y - \frac12)^2/100 - 1 < 0$.

In general, every conic section is the set of points for which $F(x, y) = 0$, when $F$ is an appropriate quadratic form. Therefore there's a quadratic form

$$Q(x, y) = F(x - \tfrac12, y - \tfrac12) = ax^2 + bxy + cy^2 + dx + ey + f \eqno(168)$$

that is negative at the integer point $(x, y)$ if and only if pixel $(x, y)$ lies on a given side of the digitized curve.
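
For the ellipse of the running example, one convenient choice of $Q$ has integer coefficients. The Python sketch below is an illustration under a stated assumption: it multiplies $F(x - \frac12, y - \frac12)$ by 1600 to clear denominators, which does not change the sign, giving $(a, b, c, d, e, f) = (4, 0, 16, -4, -16, -1595)$. This scaling and the function names are choices made for the example, not taken from the text.

```python
def Q(x, y):
    """1600 * ((x - 1/2)**2 / 400 + (y - 1/2)**2 / 100 - 1), expanded to integers."""
    return 4 * x * x + 16 * y * y - 4 * x - 16 * y - 1595


def is_black(x, y):
    """Pixel (x, y) is black when its center lies inside the ellipse."""
    return Q(x, y) < 0
```

For example, $Q(20, 1) = -75$, so pixel (20, 1) is black, while $Q(21, 1) = 85$, so pixel (21, 1) is white. Since $4x^2 - 4x + 16y^2 - 16y$ is always even, this $Q$ is never zero at an integer point, and no pixel center lies exactly on the curve.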
For practical purposes we may assume that the coefficients $(a, b, \ldots, f)$ of $Q$ are not-too-large integers. Then we're in luck, because the exact value of $Q(x, y)$ is easy to compute. In fact, as pointed out by M. L. V. Pitteway [Comp. J. 10 (1967), 282–289], there's a nice "three-register algorithm" by which we can quickly track the boundary points: Let $x$ and $y$ be integers, and suppose we've got the values of $Q(x, y)$, $Q_x(x, y)$, and $Q_y(x, y)$ in three registers $(Q, Q_x, Q_y)$, where

$$Q_x(x, y) = 2ax + by + d \qquad\text{and}\qquad Q_y(x, y) = bx + 2cy + e \eqno(169)$$

are $\frac{\partial}{\partial x}Q$ and $\frac{\partial}{\partial y}Q$. We can then move to any adjacent integer point, because
Algorithm T applies only to conic curves. But that's not really a limitation in practice, because just about every shape we ever need to draw can be well approximated by "piecewise conics" called quadratic Bézier splines or squines. For example, Fig. 19 shows a typical squine curve with 40 points $(z_0, z_1, \ldots, z_{39}, z_{40})$, where $z_{40} = z_0$. The even-numbered points $(z_0, z_2, \ldots, z_{40})$ lie on the curve; the others, $(z_1, z_3, \ldots, z_{39})$, are called "control points," because they regulate local bending and flexing. Each section $S(z_{2j}, z_{2j+1}, z_{2j+2})$ begins at point $z_{2j}$, traveling in direction $z_{2j+1} - z_{2j}$. It ends at point $z_{2j+2}$, traveling in direction $z_{2j+2} - z_{2j+1}$. Thus if $z_{2j}$ lies on the straight line from $z_{2j-1}$ to $z_{2j+1}$, the squine passes smoothly through point $z_{2j}$ without changing direction.

Exercise 186 defines $S(z_{2j}, z_{2j+1}, z_{2j+2})$ precisely, and exercise 187 explains how to digitize any squine curve using Algorithm T. The region inside the digitized edges can then be filled with black pixels.

Incidentally, the task of drawing lines and curves on a bitmap turns out to be much more difficult than the task of filling a digitized contour, because we want diagonal strokes to have the same apparent thickness as vertical and horizontal strokes do. An excellent solution to the line-drawing problem was found by John D. Hobby, JACM 36 (1989), 209–229.
*Branchless computation. Modern computers tend to slow down when a program contains conditional branch instructions, because an uncertain flow of control can interfere with predictive lookahead circuitry. Therefore we've used MMIX's conditional-set instructions like CSNZ in programs like (56). Indeed, the four instructions 'SRU z,y,16; ADD t,lam,16; CSNZ y,q,z; CSNZ lam,q,t' found in (56) are probably faster than their three-instruction counterpart

    BZ q,@+12; SRU y,y,16; ADD lam,lam,16          (173)

when the actual running time is measured on a highly pipelined machine, even though the rule-of-thumb cost of (173) is only $3\upsilon$ according to Table 1.3.1–1.
Bitwise operations can help diminish the need for costly branching. For example, if MMIX didn't have a CSNZ instruction we could write

    NEG m,q; OR m,m,q; SR m,m,63;
    SRU t,y,16; XOR t,t,y; AND t,t,m; XOR y,y,t;          (174)
    ADD t,lam,16; XOR t,t,lam; AND t,t,m; XOR lam,lam,t;

here the first line creates the mask $m = -[q \ne 0]$. On some computers these eleven branchless instructions would still run faster than the three instructions in (173).
The inner loop of a merge sort algorithm provides an instructive example. Suppose we want to do the following operations repeatedly:

    If $x_i < y_j$, set $z_k \gets x_i$, $i \gets i + 1$, and go to x_done if $i = i_{\max}$.
    Otherwise set $z_k \gets y_j$, $j \gets j + 1$, and go to y_done if $j = j_{\max}$.
    Then set $k \gets k + 1$ and go to z_done if $k = k_{\max}$.

If we implement them in the "obvious" way, four conditional branches are involved, three of which are active on each path through the loop:

    1H  CMP  t,xi,yj; BNN t,2F    Branch if $x_i \ge y_j$.
        STO  xi,zbase,kk          $z_k \gets x_i$.
        ADD  ii,ii,8              $i \gets i + 1$.
        BZ   ii,X_Done            To x_done if $i = i_{\max}$.
        LDO  xi,xbase,ii          Load $x_i$ into register xi.
        JMP  3F                   Join the other branch.
    2H  STO  yj,zbase,kk          $z_k \gets y_j$.
        ADD  jj,jj,8              $j \gets j + 1$.
        BZ   jj,Y_Done            To y_done if $j = j_{\max}$.
        LDO  yj,ybase,jj          Load $y_j$ into register yj.
    3H  ADD  kk,kk,8              $k \gets k + 1$.
        PBNZ kk,1B                Repeat if $k \ne k_{\max}$.
        JMP  Z_Done               To z_done.

(Here ii $= 8(i - i_{\max})$, jj $= 8(j - j_{\max})$, and kk $= 8(k - k_{\max})$; the factor of 8 is needed because $x_i$, $y_j$, and $z_k$ are octabytes.) Those four branches can be reduced to just one:
    1H  CMP  t,xi,yj              $t \gets \mathrm{sign}(x_i - y_j)$.
        CSN  yj,t,xi              yj $\gets \min(x_i, y_j)$.
        STO  yj,zbase,kk          $z_k \gets$ yj.
        AND  t,t,8                $t \gets 8[x_i < y_j]$.
        ADD  ii,ii,t              $i \gets i + [x_i < y_j]$.
        LDO  xi,xbase,ii          Load $x_i$ into register xi.
        XOR  t,t,8                $t \gets t \oplus 8$.
        ADD  jj,jj,t              $j \gets j + [x_i \ge y_j]$.
        LDO  yj,ybase,jj          Load $y_j$ into register yj.
        ADD  kk,kk,8              $k \gets k + 1$.
        AND  u,ii,jj; AND u,u,kk  $u \gets$ ii & jj & kk.
        PBN  u,1B                 Repeat if $i < i_{\max}$, $j < j_{\max}$, and $k < k_{\max}$.

When the loop stops in this version, we can readily decide whether to continue at x_done, y_done, or z_done. These instructions load both $x_i$ and $y_j$ from memory each time, but the redundant value will already be present in the cache.
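
The index arithmetic of the one-branch loop can be traced in Python. The sketch below is a hedged emulation, not a translation that preserves timing: Python's conditional expression stands in for CSN, lists stand in for memory, and the negative offsets `ii`, `jj`, `kk` are turned into list indices by `>> 3`, relying on Python's negative indexing (an assumption of this example; on MMIX the offsets are added to base addresses). Both input lists must be nonempty, and the function name is invented.

```python
def merge_until_exhausted(x, y, z_capacity):
    """Merge sorted lists x and y until x, y, or the output capacity runs out."""
    ii, jj, kk = -8 * len(x), -8 * len(y), -8 * z_capacity
    z = []
    xi, yj = x[ii >> 3], y[jj >> 3]
    while True:
        t = (xi > yj) - (xi < yj)      # CMP t,xi,yj: sign(xi - yj)
        yj = xi if t < 0 else yj       # CSN yj,t,xi: yj <- min(xi, yj)
        z.append(yj)                   # STO yj,zbase,kk
        t &= 8                         # t <- 8[xi < yj], because -1 & 8 = 8 and 1 & 8 = 0
        ii += t
        xi = x[ii >> 3]                # LDO (a harmless reload when ii did not change)
        t ^= 8                         # t <- 8[xi >= yj]
        jj += t
        yj = y[jj >> 3]                # LDO
        kk += 8
        if not (ii & jj & kk) < 0:     # PBN u,1B: u is negative only if all three are
            break
    return z, ii, jj, kk
```

After the loop, whichever of `ii`, `jj`, `kk` is zero tells us whether to continue at x_done, y_done, or z_done.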
*More applications of MOR and MXOR. Let's finish off our study of bitwise manipulation by taking a look at two operations that are specifically designed for 64-bit work. MMIX's instructions MOR and MXOR, which essentially carry out matrix multiplication on $8\times8$ Boolean matrices, turn out to be extremely flexible and powerful, both by themselves and in combination with other bitwise operations.

If $x = (x_7 \ldots x_1 x_0)_{256}$ is an octabyte and $a = (a_7 \ldots a_1 a_0)_2$ is a single byte, the instruction MOR t,x,a sets $t \gets a_7x_7 \mid \cdots \mid a_1x_1 \mid a_0x_0$, while MXOR t,x,a sets $t \gets a_7x_7 \oplus \cdots \oplus a_1x_1 \oplus a_0x_0$. For example, MOR t,x,2 and MXOR t,x,2 both set $t \gets x_1$; MOR t,x,3 sets $t \gets x_1 \mid x_0$; and MXOR t,x,3 sets $t \gets x_1 \oplus x_0$.
In general, of course, MOR and MXOR are functions of octabytes. When $y = (y_7 \ldots y_1 y_0)_{256}$ is a general octabyte, the instruction MOR t,x,y produces the octabyte $t$ whose $j$th byte $t_j$ is the result of MOR applied to $x$ and $y_j$.

Suppose $x = -1 =$ #ffffffffffffffff. Then MOR t,x,y computes the mask $t$ in which byte $t_j$ is #ff whenever $y_j \ne 0$, while $t_j$ is zero when $y_j = 0$. This simple special case is quite useful, because it accomplishes in just one instruction what we previously needed seven operations to achieve in situations like (92).

We observed in (66) that two MORs will suffice to reverse the bits of any 64-bit word, and many other important bit permutations also become easy when MOR is in a computer's repertoire. Suppose $\pi$ is a permutation of $\{0, 1, \ldots, 7\}$ that takes $0 \mapsto 0\pi$, $1 \mapsto 1\pi$, ..., $7 \mapsto 7\pi$. Then the octabyte $p = (2^{7\pi} \ldots 2^{1\pi} 2^{0\pi})_{256}$ corresponds to a permutation matrix that makes MOR do nice tricks: MOR t,x,p will permute the bytes of $x$, setting $t_j \gets x_{j\pi}$. Furthermore, MOR u,p,y will permute the bits of each byte of $y$, according to the inverse permutation; it sets $u_j \gets (a_{7\pi^-} \ldots a_{1\pi^-} a_{0\pi^-})_2$ when $y_j = (a_7 \ldots a_1 a_0)_2$.
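
The behavior just described can be modeled in a few lines of Python. This is a bit-level sketch of MOR and MXOR exactly as they are defined in the preceding paragraphs (byte $t_j$ combines the bytes $x_i$ selected by the 1 bits of $y_j$), written for clarity rather than speed; the helper names `mor`, `mxor`, `perm_octabyte`, and `reverse_bits` are invented for the example.

```python
def _bytes(v):
    """Split a 64-bit value into bytes (v_0, v_1, ..., v_7), least significant first."""
    return [(v >> (8 * j)) & 0xFF for j in range(8)]


def mor(x, y, xor=False):
    """Byte t_j is the OR (or XOR) of the bytes x_i for which bit i of y_j is 1."""
    xs = _bytes(x)
    t = 0
    for j, yj in enumerate(_bytes(y)):
        tj = 0
        for i in range(8):
            if (yj >> i) & 1:
                tj = (tj ^ xs[i]) if xor else (tj | xs[i])
        t |= tj << (8 * j)
    return t


def mxor(x, y):
    return mor(x, y, xor=True)


def perm_octabyte(pi):
    """The octabyte p whose byte j is 2**pi[j]."""
    return sum((1 << pi[j]) << (8 * j) for j in range(8))


def reverse_bits(x):
    """Reverse all 64 bits with two MORs, using the permutation j -> 7 - j."""
    p = perm_octabyte([7 - j for j in range(8)])
    t = mor(x, p)        # reverses the order of the bytes
    return mor(p, t)     # reverses the bits inside each byte
```

With this model, `mor(0xFFFFFFFFFFFFFFFF, y)` yields the byte mask described above, and the reversal permutation corresponds to the octabyte #0102040810204080.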
With a little more skullduggery we can also expedite further permutations such as the perfect shuffle (76), which transforms a given octabyte $z = 2^{32}x + y = (x_{31} \ldots x_1 x_0\, y_{31} \ldots y_1 y_0)_2$ into the "zippered" octabyte

$$w = x \ddagger y = (x_{31}y_{31} \ldots x_1y_1x_0y_0)_2. \eqno(175)$$

With appropriate permutation matrices $p$, $q$, and $r$, the intermediate results

$$t = (x_{31}x_{27}x_{30}x_{26}x_{29}x_{25}x_{28}x_{24}\,y_{31}y_{27}y_{30}y_{26}y_{29}y_{25}y_{28}y_{24} \ldots x_7x_3x_6x_2x_5x_1x_4x_0\,y_7y_3y_6y_2y_5y_1y_4y_0)_2, \eqno(176)$$

$$u = (y_{27}y_{31}y_{26}y_{30}y_{25}y_{29}y_{24}y_{28}\,x_{27}x_{31}x_{26}x_{30}x_{25}x_{29}x_{24}x_{28} \ldots y_3y_7y_2y_6y_1y_5y_0y_4\,x_3x_7x_2x_6x_1x_5x_0x_4)_2 \eqno(177)$$

can be computed quickly via the four instructions

    MOR t,z,p; MOR t,q,t; MOR u,t,r; MOR u,r,u;          (178)

see exercise 204. So there's a mask $m$ for which 'PUT rM,m; MUX w,t,u' completes the perfect shuffle in just six cycles altogether. By contrast, the traditional method in exercise 53 requires 30 cycles (five $\delta$-swaps).
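
As a reference point for (175), here is a deliberately naive Python definition of the zipper function for 32-bit operands. It is only a specification-style sketch with an invented name; it is not the six-cycle MOR/MUX method, whose matrices $p$, $q$, $r$ are left to exercise 204.

```python
def zipper(x, y):
    """Interleave the bits of 32-bit x and y: result is (x31 y31 ... x1 y1 x0 y0)_2."""
    w = 0
    for k in range(32):
        w |= ((y >> k) & 1) << (2 * k)        # y_k goes to even position 2k
        w |= ((x >> k) & 1) << (2 * k + 1)    # x_k goes to odd position 2k + 1
    return w
```

A faster implementation can be validated against this one on random inputs.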
The analogous instruction MXOR is especially useful when binary linear algebra is involved. For example, exercise 1.3.1–37 shows that XOR and MXOR directly implement addition and multiplication in a finite field of $2^k$ elements, for $k \le 8$.
The problem of cyclic redundancy checking provides an instructive example of another case where MXOR shines. Streams of data are often accompanied by "CRC bytes" in order to detect common types of transmission errors [see W. W. Peterson and D. T. Brown, Proc. IRE 49 (1961), 228–235]. One popular method, used for example in MP3 audio files, is to regard each byte $\alpha = (a_7 \ldots a_1 a_0)_2$ as if it were the polynomial

$$\alpha(x) = (a_7 \ldots a_1 a_0)_x = a_7x^7 + \cdots + a_1x + a_0. \eqno(179)$$

When transmitting $n$ bytes $\alpha_{n-1} \ldots \alpha_1 \alpha_0$, we then compute the remainder

$$\beta = \bigl(\alpha_{n-1}(x)\,x^{8(n-1)} + \cdots + \alpha_1(x)\,x^8 + \alpha_0(x)\bigr)\,x^{16} \bmod p(x), \eqno(180)$$

where $p(x) = x^{16} + x^{15} + x^2 + 1$, using polynomial arithmetic mod 2, and append the coefficients of $\beta$ as a 16-bit redundancy check.
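The usual way to compute $\beta$ is to process one byte at a time, according to classical methods like Algorithm 4.6.1D. The basic idea is to define the partial result $\beta_m = \bigl(\alpha_{n-1}(x)\,x^{8(n-1-m)} + \cdots + \alpha_{m+1}(x)\,x^8 + \alpha_m(x)\bigr)\,x^{16} \bmod p(x)$, so that $\beta_n = 0$ and $\beta_0 = \beta$, and then to use the recursion

$$\beta_m = \bigl((\beta_{m+1} \ll 8) \mathbin{\&} \hbox{\#ff00}\bigr) \oplus \hbox{crc\_table}\bigl[(\beta_{m+1} \gg 8) \oplus \alpha_m\bigr] \eqno(181)$$

to decrease $m$ by 1 until $m = 0$. Here crc_table$[\alpha]$ is a 16-bit table entry that holds the remainder of $\alpha(x)\,x^{16}$, modulo $p(x)$ and mod 2, for $0 \le \alpha < 256$. [See A. Perez, IEEE Micro 3, 3 (June 1983), 40–50.]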
But of course we'd prefer to process 64 bits at once instead of 8. The solution is to find $8\times8$ matrices $A$ and $B$ such that

$$\alpha(x)\,x^{64} \equiv (\alpha A)(x) + (\alpha B)(x)\,x^{-8} \qquad\text{(modulo $p(x)$ and 2)}, \eqno(182)$$

for arbitrary bytes $\alpha$, considering $\alpha$ to be a $1\times8$ vector of bits. Then we can pad the given data bytes $\alpha_{n-1} \ldots \alpha_1 \alpha_0$ with leading zeros so that $n$ is a multiple of 8, and use the following efficient reduction method:

    Begin with $c \gets 0$, $n \gets n - 8$, and $t \gets (\alpha_{n+7} \ldots \alpha_n)_{256}$.
    While $n > 0$, set $u \gets t \cdot A$, $v \gets t \cdot B$, $n \gets n - 8$,          (183)
    $t \gets (\alpha_{n+7} \ldots \alpha_n)_{256} \oplus u \oplus (v \gg 8) \oplus (c \ll 56)$, and $c \gets v \mathbin{\&} \hbox{\#ff}$.

Here $t \cdot A$ and $t \cdot B$ denote matrix multiplication via MXOR. The desired CRC bytes, $(t\,x^{16} + c\,x^8) \bmod p(x)$, are then readily obtained from the 64-bit quantity $t$ and the 8-bit quantity $c$. Exercise 213 contains full details; the total running time for $n$ bytes comes to only $(\mu + 10\upsilon)n/8 + O(1)$.
The exercises below contain many more instances where MOR and MXOR lead to substantial economies. New tricks undoubtedly remain to be discovered.

For further reading. The book Hacker's Delight by Henry S. Warren, Jr. (Addison–Wesley, 2002) discusses bitwise operations in depth, emphasizing the great variety of options that are available on real-world computers that are not as ideal as MMIX.
EXERCISES

▶ 1. [15] What is the net effect of setting $x \gets x \oplus y$, $y \gets y \oplus (x \mathbin{\&} m)$, $x \gets x \oplus y$?

2. [16] (H. S. Warren, Jr.) Are any of the following relations valid for all integers $x$ and $y$? (i) $x \oplus y \le x \mid y$; (ii) $x \mathbin{\&} y \le x \mid y$; (iii) $|x - y| \le x \oplus y$.

3. [M20] If $x = (x_{n-1} \ldots x_1 x_0)_2$ with $x_{n-1} = 1$, let $x^M = (\bar x_{n-1} \ldots \bar x_1 \bar x_0)_2$. Thus we have $0^M, 1^M, 2^M, 3^M, \ldots = -1, 0, 1, 0, 3, 2, 1, 0, 7, 6, \ldots$, if we let $0^M = -1$. Prove that $(x \oplus y)^M < |x - y| \le x \oplus y$ for all $x, y \ge 0$.

▶ 4. [M16] Let $x^C = \bar x$, $x^N = -x$, $x^S = x + 1$, and $x^P = x - 1$ denote the complement, the negative, the successor, and the predecessor of an infinite-precision integer $x$. Then we have $x^{CC} = x^{NN} = x^{SP} = x^{PS} = x$. What are $x^{CN}$ and $x^{NC}$?

5. [M21] Prove or disprove the following conjectured laws concerning binary shifts:
a) $(x \ll j) \ll k = x \ll (j + k)$;
b) $(x \gg j) \mathbin{\&} (y \ll k) = ((x \gg (j + k)) \mathbin{\&} y) \ll k = (x \mathbin{\&} (y \ll (j + k))) \gg j$.
6. [M22 ℄ Find all integers x and y su
h that (a) x y = y x; (b) x y = y x.
7. [M22] (R. Schroeppel, 1972.) Find a fast way to convert the binary number $x = (\ldots x_2 x_1 x_0)_2$ to its negabinary counterpart $x = (\ldots x'_2 x'_1 x'_0)_{-2}$, and vice versa. Hint: Only two bitwise operations are needed!

▶ 8. [M22] Given a finite set $S$ of nonnegative integers, the "minimal excludant" of $S$ is defined to be

$$\mathrm{mex}(S) = \min\{\,k \mid k \ge 0 \text{ and } k \notin S\,\}.$$

Let $x \oplus S$ denote the set $\{x \oplus y \mid y \in S\}$. Prove that if $x = \mathrm{mex}(S)$ and $y = \mathrm{mex}(T)$ then $x \oplus y = \mathrm{mex}((S \oplus y) \cup (x \oplus T))$.

9. [M26] (Nim.) Two people play a game with $k$ piles of sticks, where there are $a_j$ sticks in pile $j$. If $a_1 = \cdots = a_k = 0$ when it is a player's turn to move, that player loses; otherwise the player reduces one of the piles by any desired amount, throwing away the removed sticks, and it is the other player's turn. Prove that the player to move can force a victory if and only if $a_1 \oplus \cdots \oplus a_k \ne 0$.

10. [HM40] (Conway's field.) Continuing exercise 8, define the operation $x \otimes y$ of "nim multiplication" recursively by the formula

$$x \otimes y = \mathrm{mex}\{\,(x \otimes j) \oplus (i \otimes y) \oplus (i \otimes j) \mid 0 \le i < x,\ 0 \le j < y\,\}.$$

Prove that $\oplus$ and $\otimes$ define a field over the set of all nonnegative integers. Prove also that if $0 \le x, y < 2^{2^n}$ then $x \otimes y < 2^{2^n}$, and $x \otimes 2^{2^n} = 2^{2^n}x$. (In particular, this field contains subfields of size $2^{2^n}$ for all $n \ge 0$.) Explain how to compute $x \otimes y$ efficiently.

▶ 11. [M26] (H. W. Lenstra, 1978.) Find a simple way to characterize all pairs of positive integers $(m, n)$ for which $m \otimes n = mn$ in Conway's field.

12. [M26] Devise an algorithm for division in Conway's field. Hint: If $x < 2^{2^{n+1}}$ then we have $x \otimes (x \oplus (x \gg 2^n)) < 2^{2^n}$.

13. [M32] (Second-order nim.) Extend the game of exercise 9 by allowing two kinds of moves: Either $a_j$ is reduced for some $j$, as before; or $a_j$ is reduced and $a_i$ is replaced by an arbitrary nonnegative integer, for some $i < j$. Prove that the player to move can now force a victory if and only if the pile sizes satisfy either $a_2 \ne a_3 \oplus \cdots \oplus a_k$ or $a_1 \ne a_3 \oplus (2 \otimes a_4) \oplus \cdots \oplus ((k - 2) \otimes a_k)$. For example, when $k = 4$ and $(a_1, a_2, a_3, a_4) = (7, 5, 0, 5)$, the only winning move is to $(7, 5, 6, 3)$.
14. [M30] Suppose each node of a complete, infinite binary tree has been labeled with 0 or 1. Such a labeling is conveniently represented as a set $T = \{t, t_0, t_1, t_{00}, t_{01}, t_{10}, t_{11}, t_{000}, \ldots\}$, with one bit $t_\alpha$ for every binary string $\alpha$; the root is labeled $t$, the left subtree labels are $T_0 = \{t_0, t_{00}, t_{01}, t_{000}, \ldots\}$, and the right subtree labels are $T_1 = \{t_1, t_{10}, t_{11}, t_{100}, \ldots\}$. Any such labeling can be used to transform a 2-adic integer $x = (\ldots x_2 x_1 x_0)_2$ into the 2-adic integer $y = (\ldots y_2 y_1 y_0)_2 = T(x)$ by setting $y_0 = t$, $y_1 = t_{x_0}$, $y_2 = t_{x_0 x_1}$, etc., so that $T(x) = 2T_{x_0}(\lfloor x/2 \rfloor) + t$. (In other words, $x$ defines an infinite path in the binary tree, and $y$ corresponds to the labels on that path, from right to left in the bit strings as we proceed from top to bottom of the tree.)
A branching function is the mapping $x^T = x \oplus T(x)$ defined by such a labeling. For example, if $t_{01} = 1$ and all of the other $t_\alpha$ are 0, we have $x^T = x \oplus 4[x \bmod 4 = 2]$.
a) Prove that every branching function is a permutation of the 2-adic integers.
b) For which integers $k$ is $x \oplus (x \ll k)$ a branching function?
c) Let $x \mapsto x^T$ be a mapping from 2-adic integers into 2-adic integers. Prove that $x^T$ is a branching function if and only if $\rho(x \oplus y) = \rho(x^T \oplus y^T)$ for all 2-adic $x$ and $y$.
d) Prove that compositions and inverses of branching functions are branching functions. (Thus the set $\mathcal{B}$ of all branching functions is a permutation group.)
e) A branching function is balanced if the labels satisfy $t_\alpha = t_{\alpha 0} \oplus t_{\alpha 1}$ for all $\alpha$. Show that the set of all balanced branching functions is a subgroup of $\mathcal{B}$.

▶ 15. [M21] J. H. Quick noticed that $((x + 2) \oplus 3) - 2 = ((x - 2) \oplus 3) + 2$ for all $x$. Find all constants $a$ and $b$ such that $((x + a) \oplus b) - a = ((x - a) \oplus b) + a$ is an identity.

16. [M31] A function of $x$ is called animating if it can be written in the form

$$((\ldots((((x + a_1) \oplus b_1) + a_2) \oplus b_2) + \cdots) + a_m) \oplus b_m$$

for some integer constants $a_1, b_1, a_2, b_2, \ldots, a_m, b_m$, with $m > 0$.
a) Prove that every animating function is a branching function (see exercise 14).
b) Furthermore, prove that it is balanced if and only if $b_1 \oplus b_2 \oplus \cdots \oplus b_m = 0$. Hint: What binary tree labeling corresponds to the animating function $((x \oplus c) - 1) \oplus c$?
c) Let $\lfloor x \rceil = x \oplus (x - 1) = 2^{\rho(x)+1} - 1$. Show that every balanced animating function can be written in the form

$$x \oplus \lfloor x - p_1 \rceil \oplus \lfloor x - p_2 \rceil \oplus \cdots \oplus \lfloor x - p_l \rceil, \qquad p_1 < p_2 < \cdots < p_l,$$

for some integers $\{p_1, p_2, \ldots, p_l\}$, where $l \ge 0$, and this representation is unique.
d) Conversely, show that every such expression defines a balanced animating function.

17. [HM36] The results of exercise 16 make it possible to decide whether or not any two given animating functions are equal. Is there an algorithm that decides whether any given expression is identically zero, when that expression is constructed from a finite number of integer variables and constants using only the binary operations $+$ and $\oplus$? What if we also allow &?

18. [M25] The curious pixel pattern shown here has $(x^2 y \gg 11) \mathbin{\&} 1$ in row $x$ and column $y$, for $1 \le x, y \le 256$. Is there any simple way to explain some of its major characteristics mathematically?
▶ 19. [M37] (Paley's rearrangement theorem.) Given three vectors $A = (a_0, \ldots, a_{2^n-1})$, $B = (b_0, \ldots, b_{2^n-1})$, and $C = (c_0, \ldots, c_{2^n-1})$ of nonnegative numbers, let

$$f(A, B, C) = \sum_{j \oplus k \oplus l = 0} a_j b_k c_l.$$

For example, if $n = 2$ we have $f(A, B, C) = a_0b_0c_0 + a_0b_1c_1 + a_0b_2c_2 + a_0b_3c_3 + a_1b_0c_1 + a_1b_1c_0 + a_1b_2c_3 + \cdots + a_3b_3c_0$; in general there are $2^{2n}$ terms, one for each choice of $j$ and $k$. Our goal is to prove that $f(A, B, C) \le f(A^*, B^*, C^*)$, where $A^*$ denotes the vector $A$ sorted into nonincreasing order, $a^*_0 \ge a^*_1 \ge \cdots \ge a^*_{2^n-1}$.
a) Prove the result when all elements of $A$, $B$, and $C$ are 0s and 1s.
b) Show that it is therefore true in general.
c) Similarly, $f(A, B, C, D) = \sum_{j \oplus k \oplus l \oplus m = 0} a_j b_k c_l d_m \le f(A^*, B^*, C^*, D^*)$.

▶ 20. [21] (Gosper's hack.) The following seven operations produce a useful function $y$ of $x$, when $x$ is a positive integer. Explain what this function is and why it is useful.
38. [17] How long does the leftmost-bit-extraction procedure (57) take when implemented on MMIX?

▶ 39. [20] Formula (43) shows how to remove the rightmost run of 1 bits from a given number $x$. How would you remove the leftmost run of 1 bits?

▶ 40. [21] Prove (58), and find a simple way to decide if $\lambda x < \lambda y$, given $x$ and $y \ge 0$.

41. [M22] What are the generating functions of the integer sequences (a) $\rho n$, (b) $\lambda n$, and (c) $\nu n$?

42. [M21] If $n = 2^{e_1} + \cdots + 2^{e_r}$, with $e_1 > \cdots > e_r \ge 0$, express the sum $\sum_{k=0}^{n-1} \nu k$ in terms of the exponents $e_1, \ldots, e_r$.

▶ 43. [20] How sparse should $x$ be, to make (63) faster than (62) on MMIX?

▶ 44. [23] (E. Freed, 1983.) What's a fast way to evaluate the weighted bit sum $\sum j x_j$?

▶ 45. [20] (T. Rokicki, 1999.) Explain how to test if $x^R < y^R$, without reversing $x$ and $y$.

46. [22] Method (68) uses six operations to interchange two bits $x_i \leftrightarrow x_j$ of a register. Show that this interchange can actually be done with only three MMIX instructions.

47. [10] Can the general $\delta$-swap (69) also be done with a method like (67)?

48. [M21] How many different $\delta$-swaps are possible in an $n$-bit register? (When $n = 4$, a $\delta$-swap can transform 1234 into 1234, 1243, 1324, 1432, 2134, 2143, 3214, 3421, 4231.)

▶ 49. [M30] Let $s(n)$ denote the fewest $\delta$-swaps that suffice to reverse an $n$-bit number.
a) Prove that $s(n) \ge \lceil \log_3 n \rceil$ when $n$ is odd, $s(n) \ge \lceil \log_3 3n/2 \rceil$ when $n$ is even.
b) Evaluate $s(n)$ when $n = 3^m$, $2 \cdot 3^m$, $(3^m + 1)/2$, and $(3^m - 1)/2$.
c) What are $s(32)$ and $s(64)$? Hint: Show that $s(5n + 2) \le s(n) + 2$.

50. [M37] Continuing exercise 49, prove that $s(n) = \log_3 n + O(\log\log n)$.
51. [23] Let $c$ be a constant, $0 \le c < 2^d$. Find all sequences of masks $(\theta_0, \theta_1, \ldots, \theta_{d-1}, \hat\theta_{d-2}, \ldots, \hat\theta_1, \hat\theta_0)$ such that the general permutation scheme (71) takes $x \mapsto x^\pi$, where the bit permutation $\pi$ is defined by either (a) $j\pi = j \oplus c$; or (b) $j\pi = (j + c) \bmod 2^d$. [The masks should satisfy $\theta_k \subseteq \mu_{d,k}$ and $\hat\theta_k \subseteq \mu_{d,k}$, so that (71) corresponds to Fig. 12; see (48). Notice that reversal, $x^\pi = x^R$, is the special case $c = 2^d - 1$ of part (a), while part (b) corresponds to the cyclic right shift $x^\pi = (x \gg c) + (x \ll (2^d - c))$.]

52. [22] Find hexadecimal constants $(\theta_0, \theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \hat\theta_4, \hat\theta_3, \hat\theta_2, \hat\theta_1, \hat\theta_0)$ that cause (71) to produce the following important 64-bit permutations, based on the binary representation $j = (j_5j_4j_3j_2j_1j_0)_2$: (a) $j\pi = (j_0j_5j_4j_3j_2j_1)_2$; (b) $j\pi = (j_2j_1j_0j_5j_4j_3)_2$; (c) $j\pi = (j_1j_0j_5j_4j_3j_2)_2$; (d) $j\pi = (j_0j_1j_2j_3j_4j_5)_2$. [Case (a) is called a "perfect shuffle" because it takes $(x_{63} \ldots x_{33}x_{32}x_{31} \ldots x_1x_0)_2$ into $(x_{63}x_{31} \ldots x_{33}x_1x_{32}x_0)_2$; case (b) transposes an $8\times8$ matrix of bits; case (c), similarly, transposes a $16\times4$ matrix; and case (d) arises in connection with "fast Fourier transforms," see exercise 4.6.4–14.]

▶ 53. [M25] The permutations in exercise 52 are said to be "induced by a permutation of index digits," because we obtain $j\pi$ by permuting the binary digits of $j$. Suppose $j\pi = (j_{(d-1)\psi} \ldots j_{1\psi} j_{0\psi})_2$, where $\psi$ is a permutation of $\{0, 1, \ldots, d - 1\}$. Prove that if $\psi$ has $t$ cycles, the $2^d$-bit permutation $x \mapsto x^\pi$ can be obtained with only $d - t$ swaps. In particular, show that this observation speeds up all four cases of exercise 52.
54. [22] (R. W. Gosper, 1985.) If an $m \times m$ bit matrix is stored in the rightmost $m^2$ bits of a register, show that it can be transposed by doing $(2^k(m - 1))$-swaps for $0 \le k < \lceil \lg m \rceil$. Write out the method in detail when $m = 7$.

▶ 55. [26] Suppose an $n \times n$ bit matrix is stored in the rightmost $n^2$ bits of an $n^3$-bit register. Prove that $18d + 2$ bitwise operations suffice to multiply two such matrices, when $n = 2^d$; the matrix multiplication can be either Boolean (like MOR) or mod 2 (like MXOR).

56. [24] Suggest a way to transpose a $7\times9$ bit matrix in a 64-bit register.
57. [22 ℄ Prove that any permutation of 2 elements
an be realized with the network
d
P (2 ) of Fig. 12 by some setting in whi
h at most d=(2d 1) of the
rossbars are a
tive.
d
x 58. [M27 ℄ The rst d
olumns of
rossbar modules in the permutation network P (2d )
perform a 1swap, then a 2swap, : : : , and nally a 2d 1swap, when the wires of the
network are stret
hed into horizontal lines as shown here for d = 3. 0'
Let N = 2d . These N lines, together with the Nd=2
rossbars, 01 1'
form a so
alled \Omega router." The purpose of this exer
ise is 2 2'
3'
to study the set
of all permutations ' su
h that we
an obtain 34 4'
(0'; 1'; : : : ; (N 1)') as outputs on the right of an Omega router 56 5'
when the inputs at the left are (0; 1; : : : ; N 1). 7
6'
7'
a) Prove that j
j = 2Nd=2. (Thus lg j
j = Nd=2 21 lg N !.)
b) Prove that a permutation ' of f0; 1; : : : ; N 1g belongs to
if and only if
i mod 2k = j mod 2k and i' k = j' k implies i' = j' ()
for all 0 i; j < N and all 0 k d.
) Simplify
ondition () to the following, for all 0 i; j < N :
(i' j') < (i j ) implies i = j:
d) Let T be the set of all permutations of f0; 1; : : : ; N 1g su
h that (i j ) =
(i j ) for all i and j . (This is the set of bran
hing fun
tions
onsidered in exer
ise 14, modulo 2d ; dso it has 2N 1 members, 2N=2+d 1 of whi
h are the animating
fun
tions modulo 2 .) Prove that ' 2
if and only if ' 2
for all 2 T .
e) Suppose ' and are permutations of
that operate on dierent elements; that permutation network
is, j' 6= j implies j = j , for 0 j < N . Prove that ' 2
. generating fun
tion
varian
e
59. [M30 ℄ Given 0 a < b < N = 2 , how many Omegaroutable permutations
d NPhard
operate only on the interval [a : : b℄? (Thus we want to
ount the number of ' 2
su
h represent an arbitrary permutation
zipper fun
tion
that j' 6= j implies a j b. Exer
ise 58(a) is the spe
ial
ase a = 0, b = N 1.) polynomial
polynomial remainder mod 2
60. [HM28 ℄ Given a random permutation of f0; 1; : : : ; 2n 1g, let pnk be the proba trinomial
bility that there are 2k ways to set the
rossbars in the rst and last
olumns of the squaring a polynomial
permutation network P (2n) when realizing this permutation. In other words, pnk is the perfe
t shuing
MMIX
probability that the asso
iated graph has k
y
les (see (75)). What is the generating Steele
fun
tion Pk0 pnk zk ? What are the mean and varian
e of 2k ?
ompression
unpa
king
61. [46 ℄ Is it NPhard to de
ide whether a given permutation is realizable with at un
ompressing
least one mask j = 0, using the re
ursive method of Fig. 12 as implemented in (71)? sheepandgoats
x 62. [22 ℄ Let N = 2d . We
an obviously represent a permutation of f0; 1; : : : ; N 1g
by storing a table of N numbers, d bits ea
h. With this representation we have instant
a
ess to y = x, given x; but it takes
(N ) steps to nd x = y when y is given.
Show that, with the same amount of memory, we
an represent an arbitrary
permutation in su
h a way that x and y are both
omputable in O(d) steps.
63. [16 ℄ For what integers w , x, y , and z does the zipper fun
tion satisfy (i) x z y =
y z x? (ii) (x z y) z = (x dz=2e) z (y bz=2
)? (iii) (w z x)&(y z z ) = (w & y) z (x & z )?
0
64. [22 ℄ Find a \simple" expression for the zipperofsums (x + x ) z (y + y ), as a
0
0
fun
tion of z = x z y and z = x z y .0 0
65. [M16] The binary polynomial $u(x) = u_0 + u_1x + \cdots + u_{n-1}x^{n-1}$ (mod 2) can be represented by the integer $u = (u_{n-1} \ldots u_1 u_0)_2$. If $u(x)$ and $v(x)$ correspond to integers $u$ and $v$ in this way, what polynomial corresponds to $u \ddagger v$?
x 66. [M26 ℄ Suppose the polynomial u(x) has been represented as an nbit integer u as
in exer
ise 65, and let v = u (u Æ) (u 2Æ) (u 3Æ) for some integer Æ.
a) What's a simple way to des
ribe the polynomial v(x)?
b) Suppose n is large, and the bits of u have been pa
ked into 64bit words. How
would you
ompute v when Æ = 1, using bitwise operations in 64bit registers?
) Consider the same question as (b), but when Æ = 64.
d) Consider the same question as (b), but when Æ = 3.
e) Consider the same question as (b), but when Æ = 67.
67. [M31 ℄ If u(x) is a polynomial of degree < n, represented as in exer
ise 65, dis
uss
the
omputation of v(x) = u(x)2 mod (xn + xm + 1), when 0 < m < n and both m
and n are odd. Hint: This problem has an interesting
onne
tion with perfe
t shuing.
68. [20 ℄ What three MMIX instru
tions implement the Æ shift operation, (79)?
69. [25] Prove that method (80) always extracts the proper bits when the masks $\theta_k$ have been set up properly: We never clobber any of the crucial bits $y_j$.
▸ 70. [31] (Guy L. Steele Jr., 1994.) What's a good way to compute the masks $\theta_0$, $\theta_1$, $\ldots$, $\theta_{d-1}$ that are needed in the general compression procedure (80), given $\chi \ne 0$?
71. [17] Explain how to reverse the procedure of (80), going from the compact value $y = (y_{r-1} \ldots y_1 y_0)_2$ to a number $z = (z_{63} \ldots z_1 z_0)_2$ that has $z_{j_i} = y_i$ for $0 \le i < r$.
2d 1
72. [10 ℄ Simplify the expression (xzy ) 0 , when x; y < 2 . (See Eqs. (76) and (81).)
73. [22] Prove that $d$ sheep-and-goats steps will implement any $2^d$-bit permutation.
74. [22] Given counts $(c_0, c_1, \ldots, c_{2^d-1})$ for the Chung–Wong procedure, explain why an appropriate cyclic 1-shift can always produce new counts $(c'_0, c'_1, \ldots, c'_{2^d-1})$ for which $\sum c'_{2l} = \sum c'_{2l+1}$, thus allowing the recursion to proceed.
▸ 75. [32] The method of Chung and Wong replicates bit $l$ of a register exactly $c_l$ times, but it produces results in scrambled order. For example, the case $(c_0, \ldots, c_7) = (1, 2, 0, 2, 0, 2, 0, 1)$ illustrated in the text produces $(x_7 x_0 x_1 x_5 x_5 x_3 x_1 x_3)_2$. In some applications this can be a disadvantage; we might prefer to have the bits retain their original order, namely $(x_7 x_5 x_5 x_3 x_3 x_1 x_1 x_0)_2$ in that example.
Prove that the permutation network $P(2^d)$ of Fig. 12 can be modified to achieve this goal, given any sequence of counts $(c_0, c_1, \ldots, c_{2^d-1})$, if we replace the $d \cdot 2^{d-1}$ crossbar modules in the right-hand half by general $2 \times 2$ mapping modules. (A crossbar module with inputs $(a, b)$ produces either $(a, b)$ or $(b, a)$ as output; a mapping module can also produce $(a, a)$ or $(b, b)$.)
76. [47] A mapping network is analogous to a sorting network or a permutation network, but it uses $2 \times 2$ mapping modules instead of comparators or crossbars, and it is supposed to be able to output all $n^n$ possible mappings of its $n$ inputs. Exercise 75, in conjunction with Fig. 12, shows that a mapping network for $n = 2^d$ exists with only $4d - 2$ levels of delay, and with $n/2$ modules on each level; furthermore, this construction needs general $2 \times 2$ mapping modules (instead of simple crossbars) in only $d$ of those levels.
To within $O(n)$, what is the smallest number $G(n)$ of modules that are sufficient to implement a general $n$-element mapping network?
77. [26] (R. W. Floyd and V. R. Pratt.) Design an algorithm that tests whether or not a given standard $n$-network is a sorting network, as defined in the exercises of Section 5.3.4. When the given network has $r$ comparator modules, your algorithm should use $O(r)$ bitwise operations on words of length $2^n$.
78. [M27] (Testing disjointness.) Suppose the binary numbers $x_1$, $x_2$, $\ldots$, $x_m$ each represent sets in a universe of $n - k$ elements, so that each $x_j$ is less than $2^{n-k}$. J. H. Quick (a student) decided to test whether the sets are disjoint by testing the condition
$$x_1 \mid x_2 \mid \cdots \mid x_m = (x_1 + x_2 + \cdots + x_m) \bmod 2^n.$$
Prove or disprove: Quick's test is valid if and only if $k \ge \lg(m - 1)$.
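The condition in exercise 78 is easy to try out on small cases. The sketch below simply evaluates Quick's condition next to a direct disjointness check; it proves nothing, and the function names are illustrative assumptions rather than anything defined in the text.

```python
def quick_test(sets, n):
    """Evaluate Quick's condition: OR of all sets equals their sum mod 2^n."""
    union = 0
    total = 0
    for x in sets:
        union |= x
        total += x
    return union == total % (1 << n)


def truly_disjoint(sets):
    """Direct check: no element may occur in two of the sets."""
    seen = 0
    for x in sets:
        if seen & x:
            return False
        seen |= x
    return True
```

Comparing the two functions over all small inputs for given $n$, $k$, and $m$ is one way to explore where the test starts to give false positives.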
x 79. [20 ℄ If x 6= 0 and x , what is an easy way to determine the largest integer
x0 < x su
h that x0 ? (Thus (x0)0 = (x0 ) 0 = x , in
onne
tion with (84).)
80. [20] Suggest a fast way to find all maximal proper subsets of a set. More precisely, given $\chi$ with $\nu\chi = m$, we want to find all $x \subseteq \chi$ such that $\nu x = m - 1$.
81. [21] Find a formula for "scattered difference," to go with the "scattered sum" (86).
82. [21] Is it easy to shift a scattered accumulator to the left by 1, for example to change $(y_2 x_4 x_3 y_1 x_2 y_0 x_1 x_0)_2$ to $(y_1 x_4 x_3 y_0 x_2 0 x_1 x_0)_2$?
▸ 83. [28] Continuing exercise 82, find a way to shift a scattered $2^d$-bit accumulator to the right by 1, given $z$ and $\chi$, in $O(d)$ steps.
84. [25 ℄ Given nbit numbers z = (zn 1 : : : z1 z0 )2 and = (n 1 : : : 1 0 )2 , explain
how to
al
ulate the \stret
hed" quantities z ) = (z(n 1)) : : : z1) z0) )2 and
z + = (z(n 1)+ : : : z1+ z 0+ )2 , where
j ) = maxfk j k j and k = 1g; j + = minfk j k j and k = 1g;
we let zj) = 0 if k = 0 for 0 k j , and zj+ = 0 if k = 0 for n > k j . For To
her
example, if n = 11 and = (01101110010)2, then z ) = (z9 z9 z8 z6 z6 z5 z4 z1 z1 z1 0)2 allo
ate
storage allo
ation
and z + = (0z9 z8 z8 z6 z5 z4 z4 z4 z1 z1 )2 . interleaving the bits
85. [22] (K. D. Tocher, 1954.) Imagine that you have a vintage 1950s computer with a drum memory for storing data, and that you need to do some computations with a $32 \times 32 \times 32$ array $a[i, j, k]$, whose subscripts are 5-bit integers in the range $0 \le i, j, k < 32$. Unfortunately your machine has only a very small high-speed memory: You can access only 128 consecutive elements of the array in fast memory at any time. Since your application usually moves from $a[i, j, k]$ to a neighboring position $a[i', j', k']$, where $|i - i'| + |j - j'| + |k - k'| = 1$, you have decided to allocate the array so that, if $i = (i_4 i_3 i_2 i_1 i_0)_2$, $j = (j_4 j_3 j_2 j_1 j_0)_2$, and $k = (k_4 k_3 k_2 k_1 k_0)_2$, the array entry $a[i, j, k]$ is stored in drum location $(k_4 j_4 i_4 k_3 j_3 i_3 k_2 j_2 i_2 k_1 j_1 i_1 k_0 j_0 i_0)_2$. By interleaving the bits in this way, a small change to $i$, $j$, or $k$ will cause only a small change in the address.
Discuss implementation of this addressing function: (a) How does it change when $i$, $j$, or $k$ changes by 1? (b) How would you handle a random access to $a[i, j, k]$, given $i$, $j$, and $k$? (c) How would you detect a "page fault" (namely, the condition that a new segment of 128 elements must be swapped into fast memory from the drum)?
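The interleaved addressing scheme of exercise 85 can be written down directly. The following is a plain, unoptimized sketch of the address map only; it does not address parts (a) through (c), and the names `spread3`, `drum_address`, and `same_page` are hypothetical. Treating a page as an aligned block of 128 addresses is an assumption made here for illustration.

```python
def spread3(v):
    """Move bit b of a 5-bit value to bit 3*b."""
    r = 0
    for b in range(5):
        r |= ((v >> b) & 1) << (3 * b)
    return r


def drum_address(i, j, k):
    """Address (k4 j4 i4 ... k0 j0 i0)_2 for 5-bit subscripts i, j, k."""
    return spread3(i) | (spread3(j) << 1) | (spread3(k) << 2)


def same_page(addr1, addr2):
    """Assumption: pages are aligned blocks of 128 consecutive addresses."""
    return (addr1 >> 7) == (addr2 >> 7)
```

For instance, `drum_address(1, 0, 0)`, `drum_address(0, 1, 0)`, and `drum_address(0, 0, 1)` occupy the three lowest bit positions.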
86. [M25] An array of $2^p \times 2^q \times 2^r$ elements is to be allocated by putting $a[i, j, k]$ into a location whose bits are the $p + q + r$ bits of $(i, j, k)$, permuted in some fashion. Furthermore, this array is to be stored in an external memory using pages of size $2^s$. (Exercise 85 considers the case $p = q = r = 5$ and $s = 7$.) What allocation strategy of this kind minimizes the number of times that $a[i, j, k]$ is on a different page from $a[i', j', k']$, summed over all $i$, $j$, $k$, $i'$, $j'$, and $k'$ such that $|i - i'| + |j - j'| + |k - k'| = 1$?
▸ 87. [20] Suppose each byte of a 64-bit word $x$ contains an ASCII code that represents either a letter, a digit, or a space. What three bitwise operations will convert all the lowercase letters to uppercase?
88. [20] Given $x = (x_7 \ldots x_0)_{256}$ and $y = (y_7 \ldots y_0)_{256}$, compute $z = (z_7 \ldots z_0)_{256}$, where $z_j = (x_j - y_j) \bmod 256$ for $0 \le j < 8$. (See the addition operation in (87).)
89. [23] Given $x = (x_{31} \ldots x_1 x_0)_4$ and $y = (y_{31} \ldots y_1 y_0)_4$, compute $z = (z_{31} \ldots z_1 z_0)_4$, where $z_j = \lfloor x_j / y_j \rfloor$ for $0 \le j < 32$, assuming that no $y_j$ is zero.
90. [20] The bytewise averaging rule (88) always rounds downward when $x_j + y_j$ is odd. Make it less biased by rounding to the nearest odd integer in such cases.
▸ 91. [26] (Alpha channels.) Recipe (88) is a good way to compute bytewise averages, but applications to computer graphics often require a more general blending of 8-bit values. Given three octabytes $x = (x_7 \ldots x_0)_{256}$, $y = (y_7 \ldots y_0)_{256}$, $\alpha = (a_7 \ldots a_0)_{256}$, show that bitwise operations allow us to compute $z = (z_7 \ldots z_0)_{256}$, where each byte $z_j$ is a good approximation to $((255 - a_j) x_j + a_j y_j)/255$, without doing any multiplication. Implement your method with MMIX instructions.
▸ 92. [21] What happens if the second line of (88) is changed to '$z \gets (x \mid y) - z$'?
93. [18] What basic formula for subtraction is analogous to formula (89) for addition?
94. [21] Let $x = (x_7 \ldots x_1 x_0)_{256}$ and $t = (t_7 \ldots t_1 t_0)_{256}$ in (90). Can $t_j$ be nonzero when $x_j$ is nonzero? Can $t_j$ be zero when $x_j$ is zero?
95. [22] What's a bitwise way to tell if all bytes of $x = (x_7 \ldots x_1 x_0)_{256}$ are distinct?
96. [21] Explain (93), and find a similar formula that sets test flags $t_j \gets 128[x_j \le y_j]$.
97. [23] Leslie Lamport's paper in 1975 presented the following "problem taken from an actual compiler optimization algorithm": Given octabytes $x = (x_7 \ldots x_0)_{256}$ and $y = (y_7 \ldots y_0)_{256}$, compute $t = (t_7 \ldots t_0)_{256}$ and $z = (z_7 \ldots z_0)_{256}$ so that $t_j \ne 0$ if and only if $x_j \ne 0$, $x_j \ne \texttt{'*'}$, and $x_j \ne y_j$; and $z_j = (x_j = 0 \mathbin{?} y_j \mathbin{:} (x_j \ne \texttt{'*'} \land x_j \ne y_j \mathbin{?} \texttt{'*'} \mathbin{:} x_j))$.
98. [20] Given $x = (x_7 \ldots x_0)_{256}$ and $y = (y_7 \ldots y_0)_{256}$, compute $z = (z_7 \ldots z_0)_{256}$ and $w = (w_7 \ldots w_0)_{256}$, where $z_j = \max(x_j, y_j)$ and $w_j = \min(x_j, y_j)$ for $0 \le j < 8$.
▸ 99. [28] Find hexadecimal constants $a$, $b$, $c$, $d$, $e$ such that the six bitwise operations
$$y \gets x \oplus a, \qquad t \gets ((((y \mathbin{\&} b) + c) \mid y) \oplus d) \mathbin{\&} e$$
will compute the flags $t = (f_7 \ldots f_1 f_0)_{256} \ll 7$ from any bytes $x = (x_7 \ldots x_1 x_0)_{256}$, where
$f_0 = [x_0 = \texttt{'!'}]$, $f_1 = [x_1 \ne \texttt{'*'}]$, $f_2 = [x_2 < \texttt{'A'}]$, $f_3 = [x_3 > \texttt{'z'}]$, $f_4 = [x_4 \ge \texttt{'a'}]$,
$f_5 = [x_5 \in \{\texttt{'0'}, \texttt{'1'}, \ldots, \texttt{'9'}\}]$, $f_6 = [x_6 \le 168]$, $f_7 = [x_7 \in \{\texttt{'<'}, \texttt{'='}, \texttt{'>'}, \texttt{'?'}\}]$.
100. [25] Suppose $x = (x_{15} \ldots x_1 x_0)_{16}$ and $y = (y_{15} \ldots y_1 y_0)_{16}$ are binary-coded decimal numbers, where $0 \le x_j, y_j < 10$ for each $j$. Explain how to compute their sum $u = (u_{15} \ldots u_1 u_0)_{16}$ and difference $v = (v_{15} \ldots v_1 v_0)_{16}$, where $0 \le u_j, v_j < 10$ and
$$(u_{15} \ldots u_1 u_0)_{10} = ((x_{15} \ldots x_1 x_0)_{10} + (y_{15} \ldots y_1 y_0)_{10}) \bmod 10^{16},$$
$$(v_{15} \ldots v_1 v_0)_{10} = ((x_{15} \ldots x_1 x_0)_{10} - (y_{15} \ldots y_1 y_0)_{10}) \bmod 10^{16},$$
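One well-known way to add packed decimal digits with ordinary binary arithmetic is to bias every digit by 6, add, and then remove the bias from the digits that produced no carry. The sketch below follows that idea for sixteen digits; it is offered as an illustration under these assumptions and is not necessarily the method the exercise has in mind. All names are illustrative.

```python
MASK64 = (1 << 64) - 1
SIXES = 0x6666666666666666
NINES = 0x9999999999999999
CARRY_BITS = 0x1111111111111111 << 4  # bit positions 4, 8, ..., 64


def bcd_add(x, y):
    """Sum of two 16-digit packed BCD numbers, modulo 10^16."""
    a = x + SIXES                     # bias each digit by 6; no nibble overflows
    s = a + y                         # binary sum; a decimal carry is now a nibble carry
    carries = (a ^ y ^ s) & CARRY_BITS
    no_carry = carries ^ CARRY_BITS   # nibbles that still contain the bias
    fix = (no_carry >> 2) | (no_carry >> 3)  # a 6 in each such nibble
    return (s - fix) & MASK64


def bcd_sub(x, y):
    """Difference modulo 10^16, via the nines' complement of y plus one."""
    return bcd_add(bcd_add(x, NINES - y), 1)
```

Here `bcd_add(0x999, 0x1)` yields `0x1000`, and `bcd_sub(0, 1)` wraps around to sixteen nines.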
▸ 120. [M25] There are $2^{n 2^{mn}}$ functions that take $n$-bit numbers $(x_1, \ldots, x_m)$ into an $n$-bit number $f(x_1, \ldots, x_m)$. How many of them can be implemented with addition, subtraction, multiplication, and nonshift bitwise Boolean operations (modulo $2^n$)?
▸ 121. [M25] By exercise 3.1–6, a function from $[0 \mathrel{.\,.} 2^n)$ into itself is eventually periodic.
a) Prove that if $f$ is any $n$-bit broadword function that can be implemented without shift instructions, the lengths of its periods are always powers of 2.
b) However, for every $p$ between 1 and $n$, there's an $n$-bit broadword chain of length 3 that has a period of length $p$.
122. [M22] Complete the proof of Lemma B.
123. [M23] Let $a_q$ be the constant $1 + 2^q + 2^{2q} + \cdots + 2^{(q-1)q} = (2^{q^2} - 1)/(2^q - 1)$. Using (104), show that there are infinitely many $q$ such that the operation of multiplying by $a_q$, modulo $2^{q^2}$, requires $\Omega(\log q)$ steps in any $n$-bit broadword chain with $n \ge q^2$.
0
124. [M38 ℄ Complete the proof of Theorem R by dening an nbit broadword
hain
set representation
maximal
(x0 ; x1 ; : : : ; xf ) and sets (U0 ; U1 ; : : : ; Uf ) su
h that, for 0 t f , all inputs x 2 Ut lead Moody
to an essentially similar state Q(x; t), in the following sense: (i) The
urrent instru
tion Hollis
in Q(x; t) does not depend on x. (ii) If register rj has a known value in Q(x; t), it holds adja
en
y matrix
xj 0 for some denite index j 0 t. (iii) If memory lo
ation M [z ℄ has been
hanged, it
holds xz00 for some denite index z00 t. (The t
values of j 0 and z00 depend on j , z,
and t, but not on x.) Furthermore jUtj n=2 , and the program
annot guarantee
2 1
that r1 = x when t < f . Hint: Lemma B implies that a limited number of shift
amounts and memory addresses need to be
onsidered when t is small.
0
125. [M33 ℄ Prove Theorem P . Hint: Lemma B remains true if we repla
e `= 0' by
`= s ' in (103), for any values s .
126. [M46] Does the operation of extracting the most significant bit, $2^{\lambda x}$, require $\Omega(\log \log n)$ steps in an $n$-bit basic RAM? (See exercise 110.)
127. [20] Prove that if there's a way to carry out sideways addition of $n$-bit numbers in $O(\log \log n)$ broadword steps, then every symmetric function of a number's $n$ bits can also be done in $O(\log \log n)$ broadword steps.
128. [M46] Does sideways addition require $\Omega(\log n)$ broadword steps?
129. [M46] Can the parity function $(\nu x) \bmod 2$ be computed in $O(1)$ broadword steps?
130. [M46] Is there an $n$-bit constant $a$ such that the function $(a \ll x) \bmod 2^n$ requires $\Omega(\log n)$ $n$-bit broadword steps?
▸ 131. [23] Write an MMIX program for Algorithm R when the graph is represented by arc lists. Vertex nodes have at least two fields, called LINK and ARCS, and arc nodes have TIP and NEXT fields, as explained in Section 7. Initially all LINK fields are zero, except in the given set of vertices $Q$, which is represented as a circular list. Your program should change that circular list so that it represents the set $R$ of all reachable vertices.
▸ 132. [M27] A clique in a graph is a set of mutually adjacent vertices; a clique is maximal if it's not contained in any other. The purpose of this exercise is to discuss an algorithm due to J. K. M. Moody and J. Hollis, which provides a convenient way to find every maximal clique of a not-too-large graph, using bitwise operations.
Suppose $G$ is a graph with $n$ vertices $V = \{0, 1, \ldots, n-1\}$. Let $\rho_v = \sum \{2^u \mid u \mathrel{-\!\!-} v \text{ or } u = v\}$ be row $v$ of $G$'s reflexive adjacency matrix, and let $\delta_v = \sum \{2^u \mid u \ne v\} = 2^n - 1 - 2^v$. Every subset $U \subseteq V$ is representable as an $n$-bit integer $\sigma(U) = \sum_{u \in U} 2^u$; for example, $\delta_v = \sigma(V \setminus v)$. We also define the bitwise intersection
$$\tau(U) = \mathop{\&}_{0 \le u < n} (u \in U \mathbin{?} \rho_u \mathbin{:} \delta_u).$$
For example, if $n = 5$ we have $\tau(\{0, 2\}) = \rho_0 \mathbin{\&} \delta_1 \mathbin{\&} \rho_2 \mathbin{\&} \delta_3 \mathbin{\&} \delta_4$.
a) Prove that $U$ is a clique if and only if $\tau(U) = \sigma(U)$.
b) Show that if $\tau(U) = \sigma(T)$ then $T$ is a clique.
c) For $1 \le k \le n$, consider the $2^k$ bitwise intersections
$$C_k = \Bigl\{ \mathop{\&}_{0 \le u < k} (u \in U \mathbin{?} \rho_u \mathbin{:} \delta_u) \Bigm| U \subseteq \{0, 1, \ldots, k-1\} \Bigr\},$$
and let $C_k^+$ be the maximal elements of $C_k$. Prove that $U$ is a maximal clique if and only if $\sigma(U) \in C_n^+$.
d) Explain how to compute $C_k^+$ from $C_{k-1}^+$, starting with $C_0^+ = \{2^n - 1\}$.
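The iteration outlined in part (d) of exercise 132 can be sketched in a few lines. The version below is a straightforward illustration with Python integers as bit sets and a quadratic filter for maximal elements; the function name and the edge-list input format are assumptions made here, and no attempt is made at the efficiency the exercise is after.

```python
def maximal_cliques(n, edges):
    """Return sigma(U) for every maximal clique U of a graph on vertices 0..n-1."""
    full = (1 << n) - 1
    rho = [1 << v for v in range(n)]           # reflexive adjacency rows
    for u, v in edges:
        rho[u] |= 1 << v
        rho[v] |= 1 << u
    family = {full}                            # C_0^+
    for k in range(n):
        delta = full & ~(1 << k)
        candidates = {c & rho[k] for c in family} | {c & delta for c in family}
        # keep only the elements not properly contained in another candidate
        family = {c for c in candidates
                  if not any(c != d and c & d == c for d in candidates)}
    return sorted(family)
```

For the 4-cycle with edges `(0,1), (1,2), (2,3), (3,0)` the maximal cliques are the four edges, returned as bit masks.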
▸ 133. [20] Given a graph $G$, how can the algorithm of exercise 132 be used to find (a) all maximal independent sets of vertices? (b) all minimal vertex covers (sets that hit every edge)?
134. [15] Nine classes of mappings for ternary values appear in (119), (123), and (124). To which class does the representation (128) belong, if $a = 0$, $b = *$, $c = 1$?
135. [22] Łukasiewicz included a few operations besides (127) in his three-valued logic: $\neg x$ (negation) interchanges 0 with 1 but leaves $*$ unchanged; $\Diamond x$ (possibility) is defined as $\neg x \Rightarrow x$; $\Box x$ (necessity) is defined as $\neg \Diamond \neg x$; and $x \Leftrightarrow y$ (equivalence) is defined as $(x \Rightarrow y) \land (y \Rightarrow x)$. Explain how to perform these operations using representation (128).
136. [29] Suggest two-bit encodings for binary operations on the set $\{a, b, c\}$ that are defined by the following "multiplication tables":
a b
a
b
a b a
(a) b
; (b)
b a ; (
) a a
:
b a
a b
137. [21] Show that the operation in exercise 136(c) is simpler with packed vectors like (131) than with the unpacked form (130).
138. [24] Find an example of three-state-to-two-bit encoding where class $V_a$ is best.
139. [25] If $x$ and $y$ are signed bits 0, $+1$, or $-1$, what 2-bit encoding is good for calculating their sum $(z_1 z_2)_3 = x + y$, where $z_1$ and $z_2$ are also required to be signed bits? (This is a "half adder" for balanced ternary numbers.)
140. [27] Design an economical full adder for balanced ternary numbers: Show how to compute signed bits $u$ and $v$ such that $3u + v = x + y + z$ when $x, y, z \in \{0, +1, -1\}$.
▸ 141. [30] The Ulam numbers $\langle U_1, U_2, \ldots \rangle = \langle 1, 2, 3, 4, 6, 8, 11, 13, 16, 18, 26, \ldots \rangle$ are defined for $n \ge 3$ by letting $U_n$ be the smallest integer $> U_{n-1}$ that has a unique representation $U_n = U_j + U_k$ for $0 < j < k < n$. Show that a million Ulam numbers can be computed rapidly with the help of bitwise techniques.
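As a small-scale illustration of the definition in exercise 141, the sketch below keeps two bit vectors: sums that have at least one representation and sums that have at least two. It uses Python's unbounded integers as bit sets, which is convenient for checking the first few terms but is only an assumed stand-in for the word-level techniques a million terms would call for; the function name is illustrative.

```python
def ulam_numbers(count):
    """Return the first `count` Ulam numbers (small counts only)."""
    ulams = [1, 2]
    members = (1 << 1) | (1 << 2)   # bit m is set iff m is an Ulam number so far
    once = 1 << 3                   # sums with at least one representation
    twice = 0                       # sums with at least two representations
    while len(ulams) < count:
        last = ulams[-1]
        unique = (once & ~twice) >> (last + 1)
        lowest = (unique & -unique).bit_length() - 1
        nxt = last + 1 + lowest     # smallest uniquely represented sum > last
        shifted = members << nxt    # all sums nxt + (earlier Ulam number)
        twice |= once & shifted
        once |= shifted
        members |= 1 << nxt
        ulams.append(nxt)
    return ulams[:count]
```

The call `ulam_numbers(11)` reproduces the eleven terms listed in the exercise.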
▸ 142. [33] A subcube such as $*10{*}1{*}01$ can be represented by asterisk codes 10010100 and bit codes 01001001, as in (85); but many other encodings are also possible. What representation scheme for subcubes works best, for finding prime implicants by the consensus-based algorithm of exercise 7.1.1–31?
143. [20] Let $x$ be a 64-bit number that represents an $8 \times 8$ chessboard, with a 1 bit in every position where a knight is present. Find a formula for the 64-bit number $f(x)$ that has a 1 in every position reachable in one move by a knight of $x$. For example, the white knights at the start of a game correspond to $x = \texttt{\#42}$; then $f(x) = \texttt{\#a51800}$.
144. [16] What node is the sibling of node $j$ in a sideways heap? (See (134).)
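$$\mathrm{custer}'(X) = X \mathbin{\&} \overline{(X_{\rm NW} \mathbin{\&} X_{\rm N} \mathbin{\&} X_{\rm NE} \mathbin{\&} X_{\rm W} \mathbin{\&} X_{\rm E} \mathbin{\&} X_{\rm SW} \mathbin{\&} X_{\rm S} \mathbin{\&} X_{\rm SE})}.$$
Why is (157) preferable?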
165. [21] (R. A. Kirsch.) Discuss the computation of the $3 \times 3$ cellular automaton with
$$X^{(t+1)} = \mathrm{custer}(X^{(t)}) = X^{(t)} \mathbin{\&} \bigl(\bar X_{\rm N}^{(t)} \mid \bar X_{\rm W}^{(t)} \mid \bar X_{\rm E}^{(t)} \mid \bar X_{\rm S}^{(t)}\bigr).$$
166. [M23] Let $f(M, N)$ be the maximum number of black pixels in an $M \times N$ torus bitmap $X$ for which $X = \mathrm{custer}(X)$. Prove that $f(M, N) = \frac{4}{5} MN + O(M + N)$.
167. [24] (Life.) If the bitmap $X$ represents an array of cells that are either dead (0) or alive (1), the Boolean function
$$f(x_{\rm NW}, \ldots, x, \ldots, x_{\rm SE}) = [\,2 < x_{\rm NW} + x_{\rm N} + x_{\rm NE} + x_{\rm W} + \tfrac{1}{2} x + x_{\rm E} + x_{\rm SW} + x_{\rm S} + x_{\rm SE} < 4\,]$$
can lead to astonishing life histories when it governs a cellular automaton as in (158).
a) Find a way to evaluate $f$ with a Boolean chain of 26 steps or less.
b) Let $X_j^{(t)}$ denote row $j$ of $X$ at time $t$. Show that $X_j^{(t+1)}$ can be evaluated in at most 23 broadword steps, as a function of the three rows $X_{j-1}^{(t)}$, $X_j^{(t)}$, and $X_{j+1}^{(t)}$.
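The rule of exercise 167 says that a cell is alive in the next generation when it has exactly three live neighbors, or exactly two and is alive now. The sketch below applies that rule to whole rows at once by counting neighbors in four bit-sliced planes. It is a plain illustration on a torus and makes no attempt to reach the step counts asked for in parts (a) and (b); the function name and the row-list representation are assumptions.

```python
def life_step(rows, width):
    """One generation of Life on a torus; rows are integers of `width` bits."""
    mask = (1 << width) - 1

    def rotl(r):
        return ((r << 1) | (r >> (width - 1))) & mask

    def rotr(r):
        return ((r >> 1) | (r << (width - 1))) & mask

    m = len(rows)
    out = []
    for j in range(m):
        above, here, below = rows[j - 1], rows[j], rows[(j + 1) % m]
        planes = [rotl(above), above, rotr(above), rotl(here), rotr(here),
                  rotl(below), below, rotr(below)]
        s = [0, 0, 0, 0]                  # bit-sliced neighbor count, low bit first
        for p in planes:
            carry = p
            for i in range(4):            # ripple-carry increment of each column
                s[i], carry = s[i] ^ carry, s[i] & carry
        # count is 3, or count is 2 and the cell is currently alive
        out.append(s[1] & ~s[2] & ~s[3] & (s[0] | here) & mask)
    return out
```

A vertical blinker `[0, 4, 4, 4, 0]` of width 5 becomes the horizontal blinker `[0, 0, 14, 0, 0]` after one step.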
▸ 168. [23] To keep an image finite, we might insist that a $3 \times 3$ cellular automaton treats an $M \times N$ bitmap as a torus, wrapping around seamlessly between top and bottom and between left and right. The task of simulating its actions efficiently with bitwise operations is somewhat tricky: We want to minimize references to memory, yet each new pixel value depends on old values that lie on all sides. Furthermore the shifting of bits between neighboring words tends to be awkward, taxing the capacity of a register.
Show that such difficulties can be surmounted by maintaining an array of $n$-bit words $A_{jk}$ for $0 \le j \le M$ and $0 \le k \le N' = \lceil N/(n-2) \rceil$. If $j \ne M$ and $k \ne 0$, word $A_{jk}$ should contain the pixels of row $j$ and columns $(k-1)(n-2)$ through $k(n-2) + 1$, inclusive; the other words $A_{Mk}$ and $A_{j0}$ provide auxiliary buffer space. (Notice that some bits of the raster appear twice.)
169. [22] Continuing the previous two exercises, what happens to the Cheshire cat of Fig. 17(a) when it is subjected to the vicissitudes of Life, in a $26 \times 31$ torus?
▸ 170. [21] What result does the Guo–Hall thinning automaton produce when given a solid black rectangle of $M$ rows and $N$ columns? How long does it take?
171. [24] Find a Boolean chain of length 25 to evaluate the local thinning function $g(x_{\rm NW}, x_{\rm N}, x_{\rm NE}, x_{\rm W}, x_{\rm E}, x_{\rm SW}, x_{\rm S}, x_{\rm SE})$ of (159), with or without the extra cases in (160).
172. [M29] Prove or disprove: If a pattern contains three black pixels that are king-neighbors of each other, the Guo–Hall procedure extended by (160) will reduce it, unless none of those pixels can be removed without destroying the connectivity.
▸ 173. [M30] Raster images often need to be cleaned up if they contain noisy data. For example, accidental specks of black or white may well spoil the results when a thinning algorithm is used for optical character recognition.
Say that a bitmap $X$ is closed if every white pixel is part of a $2 \times 2$ square of white pixels, and open if every black pixel is part of a $2 \times 2$ square of black pixels. Let
$$X^D = \mathop{\&} \{\, Y \mid Y \supseteq X \text{ and } Y \text{ is closed} \,\}; \qquad X^L = \mathop{|} \{\, Y \mid Y \subseteq X \text{ and } Y \text{ is open} \,\}.$$
A bitmap is called clean if it equals $X^{DL}$ for some $X$. We might, for example, have
$X$ = [bitmap omitted]; $X^D$ = [bitmap omitted]; $X^{DL}$ = [bitmap omitted].
In general $X^D$ is "darker" than $X$, while $X^L$ is "lighter": $X^D \supseteq X \supseteq X^L$.
a) Prove that $(X^{DL})^{DL} = X^{DL}$. Hint: $X \subseteq Y$ implies $X^D \subseteq Y^D$ and $X^L \subseteq Y^L$.
b) Show that $X^D$ can be computed with one step of a $3 \times 3$ cellular automaton.
174. [M46] (M. Minsky and S. Papert.) Is there a three-dimensional shrinking algorithm that preserves connectivity, analogous to (161)?
175. [15] How many rookwise connected black components does the Cheshire cat have?
176. [M24 ℄ Let G be the graph whose verti
es are the bla
k pixels of a given bitmap X , kingwise
onne
ted
with u v when u and v are a king move apart. Let G0 be the
orresponding graph rookwise
onne
ted
after the shrinking transformation (161) has been applied. The purpose of this exer
ise surroundedness tree
hyperbola
is to show that the number of
onne
ted
omponents of G0 is the number of
omponents
oni
of G minus the number of isolated verti
es of G. monotoni
parts
threeregister method
Let N(i;j) = f(i; j ); (i 1; j ); (i 1; j +1); (i; j +1)g be pixel (i; j ) together with its Rote
north and/or east neighbors. For ea
h v 2 G let S (v) = fv0 2 G0 j v0 2 Nv g. straight line
a) Prove that S (v) is empty if and only if v is isolated in G. Bezier splines+
squines
b) If u v in0 G, u0 0 2 S (0u),0 and v0 2 S (v),0 prove that u0 0 0 v0 in G0 . parabola
) For ea
h v 2 G let S (v ) = fv 2 G j v 2 Nv g. Is S (v ) always nonempty?
d) If u0 v0 in G0 , u 2 S 0 (u0 ), and v 2 S 0 (v0 ), prove that u v in G.
e) Hen
e there's a onetoone
orresponden
e between the nontrivial
omponents
of G and the
omponents of G0 .
177. [M22] Continuing exercise 176, prove an analogous result for the white pixels.
178. [20 ℄ If X is an M N bitmap,
let X be the M (2N + 1) bitmap
X z (X j (X 1)). Show that the
kingwise
onne
ted
omponents of
X are also rookwise
onne
ted, and
that bitmap X has the same \sur
roundedness tree" (162) as X .
▸ 179. [34] Design an algorithm that constructs the surroundedness tree of a given $M \times N$ bitmap, scanning the image one row at a time as discussed in the text. (See (162) and (163).)
▸ 180. [M24] Digitize the hyperbola $y^2 = x^2 + 13$ by hand, for $0 < y \le 7$.
181. [HM20] Explain how to subdivide a general conic (168) with rational coefficients into monotonic parts so that Algorithm T applies.
182. [M31] Why does the three-register method (Algorithm T) digitize correctly?
▸ 183. [M29] (G. Rote.) Explain why Algorithm T might fail if condition (v) is false.
▸ 184. [M22] Find a quadratic form $Q'(x, y)$ so that, when Algorithm T is applied to $(x', y')$, $(x, y)$, and $Q'$, it produces exactly the same edges as it does from $(x, y)$, $(x', y')$, and $Q$, but in the reverse order.
▸ 185. [22] Design an algorithm that properly digitizes a straight line from $(\xi, \eta)$ to $(\xi', \eta')$, when $\xi$, $\eta$, $\xi'$, and $\eta'$ are rational numbers, by simplifying Algorithm T.
186. [HM22] Given three complex numbers $(z_0, z_1, z_2)$, consider the curve traced out by
a string of up to six bytes $\alpha(x) = \alpha_1 \ldots \alpha_l$ in the following way: If $x < 2^7$, set $l \gets 1$ and $\alpha_1 \gets x$. Otherwise let $x = (x_5 \ldots x_1 x_0)_{64}$; set $l \gets \lceil (\lambda x)/5 \rceil$, $\alpha_1 \gets 2^8 - 2^{8-l} + x_{l-1}$, and $\alpha_j \gets 2^7 + x_{l-j}$ for $2 \le j \le l$. Notice that $\alpha(x)$ contains a zero byte if and only if $x = 0$.
a) What are the encodings of #a, #3a3, #7b97, and #1D141?
b) If $x < x'$, prove that $\alpha(x) < \alpha(x')$ in lexicographic order.
c) Suppose a sequence of values $x^{(1)} x^{(2)} \ldots x^{(n)}$ has been encoded as a byte string $\alpha(x^{(1)}) \alpha(x^{(2)}) \ldots \alpha(x^{(n)})$, and let $\alpha_k$ be the $k$th byte in that string. Show that it's easy to determine the value $x^{(i)}$ from which $\alpha_k$ came, by looking at a few of the neighboring bytes if necessary.
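The encoding rule just stated translates almost line by line into code. The sketch below is a direct transcription under the assumption that $\lambda x$ means $\lfloor \lg x \rfloor$, computed here with `bit_length`; the function name `encode` is illustrative.

```python
def encode(x):
    """Byte string alpha(x) for 0 <= x < 2**31, following the rule stated above."""
    if x < 0x80:
        return bytes([x])
    lam = x.bit_length() - 1                 # lambda x = floor(lg x)
    length = -(-lam // 5)                    # ceil(lambda x / 5)
    digits = [(x >> (6 * i)) & 63 for i in range(length)]   # x_0, x_1, ...
    first = 256 - (1 << (8 - length)) + digits[length - 1]
    rest = [128 + digits[length - j] for j in range(2, length + 1)]
    return bytes([first] + rest)
```

For codepoints that Python accepts, the result can be compared with `chr(x).encode("utf-8")`.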
197. [22] The Universal Character Set (UCS), also known as Unicode, is a standard mapping of characters to integer codepoints $x$ in the range $0 \le x < 2^{20} + 2^{16}$. An encoding called UTF-16 represents such integers as one or two wydes $\beta(x) = \beta_1$ or $\beta(x) = \beta_1 \beta_2$, in the following way: If $x < 2^{16}$ then $\beta(x) = x$; otherwise $\beta_1 = \texttt{\#d800} + \lfloor y/2^{10} \rfloor$ and $\beta_2 = \texttt{\#dc00} + (y \bmod 2^{10})$, where $y = x - 2^{16}$. Answer questions (a), (b), and (c) of exercise 196 for this encoding.
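The UTF-16 rule of exercise 197 is short enough to state as code. This is a direct transcription of the definition above, with an illustrative function name; it does not check that the input avoids the surrogate range.

```python
def utf16_wydes(x):
    """One or two 16-bit wydes representing codepoint x < 2**20 + 2**16."""
    if x < 0x10000:
        return [x]
    y = x - 0x10000
    return [0xD800 + (y >> 10), 0xDC00 + (y & 0x3FF)]
```

For example, `utf16_wydes(0x1D141)` gives `[0xD834, 0xDD41]`.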
▸ 198. [21] Unicode characters are often represented as strings of bytes using a scheme called UTF-8, which is the encoding of exercise 196 restricted to integers in the range $0 \le x < 2^{20} + 2^{16}$. Notice that UTF-8 efficiently preserves the standard ASCII character set (the codepoints with $x < 2^7$), and that it is quite different from UTF-16.
Let $\alpha_1$ be the first byte of a UTF-8 string $\alpha(x)$. Show that there are reasonably small integer constants $a$, $b$, and $c$ such that only four bitwise operations
$$(a \gg ((\alpha_1 \gg b) \mathbin{\&} c)) \mathbin{\&} 3$$
suffice to determine the number $l - 1$ of bytes between $\alpha_1$ and the end of $\alpha(x)$.
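A formula of this shape can be checked mechanically against a reference encoder. The constants below are one choice that passes such a check; they are an assumption made for illustration and need not be the ones the exercise intends.

```python
# Assumed constants, chosen so that the four-operation formula can be tested.
A, B, C = 0xE5000000, 3, 0x1E


def trailing_bytes(first_byte):
    """Number of bytes that follow the first byte of a UTF-8 sequence."""
    return (A >> ((first_byte >> B) & C)) & 3
```

The idea is that `(first_byte >> 3) & 0x1E` is twice the high nybble of the byte, so the constant `A` acts as a 16-entry table of 2-bit answers.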
▸ 199. [23] A person might try to encode #a as #c08a or #e0808a or #f080808a in UTF-8, because the obvious decoding algorithm produces the same result in each case. But such unnecessarily long forms are illegal, because they could lead to security holes.
Suppose $\alpha_1$ and $\alpha_2$ are bytes such that $\alpha_1 \ge \texttt{\#80}$ and $\texttt{\#80} \le \alpha_2 < \texttt{\#c0}$. Find a branchless way to decide whether $\alpha_1$ and $\alpha_2$ are the first two bytes of at least one legitimate UTF-8 string $\alpha(x)$.
200. [20] Interpret the contents of register $3 after the following three MMIX instructions have been executed: MOR $1,$0,#94; MXOR $2,$0,#94; SUBU $3,$2,$1.
201. [20] Suppose $x = (x_{15} \ldots x_1 x_0)_{16}$ has sixteen hexadecimal digits. What one MMIX instruction will change each nonzero digit to f, while leaving zeros untouched?
202. [20] What two instructions will change an octabyte's nonzero wydes to #ffff?
203. [22] Suppose we want to convert a tetrabyte $x = (x_7 \ldots x_1 x_0)_{16}$ to the octabyte $y = (y_7 \ldots y_1 y_0)_{256}$, where $y_j$ is the ASCII code for the hexadecimal digit $x_j$. For example, if $x = \texttt{\#1234abcd}$, $y$ should represent the 8-character string "1234abcd". What clever choices of five constants $a$, $b$, $c$, $d$, and $e$ will make the following MMIX instructions do the job?
    MOR t,x,a; SLU s,t,4; XOR t,s,t; AND t,t,b;
    ADD t,t,c; MOR s,d,t; ADD t,t,e; ADD y,t,s.
▸ 204. [22] What are the amazing constants $p$, $q$, $r$, $m$ that achieve a perfect shuffle with just six MMIX commands? (See (175)–(178).)
▸ 205. [22] How would you perfectly unshuffle on MMIX, going from $w$ in (175) back to $z$?
206. [20] The perfect shuffle (175) is sometimes called an "outshuffle," by comparison with the "inshuffle" that takes $z \mapsto y \ddagger x = (y_{31} x_{31} \ldots y_1 x_1 y_0 x_0)_2$; the outshuffle preserves the leftmost and rightmost bits of $z$, but the inshuffle has no fixed points. Can an inshuffle be performed as efficiently as an outshuffle?
207. [22] Use MOR to perform a 3-way perfect shuffle or "triple zip," taking $(x_{63} \ldots x_0)_2$ to $(x_{21} x_{42} x_{63} x_{20} \ldots x_2 x_{23} x_{44} x_1 x_{22} x_{43} x_0)_2$, as well as the inverse of this shuffle.
▸ 208. [23] What's a fast way for MMIX to transpose an $8 \times 8$ Boolean matrix?
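For comparison with whatever MMIX-specific answer exercise 208 has in mind, a machine-independent way to transpose an $8 \times 8$ bit matrix uses three rounds of masked swaps. The sketch below assumes that bit $8r + c$ of the 64-bit word holds the entry in row $r$ and column $c$; the function name is illustrative.

```python
def transpose8x8(x):
    """Transpose an 8x8 Boolean matrix stored with entry (r, c) at bit 8*r + c."""
    # swap the off-diagonal bits inside every 2x2 block
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AA
    x ^= t ^ (t << 7)
    # swap the off-diagonal 2x2 blocks inside every 4x4 block
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCC
    x ^= t ^ (t << 14)
    # swap the two off-diagonal 4x4 blocks
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0
    x ^= t ^ (t << 28)
    return x
```

Applying the function twice returns the original word, as a transpose should.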
▸ 209. [21] Is the suffix parity operation $x^{\oplus}$ of exercise 36 easy to compute with MXOR?
210. [22] A puzzle: Register x contains a number $8j + k$, where $0 \le j, k < 8$. Registers a and b contain arbitrary octabytes $(a_7 \ldots a_1 a_0)_{256}$ and $(b_7 \ldots b_1 b_0)_{256}$. Find a sequence of four MMIX instructions that will put $a_j \mathbin{\&} b_k$ into register x.
▸ 211. [M25] The truth table of a Boolean function $f(x_1, \ldots, x_6)$ is essentially a 64-bit number $f = (f(0,0,0,0,0,0) \ldots f(1,1,1,1,1,0) f(1,1,1,1,1,1))_2$. Show that two MOR instructions will convert $f$ to the truth table of $\hat f$, the least monotone Boolean function that is greater than or equal to $f$ at each point.
212. [M32] Suppose $a = (a_{63} \ldots a_1 a_0)_2$ represents the polynomial
xP (
)P (d) = xP (
)bxP (
)de = xbx
ebxdS(
)e, be
ause bxye = bxT yT e for any
bran
hing fun
tion xT . Similarly xP (
)P (d)P (e) = xbx
ebxdS(
)ebxeS(d)S(
)e,
et
. After dis
arding equal terms we obtain the desired form. The resulting numbers
pj are unique be
ause they are the only values of x at whi
h the fun
tion
hanges sign.
0 0 0
0
(d) We0
have,0 for example,
0
x bx ae bx be bx
e = xP (a )P (b )P (
) where
0 0
a = a, b = b P ( a ), and
= bP ( a ) P ( b ) .
[The theory of animating functions was developed by J. H. Conway in Chapter 13 of his book On Numbers and Games (1976), inspired by previous work of C. P. Welter in Indagationes Math. 14 (1952), 304–314; 16 (1954), 194–200.]
17. (Solution by M. Slanina.) Such equations are decidable even if we also allow operations such as $x \mathbin{\&} y$, $\bar x$, $x \ll 1$, $x \gg 1$, $2^{\rho x}$, and $2^{\lambda x}$, and even if we allow Boolean combinations of statements and quantifications over integer variables, by translating them into formulas of second-order monadic logic with one successor (S1S). Each 2-adic variable $x = (\ldots x_2 x_1 x_0)_2$ corresponds to an S1S set variable $X$, where $j \in X$ means $x_j = 1$:
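$z = \bar x$ becomes $\forall t\,(t \in Z \Leftrightarrow t \notin X)$;
$z = x \mathbin{\&} y$ becomes $\forall t\,(t \in Z \Leftrightarrow (t \in X \land t \in Y))$;
$z = 2^{\rho x}$ becomes $\forall t\,(t \in Z \Leftrightarrow (t \in X \land \forall s\,(s < t \Rightarrow s \notin X)))$;
$z = x + y$ becomes $\exists C\, \forall t\,(0 \notin C \land (t \in Z \Leftrightarrow (t \in X) \oplus (t \in Y) \oplus (t \in C)) \land (t + 1 \in C \Leftrightarrow \langle (t \in X)(t \in Y)(t \in C) \rangle))$.
An identity such as $x \mathbin{\&} (-x) = 2^{\rho x}$ is equivalent to the translation of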
p IS $0; pp IS $1; m IS $2; mm IS $3; q IS $4; s IS $5
    LDOU q,qbase,0; LDA pp,qbase,8
    SET p,13; NEG m,13*13,n; SRU q,q,6     Begin with p = 13.
1H  SR m,m,1                               m ← ⌊(p² − N)/2⌋.
2H  SR mm,m,3; LDOU s,qtop,mm
    AND t,m,#3f; SLU t,one,t
    ANDN s,s,t; STOU s,qtop,mm             Zero out a bit.
    ADD m,m,p; PBN m,2B                    Advance by p bits.
    SRU q,q,1; PBNZ q,3F                   Move to next potential prime.
2H  LDOU q,pp,0; INCL pp,8                 Read in another batch
    OR p,p,#7f; PBNZ q,3F                    of potential primes.
    ADD p,p,2; JMP 2B                      Skip past 128 nonprimes.
2H  SRU q,q,1
3H  ADD p,p,2; PBEV q,2B                   Set p ← p + 2 until p is prime.
    MUL m,p,p; SUB m,m,n; PBN m,1B         Repeat until p² > N.
The running time, $1172\mu + 5166\upsilon$, is of course much less than the time needed for steps P1–P8 of Program 1.3.2P, namely $10037\mu + 641543\upsilon$ (improved to $10096\mu + 215351\upsilon$ in exercise 1.3.2–14). [See P. Pritchard, Science of Computer Programming 9 (1987), 17–35, for several instructive variations. In practice, a program like this one tends to slow down dramatically when the sieve is too big for the computer's cache. Better results are obtained by working with a segmented sieve, which contains bits for numbers between $N_0 + k\delta$ and $N_0 + (k+1)\delta$, as suggested by L. J. Lander and T. R. Parkin, Math. Comp. 21 (1967), 483–488; C. Bays and R. H. Hudson, BIT 17 (1977), 121–127. Here $N_0$ can be quite large, but $\delta$ is limited by the cache size; calculations are done separately for $k = 0, 1, \ldots$. Segmented sieves have become highly developed; see, for example, T. R. Nicely, Math. Comp. 68 (1999), 1311–1315, and the references cited there. The author used such a program in 2006 to discover an unusually large gap of length 1370 between 418032645936712127 and the next larger prime.]
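The segmented idea described above can be sketched as follows. This is only an illustration in Python, not the MMIX program of the answer: it uses a plain list of flags per segment rather than packed bits, it does not skip even numbers, and the names `segmented_primes`, `n0`, `delta`, and `segments` are hypothetical.

```python
from math import isqrt

def segmented_primes(n0, delta, segments):
    """Yield the primes in [n0, n0 + segments*delta), one segment at a time."""
    top = n0 + segments * delta
    limit = isqrt(top) + 1
    # Ordinary sieve for the "small" primes that are needed as divisors.
    small = [True] * (limit + 1)
    small[0] = small[1] = False
    for p in range(2, isqrt(limit) + 1):
        if small[p]:
            for m in range(p * p, limit + 1, p):
                small[m] = False
    base_primes = [p for p, is_p in enumerate(small) if is_p]
    # Each segment covers [lo, lo + delta); only this array must fit in cache.
    for k in range(segments):
        lo = n0 + k * delta
        sieve = [True] * delta
        for p in base_primes:
            start = max(p * p, (lo + p - 1) // p * p)
            for m in range(start, lo + delta, p):
                sieve[m - lo] = False
        for i, is_p in enumerate(sieve):
            if is_p and lo + i >= 2:
                yield lo + i
```

For example, `list(segmented_primes(100, 7, 3))` examines the three windows [100, 107), [107, 114), and [114, 121) separately.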
25. (1 + 1 + 25 + 1 + 1 + 25 + 1 + 1 = 56) mm; the worm never sees pages 2–500 of Volume 1 or 1–499 of Volume 4. (Unless the books have been placed in little-endian fashion on the bookshelf; then the answer would be 106 mm.) This classic brain-teaser can be found in Sam Loyd's Cyclopedia (New York: 1914), pages 327 and 383.
26. We could multiply by #aa...ab instead of dividing by 12 (see exercise 1.3.1–17); but multiplication is slow too. Instead we can use a scheme that is neither big-endian nor little-endian but transposed: Put item $k$ into octabyte $8(k \bmod 2^{20})$, where it is shifted left by $5\lfloor k/2^{20} \rfloor$. Since $k < 12000000$, the amount of shift is always less than 60. The MMIX code to put item $k$ into register $1 is AND $0,k,[#fffff]; SLU $0,$0,3; LDOU $1,base,$0; SRU $0,k,20; 4ADDU $0,$0,$0; SRU $1,$1,$0; AND $1,$1,#1f.
[This solution uses 8 large megabytes ($2^{23}$ bytes). Any convenient scheme for converting item numbers to octabyte addresses and shift amounts will work, as long as the same method is used consistently. Of course, just `LDBU $1,base,k' would be faster.]
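The transposed layout can be mimicked in Python as a rough sketch. A list of integers stands in for the array of octabytes, and the helper names `put_item` and `get_item` are hypothetical; only the address and shift computations follow the answer.

```python
SHIFT = 20
ADDR_MASK = (1 << SHIFT) - 1          # selects k mod 2^20

def put_item(table, k, value):
    """Store a 5-bit value as item k (0 <= k < 12000000)."""
    idx = k & ADDR_MASK               # which octabyte
    sh = 5 * (k >> SHIFT)             # shift amount, always less than 60
    table[idx] = (table[idx] & ~(0x1F << sh)) | ((value & 0x1F) << sh)

def get_item(table, k):
    """Fetch item k: index, shift right, mask, as in the MMIX sequence."""
    return (table[k & ADDR_MASK] >> (5 * (k >> SHIFT))) & 0x1F
```

Twelve 5-bit items share each octabyte, so no division by 12 is ever needed.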
27. (a) $((x - 1) \oplus x) + x$. [This exercise is based on an idea of Luther Woodrum, who noticed that $((x - 1) \mid x) + 1 = (x \& -x) + x$.]
(b) $(y + x) \mid y$, where $y = (x - 1) \oplus x$.
(c,d,e) $((y \oplus x) + x) \& y$, $((y \oplus x) + x) \oplus y$, and $((y \oplus x) + x) \& \bar{y}$, where $y = x - 1$.
(f) $x \oplus (a)$; alternatively, $y \oplus (y+1)$, where $y = x \mid (x - 1)$. [The number $(0^\infty 0 1^a 1 1^b)_2$ looks simpler, but it apparently requires five operations: $((y + 1) \& \bar{y}) - 1$.]
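A quick way to see what these formulas do is to apply them to $x = (\alpha 0 1^a 1 0^b)_2$ and inspect the results. The sketch below assumes that form of $x$ and uses hypothetical helper names (`make_x`, `answer27`); the bit patterns named in the comments were derived from the formulas themselves, not quoted from the exercise.

```python
def make_x(alpha, a, b):
    """Return the integer (alpha 0 1^a 1 0^b)_2."""
    x = alpha << 1                    # append 0
    x = (x << a) | ((1 << a) - 1)     # append 1^a
    x = (x << 1) | 1                  # append 1
    return x << b                     # append 0^b

def answer27(x):
    a_ = ((x - 1) ^ x) + x            # (alpha 1 0^a 0 1^b)
    y = (x - 1) ^ x
    b_ = (y + x) | y                  # (alpha 1 0^a 1 1^b)
    y = x - 1
    s = (y ^ x) + x
    c_ = s & y                        # (alpha 0 0^a 0 1^b)
    d_ = s ^ y                        # (1 1^a 0 0^b)
    e_ = s & ~y                       # (1 0^a 0 0^b)
    f_ = x ^ a_                       # (1 1^a 1 1^b)
    y = x | (x - 1)
    f_alt = y ^ (y + 1)               # same value as f_
    return a_, b_, c_, d_, e_, f_, f_alt
```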
28. A 1 bit indicates $x$'s rightmost 0 (for example, $(101011)_2 \mapsto (000100)_2$); $-1 \mapsto 0$.
29. $\mu_k = \mu_{k+1} \oplus (\mu_{k+1} \ll 2^k)$ [see STOC 6 (1974), 125]. This relation holds also for the constants $\mu_{d,k}$ of (48), when $0 \le k < d$, if we start with $\mu_{d,d} = 2^{2^d} - 1$. (There is, however, no easy way to go from $\mu_k$ to $\mu_{k+1}$, unless we use the "zip" operation; see (77).)
30. Append `CSZ rho,x,64' to (50), thereby adding $1\upsilon$ to its execution time. For (51), we simply need to make sure that rhotab[0] = 8.
31. In the first place, his code loops forever when $x = 0$. But even after that bug is patched, his assumption that $x$ is a random integer is highly questionable. In most applications when we want to compute $\rho x$ for a nonzero 64-bit number $x$, a more reasonable assumption would be that each of the outcomes $\{0, 1, \ldots, 63\}$ is equally likely. The average and standard deviation then become 31.5 and $\approx 18.5$.
32. `NEGU y,x; AND y,x,y; MULU y,debruijn,y; SRU y,y,58; LDB rho,decode,y' has estimated cost $\mu + 14\upsilon$, although multiplication by a power of 2 might well be faster than a typical multiplication. Add $1\upsilon$ for the correction in answer 30.
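The five-instruction sequence translates almost line for line into Python. In this sketch the multiplier `DEBRUIJN` is an assumption (a 64-bit de Bruijn constant that begins with six zero bits; the excerpt does not show the value of `debruijn`), and the `DECODE` table is built from it at start-up rather than taken from the text.

```python
MASK64 = (1 << 64) - 1
DEBRUIJN = 0x03F79D71B4CA8B09         # assumed constant; see the note above

# decode[j] = k when the top six bits of DEBRUIJN << k equal j
DECODE = [0] * 64
for k in range(64):
    DECODE[((DEBRUIJN << k) & MASK64) >> 58] = k

def rho(x):
    """Ruler function of a 64-bit x; returns 64 for x = 0, as in answer 30."""
    if x == 0:
        return 64
    y = x & -x                        # NEGU y,x; AND y,x,y  (isolate lowest 1)
    return DECODE[((DEBRUIJN * y) & MASK64) >> 58]   # MULU; SRU 58; LDB
```

Because `y` is a power of 2, the multiplication is just a left shift of the constant, which is why the text remarks that it might be faster than a typical multiplication.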
33. In fact, an exhaustive calculation shows that exactly 94727 suitable constants $a$ yield a "perfect hash function" for this problem, 90970 of which also identify the power-of-two cases $y = 2^j$; 90918 of those also distinguish the case $y = 0$. The multiplier #208b2430c8c82129 is uniquely best, in the sense that it doesn't need to refer to table entries above decode[32400] when $y$ is known to be a valid input.
he
Shallit
34. Identities (a) and (b) are obviously true, also when xy = 0. Proof of (
): If x = y
we have either x = y = 0 or x = 10k and y = 10k+2 k ; hen
e x y = ( )01k =
(x 1) (y 1). If x > y = k we have (x y) mod 2 6= ((x 1) (y 1)) mod 2k+2 . Knuth
35. Let $f(x) = x \oplus 3x$. Clearly $f(2x) = 2f(x)$, and $f(4x + 1) = 4f(x) + 2$. We also have $f(4x - 1) = 4f(x) + 2$, by exercise 34(c). The hinted identity follows.
Given $n$, set $u \gets n \gg 1$, $v \gets u + n$, $t \gets u \oplus v$, $n^+ \gets v \& t$, and $n^- \gets u \& t$. Clearly $u = \lfloor n/2 \rfloor$ and $v = \lfloor 3n/2 \rfloor$, so $n^+ - n^- = v - u = n$. And this is Reitwiesner's representation, because $n^+ \mid n^-$ has no consecutive 1s. [H. Prodinger, Integers 0 (2000), paper a8, 14 pp. Incidentally we also have $f(-x) = f(x)$.]
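As a small illustration, the five assignments can be transcribed directly. The function names `reitwiesner` and `f` are hypothetical; Python's unbounded integers stand in for the registers.

```python
def reitwiesner(n):
    """Return (n_plus, n_minus) with n = n_plus - n_minus and no two
    adjacent nonzero digits in the signed-digit representation."""
    u = n >> 1                        # floor(n/2)
    v = u + n                         # floor(3n/2)
    t = u ^ v
    return v & t, u & t

def f(x):
    """The function x XOR 3x used in the first paragraph."""
    return x ^ (3 * x)
```

For example, `reitwiesner(7)` returns `(8, 1)`, that is, 7 = 8 − 1.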
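36. (i) The commands $x \gets x \oplus (x \ll 1)$, $x \gets x \oplus (x \ll 2)$, $x \gets x \oplus (x \ll 4)$, $x \gets x \oplus (x \ll 8)$, $x \gets x \oplus (x \ll 16)$, $x \gets x \oplus (x \ll 32)$ change $x$ to $x^{\oplus}$. (ii) $x^{\&} = ((x + 1) \& \bar{x}) - 1$. (See exercises 66, 70, and 117 for applications of $x^{\oplus}$; see also exercise 209.)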
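Both parts can be tried out with 64-bit words emulated by masking. This sketch assumes that bit $j$ of $x^{\oplus}$ is the XOR of bits $0$ through $j$ of $x$, and that $x^{\&}$ is the analogous running AND (which leaves just the trailing 1s); the names `prefix_xor` and `prefix_and` are hypothetical.

```python
MASK64 = (1 << 64) - 1

def prefix_xor(x):
    """Six shift-and-XOR steps: bit j becomes x_j ^ x_(j-1) ^ ... ^ x_0."""
    for s in (1, 2, 4, 8, 16, 32):
        x ^= (x << s) & MASK64
    return x

def prefix_and(x):
    """((x + 1) & ~x) - 1: keeps exactly the trailing 1s of x."""
    return (((x + 1) & ~x) - 1) & MASK64
```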
37. Insert `CSZ y,x,half' after the FLOTU in (55), where half = #3fe0000000000000; note that (55) says `SR' (not `SRU'). No change is needed to (56), if lamtab[0] = −1.
38. `SRU t,x,1; OR y,x,t; SRU t,y,2; OR y,y,t; SRU t,y,4; OR y,y,t; ...; SRU t,y,32; OR y,y,t; SRU y,y,1; CMPU t,x,0; ADDU y,y,t' takes $15\upsilon$.
39. (Solution by H. S. Warren, Jr.) Let $\sigma(x)$ denote the result of smearing $x$ to the right, as in the first line of (57). Compute $x \& \sigma((x \gg 1) \& \bar{x})$.
40. Suppose $\lambda x = \lambda y = k$. If $x = y = 0$, (58) certainly holds, regardless of how we define $\lambda 0$. Otherwise $x = (1\alpha)_2$ and $y = (1\beta)_2$, for some binary strings $\alpha$ and $\beta$ with $|\alpha| = |\beta| = k$; and $x \oplus y < 2^k \le x \& y$. On the other hand if $\lambda x < \lambda y = k$, we have $x \oplus y \ge 2^k > x \& y$. And H. S. Warren, Jr., notes that $\lambda x < \lambda y$ if and only if $x < y \& \bar{x}$.
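The two tests in this answer are easy to check by brute force. The sketch below assumes that (58) is the criterion $x \oplus y \le x \& y$, as the proof suggests; the function names are hypothetical, and $\lambda$ is computed with Python's `bit_length` for comparison.

```python
def same_lambda(x, y):
    """Assumed form of (58): lambda x == lambda y iff x ^ y <= x & y."""
    return (x ^ y) <= (x & y)

def lambda_less(x, y):
    """Warren's test: lambda x < lambda y iff x < (y & ~x)."""
    return x < (y & ~x)

def lam(x):
    """Reference value: floor(lg x) for x > 0."""
    return x.bit_length() - 1
```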
41. (a) $\sum_{n=1}^{\infty} \rho(n) z^n = \sum_{k=1}^{\infty} z^{2^k}/(1 - z^{2^k}) = z/(1 - z) - \sum_{k=0}^{\infty} z^{2^k}/(1 + z^{2^k})$. The Dirichlet generating function is simpler: $\sum_{n=1}^{\infty} \rho(n)/n^z = \zeta(z)/(2^z - 1)$.
(b) $\sum_{n=1}^{\infty} \lambda(n) z^n = \sum_{k=1}^{\infty} z^{2^k}/(1 - z)$.
(c) $\sum_{n=1}^{\infty} \nu(n) z^n = \sum_{k=0}^{\infty} z^{2^k}/((1 - z)(1 + z^{2^k})) = \sum_{k=0}^{\infty} z^{2^k} \mu_k(z)$, where $\mu_k(z) = (1 + z + \cdots + z^{2^k - 1})/(1 - z^{2^{k+1}})$. (The "magic masks" of (47) correspond to $\mu_k(2)$.)
[See Automatic Sequences by J.-P. Allouche and J. Shallit (2003), Chapter 3, for further information about the functions $\rho$ and $\nu$, which they denote by $\nu_2$ and $s_2$.]
42. $e_1 2^{e_1 - 1} + (e_2 + 2) 2^{e_2 - 1} + \cdots + (e_r + 2r - 2) 2^{e_r - 1}$, by induction on $r$. [D. E. Knuth, Proc. IFIP Congress (1971), 1, 19–27. The fractal aspects of this sum are illustrated in Figs. 3.1 and 3.2 of the book by Allouche and Shallit.]
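The closed form is easy to test numerically. This sketch assumes that the sum in question is $S(n) = \sum_{0 \le k < n} \nu k$ and that $n = 2^{e_1} + \cdots + 2^{e_r}$ with $e_1 > \cdots > e_r \ge 0$; both assumptions come from context (answer 59 refers to $S(n)$), not from the exercise statement itself.

```python
def S_direct(n):
    """Sum of the sideways sums of 0, 1, ..., n-1."""
    return sum(bin(k).count("1") for k in range(n))

def S_formula(n):
    """Sum over i of (e_i + 2(i-1)) * 2^(e_i - 1), exponents decreasing."""
    exps = [e for e in range(n.bit_length() - 1, -1, -1) if (n >> e) & 1]
    # Doubling each term keeps the arithmetic in integers; the total is even.
    return sum((e + 2 * i) * 2 ** e for i, e in enumerate(exps)) // 2
```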
43. The straightforward implementation of (63), `SET nu,0; SET y,x; BZ y,Done; 1H ADD nu,nu,1; SUBU t,y,1; AND y,y,t; PBNZ y,1B' costs $(5 + 4\nu x)\upsilon$; it beats the implementation of (62) when $\nu x < 4$, ties when $\nu x = 4$, and loses when $\nu x > 4$.
But we can save $4\upsilon$ from the implementation of (62) if we replace the final multiplication-and-shift by `$y \gets y + (y \gg 8)$, $y \gets y + (y \gg 16)$, $y \gets y + (y \gg 32)$, $\nu \gets y \&$ #ff'. [Of course, MMIX's single instruction `SADD nu,x,0' is much better.]
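The two approaches compared here can be sketched side by side. The loop follows the MMIX code quoted for (63). The first three lines of `nu_sideways` are an assumption about what (62) looks like (the usual divide-and-conquer reduction to byte counts), while the last four steps are the replacement described in this answer.

```python
MU0 = 0x5555555555555555
MU1 = 0x3333333333333333
MU2 = 0x0F0F0F0F0F0F0F0F

def nu_loop(x):
    """Clear the rightmost 1 bit repeatedly; one pass per 1 bit."""
    nu = 0
    while x:
        nu += 1
        x &= x - 1
    return nu

def nu_sideways(x):
    """Byte counts first (assumed form of (62)), then shift-and-add."""
    y = x - ((x >> 1) & MU0)
    y = (y & MU1) + ((y >> 2) & MU1)
    y = (y + (y >> 4)) & MU2
    y = y + (y >> 8)
    y = y + (y >> 16)
    y = y + (y >> 32)
    return y & 0xFF
```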
44. Let this sum be $\nu^{(2)} x$. If we can solve the problem for $2^d$-bit numbers, we can solve it for $2^{d+1}$-bit numbers, because $\nu^{(2)}(2^{2^d} x + x') = \nu^{(2)} x + \nu^{(2)} x' + 2^d \nu x$. Therefore a solution analogous to (62) suggests itself, on a 64-bit machine:
    Set $z \gets (x \gg 1) \& \mu_0$ and $y \gets x - z$.
    Set $z \gets ((z + (z \gg 2)) \& \mu_1) + ((y \& \bar{\mu}_1) \gg 1)$ and $y \gets (y \& \mu_1) + ((y \gg 2) \& \mu_1)$.
    Set $z \gets ((z + (z \gg 4)) \& \mu_2) + ((y \& \bar{\mu}_2) \gg 2)$ and $y \gets (y + (y \gg 4)) \& \mu_2$.
    Finally $\nu^{(2)} \gets (((Az) \bmod 2^{64}) \gg 56) + ((((By) \bmod 2^{64}) \gg 56) \ll 3)$,
where $A = (11111111)_{256}$ and $B = (01234567)_{256}$.
But another approach is better on MMIX, which has sideways addition built in:
    SADD nu2,x,m0      SADD t,x,m2        8ADDU nu2,t,nu2    SADD t,x,m5
    SADD t,x,m1        4ADDU nu2,t,nu2    SADD t,x,m4        SLU t,t,5
    2ADDU nu2,t,nu2    SADD t,x,m3        16ADDU nu2,t,nu2   ADD nu2,nu2,t
[In general, $\nu^{(2)} x = \sum_k 2^k \nu(x \& \bar{\mu}_k)$. See Dr. Dobb's Journal 8,4 (April 1983), 24–37.]
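Both methods can be checked against a direct computation. The sketch below assumes that $\nu^{(2)} x$ means the sum of the positions of the 1 bits of $x$, and that $\mu_k = (2^{64} - 1)/(2^{2^k} + 1)$; the function names are hypothetical.

```python
M64 = (1 << 64) - 1
MU = [M64 // ((1 << (1 << k)) + 1) for k in range(6)]   # magic masks mu_0..mu_5
A = 0x0101010101010101                                  # (11111111)_256
B = 0x0001020304050607                                  # (01234567)_256

def nu2_direct(x):
    """Sum of the positions k for which bit k of x is 1."""
    return sum(k for k in range(64) if (x >> k) & 1)

def nu2_sideways(x):
    """z holds position sums within fields, y holds bit counts per field."""
    z = (x >> 1) & MU[0]
    y = x - z
    z = ((z + (z >> 2)) & MU[1]) + ((y & ~MU[1] & M64) >> 1)
    y = (y & MU[1]) + ((y >> 2) & MU[1])
    z = ((z + (z >> 4)) & MU[2]) + ((y & ~MU[2] & M64) >> 2)
    y = (y + (y >> 4)) & MU[2]
    return (((A * z) & M64) >> 56) + ((((B * y) & M64) >> 56) << 3)

def nu2_masks(x):
    """The general formula: sum over k of 2^k * nu(x & ~mu_k)."""
    return sum(bin(x & ~MU[k] & M64).count("1") << k for k in range(6))
```

The second function mirrors what the twelve MMIX instructions compute, with `bin(...).count("1")` standing in for SADD.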
45. Let $d = (x - y) \& (y - x)$; test if $d \& p \ne d$. [Rokicki found that this idea can be used with node addresses to near-randomize binary search trees or Cartesian trees as if they were treaps, without needing an additional random "priority key" in each node. See U.S. Patent 6347318 (12 February 2002).]
46. SADD t,x,m; NXOR y,x,m; CSOD x,t,y; the mask m is ~(1<<i|1<<j). (In general, these instructions complement the bits specified by $\bar{m}$ if those bits have odd parity.)
47. $y \gets (x \gg \delta) \& \theta$, $z \gets (x \& \theta) \ll \delta$, $x \gets (x \& m) \mid y \mid z$, where $m = \overline{\theta \mid (\theta \ll \delta)}$.
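For comparison, here is a sketch of this three-mask version next to the XOR-based δ-swap. The XOR form is written from memory of the usual six-operation idiom and is an assumption about what (69) says; `delta_swap_xor` and `delta_swap_47` are hypothetical names, and $\theta$ must satisfy $\theta \& (\theta \ll \delta) = 0$.

```python
def delta_swap_xor(x, delta, theta):
    """Assumed form of (69): y <- (x ^ (x >> delta)) & theta; x <- x ^ y ^ (y << delta)."""
    y = (x ^ (x >> delta)) & theta
    return x ^ y ^ (y << delta)

def delta_swap_47(x, delta, theta):
    """Answer 47: move the two selected bit groups separately, then merge."""
    y = (x >> delta) & theta
    z = (x & theta) << delta
    m = ~(theta | (theta << delta))   # bits that stay where they are
    return (x & m) | y | z
```

For example, with `delta = 1` and `theta = 0b0101`, both functions exchange adjacent bits, so `0b1001` becomes `0b0110`.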
48. Given $\delta$, there are $s_\delta = \prod_{j=0}^{\delta-1} F_{\lfloor (n+j)/\delta \rfloor + 1}$ different $\delta$-swaps, including the identity permutation. (See exercise 4.5.3–32.) Summing over $\delta$ gives $1 + \sum_{\delta=1}^{n-1} (s_\delta - 1)$ altogether.
49. (a) The set S = fa1 d1 + +am dm j fa1 ; : : : ; am g f 1; 0; +1gg for displa
ements
Æ1 , : : : , Æm must
ontain fn 1; n 3; : : : ; 1 ng, be
ause the kth bit must be ex
hanged
with the (n + 1 2k)thm bit1 for 1 k n. Hen
e jS j n. And S
ontains at most 3m
numbers, at most 2 3 of whi
h are odd.
(b) Clearly s(mn) s(m) + s(n), be
ause we
an reverse m elds of n bits ea
h.
Thus s(3m) m and s(2 3m ) m + 1. Furthermore the reversal of 3m bits uses
only Æswaps with even values of Æ; the
orresponding (Æ=2)swaps prove that we have
s((3m 1)=2) m. These upper bounds mat
h the lower bounds of (a) when m > 0.
(
) The string a z! with jj = j j = jj = j j = j!j = n
an be
hanged to
!z a with a (3n + 1)swap followed by an (n + 1)swap. Then s(n) further swaps
reverse all. Hen
e s(32) s(6) + 2 = 4, and s(64) 5. Again, equality holds by (a).
In
identally, s(63) = 4 be
ause s(7) = s(9) = 2. The lower bound in (a) turns out
to be the exa
t value of s(n) for 1 n 22, ex
ept that s(16) = 4.
50. Express n = (tm : : : t1 t0 )3 in balan
ed ternary notation. Let nj = (tm : : : tj )3 and
Æj = 2nj + tj 1 , so that nj 1 Æj = nj and 2Æj nj 1 = nj + tj for 1 j m. Let
E0 = f0g and Ej +1 = Ej [ (tj Ej ) for 0 j < m. (Thus, for example, E1 = f0; t0 g
and E2 = f0; t0 ; t1 ; t1 t0 g.)
Assume by indu
tion on j that Æswaps for Æ = Æ1 , : : : , Æj have
hanged the n
bit word 1 : : : 3j to 3j : : : 1 , where ea
h subword k has length nj + "k for some
"k 2 Ej . If j < m, a Æj +1 swap within ea
h subword will preserve this assumption. If
j = m, ea
h k has jk j m + 1, be
ause " 2 Ej implies j"j j . Therefore 2k swaps
for blg m
k 0 will reverse them all. (Note that a 2k swap on a subword of size t,
where 2k < t 2k+1 , redu
es it to three subwords of sizes t 2k , 2k+1 t, t 2k .)
51. (a) If
= (
d 1 : : :
0 )2 , we must have d 1 =
d 1 d;d 1 . But for 0 k < d 1 zipping
we
an take k =
k d;k ^k , where ^k is any mask d;k . magi
masks
inshue
(b) Let (d;
) be the set of all su
h mask sequen
es. Clearly (1;
) = f
g. When Lenfant
d > 1 we will have, re
ursively, transpositions
perfe
t shue
(d;
) = f(0 ; : : : ; d 2 ; d 1 ; ^d 2 ; : : : ; ^0 ) j k = k0 1 z k00 1 ; ^k = ^k0 1 z ^k00 1 g; Steele
matrix transposition
by \zipping together" two sequen
es (00 ; : : : ; d0 3 ; d0 2 ; ^d0 3 ; : : : ; ^00 ) 2 (d 1;
0 ) and
(000 ; : : : ; d00 3 ; d00 2 ; ^d00 3 ; : : : ; ^000 ) 2 (d 1;
00 ) for some appropriate 0 , ^0 ,
0 , and
00 .
When
is odd, the bigraph
orresponding to (75) has only one
y
le; so (0; ^0 ;
0 ;
00 ) is either (d;0 ; 0; d
=2e; b
=2
) or (0; d;0 ; b
=2
; d
=2e). But when
is even, the
bigraph has 2d 1 double bonds; so 0 = ^0 is any mask d;0 , and
0 =
00 =
=2.
[In
identally, lg j(d;
)j = 2d 1 (d 1) Pdk=11 (2k 1) (2k 1 j2k 1
mod 2k j).℄
In both
ases we
an therefore let ^d 2 = = ^0 = 0 and omit the se
ond half
of (71) entirely. Of
ourse in
ase (b) we would do the
y
li
shift dire
tly, instead of
using (71) at all. But exer
ise 58 proves that many other useful permutations, su
h as
sele
tive reversal followed by
y
li
shift,
an also be handled by (71) with ^k = 0 for
all k. The inverses of those permutations
an be handled with k = 0 for 0 k < d 1.
52. The following solutions make ^j = 0 whenever possible. We shall express the
masks in terms of the 's, for example by writing 6 & 6;5 instead of stating the
requested hexade
imal form # 55555555 ; the form is shorter and more instru
tive.
(a) k = 5 & 6;k and ^k = (k+1 k 1 ) & 6;k for 0 k < 5; 5 = 4 . (Here
1 = 0. To get the \other" perfe
t shue, (x31 x63 : : : x1 x33 x0 x32 )2 , let ^0 = 6;0 &1 .)
(b) = 3 = ^0 = 6;0 & 3 ; 1 = 4 = ^1 = 6;1 & 4 ; 2 = 5 = ^2 = 6;2 & 5 ;
^3 = ^4 =00. [See J. Lenfant,IEEE Trans. C27 (1978), 637{647, for a general theory.℄
(
) 0 = 6;0 & 4 ; 1 = 6;1 & 5; 2 = 4 = 6;2 & 4 ; 3 = 5 = 6;3 & 5 ;
^0 = 6;0 & 2 ; ^1 = 6;1 & 3 ; ^2 = ^0 2 ; ^3 = ^1 3 ; ^4 = 0.
(d) k = 6;k & 5 k for 0 k 5; ^k = k for 0 k 2; ^3 = ^4 = 0.
53. We
an write as a produ
t of d t transpositions, (u1 v1 ) : : : (ud t vd t) (see
exer
ise 5.2.2{2). The permutation indu
ed by a single transposition (uv) on the index
digits, when u < v,
orresponds to a (2v 2u )swap with mask d;v & u . We should
do su
h a swap for (u1 v1) rst, : : : , (ud 1 vd 1 ) last.
In parti
ular, the perfe
t shue in a 2d bit register
orresponds to the
ase where
= (01 : : : (d 1)) is a one
y
le; so it
an be a
hieved by doing su
h (2v 2u )swaps
for (u; v) = (0; 1), : : : , (0; d 1). For example, when d = 3 the twostep pro
edure is
12345678 7! 13245768 7!k15263748. [Guy Steele suggests an alternative (d 1)step
pro
edure: We
an do a 2 swap with mask d;k+1 & k for d 1 > k 0. When d = 3
his method takes 12345678 7! 12563478 7! 15263748.℄
The matrix transposition in exer
ise 52(b)
orresponds to d = 6 and (u; v) = (0; 3),
(1; 4), (2; 5). These operations are the 7swap, 14swap, and 28swap steps for 8 8
matrix transposition illustrated in the text; they
an be done in any order.
For exer
ise 52(
), use d = 6 and (u; v) = (0; 2), (1; 3), (0; 4), (1; 5). Exer
ise 52(d)
is as easy as 52(b), with (u; v) = (0; 5), (1; 4), (2; 3).
54. Transposition amounts to reversing the bits of the minor diagonals. Successive elements of those diagonals are $m - 1$ apart in the register. Simultaneous reversal of all diagonals corresponds to simultaneous reversal of subwords of sizes 1, ..., $m$, which can be done with $2^k$-swaps for $0 \le k < \lceil \lg m \rceil$ (because such transposition is easy when $m$ is a power of 2, as illustrated in the text). Here's the procedure for $m = 7$:
    Given                  6-swap                 12-swap                24-swap
    00 01 02 03 04 05 06   00 10 02 12 04 14 06   00 10 20 30 04 14 24   00 10 20 30 40 50 60
    10 11 12 13 14 15 16   01 11 03 13 05 15 25   01 11 21 31 05 15 25   01 11 21 31 41 51 61
    20 21 22 23 24 25 26   20 30 22 32 24 16 26   02 12 22 32 06 16 26   02 12 22 32 42 52 62
    30 31 32 33 34 35 36   21 31 23 33 43 35 45   03 13 23 33 43 53 63   03 13 23 33 43 53 63
    40 41 42 43 44 45 46   40 50 42 34 44 36 46   40 50 60 34 44 54 64   04 14 24 34 44 54 64
    50 51 52 53 54 55 56   41 51 61 53 63 55 65   41 51 61 35 45 55 65   05 15 25 35 45 55 65
    60 61 62 63 64 65 66   60 52 62 54 64 56 66   42 52 62 36 46 56 66   06 16 26 36 46 56 66
55. Given x and y , rst set x x j (x 2k ) and y y j (y 2k ) for 2d k < 3d. Then
set x (22d+k 2k )swap of x with mask 2d+k & k and y (22d+k 2d+k )swap ofky
with mask 2d+k &d+k for 0 k < d. Finally set z x & y, then either z z j (z 2 )
or z z (z 2k ) for 2d k < 3d, and z z & (2n2 1). [The idea is to form two
n n n arrays x = (x000 : : : x(n 1)(n 1)(n 1) )n and y = (y000 : : : y(n 1)(n 1)(n 1) )n
with xijk = ajk and yijk = bjk3 , then transpose
oordinates so that xijk = aji and
yijk = bik ; now x & y does all n bitwise multipli
ations at on
e. This method is due to
V. R. Pratt and L. J. Sto
kmeyer, J. Computer and System S
i. 12 (1976), 210{213.℄
57. The two choices for each cycle when $d > 1$ have complementary settings. So we can choose a setting in which at least half of the crossbars are inactive, except in the middle column. (See exercise 5.3.4–55 for more about permutation networks.)
58. (a) Every different setting of the crossbars gives a different permutation, because there is exactly one path from input line $i$ to output line $j$ for all $0 \le i, j < N$. (A network with that property is called a "banyan.") The unique such path carries input $i$ on line $l(i, j, k) = ((i \gg k) \ll k) + (j \bmod 2^k)$ after $k$ swapping steps have been made.
(b) We have $l(i\varphi, i, k) = l(j\varphi, j, k)$ if and only if $i \bmod 2^k = j \bmod 2^k$ and $i\varphi \gg k = j\varphi \gg k$; so $(*)$ is necessary. And it is also sufficient, because a mapping $\varphi$ that satisfies $(*)$ can always be routed in such a way that $j\varphi$ appears on line $l = l(j\varphi, j, k)$ after $k$ steps: If $k > 1$, $j\varphi$ will appear on line $l(j\varphi, j, k - 1)$, which is one of the inputs to $l$. Condition $(*)$ says that we can route it to $l$ without conflict, even if $l$ is $l(i\varphi, i, k)$.
[In IEEE Transactions C-24 (1975), 1145–1155, Duncan Lawrie proved that condition $(*)$ is necessary and sufficient for an arbitrary mapping $\varphi$ of the set $\{0, 1, \ldots, N - 1\}$ into itself, when the crossbar modules are allowed to be general $2 \times 2$ mapping modules as in exercise 63. Furthermore the mapping $\varphi$ might be only partially specified, with $j\varphi = *$ ("wild card" or "don't care") for some values of $j$. The proof that appears in the previous paragraph actually demonstrates Lawrie's more general theorem.]
(c) $i \bmod 2^k = j \bmod 2^k$ if and only if $k \le \rho(i \oplus j)$; $i \gg k = j \gg k$ if and only if $k > \lambda(i \oplus j)$; and $i\varphi = j\varphi$ if and only if $i = j$, when $\varphi$ is a permutation.
(d) (i' j') < (i j ) for all i and j if and only if (i' j') < (i j ) =
(i j ) for all i and j , be
ause is a permutation. [Note that the notation
an be
onfusing: Bit j appears in bit position j if permutation is applied rst, then .℄
(e) Given i 6= j we must prove that (i' j' ) (i j ). Case 1, i and j are
xed by both ' and : Then (i' j' ) = (i j ) (i j ). Case 2, i' 6= i and
j = j : Then (i' j' ) = (i' j') (i j ). Case 3, i' 6= i and j 6= j : Then
(i' j' ) = (i' j ). Let k = (i j ), and suppose (i' j ) < k. Then
i mod 2k = j mod 2k and i' k = j k. Hen
e l(i'; i; k) = l(j ; j; k), and that line monus
arries both i' and j . But those two values
annot be equal. nu(k) summed
He
kel
59. It is 2 d
M (a;b) , where M (a; b) is the number of
rossbars that have both endpoints S
hroeppel
d
in [a : : b℄.k To 0
ount0 them, let k = (ab), a00 =ka mod 0
2k , and b0 = b mod 2k ; noti
e that magi
mask
b a = 2 + b a , and Md (a; b) = Mk+1 (a ; 2 + b ). Counting the
rossbars in the top
half and0
bottom half, plus 0
those0 that. jump
0
between halves, gives M0 k+1 (a0 ; 02k + b0 ) =
Mk (a ; 2 1) + Mk (0; b ) + ((b + 1) a ). Finally, we have Mk (0; b ) = S (b + 1); and
k
Mk (a0 ; 2k 1) = Mk (0; 2k 1 a0 ) = S (2k a0 ) = k2k 1 ka0 + S (a0 ), where S (n) is
evaluated in exer
ise 42.
60. A
y
le of length 2l
orresponds to a pattern u0 v0 $ v1 ! u1 $ u2 v2 $
$ v2l 1 ! u2l 1 $ u2l , where u2l = u0 and `u v' or `v ! u' means that the
permutation sends u to v, `x $ y' means that x = y 1.
We
an generate a random permutation as follows: Given u0 , there are 2n
hoi
es
for v0 , then 2n 1
hoi
es for u1 only one of whi
h
auses u2 = u0 , then 2n 2
hoi
es
for v2 , then 2n 3
hoi
es for u3 only one of whi
h
loses a
y
le, et
.
Consequently the generating fun
tion is G(z) = Qjn=1 22nn 22jj++1z . The expe
ted
number of
y
les, k, is G0 (1) = Hp2n 12 Hn = 21 ln n + ln 2 21
+ O(n 1 ). The mean
of 2k is G(2) = (2n n!)2=p(2n)!3==2 n + O(n 1=2); and the varian
e is G(4) G(2)2 =
(n + 1 G(2)) G(2) = n + O(n).
62. The crossbar settings in $P(2^d)$ can be stored in $(2d - 1)2^{d-1} = Nd - \frac{1}{2}N$ bits. To get the inverse permutation proceed from right to left. [See P. Heckel and R. Schroeppel, Electronic Design 28, 8 (12 April 1980), 148–152. Note that any way to represent an arbitrary permutation requires at least $\lg N! > Nd - N/\ln 2$ bits of memory; so this representation is nearly optimum, spacewise.]
63. (i) x = y . (ii) z must be even. (When z is odd we have (x z y ) z = (y dz=2e) z
(x bz=2
), even when z < 0.) (iii) This identity holds for all w, x, y, and z (and also
with any other binary bitwise Boolean operator in pla
e of &).
64. (((z & 0 ) + (z j
0 0 )) & 0 ) j (((z & 0 ) + (z 0 j 0 )) & 0 ). (See (86).)
65. xu(x ) + v (x ) = xu(x) + v (x) .
2 2 2 2
66. (a) v (x) = (u(x)=(1+ x )) mod x ; it's the unique polynomial of degree less than n
Æ n
su
h that (1+ x ) v(x) u(x) n(modulo xn ). (Equivalently, v is the unique nbit integer
Æ
su
h that (v (v Æ)) mod 2 = u.)
(b) We may as well assume that n = 64m, and that u = (um 1 : : : u1 u0 )264 ,
v = (vm 1 : : : v1 v0 )264 . Set
0; then, using exer
ise 36, set vj uj (
) and
vj 63 for j = 0, 1, : : : , m 1.
(
) Set
v0 u0 ; then vj uj
and
vj , for j = 1, 2, : : : , m 1.
(d) Start with
0 and do the following for j = 0, 1, : : : , m 1: Set t uj ,
t t (t 3), t t (t 6), t t (t 12), t t (t 24), t t (t 48),
vj t
,
(t 61) # 9249249249249249 .
(e) Start with v u. Then, for j = 1, 2, : : : , m 1, set vj vj (vj 1 3) and
(if j < m 1) vj+1 vj+1 (vj 1 61).
67. Let n = 2l 1 and m = n 2d. If 21 n < k < n we have x2k xm+t + xt (modulo
x + x +1), where t = 2k n is odd. Consequently, if v = (vn 1 : : : v1 v0 )2 , the number
n m
74. If j
P
2l
2l+1 j = 2 > 0, we must rob from the ri
h half and give it to
the poor. There'sd a1position l in the poor half with
l = 0; otherwise that half wouldd
sum to at0 least 2 . A
y
li
1shift that modies positions l through (l + t) mod 2
makes
l+k =
l+k+1 for 0 k < t,
0l+t =
l+t+1 Æ,
0l+t+1 = Æ, and
0l+k =
l+k
for all other k; here Æ
an be any desired value in the range 0 Æ
l+t+1 . (We've
treated all subs
ripts modulo 2d in these formulas.) So we
an use the smallest even t
su
h that
l+1 +
l+3 + +
l+t+1 =
l +
l+2 + +
l+t + + Æ for some Æ 0.
(The 1shift need not be
y
li
, if we allow ourselves to shift left instead of right.
But the
y
li
property may be needed in subsequent steps.)
75. Equivalently, given indi
es 0 i0 < i1 < < is 1 < is = 2 and 0 = j0 <
d re
ursively
j1 < < js 1 < js = 2d , we want to map (x2d 1 : : : x1 x0 )2 7! (x(2d 1)' : : : x1' x0' )2 , Ofman
Æ map
where j' = ir for jr j < jr+1 and 0 r < s. If d = 1, a mapping module does this. Ofman
When d > 1, we
an set the lefthand
rossbars so that they route input ir to line magi
masks
ir ((ir + r) mod 2). If s is even, we re
ursively ask one of the networks P (2d 1) inside Pratt
Rabin
P (2d) to solve the problem for indi
es bfi0 ; i2 ; : : : ; is g=2
and bfj0 ; j2 ; : : : ; js g=2
, while Sto
kmeyer
the other solves it for dfi1 ; i3 ; : : : ; is 1 ; 2dg=2e and dfj0 ; j2 ; : : : ; js g=2e. At the right of Arndt
P (2d), one
an now
he
k that when jr j < jr+1 , the mapping module for lines j MMIX
2ADDU
and j 1 has input ir on line j if j r (modulo 2), otherwise ir is on line j 1.
A similar proof works when s is odd.
Notes: This network is a slight improvement over a
onstru
tion by Yu. P. Ofman,
Trudy Mosk. Mat. Obsh
hestva 14 (1965), 186{199. We
an implement the
orrespond
ing network by substituting a \Æmap" for a Æswap; instead of (69), we use two0 masks
and do seven operations instead of six: y x (x Æ), x x (y & ) ((y & ) Æ).
This extension of (71) therefore takes only d additional units of time.
76. When a mapping network realizes a permutation, all of its modules must act as crossbars; hence $G(n) \ge \lg n!$. Ofman proved that $G(n) \le 2.5\,n \lg n$, and remarked in a footnote that the constant 2.5 could be improved (without giving any details). We have seen that in fact $G(n) \le 2n \lg n$. Note that $G(3) = 3$.
77. Represent an nnetwork by (x2n 1 : : : x1 x0 )2 , where xk = [the binary representa
tion nof k is a possible
onguration of 0s andn 1s when the network has been applied to
all 2 sequen
es of 0s and 1s℄, for 0 k < 2 . Thus the empty network is represented
by 22n 1, and a sorting network for n = 3 is represented by (10001011)2. In general,
x represents a sorting network for n elements if and only if it represents an nnetwork
and x = n + 1, if and only if x = 20 + 21 + 23 + 27 + + 22n 1.
If x represents a
ording to these
onventions, the representation of [i:j ℄ is
(x y) j (y (2n i 2n j )), where y = x & n j & n i .
[See V. R. Pratt, M. O. Rabin, and L. J. Sto
kmeyer, 6 (1974), 122{126.℄
STOC
integers with $a > b + 1$, the minimum (for all $s$) occurs when we select bits from right to left cyclically until running out. For example, when $(p, q, r) = (2, 6, 3)$ the addressing function would be $(j_5 j_4 j_3 k_2 j_2 k_1 j_1 i_1 k_0 j_0 i_0)_2$. In particular, Tocher's scheme is optimal.
[But such a mapping is not necessarily best when the page size isn't a power of 2. For example, consider a $16 \times 16$ matrix; the addressing function $(j_3 i_3 i_2 i_1 i_0 j_2 j_1 j_0)_2$ is better than $(j_3 i_3 j_2 i_2 j_1 i_1 j_0 i_0)_2$ for all page sizes from 17 to 62, except for size 32 when they are equally good.]
87. Set $x \gets x \& \overline{((x \&\ \texttt{"@@@@@@@@"}) \gg 1)}$; each byte $(a_7 \ldots a_0)_2$ is thereby changed to $(a_7 a_6 (a_5 \wedge \bar{a}_6) a_4 \ldots a_0)_2$. The same transformation works also on 30 additional letters in the Latin-1 supplement to ASCII (for example, æ ↦ Æ); but there's one glitch, ÿ ↦ ß.
[Don Woods used this trick in his original program for the game of Adventure (1976), uppercasing the user's input words before looking them up in a dictionary.]
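The byte transformation $(a_7 a_6 (a_5 \wedge \bar{a}_6) a_4 \ldots a_0)_2$ can be tried on eight packed characters at once. In this sketch the mask 0x40 in every byte is an assumption that follows from that bit description (bit 6 is tested, bit 5 is cleared), and `upcase8` is a hypothetical name.

```python
AT = 0x4040404040404040               # bit 6 of every byte (assumed mask)

def upcase8(x):
    """Clear bit 5 of each byte whose bit 6 is set."""
    return x & ~((x & AT) >> 1)

def upcase_text(s):
    """Apply upcase8 to an 8-character Latin-1 string (illustration only)."""
    word = int.from_bytes(s.encode("latin-1"), "big")
    return upcase8(word).to_bytes(8, "big").decode("latin-1")
```

Digits and spaces have bit 6 clear and are left alone, while characters such as `{` or a backquote are changed along with the letters, which is harmless when only words are being looked up.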
88. Set $z \gets (x \oplus \bar{y}) \& h$, then $z \gets ((x \mid h) - (y \& \bar{h})) \oplus z$.
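A sketch of this two-step computation, assuming that the goal is bytewise subtraction modulo 256 and that $h$ is the constant with #80 in every byte; `bytewise_sub` is a hypothetical name.

```python
M64 = (1 << 64) - 1
H = 0x8080808080808080                # high bit of every byte

def bytewise_sub(x, y):
    """Eight independent byte differences (x_j - y_j) mod 256."""
    z = (x ^ ~y) & H                  # what the high bits should become
    # Forcing the high bits of x on and of y off prevents borrows
    # from crossing byte boundaries.
    return (((x | H) - (y & ~H & M64)) ^ z) & M64
```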
89. x
0 x 1, y0 y 1, t (x0 &(x j y)) j (x & y0 ), z (x & y & 0 ) j (t & 0 ). [From Dietz
the \nasty" test program for H. G. Dietz and R. J. Fisher's SWARC
ompiler (1998).℄ Fisher
SWARC
ompiler
90. Insert `z z j ((x y) & l)' either before or after `z (x & y) + z '. (The ordering MMIX
makes no dieren
e, be
ause x+y xy (modulo 4) when x+y is even. Therefore MMIX MOR
Rounding to even
an round to odd at no additional
ost, using MOR. Rounding to even in the ambiguous round to odd
ases is more diÆ
ult, and with xed point arithmeti
it is not advantageous.) MUX
unbiased rounding
91. If 2 [x; y ℄ denotes the average as in (88), the desired result is obtained by repeating
1
Warren
the following operations seven times, then
on
luding with z 12 [x; y℄ on
e more: identity
Borrows
z 1 [x; y℄; t & h; m (t 1) (t 7);
arries
2 borrow
MOR
x (m & z ) j (m & x); y (m & z ) j (m & y); 1: BDIF
y
li
shift
Although rounding errors a
umulate through eight levels, the resulting absolute error medians
never ex
eeds 807/255. Moreover, it is 1:13 if we average over all 2563
ases, and
it is less than 2 with probability 94:2%. If we round to odd as in exer
ise 90, the
maximum and average error are redu
ed to 616/255 and 0:58; the probability of error
< 2 rises to 99:9%. Therefore the following MMIX
ode uses su
h unbiased rounding:
8 9
x GREG ;y GREG ;z GREG >
>XOR t,x,y MOR m,ffhi,alf >
>
>
> >
>
alf GREG ;m GREG ;t IS $255 <MOR z,rodd,t PUT rM,m =
ffhi GREG 1<<56 repeat seven times: >
AND t,x,y MUX x,z,x
>
l GREG #0101010101010101 > >
>ADDU z,z,t
>
:
MUX y,y,z >
>
;
rodd GREG #4020100804020101 SLU alf,alf,1
after whi
h the rst four instru
tions are repeated again. The total time for eight
blends (67) is less than the
ost of eight multipli
ations.
92. We get $z_j = \lceil (x_j + y_j)/2 \rceil$ for each $j$. (This fact, noticed by H. S. Warren, Jr., follows from the identity $x + y = ((x \mid y) \ll 1) - (x \oplus y)$. See also the next exercise.)
93. $x - y = (x \oplus y) - ((\bar{x} \& y) \ll 1)$. ("Borrows" instead of "carries.")
94. $(x - l)_j = (x_j - 1 - b_j) \bmod 256$, where $b_j$ is the "borrow" from fields to the right. So $t_j$ is nonzero if and only if $(x_j \ldots x_0)_{256} < (1 \ldots 1)_{256} = (256^{j+1} - 1)/255$. (The answers to the stated questions are therefore "yes" and "no.")
In general if the constant $l$ is allowed to have any value $(l_7 \ldots l_1 l_0)_{256}$, operation (90) makes $t_j \ne 0$ if and only if $(x_j \ldots x_0)_{256} < (l_j \ldots l_0)_{256}$ and $x_j < 128$.
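The identities in answers 92 and 93 are easy to confirm on ordinary integers. The sketch below also shows the scalar consequence of the first one, namely a rounded-up average that never forms $x + y$; the function names are hypothetical.

```python
def add_identity(x, y):
    """x + y == ((x | y) << 1) - (x ^ y)."""
    return x + y == ((x | y) << 1) - (x ^ y)

def sub_identity(x, y):
    """x - y == (x ^ y) - ((~x & y) << 1)."""
    return x - y == (x ^ y) - ((~x & y) << 1)

def ceil_average(x, y):
    """ceil((x + y) / 2), obtained by halving the first identity."""
    return (x | y) - ((x ^ y) >> 1)
```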
95. Use (90): Test if h & (t(x ((x 8) + (x 56))) j t(x ((x 16) + (x 48))) j
t(x ((x 24)+(x 40))) j t(x ((x 32)+(x 32)))) = 0, where t(x) = (x l)& x.
(These 28 steps redu
e to 20 if
y
li
shift or MOR is available, or 15 with BDIF and MOR.)
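96. Suppose $0 \le x, y < 256$, $x_h = \lfloor x/128 \rfloor$, $x_l = x \bmod 128$, $y_h = \lfloor y/128 \rfloor$, $y_l = y \bmod 128$. Then $[x < y] = \langle \bar{x}_h y_h [x_l < y_l] \rangle$; see exercise 7.1.1–106. And $[x_l < y_l] = [y_l + 127 - x_l \ge 128]$. Hence $[x < y] = \lfloor \langle \bar{x} y z \rangle / 128 \rfloor$, where $z = (\bar{x} \& 127) + (y \& 127)$.
It follows that $t = h \& \langle \bar{x} y z \rangle$ has the desired properties, when $z = (\bar{x} \& \bar{h}) + (y \& \bar{h})$. This formula can also be written $t = h \& \overline{\langle x \bar{y} \bar{z} \rangle}$, where $\bar{z} = \overline{(\bar{x} \& \bar{h}) + (y \& \bar{h})} = (x \mid h) - (y \& \bar{h})$ by (18).
To get a similar test function for $[x_j \le y_j] = 1 - [y_j < x_j]$, we just interchange $x \leftrightarrow y$ and take the complement: $t \gets h \& \overline{\langle x \bar{y} z \rangle} = h \& \langle \bar{x} y \bar{z} \rangle$, where $z = (x \& \bar{h}) + (\bar{y} \& \bar{h})$.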
97. Set x
0 x "********", y0 x y, t h & (x j ((x j h) l)) &(y0 j ((y0 j h) l)),
m (t1) (t7), t t&(x0 j ((x0 j h) l)), z (m&"********") j (m&y). (20 steps.)
98. Set $u \gets x \oplus y$, $z \gets (\bar{x} \& \bar{h}) + (y \& \bar{h})$, $t \gets h \& (x \oplus (u \mid (x \oplus z)))$, $v \gets ((t \ll 1) - (t \gg 7)) \& u$, $z \gets x \oplus v$, $w \gets y \oplus v$. [This 14-step procedure invokes answer 96 to compute $t = h \& \langle \bar{x} y z \rangle$, using the footprint method of Section 7.1.2 to evaluate the median in only three steps when $x \oplus y$ is known. Of course the MMIX solution is much quicker, if available: BDIF t,x,y; ADDU z,y,t; SUBU w,x,t.]
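The comparison test of answer 96 and the max/min procedure of answer 98 can be sketched together. The placement of the complements below is a reconstruction (the excerpt has lost its overbars), chosen so that the result agrees with the MMIX instructions quoted above, where `z` receives the bytewise maximum and `w` the minimum; all function names are hypothetical.

```python
M64 = (1 << 64) - 1
H = 0x8080808080808080
LOW = M64 ^ H                         # the complement of h: 0x7f in every byte

def median(a, b, c):
    """Bitwise median (majority) of three words."""
    return (a & b) | (a & c) | (b & c)

def less_mask(x, y):
    """Answer 96: 0x80 in exactly those bytes where x_j < y_j."""
    nx = ~x & M64
    z = (nx & LOW) + (y & LOW)        # byte sums stay below 256: no carries
    return H & median(nx, y, z)

def bytewise_max_min(x, y):
    """Answer 98: returns (z, w) = (bytewise max, bytewise min)."""
    u = x ^ y
    z = (~x & M64 & LOW) + (y & LOW)
    t = H & (x ^ (u | (x ^ z)))       # same t as less_mask, via the footprint trick
    v = ((t << 1) - (t >> 7)) & u     # u in the bytes where x_j < y_j, else 0
    return x ^ v, y ^ v
```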
99. In this potpourri, ea
h of the eight bytes appears to be solving a dierent kind potpourri
arry
of problem; we must re
ast the
onditions so that they t into a
ommon framework: borrow
f0 = [ x0 '!' 0℄, f1 = [ x1 '*' > 0℄, f2 = [ x2 'A' 1℄, f3 = [ x3 > 'z' ℄, f4 = medians
[ x4 > 'a' 1℄, f5 = [ x5 '0' 9℄, f6 = [ x6 255 > 86℄, f7 = [ x7 '?' 3℄. Aha! We Borrows
arries
an use the formulas in answer 96, adjusting d to swit
h between and > as needed: Soule
a = ('?'(255)'0'00'*''!')256 = # 3fff300000002a21 ; b = h = # 7f7f7f7f7f7f7f7f ; multibyte arithmeti
= h & (3(86)9('a' 1)'z'('A' 1)00)256 = # 7
29761f053f7f7f (the hardest one); toruses
d = # 8000800000800080 ; and e = h = # 8080808080808080 .
100. We want $u_j = x_j + y_j + c_j - 10c_{j+1}$ and $v_j = x_j - y_j - b_j + 10b_{j+1}$, where $c_j$ and $b_j$ are the "carry" and "borrow" into digit position $j$. Set $u' \gets (x + y + (6 \ldots 66)_{16}) \bmod 2^{64}$ and $v' \gets (x - y) \bmod 2^{64}$. Then we find $u'_j = x_j + y_j + c_j + 6 - 16c_{j+1}$ and $v'_j = x_j - y_j - b_j + 16b_{j+1}$ for $0 \le j < 16$, by induction on $j$. Hence $u'$ and $v'$ have the same pattern of carries and borrows as if we were working in radix 10, and we have $u = u' - 6(\bar{c}_{16} \ldots \bar{c}_2 \bar{c}_1)_{16}$, $v = v' - 6(b_{16} \ldots b_2 b_1)_{16}$. The following computation schemes therefore provide the desired results (10 operations for addition, 9 for subtraction):
    Addition:     $y' \gets y + (6 \ldots 66)_{16}$,  $u' \gets x + y'$,  $t \gets \langle \bar{x} \bar{y}' u' \rangle \& (8 \ldots 88)_{16}$,  $u \gets u' - t + (t \gg 2)$.
    Subtraction:  $v' \gets x - y$,  $t \gets \langle \bar{x} y v' \rangle \& (8 \ldots 88)_{16}$,  $v \gets v' - t + (t \gg 2)$.
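These two schemes can be exercised on 16-digit packed decimal numbers. In the sketch below the placement of complements inside the medians is a reconstruction: it marks the digits where no carry occurred for addition, and the digits where a borrow occurred for subtraction. The helpers `to_bcd` and `from_bcd` are hypothetical conveniences.

```python
M64 = (1 << 64) - 1
SIXES = 0x6666666666666666
EIGHTS = 0x8888888888888888

def median(a, b, c):
    return (a & b) | (a & c) | (b & c)

def bcd_add(x, y):
    y1 = y + SIXES                                    # bias every digit by 6
    u1 = (x + y1) & M64
    t = median(~x & M64, ~y1 & M64, u1) & EIGHTS      # digits with no carry out
    return (u1 - t + (t >> 2)) & M64                  # remove the bias there

def bcd_sub(x, y):
    v1 = (x - y) & M64
    t = median(~x & M64, y, v1) & EIGHTS              # digits that borrowed
    return (v1 - t + (t >> 2)) & M64                  # turn +16 into +10 there

def to_bcd(n):
    """Decimal digits of n reinterpreted as hexadecimal digits."""
    return int(str(n), 16)

def from_bcd(v):
    return int(format(v, "x"))
```

For instance, `from_bcd(bcd_add(to_bcd(5), to_bcd(7)))` is 12; results are reduced modulo $10^{16}$ because the carry out of the top digit is lost.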
101. For subtraction, set $z \gets x - y$; for addition, set $z \gets x + y +$ #e8c4c4fc18, where this constant is built from 256 − 24 = #e8, 256 − 60 = #c4, and 65536 − 1000 = #fc18. Borrows and carries will occur between fields as if mixed-radix subtraction or addition were being performed. The remaining task is to correct for cases in which borrows occurred or carries did not; we can do this easily by inspecting individual digits, because the radices are less than half of the field sizes: Set $t \gets z \&$ #8080808000, $t \gets (t \ll 1) - (t \gg 7) - ((t \gg 15) \& 1)$, $z \gets z - (t \&$ #e8c4c4fc18). [See Stephen Soule, CACM 6 (1975), 344–346. We're lucky that the `c' in `fc18' is even.]
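A sketch of this correction step, assuming the fields are hours (8 bits), minutes (8 bits), seconds (8 bits), and milliseconds (16 bits), packed from left to right into 40 bits; that layout is inferred from the constant, and the helpers `pack`, `fix`, `time_add`, and `time_sub` are hypothetical.

```python
C = 0xE8C4C4FC18                      # 256-24, 256-60, 256-60, 65536-1000
HI = 0x8080808000                     # high bit of each field
M40 = (1 << 40) - 1

def pack(total_ms):
    """Pack a time of day given in milliseconds into the four fields."""
    h, rem = divmod(total_ms % 86400000, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return (h << 32) | (m << 24) | (s << 16) | ms

def fix(z):
    """Subtract the bias from every field whose high bit is set."""
    t = z & HI
    t = (t << 1) - (t >> 7) - ((t >> 15) & 1)   # 0xff (or 0xfeff) per flagged field
    return (z - (t & C)) & M40

def time_add(x, y):
    return fix((x + y + C) & M40)

def time_sub(x, y):
    return fix((x - y) & M40)
```

The mask for the 16-bit field comes out as #feff rather than #ffff, which does no harm because the corresponding bit of #fc18 is zero; this is the luck mentioned at the end of the answer.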
102. (a) We assume that $x = (x_{15} \ldots x_0)_{16}$ and $y = (y_{15} \ldots y_0)_{16}$, with $0 \le x_j, y_j < 5$; the goal is to compute $u = (u_{15} \ldots u_0)_{16}$ and $v = (v_{15} \ldots v_0)_{16}$, with components $u_j = (x_j + y_j) \bmod 5$ and $v_j = (x_j - y_j) \bmod 5$. Here's how:
    Addition:     $u \gets x + y$;       $t \gets (u + 3l) \& h$;  $u \gets u - ((t - (t \gg 3)) \& 5l)$.
    Subtraction:  $v \gets x - y + 5l$;  $t \gets (v + 3l) \& h$;  $v \gets v - ((t - (t \gg 3)) \& 5l)$.
Here $l = (1 \ldots 1)_{16} = (2^{64} - 1)/15$, $h = 8l$. (Addition in 7 operations, subtraction in 8.)
(b) Now x = (x20 : : : x0 )8 , et
., and we must be more
areful to
onne
arries:
t x + h ;
z (x j h) (y & h );
z (t & h ) + (y & h );
t (y j z) & x & h;
t (y j z ) & t & h;
v x y + t + (t 2):
u x + y (t + (t 2));
Here h = (4 : : : 4)8 = (265 4)=7. (Addition in 11 operations, subtra
tion in 10.)
Similar procedures work, of course, for other moduli. In fact we can do multibyte arithmetic on the coordinates of toruses in general, with different moduli in each component (see 7.2.1.3–(66)).
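Part (a) of this answer is short enough to try directly. The sketch below packs sixteen base-5 digits into the nibbles of a 64-bit word; `add5`, `sub5`, and the packing helper are hypothetical names, and the minus signs are a reconstruction that the test at the end confirms.

```python
L = 0x1111111111111111                # (1...1)_16
H8 = 8 * L                            # high bit of every nibble

def add5(x, y):
    """Nibble-wise (x_j + y_j) mod 5."""
    u = x + y                         # each nibble is at most 8
    t = (u + 3 * L) & H8              # flags nibbles with u_j >= 5
    return u - ((t - (t >> 3)) & (5 * L))

def sub5(x, y):
    """Nibble-wise (x_j - y_j) mod 5."""
    v = x - y + 5 * L                 # each nibble lies between 1 and 9
    t = (v + 3 * L) & H8              # flags nibbles with v_j >= 5
    return v - ((t - (t >> 3)) & (5 * L))

def pack5(digits):
    """Pack a list of 16 digits (each 0..4), least significant first."""
    return sum(d << (4 * j) for j, d in enumerate(digits))
```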
103. Let h and l be the
onstants in (87) and (88). Addition is easy: u x j ((x&h )+y). leap year
For subtra
tion, take away 1 and add xj &(1 yj ): t (x&l)1, v t j (t+(x&(y l))). table lookup by shifting
bytewise min and max
104. Yes, in 19: Let $a = (((1901 \ll 4) + 1) \ll 5) + 1$, $b = (((2099 \ll 4) + 12) \ll 5) + 28$. Set $m \gets (x \gg 5) \mathbin{\&} \mathtt{\#f}$ (the month), $c \gets \mathtt{\#10} \mathbin{\&} \overline{(x \mid (x \gg 1)) \gg 5}$ (the leap year correction), $u \gets b + \mathtt{\#3} \mathbin{\&} ((\mathtt{\#3bbeecc} + c) \gg (m + m))$ (the max day adjustment), and $t \gets ((x \oplus a \oplus (x - a)) \mid (x \oplus u \oplus (u - x))) \mathbin{\&} \mathtt{\#1000220}$ (the test for unwanted carries).
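The date test can be compared against a calendar library. The sketch below assumes the packing suggested by the constants $a$ and $b$, namely $x = (((\text{year} \ll 4) + \text{month}) \ll 5) + \text{day}$, and assumes that $t = 0$ signals a valid date from 1901 through 2099; `pack_date` and `date_is_valid` are illustrative names.

```python
import calendar

A = (((1901 << 4) + 1) << 5) + 1       # 1 January 1901
B = (((2099 << 4) + 12) << 5) + 28     # 28 December 2099

def pack_date(year, month, day):
    """Assumed packing: year, 4-bit month, 5-bit day."""
    return (((year << 4) + month) << 5) + day

def date_is_valid(x):
    m = (x >> 5) & 0xF                              # the month
    c = 0x10 & ~((x | (x >> 1)) >> 5)               # #10 when year mod 4 == 0
    u = B + (0x3 & ((0x3BBEECC + c) >> (m + m)))    # raise the day limit to 28..31
    # Bits 5, 9 and 24 record borrows into the month field, the year field,
    # and out of the year field when forming x - A and u - x.
    t = ((x ^ A ^ (x - A)) | (x ^ u ^ (u - x))) & 0x1000220
    return t == 0
```

The constant `0x3BBEECC` stores, in the two bits at position $2m$, the number of days beyond 28 in month $m$ of a common year.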
105. Exercise 98 explains how to compute bytewise min and max; a simple modification will compute min in some byte positions and max in others. Thus we can "sort by perfect shuffles" as in Section 5.3.4, Fig. 57, if we can permute bytes between x and y appropriately. And such permutation is easy, by exercise 1. [Of course there are much simpler and faster ways to sort 16 bytes. But see S. Albers and T. Hagerup, Inf. and Computation 136 (1997), 25–51, for asymptotic implications of this approach.]
106. The n bits are regarded as g fields of g bits each. First the nonzero fields are detected ($t_1$), and we form a word y that has $(y_{g-1}\ldots y_0)_2$ in each g-bit field, where $y_j = [\text{field } j \text{ of } x \text{ is nonzero}]$. Then we compare each field with the constants $2^{g-1}, \ldots, 2^0$ ($t_2$), and form a mask m that identifies the most significant nonzero field of x. After putting g copies of that field into z, we test z as we tested y ($t_3$). Finally an appropriate sideways addition of $t_2$ and $t_3$ (g-bitwise) yields $\lambda$. (Try the case g = 4, n = 16.)
To compute $2^\lambda$ without shifting left, replace '$t_2 \ll 1$' by '$t_2 + t_2$', and replace the final line by $w \gets (((a \cdot (t_3 \oplus (t_3 \gg g))) \bmod 2^n) \gg (n - g)) \cdot l$; then $w \mathbin{\&} m$ is $2^{\lambda x}$.
107.
    h    GREG  #8000800080008000
    ms   GREG  #00ff0f0f33335555
    1H   ANDN  q,x,m5
         SRU   z,x,32
         CSNZ  x,q,z
         ZSNZ  lam,q,32
         ANDN  q,x,m4
         SRU   z,x,16
         ADD   t,lam,16
         CSNZ  x,q,z
         CSNZ  lam,q,t
    2H   SLU   q,x,16
         ADDU  x,x,q
         SLU   q,x,32
         ADDU  x,x,q
    3H   ANDN  y,x,ms
    4H   XOR   t,x,y
         OR    q,y,h
         SUBU  t,q,t
         OR    t,t,y
         AND   t,t,h
    5H   SLU   q,t,15
         ADDU  t,t,q
         SLU   q,t,30
         ADDU  t,t,q
    6H   SRU   q,t,60
         ADDU  lam,lam,q
The total time $25\upsilon$ (and no mems) should be increased by $\upsilon$ for a fair comparison with (56), because (56) doesn't clobber x.
e 2e
108. For example, let e be minimum so that n 2 2 . If n is a multiple of 2 , we
e
an use 2 elds of size n=2 , with e redu
tions in step B1; otherwise we
an use 2e
e e
elds of size 2dlg ne e 1, with e + 1 redu
tions in step B1. In either
ase there are e
iterations in steps B2 and B5, so the total running time is O(e) = O(log log n).
109. Start with $x \gets x \mathbin{\&} -x$ and apply Algorithm B. (Step B4 of that algorithm can be slightly simplified in this special case, using a constant $l$ instead of $x \oplus y$.)
110. Let s = 2 where d = 2
d e e. We will use sbit elds in nbit words.
K1. [Stret
h x mod s.℄ Set y x &(s 1). Then set t y & j and y y t
(t2j (s 1)) for e > j 0. Finally set y (yss) y. [If x = (x2e 1 : : : x0 )2
we now have y = (y2e 1 : : : y0 )2s , where yj = (2 1)xj [ j < d ℄.℄
K2. [Set up minterms.℄ Set y y (a2e 1 : : : a0 )2s , where aj = d;j for 0 j < d
and aj = 2s 1 for d j < 2e .
K3. [Compress.℄ Set y (y 2j s) for e > j 0, then y y & (2s 1). [Now
y = 1 (x mod s). This is the key point that makes the algorithm work.℄
K4. [Finish.℄ Set y y j (y 2j se) for 0 j < e. Finally set y y & (2e;j extra
t the most signi
ant bit
((x j ) & 1)) for d j < 2 . quanti
ation
Pratt
111. The n bits are divided into elds of s bits ea
h, although the leftmost eld might nite state automaton
be shorter. First y is set to
ag the all1 elds. Then t = (k: : : t1 t0 )2s
ontains
andidate Gray binary
ode
bits for q, in
luding \false drops" for
ertain patterns 01 with s k < r. We always
have tj 1, and tj 6= 0 implies tj 1 = 0. The bits of u and v subdivide t into two
parts so that we
an safely
ompute m = (t 1) j (t 2) j j (t r), before making
a nal test to eliminate the false drops.
112. Noti
e that if q = x & (x 1) & & (x (r 1)) & (x r) then we have
x & x + q = x & (x 1) & & (x (r 1)).
If we
an solve the stated problem in O(1) steps, we
an also extra
t the most
signi
ant bit of an rbit number in O(1) steps: Apply the
ase n = 2r to the number
22n 1 x. Conversely, a solution to the extra
tion problem
an be shown to yield a so
lution to the 1r 0 problem. Exer
ise 110 therefore implies a solution in O(log log r) steps.
0 0 0
113. Let 0 = 0, x0 = x0 , and
onstru
t xi0 = xi for 1 i r as follows: If
xi = a Æi b and Æi 2= f+; ; g, let i0 = (i 1)0 + 1 and x0i0 = a0 Æi b0 , where a0 = x0j 0
if 0a = xnj and a0 = a if a =
i . If xi = a
, let i0 = (i 1)0 + 2 and (x0i0 1 ; x0i0 ) =
(a &(b2
1); x0i0 1
). If xi = a + b, let i0 = (i 1)0 +6 and let (x0(i 1)0 +1 ; : : : ; x0i0 )
ompute ((a0 & h ) + (b0 & h ))0 ((a0 0 b0 ) & h), where h = 2n 1 . And if xi = a b, do
the similar
omputation ((a j h) (b & h)) ((a b0 ) & h). Clearly r0 6r.
0
114. Simply let Xi = Xj (i) Æi Xk(i) when xi = xj (i) Æi xk(i) , Xi = Ci Æi Xk(i) when
xi =
i Æi xk(i) , and Xi = Xj (i) Æi Ci when xi = xj (i) Æi
i , where Ci =
i when
i is a
shift amount, otherwise Ci = (
i : : :
i )2n = (2mn 1)
i =(2n 1). This
onstru
tion is
possible thanks to the fa
t dthat variablelength shifts are prohibited.
[Noti
e that if m = 2 , we
an use this idea to simulate 2d instan
es of f (x; yi );
then O(d) further operations allow \quanti
ation."℄
115. (a) z x & (x 1) & (x 2), y x & (x + z ). [This problem was posed to the
author by Vaughan Pratt in 1977.℄
(b) First nd xl 0 (x 1) & x and xr x & (x 1),0 the 0left and right ends
of x's blo
ks; and set xr = xr & (xr 1). Then ze xr & (xr (xl & 0 )) and
zo x0r & (x0r (xl & 0 )) are the right ends that are followed by a left end in even or
odd position, respe
tively. The answer is y x & (x + (ze & 0 ) + (zo & 0 )); it
an be
simplied to y x & (x + (ze (x0r & 0))).
(
) This
ase is impossible, by Corollary I.
116. The language L is well defined, by Lemma A (except that the presence or absence of the empty string is irrelevant). A language is regular if and only if it can be defined by a finite-state automaton, and a 2-adic integer is rational if and only if it can be defined by a finite-state automaton that ignores its inputs. The identity function corresponds to the language $L = 1(0 \cup 1)^*$, and a simple construction will define an automaton that corresponds to the sum, difference, or Boolean combination of the numbers defined by any two given automata acting on the sequence $x_0x_1x_2\ldots\,$. Hence L is regular.
In exer
ise 115, L is (a) 11 (000 1(0 [ 1) [ 0 ); (b) 11(00(00)1(0 [ 1) [ 0).
117. Incidentally, the stated language L corresponds to an inverse Gray binary code: It defines a function with the property that $f(2x) = f(2x + 1)$, and $g(f(2x)) = g(f(2x + 1)) = x$, where $g(x) = x \oplus (x \gg 1)$ (see Eq. 7.2.1.1–(9)).
Pn 1
118. If x = (xn 1 : : : x1 x0 )2 and 0 aj 2 for 0 j < n, we have j =0 aj xj =
j
Pn 1
j =0 ( a j
. (
x & 2j )). Take a = b2j 1
to get x 1.
j
Conversely, the following argument by M. S. Paterson proves that monus must be Paterson
used at least n 1 times: Consider any
hain for f (x) that uses addition, subtra
tion, under
ow
bitwise Booleans, and k o
urren
es of the \under
ow"0 00
operation0y / z = (2n 00 1)[ y <z ℄.
If k < n 1 there must be two nbit numbers x and x su
h that x mod0 2 = x 00mod 2 =
0 and su
h that all k of the /'s yield the same result for both x and x . Then
f (x0 ) mod 2j = f (x00 ) mod 2j when j = (x0 x00 ). So f (x) is not the fun
tion x 1.
119. z x y, f 2p & z & (z 1). (See (90).)
120. Generalizing Corollary W, these are the functions such that $f(x_1, \ldots, x_m) \equiv f(y_1, \ldots, y_m)$ (modulo $2^k$) whenever $x_j \equiv y_j$ (modulo $2^k$) for $1 \le j \le m$, for $0 \le k \le n$. The least significant bit is a binary function of m variables, so it has $2^{2^m}$ possibilities. The next-to-least is a binary function of 2m variables, namely the bits of $(x_1 \bmod 4, \ldots, x_m \bmod 4)$, so it has $2^{2^{2m}}$; and so on. Thus the answer is $2^{2^m + 2^{2m} + \cdots + 2^{nm}}$.
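For very small parameters the count can be confirmed by exhaustive enumeration. The sketch below is only a sanity check under the congruence condition as stated above; the function names are illustrative, and the brute force is feasible only for tiny m and n (for example m = 1, n = 2, where the formula gives $2^{2+4} = 64$).

```python
from itertools import product

def log2_count_by_formula(m, n):
    """Exponent 2^m + 2^(2m) + ... + 2^(nm) from the answer."""
    return sum(2 ** (k * m) for k in range(1, n + 1))

def count_by_brute_force(m, n):
    """Count f: (Z/2^n)^m -> Z/2^n that preserve congruence mod 2^k for all k."""
    size = 2 ** n
    points = list(product(range(size), repeat=m))
    count = 0
    for values in product(range(size), repeat=len(points)):
        f = dict(zip(points, values))
        ok = all(
            (f[p] - f[q]) % (1 << k) == 0
            for k in range(1, n)
            for p in points
            for q in points
            if all((a - b) % (1 << k) == 0 for a, b in zip(p, q))
        )
        count += ok
    return count
```

The cases k = 0 and k = n impose no restriction, which is why the loop runs over $1 \le k < n$ only.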
121. (a) If f has a period of length pq , where q > 1 is odd, its pfold iteration f has a
[p℄
period of length q, say y0 7! y1 7! 7! yq = ny0 1where yj+1 =nf 1 (yj ) and y1 6= y0 . But
[p℄
138. The simplest case known to the author requires the calculation of two binary operations, such as
a b b
a b a
a b b and a b a ;
a a
a
each has cost 2 in class $V_a$, but the costs are (3, 2) and (2, 3) in classes I and II.
139. The calculation of $z_2$ is essentially equivalent to exercise 136(b); so the natural representation (111) wins. Fortunately this representation also is good for $z_1$, with $z_{1l} = x_l \wedge y_l$, $z_{1r} = x_r \wedge y_r$.
140. With representation (111), first use full binary adders to compute $(a_1a_0)_2 = x_l + y_l + z_l$ and $(b_1b_0)_2 = x_r + y_r + z_r$ in 5 + 5 = 10 steps. Now the "greedy footprint" method shows how to compute the four desired functions of $(a_1, a_0, b_1, b_0)$ in eight
further steps: ul = a1 ^ b0 , ur = a0 ^ b1 ; t1 = a1 b0 , t2 = a0 b1 , t3 = a1 t2 ,
t4 = a0 t1 , vl = t3 ^ t1 , vr = t4 ^ t2 . [Is this method optimum?℄
141. Suppose we've computed bits $a = a_0a_1\ldots a_{2m-1}$ and $b = b_0b_1\ldots b_{2m-1}$ such that
(1971), 249–257. These mysterious numbers, which were first defined by S. Ulam in SIAM Review 6 (1964), 348, have baffled number theorists for many years. The ratio $U_n/n$ appears to converge to a constant, $\approx 13.52$; for example, $U_{20000000} = 270371127$ and $U_{40000000} = 540752349$. Furthermore, D. W. Wilson has observed empirically that the numbers form quasi-periodic "clusters" whose centers differ by multiples of another constant, $\approx 21.6016$. Calculations by Jud McCranie and the author for $U_n < 640000000$ indicate that the largest gap $U_n - U_{n-1}$ may occur between $U_{24576523} = 332250401$ and $U_{24576524} = 332251032$; the smallest gap $U_n - U_{n-1} = 1$ apparently occurs only when $U_n \in \{2, 3, 4, 48\}$. Certain small gaps like 6, 11, 14, and 16 have never been observed.]
142. Algorithm E in that exercise performs the following operations on subcubes: (i) Count the $*$s in a given subcube $c$. (ii) Given $c$ and $c'$, test if $c \subseteq c'$. (iii) Given $c$ and $c'$, compute $c \sqcup c'$ (if it exists). Operation (i) is simple with sideways addition; let's see which of the nine classes of two-bit encodings (119), (123), (124) works best for (ii) and (iii). Suppose $a = 0$, $b = 1$, $c = *$; the symmetry between 0 and 1 means that we need only examine classes I, III, IV$_a$, IV$_c$, V$_a$, and V$_c$.
For the asterisksandbits mapping (0; 1; ) 7! (00; 01; 10), whi
h belongs to
lass I, the truth table for
6
0 is 010100110 in ea
h
omponent. (For example,
0 and 6 1. The s in this truth table are don't
ares for the unused
odes 11.)
The methods of Se
tion 7.1.2 tell us that the
heapest su
h fun
tions have
ost 3;
for example,
0 if and only if ((b b0 ) j a) & a0 = 0. Furthermore the
onsensus
t
0 =
00 exists if and only if z = 1, where z = (b b0 ) & (a a0 ). And in that
ase, a00 = (a b b0 ) & (a a0 ), b00 = (b j b0 ) & z. [The asterisk and bit
odes were
used for this purpose by M. A. Breuer in Pro
. ACM Nat. Conf. 23 (1968), 241{250.℄
But
lass III works out better, with (0 ; 1; ) 7! (01; 10; 00). Then
0 if and only
if (
l &
0l )0j (
r &
0r )00 = 0;
t
0 =00
00 exists if and only if z = 1 where z = x & y, x =
l j
0l ,
y =
r j
r ; and
l = x z ,
r = y z . We save two operations for ea
h
onsensus,
with respe
t to
lass I,
ompensating for an extra step when
ounting asterisks.
Classes IV$_a$, V$_a$, and V$_c$ turn out to be far inferior. Class IV$_c$ has some merit, but class III is best.
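143. $f(x) = ((x \mathbin{\&} m_1) \ll 17) \mid ((x \gg 17) \mathbin{\&} m_1) \mid ((x \mathbin{\&} m_2) \ll 15) \mid ((x \gg 15) \mathbin{\&} m_2) \mid ((x \mathbin{\&} m_3) \ll 10) \mid ((x \gg 10) \mathbin{\&} m_3) \mid ((x \mathbin{\&} m_4) \ll 6) \mid ((x \gg 6) \mathbin{\&} m_4)$, where $m_1 = \mathtt{\#7f7f7f7f7f7f}$, $m_2 = \mathtt{\#fefefefefefe}$, $m_3 = \mathtt{\#3f3f3f3f3f3f3f}$, $m_4 = \mathtt{\#fcfcfcfcfcfcfc}$. [See, for example, Chess Skill in Man and Machine, edited by Peter W. Frey (1977), page 59. Five steps suffice to compute $f(x)$ on MMIX (four MOR operations and one OR), since $f(x) = (q \cdot x \cdot q') \mid (q' \cdot x \cdot q)$ in terms of the MOR product, with $q = \mathtt{\#40a05028140a0502}$ and $q' = \mathtt{\#2010884422110804}$.]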
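The eight shift-and-mask terms can be compared with a direct enumeration of knight moves. The sketch below assumes that bit $8r + c$ of $x$ stands for the square in row $r$ and column $c$ of an 8 × 8 board; that reading of the exercise, and the helper `knight_targets`, are assumptions made for illustration.

```python
M1 = 0x7F7F7F7F7F7F
M2 = 0xFEFEFEFEFEFE
M3 = 0x3F3F3F3F3F3F3F
M4 = 0xFCFCFCFCFCFCFC

def f(x):
    """Squares reachable in one knight move from the squares in x."""
    return (((x & M1) << 17) | ((x >> 17) & M1) |
            ((x & M2) << 15) | ((x >> 15) & M2) |
            ((x & M3) << 10) | ((x >> 10) & M3) |
            ((x & M4) << 6) | ((x >> 6) & M4))

def knight_targets(square):
    """Reference computation by coordinates, for a single square 0..63."""
    r, c = divmod(square, 8)
    out = 0
    for dr, dc in ((1, 2), (2, 1), (-1, 2), (-2, 1),
                   (1, -2), (2, -1), (-1, -2), (-2, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < 8 and 0 <= cc < 8:
            out |= 1 << (8 * rr + cc)
    return out
```

Each mask removes the columns (and, through its length, the rows) from which the corresponding shift would wrap around the edge of the board, so no 64-bit overflow occurs.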
144. Node j (k 1), where k = j & j .
145. It names the an
estor of the leaf node j j 1 at height h.
146. By (136) we want to show that (j & i) = l when l 2 < i l j < l + 2 .
l l
The desired result follows from (35) be
ause l i < l + 2 . l
147. (a) vj = vj = j , vj = 1 j , and j = , for 1 j n.
(b) Suppose n = 2e1 + +2et where e1 > > et 0, and let nk = 2e1 + +2ek
for 0 k t. Then vj = j and vj = vj = nk for nk 1 < j nk . Also nk = vnk 1
for 1 k t, where v0 = ; all other j = .
148. Yes, if y1 = 010000, y2 = 010100, x1 = 010101, x2 = 010110, x3 = 010111,
x3 = 010111, y2 = 010100, x2 = 011000, y1 = 010000, and x1 = 100000.
149. We assume that CHILD(v ) = SIB(v ) = PARENT(v ) = initially for all verti
es v
(in
luding v = ), and that there is at least one nonnull vertex.
S1. [Make triply linked tree.℄ For ea
h of the n ar
s u ! v (perhaps v = ), set
SIB(u) CHILD(v ), CHILD(v ) u, PARENT(u) v . (See exer
ise 2.3.3{6.)
S2. [Begin rst traversal.℄ Set p CHILD(), n 0, and 0 1.
S3. [Compute in the easy
ase.℄ Set n n + 1, p n, n , and
n 1+ (n 1). If CHILD(p) 6= , set p CHILD(p) and repeat this step;
otherwise set p n.
S4. [Compute , bottomup.℄ Set p PARENT(p). Then if SIB(p) 6= , set
p SIB(p) and return to S3; otherwise set p PARENT(p).
S5. [Compute in the hard
ase.℄ If p 6= , set h (n & p), then p traversal in postorder
((n h) j 1) h, and go ba
k to S4. Cartesian trees
Vuillemin
S6. [Begin se
ond traversal.℄ Set p CHILD(); 0 n, 0. righttoleft minimum
lefttoright minimum
S7. [Compute , topdown.℄ Set p (PARENT(p)) j (p & p). Then if triply linked tree
CHILD(p) 6= , set p CHILD(p) and repeat this step. Gabow
Bentley
S8. [Continue to traverse.℄ If SIB(p) 6= , set p SIB(p) and go to S7. Tarjan
Otherwise set p PARENT(p), and repeat step S8 if p 6= . Fis
her
Heun
150. We may assume that the elements $A_j$ are distinct, by regarding them as ordered pairs $(A_j, j)$. The hinted binary search tree, which is a special case of the "Cartesian trees" introduced by Jean Vuillemin [CACM 23 (1980), 229–239], has the property that $k(i, j)$ is the nearest common ancestor of i and j. Indeed, the ancestors of any given node j are precisely the nodes k such that $A_k$ is a right-to-left minimum of $A_1 \ldots A_j$ or $A_k$ is a left-to-right minimum of $A_j \ldots A_n$.
The algorithm of the preceding answer does the desired preprocessing, except that we need to set up a triply linked tree differently on the nodes $\{0, 1, \ldots, n\}$. Start as before with CHILD(v) = SIB(v) = PARENT(v) = 0 for $0 \le v \le n$, and let $\Lambda = 0$. Assume that $A_0 \le A_j$ for $1 \le j \le n$. Set $t \gets 0$ and do the following steps for $v = n, n-1, \ldots, 1$: Set $u \gets 0$; then while $A_v < A_t$ set $u \gets t$ and $t \gets$ PARENT(t). If $u \ne 0$, set SIB(v) $\gets$ SIB(u), SIB(u) $\gets 0$, PARENT(u) $\gets v$, CHILD(v) $\gets u$; otherwise simply set SIB(v) $\gets$ CHILD(t). Also set CHILD(t) $\gets v$, PARENT(v) $\gets t$, $t \gets v$.
Continue with step S2 after the tree has been built. The running time is O(n), because the operation $t \gets$ PARENT(t) is performed at most once for each node t. [This beautiful way to reduce the range minimum query problem to the nearest common ancestor problem was discovered by H. N. Gabow, J. L. Bentley, and R. E. Tarjan, STOC 16 (1984), 137–138, who also suggested the following exercise.]
151. For node v with k children $u_1, \ldots, u_k$, define the node sequence $S(v) = v$ if $k = 0$; $S(v) = v\,S(u_1)$ if $k = 1$; and $S(v) = S(u_1)\,v \ldots v\,S(u_k)$ if $k > 1$. (Consequently v appears exactly $\max(k-1, 1)$ times in $S(v)$.) If there are k trees in the forest, rooted at $u_1, \ldots, u_k$, write down the node sequence $S(u_1)\,\Lambda \ldots \Lambda\,S(u_k) = V_1 \ldots V_N$. (The length of this sequence will satisfy $n \le N < 2n$.) Let $A_j$ be the depth of node $V_j$, for $1 \le j \le N$, where $\Lambda$ has depth 0. (For example, consider the forest (141), but add another child $K \to D$ and an isolated node L. Then $V_1 \ldots V_{15} = CFAGJDHDK\Lambda BEI\Lambda L$ and $A_1 \ldots A_{15} = 231342323012301$.) The nearest common ancestor of u and v, when $u = V_i$ and $v = V_j$, is then $V_{k(i,j)}$ in the range minimum query problem. [See J. Fischer and V. Heun, Lecture Notes in Comp. Sci. 4009 (2006), 36–48.]
152. Step V1 nds the level above whi
h x and y have bits that apply to both of
their an
estors. (See exer
ise 148.) Step V2 in
reases h, if ne
essary, to the level where
they have a
ommon an
estor, or to the top level n if they don't (namely if k = 0).
If x 6= z, step V4 nds the topmost level among x's an
estors that leads to level h;
hen
e it knows the lowest an
estor x^ for whi
h x^ = z (or x^ = ). Finally in V5,
preorder tells us whi
h of x^ or y^ is an an
estor of the other.
153. That pointer has j bits, so it ends after 1 + 2 + + j = j j bits of the
pa
ked string, by (61). [Here j is even. Navigation piles were introdu
ed in Nordi
x = F l1 + F l2 + + F ls ;
where l1 l2 ls > 0 and ls is odd.
Given a negaFibona
i
ode , the following 20step 2adi
hain
onverts x = ()2 to
y = ( )2 to z = (
)2 , where is the odd
odeword with N () = F ( ) and
is the
standard
odeword with F ( ) = F (
0): x+ x & 0 , x x x+ ; d x+ x ;
t d j x , t t & (t 1); y (d & 0 ) t ((t & x ) 1); z (y + 1) 1;
w z (40 ); t w & (w+1); z z (t & (z ((w+1) 1))).
Corresponding negaFibona
i and odd representations satisfy the remarkable law
Fk1 +m + + Fkr +m = ( 1)m (F l1 m + + F ls m ); for all integers m.
For example, if N () < 0 the steps above will
onvert x = (0)2 to y = ( )2, where
F (( 2)0) = N (). Furthermore is the odd
ode for negaFibona
i if and only
if R is the odd
ode for negaFibona
i R , when jj = j j is odd and N () > 0.
No nite 2adi
hain will go the other way, by Corollary I, be
ause the Fibona
i
ode 10k
orresponds to negaFibona
i 10k+1 when k is odd, (10)k=21 when k is even.
But if
is a standard Fibona
i
odeword we
an
ompute y = ( )2 from z = (
)2 by
setting y z 1, t y & (y 1) & 0 , y y t + [ t 6= 0℄((t 1) & 0 ). And then
the method above will
ompute R from R . The overall running time for
onversion
to negaFibona
i form will then be of order log j
j, for two string reversals.
160. The text's rules are a
tually in
omplete: They should also dene the orientation
of ea
h neighbor. Let us stipulate that sn = ; en = ; (0)wn = 0, (1)wo = 1;
(00)ns = 00, (10)nw = 10, (1)ne = 1; (0)oo = 0, (101)oo = 101,
(1001)oo = 1001, (0001)ow = 0001. Then a
ase analysis proves that all
ells bipartite
within d steps of the starting
ell have a
onsistent labeling and orientation, by indu

ylinder
hyperboli
plane
tion on the graph distan
e d. (Note the identity + = ((0) ) 1.) Furthermore the upper halfplane
labeling remains
onsistent when we atta
h y
oordinates and move when ne
essary re
e
tion
from one strip to another via the Ærules of (153). breadthrst sear
h
S
hla
i
161. Yes, it is bipartite, because all of its edges are defined by the set of boundary lines. (The hyperbolic cylinder cannot be bicolored; but two adjacent strips can.)
162. It's
onvenient to view the hyperboli
plane through another lens,
by mapping its points to the upper halfplane =z > 0. Then the \straight C 0 A
36
45
lines" be ome semi ir les entered on the xaxis, together with verti al p 90 90
C
triangle ABC has three neighbors CBA , ACB , and BAC , obtained
36 45
by \re
e
ting"
0 0
two of its edges about the third, where1 the re
e
tion of2 0 0 A0 90
The mapping z 7! (z z0 )=(z z0 ) takes the upper halfplane into the unit
ir
le;
when z0 = 12 (p 1=)(1 + 51=4i) the
central pentagon will be symmetric. Repeated reflections of the initial triangle, using breadth-first search until reaching triangles that are invisible, will lead to Fig. 14. To get just the pentagons (without the gray lines), one can begin with just the central cell and perform reflections about its edges, etc.
163. (This figure can be drawn as in exercise 162, starting with vertices that project to the three points $ir$, $ir\omega$, and $ir\omega^2$, where $r^2 = \frac12(1+\sqrt2)(4-\sqrt2-\sqrt6)$ and $\omega = e^{2\pi i/3}$. Using a notation devised by L. Schläfli in 1852, it can be described as the infinite tiling with parameters $\{3, 8\}$, meaning that eight triangles meet at every vertex; see Schläfli's Gesammelte Mathematische Abhandlungen 1 (1950), 212. Similarly, the pentagrid and the tiling of exercise 154 have Schläfli symbols $\{5, 4\}$ and $\{5, 5\}$, respectively.)
164. The original denition requires more
omputation, even though it
an be fa
tored:
like whorls are formed thereafter. For example, starting with Fig. 15(a) we get
; ; ; :::; ; ;
in a 120 120 bitmap, eventually alternating endlessly between two bizarre patterns.
(Does every nonempty M N bitmap lead to su
h a 2
y
le?)
166. If X =
uster(X ), the sum of the elements of X +(X 1)+(X 1)+(X 1)+(X 1)
The 'I' and 'H' at the left show that pixels are sometimes left intact where paths join, and that rotating by 90° can make a difference. The next two examples illustrate a quirky influence of left-right reflection. The diamond example demonstrates that very thick images can be unthinnable; none of its black pixels can be removed without changing the number of holes. The final examples, one of which was inspired by the answer to exercise 166, were processed first without (160), in which case they are unchanged by the transformation. But with (160) they're thinned dramatically.
173. (a) If X and Y are
losed, X & Y is
losed; if X and Y are open, X j Y is
open. The hinted statement follows. Furthermore X DD = X D , be
ause X D is
losed;
similarly X = X . (In fa
t we have X = (XDL)D , be
ause
LL L L the denitions are dual,
obtained byL swapping bla
k with white.) Now X X D , so X DLD X DD = X D .
Dually, X X LDL . We
on
lude that there's no reason to launder a
lean pi
ture:
X DLDL = (X DLD )L X DL (X D )LDL = X DLDL .
(b) We have X D = (X j XW j XNW j XN )&(X j XN j XNE j XE )&(X j XE j XSE j XS )&
(X j XS j XSW j XW ). Furthermore, in analogy with answer 167(b), this fun
tion
an be
omputed from x , x, and x+ in ten broadword steps: f x j (x 1) j ((x j (x 1))&
(x+ j (x+ 1))), f f & (f 1). [This answer in
orporates ideas of D. R. Fu
hs.℄
To get X L , just inter
hange j and &. For further dis
ussion, see C. Van Wyk
and D. E. Knuth, Report STANCS79707 (Stanford Univ., 1979), 15{36.℄
174. Threedimensional digital topology has been studied by R. Malgouyres, Theoret
Journal 4 (1965), 25{30℄ gave similar algorithms, but with diagonal edges permitted.)
186. (a) B () = z0 + 2(z1 z0 ) + O(2 ); B (1 ) = z2 2(z2 z1 ) + O(2 ).
(b) Every point of S (z0 ; z1 ; z2 ) is2 a
onvex
ombination of z0 , z1 , and z2 .
(
) Obviously true, sin
e (1 t) + 2(1 t)t + t2 = 1.
(d) The
ollinear
ondition follows from (b). Otherwise, by (
), we need only
onsider the
ase z0 = 0 and z2 2z1 = 1, where z1 = x1 + iy1 and y1 6= 0. In that
ase all points lie on the parabola 42 x = (y=y1)2 + 4yx1 =y1.
(e) Note that B(u) = (1 u) z +2u(1 u)((1 )z0 + z1 )+ u2 B() for 0 u 1.
[S. N. Bernshten introdu
ed Bn (z0 ; z1 ; : : : ; zn ; t) = P n
(1 t)n k tk zk in Bernshten
k k
Soobsh
henia Khar'kovskoe matemati
heskoe obsh
hestvo (2) 13 (1912), 1{2.℄ xedpoint
Kaasila
187. We
an assume that z0 = (x0 ; y0 ), z1 = (x1 ; y1 ), and z2 = (x2 ; y2 ), where the squines
oordinates are (say) xedpoint numbers represented as 16bit integers divided by 32. TrueType
Bezier
If z0 , z1 , and z2 are
ollinear, use the method of exer
ise 185 to draw a straight Knuth
line from z0 to z2 . (If z1 doesn't lie between z0 and z2 , the other edges will
an
el out, METAFONT
sixregister algorithm
be
ause edges are impli
itly XORed by a lling algorithm.) This
ase o
urs if and Hobby
only if D = x0 y1 + x1 y2 + x2 y0 x1 y0 x2 y1 x0 y2 = 0. Pratt
oni
splines
Otherwise the points (x; y) of S (z0 ; z1 ; z2 ) satisfy F (x; y) = 0, where ellipti
al
hyperboli
F (x; y) = ((x x0 )(y2 2y1 + y0 ) (y y0 )(x2 2x1 + x0 ))2 magi
4D((x1 x0 )(y y0 ) (y1 y0 )(x x0 )) MOR
MUX
Guibas
and D is dened above. We multiply by 324 to obtain integer
oeÆ
ients; then negate Stol
this formula and subtra
t 1, if D < 0, to satisfy
ondition (iv) of Algorithm T and the lower bounds
redundant representation
reverseorder
ondition. (See exer
ise 184.) bigendian
The monotoni
ity
ondition (ii) holds if and only if (x1 x0 )(x2 x1 ) > 0 and
(y1 y0 )(y2 y1 ) > 0. If ne
essary, we
an use the re
urren
e of exer
ise 186(e)
to break S (z0 ; z1 ; z2 ) into at most three monotoni
subsquines; for example, setting
= (x0 x1 )=(x0 2x1 + x2 ) will a
hieve monotoni
ity in x. (A slight rounding error
may o
ur during this xed point arithmeti
, but the re
urren
e
an be performed in
su
h a way that the subsquines are denitely monotoni
.)
Notes: When $z_0$, $z_1$, and $z_2$ are near each other, a simpler and faster method based on exercise 186(e) with its parameter equal to $\frac12$ is adequate for most practical purposes, if one doesn't care about making the exactly correct choice between local edge sequences like "up-then-left" versus "left-then-up." In the late 1980s, Sampo Kaasila chose to use squines as the basic method of shape specification in the TrueType font format, because they can be digitized so rapidly. The METAFONT system achieves greater flexibility with cubic Bézier splines [see D. E. Knuth, METAFONT: The Program (Addison–Wesley, 1986)], but at the cost of extra processing time. A fairly fast "six-register algorithm" for the resulting cubic curves was, however, developed subsequently by John Hobby [ACM Trans. on Graphics 9 (1990), 262–277]. Vaughan Pratt introduced conic splines, which are sort of midway between squines and Bézier cubics, in Computer Graphics
(modulo x2q + 1), using polynomial arithmeti
mod 2. Equivalently, it's the smallest
positive j for1 whi
h Fj (y) is a multiple of (x2q + 1)=(x2 + 1) = (1 + x + + xq 1 )2 ,
when y = x +1+x.
(b) Use the method of exer
ise 191(d) to evaluate ((x + x 1 )Fj (y)) mod (x2q +1)
when j = M=p, for all prime divisors p of M . If the result is zero, set M M=p and
repeat the pro
ess. If no su
h result eis zero,
(q) = M . e 1
(
) We want to show that
(2 ) is a divisor e
of 3 2 but not ofe+13 2e 2 or
2 . The latter holds be
ause F2e 1(y) = y
e 1 2 1 1
is relatively prime to x2 +1. The
former holds be
ause
e 1 1 e 1 e 1 1
F32e 1(y) = y2 F3 (y)2 = y2 (1 + y)2e = y2e 1 1 (x 1 +x)2e ;
e+1 e
whi
h is 0 modulo P x2 + 1 but not modulo x2 + 1.
(d) F21e 1 (y) = ke=1 y2e 2k . Sin
e y = x 1 (1+x+x2 ) is relatively prime to xq +1,
we have y a0 + a1 x + + aq 1 x (modulo xq +1) for some
oeÆ
ients ai ; hen
e
q 1
202. MOR x,x,c, where c = #c0c030300c0c0303; then MOR x,mone,x. (See answer 209.)
209. Four instructions suffice: MXOR y,p,x; MXOR x,mone,x; MXOR x,x,q; XOR x,x,y; here p = #80c0e0f0f8fcfeff, mone = −1, and $q = \bar p$.
210. SLU x,one,x; MOR x,b,x; AND x,x,a; MOR x,x,#ff; here register one = 1.
W
211. In general, element ij of the Boolean matrix produ
t AXB is fxkl j aik ^ blj g.
For this problem we
hoose aik = [ i k ℄ and b lj = [ l j ℄; the answer is ` MOR t,f,a;
MOR t,b,t' where a = # 80
0a0f088
aaff and b = # ff5533110f050301 = aT .
(Notice that this trick gives a simple test $[f = \hat f\,]$ for monotonicity. Furthermore, the 64-bit result $(t_{63} \ldots t_1t_0)_2$ gives the coefficients of the multilinear representation
$$f(x_1, \ldots, x_6) = (t_{63} + t_{62}x_6 + \cdots + t_1x_1x_2x_3x_4x_5 + t_0x_1x_2x_3x_4x_5x_6) \bmod 2,$$
if we substitute MXOR for MOR, by the result of exercise 7.1.1–11.)
212. If denotes MXOR as in (183) and b = (7 : : : 1 0 )256 has bytes j , we
an evaluate identity matrix
SADD
= (a B0L ) ((a 8) (B1L +B0U )) ((a 16) (B2L +B1U )) ((a 56) (B7L +B6U )); NEG
2ADDU
where# BjU = (qj ) & m, BjL = (((qj ) 8) + j ) & m, q = # 0080402010080402 , and table lookup by shifting
m = 7f3f1f0f07030100 . (Here qj denotes ordinary multipli
ation of integers.)
213. In this bigendian
omputation, register nn holds n, and register data points
to the o
tabyte following the given bytes n 1 : : : 1 0 in memory (with n 1 rst).
The
onstants aa = # 8381808080402010 and bb = # 339b
f6530180
06
orrespond to
matri
es A and B, found by
omputing the remainders xk mod p(x) for 72 k < 80.
SET
,0
0. LDOU t,data,nn t next o
ta.
LDOU t,data,nn t next o
ta. XOR u,u,
u u
.
ADD nn,nn,8 n n 8. SLU
,v,56
v 56.
BZ nn,2F Done if n = 0. SRU v,v,8 v v 8.
1H MXOR u,aa,t u t A. XOR u,u,v u u v.
MXOR v,bb,t v t B. XOR t,t,u t t u.
ADD nn,nn,8 n n 8. PBN nn,1B Repeat if n > 0.
A similar method nishes the job, with no auxiliary table needed:
2H SET nn,8 n 8. SRU v,v,8 v v 8.
3H AND x,t,ffooo x high 0byte. XOR t,t,v t t v.
MXOR u,aaa,x u xA . SUB nn,nn,1 n n 1.
MXOR v,bbb,x v x B0 . PBP nn,3B Repeat if n > 0.
SLU t,t,8 t t 8. XOR t,t,
t t
.
XOR t,t,u t t u. SRU
r
,t,48 Return t 48.
Here aaa = # 8381808080808080 , bbb = # 0383
363331b0f05 , and ffooo = # ff00:::00 .
The Books of the BigEndians have been long forbidden.
 LEMUEL GULLIVER, Travels Into Several Remote Nations of the World (1726)
214. By considering the irreducible factors of the characteristic polynomial of X, we must have $X^n = I$ where $n = 2^3 \cdot 3^2 \cdot 5 \cdot 7 \cdot 17 \cdot 31 \cdot 127 = 168661080$. Neill Clift has shown that $l(n-1) = 33$ and found the following sequence of 33 MXOR instructions to compute $Y = X^{-1} = X^{n-1}$: MXOR t,x,x; MXOR $1,t,x; MXOR $2,t,$1; MXOR $3,$2,$2; MXOR t,$3,$3; $S^6$; MXOR t,t,$2; $S^3$; MXOR $1,t,$1; MXOR t,$1,$3; $S^{13}$; MXOR t,t,$1; $S$; MXOR y,t,x; here $S$ stands for 'MXOR t,t,t'. To test if X is nonsingular, do MXOR t,y,x and compare t to the identity matrix #8040201008040201.
215. SADD $0,x,0; SADD $1,x,a; NEG $0,32,$0; 2ADDU $1,$1,$0; SLU $0,b,$1; then
BN $0,Yes; here a = # aaaaaaaaaaaaaaaa and b = # 2492492492492492 .
INDEX AND GLOSSARY
When an index entry refers to a page
ontaining a relevant exer
ise, see also the answer to
that exer
ise for further information. An answer page is not indexed here unless it refers to a
topi
not in
luded in the statement of the exer
ise.
0{1 matri
es, 67{70, see also Bitmaps. Allou
he, JeanPaul, 78.
multipli
ation of, 50{51, 56. Alpha
hannels, 59.
transposing, 15, 56, 67, 69, 80. Alphabeti
data, 20, 59.
triangularizing, 68. Analysis of algorithms, 55, 85.
0{1 prin
iple, 54. An
estors in a forest, 33.
1 (the
onstant ( 111)2 ), 3, 8, 9, nearest
ommon, 33{35, 64.
50, 71, 76, 107. AND (bitwise
onjun
tion), 2{3.
2adi
hains, 23{27, 37, 61, 91, 96. Animating fun
tions, 53, 56.
2adi
fra
tions, 9, 75. Ar
lists, 62.
2adi
integers: Innite binary strings Ariyoshi, Hiromu ( ), 92.
(: : : x2 x1 x0 )2 subje
t to arithmeti
and Arndt, Jorg Uwe, 76, 84.
bitwise operations, 2, 16, 21, 53, 55. Array storage allo
ation, 16, 22, 54, 59.
as a metri
spa
e, 74.
2bit en
oding for 3state data, 28{31, 63. ASCII: Ameri
an Standard Code for
2
ube equivalen
e, 29{30. Information Inter
hange, iv, 59, 69, 118.
2dimensional data allo
ation, 16. Asso
iative laws, 3, 72.
2ADDU (times 2 and add unsigned), Asterisk
odes for sub
ubes, 18, 63.
79, 84, 108. Averages, bytewise, 19, 59.
3valued logi
, 31, 63.
4neighbors, see Rookneighbors, 40. Ba
kground of an image, 42.
4ADDU (times 4 and add unsigned), 79. Balan
ed bran
hing fun
tions, 53.
8neighbors, see Kingneighbors, 40. Balan
ed ternary notation, 63, 79.
8ADDU (times 8 and add unsigned), 79. Banyan networks, 81.
16ADDU (times 16 and add unsigned), 79. Basi
RAM (randoma
ess ma
hine)
1 (innity), 8, 55. model, 26, 62, 91.
Æ maps, 84. Baumgart, Bru
e Guenther, 12.
Æ shifts, 16. Bays, John Carter, 77.
Æ swaps, 13{15, 50, 55{56, 107.
BDIF (byte dieren
e), 20, 86{87.
x (blg x
), see Binary logarithm.
Benes, Va
lav Edvard, 13.
(average memory a
ess time), 118.
k and d;k , see Magi
masks.
Bentley, Jon Louis, 95.
x, see Sideways addition.
Berlekamp, Elwyn Ralph, 21, 73, 98.
x, see Ruler fun
tion.
Bernshten, Serge Natanovi
h (Bernxten,
Serge Natanoviq), 103.
(instru
tion
y
le time), 118.
BESM6 (BESM6)
omputer, 83.
Beyer, Wendell Terry, 42.
Absorption laws, 3. Bezier, Pierre Etienne, splines, 48,
Abstra
t RISC (redu
edinstru
tionset 66{67, 103.
omputer) model, 26. Bigendian
onvention, 6{8, 12, 20, 77,
A
kland, Bryan David, 44. 103{104, 108.
A
y
li
digraph, 33. Binary basis, 71.
Addition, 3, 19.
bytewise, 19, 87. Binary logarithm (x = blg x
), 10{11,
modulo 5, 60. 21, 25, 33, 55, 60{61, 64.
s
attered, 18, 57. Binary re
urren
e relations, 8, 10, 55.
sideways, 2, 11{12, 55, 62, 94. Binary sear
h trees, 64, 79.
unary, 60. Binary tree stru
tures, 32.
Adja
en
y lists, 62. Binary valuation, see Ruler fun
tion.
Adja
en
y matri
es of graphs, 28, 62. Binary
oded de
imal notation, 60.
Adventure game, 85. Bipartite graphs, 14{15, 97.
Agrawal, Dharma Prakash (Dm
þkAf Bit boards, 32, 63.
ag}vAl), 71. Bit
odes for sub
ubes, 18, 63.
Albers, Susanne, 88. Bit permutations, 13{17, 25, 50.
Interleaving bits, 16, 59, see also Perfe
t Loukakis, Emmanuel (Loukkh
,
shues, Zip. Man¸lh
), 92.