
KNUTH

THE ART OF

COMPUTER PROGRAMMING

VOLUME 4 PRE-FASCICLE 1A

A DRAFT OF SECTION 7.1.3:

BITWISE TRICKS AND TECHNIQUES

DONALD E. KNUTH Stanford University

ADDISON-WESLEY

Internet page http://www-cs-faculty.stanford.edu/~knuth/taocp.html contains
current information about this book and related books.
See also http://www-cs-faculty.stanford.edu/~knuth/sgb.html for information
about The Stanford GraphBase, including downloadable software for dealing with
the graphs used in many of the examples in Chapter 7.
See also http://www-cs-faculty.stanford.edu/~knuth/mmixware.html for downloadable
software to simulate the MMIX computer.
Copyright 2006 by Addison-Wesley
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without the prior consent of the publisher, except
that the official electronic file may be used to print single copies for personal (not
commercial) use.
Zeroth printing (revision 3), 16 February 2008

PREFACE
These unforeseen stoppages,
which I own I had no conception of when I first set out;
-- but which, I am convinced now, will rather increase than diminish as I advance,
-- have struck out a hint which I am resolved to follow;
-- and that is, -- not to be in a hurry;
-- but to go on leisurely, writing and publishing two volumes of my life every year;
-- which, if I am suffered to go on quietly, and can make a tolerable bargain
with my bookseller, I shall continue to do as long as I live.
-- LAURENCE STERNE, The Life and Opinions of
Tristram Shandy, Gentleman (1760)

This booklet contains draft material that I'm circulating to experts in the
field, in hopes that they can help remove its most egregious errors before too
many other people see it. I am also, however, posting it on the Internet for
courageous and/or random readers who don't mind the risk of reading a few
pages that have not yet reached a very mature state. Beware: This material has
not yet been proofread as thoroughly as the manuscripts of Volumes 1, 2, and 3
were at the time of their first printings. And those carefully-checked volumes,
alas, were subsequently found to contain thousands of mistakes.
Given this caveat, I hope that my errors this time will not be so numerous
and/or obtrusive that you will be discouraged from reading the material carefully.
I did try to make the text both interesting and authoritative, as far as it goes.
But the field is vast; I cannot hope to have surrounded it enough to corral it
completely. So I beg you to let me know about any deficiencies that you discover.
To put the material in context, this pre-fascicle contains Section 7.1.3 of a
long, long chapter on combinatorial algorithms. Chapter 7 will eventually fill
at least three volumes (namely Volumes 4A, 4B, and 4C), assuming that I'm
able to remain healthy. It will begin with a short review of graph theory, with
emphasis on some highlights of significant graphs in the Stanford GraphBase,
from which I will be drawing many examples. Then comes Section 7.1: Zeros
and Ones, beginning with basic material about Boolean operations in Section
7.1.1 and Boolean evaluation in Section 7.1.2. Section 7.1.3, which you're about
to read here, applies these ideas to make computer programs run fast. Section
7.1.4 will then discuss the representation of Boolean functions.
The next part, 7.2, is about generating all possibilities, and it begins with
Section 7.2.1: Generating Basic Combinatorial Patterns. Fascicles for this section
have already appeared on the Web and/or in print. Section 7.2.2 will deal with
backtracking in general. And so it will continue, if all goes well; an outline of
the entire Chapter 7 as currently envisaged appears on the taocp webpage that
is cited on page ii.
This part of The Art of Computer Programming has probably been more fun
to write than any other so far. Indeed, I've spent more than 30 years collecting
material for Section 7.1.3; finally I'm able to assemble these goodies together
and segue through them.
Most of Volume 4 will deal with abstract concepts, and there will be little
or no need to say much about a computer's machine language. Volumes 1–3
have already dealt with most of the important ideas about programming at that
level. But Section 7.1.3 is a notable exception: Here we often want to see the
very pulse of the machine.
Therefore I strongly recommend that readers become familiar with the basics
of the MMIX computer, explained in Volume 1 Fascicle 1, in order to fully
appreciate the bitwise tricks and techniques described here. Cross-references
to Sections 1.3.1 and 1.3.2 in the present booklet refer to that fascicle. I've
reprinted the basic MMIX opcode-and-timing chart, Table 1.3.1–1, at the end of
this booklet for convenience, together with a list of ASCII codes.
The topic of Boolean functions and bit manipulation can of course be interpreted
so broadly that it encompasses the entire subject of computer programming.
The real goal of this fascicle is to focus on concepts that appear at the
lowest levels, concepts on which we can erect significant superstructures. And
even these apparently lowly notions turn out to be surprisingly rich, with explicit
ties to sections 1.2.4, 1.2.5, 1.2.8, 2.3.1, 2.3.3, 2.3.4.2, 2.3.5, 3.1, 3.2.2, 4.1, 4.4,
4.5.3, 4.5.4, 4.6.1, 4.6.2, 4.6.3, 4.6.4, 5, 5.2.2, 5.2.3, 5.2.5, and 5.3.4 of the first
three volumes. I strongly believe in building up a firm foundation, so I have
discussed Boolean topics much more thoroughly than I will be able to do with
material that is newer or less basic. Section 7.1.3 presented me with an extreme
embarrassment of riches: After typing the manuscript I was astonished to discover
that I had come up with 215 exercises, even though, believe it or not, I
had to eliminate quite a lot of the interesting material that appears in my files.
My notes on combinatorial algorithms have been accumulating for more
than forty years, so I fear that in several respects my knowledge is woefully
behind the times. Please look, for example, at the exercises that I've classed as
research problems (rated with difficulty level 46 or higher), namely exercises 61,
76, 112, 117, 126, 128, 129, 130, and 174; I've also implicitly mentioned or posed
additional unsolved questions in the answers to exercises 21, 140, 141, 156, and
165. Are those problems still open? Please inform me if you know of a solution
to any of these intriguing questions. And of course if no solution is known today
but you do make progress on any of them in the future, I hope you'll let me know.
I urgently need your help also with respect to some exercises that I made up
as I was preparing this material. I certainly don't like to receive credit for things
that have already been published by others, and most of these results are quite
natural "fruits" that were just waiting to be "plucked." Therefore please tell
me if you know who deserves to be credited, with respect to the ideas found in
exercises 5, 6, 20, 26, 34, 39, 49, 50, 53, 57, 58(d,e), 59, 60, 72, 78, 80, 81, 82, 83,
84, 86, 90, 95, 110, 115, 116, 120, 121, 127, 146, 154, 155, 159, 168, 184, 194, and
199, and/or the answers to exercises 17, 18, and 139. Furthermore I've credited
exercises 45 and 54 to unpublished work of Tom Rokicki and Bill Gosper. Have
either of those results ever appeared in print, to your knowledge?
Special thanks are due to Guy Steele and Hank Warren for their comments
on my early attempts at exposition, as well as to numerous other correspondents
who have contributed crucial corrections.
I shall happily pay a finder's fee of $2.56 for each error in this draft when it is
first reported to me, whether that error be typographical, technical, or historical.
The same reward holds for items that I forgot to put in the index. And valuable
suggestions for improvements to the text are worth 32¢ each. (Furthermore, if
you find a better solution to an exercise, I'll actually reward you with immortal
glory instead of mere money, by publishing your name in the eventual book:-)
Cross references to yet-unwritten material sometimes appear as '00'; this
impossible value is a placeholder for the actual numbers to be supplied later.
Happy reading!
Stanford, California D. E. K.
16 December 2006
[These techniques] are instances of general mathematical principles
waiting to be discovered, if an appropriate setting is created.
Such a setting would be a calculus of bitmap operations, so one can learn
to use these operations just as naturally as arithmetic operations on numbers.
-- L. J. GUIBAS and J. STOLFI, ACM Transactions on Graphics (1982)

A nice mixture of boolean and numeric functions --
a suitable exercise for biturgical acolytes.
-- R. W. GOSPER (1996)

A note on notation. Several formulas in Section 7.1.3 use the notation $\langle xyz \rangle$,
for the median function (aka majority function) that is discussed extensively in
Section 7.1.1. Other formulas use the notation $x \mathbin{\dot-} y$, for the monus function
(aka dot-minus or saturated subtraction), which was defined in Section 1.3.1.
Hexadecimal constants are preceded by a sharp sign: #123 means $(123)_{16}$. If you
run across other notations that may be unfamiliar, please look at the Index to
Notations at the end of Volumes 1, 2, or 3, and/or the entries under "Notation"
in the index to the present booklet. Of course Volume 4 will some day contain
its own Index to Notations.
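For concreteness, the two operators just mentioned can be written as C functions on 64-bit words. This is a minimal sketch. The names `median3` and `monus` are hypothetical. It assumes unsigned 64-bit operands and uses the standard majority formula for the bitwise median.

```c
#include <stdint.h>

/* Bitwise median <xyz>: each result bit is the majority of the three
   corresponding input bits. */
static uint64_t median3(uint64_t x, uint64_t y, uint64_t z) {
    return (x & y) | (y & z) | (x & z);
}

/* Monus (saturated subtraction): x -. y = max(x - y, 0) for unsigned x, y. */
static uint64_t monus(uint64_t x, uint64_t y) {
    return x > y ? x - y : 0;
}
```

For example, the bitwise median of (1100)_2, (1010)_2 and (1001)_2 is (1000)_2, because only the leftmost position has a majority of 1s.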
7.1.3 BITWISE TRICKS AND TECHNIQUES 1

Lady Caroline. Psha! that's such a hack!
Sir Simon. A hack, Lady Caroline, that
the knowing ones have warranted sound.
-- GEORGE COLMAN, John Bull, Act 3, Scene 1 (1803)

7.1.3. Bitwise Tricks and Techniques

Now comes the fun part: We get to use Boolean operations in our programs.
People are more familiar with arithmetic operations like addition, subtraction,
and multiplication than they are with bitwise operations such as "and,"
"exclusive-or," and so on, because arithmetic has a very long history. But we will
see that Boolean operations on binary numbers deserve to be much better known.
Indeed, they're an important component of every good programmer's toolkit.
Early machine designers provided fullword bitwise operations in their computers
primarily because such instructions could be included in a machine's
repertoire almost for free. Binary logic seemed to be potentially useful, although
2 COMBINATORIAL ALGORITHMS (F1A) 7.1.3
only a few applications were originally foreseen. For example, the EDSAC computer,
completed in 1949, included a "collate" command that essentially performed
the operation $z \leftarrow z + (x \mathbin{\&} y)$, where $z$ was the accumulator, $x$ was
the multiplier register, and $y$ was a specified word in memory; it was used for
unpacking data. The Manchester Mark I computer, built at about the same
time, included not only bitwise AND, but also OR and XOR. When Alan Turing
wrote the first programming manual for the Mark I in 1950, he remarked that
bitwise NOT can be obtained by using XOR (denoted '|') in combination with a
row of 1s. R. A. Brooker, who extended Turing's manual in 1952 when the Mark
II computer was being designed, remarked further that OR could be used "to
round off a number by forcing 1 into its least significant digit position." By this
time the Mark II, which was to become the prototype of the Ferranti Mercury,
had also acquired new instructions for sideways addition and for the position of
the most significant 1.
Keith Tocher published an unusual application of AND and OR in 1954,
which has subsequently been reinvented frequently (see exercise 85). And during
the ensuing decades, programmers have gradually discovered that bitwise
operations can be amazingly useful. Many of these tricks have remained part of
the folklore; the time is now ripe to take advantage of what has been learned.
A trick is a clever idea that can be used once, while a technique is a trick
that can be used at least twice. We will see in this section that tricks tend to
evolve naturally into techniques.
Enriched arithmetic. Let's begin by officially defining bitwise operations on
integers so that, if $x = (\ldots x_2 x_1 x_0)_2$, $y = (\ldots y_2 y_1 y_0)_2$, and $z = (\ldots z_2 z_1 z_0)_2$
in binary notation, we have
$$x \mathbin{\&} y = z \iff x_k \wedge y_k = z_k, \quad \text{for all } k \ge 0; \tag{1}$$
$$x \mathbin{|} y = z \iff x_k \vee y_k = z_k, \quad \text{for all } k \ge 0; \tag{2}$$
$$x \oplus y = z \iff x_k \oplus y_k = z_k, \quad \text{for all } k \ge 0. \tag{3}$$
(It would be tempting to write '$x \wedge y$' instead of $x \mathbin{\&} y$, and '$x \vee y$' instead of $x \mathbin{|} y$; but
when we study optimization problems we'll find it better to reserve the notations
$x \wedge y$ and $x \vee y$ for $\min(x,y)$ and $\max(x,y)$, respectively.) Thus, for example,
$$5 \mathbin{\&} 11 = 1, \qquad 5 \mathbin{|} 11 = 15, \qquad \text{and} \qquad 5 \oplus 11 = 14,$$
since $5 = (0101)_2$, $11 = (1011)_2$, $1 = (0001)_2$, $15 = (1111)_2$, and $14 = (1110)_2$.
Negative integers are to be thought of in this connection as infinite-precision
numbers in two's complement notation, having infinitely many 1s at the left; for
example, $-5$ is $(\ldots 1111011)_2$. Such infinite-precision numbers are a special case
of 2-adic integers, which are discussed in exercise 4.1–31, and in fact the operators
$\&$, $|$, $\oplus$ make perfect sense when they are applied to arbitrary 2-adic numbers.
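Definitions (1)–(3) work one bit position at a time. The sketch below applies them literally with a loop over k and can be compared with C's built-in operators. It assumes the infinite-precision numbers are truncated to 64-bit two's complement, so negative inputs such as −5 carry their leading 1s as far as the word allows. The function name and its `op` encoding are illustrative choices.

```c
#include <stdint.h>

/* Apply definitions (1)-(3) bit by bit: op 0 = AND, 1 = OR, 2 = XOR.
   Inputs are taken in 64-bit two's complement, a truncated stand-in for
   the infinite-precision (2-adic) numbers of the text. */
static int64_t bitwise_by_definition(int64_t x, int64_t y, int op) {
    uint64_t ux = (uint64_t)x, uy = (uint64_t)y, z = 0;
    for (int k = 0; k < 64; k++) {
        unsigned xk = (unsigned)((ux >> k) & 1), yk = (unsigned)((uy >> k) & 1), zk;
        if (op == 0)      zk = xk & yk;   /* x_k AND y_k */
        else if (op == 1) zk = xk | yk;   /* x_k OR y_k  */
        else              zk = xk ^ yk;   /* x_k XOR y_k */
        z |= (uint64_t)zk << k;
    }
    return (int64_t)z;
}
```

`bitwise_by_definition(5, 11, op)` reproduces 1, 15 and 14 for AND, OR and XOR. Masking −5 with 255 keeps its low byte (11111011)_2 = 251.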
Mathematicians have never paid much attention to the properties of $\&$ and $|$
as operations on integers. But the third operation, $\oplus$, has a venerable history,
because it describes a winning strategy in the game of nim (see exercises 8–16).
For this reason $x \oplus y$ has often been called the "nim sum" of the integers $x$ and $y$.
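To show where the name comes from, here is a small sketch of the nim-sum strategy. It assumes the classical result for normal-play nim, which this sketch takes as given rather than derives: the player to move loses exactly when the XOR of all pile sizes is 0. Otherwise a winning move reduces some pile so that the nim sum becomes 0. The functions `nim_sum` and `winning_move` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Nim sum of the pile sizes: the XOR of all of them. */
static uint64_t nim_sum(const uint64_t *piles, size_t n) {
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++) s ^= piles[i];
    return s;
}

/* If the nim sum s is nonzero, some pile p has p ^ s < p; shrinking that
   pile to p ^ s makes the nim sum 0.  Returns the pile index and stores
   the new size, or returns -1 if the position is lost for the mover. */
static long winning_move(const uint64_t *piles, size_t n, uint64_t *new_size) {
    uint64_t s = nim_sum(piles, n);
    if (s == 0) return -1;
    for (size_t i = 0; i < n; i++) {
        uint64_t target = piles[i] ^ s;
        if (target < piles[i]) { *new_size = target; return (long)i; }
    }
    return -1;  /* not reached when s != 0 */
}
```

With piles (3, 4, 5) the nim sum is 3 ⊕ 4 ⊕ 5 = 2, and reducing the first pile from 3 to 1 leaves (1, 4, 5), whose nim sum is 0.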
All three of the basic bitwise operations turn out to have many useful
properties. For example, every relation between $\wedge$, $\vee$, and $\oplus$ that we studied in
Section 7.1.1 is automatically inherited by $\&$, $|$, and $\oplus$ on integers, since the relation
holds in every bit position. We might as well recap the main identities here:
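$$x \mathbin{\&} y = y \mathbin{\&} x, \qquad x \mathbin{|} y = y \mathbin{|} x, \qquad x \oplus y = y \oplus x; \tag{4}$$
$$(x \mathbin{\&} y) \mathbin{\&} z = x \mathbin{\&} (y \mathbin{\&} z), \qquad (x \mathbin{|} y) \mathbin{|} z = x \mathbin{|} (y \mathbin{|} z), \qquad (x \oplus y) \oplus z = x \oplus (y \oplus z); \tag{5}$$
$$(x \mathbin{|} y) \mathbin{\&} z = (x \mathbin{\&} z) \mathbin{|} (y \mathbin{\&} z), \qquad (x \mathbin{\&} y) \mathbin{|} z = (x \mathbin{|} z) \mathbin{\&} (y \mathbin{|} z); \tag{6}$$
$$(x \oplus y) \mathbin{\&} z = (x \mathbin{\&} z) \oplus (y \mathbin{\&} z); \tag{7}$$
$$(x \mathbin{\&} y) \mathbin{|} x = x, \qquad (x \mathbin{|} y) \mathbin{\&} x = x; \tag{8}$$
$$(x \mathbin{\&} y) \oplus (x \mathbin{|} y) = x \oplus y; \tag{9}$$
$$x \mathbin{\&} 0 = 0, \qquad x \mathbin{|} 0 = x, \qquad x \oplus 0 = x; \tag{10}$$
$$x \mathbin{\&} x = x, \qquad x \mathbin{|} x = x, \qquad x \oplus x = 0; \tag{11}$$
$$x \mathbin{\&} {-1} = x, \qquad x \mathbin{|} {-1} = -1, \qquad x \oplus {-1} = \bar x; \tag{12}$$
$$x \mathbin{\&} \bar x = 0, \qquad x \mathbin{|} \bar x = -1, \qquad x \oplus \bar x = -1; \tag{13}$$
$$\overline{x \mathbin{\&} y} = \bar x \mathbin{|} \bar y, \qquad \overline{x \mathbin{|} y} = \bar x \mathbin{\&} \bar y, \qquad \overline{x \oplus y} = \bar x \oplus y = x \oplus \bar y. \tag{14}$$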
The notation $\bar x$ in (12), (13), and (14) stands for bitwise complementation of $x$,
namely $(\ldots \bar x_2 \bar x_1 \bar x_0)_2$, also written ${\sim}x$. Notice that (12) and (13) aren't quite
the same as 7.1.1–(10) and 7.1.1–(18); we must now use $-1 = (\ldots 1111)_2$ instead
of $1 = (\ldots 0001)_2$ in order to make the formulas bitwise correct.
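We say that $x$ is contained in $y$, written $x \subseteq y$ or $y \supseteq x$, if the individual
bits of $x$ and $y$ satisfy $x_k \le y_k$ for all $k \ge 0$. Thus
$$x \subseteq y \iff x \mathbin{\&} y = x \iff x \mathbin{|} y = y \iff x \mathbin{\&} \bar y = 0. \tag{15}$$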
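Because every identity in (4)–(14) holds position by position, a brute-force check over all 8-bit operands is a reasonable sanity test. The sketch below does that for (9), (13) and (14), and implements containment with the last form of (15). Both function names are hypothetical. The 8-bit truncation is a stand-in for infinite precision, with −1 appearing as (11111111)_2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Exhaustively check identities (9), (13) and (14) on every pair of 8-bit
   values.  Results are truncated to uint8_t, so (...1111)_2 = -1 shows up
   as 0xFF. */
static bool check_identities_8bit(void) {
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t x = (uint8_t)a, y = (uint8_t)b;
            if ((uint8_t)((x & y) ^ (x | y)) != (uint8_t)(x ^ y)) return false; /* (9)  */
            if ((uint8_t)(x & ~x) != 0) return false;                          /* (13) */
            if ((uint8_t)(x | ~x) != 0xFF) return false;                       /* (13) */
            if ((uint8_t)~(x & y) != (uint8_t)(~x | ~y)) return false;         /* (14) */
            if ((uint8_t)~(x ^ y) != (uint8_t)(~x ^ y)) return false;          /* (14) */
        }
    }
    return true;
}

/* Containment x (subset of) y, via the last form of (15): x & ~y == 0. */
static bool contained(uint64_t x, uint64_t y) {
    return (x & ~y) == 0;
}
```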
Of course we needn't use bitwise operations only in connection with each
other; we can combine them with all the ordinary operations of arithmetic. For
example, from the relation $x + \bar x = (\ldots 1111)_2 = -1$ we can deduce the formula
$$-x = \bar x + 1, \tag{16}$$
which turns out to be extremely important. Replacing $x$ by $x - 1$ gives also
$$-x = \overline{x - 1}, \tag{17}$$
and in general we can reduce subtraction to complementation and addition:
$$\overline{x - y} = \bar x + y. \tag{18}$$
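Unsigned 64-bit arithmetic in C is arithmetic mod 2^64. That makes it a convenient finite stand-in for checking (16)–(18). The sketch below spot-checks the three laws and also rearranges (18) into a subtraction that uses only complement and addition. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Spot-check (16)-(18) in 64-bit unsigned arithmetic, where every
   operation is reduced mod 2^64. */
static bool check_negation_laws(uint64_t x, uint64_t y) {
    return (0 - x) == ~x + 1          /* (16): -x = ~x + 1       */
        && (0 - x) == ~(x - 1)        /* (17): -x = ~(x - 1)     */
        && ~(x - y) == ~x + y;        /* (18): ~(x - y) = ~x + y */
}

/* Subtraction from complement and addition alone: complementing both
   sides of (18) gives x - y = ~(~x + y). */
static uint64_t subtract_via_complement(uint64_t x, uint64_t y) {
    return ~(~x + y);
}
```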
We often want to shift binary numbers to the left or right. These operations
are equivalent to multiplication and division by powers of 2, with appropriate
rounding, but it is convenient to have special notations for them:
$$x \ll k = x \text{ shifted left } k \text{ bits} = \lfloor 2^k x \rfloor; \tag{19}$$
$$x \gg k = x \text{ shifted right } k \text{ bits} = \lfloor 2^{-k} x \rfloor. \tag{20}$$
Here $k$ can be any integer, possibly negative. In particular we have
$$x \ll (-k) = x \gg k \qquad \text{and} \qquad x \gg (-k) = x \ll k, \tag{21}$$
for every infinite-precision number $x$. Also $(x \mathbin{\&} y) \ll k = (x \ll k) \mathbin{\&} (y \ll k)$, etc.
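Definition (20) rounds toward minus infinity, so −5 ≫ 1 = −3. C's `/` truncates toward zero instead. Before C23, C also leaves the result of `>>` on a negative signed value to the implementation. The sketch below makes the floor semantics explicit and handles negative k through (21). It assumes |k| ≤ 62, and for left shifts it assumes 2^k x fits in `int64_t`. The helper names are hypothetical.

```c
#include <stdint.h>

/* floor(x / 2^k) for 0 <= k <= 62, independent of how C's >> treats
   negative signed operands: for x < 0, ~x = -x - 1 is nonnegative and
   floor(x / 2^k) = ~(~x >> k). */
static int64_t floor_shift_right(int64_t x, int k) {
    return x >= 0 ? x >> k : ~(~x >> k);
}

/* x << k = floor(2^k x) as in (19); a negative k shifts right, as in (21).
   Assumes 2^k x fits in int64_t when k > 0. */
static int64_t shift_left(int64_t x, int k) {
    if (k < 0) return floor_shift_right(x, -k);
    return x * ((int64_t)1 << k);
}

/* x >> k = floor(2^-k x) as in (20); a negative k shifts left, as in (21). */
static int64_t shift_right(int64_t x, int k) {
    if (k < 0) return shift_left(x, -k);
    return floor_shift_right(x, k);
}
```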
When bitwise operations are combined with addition, subtraction, multiplication,
and/or shifting, extremely intricate results can arise, even when the
formulas are quite short. A taste of the possibilities can be seen, for example,
in Fig. 11. Furthermore, such formulas do not merely produce purposeless,
chaotic behavior: A famous chain of operations known as "Gosper's hack," first
published in 1972, opened people's eyes to the fact that a large number of useful
and nontrivial functions can be computed rapidly (see exercise 20). Our goal in
this section is to explore how such efficient constructions might be discovered.
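As an illustration of such a short but nontrivial chain, here is one commonly quoted formulation of Gosper's hack (HAKMEM item 175). It is not necessarily the form developed in exercise 20. Given x ≠ 0, it returns the next larger integer that has the same number of 1 bits. The sketch assumes the result fits in 64 bits, and the function name is hypothetical.

```c
#include <stdint.h>

/* Next larger integer with the same number of 1 bits as x (x != 0). */
static uint64_t next_same_popcount(uint64_t x) {
    uint64_t u = x & (0 - x);          /* the rightmost 1 bit of x            */
    uint64_t v = x + u;                /* clear the rightmost run of 1s and
                                          carry into the 0 just above it      */
    return v + (((v ^ x) / u) >> 2);   /* put the leftover 1s back at the right */
}
```

Starting from (0111)_2 = 7, repeated calls give 11, 13, 14, 19, ..., which are the 3-bit patterns (1011)_2, (1101)_2, (1110)_2, (10011)_2 in increasing order.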

Fig. 11. A small portion of the patchwork quilt defined by the bitwise function
$f(x,y) = \bigl((x \oplus \bar y) \mathbin{\&} ((y - 350) \gg 3)\bigr)^2$; the square cell in column $x$
and row $y$ is painted white or black according as the value of
$((f(x,y) \gg 12) \mathbin{\&} 1)$ is 0 or 1. (Design by D. Sleator, 1976; see also exercise 18.)

Packing and unpacking. We studied algorithms for multiple-precision arithmetic
in Section 4.3.1, dealing with situations where integers are too large to fit in
a single word of memory or a single computer register. But the opposite situation,
when integers are significantly smaller than the capacity of one computer word, is
actually much more common; D. H. Lehmer called this "fractional precision." We
can often deal with several integers at once, by packing them into a single word.
For example, a date $x$ that consists of a year number $y$, a month number $m$,
and a day number $d$, can be represented by using 4 bits for $m$ and 5 bits for $d$:
$$x = (((y \ll 4) + m) \ll 5) + d. \tag{22}$$
We'll see below that many operations can be performed directly on dates in this
packed form. For example, $x < x'$ when date $x$ precedes date $x'$. But if necessary
the individual components $(y, m, d)$ can readily be unpacked when $x$ is given:
$$d = x \bmod 32, \qquad m = (x \gg 5) \bmod 16, \qquad y = x \gg 9. \tag{23}$$
And these "mod" operations do not require division, because of the important
law
$$x \bmod 2^n = x \mathbin{\&} (2^n - 1) \tag{24}$$
for any integer $n \ge 0$. We have, for instance, $d = x \mathbin{\&} 31$ in (22) and (23).
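The date layout of (22)–(24) translates directly into C. In this sketch `pack_date` and `unpack_date` are hypothetical names. It assumes 1 ≤ m ≤ 12, 1 ≤ d ≤ 31, and years small enough that y·2^9 fits in 32 bits. The unpacking masks with `& 31` and `& 15` as (24) permits instead of dividing.

```c
#include <stdint.h>

/* Pack a date as in (22): year in the high bits, then 4 bits of month
   and 5 bits of day. */
static uint32_t pack_date(uint32_t y, uint32_t m, uint32_t d) {
    return (((y << 4) + m) << 5) + d;
}

/* Unpack as in (23); each "mod 2^n" is an AND with 2^n - 1, by (24). */
static void unpack_date(uint32_t x, uint32_t *y, uint32_t *m, uint32_t *d) {
    *d = x & 31;          /* x mod 32        */
    *m = (x >> 5) & 15;   /* (x >> 5) mod 16 */
    *y = x >> 9;
}
```

Since the fields are nested from most to least significant, ordinary integer comparison of packed dates agrees with chronological order, which is the property x < x′ mentioned above.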
Such packing of data obviously saves space in memory, and it also saves time:
We can more quickly move or copy items of data from one place to another when
they've been packed together. Moreover, computers run considerably faster when
they operate on numbers that fit into a cache memory of limited size.
The ultimate packing density is achieved when we have 1-bit items, because
we can then cram 64 of them into a single 64-bit word. Suppose, for example,
that we want a table of all odd prime numbers less than 1024, so that we can
easily decide the primality of a small integer. No problem; only eight 64-bit
numbers are required:
P0 = 0111011011010011001011010010011001011001010010001011011010000001;
P1 = 0100110000110010010100100110000110110000010000010110100110000100;
P2 = 1001001100101100001000000101101000000100100001101001000100100101;
P3 = 0010001010001000011000011001010010001011010000010001010001010010;
P4 = 0000110000000010010000100100110010000100100110010010110000010000;
P5 = 1101001001100000101001000100001000100001000100100101000100101000;
P6 = 1010000001000010000011000011011000010000001011010000001011010000;
P7 = 0000010100010000100010100100100000010100100100010010000010100110.
To test whether $2k + 1$ is prime, for $0 \le k < 512$, we simply compute
$$P_{\lfloor k/64 \rfloor} \ll (k \mathbin{\&} 63) \tag{25}$$
in a 64-bit register, and see if the leftmost bit is 1. For example, the following
MMIX instructions will do the job, if register pbase holds the address of $P_0$:

        SRU  $0,k,3          $0 ← ⌊k/8⌋ (i.e., k ≫ 3).
        LDOU $1,pbase,$0     $1 ← P_⌊$0/8⌋ (i.e., P_⌊k/64⌋).
        AND  $0,k,#3f        $0 ← k mod 64 (i.e., k & #3f).          (26)
        SLU  $1,$1,$0        $1 ← ($1 ≪ $0) mod 2^64.
        BN   $1,PRIME        Branch to PRIME if s($1) < 0.

Notice that the leftmost bit of a register is 1 if and only if the register contents
are negative.
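The following C sketch mirrors formula (25) and the MMIX sequence (26): shift P_⌊k/64⌋ left by k mod 64, then test the leftmost bit. Instead of copying the binary constants, it builds the big-endian table with a small sieve of Eratosthenes over odd numbers below 1024. The table layout follows the text. The names `P`, `build_P` and `odd_is_prime` are hypothetical. The hexadecimal value in the test is our own conversion of the P0 shown above.

```c
#include <stdbool.h>
#include <stdint.h>

static uint64_t P[8];   /* big-endian: bit for k sits at position 63 - (k mod 64) */

/* Fill P[0..7] so that the bit for k is 1 iff 2k+1 is prime (0 <= k < 512). */
static void build_P(void) {
    bool composite[1024] = {false};
    for (unsigned p = 3; p * p < 1024; p += 2)
        if (!composite[p])
            for (unsigned m = p * p; m < 1024; m += 2 * p) composite[m] = true;
    for (int j = 0; j < 8; j++) P[j] = 0;
    for (unsigned k = 1; k < 512; k++)          /* k = 0 stands for 1, not prime */
        if (!composite[2 * k + 1])
            P[k >> 6] |= (uint64_t)1 << (63 - (k & 63));
}

/* Formula (25): shift left by k & 63 and inspect the leftmost bit. */
static bool odd_is_prime(unsigned k) {
    return (P[k >> 6] << (k & 63)) >> 63;
}
```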
We could equally well pack the bits from right to left in each word:
Q0 = 1000000101101101000100101001101001100100101101001100101101101110;
Q1 = 0010000110010110100000100000110110000110010010100100110000110010;
Q2 = 1010010010001001011000010010000001011010000001000011010011001001;
Q3 = 0100101000101000100000101101000100101001100001100001000101000100;
Q4 = 0000100000110100100110010010000100110010010000100100000000110000;
Q5 = 0001010010001010010010001000010001000010001001010000011001001011;
Q6 = 0000101101000000101101000000100001101100001100000100001000000101;
Q7 = 0110010100000100100010010010100000010010010100010000100010100000;
where $Q_j = P_j^R$. Instead of shifting left as in (25), we now shift right,
$$Q_{\lfloor k/64 \rfloor} \gg (k \mathbin{\&} 63), \tag{27}$$
and look at the rightmost bit of the result. The last two lines of (26) become

        SRU  $1,$1,$0        $1 ← $1 ≫ $0.                            (28)
        BOD  $1,PRIME        Branch to PRIME if $1 is odd.

(And of course we use qbase instead of pbase.) Either way, the classic sieve of
Eratosthenes will readily set up the basic table entries $P_j$ or $Q_j$ (see exercise 24).
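The relation Q_j = P_j^R is a 64-bit reversal, and formula (27) then reads the rightmost bit. Here is a sketch of both pieces. It uses a plain loop rather than any clever reversal technique. `reverse64` and `odd_is_prime_Q` are hypothetical names. The hexadecimal constants in the test are our own conversions of the binary P0 and Q0 listed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reverse the 64 bits of x, so that Q_j = reverse64(P_j). */
static uint64_t reverse64(uint64_t x) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (x & 1);   /* move the current low bit of x into r */
        x >>= 1;
    }
    return r;
}

/* Formula (27): with right-to-left packing, shift right by k & 63 and test
   the rightmost bit.  q is assumed to hold the little-endian table Q[0..7]. */
static bool odd_is_prime_Q(const uint64_t q[8], unsigned k) {
    return (q[k >> 6] >> (k & 63)) & 1;
}
```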
Table 1
THE BIG-ENDIAN VIEW OF A 32-BYTE MEMORY
(each row is read left to right; every octabyte is two tetrabytes, every tetrabyte two wydes, every wyde two bytes)

octa 0  (tetras 0, 4; wydes 0, 2, 4, 6)
  bytes 0 to 7:    a0...a7 | a8...a15 | a16...a23 | a24...a31 | a32...a39 | a40...a47 | a48...a55 | a56...a63
octa 8  (tetras 8, 12; wydes 8, 10, 12, 14)
  bytes 8 to 15:   a64...a71 | a72...a79 | a80...a87 | a88...a95 | a96...a103 | a104...a111 | a112...a119 | a120...a127
octa 16 (tetras 16, 20; wydes 16, 18, 20, 22)
  bytes 16 to 23:  a128...a135 | a136...a143 | a144...a151 | a152...a159 | a160...a167 | a168...a175 | a176...a183 | a184...a191
octa 24 (tetras 24, 28; wydes 24, 26, 28, 30)
  bytes 24 to 31:  a192...a199 | a200...a207 | a208...a215 | a216...a223 | a224...a231 | a232...a239 | a240...a247 | a248...a255

Big-endian and little-endian conventions. Whenever we pack bits or bytes
into words, we must decide whether to place them from left to right or from right
to left. The left-to-right convention is called "big-endian," because the initial
items go into the most significant positions; thus they will have bigger significance
than their successors, when numbers are compared. The right-to-left convention
is called "little-endian"; it puts the first items where little numbers go.
A big-endian approach seems more natural in many cases, because we're accustomed
to reading and writing from left to right. But a little-endian placement
has advantages too. For example, let's consider the prime number problem again;
let $a_k = [2k+1 \text{ is prime}]$. Our table entries $\{P_0, P_1, \ldots, P_7\}$ are big-endian, and
we can regard them as the representation of a single multiple-precision integer
that is 512 bits long:
$$(P_0 P_1 \ldots P_7)_{2^{64}} = (a_0 a_1 \ldots a_{511})_2. \tag{29}$$
Similarly, our little-endian table entries represent the multiprecise integer
$$(Q_7 \ldots Q_1 Q_0)_{2^{64}} = (a_{511} \ldots a_1 a_0)_2. \tag{30}$$
The latter integer is mathematically nicer than the former, because it is
$$\sum_{k=0}^{511} 2^k a_k = \sum_{k=0}^{511} 2^k [2k+1 \text{ is prime}] = \Bigl(\sum_{k=0}^{\infty} 2^k [2k+1 \text{ is prime}]\Bigr) \bmod 2^{512}. \tag{31}$$
Table 2
THE LITTLE-ENDIAN VIEW OF A 32-BYTE MEMORY
(each row is read left to right; every octabyte is two tetrabytes, every tetrabyte two wydes, every wyde two bytes)

octa 24 (tetras 28, 24; wydes 30, 28, 26, 24)
  bytes 31 to 24:  a255...a248 | a247...a240 | a239...a232 | a231...a224 | a223...a216 | a215...a208 | a207...a200 | a199...a192
octa 16 (tetras 20, 16; wydes 22, 20, 18, 16)
  bytes 23 to 16:  a191...a184 | a183...a176 | a175...a168 | a167...a160 | a159...a152 | a151...a144 | a143...a136 | a135...a128
octa 8  (tetras 12, 8; wydes 14, 12, 10, 8)
  bytes 15 to 8:   a127...a120 | a119...a112 | a111...a104 | a103...a96 | a95...a88 | a87...a80 | a79...a72 | a71...a64
octa 0  (tetras 4, 0; wydes 6, 4, 2, 0)
  bytes 7 to 0:    a63...a56 | a55...a48 | a47...a40 | a39...a32 | a31...a24 | a23...a16 | a15...a8 | a7...a0

Notice, however, that we used $(Q_7 \ldots Q_1 Q_0)_{2^{64}}$ to get this simple result, not
$(Q_0 Q_1 \ldots Q_7)_{2^{64}}$. The other number,
$$(Q_0 Q_1 \ldots Q_7)_{2^{64}} = (a_{63} \ldots a_1 a_0\, a_{127} \ldots a_{65} a_{64}\, a_{191} \ldots a_{385} a_{384}\, a_{511} \ldots a_{449} a_{448})_2,$$
is in fact quite weird, and it has no really nice formula. (See exercise 25.)
Endianness has important consequences, because most computers allow individual
bytes of the memory to be addressed as well as register-sized units. MMIX
has a big-endian architecture; therefore if register x contains the 64-bit number
#0123456789abcdef, and if we use the commands 'STOU x,0; LDBU y,1' to
store x into octabyte location 0 and read back the byte in location 1, the result
in register y will be #23. On machines with a little-endian architecture, the
analogous commands would set $y \leftarrow$ #cd instead; #23 would be byte 6.
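The STOU/LDBU experiment can be imitated in C without depending on the host's byte order. Write the bytes explicitly in either order, then read back byte 1. This is a sketch, and the helper names are hypothetical. `host_is_little_endian` only reports the host convention. It inspects a `uint32_t` through a character pointer, which C permits.

```c
#include <stdint.h>

/* Store x into b[0..7] most significant byte first (MMIX's convention). */
static void store_be(uint64_t x, uint8_t b[8]) {
    for (int i = 7; i >= 0; i--) { b[i] = (uint8_t)x; x >>= 8; }
}

/* Store x into b[0..7] least significant byte first. */
static void store_le(uint64_t x, uint8_t b[8]) {
    for (int i = 0; i < 8; i++) { b[i] = (uint8_t)x; x >>= 8; }
}

/* 1 if the machine running this code is little-endian, 0 if big-endian. */
static int host_is_little_endian(void) {
    uint32_t one = 1;
    return *(const uint8_t *)&one == 1;
}
```

With x = #0123456789abcdef, the big-endian store puts #23 in byte 1. The little-endian store puts #cd there and #23 in byte 6, as described above.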
Tables 1 and 2 illustrate the competing "world views" of big-endian and
little-endian aficionados. The big-endian approach is basically top-down, with
bit 0 and byte 0 at the top left; the little-endian approach is basically bottom-up,
with bit 0 and byte 0 at the bottom right. Because of this difference, great care
is necessary when transmitting data from one kind of computer to another, or
when writing programs that are supposed to give equivalent results in both cases.
On the other hand, our example of the Q table for primes shows that we can
perfectly well use a little-endian packing convention on a big-endian computer
like MMIX, or vice versa. The difference is noticeable only when data is loaded
and stored in different-sized chunks, or passed between machines.
Working with the rightmost bits. Big-endian and little-endian approaches
aren't readily interchangeable in general, because the laws of arithmetic send
signals leftward from the bits that are "least significant." Some of the most
important bitwise manipulation techniques are based on this fact.
If $x$ is almost any nonzero 2-adic integer, we can write its bits in the form
$$x = (\alpha\, 0\, 1^a\, 1\, 0^b)_2; \tag{32}$$
in other words, $x$ consists of some arbitrary (but infinite) binary string $\alpha$, followed
by a 0, which is followed by $a + 1$ ones, and followed by $b$ zeros, for some $a \ge 0$
and $b \ge 0$. (The exceptions occur when $x = -2^b$; then $a = \infty$.) Consequently
$$\bar x = (\bar\alpha\, 1\, 0^a\, 0\, 1^b)_2; \tag{33}$$
$$x - 1 = (\alpha\, 0\, 1^a\, 0\, 1^b)_2; \tag{34}$$
$$-x = (\bar\alpha\, 1\, 0^a\, 1\, 0^b)_2; \tag{35}$$
and we see that $\bar x + 1 = -x = \overline{x - 1}$, in agreement with (16) and (17). With two
operations we can therefore compute relatives of $x$ in several useful ways:
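$$x \mathbin{\&} (x - 1) = (\alpha\, 0\, 1^a\, 0\, 0^b)_2 \qquad \text{[remove the rightmost 1]}; \tag{36}$$
$$x \mathbin{\&} {-x} = (0^\infty\, 0\, 0^a\, 1\, 0^b)_2 \qquad \text{[extract the rightmost 1]}; \tag{37}$$
$$x \mathbin{|} {-x} = (1^\infty\, 1\, 1^a\, 1\, 0^b)_2 \qquad \text{[smear the rightmost 1 to the left]}; \tag{38}$$
$$x \oplus {-x} = (1^\infty\, 1\, 1^a\, 0\, 0^b)_2 \qquad \text{[remove and smear it to the left]}; \tag{39}$$
$$x \mathbin{|} (x - 1) = (\alpha\, 0\, 1^a\, 1\, 1^b)_2 \qquad \text{[smear the rightmost 1 to the right]}; \tag{40}$$
$$x \oplus (x - 1) = (0^\infty\, 0\, 0^a\, 1\, 1^b)_2 \qquad \text{[extract and smear it to the right]}; \tag{41}$$
$$\bar x \mathbin{\&} (x - 1) = (0^\infty\, 0\, 0^a\, 0\, 1^b)_2 \qquad \text{[extract, remove, and smear it to the right]}. \tag{42}$$
And two further operations produce yet another variant:
$$((x \mathbin{|} (x - 1)) + 1) \mathbin{\&} x = (\alpha\, 0\, 0^a\, 0\, 0^b)_2 \qquad \text{[remove the rightmost run of 1s]}. \tag{43}$$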
When $x = 0$, five of these formulas produce 0, the other three give $-1$. [Formula
(36) is due to Peter Wegner, CACM 3 (1960), 322; and (43) is due to
H. Tim Gladwin, CACM 14 (1971), 407–408. See also Henry S. Warren, Jr.,
CACM 20 (1977), 439–441.]
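Formulas (36)–(43) translate one-for-one into C on `uint64_t`, where unary minus is well defined modulo 2^64. The sketch below gives each formula a descriptive, hypothetical name. The test uses x = (1010111000)_2, which has α = …0101, a = 2 and b = 3. It also checks the x = 0 behavior noted in the text: five results are 0 and three are −1, which appears as all 64 bits set.

```c
#include <stdint.h>

/* The rightmost-bit formulas (36)-(43) in 64-bit unsigned arithmetic. */
static uint64_t remove_rightmost_1(uint64_t x)         { return x & (x - 1); }        /* (36) */
static uint64_t extract_rightmost_1(uint64_t x)        { return x & (0 - x); }        /* (37) */
static uint64_t smear_rightmost_1_left(uint64_t x)     { return x | (0 - x); }        /* (38) */
static uint64_t remove_and_smear_left(uint64_t x)      { return x ^ (0 - x); }        /* (39) */
static uint64_t smear_rightmost_1_right(uint64_t x)    { return x | (x - 1); }        /* (40) */
static uint64_t extract_and_smear_right(uint64_t x)    { return x ^ (x - 1); }        /* (41) */
static uint64_t extract_remove_smear_right(uint64_t x) { return ~x & (x - 1); }       /* (42) */
static uint64_t remove_rightmost_run(uint64_t x) { return ((x | (x - 1)) + 1) & x; }  /* (43) */
```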
The quantity $b$ in these formulas, which specifies the number of trailing zeros
in $x$, is called the ruler function of $x$ and written $\rho x$, because it is related to
the lengths of the tick marks that are often used to indicate fractions of an inch.
In general, $\rho x$ is the largest integer $k$ such that $2^k$ divides $x$,
when $x \ne 0$; and we define $\rho 0 = \infty$. The recurrence relations
$$\rho(2x + 1) = 0, \qquad \rho(2x) = \rho(x) + 1 \tag{44}$$
also serve to define $\rho x$ for nonzero $x$. Another handy relation is worthy of note,
$$\rho(x - y) = \rho(x \oplus y). \tag{45}$$
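A direct reading of recurrence (44) strips trailing zeros one at a time. The sketch below does that, giving a slow but obviously correct reference for the faster methods that follow. C has no ∞, so the sketch returns −1 for ρ0, a hypothetical convention. The test also checks relation (45) on one pair: x − y and x ⊕ y first differ from 0 at the lowest position where x and y differ.

```c
#include <stdint.h>

/* Ruler function rho(x) from recurrence (44): rho(2x+1) = 0 and
   rho(2x) = rho(x) + 1.  Returns -1 as a stand-in for rho(0) = infinity. */
static int rho(uint64_t x) {
    if (x == 0) return -1;
    int r = 0;
    while ((x & 1) == 0) { x >>= 1; r++; }   /* each even step adds 1 */
    return r;
}
```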
The elegant formula $x \mathbin{\&} {-x}$ in (37) allows us to extract the rightmost 1 bit
very nicely, but we often want to identify exactly which bit it is. The ruler
function can be computed in many ways, and the best method often depends
heavily on the computer that is being used. For example, a two-instruction
sequence due to J. Dallos does the job quickly and easily on MMIX (see (42)):

        SUBU t,x,1; SADD rho,t,x                                      (46)

(See exercise 30 for the case $x = 0$.) We shall discuss here two approaches that
do not rely on exotic commands like SADD; and later, after learning a few more
techniques, we'll consider a third way.
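MMIX's SADD $X,$Y,$Z counts the 1 bits of $Y & ~$Z. In (46), therefore, it counts the 1s of x̄ & (x − 1), which by (42) is a block of exactly ρx ones. A C rendering of the same idea needs a population count. The sketch uses a portable loop built on (36) rather than any compiler built-in. The function names are hypothetical. For x = 0 the result is 64, the case that exercise 30 addresses.

```c
#include <stdint.h>

/* Sideways addition (population count), removing the rightmost 1 with (36)
   until nothing is left. */
static int sideways_add(uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Analogue of (46): t = x - 1, then count the 1s of t & ~x, i.e. of mask (42). */
static int rho_dallos(uint64_t x) {
    uint64_t t = x - 1;
    return sideways_add(t & ~x);
}
```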
The first general-purpose method makes use of "magic mask" constants $\mu_k$
that prove to be useful in many other applications, namely
$$\begin{aligned}
\mu_0 &= (\ldots 101010101010101010101010101010101)_2 = -1/3,\\
\mu_1 &= (\ldots 100110011001100110011001100110011)_2 = -1/5,\\
\mu_2 &= (\ldots 100001111000011110000111100001111)_2 = -1/17,
\end{aligned} \tag{47}$$
and so on. In general $\mu_k$ is the infinite 2-adic fraction $-1/(2^{2^k} + 1)$, because
$(2^{2^k} + 1)\mu_k = (\mu_k \ll 2^k) + \mu_k = (\ldots 11111)_2 = -1$. On a computer that has $2^d$-bit
registers we don't need infinite precision, of course, so we use the truncated
constants
$$\mu_{d,k} = \bigl(2^{2^d} - 1\bigr) / \bigl(2^{2^k} + 1\bigr) \qquad \text{for } 0 \le k < d. \tag{48}$$
These constants are familiar from our study of Boolean evaluation, because they
are the truth tables of the projection functions $x_{n-k}$ (see, for example, 7.1.2–(7)).
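When $x$ is a power of 2, we can use these masks to compute
$$\rho x = [x \mathbin{\&} \mu_0 = 0] + 2[x \mathbin{\&} \mu_1 = 0] + 4[x \mathbin{\&} \mu_2 = 0] + 8[x \mathbin{\&} \mu_3 = 0] + \cdots, \tag{49}$$
because $[2^j \mathbin{\&} \mu_k = 0] = j_k$ when $j = (\ldots j_3 j_2 j_1 j_0)_2$. Thus, on a $2^d$-bit computer,
we can start with $\rho \leftarrow 0$ and $y \leftarrow x \mathbin{\&} {-x}$; then set $\rho \leftarrow \rho + 2^k$ if $y \mathbin{\&} \mu_{d,k} = 0$, for
$0 \le k < d$. This procedure gives $\rho = \rho x$ when $x \ne 0$. (It also gives $\rho 0 = 2^d - 1$,
an anomalous value that may need to be corrected; see exercise 30.)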
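The procedure just described can be written in C for d = 6, that is, 64-bit words. The six constants are μ_{6,k} = (2^64 − 1)/(2^{2^k} + 1), the same values loaded into m0 through m5 in the MMIX program that follows. This is a sketch with a hypothetical function name. As the text warns, it returns the anomalous value 63 when x = 0.

```c
#include <stdint.h>

/* Truncated magic masks mu_{6,k}, k = 0..5, as in (48). */
static const uint64_t mu[6] = {
    0x5555555555555555ULL, 0x3333333333333333ULL, 0x0f0f0f0f0f0f0f0fULL,
    0x00ff00ff00ff00ffULL, 0x0000ffff0000ffffULL, 0x00000000ffffffffULL
};

/* rho(x) by (49): isolate the rightmost 1, then add 2^k whenever it
   misses mask mu_k.  Gives 63 (= 2^6 - 1) for x = 0. */
static int rho_masks(uint64_t x) {
    uint64_t y = x & (0 - x);
    int r = 0;
    for (int k = 0; k < 6; k++)
        if ((y & mu[k]) == 0) r += 1 << k;
    return r;
}
```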
For example, the corresponding MMIX program might look like this:

m0 GREG #5555555555555555; m1 GREG #3333333333333333;
m2 GREG #0f0f0f0f0f0f0f0f; m3 GREG #00ff00ff00ff00ff;
m4 GREG #0000ffff0000ffff; m5 GREG #00000000ffffffff;
NEGU y,x;    AND y,x,y;     AND q,y,m5;   ZSZ rho,q,32;
AND q,y,m4;  ADD t,rho,16;  CSZ rho,q,t;                           (50)
AND q,y,m3;  ADD t,rho,8;   CSZ rho,q,t;
AND q,y,m2;  ADD t,rho,4;   CSZ rho,q,t;
AND q,y,m1;  ADD t,rho,2;   CSZ rho,q,t;
AND q,y,m0;  ADD t,rho,1;   CSZ rho,q,t;

The total time is $19\upsilon$. Or we could replace the last three lines by

SRU y,y,rho; LDB t,rhotab,y; ADD rho,rho,t                         (51)

where rhotab points to the beginning of an appropriate 129-byte table (only eight of whose entries are actually used). The total time would then be $\mu + 13\upsilon$.
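The loop just described can be tried out directly. Below is a minimal Python sketch of (48) and (49), assuming a 64-bit word ($d = 6$). Python integers are unbounded, so the word size is imposed by hand. The names `M64`, `MU`, and `rho` are illustrative and do not come from the text.

```python
M64 = (1 << 64) - 1  # all-ones 64-bit word, 2^(2^d) - 1 with d = 6

# Truncated magic masks mu_{6,k} = (2^64 - 1) / (2^(2^k) + 1), as in (48);
# the division is exact, giving #5555..., #3333..., ..., #00000000ffffffff.
MU = [M64 // ((1 << (1 << k)) + 1) for k in range(6)]


def rho(x: int) -> int:
    """Ruler function of a 64-bit x via (49); like the text, rho(0) comes out as 63."""
    y = x & -x                    # keep only the rightmost 1 bit of x
    r = 0
    for k in range(6):
        if (y & MU[k]) == 0:      # bit k of the index is 1 exactly when y misses mu_k
            r += 1 << k
    return r
```

For example, `rho(0b101000)` isolates $y = 8$, which misses $\mu_0$ and $\mu_1$ but not $\mu_2$, so the result is $1 + 2 = 3$.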
10 COMBINATORIAL ALGORITHMS (F1A) 7.1.3
The second general-purpose approach to the computation of $\rho x$ is quite different. On a 64-bit machine it starts as before, with $y \gets x \mathbin{\&} {-x}$; but then it simply sets

\[
\rho \gets \mathit{decode}\bigl[((a \cdot y) \bmod 2^{64}) \gg 58\bigr],
\tag{52}
\]

where $a$ is a suitable multiplier and $\mathit{decode}$ is a suitable 64-byte table. The constant $a = (a_{63} \ldots a_1 a_0)_2$ must have the property that its 64 substrings

\[
a_{63}a_{62}\ldots a_{58},\; a_{62}a_{61}\ldots a_{57},\; \ldots,\; a_5a_4\ldots a_0,\; a_4a_3a_2a_1a_0\,0,\; \ldots,\; a_0\,00000
\]

are distinct. Exercise 2.3.4.2–23 shows that many such "de Bruijn cycles" exist; for example, we can use M. H. Martin's constant #03f79d71b4ca8b09, which is discussed in exercise 3.2.2–17. The decoding table $\mathit{decode}[0], \ldots, \mathit{decode}[63]$ is then

00, 01, 56, 02, 57, 49, 28, 03, 61, 58, 42, 50, 38, 29, 17, 04,
62, 47, 59, 36, 45, 43, 51, 22, 53, 39, 33, 30, 24, 18, 12, 05,    (53)
63, 55, 48, 27, 60, 41, 37, 16, 46, 35, 44, 21, 52, 32, 23, 11,
54, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 09, 13, 08, 07, 06.

[This technique was devised in 1997 by M. Lauter, and independently by C. E. Leiserson, H. Prokop, and K. H. Randall a few months later (unpublished). David Seal had used a similar method in 1994, with a larger decoding table.]
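The de Bruijn lookup (52) with Martin's constant and table (53) can be checked mechanically. The following minimal Python sketch assumes 64-bit words; the name `rho` is ours. Note that $x = 0$ gives $y = 0$ and hence $\mathit{decode}[0] = 0$, so the zero case still needs separate handling.

```python
M64 = (1 << 64) - 1
A = 0x03F79D71B4CA8B09  # M. H. Martin's de Bruijn constant from the text

# decode[] from (53): decode[(A * 2^j mod 2^64) >> 58] = j
DECODE = [
    0, 1, 56, 2, 57, 49, 28, 3, 61, 58, 42, 50, 38, 29, 17, 4,
    62, 47, 59, 36, 45, 43, 51, 22, 53, 39, 33, 30, 24, 18, 12, 5,
    63, 55, 48, 27, 60, 41, 37, 16, 46, 35, 44, 21, 52, 32, 23, 11,
    54, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 9, 13, 8, 7, 6,
]


def rho(x: int) -> int:
    """Ruler function via (52); x must be nonzero (x = 0 would also give 0)."""
    y = x & -x                              # rightmost 1 bit, a power of 2
    return DECODE[((A * y) & M64) >> 58]    # top 6 bits of the 64-bit product
```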
Working with the leftmost bits. The function $\lambda x = \lfloor \lg x \rfloor$, which is dual to $\rho x$ because it locates the leftmost 1 when $x > 0$, was introduced in Eq. 4.6.3–(6). It satisfies the recurrence

\[
\lambda 1 = 0; \qquad \lambda(2x) = \lambda(2x+1) = \lambda(x) + 1 \quad \text{for } x > 0;
\tag{54}
\]

and it is undefined when $x \le 0$. What is a good way to compute it? Once again MMIX provides a quick-but-tricky solution:

FLOTU y,ROUND_DOWN,x; SUB y,y,fone; SR lam,y,52                    (55)

where fone = #3ff0000000000000 is the floating-point representation of 1.0. (Total time $6\upsilon$.) This code floats $x$, then extracts the exponent.
But if floating-point conversion is not readily available, a binary reduction strategy works fairly well on a $2^d$-bit machine. We can start with $\lambda \gets 0$ and $y \gets x$; then we set $\lambda \gets \lambda + 2^k$ and $y \gets y \gg 2^k$ if $y \mathbin{\&} \bar\mu_{d,k} \ne 0$, for $k = d-1, \ldots, 1, 0$ (or until $k$ is reduced to the point where a short table can be used to finish up). The MMIX code analogous to (50) and (51) is now

ANDN q,x,m5; SRU z,x,32; SET y,x;      CSNZ y,q,z; ZSNZ lam,q,32;
ANDN q,y,m4; SRU z,y,16; ADD t,lam,16; CSNZ y,q,z; CSNZ lam,q,t;
ANDN q,y,m3; SRU z,y,8;  ADD t,lam,8;  CSNZ y,q,z; CSNZ lam,q,t;
LDB t,lamtab,y; ADD lam,lam,t;                                     (56)

and the total time is $\mu + 17\upsilon$. In this case table lamtab has 256 entries, namely $\lambda x$ for $0 \le x < 256$. Notice that the "conditional set" (CS) and "zero or set" (ZS) instructions have been used here instead of branch instructions. They tend to save time, even though they've made the program slightly longer.
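Here is the binary reduction written out as a minimal Python sketch for $d = 6$. It follows the shape of (56): three halving steps, then a 256-entry table. The names `lam` and `LAMTAB` are illustrative. The table is filled from recurrence (54) rather than by hand.

```python
M64 = (1 << 64) - 1
MU = [M64 // ((1 << (1 << k)) + 1) for k in range(6)]

# LAMTAB[v] = floor(lg v) for 1 <= v < 256, built from recurrence (54);
# entry 0 is a placeholder, since lambda 0 is undefined.
LAMTAB = [0] * 256
for v in range(2, 256):
    LAMTAB[v] = LAMTAB[v // 2] + 1


def lam(x: int) -> int:
    """floor(lg x) for 0 < x < 2^64, by binary reduction as in (56)."""
    if not 0 < x <= M64:
        raise ValueError("lambda x is defined here only for 0 < x < 2^64")
    y, l = x, 0
    for k in (5, 4, 3):                 # k = d-1, d-2, d-3; a table finishes up
        if y & (M64 ^ MU[k]):           # y & complement(mu_{6,k}) != 0
            y >>= 1 << k
            l += 1 << k
    return l + LAMTAB[y]                # y < 256 at this point
```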
7.1.3 BITWISE TRICKS AND TECHNIQUES 11
There appears to be no simple way to extract the leftmost 1 bit that appears in a register, analogous to the trick by which we extracted the rightmost 1 in (37). For this purpose we could compute $y \gets \lambda x$ and then $1 \ll y$, if $x \ne 0$; but a binary "smearing right" method is somewhat shorter and faster:

\[
\text{Set } y \gets x, \text{ then } y \gets y \mathbin{|} (y \gg 2^k) \text{ for } 0 \le k < d.
\tag{57}
\]

The leftmost 1 bit of $x$ is then $y - (y \gg 1)$.

[These non-floating-point methods have been suggested by H. S. Warren, Jr.]
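The smearing method (57) can be sketched in a few lines of Python, assuming a $2^d$-bit word with $d = 6$ by default. The function name `leftmost_one` is ours. For $x = 0$ it simply returns 0.

```python
def leftmost_one(x: int, d: int = 6) -> int:
    """Isolate the leftmost 1 bit of a 2^d-bit x by smearing right, as in (57)."""
    y = x
    for k in range(d):
        y |= y >> (1 << k)   # the leftmost 1 is now copied into the 2^(k+1) - 1 places below it
    return y - (y >> 1)      # y = 2^(lambda x + 1) - 1, so this leaves only its top bit
```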
Other operations at the left of a register, like removing the leftmost run of 1s, are harder yet; see exercise 39. But there is a remarkably simple, machine-independent way to determine whether or not $\lambda x = \lambda y$, given unsigned integers $x$ and $y$, in spite of the fact that we can't compute $\lambda x$ or $\lambda y$ quickly:

\[
\lambda x = \lambda y \quad\text{if and only if}\quad x \oplus y \le x \mathbin{\&} y.
\tag{58}
\]

[See exercise 40. This elegant relation was discovered by W. C. Lynch in 2006.] We will use (58) below, to devise another way to compute $\lambda x$.
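Relation (58) fits in one line of code. The short Python check below compares it against `int.bit_length`, which plays the role of $\lambda x + 1$; the function name `same_lambda` is ours. The reasoning is direct. If the leftmost 1s coincide, $x \mathbin{\&} y$ keeps that bit and $x \oplus y$ loses it. Otherwise $x \oplus y$ has the higher leading bit and $x \mathbin{\&} y$ does not.

```python
def same_lambda(x: int, y: int) -> bool:
    """Lynch's test (58): do x > 0 and y > 0 have their leftmost 1 bits in the same place?"""
    return (x ^ y) <= (x & y)
```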
Sideways addition. Binary $n$-bit numbers $x = (x_{n-1} \ldots x_1 x_0)_2$ are often used to represent subsets $X$ of the $n$-element universe $\{0, 1, \ldots, n-1\}$, with $k \in X$ if and only if $2^k \subseteq x$. The functions $\lambda x$ and $\rho x$ then represent the largest and smallest elements of $X$. The function

\[
\nu x = x_{n-1} + \cdots + x_1 + x_0,
\tag{59}
\]

which is called the "sideways sum" or "population count" of $x$, also has obvious importance in this connection, because it represents the cardinality $|X|$, namely the number of elements in $X$. This function, which we considered in 4.6.3–(7), satisfies the recurrence

\[
\nu 0 = 0; \qquad \nu(2x) = \nu(x) \quad\text{and}\quad \nu(2x+1) = \nu(x) + 1, \quad \text{for } x \ge 0.
\tag{60}
\]

It also has an interesting connection with the ruler function (exercise 1.2.5–11),

\[
\rho x = 1 + \nu(x-1) - \nu x; \qquad \text{equivalently,} \qquad \sum_{k=1}^{n} \rho k = n - \nu n.
\tag{61}
\]
The first textbook on programming, The Preparation of Programs for an Electronic Digital Computer by Wilkes, Wheeler, and Gill, second edition (Reading, Mass.: Addison–Wesley, 1957), 155, 191–193, presented an interesting subroutine for sideways addition due to D. B. Gillies and J. C. P. Miller. Their method was devised for the 35-bit numbers of the EDSAC, but it is readily converted to the following 64-bit procedure for $\nu x$ when $x = (x_{63} \ldots x_1 x_0)_2$:

Set $y \gets x - ((x \gg 1) \mathbin{\&} \mu_0)$. (Now $y = (u_{31} \ldots u_1 u_0)_4$, where $u_j = x_{2j+1} + x_{2j}$.)
Set $y \gets (y \mathbin{\&} \mu_1) + ((y \gg 2) \mathbin{\&} \mu_1)$. (Now $y = (v_{15} \ldots v_1 v_0)_{16}$, $v_j = u_{2j+1} + u_{2j}$.)
Set $y \gets (y + (y \gg 4)) \mathbin{\&} \mu_2$. (Now $y = (w_7 \ldots w_1 w_0)_{256}$, $w_j = v_{2j+1} + v_{2j}$.)
Finally $\nu \gets ((a \cdot y) \bmod 2^{64}) \gg 56$, where $a = (11111111)_{256}$.        (62)

The last step cleverly computes $y \bmod 255 = w_7 + \cdots + w_1 + w_0$ via multiplication, using the fact that the sum fits comfortably in eight bits. [David Muller had programmed a similar method for the ILLIAC I machine in 1954.]
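The four steps of (62) translate line for line into a minimal Python sketch, assuming 64-bit words. The constant names and the function name `nu` are ours. The final multiplication is reduced mod $2^{64}$ explicitly, because Python integers do not wrap.

```python
M64 = (1 << 64) - 1
MU0, MU1, MU2 = 0x5555555555555555, 0x3333333333333333, 0x0F0F0F0F0F0F0F0F
A = 0x0101010101010101          # a = (11111111)_256


def nu(x: int) -> int:
    """Sideways sum of a 64-bit x by the Gillies-Miller steps (62)."""
    y = x - ((x >> 1) & MU0)            # 2-bit fields u_j = x_{2j+1} + x_{2j}
    y = (y & MU1) + ((y >> 2) & MU1)    # 4-bit fields v_j = u_{2j+1} + u_{2j}
    y = (y + (y >> 4)) & MU2            # bytes w_j, each at most 8
    return ((A * y) & M64) >> 56        # top byte of the product is w_7 + ... + w_0
```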
If $x$ is expected to be "sparse," having at most a few 1 bits, we can use a faster method [P. Wegner, CACM 3 (1960), 322]:

\[
\text{Set } \nu \gets 0,\ y \gets x. \text{ Then while } y \ne 0, \text{ set } \nu \gets \nu + 1,\ y \gets y \mathbin{\&} (y-1).
\tag{63}
\]

A similar approach, using $y \gets y \mathbin{|} (y+1)$, works when $x$ is expected to be "dense."
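Both loops are sketched below in Python. The sparse version is (63) as stated. The dense version is our reading of the one-line remark above, assuming a 64-bit word: each step $y \gets y \mathbin{|} (y+1)$ fills in one 0 bit, so the loop counts 0s and subtracts that count from 64. Function names are ours.

```python
M64 = (1 << 64) - 1


def nu_sparse(x: int) -> int:
    """Wegner's loop (63): one iteration per 1 bit of x."""
    count, y = 0, x
    while y != 0:
        count += 1
        y &= y - 1              # erase the rightmost 1 bit
    return count


def nu_dense(x: int) -> int:
    """Dense variant for a 64-bit x: one iteration per 0 bit of x."""
    zeros, y = 0, x
    while y != M64:
        zeros += 1
        y |= y + 1              # fill in the rightmost 0 bit
    return 64 - zeros
```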
Bit reversal. For our next trick, let's change $x = (x_{63} \ldots x_1 x_0)_2$ to its left-right mirror image, $x^R = (x_0 x_1 \ldots x_{63})_2$. Anybody who has been following the developments so far, seeing methods like (50), (56), (57), and (62), will probably think, "Aha, once again we can divide by 2 and conquer! If we've already discovered how to reverse 32-bit numbers, we can reverse 64-bit numbers almost as fast, because $(xy)^R = y^R x^R$. All we have to do is apply the 32-bit method in parallel to both halves of the register, then swap the left half with the right half."

Right. For example, we can reverse an 8-bit string in three easy steps:

Given          x7 x6 x5 x4 x3 x2 x1 x0
Swap bits      x6 x7 x4 x5 x2 x3 x0 x1
Swap nyps      x4 x5 x6 x7 x0 x1 x2 x3        (64)
Swap nybbles   x0 x1 x2 x3 x4 x5 x6 x7
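And six such easy steps will reverse 64 bits. Fortunately, each of the swapping operations turns out to be quite simple with the help of the magic masks $\mu_k$:

\[
\begin{aligned}
y &\gets (x \gg 1) \mathbin{\&} \mu_0; & z &\gets (x \mathbin{\&} \mu_0) \ll 1; & x &\gets y \mathbin{|} z;\\
y &\gets (x \gg 2) \mathbin{\&} \mu_1; & z &\gets (x \mathbin{\&} \mu_1) \ll 2; & x &\gets y \mathbin{|} z;\\
y &\gets (x \gg 4) \mathbin{\&} \mu_2; & z &\gets (x \mathbin{\&} \mu_2) \ll 4; & x &\gets y \mathbin{|} z;\\
y &\gets (x \gg 8) \mathbin{\&} \mu_3; & z &\gets (x \mathbin{\&} \mu_3) \ll 8; & x &\gets y \mathbin{|} z;\\
y &\gets (x \gg 16) \mathbin{\&} \mu_4; & z &\gets (x \mathbin{\&} \mu_4) \ll 16; & x &\gets y \mathbin{|} z;\\
x &\gets (x \gg 32) \mathbin{|} ((x \ll 32) \bmod 2^{64}).
\end{aligned}
\tag{65}
\]

[Christopher Strachey foresaw some aspects of this construction in CACM 4 (1961), 146, and a similar ternary method was devised in 1973 by Bruce Baumgart (see exercise 49). The mature algorithm (65) was presented by Henry S. Warren, Jr., in Hacker's Delight (Addison–Wesley, 2002), 102.]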
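Algorithm (65) can be exercised with the minimal Python sketch below, assuming 64-bit words. The magic masks are rebuilt from (48). The name `reverse64` is ours, and the final rotation reduces mod $2^{64}$ by hand.

```python
M64 = (1 << 64) - 1
MU = [M64 // ((1 << (1 << k)) + 1) for k in range(6)]


def reverse64(x: int) -> int:
    """Reverse the bits of a 64-bit x by divide and conquer, as in (65)."""
    for k in range(5):
        s = 1 << k
        y = (x >> s) & MU[k]        # left member of each pair of s-bit fields moves right
        z = (x & MU[k]) << s        # right member of each pair moves left
        x = y | z
    return (x >> 32) | ((x << 32) & M64)  # finally swap the two 32-bit halves
```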
But MMIX is once again able to trump this general-purpose technique with less traditional commands that do the job much faster. Consider

rev GREG #0102040810204080; MOR x,x,rev; MOR x,rev,x;              (66)

the first MOR instruction reverses the bytes of $x$ from big-endian to little-endian or vice versa, while the second reverses the bits within each byte.
Bit swapping. Suppose we only want to interchange two bits within a register, $x_i \leftrightarrow x_j$, where $i > j$. What would be a good way to proceed? (Dear reader, please pause for a moment and solve this problem in your head, or with pencil and paper, without looking at the answer below.)

Let $\delta = i - j$. Here is one solution (but don't peek until you're ready):

\[
y \gets (x \gg \delta) \mathbin{\&} 2^j, \quad z \gets (x \mathbin{\&} 2^j) \ll \delta, \quad x \gets (x \mathbin{\&} m) \mathbin{|} y \mathbin{|} z, \quad \text{where } m = \overline{2^i \mathbin{|} 2^j}.
\tag{67}
\]
It uses two shifts and five bitwise Boolean operations, assuming that $i$ and $j$ are given constants. It is like each of the first lines of (65), except that a new mask $m$ is needed because $y$ and $z$ don't account for all of the bits of $x$.

We can, however, do better, saving one operation and one constant:

\[
y \gets (x \oplus (x \gg \delta)) \mathbin{\&} 2^j, \qquad x \gets x \oplus y \oplus (y \ll \delta).
\tag{68}
\]

The first assignment now puts $x_i \oplus x_j$ into position $j$; the second changes $x_i$ to $x_i \oplus (x_i \oplus x_j)$ and $x_j$ to $x_j \oplus (x_i \oplus x_j)$, as desired. In general it's often wise to convert a problem of the form "change $x$ to $f(x)$" into a problem of the form "change $x$ to $x \oplus g(x)$," since the bit-difference $g(x)$ might be easy to calculate.

On the other hand, there's a sense in which (67) might be preferable to (68), because the assignments to $y$ and $z$ in (67) can sometimes be performed simultaneously. When expressed as a circuit, (67) has a depth of 4 while (68) has depth 5.
Operation (68) can of course be used to swap several pairs of bits simultaneously, when we use a mask $\theta$ that's more general than $2^j$:

\[
y \gets (x \oplus (x \gg \delta)) \mathbin{\&} \theta, \qquad x \gets x \oplus y \oplus (y \ll \delta).
\tag{69}
\]

Let us call this operation a "$\delta$-swap," because it allows us to swap any nonoverlapping pairs of bits that are $\delta$ places apart. The mask $\theta$ has a 1 in the rightmost position of each pair that's supposed to be swapped. For example, (69) will swap the leftmost 25 bits of a 64-bit word with the rightmost 25 bits, while leaving the 14 middle bits untouched, if we let $\delta = 39$ and $\theta = 2^{25} - 1$ = #1ffffff.
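A $\delta$-swap is short enough to state directly. The minimal Python sketch below (function name ours) implements (69). It assumes the caller supplies a valid mask: no 1s at positions $j \ge 64 - \delta$, and no two 1s exactly $\delta$ apart. With $\theta = 2^j$ it is exactly (68).

```python
def delta_swap(x: int, delta: int, theta: int) -> int:
    """delta-swap (69): exchange bits j and j+delta wherever theta has a 1 in position j."""
    y = (x ^ (x >> delta)) & theta  # y_j = x_j XOR x_{j+delta} at each pair to be swapped
    return x ^ y ^ (y << delta)     # flip both members of every pair whose bits differ
```

For instance, `delta_swap(x, 39, (1 << 25) - 1)` exchanges the outer 25-bit fields of a 64-bit word and leaves the middle 14 bits alone, as in the example above.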
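Indeed, there's an astonishing way to reverse 64 bits using $\delta$-swaps, namely

\[
\begin{aligned}
& y \gets (x \gg 1) \mathbin{\&} \mu_0, && z \gets (x \mathbin{\&} \mu_0) \ll 1, && x \gets y \mathbin{|} z,\\
& y \gets (x \oplus (x \gg 4)) \mathbin{\&} \text{\#0300c0303030c303}, && x \gets x \oplus y \oplus (y \ll 4),\\
& y \gets (x \oplus (x \gg 8)) \mathbin{\&} \text{\#00c0300c03f0003f}, && x \gets x \oplus y \oplus (y \ll 8),\\
& y \gets (x \oplus (x \gg 20)) \mathbin{\&} \text{\#00000ffc00003fff}, && x \gets x \oplus y \oplus (y \ll 20),\\
& x \gets (x \gg 34) \mathbin{|} ((x \ll 30) \bmod 2^{64}),
\end{aligned}
\tag{70}
\]

saving two of the bitwise operations in (65) even though (65) looks "optimum."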
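Procedure (70) can be checked with the minimal Python sketch below, assuming 64-bit words. Each step is a permutation of bit positions, so testing all 64 single-bit inputs against the mirror image is a complete check. The function names are ours.

```python
M64 = (1 << 64) - 1
MU0 = 0x5555555555555555


def delta_swap(x: int, delta: int, theta: int) -> int:
    """delta-swap (69): exchange bits j and j+delta wherever theta_j = 1."""
    y = (x ^ (x >> delta)) & theta
    return x ^ y ^ (y << delta)


def reverse64_swaps(x: int) -> int:
    """Reverse a 64-bit x with the delta-swaps of (70)."""
    x = ((x >> 1) & MU0) | ((x & MU0) << 1)     # swap adjacent bits
    x = delta_swap(x, 4, 0x0300C0303030C303)
    x = delta_swap(x, 8, 0x00C0300C03F0003F)
    x = delta_swap(x, 20, 0x00000FFC00003FFF)
    return (x >> 34) | ((x << 30) & M64)        # rotate the word by 30 places
```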
*Bit permutation in general. The methods we've just seen can be extended to obtain an arbitrary permutation of the bits in a register. In fact, there always exist masks $\theta_0, \ldots, \theta_5, \hat\theta_4, \ldots, \hat\theta_0$ such that the following operations transform $x = (x_{63} \ldots x_1 x_0)_2$ into any desired rearrangement $x^\pi = (x_{63\pi} \ldots x_{1\pi} x_{0\pi})_2$ of its bits:

\[
\begin{aligned}
x &\gets 2^k\text{-swap of } x \text{ with mask } \theta_k, &&\text{for } k = 0, 1, 2, 3, 4, 5;\\
x &\gets 2^k\text{-swap of } x \text{ with mask } \hat\theta_k, &&\text{for } k = 4, 3, 2, 1, 0.
\end{aligned}
\tag{71}
\]

In general, a permutation of $2^d$ bits can be achieved with $2d - 1$ such steps, using appropriate masks $\theta_k$, $\hat\theta_k$, where the swap distances are respectively $2^0, 2^1, \ldots, 2^{d-1}, \ldots, 2^1, 2^0$.

To prove this fact, we can use a special case of the permutation networks discovered independently by A. M. Duguid and J. Le Corre in 1959, based on earlier work of D. Slepian [see V. E. Beneš, Mathematical Theory of Connecting Networks and Telephone Traffic (New York: Academic Press, 1965), Section 3.3].
Figure 12 shows a permutation network $P(2n)$ for $2n$ elements constructed from two permutation networks for $n$ elements, when $n = 4$. Each connection drawn between two lines represents a crossbar module that either leaves the line contents unaltered or interchanges them, as the data flows from left to right. Every setting of the individual crossbars therefore causes $P(2n)$ to produce a permutation of its inputs; conversely, we wish to show that any permutation of the $2n$ inputs can be achieved by some setting of the crossbars.
The construction of Fig. 12 is best understood by considering an example. Suppose we want to route the inputs $(0, 1, 2, 3, 4, 5, 6, 7)$ to $(3, 2, 4, 1, 6, 0, 5, 7)$, respectively. The first job is to determine the contents of the lines just after the first column of crossbars and just before the last column, since we can then use a similar method to set the crossbars in the inner $P(4)$'s. Thus, in the network

0   a          A   3
1   b          B   2
2   c          C   4
3   d          D   1        (72)
4   e          E   6
5   f          F   0
6   g          G   5
7   h          H   7

we want to find permutations abcdefgh and ABCDEFGH such that $\{a, b\} = \{0, 1\}$, $\{c, d\} = \{2, 3\}$, $\ldots$, $\{g, h\} = \{6, 7\}$, $\{a, c, e, g\} = \{A, C, E, G\}$, $\{b, d, f, h\} = \{B, D, F, H\}$, $\{A, B\} = \{3, 2\}$, $\{C, D\} = \{4, 1\}$, $\ldots$, $\{G, H\} = \{5, 7\}$. Starting at the bottom, let us choose $h = 7$, because we don't wish to disturb the contents of that line unless necessary. Then the following choices are forced:

\[
H = 7;\ G = 5;\ e = 5;\ f = 4;\ D = 4;\ C = 1;\ a = 1;\ b = 0;\ F = 0;\ E = 6;\ g = 6.
\tag{73}
\]

If we had chosen $h = 6$, the forcing pattern would have been similar but reversed,

\[
F = 6;\ E = 0;\ a = 0;\ b = 1;\ D = 1;\ C = 4;\ e = 4;\ f = 5;\ H = 5;\ G = 7;\ g = 7.
\tag{74}
\]

Options (73) and (74) can both be completed by choosing either $d = 3$ (hence $B = 3$, $A = 2$, $c = 2$) or $d = 2$ (hence $B = 2$, $A = 3$, $c = 3$).
In general the forcing pattern will go in cycles, no matter what permutation we begin with. To see this, consider the graph on eight vertices {ab, cd, ef, gh, AB, CD, EF, GH} that has an edge from uv to UV whenever the pair of inputs connected to uv has an element in common with the pair of outputs connected to UV. Thus, in our example the edges are ab–EF, ab–CD, cd–AB, cd–AB, ef–CD, ef–GH, gh–EF, gh–GH. We have a "double bond" between cd and AB, since the inputs connected to c and d are exactly the outputs connected to A and B; subject to this slight bending of the strict definition of a graph, we see that each vertex is adjacent to exactly two other vertices, and lowercase vertices are always adjacent to uppercase ones. Therefore the graph always consists of disjoint cycles of even length. In our example, the cycles are

ab–EF–gh–GH–ef–CD–ab    and    cd=AB,                               (75)

where the longer cycle corresponds to (73) and (74). If there are $k$ different cycles, there will be $2^k$ different ways to specify the behavior of the first and last columns of crossbars.

[Fig. 12. The inside of a black box $P(2n)$ that permutes $2n$ elements in all possible ways, when $n > 1$. (Illustrated for $n = 4$.) The figure shows the $2n$ inputs at the left, the $2n$ outputs at the right, and two inner networks $P(n)$ between them.]
To complete the network, we can process the inner 4-element permutations in the same way; and any $2^d$-element permutation is achievable in this same recursive fashion. The resulting crossbar settings determine the masks $\theta_j$ and $\hat\theta_j$ of (71). Some choices of crossbars may lead to a mask that is entirely zero; then we can eliminate the corresponding stage of the computation.
If the input and output are identical on the bottom lines of the network, our construction shows how to ensure that none of the crossbars touching those lines are active. For example, the 64-bit algorithm in (71) could be used also with a 60-bit register, without needing the four extra bits for any intermediate results.
Of course we can often beat the general procedure of (71) in special cases. For example, exercise 52 shows that method (71) needs nine swapping steps to transpose an $8 \times 8$ matrix, but in fact three swaps suffice:

Given                   | 7-swap                  | 14-swap                 | 28-swap
00 01 02 03 04 05 06 07 | 00 10 02 12 04 14 06 16 | 00 10 20 30 04 14 24 34 | 00 10 20 30 40 50 60 70
10 11 12 13 14 15 16 17 | 01 11 03 13 05 15 07 17 | 01 11 21 31 05 15 25 35 | 01 11 21 31 41 51 61 71
20 21 22 23 24 25 26 27 | 20 30 22 32 24 34 26 36 | 02 12 22 32 06 16 26 36 | 02 12 22 32 42 52 62 72
30 31 32 33 34 35 36 37 | 21 31 23 33 25 35 27 37 | 03 13 23 33 07 17 27 37 | 03 13 23 33 43 53 63 73
40 41 42 43 44 45 46 47 | 40 50 42 52 44 54 46 56 | 40 50 60 70 44 54 64 74 | 04 14 24 34 44 54 64 74
50 51 52 53 54 55 56 57 | 41 51 43 53 45 55 47 57 | 41 51 61 71 45 55 65 75 | 05 15 25 35 45 55 65 75
60 61 62 63 64 65 66 67 | 60 70 62 72 64 74 66 76 | 42 52 62 72 46 56 66 76 | 06 16 26 36 46 56 66 76
70 71 72 73 74 75 76 77 | 61 71 63 73 65 75 67 77 | 43 53 63 73 47 57 67 77 | 07 17 27 37 47 57 67 77
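The three swaps in the table can be tried in Python. This minimal sketch assumes that entry $(r, c)$ of the $8 \times 8$ bit matrix sits at bit $8r + c$, so a 7-swap exchanges entry $(r, c)$ with entry $(r+1, c-1)$. The three masks are our own choice for that layout (the same ones that appear in Warren's Hacker's Delight); the text itself does not list them. Function names are ours.

```python
def delta_swap(x: int, delta: int, theta: int) -> int:
    """delta-swap (69): exchange bits j and j+delta wherever theta_j = 1."""
    y = (x ^ (x >> delta)) & theta
    return x ^ y ^ (y << delta)


def transpose8(x: int) -> int:
    """Transpose an 8x8 bit matrix held in a 64-bit word, entry (r, c) at bit 8r + c."""
    x = delta_swap(x, 7, 0x00AA00AA00AA00AA)   # transpose each 2x2 block
    x = delta_swap(x, 14, 0x0000CCCC0000CCCC)  # swap off-diagonal 2x2 blocks within 4x4s
    x = delta_swap(x, 28, 0x00000000F0F0F0F0)  # swap the two off-diagonal 4x4 blocks
    return x
```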
The "perfect shuffle" is another bit permutation that arises frequently in practice. If $x = (\ldots x_2 x_1 x_0)_2$ and $y = (\ldots y_2 y_1 y_0)_2$ are any 2-adic integers, we define $x \ddagger y$ ("$x$ zip $y$," the zipper function of $x$ and $y$) by interleaving their bits:

\[
x \ddagger y = (\ldots x_2 y_2 x_1 y_1 x_0 y_0)_2.
\tag{76}
\]

This operation has important applications to the representation of 2-dimensional data, because a small change in either $x$ or $y$ usually causes only a small change in $x \ddagger y$ (see exercise 86). Notice also that the magic mask constants (47) satisfy

\[
\mu_k \ddagger \mu_k = \mu_{k+1}.
\tag{77}
\]

If $x$ appears in the left half of a register and $y$ appears in the right half, a perfect shuffle is the permutation that changes the register contents to $x \ddagger y$.
A sequence of $d - 1$ swapping steps will perfectly shuffle a $2^d$-bit register; in fact, exercise 53 shows that there are several ways to achieve this. Once again, therefore, we are able to improve on the $(2d-1)$-step method of (71) and Fig. 12.

Conversely, suppose we're given the shuffled value $z = x \ddagger y$ in a $2^d$-bit register; is there an efficient way to extract the original value of $y$? Sure: If the $d - 1$ swaps that do a perfect shuffle are performed in reverse order, they'll undo the shuffle and recover both $x$ and $y$. But if only $y$ is wanted, we can save half of the work: Start with $y \gets z \mathbin{\&} \mu_0$; then set $y \gets (y + (y \gg 2^{k-1})) \mathbin{\&} \mu_k$ for $k = 1, \ldots, d-1$. For example, when $d = 3$ this procedure goes $(0y_30y_20y_10y_0)_2 \mapsto (00y_3y_200y_1y_0)_2 \mapsto (0000y_3y_2y_1y_0)_2$. "Divide and conquer" conquers again.
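The half-work extraction of $y$ from $z = x \ddagger y$ can be sketched in Python for a 64-bit register ($d = 6$). The bit-at-a-time `zip_bits` is only a reference builder for test data, not an efficient shuffle, and both function names are ours. Because $x$ occupies the odd positions, calling `unzip_y(z >> 1)` recovers $x$ as well.

```python
M64 = (1 << 64) - 1
MU = [M64 // ((1 << (1 << k)) + 1) for k in range(6)]


def zip_bits(x: int, y: int, n: int = 32) -> int:
    """x zip y from (76) for n-bit x and y, built one bit at a time (reference only)."""
    z = 0
    for i in range(n):
        z |= ((x >> i) & 1) << (2 * i + 1)   # x_i goes to the odd position 2i+1
        z |= ((y >> i) & 1) << (2 * i)       # y_i goes to the even position 2i
    return z


def unzip_y(z: int) -> int:
    """Recover y from a 64-bit z = x zip y by divide and conquer (d = 6)."""
    y = z & MU[0]                            # keep the bits in even positions
    for k in range(1, 6):
        y = (y + (y >> (1 << (k - 1)))) & MU[k]  # the two addends occupy disjoint bits
    return y
```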
Consider now a more general problem, where we want to extract and compress an arbitrary subset of a register's bits. Suppose we're given a $2^d$-bit word $z = (z_{2^d-1} \ldots z_1 z_0)_2$ and a mask $\chi = (\chi_{2^d-1} \ldots \chi_1 \chi_0)_2$ that has $s$ 1-bits; thus $\nu\chi = s$. The problem is to assemble the compact subword

\[
y = (y_{s-1} \ldots y_1 y_0)_2 = (z_{j_{s-1}} \ldots z_{j_1} z_{j_0})_2,
\tag{78}
\]

where $j_{s-1} > \cdots > j_1 > j_0$ are the indices where $\chi_j = 1$. For example, if $d = 3$ and $\chi = (10110010)_2$, we want to transform $z = (y_3 x_3 y_2 y_1 x_2 x_1 y_0 x_0)_2$ into $y = (y_3 y_2 y_1 y_0)_2$. (The problem of going from $x \ddagger y$ to $y$, considered above, is the special case $\chi = \mu_0$.) We know from (71) that $y$ can be found by $\delta$-swapping, at most $2d - 1$ times; but in this problem the relevant data always moves to the right, so we can speed things up by doing shifts instead of swaps.
Let's say that a $\delta$-shift of $x$ with mask $\theta$ is the operation

\[
x \gets x \oplus \bigl((x \oplus (x \gg \delta)) \mathbin{\&} \theta\bigr),
\tag{79}
\]

which changes bit $x_j$ to $x_{j+\delta}$ if $\theta$ has 1 in position $j$, otherwise it leaves $x_j$ unchanged. Guy Steele discovered that there always exist masks $\theta_0, \theta_1, \ldots, \theta_{d-1}$ so that the general extraction problem (78) can be solved with a few $\delta$-shifts:

\[
\text{Start with } x \gets z; \text{ then do a } 2^k\text{-shift of } x \text{ with mask } \theta_k, \text{ for } k = 0, 1, \ldots, d-1; \text{ finally set } y \gets x.
\tag{80}
\]

In fact, the idea for finding appropriate masks is surprisingly simple. Every bit that wants to move a total of exactly $l = (l_{d-1} \ldots l_1 l_0)_2$ places to the right should be transported in the $2^k$-shifts for which $l_k = 1$.
For example, suppose $d = 3$ and $\chi = (10110010)_2$. (We must assume that $\chi \ne 0$.) Remembering that some 0s need to be shifted in from the left, we can set $\theta_0 = (00011001)_2$, $\theta_1 = (00000110)_2$, $\theta_2 = (11111000)_2$; then (80) maps

\[
(y_3x_3y_2y_1x_2x_1y_0x_0)_2 \mapsto (y_3x_3y_2y_2y_1x_1y_0y_0)_2 \mapsto (y_3x_3y_2y_2y_1y_2y_1y_0)_2 \mapsto (0000y_3y_2y_1y_0)_2.
\]

Exercise 69 proves that the bits being extracted will never interfere with each other during their journey. Furthermore, there's a slick way to compute the necessary masks $\theta_k$ dynamically from $\chi$, in $O(d^2)$ steps (see exercise 70).
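Steele's scheme (80) can be sketched in Python as follows. The masks are found by the naive simulation the rule suggests: bit $i$ of the result starts at $j_i$, must travel $l = j_i - i$ places, and moves in the $2^k$-shift exactly when $l_k = 1$. This is not the $O(d^2)$ method of exercise 70. The sketch also departs from the text in one labeled way: its masks do not shift 0s in from the left, so `compress` clears the leftover high bits with a final AND. For the example above it reproduces $\theta_0$ and $\theta_1$, and gives $\theta_2 = (00001000)_2$ in place of $(11111000)_2$. All names are ours.

```python
def delta_shift(x: int, delta: int, theta: int) -> int:
    """delta-shift (79): bit j becomes bit j+delta wherever theta has a 1 in position j."""
    return x ^ ((x ^ (x >> delta)) & theta)


def compress_masks(chi: int, d: int = 6) -> list:
    """Masks theta_0..theta_{d-1} for (80), found by simulating each bit's journey.

    Unlike the text's masks, these do not shift in 0s from the left."""
    sources = [j for j in range(1 << d) if (chi >> j) & 1]
    current = list(sources)
    masks = [0] * d
    for k in range(d):
        for i, j in enumerate(sources):
            if ((j - i) >> k) & 1:            # bit k of the total distance l = j - i
                current[i] -= 1 << k
                masks[k] |= 1 << current[i]   # the destination position pulls the bit in
    return masks


def compress(z: int, chi: int, d: int = 6) -> int:
    """Extract the bits of z selected by chi into the low end, as in (78)/(80)."""
    x = z
    for k, theta in enumerate(compress_masks(chi, d)):
        x = delta_shift(x, 1 << k, theta)
    s = bin(chi).count("1")
    return x & ((1 << s) - 1)                 # discard leftovers above bit s-1
```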
A "sheep-and-goats" operation has been suggested for computer hardware, extending (78) to produce the general unshuffled word

\[
(x_{r-1} \ldots x_1 x_0\, y_{s-1} \ldots y_1 y_0)_2 = (z_{i_{r-1}} \ldots z_{i_1} z_{i_0}\, z_{j_{s-1}} \ldots z_{j_1} z_{j_0})_2,
\tag{81}
\]

where $i_{r-1} > \cdots > i_1 > i_0$ are the indices where $\chi_i = 0$. Any permutation of $2^d$ bits is achievable via at most $d$ sheep-and-goats operations (see exercise 73).
Shifting also allows us to go beyond permutations, to arbitrary mappings of bits within a register. Suppose we want to transform

\[
x = (x_{2^d-1} \ldots x_1 x_0)_2 \mapsto x^\varphi = (x_{(2^d-1)\varphi} \ldots x_{1\varphi} x_{0\varphi})_2,
\tag{82}
\]

where $\varphi$ is any of the $(2^d)^{2^d}$ functions from the set $\{0, 1, \ldots, 2^d-1\}$ into itself. K. M. Chung and C. K. Wong [IEEE Transactions C-29 (1980), 1029–1032] discovered an attractive way to do this in $O(d)$ steps by using cyclic $\delta$-shifts, which are like (79) except that we set

\[
x \gets x \oplus \bigl((x \oplus (x \gg \delta) \oplus (x \ll (2^d - \delta))) \mathbin{\&} \theta\bigr).
\tag{83}
\]

Their idea is to let $c_l$ be the number of indices $j$ such that $j\varphi = l$, for $0 \le l < 2^d$. Then they find masks $\theta_0, \theta_1, \ldots, \theta_{d-1}$ with the property that a cyclic $2^k$-shift of $x$ with mask $\theta_k$, done successively for $0 \le k < d$, will transform $x$ into a number $x'$ that contains exactly $c_l$ copies of bit $x_l$ for each $l$. Finally the general permutation procedure (71) can be used to change $x' \mapsto x^\varphi$.
For example, suppose $d = 3$ and $x^\varphi = (x_3x_1x_1x_0x_3x_7x_5x_5)_2$. Then we have $(c_0, c_1, c_2, c_3, c_4, c_5, c_6, c_7) = (1, 2, 0, 2, 0, 2, 0, 1)$. Using masks $\theta_0 = (00011100)_2$, $\theta_1 = (01001001)_2$, and $\theta_2 = (00100000)_2$, three cyclic $2^k$-shifts now take $x = (x_7x_6x_5x_4x_3x_2x_1x_0)_2 \mapsto (x_7x_6x_5x_5x_4x_3x_1x_0)_2 \mapsto (x_7x_0x_5x_5x_5x_3x_1x_3)_2 \mapsto (x_7x_0x_1x_5x_5x_3x_1x_3)_2 = x'$. Then, five $\delta$-swaps: $x' \mapsto (x_0x_7x_5x_1x_3x_5x_3x_1)_2 \mapsto (x_0x_7x_5x_1x_3x_1x_3x_5)_2 \mapsto (x_0x_1x_3x_1x_3x_7x_5x_5)_2 \mapsto (x_3x_1x_0x_1x_3x_7x_5x_5)_2 \mapsto (x_3x_1x_1x_0x_3x_7x_5x_5)_2 = x^\varphi$; we're done! Of course any 8-bit mapping can be achieved more quickly by brute force, one bit at a time; the method of Chung and Wong becomes much more impressive in a 256-bit register. Even with MMIX's 64-bit registers it's pretty good, needing at most 96 cycles in the worst case.
To find $\theta_0$, we use the fact that $\sum_l c_l = 2^d$, and we look at $c_{\rm even} = \sum_l c_{2l}$ and $c_{\rm odd} = \sum_l c_{2l+1}$. If $c_{\rm even} = c_{\rm odd} = 2^{d-1}$, we can set $\theta_0 = 0$ and omit the cyclic 1-shift. But if, say, $c_{\rm even} < c_{\rm odd}$, we find an even $l$ with $c_l = 0$. Cyclically shifting the bits $l, l+1, \ldots, l+t$ (modulo $2^d$) for some $t$ will produce new counts $(c'_0, \ldots, c'_{2^d-1})$ for which $c'_{\rm even} = c'_{\rm odd} = 2^{d-1}$; so $\theta_0 = 2^l + \cdots + 2^{(l+t) \bmod 2^d}$. Then we can deal with the bits in even and odd positions separately, using the same method, until getting down to 1-bit subwords. Exercise 74 has the details.
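The three cyclic shifts of the 8-bit example can be replayed with the minimal Python sketch below. It implements (83) for a register of `width` bits. Everything above bit `width - 1` produced by the left shift is removed by the mask. The function names are ours, and the masks are the ones given in the example.

```python
def cyclic_delta_shift(x: int, delta: int, theta: int, width: int) -> int:
    """Cyclic delta-shift (83) on a width-bit x: where theta_j = 1,
    bit j becomes bit (j + delta) mod width."""
    rotated = (x >> delta) ^ (x << (width - delta))   # bits above width are cut off by theta
    return x ^ ((x ^ rotated) & theta)


def chung_wong_example(x: int) -> int:
    """The three cyclic 2^k-shifts of the 8-bit example in the text, producing x'."""
    for k, theta in enumerate((0b00011100, 0b01001001, 0b00100000)):
        x = cyclic_delta_shift(x, 1 << k, theta, 8)
    return x
```

The test below confirms that, for every 8-bit $x$, the result has bit pattern $(x_7x_0x_1x_5x_5x_3x_1x_3)_2$.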
Working with fragmented fields. Instead of extracting bits from various parts of a word and gathering them together, we can often manipulate those bits directly in their original positions.

For example, suppose we want to run through all subsets of a given set $U$, where (as usual) the set is specified by a mask $\chi$ such that $[k \in U] = (\chi \gg k) \mathbin{\&} 1$. If $x \subseteq \chi$ and $x \ne \chi$, there's an easy way to calculate the next largest subset of $U$ in lexicographic order, namely the smallest integer $x' > x$ such that $x' \subseteq \chi$:

\[
x' = (x - \chi) \mathbin{\&} \chi.
\tag{84}
\]

In the special case when $x = 0$ and $\chi \ne 0$, we've already seen in (37) that this formula produces the rightmost bit of $\chi$, which corresponds to the lexicographically smallest nonempty subset of $U$.

Why does formula (84) work? Imagine adding 1 to the number $x \mathbin{|} \bar\chi$, which has 1s wherever $\chi$ is 0. A carry will propagate through those 1s until it reaches the rightmost bit position where $x$ has a 0 and $\chi$ has a 1; furthermore all bits to the right of that position will become zero. Therefore $x' = ((x \mathbin{|} \bar\chi) + 1) \mathbin{\&} \chi$. But we have $(x \mathbin{|} \bar\chi) + 1 = (x + \bar\chi) + 1 = x + (\bar\chi + 1) = x - \chi$ when $x \subseteq \chi$. QED.

Notice further that $x' = 0$ if and only if $x = \chi$. So we'll know when we've found the largest subset. Exercise 79 shows how to go back to $x$, given $x'$.
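Formula (84) gives a subset enumerator directly. The minimal Python sketch below stops when $x = \chi$, since the next step would return to 0. It relies on Python's `&` treating a negative `x - chi` as an infinite two's-complement bit string, which matches the 2-adic reading of the text. The generator name is ours.

```python
def subsets(chi: int):
    """Yield every x that is a subset of chi, in increasing order, stepping with (84)."""
    x = 0
    while True:
        yield x
        if x == chi:
            return
        x = (x - chi) & chi      # next subset; the result is nonnegative because chi >= 0
```

For example, `list(subsets(0b1010))` is `[0, 2, 8, 10]`.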
We might also want to run through all elements of a subcube; for example, to find all bit patterns that match a specification like *10*1*01, consisting of 0s, 1s, and *s (don't-cares). Such a specification can be represented by asterisk codes $a = (a_{n-1} \ldots a_0)_2$ and bit codes $b = (b_{n-1} \ldots b_0)_2$, as in exercise 7.1.1–30; our example corresponds to $a = (10010100)_2$, $b = (01001001)_2$. The problem of enumerating all subsets of a set is the special case where $a = \chi$ and $b = 0$. In the more general subcube problem, the successor of a given bit pattern $x$ is

\[
x' = ((x - (a + b)) \mathbin{\&} a) + b.
\tag{85}
\]
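Successor rule (85) can be sketched as a Python generator, assuming that the asterisk and bit codes are disjoint ($a \mathbin{\&} b = 0$). It starts from $b$, where every * is 0, and stops when the rule wraps back to $b$ after reaching $a \mathbin{|} b$. The generator name is ours.

```python
def subcube(a: int, b: int):
    """Yield all x matching asterisk codes a and bit codes b (a & b == 0), via (85)."""
    x = b                                  # smallest element: every * set to 0
    while True:
        yield x
        x = ((x - (a + b)) & a) + b
        if x == b:                         # wrapped around after a | b
            return
```

With $a = (10010100)_2$ and $b = (01001001)_2$ this yields the eight patterns that match *10*1*01, in increasing order.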
Suppose the bits of $z = (z_{n-1} \ldots z_0)_2$ have been stitched together from two subwords $x = (x_{r-1} \ldots x_0)_2$ and $y = (y_{s-1} \ldots y_0)_2$, where $r + s = n$, using an arbitrary mask $\chi$ for which $\nu\chi = s$ to govern the stitching. For example, $z = (y_2x_4x_3y_1x_2y_0x_1x_0)_2$ when $n = 8$ and $\chi = (10010100)_2$. We can think of $z$ as a "scattered accumulator," in which alien bits $x_i$ lurk among friendly bits $y_j$. From this viewpoint the problem of finding successive elements of a subcube is essentially the problem of computing $y + 1$ inside a scattered accumulator $z$, without changing the value of $x$. The sheep-and-goats operation (81) would untangle $x$ and $y$; but it's expensive, and (85) shows that we can solve the problem without it. We can, in fact, compute $y + y'$ when $y' = (y'_{s-1} \ldots y'_0)_2$ is any value inside a scattered accumulator $z'$, if $y$ and $y'$ both appear in the positions specified by $\chi$: Consider $t = z \mathbin{\&} \chi$ and $t' = z' \mathbin{\&} \chi$. If we form the sum $(t \mathbin{|} \bar\chi) + t'$, all carries that occur in a normal addition $y + y'$ will propagate through the blocks of 1s in $\bar\chi$, just as if the scattered bits were adjacent. Thus

\[
((z \mathbin{\&} \chi) + (z' \mathbin{|} \bar\chi)) \mathbin{\&} \chi
\tag{86}
\]

is the sum of $y$ and $y'$, modulo $2^s$, scattered according to the mask $\chi$.
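Formula (86) can be checked with the Python sketch below. The `scatter` helper, which deposits the low bits of a value into the 1-positions of $\chi$, is a hypothetical name used only to build test data. The register width is a parameter (64 by default) because $\bar\chi$ has to be taken within the register.

```python
def scatter(v: int, chi: int, width: int = 64) -> int:
    """Place the low bits of v into the 1-positions of chi (naive helper, reference only)."""
    out, i = 0, 0
    for j in range(width):
        if (chi >> j) & 1:
            out |= ((v >> i) & 1) << j
            i += 1
    return out


def scattered_add(z: int, z2: int, chi: int, width: int = 64) -> int:
    """(86): add the fields of z and z2 selected by chi, mod 2^s, without unscattering."""
    not_chi = ((1 << width) - 1) ^ chi     # 1s in every alien position
    return ((z & chi) + (z2 | not_chi)) & chi
```

For instance, with $\chi = (10010100)_2$ and alien bits set in $z$, adding the scattered values 5 and 6 gives the scattered value $11 \bmod 8 = 3$.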
Tweaking several bytes at once. Instead of concentrating on the data in one field within a word, we often want to deal simultaneously with two or more subwords, performing calculations on each of them in parallel. For example, many applications need to process long sequences of bytes, and we can gain speed by acting on eight bytes at a time; we might as well use all 64 bits that our machine provides. General multibyte techniques were introduced by Leslie Lamport in CACM 18 (1975), 471–475, and subsequently extended by many programmers.
Suppose first that we simply wish to take two sequences of bytes and find their sum, regarding them as coordinates of vectors, doing arithmetic modulo 256 in each byte. Algebraically speaking, we're given 8-byte vectors $x = (x_7 \ldots x_1 x_0)_{256}$ and $y = (y_7 \ldots y_1 y_0)_{256}$; we want to compute $z = (z_7 \ldots z_1 z_0)_{256}$, where $z_j = (x_j + y_j) \bmod 256$ for $0 \le j < 8$. Ordinary addition of $x$ to $y$ doesn't quite work, because we need to prevent carries from propagating between bytes. So we separate out the high-order bits and deal with them separately:

\[
\begin{aligned}
z &\gets (x \oplus y) \mathbin{\&} h, \qquad \text{where } h = \text{\#8080808080808080};\\
z &\gets ((x \mathbin{\&} \bar h) + (y \mathbin{\&} \bar h)) \oplus z.
\end{aligned}
\tag{87}
\]

The total time for MMIX to do this is $6\upsilon$, plus $3\mu + 3\upsilon$ if we also count the time to load $x$, load $y$, and store $z$. By contrast, eight one-byte additions (LDBU, LDBU, ADDU, and STBU, repeated eight times) would cost $8 \times (3\mu + 4\upsilon) = 24\mu + 32\upsilon$.

Parallel subtraction of bytes is just as easy (see exercise 88).
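We can also compute bytewise averages, with $z_j = \lfloor (x_j + y_j)/2 \rfloor$ for each $j$:

\[
\begin{aligned}
z &\gets ((x \oplus y) \mathbin{\&} \bar l) \gg 1, \qquad \text{where } l = \text{\#0101010101010101};\\
z &\gets (x \mathbin{\&} y) + z.
\end{aligned}
\tag{88}
\]

This elegant trick, suggested by H. G. Dietz, is based on the well-known formula

\[
x + y = (x \oplus y) + ((x \mathbin{\&} y) \ll 1)
\tag{89}
\]

for radix-2 addition. (We can implement (88) with four MMIX instructions, not five, because a single MOR operation will change $x \oplus y$ to $((x \oplus y) \mathbin{\&} \bar l) \gg 1$.)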
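Formulas (87) and (88) are sketched below in Python on 64-bit words; the function names are ours. In Python, `x & ~H` clears the high bit of every byte of a nonnegative `x`, as $x \mathbin{\&} \bar h$ does in a 64-bit register. Sums of the low 7 bits are at most 254 per byte, and averages are at most 255, so no carry can cross a byte boundary.

```python
H = 0x8080808080808080
L = 0x0101010101010101


def add_bytes(x: int, y: int) -> int:
    """Bytewise (x_j + y_j) mod 256 for 64-bit x and y, as in (87)."""
    z = (x ^ y) & H                      # high bit of each byte's sum, before any carry in
    return ((x & ~H) + (y & ~H)) ^ z     # low 7 bits added; carries stop at bit 7 of each byte


def average_bytes(x: int, y: int) -> int:
    """Bytewise floor((x_j + y_j) / 2), as in (88), based on identity (89)."""
    z = ((x ^ y) & ~L) >> 1              # half of x XOR y; cleared low bits cannot spill over
    return (x & y) + z
```

For example, the bytes FF, 80, 01, 7F of `x = 0xFF80017F10203040` and 01, 80, 7F, 01 of `y = 0x01807F0110203040` add to 00, 00, 80, 80 and average to 80, 80, 40, 40.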
Exercises 88–93 and 100–104 develop these ideas further, showing how to do mixed-radix arithmetic, as well as such things as the addition and subtraction of vectors whose components are treated modulo $m$ when $m$ needn't be a power of 2.

In essence, we can regard the bits, bytes, or other subfields of a register as if they were elements of an array of independent microprocessors, acting independently on their own subproblems yet tightly synchronized, and communicating with each other via shift instructions and carry bits. Computer designers have been interested for many years in the development of parallel processors with a so-called SIMD architecture, namely a "Single Instruction stream with Multiple Data streams"; see, for example, S. H. Unger, Proc. IRE 46 (1958), 1744–1750. The increased availability of 64-bit registers has meant that programmers of ordinary sequential computers are now able to get a taste of SIMD processing. Indeed, computations such as (87), (88), and (89) are called SWAR methods: "SIMD Within A Register," a name coined by R. J. Fisher and H. G. Dietz [see Lecture Notes in Computer Science 1656 (1999), 290–305].
Of course bytes often contain alphabetic data as well as numbers, and one of the most common programming tasks is to search through a long string of characters in order to find the first appearance of some particular byte value. For example, strings are often represented as a sequence of nonzero bytes terminated by 0. In order to locate the end of a string quickly, we need a fast way to determine whether all eight bytes of a given word $x$ are nonzero (because they usually are). Several fairly good solutions to this problem were found by Lamport and others; but Alan Mycroft discovered in 1987 that three instructions actually suffice:

\[
t \gets h \mathbin{\&} (x - l) \mathbin{\&} \bar x,
\tag{90}
\]

where $h$ and $l$ appear in (87) and (88). If each byte $x_j$ is nonzero, $t$ will be zero; for $(x_j - 1) \mathbin{\&} \bar x_j$ will be $2^{\rho x_j} - 1$, which is always less than #80 $= 2^7$. But if $x_j = 0$, while its right neighbors $x_{j-1}, \ldots, x_0$ (if any) are all nonzero, the subtraction $x - l$ will produce #ff in byte $j$, and $t$ will be nonzero. In fact, $\rho t$ will be $8j + 7$.
Caution: Although the computation in (90) pinpoints the rightmost zero byte of $x$, we cannot deduce the position of the leftmost zero byte from the value of $t$ alone. (See exercise 94.) In this respect the little-endian convention proves to be preferable to the corresponding big-endian behavior. An application that needs to locate the leftmost zero byte can use (90) to skip quickly over nonzeros, but then it must fall back on a slower method when the search has been narrowed down to eight finalists. The following 4-operation formula produces a completely precise test value $t = (t_7 \ldots t_1 t_0)_{256}$, in which $t_j = 128[x_j = 0]$ for each $j$:

\[
t \gets h \mathbin{\&} \overline{x \mathbin{|} ((x \mathbin{|} h) - l)}.
\tag{91}
\]

The leftmost zero byte of $x$ is now $x_j$, where $\lambda t = 8j + 7$.
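Both zero-byte tests are sketched below in Python on 64-bit words; the function names are ours. The subtraction in (90) is reduced mod $2^{64}$, because Python would otherwise produce a negative number when $x < l$. In (91) each byte of $x \mathbin{|} h$ is at least #80, so the subtraction never borrows across bytes. The last test shows the caution above in action: with $x_1 = $ #01 and $x_0 = 0$, (90) flags both bytes, while (91) flags only the zero byte.

```python
M64 = (1 << 64) - 1
H = 0x8080808080808080
L = 0x0101010101010101


def zero_byte_mycroft(x: int) -> int:
    """(90): nonzero iff some byte of x is 0; its lowest set bit is 8j+7 for the rightmost zero byte x_j."""
    return H & ((x - L) & M64) & ~x


def zero_byte_exact(x: int) -> int:
    """(91): byte j of the result is #80 exactly when x_j = 0."""
    return H & ~(x | ((x | H) - L))
```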
Incidentally, the single MMIX instruction 'BDIF t,l,x' solves the zero-byte problem immediately by setting each byte $t_j$ of $t$ to $[x_j = 0]$, because $1 \mathbin{\dot{-}} x = [x = 0]$. But we are primarily interested here in fairly universal techniques that don't rely on exotic hardware; MMIX's special features will be discussed later.
Now that we know a fast way to find the first 0, we can use the same ideas to search for any desired byte value. For example, to test if any byte of $x$ is the newline character (#a), we simply look for a zero byte in $x \oplus \#0a0a0a0a0a0a0a0a$.
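The reduction just described can be sketched in Python as follows (the helper name `has_byte` is hypothetical, and $l$, $h$ are assumed to be the byte constants used above): XORing with $c \cdot l$ turns every byte equal to $c$ into a zero byte, and Mycroft's test (90) then reports whether any zero byte exists.

```python
MASK = (1 << 64) - 1
L = 0x0101010101010101        # assumed l
H = 0x8080808080808080        # assumed h


def has_byte(x, c):
    """True if some byte of the 64-bit word x equals c, where 0 <= c < 256."""
    y = x ^ (c * L)                               # bytes equal to c become zero bytes
    return (H & ((y - L) & MASK) & ~y) != 0       # Mycroft's test (90)


line = int.from_bytes(b"abc\ndefg", "little")
print(has_byte(line, 0x0a))                                    # True
print(has_byte(int.from_bytes(b"abcdefgh", "little"), 0x0a))   # False
```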
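And these techniques also open up many other doors. Suppose, for instance, that we want to compute $z = (z_7 \ldots z_1 z_0)_{256}$ from $x$ and $y$, where $z_j = x_j$ when $x_j = y_j$ but $z_j = \text{'*'}$ when $x_j \ne y_j$. (Thus if $x = \text{"beaching"}$ and $y = \text{"belching"}$, we're supposed to set $z \gets \text{"be*ching"}$.) It's easy:

$$\begin{aligned}
t &\gets h \mathbin{\&} \bigl((x \oplus y) \mid (((x \oplus y) \mid h) - l)\bigr);\\
m &\gets (t \ll 1) - (t \gg 7);\\
z &\gets x \oplus \bigl((x \oplus \text{"********"}) \mathbin{\&} m\bigr).
\end{aligned} \tag{92}$$

The first step is (91) without its final complementation, so it flags the high-order bit of each byte where $x_j \ne y_j$. The next step creates a mask that highlights those bytes; the mask is #00 if $x_j = y_j$ and #ff otherwise. And the last step, which could also be written $z \gets (x \mathbin{\&} \bar{m}) \mid (\text{"********"} \mathbin{\&} m)$, sets $z_j \gets x_j$ or $z_j \gets \text{'*'}$, depending on the mask.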
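Formula (92) can be tried out with a short Python sketch (an illustration, not the book's code). It packs eight-character strings big-endian, so that the first character is $z_7$, simulates 64-bit arithmetic by masking, and assumes the usual byte constants $l$ and $h$; `star_differences` is a made-up name.

```python
MASK = (1 << 64) - 1
L = 0x0101010101010101        # assumed l
H = 0x8080808080808080        # assumed h
STARS = int.from_bytes(b"********", "big")


def star_differences(x, y):
    """Formula (92): z_j = x_j where x_j == y_j, and z_j = '*' elsewhere."""
    d = x ^ y
    t = H & (d | ((d | H) - L))           # #80 in each byte where d_j != 0
    m = ((t << 1) - (t >> 7)) & MASK      # #ff in those bytes, #00 elsewhere
    return x ^ ((x ^ STARS) & m)


x = int.from_bytes(b"beaching", "big")    # z_7 is the leftmost character
y = int.from_bytes(b"belching", "big")
print(star_differences(x, y).to_bytes(8, "big"))   # b'be*ching'
```

Reducing `(t << 1) - (t >> 7)` modulo $2^{64}$ handles byte 7 correctly: its flag is shifted out of the word, leaving $-2^{56} \bmod 2^{64} = \#ff00000000000000$.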
7.1.3 BITWISE TRICKS AND TECHNIQUES 21
Operations (90) and (91) were originally designed as tests for bytes that are zero; but a closer look reveals that we can more wisely regard them as tests for bytes that are less than 1. Indeed, if we replace $l$ by $c\,l = (cc\ldots c)_{256}$ in either formula, where $c$ is any positive constant $\le 128$, we can use (90) or (91) to see if $x$ contains any bytes that are less than $c$. Furthermore the comparison values $c$ need not be the same in every byte position; and with a bit more work we can also do bytewise comparison in the cases where $c > 128$. Here's an 8-step formula that sets $t_j \gets 128[x_j < y_j]$ for each byte position $j$ in the test word $t$:

$$t \gets h \mathbin{\&} \overline{\langle x \bar{y} z \rangle}, \quad\text{where } z = (x \mid h) - (y \mathbin{\&} \bar{h}). \tag{93}$$

(See exercise 96.) The median operation in this general formula can often be simplified; for example, (93) reduces to (91) when $y = l$, because $\langle x 1 z \rangle = x \mid z$.
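Here is a Python sketch of (93), with the bitwise median $\langle abc \rangle = (a \mathbin{\&} b) \mid (a \mathbin{\&} c) \mid (b \mathbin{\&} c)$ written out. As before, 64-bit words are simulated by masking, $h$ is assumed to be $\#8080808080808080$, and the function names are hypothetical. In each byte, the high bit of $z$ records whether the low seven bits of $x_j$ are at least those of $y_j$, and the median combines that with the high bits of $x_j$ and $y_j$.

```python
MASK = (1 << 64) - 1
H = 0x8080808080808080        # assumed h: #80 in every byte


def median(a, b, c):
    """Bitwise median <abc> = (a & b) | (a & c) | (b & c)."""
    return (a & b) | (a & c) | (b & c)


def bytes_less_than(x, y):
    """Formula (93): byte j of the result is #80 iff x_j < y_j (unsigned bytes)."""
    not_h = ~H & MASK
    z = (x | H) - (y & not_h)     # per byte: #80 + (x_j mod #80) - (y_j mod #80)
    return H & ~median(x, ~y & MASK, z)


print(hex(bytes_less_than(0x00000000000000FF, 0x0000000000000100)))  # 0x8000
```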
Once we've found a nonzero $t$ in (90) or (91) or (93), we might want to compute $\rho t$ or $\lambda t$ in order to discover the index $j$ of the rightmost or leftmost byte that has been flagged. The problem of calculating $\rho$ or $\lambda$ is now simpler than before, since $t$ can take on only 256 different values. Indeed, the operation

$$j \gets \mathit{table}\bigl[((a \cdot t) \bmod 2^{64}) \gg 56\bigr], \quad\text{where } a = \frac{2^{56} - 1}{2^7 - 1}, \tag{94}$$

now suffices to compute $j$, given an appropriate 256-byte table. And the multiplication here can often be performed faster by doing three shift-and-add operations, "$t \gets t + (t \ll 7)$, $t \gets t + (t \ll 14)$, $t \gets t + (t \ll 28)$," instead.
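The following Python sketch of (94) is illustrative only; the table names `RIGHTMOST` and `LEFTMOST` are hypothetical. Multiplying by $a = 1 + 2^7 + 2^{14} + \cdots + 2^{49}$ sends the flag in bit $8j + 7$ to bit $56 + j$, and no two partial products ever overlap, so the top byte is exactly the 8-bit pattern of flagged bytes. The shift-and-add version forms the same product, because $a = (1 + 2^7)(1 + 2^{14})(1 + 2^{28})$.

```python
MASK = (1 << 64) - 1
A = ((1 << 56) - 1) // ((1 << 7) - 1)   # a = (2^56 - 1)/(2^7 - 1)


def gather(t):
    """Move the #80 flags of bytes 0..7 of t into bits 0..7 of an 8-bit index."""
    return ((A * t) & MASK) >> 56


def gather_shift_add(t):
    """The same product a*t, formed by three shift-and-add steps."""
    t = (t + (t << 7)) & MASK
    t = (t + (t << 14)) & MASK
    t = (t + (t << 28)) & MASK
    return t >> 56


# Hypothetical 256-entry tables giving the rightmost / leftmost flagged byte.
RIGHTMOST = [(k & -k).bit_length() - 1 if k else None for k in range(256)]
LEFTMOST = [k.bit_length() - 1 if k else None for k in range(256)]

t = 0x8000800000000080                  # bytes 7, 5, and 0 flagged
print(RIGHTMOST[gather(t)], LEFTMOST[gather_shift_add(t)])   # 0 7
```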
Broadword computing. We've now seen more than a dozen ways in which a computer's bitwise operations can produce astonishing results at high speed, and the exercises below contain many more such surprises.

Elwyn Berlekamp has remarked that computer chips containing $N$ flip-flops continue to be built with ever larger values of $N$, yet in practice only $O(\log N)$ of those components are flipping or flopping at any given moment. The surprising effectiveness of bitwise operations suggests that computers of the future might make use of this untapped potential by having enhanced memory units that are able to do efficient $n$-bit computations for fairly large values of $n$. To prepare for that day, we ought to have a good name for the concept of manipulating "wide words." Lyle Ramshaw has suggested the pleasant term broadword, so that we can speak of $n$-bit quantities as broadwords of width $n$.

Many of the methods we've discussed are 2-adic, in the sense that they work correctly with binary numbers that have arbitrary (even infinite) precision. For example, the operation $x \mathbin{\&} {-x}$ always extracts $2^{\rho x}$, the least significant 1 bit of any nonzero 2-adic integer $x$. But other methods have an inherently broadword nature, such as the methods that use $O(d)$ steps to perform sideways addition or bit permutation of $2^d$-bit words. Broadword computing is the art of dealing with $n$-bit words, when $n$ is a parameter that is not extremely small.

Some broadword algorithms are of theoretical interest only, because they are efficient only in an asymptotic sense when $n$ exceeds the size of the universe. But others are eminently practical even when $n = 64$. And in general, a broadword mindset often suggests good techniques.
22 COMBINATORIAL ALGORITHMS (F1A) 7.1.3
One fascinating-but-impractical fact about broadword operations is the discovery by M. L. Fredman and D. E. Willard that $O(1)$ broadword steps suffice to evaluate the function $\lambda x = \lfloor \lg x \rfloor$ for any nonzero $n$-bit number $x$, no matter how big $n$ is. Here is their remarkable scheme, when $n = g^2$ and $g$ is a power of 2:

$$\begin{aligned}
t_1 &\gets h \mathbin{\&} (x \mid ((x \mid h) - l)), && \text{where } h = 2^{g-1} l \text{ and } l = (2^n - 1)/(2^g - 1);\\
y &\gets \bigl(((a \cdot t_1) \bmod 2^n) \gg (n - g)\bigr) \cdot l, && \text{where } a = (2^{n-g} - 1)/(2^{g-1} - 1);\\
t_2 &\gets h \mathbin{\&} (y \mid ((y \mid h) - b)), && \text{where } b = (2^{n+g} - 1)/(2^{g+1} - 1);\\
m &\gets (t_2 \ll 1) - (t_2 \gg (g - 1)), \quad m \gets m \oplus (m \gg g);\\
z &\gets \bigl(((l \cdot (x \mathbin{\&} m)) \bmod 2^n) \gg (n - g)\bigr) \cdot l;\\
t_3 &\gets h \mathbin{\&} (z \mid ((z \mid h) - b));\\
\lambda &\gets \bigl((l \cdot ((t_2 \gg (2g - \lg g - 1)) + (t_3 \gg (2g - 1)))) \bmod 2^n\bigr) \gg (n - g).
\end{aligned} \tag{95}$$

(See exercise 106.) The method fails to be practical because five of these 29 steps are multiplications, so they aren't really "bitwise" operations. In fact, we'll prove later that multiplication by a constant requires at least $\Omega(\log n)$ bitwise steps.
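To make (95) concrete, here is a Python sketch for the single case $n = 64$, $g = 8$ (so the "fields" are bytes); it simulates $n$-bit arithmetic by masking, and the names `flag_fields` and `lam_fredman_willard` are hypothetical. Step by step, it flags the nonzero bytes of $x$, finds the leftmost such byte $K$, isolates it, finds the leading bit of that byte, and finally adds $g \cdot K$ to $\lambda$ of the byte, using one more multiplication to sum the fields.

```python
MASK = (1 << 64) - 1
N, G = 64, 8                                         # n = g^2, g a power of 2
L = MASK // ((1 << G) - 1)                           # l = (2^n - 1)/(2^g - 1)
H = L << (G - 1)                                     # h = 2^(g-1) l
A = ((1 << (N - G)) - 1) // ((1 << (G - 1)) - 1)     # a = (2^(n-g) - 1)/(2^(g-1) - 1)
B = ((1 << (N + G)) - 1) // ((1 << (G + 1)) - 1)     # b = (2^(n+g) - 1)/(2^(g+1) - 1)
LG_G = G.bit_length() - 1                            # lg g


def flag_fields(w, c):
    """h & (w | ((w | h) - c)): flag each g-bit field of w that is >= the same
    field of c (valid here because every field of c is at most 2^(g-1))."""
    return H & (w | ((w | H) - c))


def lam_fredman_willard(x):
    """lambda(x) = floor(lg x) for 0 < x < 2^64, following the steps of (95)."""
    if not 0 < x <= MASK:
        raise ValueError("need 0 < x < 2^64")
    t1 = flag_fields(x, L)                                 # nonzero fields of x
    y = ((((A * t1) & MASK) >> (N - G)) * L) & MASK        # their pattern, in every field
    t2 = flag_fields(y, B)                                 # field k flagged iff pattern >= 2^k
    m = ((t2 << 1) - (t2 >> (G - 1))) & MASK               # all-ones in the flagged fields
    m ^= m >> G                                            # keep only field K, the leftmost nonzero one
    z = ((((L * (x & m)) & MASK) >> (N - G)) * L) & MASK   # field K of x, in every field
    t3 = flag_fields(z, B)
    s = (t2 >> (2 * G - LG_G - 1)) + (t3 >> (2 * G - 1))  # fields worth g and worth 1
    return ((L * s) & MASK) >> (N - G)                     # sum of all fields


print(lam_fredman_willard(1 << 42))   # 42
```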
A multiplication-free way to find $\lambda x$, with only $O(\log\log n)$ bitwise broadword operations, was discovered in 1997 by Gerth Brodal, whose method is even more remarkable than (95). It is based on a formula analogous to (49),

$$\lambda x = [\lambda x = \lambda(x \mathbin{\&} \bar{\mu}_0)] + 2[\lambda x = \lambda(x \mathbin{\&} \bar{\mu}_1)] + 4[\lambda x = \lambda(x \mathbin{\&} \bar{\mu}_2)] + \cdots, \tag{96}$$

and the fact that the relation $\lambda x = \lambda y$ is easily tested (see (58)):

Algorithm B (Binary logarithm). This algorithm uses $n$-bit operations to compute $\lambda x = \lfloor \lg x \rfloor$, assuming that $0 < x < 2^n$ and $n = d \cdot 2^d$.

B1. [Scale down.] Set $\lambda \gets 0$. Then set $\lambda \gets \lambda + 2^k$ and $x \gets x \gg 2^k$ if $x \ge 2^{2^k}$, for $\lceil \lg n \rceil > k \ge d$.
B2. [Replicate.] (At this point $0 < x < 2^{2^d}$; the remaining task is to increase $\lambda$ by $\lfloor \lg x \rfloor$. We will replace $x$ by $d$ copies of itself, in $2^d$-bit fields.) Set $x \gets x \mid (x \ll 2^{d+k})$ for $0 \le k < \lceil \lg d \rceil$.
B3. [Change leading bits.] Set $y \gets x \mathbin{\&} (\bar{\mu}_{d,d-1} \ldots \bar{\mu}_{d,1} \bar{\mu}_{d,0})_{2^{2^d}}$. (See (48).)
B4. [Compare all fields.] Set $t \gets h \mathbin{\&} (y \mid ((y \mid h) - (x \oplus y)))$, where $h = (2^{2^d - 1} \ldots 2^{2^d - 1} 2^{2^d - 1})_{2^{2^d}}$.
B5. [Compress bits.] Set $t \gets (t + (t \ll (2^{d+k} - 2^k))) \bmod 2^n$ for $0 \le k < \lceil \lg d \rceil$.
B6. [Finish.] Finally, set $\lambda \gets \lambda + (t \gg (n - d))$.

This algorithm is actually competitive with (56) when $n = 64$ (see exercise 107).
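The following Python sketch walks through Algorithm B for $n = 64$, so $d = 4$ and the fields have $2^d = 16$ bits. It is an illustration under stated assumptions: $\bar{\mu}_{d,k}$ is taken to be the $2^d$-bit mask whose bit $i$ is 1 exactly when bit $k$ of $i$ is 1 (consistent with (96)), and the names `MUBAR` and `lam_brodal` are hypothetical. Field $k$ of step B4 is flagged exactly when the leading 1 of $x$ survives the mask $\bar{\mu}_{d,k}$, that is, when bit $k$ of $\lambda x$ is 1; step B5 then gathers those $d$ flags into the top $d$ bits.

```python
MASK = (1 << 64) - 1
N, D = 64, 4                        # n = d * 2^d
F = 1 << D                          # field width 2^d = 16
CEIL_LG_N = (N - 1).bit_length()    # ceil(lg n) = 6
CEIL_LG_D = (D - 1).bit_length()    # ceil(lg d) = 2
# Field k (0 <= k < d) holds mu-bar_{d,k}: bit i of the field is 1 iff bit k of i is 1.
MUBAR = sum(sum(1 << i for i in range(F) if (i >> k) & 1) << (F * k) for k in range(D))
H = sum(1 << (F * k + F - 1) for k in range(D))      # high bit of every field


def lam_brodal(x):
    """lambda(x) for 0 < x < 2^64 by steps B1-B6 (no multiplications)."""
    if not 0 < x <= MASK:
        raise ValueError("need 0 < x < 2^64")
    lam = 0
    for k in range(CEIL_LG_N - 1, D - 1, -1):        # B1: scale down
        if x >= 1 << (1 << k):
            lam += 1 << k
            x >>= 1 << k
    for k in range(CEIL_LG_D):                       # B2: replicate into d fields
        x = (x | (x << (1 << (D + k)))) & MASK
    y = x & MUBAR                                    # B3: change leading bits
    t = H & (y | ((y | H) - (x ^ y)))                # B4: compare all fields
    for k in range(CEIL_LG_D):                       # B5: compress bits
        t = (t + (t << ((1 << (D + k)) - (1 << k)))) & MASK
    return lam + (t >> (N - D))                      # B6: finish


print(lam_brodal(1 << 42))   # 42
```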
Another surprisingly efficient broadword algorithm was discovered in 2006 by M. S. Paterson and the author, who considered the problem of identifying all occurrences of the pattern $01^r$ in a given $n$-bit binary string. This problem, which is related to storage allocation, is equivalent to computing

$$q = \bar{x} \mathbin{\&} (x \ll 1) \mathbin{\&} (x \ll 2) \mathbin{\&} (x \ll 3) \mathbin{\&} \cdots \mathbin{\&} (x \ll r) \tag{97}$$

when $x = (x_{n-1} \ldots x_1 x_0)_2$ is given. For example, when $n = 16$, $r = 3$, and $x = (1110111101100111)_2$, we have $q = (0001000000001000)_2$. One might expect intuitively that $\Omega(\log r)$ bitwise operations would be needed. But in fact the following 20-step computation does the job for all $n > r > 0$: Let $s = \lceil r/2 \rceil$, $l = \sum_{k \ge 0} 2^{ks} \bmod 2^n$, $h = (2^{s-1} l) \bmod 2^n$, and $a = \sum_{k \ge 0} (-1)^{k+1} 2^{2ks} \bmod 2^n$.

$$\begin{aligned}
y &\gets h \mathbin{\&} x \mathbin{\&} ((x \mathbin{\&} \bar{h}) + l);\\
t &\gets (x + y) \mathbin{\&} \bar{x} \mathbin{\&} {-2^r};\\
u &\gets t \mathbin{\&} a, \quad v \gets t \mathbin{\&} \bar{a};\\
m &\gets (u - (u \gg r)) \mid (v - (v \gg r));\\
q &\gets t \mathbin{\&} \bigl((x \mathbin{\&} m) + ((t \gg r) \mathbin{\&} \overline{(m \ll 1)})\bigr).
\end{aligned} \tag{98}$$

Exercise 111 explains why these machinations are valid. The method has little or no practical value; there's an easy way to evaluate (97) in $2\lceil \lg r \rceil + 2$ steps, so (98) is not advantageous until $r > 512$. But (98) is another indication of the unexpected power of broadword methods.
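The sketch below transcribes both the direct formula (97) and the 20-step computation (98) into Python, writing $\bar{x}$ as `~x` and reducing to $n$ bits with a mask; the function names are hypothetical. Comparing the two on small words is an easy way to check the transcription. The comments record the role of each step: $y$ marks the tops of all-ones $s$-bit fields, $t$ marks the 0s just above runs of 1s that contain such a field, the constant $a$ (alternating $2s$-bit blocks) splits those candidates into two classes so that their windows cannot overlap within a class, and the final addition carries into a candidate only if its whole $r$-bit window is filled with 1s.

```python
def find_01r_naive(x, r, n):
    """Formula (97): q = ~x & (x << 1) & ... & (x << r), reduced to n bits."""
    q = ~x
    for k in range(1, r + 1):
        q &= x << k
    return q & ((1 << n) - 1)


def find_01r(x, r, n):
    """The 20-step computation (98), for 0 < r < n."""
    mask = (1 << n) - 1
    s = (r + 1) // 2                                   # s = ceil(r/2)
    l = sum(1 << i for i in range(0, n, s))            # bottom bit of each s-bit field
    h = (l << (s - 1)) & mask                          # top bit of each s-bit field
    a = sum((-1) ** (k + 1) * (1 << (2 * k * s))       # alternating 2s-bit blocks
            for k in range((n + 2 * s - 1) // (2 * s))) & mask
    y = h & x & ((x & ~h) + l)                         # tops of all-ones s-bit fields
    t = (x + y) & ~x & -(1 << r) & mask                # candidate 0s above such runs
    u, v = t & a, t & ~a                               # split candidates by block parity
    m = (u - (u >> r)) | (v - (v >> r))                # r-bit window below each candidate
    return t & ((x & m) + ((t >> r) & ~(m << 1)))      # keep candidates with all-1 windows


print(bin(find_01r(0b1110111101100111, 3, 16)))       # 0b1000000001000
```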
*Lower bounds. Indeed, the existence of so many tricks and techniques makes it natural to wonder whether we've only been scratching the surface. Are there many more incredibly fast methods, still waiting to be discovered? A few theoretical results are known by which we can derive certain limitations on what is possible, although such studies are still in their infancy.

Let's say that a 2-adic chain is a sequence $(x_0, x_1, \ldots, x_r)$ of 2-adic integers in which each element $x_i$ for $i > 0$ is obtained from its predecessors via bitwise manipulation. More precisely, we want the steps of the chain to be defined by binary operations

$$x_i = x_{j(i)} \circ_i x_{k(i)} \quad\text{or}\quad c_i \circ_i x_{k(i)} \quad\text{or}\quad x_{j(i)} \circ_i c_i, \tag{99}$$

where each $\circ_i$ is one of the operators $\{+, -, \&, \mid, \oplus, \equiv, \subset, \supset, \bar{\subset}, \bar{\supset}, \bar{\wedge}, \bar{\vee}, \ll, \gg\}$ and each $c_i$ is a constant. Furthermore, when the operator $\circ_i$ is a left shift or right shift, the amount of shift must be a positive integer constant; operations such as $x_{j(i)} \gg x_{k(i)}$ or $c_i \gg x_{k(i)}$ are not permitted. (Without the latter restriction we couldn't derive meaningful lower bounds, because every 0–1 valued function of a nonnegative integer $x$ would be computable in two steps as "$(c \gg x) \mathbin{\&} 1$" for some constant $c$.)

Similarly, a broadword chain of width $n$, also called an $n$-bit broadword chain, is a sequence $(x_0, x_1, \ldots, x_r)$ of $n$-bit numbers subject to essentially the same restrictions, where $n$ is a parameter and all operations are performed modulo $2^n$. Broadword chains behave like 2-adic chains in many ways, but subtle differences can arise because of the information loss that occurs at the left of $n$-bit computations (see exercise 113).

Both types of chains compute a function $f(x) = x_r$ when we start them out with a given value $x = x_0$. Exercise 114 shows that an $mn$-bit broadword chain is able to do $m$ essentially simultaneous evaluations of any function that is computable with an $n$-bit chain. Our goal is to study the shortest chains that are able to evaluate a given function $f$.
Any 2-adic or broadword chain $(x_0, x_1, \ldots, x_r)$ has a sequence of "shift sets" $(S_0, S_1, \ldots, S_r)$ and "bounds" $(B_0, B_1, \ldots, B_r)$, defined as follows: Start with $S_0 = \{0\}$ and $B_0 = 1$; then for $i \ge 1$, let

$$S_i = \begin{cases} S_{j(i)} \cup S_{k(i)},\\ S_{k(i)},\\ S_{j(i)},\\ S_{j(i)} + c_i,\\ S_{j(i)} - c_i; \end{cases} \qquad B_i = \begin{cases} M_i B_{j(i)} B_{k(i)}, & \text{if } x_i = x_{j(i)} \circ_i x_{k(i)};\\ M_i B_{k(i)}, & \text{if } x_i = c_i \circ_i x_{k(i)};\\ M_i B_{j(i)}, & \text{if } x_i = x_{j(i)} \circ_i c_i;\\ B_{j(i)}, & \text{if } x_i = x_{j(i)} \gg c_i;\\ B_{j(i)}, & \text{if } x_i = x_{j(i)} \ll c_i; \end{cases} \tag{100}$$

where $M_i = 2$ if $\circ_i \in \{+, -\}$ and $M_i = 1$ otherwise, and where $S + c$ denotes $\{s + c \mid s \in S\}$; the first three cases assume that $\circ_i \notin \{\ll, \gg\}$. For example, consider the following 7-step chain:

$$\begin{array}{lll}
 & S_i & B_i\\
x_0 = x & \{0\} & 1\\
x_1 = x_0 \mathbin{\&} {-2} & \{0\} & 1\\
x_2 = x_1 + 2 & \{0\} & 2\\
x_3 = x_2 \gg 1 & \{1\} & 2\\
x_4 = x_2 + x_3 & \{0, 1\} & 8\\
x_5 = x_4 \gg 4 & \{4, 5\} & 8\\
x_6 = x_4 + x_5 & \{0, 1, 4, 5\} & 128\\
x_7 = x_6 \gg 4 & \{4, 5, 8, 9\} & 128
\end{array} \tag{101}$$

(We encountered this chain in exercise 4.4–9, which proved that these operations will yield $x_7 = \lfloor x/10 \rfloor$ for $0 \le x < 160$ when performed with 8-bit arithmetic.)
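The chain (101) is easy to check exhaustively. In this Python sketch (the name `div10_chain` is hypothetical) every step is reduced to 8 bits, as the text requires. Writing $w = \lfloor x/2 \rfloor + 1$, the chain computes $x_4 = 3w$ and $x_7 = \lfloor (3w + \lfloor 3w/16 \rfloor)/16 \rfloor$, and that equals $\lfloor x/10 \rfloor$ as long as $3w + \lfloor 3w/16 \rfloor$ fits in a byte.

```python
def div10_chain(x):
    """The 7-step chain (101), performed with 8-bit arithmetic."""
    byte = 0xFF
    x1 = x & (-2 & byte)       # x0 & -2: clear the low bit
    x2 = (x1 + 2) & byte
    x3 = x2 >> 1
    x4 = (x2 + x3) & byte
    x5 = x4 >> 4
    x6 = (x4 + x5) & byte
    x7 = x6 >> 4
    return x7


print([div10_chain(x) for x in (9, 10, 99, 159)])   # [0, 1, 9, 15]
```

At $x = 160$ the sum $x_4 + x_5$ overflows 8 bits, which is why the range stops at 159.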
To begin a theory of lower bounds, let's notice first that the high-order bits of $x = x_0$ cannot influence any low-order bits unless we shift them to the right.

Lemma A. Given a 2-adic or broadword chain, let the binary representation of $x_i$ be $(\ldots x_{i2} x_{i1} x_{i0})_2$. Then bit $x_{ip}$ can depend on bit $x_{0q}$ only if $q \le p + \max S_i$.

Proof. By induction on $i$ we can in fact show that, if $B_i = 1$, bit $x_{ip}$ can depend on bit $x_{0q}$ only if $q - p \in S_i$. Addition and subtraction, which force $B_i > 1$, allow any particular bit of their operands to affect all bits that lie to the left in the sum or difference, but not those that lie to the right.

Corollary I. The function $x \dotminus 1$ cannot be computed by a 2-adic chain, nor can any function for which at least one bit of $f(x)$ depends on an unbounded number of bits of $x$.
Corollary W. An $n$-bit function $f(x)$ can be computed by an $n$-bit broadword chain without shifts if and only if $x \equiv y \pmod{2^p}$ implies $f(x) \equiv f(y) \pmod{2^p}$ for $0 \le p < n$.

Proof. If there are no shifts we have $S_i = \{0\}$ for all $i$. Thus bit $x_{rp}$ cannot depend on bit $x_{0q}$ unless $q \le p$. In other words we must have $x_r \equiv y_r \pmod{2^p}$ whenever $x_0 \equiv y_0 \pmod{2^p}$.

Conversely, all such functions are achievable by a sufficiently long chain. Exercise 119 gives shift-free $n$-bit chains for the functions

$$f_{py}(x) = 2^p\,[x \bmod 2^{p+1} = y], \quad\text{when } 0 \le p < n \text{ and } 0 \le y < 2^{p+1}, \tag{102}$$

from which all the relevant functions arise by addition. [H. S. Warren, Jr., generalized this result to functions of $m$ variables in CACM 20 (1977), 439–441.]

Shift sets $S_i$ and bounds $B_i$ are important chiefly because of a fundamental lemma that is our principal tool for proving lower bounds:
Lemma B. Let $X_{pqr} = \{\, x_r \mathbin{\&} \lfloor 2^p - 2^q \rfloor \mid x_0 \in V_{pqr} \,\}$ in an $n$-bit broadword chain, where

$$V_{pqr} = \{\, x \mid x \mathbin{\&} \lfloor 2^{p+s} - 2^{q+s} \rfloor = 0 \text{ for all } s \in S_r \,\} \tag{103}$$

and $p > q$. Then $|X_{pqr}| \le B_r$. (Here $p$ and $q$ are integers, possibly negative.)

This lemma states that at most $B_r$ different bit patterns $x_{r(p-1)} \ldots x_{rq}$ can occur within $f(x)$, when certain intervals of bits in $x$ are constrained to be zero.

Proof. The result certainly holds when $r = 0$. Otherwise if, for example, $x_r = x_j + x_k$, we know by induction that $|X_{pqj}| \le B_j$ and $|X_{pqk}| \le B_k$. Furthermore $V_{pqr} = V_{pqj} \cap V_{pqk}$, since $S_r = S_j \cup S_k$. Thus at most $B_j B_k$ possibilities for $(x_j + x_k) \mathbin{\&} \lfloor 2^p - 2^q \rfloor$ arise when there's no carry into position $q$, and at most $B_j B_k$ when there is a carry, making a grand total of at most $B_r = 2 B_j B_k$ possibilities altogether. Exercise 122 considers the other cases.
We now can prove that the ruler function needs $\Omega(\log\log n)$ steps.

Theorem R. If $n = d \cdot 2^d$, every $n$-bit broadword chain that computes $\rho x$ for $0 < x < 2^n$ has more than $\lg d$ steps that are not shifts.

Proof. If there are $l$ nonshift steps, we have $|S_r| \le 2^l$ and $B_r \le 2^{2^l - 1}$. Apply Lemma B with $p = d$ and $q = 0$, and suppose $|X_{d0r}| = 2^d - t$. Then there are $t$ values of $k < 2^d$ such that

$$\{2^k, 2^{k + 2^d}, 2^{k + 2 \cdot 2^d}, \ldots, 2^{k + (d-1)2^d}\} \cap V_{d0r} = \emptyset.$$

But $V_{d0r}$ excludes at most $2^l d$ of the $n$ possible powers of 2; so $t \le 2^l$.

If $l \le \lg d$, Lemma B tells us that $2^d - t \le B_r \le 2^{d-1}$; hence $2^{d-1} \le t \le 2^l \le d$. But this is impossible unless $d \le 2$, when the theorem clearly holds.

The same proof works also for the binary logarithm function:

Corollary L. If $n = d \cdot 2^d > 2$, every $n$-bit broadword chain that computes $\lambda x$ for $0 < x < 2^n$ has more than $\lg d$ steps that are not shifts.

By using Lemma B with $q > 0$ we can derive the stronger lower bound $\Omega(\log n)$ for bit reversal, and hence for bit permutation in general.
Theorem P. If $2 \le g \le n$, every $n$-bit broadword chain that computes the $g$-bit reversal $x^R$ for $0 \le x < 2^g$ has at least $\lfloor \frac{1}{3} \lg g \rfloor$ steps that are not shifts.

Proof. Assume as above that there are $l$ nonshifts. Let $h = \lfloor \sqrt[3]{g} \rfloor$ and suppose that $l < \lfloor \lg(h + 1) \rfloor$. Then $S_r$ is a set of at most $2^l \le \frac{1}{2}(h + 1)$ shift amounts $s$. We shall apply Lemma B with $p = q + h$, where $p \le g$ and $q \ge 0$, thus in $g - h + 1$ cases altogether. The key observation is that $x^R \mathbin{\&} \lfloor 2^p - 2^q \rfloor$ is independent of $x \mathbin{\&} \lfloor 2^{p+s} - 2^{q+s} \rfloor$ whenever there are no indices $j$ and $k$ such that $0 \le j, k < h$ and $g - 1 - q - j = q + s + k$. The number of "bad" choices of $q$ for which such indices exist is at most $\frac{1}{2}(h + 1)h^2 \le g - h$; therefore at least one "good" choice of $q$ yields $|X_{pqr}| = 2^h$. But then Lemma B leads to a contradiction, because we obviously cannot have $2^h \le B_r \le 2^{(h-1)/2}$.

Corollary M. Multiplication by certain constants, modulo $2^n$, requires $\Omega(\log n)$ steps in an $n$-bit broadword chain.

Proof. In Hack 167 of the classic memorandum HAKMEM (M.I.T. A.I. Laboratory, 1972), Richard Schroeppel observed that the operations

$$t \gets ((a x) \bmod 2^n) \mathbin{\&} b, \qquad y \gets ((c\,t) \bmod 2^n) \gg (n - g) \tag{104}$$

compute $y = x^R$ whenever $n = g^2$ and $0 \le x < 2^g$, using the constants $a = (2^{n+g} - 1)/(2^{g+1} - 1)$, $b = 2^{g-1}(2^n - 1)/(2^g - 1)$, and $c = (2^{n-g} - 1)/(2^{g-1} - 1)$. (See exercise 123.) Since (104) needs only one AND and one shift besides its two multiplications, Theorem P implies that at least one of those multiplications requires $\Omega(\log g) = \Omega(\log n)$ nonshift steps.
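Schroeppel's reversal (104) can be checked directly with a small Python sketch; `schroeppel_reverse` is a hypothetical name, and $n = g^2$ is derived from $g$. The first multiplication lays down $g$ copies of $x$ at spacing $g + 1$, so that the top bit of field $j$ picks up bit $g - 1 - j$ of $x$; masking with $b$ keeps just those bits, and the second multiplication (the same gathering trick as in (94)) collects them in the top field.

```python
def schroeppel_reverse(x, g):
    """Formula (104): the g-bit reversal of x, for 0 <= x < 2^g and g >= 2."""
    n = g * g
    mask = (1 << n) - 1
    a = ((1 << (n + g)) - 1) // ((1 << (g + 1)) - 1)
    b = (1 << (g - 1)) * (mask // ((1 << g) - 1))
    c = ((1 << (n - g)) - 1) // ((1 << (g - 1)) - 1)
    t = ((a * x) & mask) & b
    return ((c * t) & mask) >> (n - g)


print(bin(schroeppel_reverse(0b1011, 4)))   # 0b1101
```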
At this point the reader might well be thinking, "Okay, I agree that broadword chains sometimes have to be asymptotically long. But programmers needn't be shackled by such chains; we can use other techniques, like conditional branches or references to precomputed tables, which go beyond those restrictions."

Right. And we're in luck, because broadword theory can also be extended to more general models of computation. Consider, for example, the following idealization of an abstract reduced-instruction-set computer, called a basic RAM: The machine has $n$-bit registers $r_1$, …, $r_l$, and $n$-bit memory words $\{M[0], \ldots, M[2^m - 1]\}$. It can perform the instructions

$$\begin{gathered}
r_i \gets r_j \pm r_k, \qquad r_i \gets r_j \circ r_k, \qquad r_i \gets r_j \gg r_k, \qquad r_i \gets c,\\
r_i \gets M[r_j \bmod 2^m], \qquad M[r_j \bmod 2^m] \gets r_i,
\end{gathered} \tag{105}$$

where $\circ$ is any bitwise Boolean operator, and where $r_k$ in the shift instruction is treated as a signed integer in two's complement notation. The machine is also able to branch if $r_i \le r_j$, treating $r_i$ and $r_j$ as unsigned integers. Its state is the entire contents of all registers and memory, together with a "program counter" that points to the current instruction. Its program begins in a designated state, which may include precomputed tables in memory, and with an $n$-bit input value $x$ in register $r_1$. This initial state is called $Q(x, 0)$, and $Q(x, t)$ denotes the state after $t$ instructions have been performed. When the machine stops, $r_1$ will contain some $n$-bit value $f(x)$. Given a function $f(x)$, we want to find a lower bound on the least $t$ such that $r_1$ is equal to $f(x)$ in state $Q(x, t)$, for $0 \le x < 2^n$.

Theorem R′. Let $\epsilon = 2^{-e}$. A basic $n$-bit RAM with memory parameter $m \le n^{1-\epsilon}$ requires at least $\lg\lg n - e$ steps to evaluate the ruler function $\rho x$, as $n \to \infty$.

Proof. Let $n = 2^{2^{e+f}}$, so that $m \le 2^{2^{e+f} - 2^f}$. Exercise 124 explains how an omniscient observer can construct a broadword chain from a certain class of inputs $x$, in such a way that each $x$ causes the RAM to take the same branches, use the same shift amounts, and refer to the same memory locations. Our earlier methods can then be used to show that this chain has length $\ge f$.

A skeptical reader may still object that Theorem R′ has no practical value, because $\lg\lg n$ never exceeds 6 in the real world. To this argument there is no rebuttal. But the following result is slightly more relevant:
Theorem P′. A basic $n$-bit RAM requires at least $\frac{1}{3} \lg g$ steps to compute the $g$-bit reversal $x^R$ for $0 \le x < 2^g$, if $g \le n$ and
max(m; 1 + lg n) < 2blg(hh++1)1 2 ;
$$h = \lfloor \sqrt[3]{g} \rfloor. \tag{106}$$
Proof. An argument like the proof of Theorem R′ appears in exercise 125.

Lemma B and Theorems R, P, R′, P′ and their corollaries are due to A. Brodnik, P. B. Miltersen, and J. I. Munro, Lecture Notes in Comp. Sci. 1272 (1997), 426–439, based on earlier work of Miltersen in Lecture Notes in Comp. Sci. 1099 (1996), 442–451.

Many unsolved questions remain (see exercises 126–130). For example, does sideways addition require $\Omega(\log n)$ steps in an $n$-bit broadword chain? Can the parity function $(\nu x) \bmod 2$, or the majority function $[\nu x > n/2]$, be computed broadwordwise in $O(\log\log n)$ steps, or even perhaps in constant time?
An application to directed graphs. Now let's use some of what we've learned, by implementing a simple algorithm. Given a digraph on a set of vertices $V$, we write $u \to v$ when there's an arc from $u$ to $v$. The reachability problem is to find all vertices that lie on oriented paths beginning in a specified set $Q \subseteq V$; in other words, we seek the set

$$R = \{\, v \mid u \xrightarrow{*} v \text{ for some } u \in Q \,\}, \tag{107}$$

where $u \xrightarrow{*} v$ means that there is a sequence of $t$ arcs

$$u = u_0 \to u_1 \to \cdots \to u_t = v, \quad\text{for some } t \ge 0. \tag{108}$$

This problem arises frequently in practice. For example, we encountered it in Section 2.3.5 when marking all elements of Lists that are not "garbage."

If the number of vertices is small, say $|V| \le 64$, we may want to approach the reachability problem in quite a different way than we did before, by working directly with subsets of vertices. Let

$$S[u] = \{\, v \mid u \to v \,\} \tag{109}$$

be the set of successors of vertex $u$, for all $u \in V$. Then the following algorithm is almost completely different from Algorithm 2.3.5E, yet it solves the same abstract problem:

Algorithm R (Reachability). Given a directed graph, represented by the successor sets $S[u]$ in (109), this algorithm computes the elements $R$ that are reachable from a given set $Q$.

R1. [Initialize.] Set $R \gets Q$ and $X \gets \emptyset$. (In the following steps, $X$ is the subset of vertices $u \in R$ for which we've looked at $S[u]$.)
R2. [Done?] If $X = R$, the algorithm terminates.
R3. [Examine another vertex.] Let $u$ be an element of $R \setminus X$. Set $X \gets X \cup \{u\}$, $R \gets R \cup S[u]$, and return to step R2.

The algorithm is correct because (i) every element placed into $R$ is reachable; (ii) every reachable element $u_j$ in (108) is present in $R$, by induction on $j$; and (iii) termination eventually occurs, because step R3 always increases $|X|$.
To implement Algorithm R we will assume that $V = \{0, 1, \ldots, n - 1\}$, with $n \le 64$. The set $X$ is conveniently represented by the integer $\sigma(X) = \sum \{\, 2^u \mid u \in X \,\}$, and the same convention works nicely for the other sets $Q$, $R$, and $S[u]$. Notice that the bits of $S[0]$, $S[1]$, …, $S[n-1]$ are essentially the adjacency matrix of the given digraph, as explained in Section 7, but in little-endian order: The "diagonal" elements, which tell us whether or not $u \in S[u]$, go from right to left. For example, if $n = 3$ and the arcs are $\{0 \to 0, 0 \to 1, 1 \to 0, 2 \to 0\}$, we have $S[0] = (011)_2$ and $S[1] = S[2] = (001)_2$, while the adjacency matrix is $\begin{pmatrix} 1 & 1 & 0\\ 1 & 0 & 0\\ 1 & 0 & 0 \end{pmatrix}$.

Step R3 allows us to choose any element of $R \setminus X$, so we use the ruler function $u \gets \rho(\sigma(R) - \sigma(X))$ to choose the smallest. The bitwise operations require no further trickery when we adapt the algorithm to MMIX:
Program R (Reachability). The input set $Q$ is given in register q, and each successor set $S[u]$ appears in octabyte $M_8[\mathrm{succ} + 8u]$. The output set $R$ will appear in register r; other registers s, t, tt, u, and x hold intermediate results.

    1H  SET   r,q       1       R1. Initialize. r ← σ(Q).
        SET   x,0       1       x ← σ(∅).
        JMP   2F        1       To R2.
    3H  SUBU  tt,t,1    |R|     R3. Examine another vertex. tt ← t − 1.
        SADD  u,tt,t    |R|     u ← ρt [see (46)].
        SLU   s,u,3     |R|     s ← 8u.
        LDOU  s,succ,s  |R|     s ← σ(S[u]).
        ANDN  tt,t,tt   |R|     tt ← t & ~tt = 2^u.
        OR    x,x,tt    |R|     X ← X ∪ {u}; that is, x ← x | 2^u, since x = σ(X).
        OR    r,r,s     |R|     R ← R ∪ S[u]; that is, r ← r | s, since r = σ(R).
    2H  SUBU  t,r,x     |R|+1   R2. Done? t ← r − x = σ(R \ X), since X ⊆ R.
        PBNZ  t,3B      |R|+1   To R3 if R ≠ X.
The total running time is $(\mu + 9\upsilon)|R| + 7\upsilon$. By contrast, exercise 131 implements Algorithm R with linked lists; the overall execution time then grows to $(3S + 4|R| - 2|Q| + 1)\mu + (5S + 12|R| - 5|Q| + 4)\upsilon$, where $S = \sum_{u \in R} |S[u]|$. (But of course that program is also able to handle graphs with millions of vertices.)

Exercise 132 presents another instructive algorithm where bitwise operations work nicely on not-too-large graphs.
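For readers without an MMIX simulator, here is a Python sketch of the same bit-set version of Algorithm R. It is an illustration only: `reachable` is a hypothetical name, `succ[u]` plays the role of $\sigma(S[u])$, and Python's unbounded integers remove the $n \le 64$ restriction (at a cost in speed). As in Program R, $t = \sigma(R) - \sigma(X)$ is $\sigma(R \setminus X)$ because $X \subseteq R$, and the ruler function of $t$ selects the smallest unexamined vertex.

```python
def reachable(succ, q):
    """Algorithm R with bit sets: succ[u] = sigma(S[u]); q = sigma(Q); returns sigma(R)."""
    r, x = q, 0                         # R1: R <- Q, X <- empty set
    while r != x:                       # R2: done when X = R
        t = r - x                       # sigma(R \ X), since X is a subset of R
        u = (t & -t).bit_length() - 1   # R3: u <- rho(t), the smallest element of R \ X
        x |= 1 << u                     # X <- X U {u}
        r |= succ[u]                    # R <- R U S[u]
    return r


# The 3-vertex example above: arcs 0->0, 0->1, 1->0, 2->0.
print(bin(reachable([0b011, 0b001, 0b001], 0b100)))   # 0b111
```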
Application to data representation. Computers are binary, but (alas?) the world isn't. We often must find a way to encode nonbinary data into 0s and 1s. One of the most common problems of this sort is to choose an efficient representation for items that can be in exactly three different states.

Suppose we know that $x \in \{a, b, c\}$, and we want to represent $x$ by two bits $x_l x_r$. We could, for example, map $a \mapsto 00$, $b \mapsto 01$, and $c \mapsto 10$. But there are many other possibilities; in fact, there are 4 choices for $a$, then 3 choices for $b$, and 2 for $c$, making 24 altogether. Some of these mappings might be much easier to deal with than others, depending on what we want to do with $x$.

Given two elements $x, y \in \{a, b, c\}$, we typically want to compute $z = x \circ y$, for some binary operation $\circ$. If $x = x_l x_r$ and $y = y_l y_r$ then $z = z_l z_r$, where

$$z_l = f_l(x_l, x_r, y_l, y_r) \quad\text{and}\quad z_r = f_r(x_l, x_r, y_l, y_r); \tag{110}$$

these Boolean functions $f_l$ and $f_r$ of four variables depend on $\circ$ and the chosen representation. We seek a representation that makes $f_l$ and $f_r$ easy to compute.

Suppose, for example, that $\{a, b, c\} = \{-1, 0, +1\}$ and that $\circ$ is multiplication. If we decide to use the natural mapping $x \mapsto x \bmod 3$, namely

$$0 \mapsto 00, \qquad +1 \mapsto 01, \qquad -1 \mapsto 10, \tag{111}$$

so that $x = x_r - x_l$, then the truth tables for $f_l$ and $f_r$ (listing the 16 cases $x_l x_r y_l y_r = 0000$, $0001$, …, $1111$ in order) are respectively

$$f_l \leftrightarrow \texttt{000*001*010*****} \quad\text{and}\quad f_r \leftrightarrow \texttt{000*010*001*****}. \tag{112}$$

(There are seven "don't-cares," for cases where $x_l x_r = 11$ and/or $y_l y_r = 11$.) The methods of Section 7.1.2 tell us how to compute $z_l$ and $z_r$ optimally, namely

$$z_l = (x_l \oplus y_l) \wedge (x_r \oplus y_r), \qquad z_r = (x_l \oplus y_r) \wedge (x_r \oplus y_l); \tag{113}$$

unfortunately the functions $f_l$ and $f_r$ in (112) are independent, in the sense that they cannot both be evaluated in fewer than $C(f_l) + C(f_r) = 6$ steps.
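On the other hand the somewhat less natural mapping scheme

$$+1 \mapsto 00, \qquad 0 \mapsto 01, \qquad -1 \mapsto 10 \tag{114}$$

leads to the transformation functions

$$f_l \leftrightarrow \texttt{001*000*100*****} \quad\text{and}\quad f_r \leftrightarrow \texttt{010*111*010*****}, \tag{115}$$

and three operations now suffice to do the desired evaluation:

$$z_r = x_r \vee y_r, \qquad z_l = (x_l \oplus y_l) \wedge \bar{z}_r. \tag{116}$$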
Is there an easy way to discover such improvements? Fortunately we don't need to try all 24 possibilities, because many of them are basically alike. For example, the mapping $x \mapsto x_r x_l$ is equivalent to $x \mapsto x_l x_r$, because the new representation $x'_l x'_r = x_r x_l$ obtained by swapping coordinates makes $f'_l(x'_l, x'_r, y'_l, y'_r) = z'_l = z_r = f_r(x_l, x_r, y_l, y_r)$; the new transformation functions $f'_l$ and $f'_r$ defined by

$$f'_l(x_l, x_r, y_l, y_r) = f_r(x_r, x_l, y_r, y_l), \qquad f'_r(x_l, x_r, y_l, y_r) = f_l(x_r, x_l, y_r, y_l) \tag{117}$$

have the same complexity as $f_l$ and $f_r$. Similarly we can complement a coordinate, letting $x'_l x'_r = \bar{x}_l x_r$; then the transformation functions turn out to be

$$f'_l(x_l, x_r, y_l, y_r) = \bar{f}_l(\bar{x}_l, x_r, \bar{y}_l, y_r), \qquad f'_r(x_l, x_r, y_l, y_r) = f_r(\bar{x}_l, x_r, \bar{y}_l, y_r), \tag{118}$$

and again the complexity is essentially unchanged.
Repeated use of swapping and/or complementation leads to eight mappings that are equivalent to any given one. So the 24 possibilities reduce to only three, which we shall call classes I, II, and III:

$$\begin{array}{r@{\quad}c@{\qquad}c@{\qquad}c}
 & \text{Class I} & \text{Class II} & \text{Class III}\\
a \mapsto & 00\;01\;10\;11\;00\;10\;01\;11 & 00\;01\;10\;11\;00\;10\;01\;11 & 00\;01\;10\;11\;00\;10\;01\;11\\
b \mapsto & 01\;00\;11\;10\;10\;00\;11\;01 & 01\;00\;11\;10\;10\;00\;11\;01 & 11\;10\;01\;00\;11\;01\;10\;00\\
c \mapsto & 10\;11\;00\;01\;01\;11\;00\;10 & 11\;10\;01\;00\;11\;01\;10\;00 & 01\;00\;11\;10\;10\;00\;11\;01
\end{array} \tag{119}$$
To choose a representation we need consider only one representative of each class. For example, if $a = +1$, $b = 0$, and $c = -1$, representation (111) belongs to class II, and (114) belongs to class I. Class III turns out to have cost 3, like class I. So it appears that representation (114) is as good as any, with $z$ computed by (116), for the 3-element multiplication problem we've been studying.
Appearances can, however, be deceiving, because we need not map $\{a, b, c\}$ into unique two-bit codes. Consider the one-to-many mapping

$$+1 \mapsto 00, \qquad 0 \mapsto 01 \text{ or } 11, \qquad -1 \mapsto 10, \tag{120}$$

where both 01 and 11 are allowed as representations of zero. The truth tables for $f_l$ and $f_r$ are now quite different from (112) and (115), because all inputs are legal but some outputs can be arbitrary:

$$f_l \leftrightarrow \texttt{0*1*****1*0*****} \quad\text{and}\quad f_r \leftrightarrow \texttt{0101111101011111}. \tag{121}$$

And in fact, this approach needs just two operations, instead of the three in (116):

$$z_l = x_l \oplus y_l, \qquad z_r = x_r \vee y_r. \tag{122}$$

A moment's thought shows that these operations do yield the product $z = x \cdot y$ when the three elements $\{+1, 0, -1\}$ are represented as in (120): $z_r = 1$ exactly when one of the factors is zero, and otherwise $z_l = x_l \oplus y_l$ gives the sign of the product.
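Both multiplication schemes act bitwise, so they work on whole vectors of ternary values at once when the left bits and right bits are kept in two separate words (an arrangement discussed further below). The following Python sketch is illustrative only; the dictionaries and function names are hypothetical. It compares (116) under representation (114) with (122) under representation (120), where zero may be encoded either as 01 or as 11.

```python
# Codes are (left bit, right bit) pairs; bit j of each word holds element j.
CODE_114 = {+1: (0, 0), 0: (0, 1), -1: (1, 0)}
DECODE_114 = {bits: v for v, bits in CODE_114.items()}
CODE_120_ALT = {+1: (0, 0), 0: (1, 1), -1: (1, 0)}      # (120), using 11 for zero
DECODE_120 = {(0, 0): +1, (0, 1): 0, (1, 1): 0, (1, 0): -1}


def pack(values, code):
    """Pack a list of ternary values into two bit vectors (left bits, right bits)."""
    xl = xr = 0
    for j, v in enumerate(values):
        l, r = code[v]
        xl |= l << j
        xr |= r << j
    return xl, xr


def unpack(xl, xr, n, decode):
    return [decode[((xl >> j) & 1, (xr >> j) & 1)] for j in range(n)]


def mul_116(xl, xr, yl, yr):
    """Formula (116) for representation (114): three operations."""
    zr = xr | yr
    zl = (xl ^ yl) & ~zr
    return zl, zr


def mul_122(xl, xr, yl, yr):
    """Formula (122) for representation (120): two operations."""
    return xl ^ yl, xr | yr


xs, ys = [-1, 0, 1, 1], [1, -1, -1, 1]
zl, zr = mul_116(*pack(xs, CODE_114), *pack(ys, CODE_114))
print(unpack(zl, zr, 4, DECODE_114))   # [-1, 0, -1, 1]
```

Note that the output of (122) may encode zero as either 01 or 11, so it must be decoded with the one-to-many table of (120).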
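Such nonunique mappings add 36 more possibilities to the 24 that we had before. But again, they reduce under "2-cube equivalence" to a small number of equivalence classes. First there are three classes that we call IVa, IVb, and IVc, depending on which element has an ambiguous representation (a code such as $0{*}$ stands for both 00 and 01):

$$\begin{array}{r@{\quad}c@{\qquad}c@{\qquad}c}
 & \text{Class IVa} & \text{Class IVb} & \text{Class IVc}\\
a \mapsto & 0{*}\;0{*}\;1{*}\;1{*}\;{*}0\;{*}0\;{*}1\;{*}1 & 11\;10\;01\;00\;11\;01\;10\;00 & 10\;11\;00\;01\;01\;11\;00\;10\\
b \mapsto & 10\;11\;00\;01\;01\;11\;00\;10 & 0{*}\;0{*}\;1{*}\;1{*}\;{*}0\;{*}0\;{*}1\;{*}1 & 11\;10\;01\;00\;11\;01\;10\;00\\
c \mapsto & 11\;10\;01\;00\;11\;01\;10\;00 & 10\;11\;00\;01\;01\;11\;00\;10 & 0{*}\;0{*}\;1{*}\;1{*}\;{*}0\;{*}0\;{*}1\;{*}1
\end{array} \tag{123}$$

(Representation (120) belongs to Class IVb. Classes IVa and IVc don't work well for $z = x \cdot y$.) Then there are three further classes with only four mappings each, in which one element has two complementary codes (written here as $00/11$ or $01/10$):

$$\begin{array}{r@{\quad}c@{\qquad}c@{\qquad}c}
 & \text{Class Va} & \text{Class Vb} & \text{Class Vc}\\
a \mapsto & 00/11\;\;01/10\;\;01/10\;\;00/11 & 10\;11\;00\;01 & 01\;00\;11\;10\\
b \mapsto & 01\;00\;11\;10 & 00/11\;\;01/10\;\;01/10\;\;00/11 & 10\;11\;00\;01\\
c \mapsto & 10\;11\;00\;01 & 01\;00\;11\;10 & 00/11\;\;01/10\;\;01/10\;\;00/11
\end{array} \tag{124}$$

These classes are a bit of a nuisance, because the indeterminacy in their truth tables cannot be expressed simply in terms of don't-cares as we did in (121). For example, if we try

$$+1 \mapsto 00 \text{ or } 11, \qquad 0 \mapsto 01, \qquad -1 \mapsto 10, \tag{125}$$

which is the first mapping in Class Va, there are binary variables $pqrst$ such that

$$f_l \leftrightarrow \texttt{p01q000010r1s01t} \quad\text{and}\quad f_r \leftrightarrow \texttt{p10q111101r0s10t}. \tag{126}$$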
Furthermore, mappings of classes Va, Vb, and Vc almost never turn out to be better than the mappings of the other six classes (see exercise 138). Still, representatives of all nine classes must be examined before we can be sure that an optimal mapping has been found.

In practice we often want to perform several different operations on ternary-valued variables, not just a single operation like multiplication. For example, we might want to compute $\max(x, y)$ as well as $x \cdot y$. With representation (120), the best we can do is $z_l = x_l \wedge y_l$, $z_r = (x_l \wedge y_r) \vee (x_r \wedge (y_l \vee y_r))$; but the "natural" mapping (111) now shines, with $z_l = x_l \wedge y_l$, $z_r = x_r \vee y_r$. Class III turns out to have cost 4; other classes are inferior. To choose between classes II, III, and IVb in this case, we need to know the relative frequencies of $x \cdot y$ and $\max(x, y)$. And if we add $\min(x, y)$ to the mix, classes II, III, and IVb compute it with the respective costs 2, 5, 5; hence (111) looks better yet.
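The ternary max and min operations arise also in other contexts, such as the three-valued logic developed by Jan Łukasiewicz in 1917. [See his Selected Works, edited by L. Borkowski (1970), 84–88, 153–178.] Consider the logical values "true," "false," and "maybe," denoted respectively by 1, 0, and $*$. Łukasiewicz defined the three basic operations of conjunction, disjunction, and implication on these values by specifying the tables (rows give $x$, columns give $y$)

$$\begin{array}{c|ccc} x \wedge y & 0 & * & 1\\ \hline 0 & 0 & 0 & 0\\ * & 0 & * & *\\ 1 & 0 & * & 1 \end{array} \qquad \begin{array}{c|ccc} x \vee y & 0 & * & 1\\ \hline 0 & 0 & * & 1\\ * & * & * & 1\\ 1 & 1 & 1 & 1 \end{array} \qquad \begin{array}{c|ccc} x \Rightarrow y & 0 & * & 1\\ \hline 0 & 1 & 1 & 1\\ * & * & 1 & 1\\ 1 & 0 & * & 1 \end{array} \tag{127}$$

For these operations the methods above show that the binary representation

$$0 \mapsto 00, \qquad * \mapsto 01, \qquad 1 \mapsto 11 \tag{128}$$

works well, because we can compute the logical operations thus:

$$\begin{aligned}
x_l x_r \wedge y_l y_r &= (x_l \wedge y_l)(x_r \wedge y_r), \qquad x_l x_r \vee y_l y_r = (x_l \vee y_l)(x_r \vee y_r);\\
x_l x_r \Rightarrow y_l y_r &= \bigl((\bar{x}_r \vee y_r) \wedge \neg(x_l \wedge \bar{y}_l)\bigr)(\bar{x}_l \vee y_r).
\end{aligned} \tag{129}$$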
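A Python sketch of (129) on two-word vectors follows; it is an illustration only, with made-up names, and it uses the numbers 0, 0.5, and 1 for false, maybe, and true so that the reference values are simply $\min(x, y)$, $\max(x, y)$, and $\min(1, 1 - x + y)$. Because Python's `~` produces a negative number, the implication takes an explicit `mask` of the vector length.

```python
# Representation (128): false -> 00, maybe -> 01, true -> 11, stored as (left, right) bits.
CODE_128 = {0: (0, 0), 0.5: (0, 1), 1: (1, 1)}
DECODE_128 = {bits: v for v, bits in CODE_128.items()}


def pack(values):
    xl = xr = 0
    for j, v in enumerate(values):
        l, r = CODE_128[v]
        xl |= l << j
        xr |= r << j
    return xl, xr


def unpack(xl, xr, n):
    return [DECODE_128[((xl >> j) & 1, (xr >> j) & 1)] for j in range(n)]


def luk_and(xl, xr, yl, yr):
    return xl & yl, xr & yr


def luk_or(xl, xr, yl, yr):
    return xl | yl, xr | yr


def luk_implies(xl, xr, yl, yr, mask):
    """The implication formula of (129); mask has a 1 for every element of the vectors."""
    zl = (~xr | yr) & ~(xl & ~yl) & mask
    zr = (~xl | yr) & mask
    return zl, zr


print(unpack(*luk_implies(*pack([1, 1, 0.5]), *pack([0, 0.5, 0]), 0b111), 3))   # [0, 0.5, 0.5]
```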
Of course $x$ need not be an isolated ternary value in this discussion; we often want to deal with ternary vectors $x = x_1 x_2 \ldots x_n$, where each $x_j$ is either $a$, $b$, or $c$. Such ternary vectors are conveniently represented by two binary vectors

$$x_l = x_{1l} x_{2l} \ldots x_{nl} \quad\text{and}\quad x_r = x_{1r} x_{2r} \ldots x_{nr}, \tag{130}$$

where $x_j \mapsto x_{jl} x_{jr}$ as above. We could also pack the ternary values into two-bit fields of a single vector,

$$x = x_{1l} x_{1r} x_{2l} x_{2r} \ldots x_{nl} x_{nr}; \tag{131}$$

that would work fine if, say, we're doing Łukasiewicz logic with the operations $\wedge$ and $\vee$ but not $\Rightarrow$. Usually, however, the two-vector approach of (130) is better, because it lets us do bitwise calculations without shifting and masking.
Appli ations to data stru tures. Bitwise operations o er many eÆ ient ways hess
to represent elements of data and the relationships between them. For example, bit board
Emde Boas
hess-playing programs often use a \bit board" to represent the positions of van Emde Boas
pie es (see exer ise 143). Kaas
Zijlstra
In Chapter 8 we shall dis uss an important data stru ture developed by impli it data stru tures{
Peter van Emde Boas for representing a dynami ally hanging subset of integers heaps
sibling
between 0 and N. Insertions, deletions, and other operations su h as \ nd the sideways heap
largest element less than x" an be done in O(log log N ) steps pwith his methods; binary tree stru tures
ruler fun tion
the general idea is to organize the
p full stru ture re ursively as N substru tures
for subsets of intervals of size N , together with an auxiliary stru ture that
tells whi h of those intervals are o upied. [See Information Pro essing Letters
6 (1977), 80{82; also P. van Emde Boas, R. Kaas, and E. Zijlstra, Math. Systems
Theory 10 (1977), 99{127.℄ Bitwise operations make those omputations fast.
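The full recursive structure is the subject of Chapter 8, but its key step, splitting a key into a high half and a low half, already shows why bitwise operations help. The following Python sketch is only an illustration under simplifying assumptions: a universe of $2^{2k}$ keys, a single level of splitting instead of the full recursion, and hypothetical class and method names. It answers "largest element less than x" with a few masks and `bit_length` calls rather than in $O(\log\log N)$ recursive steps.

```python
class TwoLevelSet:
    """One level of van Emde Boas-style splitting (a sketch, not the full
    recursive structure): keys lie in [0, 2**(2*k)), and each key x is split
    into a high half x >> k and a low half x & (2**k - 1)."""

    def __init__(self, k):
        self.k = k
        self.low_mask = (1 << k) - 1
        self.clusters = [0] * (1 << k)   # bit l of clusters[h] <=> key (h << k) | l present
        self.summary = 0                 # bit h <=> clusters[h] is nonempty

    def _split(self, x):
        return x >> self.k, x & self.low_mask

    def insert(self, x):
        h, l = self._split(x)
        self.clusters[h] |= 1 << l
        self.summary |= 1 << h

    def delete(self, x):
        h, l = self._split(x)
        self.clusters[h] &= ~(1 << l)
        if self.clusters[h] == 0:
            self.summary &= ~(1 << h)

    def largest_below(self, x):
        """Largest element less than x (0 <= x < 2**(2*k)), or None."""
        h, l = self._split(x)
        below = self.clusters[h] & ((1 << l) - 1)     # same cluster, smaller low half
        if below:
            return (h << self.k) | (below.bit_length() - 1)
        lower = self.summary & ((1 << h) - 1)         # nonempty clusters to the left
        if lower:
            h2 = lower.bit_length() - 1
            return (h2 << self.k) | (self.clusters[h2].bit_length() - 1)
        return None

s = TwoLevelSet(2)            # keys 0..15
for key in (3, 9, 10):
    s.insert(key)
print(s.largest_below(10), s.largest_below(4))   # 9 3
```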
Hierarchical data can sometimes be arranged so that the links between elements are implicit rather than explicit. For example, we studied "heaps" in Section 5.2.3, where n elements of a sequential array implicitly have a binary tree structure like

                1                                    0001
        2               3                   0010              0011
    4       5       6       7      =    0100    0101    0110    0111     (132)
  8   9  10                           1000 1001 1010

when, say, n = 10. (Node numbers are shown here both in decimal and binary notation.) There is no need to store pointers in memory to relate node j of a heap to its parent (which is node $j \gg 1$ if $j \ne 1$), or to its sibling (which is node $j \oplus 1$ if $j \ne 1$), or to its children (which are nodes $j \ll 1$ and $(j \ll 1) + 1$ if those numbers don't exceed n), because a simple calculation leads directly from j to any desired neighbor.
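The implicit heap links are one-line bit operations. A minimal Python sketch (function names hypothetical), which can be checked against the tree (132) with n = 10:

```python
def heap_parent(j):
    return j >> 1                     # defined for j != 1

def heap_sibling(j):
    return j ^ 1                      # defined for j != 1

def heap_children(j, n):
    """Children j << 1 and (j << 1) + 1, kept only if they don't exceed n."""
    return [c for c in (j << 1, (j << 1) + 1) if c <= n]

print(heap_parent(10), heap_children(4, 10))   # 5 [8, 9]
```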
Similarly, a sideways heap provides implicit links for another useful family of n-node binary tree structures, typified by

                8                                  1000
        4               12                 0100            1100
    2       6       10           =     0010    0110    1010            (133)
  1   3   5   7   9                  0001 0011 0101 0111 1001

when n = 10. (We sometimes need to go beyond n when moving from a node to its parent, as in the path from 10 to 12 to 8 shown here.) Heaps and sideways heaps can both be regarded as nodes 1 to n of infinite binary tree structures: The heap with $n = \infty$ is rooted at node 1 and has no leaves; by contrast, the sideways heap with $n = \infty$ has infinitely many leaves 1, 3, 5, ..., but no root(!). The leaves of a sideways heap are the odd numbers, and their parents are the odd multiples of 2. The grandparents of leaves, similarly, are the odd multiples of 4; and so on. Thus the ruler function $\rho j$ tells how high node j is above leaf level.

The parent of node j in the infinite sideways heap is easily seen to be node
$$(j - k) \mid (k \ll 1),\qquad\text{where } k = j \mathbin{\&} -j; \tag{134}$$
this quantity is j rounded to the nearest odd multiple of $2^{1+\rho j}$. And the children are
$$j - (k \gg 1)\quad\text{and}\quad j + (k \gg 1) \tag{135}$$
when j is even. In general the descendants of node j form a closed interval
$$[\,j - 2^{\rho j} + 1 \,.\,.\, j + 2^{\rho j} - 1\,], \tag{136}$$
arranged as a complete binary tree of $2^{1+\rho j} - 1$ nodes. The ancestor of node j at height h is node
$$(j \mid (1 \ll h)) \mathbin{\&} -(1 \ll h) = ((j \gg h) \mid 1) \ll h \tag{137}$$
when $h \ge \rho j$. Notice that the symmetric order of the nodes, also called inorder, is just the natural order 1, 2, 3, ... .

Dov Harel noted these properties in his Ph.D. thesis (U. of California, Irvine, 1980), and observed that the nearest common ancestor of any two nodes of a sideways heap can also be easily calculated. Indeed, if node l is the nearest common ancestor of nodes i and j, where $i \le j$, there is a remarkable identity
$$\rho l = \max\{\rho x \mid i \le x \le j\} = \lambda(j \mathbin{\&} -i), \tag{138}$$
which relates the $\rho$ and $\lambda$ functions. (See exercise 146.) We can therefore use formula (137) with $h = \lambda(j \mathbin{\&} -i)$ to calculate l.
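Formulas (134) through (138) translate directly into code. The following Python sketch uses hypothetical function names; `rho` and `lam` compute $\rho j$ and $\lambda j = \lfloor \lg j \rfloor$ with `int.bit_length`. The accompanying check compares the nearest-common-ancestor formula with a brute-force climb up the parent links of the infinite sideways heap.

```python
def rho(j):
    """Ruler function: the number of trailing zeros of j > 0."""
    return (j & -j).bit_length() - 1

def lam(j):
    """lambda j = floor(lg j), for j > 0."""
    return j.bit_length() - 1

def sh_parent(j):                  # (134)
    k = j & -j
    return (j - k) | (k << 1)

def sh_children(j):                # (135); j must be even
    half = (j & -j) >> 1
    return j - half, j + half

def sh_descendants(j):             # (136): the closed interval [lo .. hi]
    r = 1 << rho(j)
    return j - r + 1, j + r - 1

def sh_ancestor(j, h):             # (137); requires h >= rho(j)
    return ((j >> h) | 1) << h

def sh_nca(i, j):                  # height from (138), node from (137)
    if i > j:
        i, j = j, i
    return sh_ancestor(j, lam(j & -i))

print(sh_parent(10), sh_nca(5, 7))   # 12 6
```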
Subtle extensions of this approach lead to an asymptotically efficient algorithm that finds nearest common ancestors in any oriented forest whose arcs grow dynamically [D. Harel and R. E. Tarjan, SICOMP 13 (1984), 338–355]. Baruch Schieber and Uzi Vishkin [SICOMP 17 (1988), 1253–1262] subsequently discovered a much simpler way to compute nearest common ancestors in an arbitrary (but fixed) oriented forest, using an attractive and instructive blend of bitwise and algorithmic techniques that we shall consider next.

Recall that an oriented forest with m trees and n vertices is an acyclic digraph with n − m arcs. There is at most one arc from each vertex; the vertices with out-degree zero are the roots of the trees. We say that v is the parent of u when $u \to v$, and v is an ancestor of u when $u \to^* v$. Two vertices have a common ancestor if and only if they belong to the same tree. Vertex w is called the nearest common ancestor of u and v when we have
$$u \to^* z \text{ and } v \to^* z \quad\text{if and only if}\quad w \to^* z. \tag{139}$$

Schieber and Vishkin preprocess the given forest, mapping its vertices into a sideways heap S of size n by computing three quantities for each vertex v:
$\pi v$, the rank of v in preorder ($1 \le \pi v \le n$);
$\beta v$, a node of the sideways heap S ($1 \le \beta v \le n$);
$\alpha v$, a $(1 + \lambda n)$-bit routing code ($1 \le \alpha v < 2^{1+\lambda n}$).
If $u \to v$ we have $\pi u > \pi v$ by the definition of preorder. Node $\beta v$ is defined to be the nearest common ancestor of all sideways-heap nodes $\pi u$ such that v is an ancestor of vertex u. And we define
$$\alpha v = \sum\{2^{\rho\beta w} \mid v \to^* w\}. \tag{140}$$
For example, here's an oriented forest with ten vertices and two trees:

        A 1                        B 8
       /    \                       |
    C 2      D 4                   E 9
     |      /   \                   |
    F 3   G 5    H 7               I 10        (141)
           |
          J 6

Each node has been labeled with its preorder rank, from which we can compute the β and α codes:

     v  =   A    B    C    D    E    F    G    H    I    J
    πv  = 0001 1000 0010 0100 1001 0011 0101 0111 1010 0110
    βv  = 0100 1000 0010 0100 1010 0011 0110 0111 1010 0110
    αv  = 0100 1000 0110 0100 1010 0111 0110 0101 1010 0110

Notice that, for instance, $\beta A = 4 = 0100$ because the preorder ranks of the descendants of A are {1, 2, 3, 4, 5, 6, 7}. And $\alpha H = 0101$ because the ancestors of H have β codes $\{\beta H, \beta D, \beta A\} = \{0111, 0100\}$. One can prove without difficulty that the mapping $v \mapsto \beta v$ satisfies the following key properties:
i) If $u \to v$ in the forest, then $\beta u$ is a descendant of $\beta v$ in S.
ii) If several vertices have the same value of $\beta v$, they form a path in the forest.
Property (ii) holds because exactly one child u of v has $\beta u = \beta v$ when $\beta v \ne \pi v$.
Now let's imagine placing every vertex v of the forest into node βv of S:

                              1000: B→Λ
             0100: D→A→Λ                          1100
    0010: C→A       0110: J→G→D         1010: I→E→B                (142)
  0001   0011: F→C   0101   0111: H→D   1001

If k vertices map into node j, we can arrange them into a path
$$v_0 \to v_1 \to \cdots \to v_{k-1} \to v_k,\qquad\text{where } \beta v_0 = \beta v_1 = \cdots = \beta v_{k-1} = j. \tag{143}$$
These paths are illustrated in (142); for example, J → G → D is a path in (141), and 'J→G→D' appears with node 0110 = βJ = βG.

The preprocessing algorithm also computes a table $\tau j$ for all nodes j of S, containing pointers to the vertices $v_k$ at the tail ends of (143):

     j  = 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010
    τj  =  −    A    C    Λ    −    D    D    Λ    −    B

(A dash marks a node of S that receives no vertices.) Exercise 149 shows that all four tables πv, βv, αv, and τj can be prepared in O(n) steps. And once those tables are ready, they contain just enough information to identify the nearest common ancestor of any two given vertices quickly:
Algorithm V (Nearest common ancestors). Suppose $\pi v$, $\beta v$, $\alpha v$, and $\tau j$ are known for all n vertices v of an oriented forest, and for $1 \le j \le n$. A dummy vertex $\Lambda$ is also assumed to be present, with $\pi\Lambda = \beta\Lambda = \alpha\Lambda = 0$. This algorithm computes the nearest common ancestor z of any given vertices x and y, returning $z = \Lambda$ if x and y belong to different trees. We assume that the values $\lambda j = \lfloor \lg j \rfloor$ have been precomputed for $1 \le j \le n$, and that $\lambda 0 = \lambda n$.

V1. [Find common height.] If $\beta x \le \beta y$, set $h \gets \lambda(\beta y \mathbin{\&} -\beta x)$; otherwise set $h \gets \lambda(\beta x \mathbin{\&} -\beta y)$. (See (138).)

V2. [Find true height.] Set $k \gets \alpha x \mathbin{\&} \alpha y \mathbin{\&} -(1 \ll h)$, then $h \gets \lambda(k \mathbin{\&} -k)$.

V3. [Find $\beta z$.] Set $j \gets ((\beta x \gg h) \mid 1) \ll h$. (Now $j = \beta z$, if $z \ne \Lambda$.)

V4. [Find $\hat x$ and $\hat y$.] (We now seek the lowest ancestors of x and y in node j.) If $j = \beta x$, set $\hat x \gets x$; otherwise set $l \gets \lambda(\alpha x \mathbin{\&} ((1 \ll h) - 1))$ and $\hat x \gets \tau(((\beta x \gg l) \mid 1) \ll l)$. Similarly, if $j = \beta y$, set $\hat y \gets y$; otherwise set $l \gets \lambda(\alpha y \mathbin{\&} ((1 \ll h) - 1))$ and $\hat y \gets \tau(((\beta y \gg l) \mid 1) \ll l)$.

V5. [Find z.] Set $z \gets \hat x$ if $\pi\hat x \le \pi\hat y$, otherwise $z \gets \hat y$.

These artful dodges obviously exploit (137); exercise 152 explains why they work.
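Here is a Python transcription of Algorithm V that hard-wires the tables πv, βv, αv, and τj of the example forest (141) instead of computing them (that preprocessing is the subject of exercise 149). It is a sketch under stated assumptions: `None` stands for the dummy vertex Λ, the names are hypothetical, and λ0 is taken to be λn as the algorithm requires. The accompanying check compares every pair of vertices with a naive ancestor search.

```python
N = 10
# Tables for the forest (141), copied from the text; None plays the role of Lambda.
PI = {'A': 1, 'C': 2, 'F': 3, 'D': 4, 'G': 5, 'J': 6, 'H': 7, 'B': 8,
      'E': 9, 'I': 10, None: 0}
BETA = {'A': 0b0100, 'B': 0b1000, 'C': 0b0010, 'D': 0b0100, 'E': 0b1010,
        'F': 0b0011, 'G': 0b0110, 'H': 0b0111, 'I': 0b1010, 'J': 0b0110, None: 0}
ALPHA = {'A': 0b0100, 'B': 0b1000, 'C': 0b0110, 'D': 0b0100, 'E': 0b1010,
         'F': 0b0111, 'G': 0b0110, 'H': 0b0101, 'I': 0b1010, 'J': 0b0110, None: 0}
TAU = {0b0010: 'A', 0b0011: 'C', 0b0100: None, 0b0110: 'D', 0b0111: 'D',
       0b1000: None, 0b1010: 'B'}

def lam(j):
    """lambda j = floor(lg j), with lambda 0 = lambda N as Algorithm V assumes."""
    return (j if j else N).bit_length() - 1

def nca(x, y):
    bx, by = BETA[x], BETA[y]
    # V1. Find common height.
    h = lam(by & -bx) if bx <= by else lam(bx & -by)
    # V2. Find true height.
    k = ALPHA[x] & ALPHA[y] & -(1 << h)
    h = lam(k & -k)
    # V3. Find beta z.
    j = ((bx >> h) | 1) << h

    # V4. Find the lowest ancestor of a vertex that lies in node j.
    def lowest(v):
        bv = BETA[v]
        if j == bv:
            return v
        l = lam(ALPHA[v] & ((1 << h) - 1))
        return TAU[((bv >> l) | 1) << l]

    xh, yh = lowest(x), lowest(y)
    # V5. Find z.
    return xh if PI[xh] <= PI[yh] else yh

print(nca('J', 'H'), nca('F', 'I'))   # D None
```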
Sideways heaps can also be used to implement an interesting type of priority queue that J. Katajainen and F. Vitale call a "navigation pile," illustrated here for n = 10:

                                  16
                   8                              24
           4              12              20
       2       6      10      14      18
    503 087 512 061 908 170 275 897 653 426                    (144)
     1   3   5   7   9  11  13  15  17  19

Data elements go into the leaf positions 1, 3, ..., 2n − 1 of the sideways heap; they can be many bits wide, and they can appear in any order. By contrast, each branch position 2, 4, 6, ... contains a pointer to its largest descendant. And the novel point is that these pointers take up almost no extra space (fewer than two bits per item of data, on average), because only one bit is needed for pointers 2, 6, 10, ..., only two bits for pointers 4, 12, 20, ..., and only $\rho j$ for pointer j in general. (See exercise 153.) Thus the navigation pile requires very little memory, and it behaves nicely with respect to cache performance on a typical computer.
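The compact pointers can be sketched as follows. This is an illustration with hypothetical names, not Katajainen and Vitale's implementation: each branch j stores only an offset of $\rho j$ bits, which selects one of the $2^{\rho j}$ leaf slots below it (the odd positions of the interval (136)). The data are the ten keys of (144).

```python
def rho(j):
    return (j & -j).bit_length() - 1

def navigation_pile(data):
    """Leaf position 2i+1 holds data[i]. Returns (top, offsets): top is the
    smallest power of 2 whose subtree covers every leaf, and offsets[j], for
    each branch j with data below it, is a rho(j)-bit offset naming the leaf
    slot lo + 2*offset that holds the largest key under j."""
    n = len(data)
    last = 2 * n - 1                       # highest occupied leaf position
    top = 1
    while 2 * top - 1 < last:
        top <<= 1
    offsets = {}
    for j in range(2, 2 * top, 2):
        r = 1 << rho(j)
        lo = j - r + 1                     # leftmost leaf slot below j, by (136)
        if lo > last:
            continue                       # nothing is stored under this branch
        best = max(range(lo, min(j + r - 1, last) + 1, 2),
                   key=lambda p: data[p >> 1])
        offsets[j] = (best - lo) >> 1      # 0 <= offset < 2**rho(j)
    return top, offsets

def largest(data, top, offsets):
    """The root's subtree starts at slot 1, so its offset is the data index."""
    return data[offsets.get(top, 0)]

data = [503, 87, 512, 61, 908, 170, 275, 897, 653, 426]
top, off = navigation_pile(data)
print(top, largest(data, top, off))   # 16 908
```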
Fig. 13. Two views of five lines in the hyperbolic plane.
*Cells in the hyperbolic plane. Hyperbolic geometry suggests an instructive implicit data structure that has a rather different flavor. The hyperbolic plane is a fascinating example of non-Euclidean geometry that is conveniently viewed by projecting its points into the interior of a circle. Its straight lines then become circular arcs, which meet the rim at right angles. For example, the lines PP′, QQ′, and RR′ in Fig. 13 intersect at points O, A, B, and those points form a triangle. Lines SQ′ and QQ′ are parallel: They never touch, but their points get closer and closer together. Line QT is also parallel to QQ′.
We get different views by focusing on different center points. For example, the second view in Fig. 13 puts O smack in the center. Notice that if a line passes through the very center, it remains straight after being projected; such diameter-spanning chords are the special case of a "circular arc" whose radius is infinite.

Most of Euclid's axioms for plane geometry remain valid in the hyperbolic plane. For example, exactly one line passes through any two distinct points; and if point A lies on line PP′ there's exactly one line QQ′ such that angle PAQ has any given value θ, for 0° < θ < 180°. But Euclid's famous fifth postulate does not hold: If point C is not on line QQ′, there always are exactly two lines through C that are parallel to QQ′. Furthermore there are many pairs of lines, like RR′ and SQ′ in Fig. 13, that are totally disjoint or ultraparallel, in the sense that their points never become arbitrarily close. [These properties of the hyperbolic plane were discovered by G. Saccheri in the early 1700s, and made rigorous by N. I. Lobachevsky, J. Bolyai, and C. F. Gauss a century later.]

Quantitatively speaking, when points are projected onto the unit disk |z| < 1, the arc that meets the circle at $e^{i\theta}$ and $e^{-i\theta}$ has center at $\sec\theta$ and radius $\tan\theta$. The actual distance between two points whose projections are z and z′ is $d(z, z') = \ln\bigl(|1 - \bar z z'| + |z - z'|\bigr) - \ln\bigl(|1 - \bar z z'| - |z - z'|\bigr)$. Thus objects far from the center appear dramatically shrunken when we see them near the circle's rim.
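Both quantitative facts are easy to check numerically. A small Python sketch with the standard `math` and `cmath` modules (function names hypothetical):

```python
import cmath
import math

def hyperbolic_distance(z, w):
    """Distance between the points whose projections into |z| < 1 are z and w,
    by the formula in the text."""
    a = abs(1 - z.conjugate() * w)
    b = abs(z - w)
    return math.log(a + b) - math.log(a - b)

def boundary_arc(theta):
    """Center (on the real axis) and radius of the arc meeting the rim at
    e^{i theta} and e^{-i theta}, for 0 < theta < pi/2."""
    return 1 / math.cos(theta), math.tan(theta)

print(round(hyperbolic_distance(0j, 0.5), 6))   # 1.098612, i.e. ln 3
```

Moving a point from the center out to radius r gives distance ln((1 + r)/(1 − r)), which grows without bound as r approaches 1; that is the "dramatic shrinking" near the rim.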
The sum of the angles of a hyperbolic triangle is always less than 180°. For example, the angles at O, A, and B in Fig. 13 are respectively 90°, 45°, and 36°. Ten such 36°-45°-90° triangles can be placed together to make a regular pentagon with 90° angles at each corner. And four such pentagons fit snugly together at their corners, allowing us to tile the entire hyperbolic plane with right regular pentagons (see Fig. 14). The edges of these pentagons form an interesting family of lines, every two of which are either ultraparallel or perpendicular; so we have a grid structure analogous to the unit squares of the ordinary plane. We call it the pentagrid, because each cell now has five neighbors instead of four.
There's a nice way to navigate in the pentagrid using Fibonacci numbers, based on ideas of Maurice Margenstern [see F. Herrmann and M. Margenstern, Theoretical Comp. Sci. 296 (2003), 345–351]. Instead of the ordinary Fibonacci sequence $\langle F_n\rangle$, however, we shall use the negaFibonacci numbers $\langle F_{-n}\rangle$, namely
$$F_{-1} = 1,\quad F_{-2} = -1,\quad F_{-3} = 2,\quad F_{-4} = -3,\quad \ldots,\quad F_{-n} = (-1)^{n-1}F_n. \tag{145}$$
Exercise 1.2.8–34 introduced the Fibonacci number system, in which every nonnegative integer x can be written uniquely in the form
$$x = F_{k_1} + F_{k_2} + \cdots + F_{k_r},\qquad\text{where } k_1 \gg k_2 \gg \cdots \gg k_r \gg 0; \tag{146}$$
here 'j ≫ k' means 'j ≥ k + 2'. But there's also a negaFibonacci number system, which suits our purposes better: Every integer x, whether positive, negative, or zero, can be written uniquely in the form
$$x = F_{k_1} + F_{k_2} + \cdots + F_{k_r},\qquad\text{where } k_1 \ll k_2 \ll \cdots \ll k_r \ll 1. \tag{147}$$
For example, 4 = 5 − 1 = $F_{-5} + F_{-2}$ and −2 = −3 + 1 = $F_{-4} + F_{-1}$. This representation can conveniently be expressed as a binary code $\alpha = \ldots a_3a_2a_1$, standing for $N(\alpha) = \sum_k a_kF_{-k}$, with no two 1s in a row.

Fig. 14. The pentagrid, in which identical pentagons tile the hyperbolic plane.

    A circular regular tiling, confined on all sides
    by infinitely small shapes, is really wonderful.
        M. C. ESCHER, letter to George Escher (9 November 1958)

For example, here are the negaFibonacci representation codes of all integers between −14 and +15:
    −14 = 10010100    −8 = 100000    −2 = 1001     4 = 10010      10 = 1001000
    −13 = 10010101    −7 = 100001    −1 = 10       5 = 10000      11 = 1001001
    −12 = 101010      −6 = 100100     0 = 0        6 = 10001      12 = 1000010
    −11 = 101000      −5 = 100101     1 = 1        7 = 10100      13 = 1000000
    −10 = 101001      −4 = 1010       2 = 100      8 = 10101      14 = 1000001
     −9 = 100010      −3 = 1000       3 = 101      9 = 1001010    15 = 1000100
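As in the negadecimal system (see 4.1–(6) and (7)), we can tell whether x is negative or not by seeing if its representation has an even or odd number of digits.

The predecessor $\alpha^-$ and successor $\alpha^+$ of any negaFibonacci binary code α can be computed recursively by using the rules
$$(\alpha01)^- = \alpha00,\quad (\alpha000)^- = \alpha010,\quad (\alpha100)^- = \alpha001,\quad (\alpha10)^- = (\alpha^-)01;$$
$$(\alpha10)^+ = \alpha00,\quad (\alpha00)^+ = \alpha01,\quad (\alpha1)^+ = (\alpha^-)0. \tag{148}$$
(See exercise 157.) But ten elegant 2-adic steps do the calculation directly:
$$y \gets x \oplus \bar\mu_0,\qquad z \gets y \oplus (y \pm 1),\qquad\text{where } x = (\alpha)_2;$$
$$z \gets z \mid (x \mathbin{\&} (z \ll 1));$$
$$w \gets x \oplus z \oplus ((z + 1) \gg 2);\qquad\text{then } w = (\alpha^{\pm})_2. \tag{149}$$
Here $\bar\mu_0 = (\ldots101010)_2$ is the complement of the magic mask $\mu_0$. We just use y − 1 in the top line to get the predecessor, y + 1 to get the successor.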
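The direct formula (149) is easy to try in Python. In this sketch (helper names hypothetical), a code is an ordinary nonnegative integer whose bit k − 1 is $a_k$, and the 2-adic constant $\bar\mu_0$ is truncated to 64 bits, which is an assumption that suffices for small codes. The accompanying check rebuilds the table above by repeated successor and predecessor steps.

```python
MU0_BAR = 0xAAAAAAAAAAAAAAAA      # ...101010 in binary, truncated to 64 bits

def nega_fib(k):
    """F_{-k} = (-1)^(k-1) F_k, for k >= 1."""
    a, b = 0, 1                   # F_0, F_1
    for _ in range(k - 1):
        a, b = b, a + b
    return b if k % 2 else -b

def value(code):
    """N(alpha): the sum of F_{-k} over the positions k where a_k = 1."""
    return sum(nega_fib(k + 1) for k in range(code.bit_length()) if (code >> k) & 1)

def step(x, delta):
    """Formula (149): successor of code x if delta = +1, predecessor if -1."""
    y = x ^ MU0_BAR
    z = y ^ (y + delta)
    z |= x & (z << 1)
    return x ^ z ^ ((z + 1) >> 2)

print(format(step(0b101, +1), 'b'), value(0b10010))   # 10010 4
```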
And now here's the point: A negaFibonacci code can be assigned to each cell of the pentagrid in such a way that the codes of its five neighbors are easy to compute. Let's call the neighbors n, s, e, w, and o, for "north," "south," "east," "west," and "other." If α is the code assigned to a given cell, we define
$$\alpha^n = \alpha \gg 2,\qquad \alpha^s = \alpha \ll 2,\qquad \alpha^e = \alpha^{s+},\qquad \alpha^w = \alpha^{s-}; \tag{150}$$
thus $\alpha^{sn} = \alpha$, and also $\alpha^{en} = (\alpha01)^n = \alpha$. The "other" direction is trickier:
$$\alpha^o = \begin{cases} \alpha^{n+}, & \text{if } \alpha \mathbin{\&} 1 = 1;\\ \alpha^{w-}, & \text{if } \alpha \mathbin{\&} 1 = 0. \end{cases} \tag{151}$$
For example, $1000^o = 101001$ and $101001^o = 1000$. This mysterious interloper lies between north and east when α ends with 1, but between north and west when α ends with 0.
If we choose any cell and label it with code 0, and if we also choose an orientation so that its neighbors are n, e, s, w, and o in clockwise order, rules (150) and (151) will assign consistent labels to every cell of the pentagrid. (See exercise 160.) For example, the vicinity of a cell labeled 1000 will look like this:

[Diagram (152): the cell 1000, whose neighbors are 10 (n), 100000 (s), 100001 (e), 100010 (w), and 101001 (o), together with the nearby cells 1001, 1010, 10000001, 10001001, and 10100101, each marked with its own n, s, e, w, and o directions.]    (152)
The code labels do not, however, identify cells uniquely, because infinitely many cells receive the same label. (Indeed, we clearly have $0^n = 0^s = 0$ and $1^w = 1^o = 1$.) To get a unique identifier, we attach a second coordinate so that each cell's full name has the form (α, y), where y is an integer. When y is constant and α ranges over all negaFibonacci codes, the cells (α, y) form a more-or-less hook-shaped strip whose edges take a 90° turn next to cell (0, y). In general, the five neighbors of (α, y) are $(\alpha, y)^n = (\alpha^n, y + \delta^n(\alpha))$, $(\alpha, y)^s = (\alpha^s, y + \delta^s(\alpha))$, $(\alpha, y)^e = (\alpha^e, y + \delta^e(\alpha))$, $(\alpha, y)^w = (\alpha^w, y + \delta^w(\alpha))$, and $(\alpha, y)^o = (\alpha^o, y + \delta^o(\alpha))$, where
$$\delta^n(\alpha) = [\alpha = 0],\qquad \delta^s(\alpha) = -[\alpha = 0],\qquad \delta^e(\alpha) = 0,\qquad \delta^w(\alpha) = -[\alpha = 1];$$
$$\delta^o(\alpha) = \begin{cases} \operatorname{sign}(\alpha^o - \alpha^n)\,[\alpha^o \mathbin{\&} \alpha^n = 0], & \text{if } \alpha \mathbin{\&} 1 = 1;\\ \operatorname{sign}(\alpha^o - \alpha^w)\,[\alpha^o \mathbin{\&} \alpha^w = 0], & \text{if } \alpha \mathbin{\&} 1 = 0. \end{cases} \tag{153}$$
(See the illustration below.) Bitwise operations now allow us to surf the entire hyperbolic plane with ease. On the other hand, we could also ignore the y coordinates as we move, thereby wrapping around a "hyperbolic cylinder" of pentagons; the α coordinates define an interesting multigraph on the set of all negaFibonacci codes, in which every vertex has degree 5.
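[Diagram (154): cells near (0, 0) labeled with their full names, namely (0, −2), (0, −1), (0, 0), (0, 1), (0, 2), (1, −1), (1, 0), (1, 1), (10, −1), (10, 0), (10, 1), (100, 0), (101, 0), (1000, 0), (1001, 0), (1001, 1), (1001, 2), (1010, 0), (100001, 1), (100100, 1), and (100101, 1).]    (154)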
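A Python sketch of the navigation rules (150), (151), and (153), carrying its own copy of the successor/predecessor formula (149) so that it stands alone. Assumptions: codes are held as integers whose bits are the code digits; the factor sign(α^o − α^n) (or sign(α^o − α^w)) in (153) compares the codes as binary integers; function names are hypothetical. The checks use the examples in the text and cells that appear in (154).

```python
MU0_BAR = 0xAAAAAAAAAAAAAAAA          # ...101010 in binary, truncated to 64 bits

def step(x, delta):
    """Formula (149): successor (delta = +1) or predecessor (delta = -1)."""
    y = x ^ MU0_BAR
    z = y ^ (y + delta)
    z |= x & (z << 1)
    return x ^ z ^ ((z + 1) >> 2)

def code_neighbors(a):
    """Rules (150) and (151): codes of the n, s, e, w, o neighbors of code a."""
    n, s = a >> 2, a << 2
    e, w = step(s, +1), step(s, -1)
    o = step(n, +1) if a & 1 else step(w, -1)
    return {'n': n, 's': s, 'e': e, 'w': w, 'o': o}

def sign(v):
    return (v > 0) - (v < 0)

def cell_neighbors(a, y):
    """Full names (code, y) of the five neighbors of cell (a, y), using (153)."""
    nb = code_neighbors(a)
    ref = nb['n'] if a & 1 else nb['w']
    d_o = sign(nb['o'] - ref) * ((nb['o'] & ref) == 0)
    delta = {'n': int(a == 0), 's': -int(a == 0), 'e': 0,
             'w': -int(a == 1), 'o': d_o}
    return {d: (nb[d], y + delta[d]) for d in nb}

print({d: format(c, 'b') for d, c in code_neighbors(0b1000).items()})
# {'n': '10', 's': '100000', 'e': '100001', 'w': '100010', 'o': '101001'}
```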

Bitmap graphics. It's fun to write programs that deal with pictures and shapes, because they involve our left and right brains simultaneously. When image data is involved, the results can be engrossing even if there are bugs in our code.

The book you are now reading was typeset by software that treated each page as a gigantic matrix of 0s and 1s, called a "raster" or "bitmap," containing millions of square picture elements called "pixels." The rasters were transmitted to printing machines, causing tiny dots of ink to be placed wherever a 1 appeared in the matrix. Physical properties of ink and paper caused those small clusters of dots to look like smooth curves; but each pixel's basic squareness becomes evident if we enlarge the images tenfold, as in the letter 'A' shown in Fig. 15(a).

With bitwise operations we can achieve special effects like "custering," in which the black pixels disappear when they are surrounded on all sides:

Fig. 15. The letter A, (a) before and (b) after custering.
This operation, introduced by R. A. Kirsch, L. Cahn, C. Ray, and G. E. Urban [Proc. Eastern Joint Computer Conf. 12 (1957), 221–229], can be expressed as
$$\operatorname{custer}(X) = X \mathbin{\&} \neg\bigl((X \Downarrow 1) \mathbin{\&} (X \gg 1) \mathbin{\&} (X \ll 1) \mathbin{\&} (X \Uparrow 1)\bigr), \tag{155}$$
where '$X \Downarrow 1$' and '$X \Uparrow 1$' stand respectively for the result of shifting the bitmap X down or up by one row. Let us write
$$X_N = X \Downarrow 1,\qquad X_W = X \gg 1,\qquad X_E = X \ll 1,\qquad X_S = X \Uparrow 1 \tag{156}$$
for the 1-pixel shifts of a bitmap X. Then, for example, the symbolic expression '$X_N \mathbin{\&} (X_S \mid \bar X_E)$' evaluates to 1 in those pixel positions whose northern neighbor is black, and which also have either a black neighbor on the south side or a white neighbor to the east. With these abbreviations, (155) takes the form
$$\operatorname{custer}(X) = X \mathbin{\&} \overline{(X_N \mathbin{\&} X_W \mathbin{\&} X_E \mathbin{\&} X_S)}, \tag{157}$$
which can also be expressed as $X \mathbin{\&} (\bar X_N \mid \bar X_W \mid \bar X_E \mid \bar X_S)$.

Every pixel has four "rook-neighbors," with which it shares an edge at the top, left, right, or bottom. It also has eight "king-neighbors," with which it shares at least one corner point. For example, the king-neighbors that lie to the northeast of all pixels in a bitmap X can be denoted by $X_{NE}$, which is equivalent to $(X_N)_E$ in pixel algebra. Notice that we also have $X_{NE} = (X_E)_N$.
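A bitmap can be kept in code as a list of row masks. The Python sketch below makes these assumptions: rows are listed top to bottom, column c of a width-w row is stored in bit w − 1 − c (so that ≪ moves the picture left), pixels outside the picture are white, and the names are hypothetical. It computes the four rook-shifts (156) and custering (157).

```python
def rook_shifts(rows, width):
    """Return (X_N, X_W, X_E, X_S) as in (156): each list holds, at every
    pixel position, the color of that pixel's neighbor in the given direction."""
    mask = (1 << width) - 1
    xn = [0] + rows[:-1]                    # bitmap shifted down one row
    xs = rows[1:] + [0]                     # bitmap shifted up one row
    xw = [r >> 1 for r in rows]             # western neighbors moved under each pixel
    xe = [(r << 1) & mask for r in rows]    # eastern neighbors moved under each pixel
    return xn, xw, xe, xs

def custer(rows, width):
    """(157): X & ~(X_N & X_W & X_E & X_S), computed row by row."""
    xn, xw, xe, xs = rook_shifts(rows, width)
    return [x & ~(n & w & e & s) for x, n, w, e, s in zip(rows, xn, xw, xe, xs)]

print([format(r, '05b') for r in custer([0b01110, 0b11111, 0b01110], 5)])
# ['01110', '10001', '01110']
```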
A 3 × 3 cellular automaton is an array of pixels that changes dynamically via a sequence of local transformations, all performed simultaneously: The state of each pixel at time t + 1 depends entirely on its state at time t and the states of its king-neighbors at that time. Thus the automaton defines a sequence of bitmaps $X^{(0)}, X^{(1)}, X^{(2)}, \ldots$ that lead from any given initial state $X^{(0)}$, where
$$X^{(t+1)} = f\bigl(X^{(t)}_{NW}, X^{(t)}_N, X^{(t)}_{NE}, X^{(t)}_W, X^{(t)}, X^{(t)}_E, X^{(t)}_{SW}, X^{(t)}_S, X^{(t)}_{SE}\bigr) \tag{158}$$
and f is any bitwise Boolean function of nine variables. Fascinating patterns often emerge in this way. For example, after Martin Gardner introduced John Conway's game of Life to the world in 1970, more computer time was probably devoted to studying its implications than to any other computational task during the next several years, although the people paying the computer bills were rarely told! (See exercise 167.)
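As one concrete choice of f in (158), Conway's Life can be computed with row masks: add the eight king-neighbor bitmaps with a bit-sliced 3-bit counter, then apply the rule "born with 3 neighbors, survives with 2 or 3." This Python sketch is one way to do it, not necessarily the method of exercise 167. It assumes the same layout as a custering sketch would (rows top to bottom, column c in bit width − 1 − c), treats cells outside the bitmap as permanently white, and uses hypothetical names.

```python
def life_step(rows, width):
    """One generation of Conway's Life on a bounded bitmap of row masks."""
    mask = (1 << width) - 1
    get = lambda r: rows[r] if 0 <= r < len(rows) else 0
    new = []
    for r, here in enumerate(rows):
        above, below = get(r - 1), get(r + 1)
        neighbors = [above >> 1, above, (above << 1) & mask,
                     here >> 1, (here << 1) & mask,
                     below >> 1, below, (below << 1) & mask]
        s0 = s1 = s2 = 0              # bit-sliced count of black neighbors, mod 8
        for b in neighbors:
            c0 = s0 & b
            s0 ^= b
            c1 = s1 & c0
            s1 ^= c0
            s2 ^= c1
        # black next time: count 3 (binary 011), or count 2 (010) and black now
        new.append(~s2 & s1 & (s0 | here) & mask)
    return new

blinker = [0, 0b00100, 0b00100, 0b00100, 0]
print([format(r, '05b') for r in life_step(blinker, 5)])
# ['00000', '00000', '01110', '00000', '00000']
```

Counting modulo 8 is harmless here: a count of 8 wraps to 0, and both mean "no birth, no survival."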
There are $2^{512}$ Boolean functions of nine variables, so there are $2^{512}$ different 3 × 3 cellular automata. Many of them are trivial, but most of them probably have such complicated behavior that they are humanly impossible to understand. Fortunately there also are many cases that do turn out to be useful in practice, and much easier to justify on economic grounds than the simulation of a game.

For example, algorithms for recognizing alphabetic characters, fingerprints, or similar patterns often make use of a "thinning" process, which removes excess black pixels and reduces each component of the image to an underlying skeleton that is comparatively simple to analyze. Several authors have proposed cellular automata for this problem, beginning with D. Rutovitz [J. Royal Stat. Society A129 (1966), 512–513] who suggested a 4 × 4 scheme. But parallel algorithms are notoriously subtle, and flaws tended to turn up after various methods had been published. For example, at least two of the black pixels in a component like [figure] should be removed, yet a symmetrical scheme will erroneously erase all four.

Fig. 16. Example results of Guo and Hall's 3 × 3 automaton for thinning the components of a bitmap. ("Hollow" pixels were originally black.)

A satisfactory solution to the thinning problem was finally found by Z. Guo and R. W. Hall [CACM 32 (1989), 359–373, 759], using a 3 × 3 automaton that invokes alternate rules on odd and even steps. Consider the function
$$f(x_{NW}, x_N, x_{NE}, x_W, x, x_E, x_{SW}, x_S, x_{SE}) = x \wedge \neg g(x_{NW}, \ldots, x_W, x_E, \ldots, x_{SE}), \tag{159}$$
where g = 1 only in the following 37 configurations surrounding a black pixel:

[figure: the 37 configurations]

Then we use (158), but with $f(x_{NW}, x_N, x_{NE}, x_W, x, x_E, x_{SW}, x_S, x_{SE})$ replaced by its 180° rotation $f(x_{SE}, x_S, x_{SW}, x_E, x, x_W, x_{NE}, x_N, x_{NW})$ on even-numbered steps. The process stops when two consecutive cycles make no change.

With this rule Guo and Hall proved that the 3 × 3 automaton will preserve the connectivity structure of the image, in a strong sense that we will discuss below. Furthermore their algorithm obviously leaves an image intact if it is already so thin that it contains no three pixels that are king-neighbors of each other. On the other hand it usually succeeds in "removing the meat off the bones" of each black component, as shown in Fig. 16. Slightly thinner thinning is obtained in certain cases if we add four additional configurations

[figure: the four additional configurations]    (160)

to the 37 listed above. In either case the function g can be evaluated with a Boolean chain of length 25. (See exercises 170–172.)
In general, the black pixels of an image can be grouped into segments or components that are kingwise connected, in the sense that any black pixel can be reached from any other pixel of its component by a sequence of king moves through black pixels. The white pixels also form components, which are rookwise connected: Any two white cells of a component are mutually reachable via rook moves that touch nothing black. It's best to use different kinds of connectedness for white and black, in order to preserve the topological concepts of "inside" and "outside" that are familiar from continuous geometry [see A. Rosenfeld, JACM 17 (1970), 146–160]. If we imagine that the corner points of a raster are black, an infinitely thin black curve can cross between pixels at a corner, but a white curve cannot. (We could also imagine white corner points, which would lead to rookwise connectivity for black and kingwise connectivity for white.)
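Counting components with the two kinds of connectedness is a straightforward breadth-first search. In this Python sketch (names hypothetical), the picture is padded with a white frame so that the background forms a single rookwise component. The check shows that four black pixels in a diamond, touching only at corners, make one black component that encloses a separate white hole, just as the corner-point argument predicts.

```python
from collections import deque

KING = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
ROOK = [(-1, 0), (0, -1), (0, 1), (1, 0)]

def count_components(picture):
    """Return (black, white) component counts for a picture given as strings
    of '0' (white) and '1' (black): black pixels are joined by king moves,
    white pixels by rook moves, and a white frame stands for the background."""
    h, w = len(picture) + 2, len(picture[0]) + 2
    grid = [[0] * w for _ in range(h)]
    for r, line in enumerate(picture):
        for c, ch in enumerate(line):
            grid[r + 1][c + 1] = int(ch)
    seen = [[False] * w for _ in range(h)]
    counts = [0, 0]                        # counts[color]
    for r in range(h):
        for c in range(w):
            if seen[r][c]:
                continue
            color = grid[r][c]
            moves = KING if color else ROOK
            counts[color] += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                i, j = queue.popleft()
                for di, dj in moves:
                    a, b = i + di, j + dj
                    if (0 <= a < h and 0 <= b < w and not seen[a][b]
                            and grid[a][b] == color):
                        seen[a][b] = True
                        queue.append((a, b))
    return counts[1], counts[0]

print(count_components(["010",
                        "101",
                        "010"]))           # (1, 2)
```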
Fig. 17. The shrinking of a Cheshire cat by repeated application of Levialdi's transformation: (a) time = 0; (b) time = 1; (c) time = 3; (d) time = 5; (e) time = 10; (f) time = 20.

An amusing algorithm for shrinking a picture while preserving its connectivity, except that isolated black or white pixels disappear, was presented by S. Levialdi in CACM 15 (1972), 7–10; an equivalent algorithm, but with black and white reversed, had also appeared in T. Beyer's Ph.D. thesis (M.I.T., 1969). The idea is to use a cellular automaton with the simple transition function
$$f(x_{NW}, x_N, x_{NE}, x_W, x, x_E, x_{SW}, x_S, x_{SE}) = (x \wedge (x_W \vee x_{SW} \vee x_S)) \vee (x_W \wedge x_S) \tag{161}$$
at each step. This formula is actually a 2 × 2 rule, but we still need a 3 × 3 window if we want to keep track of the cases when a one-pixel component goes away.
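Levialdi's rule (161) looks only at the west, south, and southwest neighbors, so one step is a few shifts per row. This Python sketch uses row masks (rows top to bottom, column c of a width-w row in bit w − 1 − c, outside pixels white) and hypothetical names.

```python
def levialdi_step(rows):
    """Apply (161) once; x >> 1 places each pixel's western neighbor under it."""
    new = []
    for r, x in enumerate(rows):
        below = rows[r + 1] if r + 1 < len(rows) else 0   # southern neighbors
        xw, xs, xsw = x >> 1, below, below >> 1
        new.append((x & (xw | xsw | xs)) | (xw & xs))
    return new

def steps_to_vanish(rows):
    """Number of applications of (161) until the picture is entirely white."""
    t = 0
    while any(rows):
        rows = levialdi_step(rows)
        t += 1
    return t

print(steps_to_vanish([0b11, 0b11, 0b11]))   # 4
```

In these small cases a solid M × N block needs exactly M + N − 1 steps, matching the bound stated below, and a white pixel enclosed by black is filled in at the first step.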
For example, the 25 × 30 picture of a Cheshire cat in Fig. 17(a) has seven kingwise black components: the outline of its head, the two earholes, the two eyes, the nose, and the smile. The result after one application of (161) is shown in Fig. 17(b): Seven components remain, but there's an isolated point in one ear, and the other earhole will become isolated after the next step. Hence Fig. 17(c) has only five components. After six steps the cat loses its nose, and even the smile will be gone at time 14. Sadly, the last bit of cat will vanish during step 46.

At most M + N − 1 transitions will wipe out any M × N picture, because the lowest visible northwest-to-southeast diagonal line moves relentlessly upward each time. Exercises 176 and 177 prove that different components will never merge together and interfere with each other.

Of course this cubic-time cellular method isn't the fastest way to count or identify the components of a picture. We can actually do that job "online," while looking at a large image one row at a time, not bothering to keep all of the previously seen rows in memory if we don't wish to look at them again.

While we're analyzing the components we might as well also record the relationships between them. Let's assume that only finitely many black pixels are present. Then there's an infinite component of white pixels called the background. Black components adjacent to the background constitute the main objects of the image. And these objects may in turn have holes, which may serve as a background for another level of objects, and so on. Thus the connected components of any finite picture form a hierarchy: an oriented tree, rooted at the background. Black components appear at the odd-numbered levels of this tree, and white components at the even-numbered levels, alternating between kingwise and rookwise connectedness. Each component except the background is surrounded by its parent. Childless components are said to be simply connected.

For example, here are the Cheshire cat's components, labeled with digits for white pixels and letters for the black ones, and the corresponding oriented tree:
    00000000000000000000000A000000
    0000AAA000000000000000AA000000
    0000A1AA0000000000000A11A00000
    000A111AA000AAAAAAA00A11A00000
    000A1B11AAAA1111111AAA111A0000
    000A1B11A111111111111A1C1A0000
    000A1B11A111111111111A1C1A0000
    000A11111111111111111A111A0000
    000A111111111111111111A11A0000
    0000A1111111111111111111A1A000
    00000A11111111111111111111A000
    00000A11DDD1111111EEE11111A000
    00000A1D222D11111E333E1111A000
    AA00A11DDD2D11111EEE3E11111A00
    00AAA11DDDDD111111EEE111111A00
    0000AA1111111F1F11111111111A00
    AAAA111111111F1F1111111111AAAA
    000AAAA11111F11F111111AAAA1A00
    000A111111111FF111111111111A00
    0000A111GG11111111111AAAAAAAAA
    00000A111GGGGG111111111111A000
    000000AA111GGGGGGGGGGG111A0000
    00000000AA11GGGGGGG1111AA00000
    0000000000A111111111AAA0000000
    00000000000AAAAAAAAAA000000000

    Oriented tree (each child indented below its parent):
    0
      A
        1
          B
          C
          D
            2
          E
            3
          F
          G                                                    (162)
During the shrinking process of Fig. 17, components disappear in the order C, {B, 2, 3} (all at time 3), F, E, D, G, 1, A.

Suppose we want to analyze the components of such a picture by reading one row at a time. After we've seen four rows the result-so-far will be

    00000000000000000000000A000000       0
    0000BBB000000000000000AA000000         B
    0000B1BB0000000000000A22A00000           1          (163)
    000B111BB000CCCCCCC00A22A00000         C
                                           A
                                             2

and we'll be ready to scan row five. A comparison of rows four and five will then show that B and C should merge into A, but that new components B and 3 should also be launched. Exercise 179 contains full details about an instructive algorithm that properly updates the current tree as new rows are input. Additional information can also be computed on the fly: For example, we could determine the area of each component, the locations of its first and last pixels, the smallest enclosing rectangle, and/or its center of gravity.
*Filling. Let's complete our quick tour of raster graphics by considering how to fill regions that are bounded by straight lines and/or simple curves. Particularly efficient algorithms are available when the curves are built up from "conic sections" (circles, ellipses, parabolas, or hyperbolas, as in classical geometry).

In keeping with geometric tradition, we shall adopt Cartesian coordinates (x, y) in the following discussion, instead of speaking about rows or columns of pixels: An increase of x will signify a move to the right, while an increase of y will move upward. More significantly, we will focus on the edges between square pixels, instead of on the pixels themselves. Edges run between integer points (x, y) and (x′, y′) of the plane when |x − x′| + |y − y′| = 1. Each pixel (x, y) is bounded by the four edges that join (x, y), (x − 1, y), (x − 1, y − 1), (x, y − 1), and (x, y) in turn. Experience has shown that algorithms for filling contours become simpler and faster when we concentrate on the edge transitions between white and black, instead of on the black pixels of a custerized boundary. (See, for example, the discussion by B. D. Ackland and N. Weste in IEEE Trans. C-30 (1981), 41–47.)
Consider a continuous curve $z(t) = (x(t), y(t))$ that is traced out as t varies from 0 to 1. We assume that the curve doesn't intersect itself for $0 \le t < 1$, and that $z(0) = z(1)$. The famous Jordan curve theorem [C. Jordan, Cours d'analyse 3 (1887), 587–594; O. Veblen, Trans. Amer. Math. Soc. 6 (1905), 83–98] states that every such curve divides the plane into two regions, called the inside and the outside. We can "digitize" z(t) by forcing it to travel along edges between pixels; then we obtain an approximation in which the inside pixels are black and the outside pixels are white. This digitization process essentially replaces the original curve by the sequence of integer points
$$\operatorname{round}(z(t)) = \bigl(\lfloor x(t) + \tfrac12 \rfloor, \lfloor y(t) + \tfrac12 \rfloor\bigr),\qquad\text{for } 0 \le t \le 1. \tag{164}$$
The curve can be perturbed slightly, if necessary, so that z(t) never passes exactly through the center of a pixel. Then the digitized curve takes discrete steps along pixel edges as t grows; and a pixel lies inside the digitization if and only if its center lies inside the original continuous curve $\{z(t) \mid 0 \le t \le 1\}$.

For example, the equations $x(t) = 20\cos 2\pi t$ and $y(t) = 10\sin 2\pi t$ define an ellipse. Its digitization, round(z(t)), starts at (20, 0) when t = 0, then jumps to (20, 1) when $t \approx .008$ and $10\sin 2\pi t = 0.5$. Then it proceeds to the points (20, 2), (19, 2), (19, 3), (19, 4), (18, 4), ..., (20, −1), (20, 0), as t increases through the values .024, .036, .040, .057, .062, ..., .976, .992:

[figure: the digitized ellipse]    (165)
The horizontal edges of such a boundary are conveniently represented by bit vectors $H(y)$ for each $y$; for example, $H(10) = \ldots 000000111111111111000000 \ldots$ and $H(9) = \ldots 011111000000000000111110 \ldots$ in (165). If the ellipse is filled with black to obtain a bitmap $B$, the $H$ vectors mark transitions between black and white, so we have the symbolic relation

$$H = B \oplus (B \text{ shifted by one row}), \qquad\text{that is,}\qquad H(y) = B(y) \oplus B(y-1). \qquad (166)$$

Conversely, it's easy to obtain $B$ when the $H$ vectors are given:

$$B(y) = H(y_{\max}) \oplus H(y_{\max}-1) \oplus \cdots \oplus H(y+1) = H(y_{\min}) \oplus H(y_{\min}+1) \oplus \cdots \oplus H(y). \qquad (167)$$

Notice that $H(y_{\min}) \oplus H(y_{\min}+1) \oplus \cdots \oplus H(y_{\max})$ is the zero vector, because each bitmap is white at both top and bottom. Notice further that the analogous vertical edge vectors $V(x)$ are redundant: They satisfy the formulas $V = B \oplus (B \ll 1)$ and $B = V^{\oplus}$ (see exercise 36), but we need not bother to keep track of them.
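Relations (166) and (167), and the companion formulas for $V$, can be checked on the ellipse of (165). In the sketch below, which is an illustration rather than anything prescribed by the text, a row is packed into a Python integer with the pixel square $[c, c+1]$ at bit `XMAX - c`, and row `B[y]` holds the pixels lying between grid lines $y$ and $y+1$; both conventions, the constant `XMAX`, and the helper names are assumptions. Inside and outside are decided exactly with the integer quadratic form $4x^2 + 16y^2 - 4x - 16y - 1595$ that this section derives for the same ellipse.

```python
XMAX = 21               # hypothetical width: columns c = -XMAX .. XMAX-1
Y_LO, Y_HI = -11, 10    # rows B[Y_LO] and B[Y_HI] are entirely white

def inside(c, y):
    """Is the pixel centered at (c + 1/2, y + 1/2) inside the ellipse?"""
    X, Y = c + 1, y + 1  # 1600 * F(c + 1/2, y + 1/2) equals Q(c + 1, y + 1)
    return 4*X*X + 16*Y*Y - 4*X - 16*Y - 1595 < 0

def row(y):
    """Bitmap row B(y) as an integer bit vector."""
    bits = 0
    for c in range(-XMAX, XMAX):
        if inside(c, y):
            bits |= 1 << (XMAX - c)
    return bits

def prefix_xor(v, width=64):
    """The x-xor of exercise 36: bit k becomes v_k ^ ... ^ v_1 ^ v_0."""
    s = 1
    while s < width:
        v ^= v << s
        s <<= 1
    return v & ((1 << width) - 1)

B = {y: row(y) for y in range(Y_LO, Y_HI + 1)}
H = {y: B[y] ^ B[y - 1] for y in range(Y_LO + 1, Y_HI + 1)}   # (166)

# (167): B(y) is the XOR of H(ymin), ..., H(y).
acc = 0
for y in range(Y_LO + 1, Y_HI + 1):
    acc ^= H[y]
    assert acc == B[y]
assert acc == 0          # all the H vectors XOR to zero

# Vertical edges are redundant: V = B ^ (B << 1), and B is the prefix XOR of V.
for y, b in B.items():
    assert prefix_xor(b ^ (b << 1)) == b

print(format(H[10], "b"))   # the run of twelve 1s seen in H(10)
```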
Conic sections are easier to deal with than most other curves, because we can readily eliminate the parameter $t$. For example, the ellipse that led to (165) can be defined by the equation $(x/20)^2 + (y/10)^2 = 1$, instead of using sines and cosines. Therefore pixel $(x,y)$ should be black if and only if its center point $(x - \frac12, y - \frac12)$ lies inside the ellipse, if and only if $(x - \frac12)^2/400 + (y - \frac12)^2/100 - 1 < 0$.

In general, every conic section is the set of points for which $F(x,y) = 0$, when $F$ is an appropriate quadratic form. Therefore there's a quadratic form

$$Q(x,y) = F(x - \tfrac12, y - \tfrac12) = ax^2 + bxy + cy^2 + dx + ey + f \qquad (168)$$

that is negative at the integer point $(x,y)$ if and only if pixel $(x,y)$ lies on a given side of the digitized curve.
For practical purposes we may assume that the coefficients $(a, b, \ldots, f)$ of $Q$ are not-too-large integers. Then we're in luck, because the exact value of $Q(x,y)$ is easy to compute. In fact, as pointed out by M. L. V. Pitteway [Comp. J. 10 (1967), 282–289], there's a nice "three-register algorithm" by which we can quickly track the boundary points: Let $x$ and $y$ be integers, and suppose we've got the values of $Q(x,y)$, $Q_x(x,y)$, and $Q_y(x,y)$ in three registers $(Q, Q_x, Q_y)$, where

$$Q_x(x,y) = 2ax + by + d \qquad\text{and}\qquad Q_y(x,y) = bx + 2cy + e \qquad (169)$$

are $\partial Q/\partial x$ and $\partial Q/\partial y$. We can then move to any adjacent integer point, because

$$\begin{aligned}
Q(x \pm 1, y) &= Q(x,y) \pm Q_x(x,y) + a, & Q(x, y \pm 1) &= Q(x,y) \pm Q_y(x,y) + c,\\
Q_x(x \pm 1, y) &= Q_x(x,y) \pm 2a, & Q_x(x, y \pm 1) &= Q_x(x,y) \pm b,\\
Q_y(x \pm 1, y) &= Q_y(x,y) \pm b, & Q_y(x, y \pm 1) &= Q_y(x,y) \pm 2c.
\end{aligned} \qquad (170)$$
Furthermore we can divide the contour into separate pieces, in each of which $x(t)$ and $y(t)$ are both monotonic. For example, when the ellipse (165) travels from $(20,0)$ to $(0,10)$, the value of $x$ decreases while $y$ increases; thus we need only move from $(x,y)$ to $(x-1,y)$ or to $(x,y+1)$. If registers $(Q, R, S)$ respectively hold $(Q, Q_x - a, Q_y + c)$, a move to $(x-1,y)$ simply sets $Q \gets Q - R$, $R \gets R - 2a$, and $S \gets S - b$; a move to $(x,y+1)$ is just as quick. With care, this idea leads to a blindingly fast way to discover the correctly digitized edges of almost any conic curve.
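The register updates can be checked mechanically against direct evaluation of (168) and (169). The Python sketch below keeps $(Q, R, S) = (Q, Q_x - a, Q_y + c)$ as in the preceding paragraph and applies the two moves; the up-move rules ($Q \gets Q + S$, $R \gets R + b$, $S \gets S + 2c$) are what (170) gives for a step to $(x, y+1)$. The function names and the sample walk are illustrative choices, and the coefficients are those of the integerized ellipse form $4x^2 + 16y^2 - 4x - 16y - 1595$ used later in this section.

```python
A, B, C, D, E, F = 4, 0, 16, -4, -16, -1595   # coefficients (a, b, c, d, e, f)

def Q(x, y):
    return A*x*x + B*x*y + C*y*y + D*x + E*y + F

def Qx(x, y):            # partial derivative with respect to x, as in (169)
    return 2*A*x + B*y + D

def Qy(x, y):            # partial derivative with respect to y, as in (169)
    return B*x + 2*C*y + E

def walk(x, y, moves):
    """Follow moves 'L' (x - 1) and 'U' (y + 1) using only additions."""
    q, r, s = Q(x, y), Qx(x, y) - A, Qy(x, y) + C
    for m in moves:
        if m == "L":
            x, q, r, s = x - 1, q - r, r - 2*A, s - B
        else:
            y, q, r, s = y + 1, q + s, r + B, s + 2*C
        # The three registers must agree with direct evaluation.
        assert (q, r, s) == (Q(x, y), Qx(x, y) - A, Qy(x, y) + C)
    return x, y, q

print(walk(20, 1, "UULUULUL"))   # (17, 6, -27)
```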
For example, the quadratic form $Q(x,y)$ for ellipse (165) is $4x^2 + 16y^2 - (4x + 16y + 1595)$, when we integerize its coefficients. We have $Q(20,0) = F(19.5, -0.5) = -75$ and $Q(21,0) = +85$; therefore pixel $(20,0)$, whose center is $(19.5, -0.5)$, is inside the ellipse, but pixel $(21,0)$ isn't. Let's zoom in closer:

       y \ x     18      19      20      21
         5      -51      93     245     405
         4     -179     -35     117     277
         3     -275    -131      21     181
         2     -339    -195     -43     117
         1     -371    -227     -75      85
         0     -371    -227     -75      85          (171)
The boundary can be deduced without examining $Q$ at very many points. In fact, we don't need to look at $Q(21,0)$, because we know that all edges between $(20,0)$ and $(0,10)$ must go either upwards or to the left. First we test $Q(20,1)$ and find it negative ($-75$); so we move up. Also $Q(20,2)$ is negative ($-43$), so we go up again. Then we test $Q(20,3)$, and find it positive ($21$); so we move left. And so on. Only the $Q$ values $-75$, $-43$, $21$, $-131$, $-35$, $93$, $-51$, … actually need to be examined, if we've set the three-register method up properly.
Algorithm T (Three-register algorithm for conics). Given two integer points $(x,y)$ and $(x',y')$, and an integer quadratic form $Q$ as in (168), this algorithm decides how to digitize a portion of the conic section defined by $F(x,y) = 0$, where $F(x,y) = Q(x + \frac12, y + \frac12)$. It creates $|x' - x|$ horizontal edges and $|y' - y|$ vertical edges, which form a path from $(x,y)$ to $(x',y')$. We assume that
i) Real-valued points $(\xi,\eta)$ and $(\xi',\eta')$ exist such that $F(\xi,\eta) = F(\xi',\eta') = 0$.
ii) The curve travels from $(\xi,\eta)$ to $(\xi',\eta')$ monotonically in both coordinates.
iii) $x = \lfloor \xi + \frac12 \rfloor$, $y = \lfloor \eta + \frac12 \rfloor$, $x' = \lfloor \xi' + \frac12 \rfloor$, and $y' = \lfloor \eta' + \frac12 \rfloor$.
iv) If we traverse the curve from $(\xi,\eta)$ to $(\xi',\eta')$, we see $F < 0$ on our left.
v) No edge of the integer grid contains two roots of $Q$ (see exercise 183).
T1. [Initialize.] If $x = x'$, go to T11; if $y = y'$, go to T10. If $x < x'$ and $y < y'$, set $Q \gets Q(x+1, y+1)$, $R \gets Q_x(x+1, y+1) + a$, $S \gets Q_y(x+1, y+1) + c$, and go to T2. If $x < x'$ and $y > y'$, set $Q \gets Q(x+1, y)$, $R \gets Q_x(x+1, y) + a$, $S \gets Q_y(x+1, y) - c$, and go to T3. If $x > x'$ and $y < y'$, set $Q \gets Q(x, y+1)$, $R \gets Q_x(x, y+1) - a$, $S \gets Q_y(x, y+1) + c$, and go to T4. If $x > x'$ and $y > y'$, set $Q \gets Q(x,y)$, $R \gets Q_x(x,y) - a$, $S \gets Q_y(x,y) - c$, and go to T5.
T2. [Right or up.] If $Q < 0$, do T9; otherwise do T6. Repeat until interrupted.
T3. [Down or right.] If $Q < 0$, do T7; otherwise do T9. Repeat until interrupted.
T4. [Up or left.] If $Q < 0$, do T6; otherwise do T8. Repeat until interrupted.
T5. [Left or down.] If $Q < 0$, do T8; otherwise do T7. Repeat until interrupted.
T6. [Move up.] Create the edge from $(x,y)$ to $(x,y+1)$, then set $y \gets y + 1$. Interrupt to T10 if $y = y'$; otherwise set $Q \gets Q + S$, $R \gets R + b$, $S \gets S + 2c$.
T7. [Move down.] Create the edge from $(x,y)$ to $(x,y-1)$, then set $y \gets y - 1$. Interrupt to T10 if $y = y'$; otherwise set $Q \gets Q - S$, $R \gets R - b$, $S \gets S - 2c$.
T8. [Move left.] Create the edge from $(x,y)$ to $(x-1,y)$, then set $x \gets x - 1$. Interrupt to T11 if $x = x'$; otherwise set $Q \gets Q - R$, $R \gets R - 2a$, $S \gets S - b$.
T9. [Move right.] Create the edge from $(x,y)$ to $(x+1,y)$, then set $x \gets x + 1$. Interrupt to T11 if $x = x'$; otherwise set $Q \gets Q + R$, $R \gets R + 2a$, $S \gets S + b$.
T10. [Finish horizontally.] While $x < x'$, create the edge from $(x,y)$ to $(x+1,y)$ and set $x \gets x + 1$. While $x > x'$, create the edge from $(x,y)$ to $(x-1,y)$ and set $x \gets x - 1$. Terminate the algorithm.
T11. [Finish vertically.] While $y < y'$, create the edge from $(x,y)$ to $(x,y+1)$ and set $y \gets y + 1$. While $y > y'$, create the edge from $(x,y)$ to $(x,y-1)$ and set $y \gets y - 1$. Terminate the algorithm.
For example, when this algorithm is invoked with $(x,y) = (20,0)$, $(x',y') = (0,10)$, and $Q(x,y) = 4x^2 + 16y^2 - 4x - 16y - 1595$, it will create the edges

$$(20,0) \to (20,1) \to (20,2) \to (19,2) \to (19,3) \to (19,4) \to (18,4) \to (18,5) \to (17,5) \to (17,6) \to \cdots \to (6,9) \to (6,10),$$

then make a beeline for $(0,10)$. (See (165) and (171).) Exercise 182 explains why it works.
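Algorithm T translates almost line for line into code. The Python sketch below is an illustrative transcription, not part of the text: it returns the list of grid points visited instead of "creating" edges, and the names `algorithm_t`, `Q`, `Qx`, and `Qy` are hypothetical. On the ellipse example it produces a path that begins $(20,0), (20,1), (20,2), (19,2), \ldots$, makes its last vertical step from $(6,9)$ to $(6,10)$, and then runs straight along $y = 10$ to $(0,10)$.

```python
def algorithm_t(x, y, x1, y1, a, b, c, d, e, f):
    """Digitize a monotone piece of the conic Q = 0 from (x, y) to (x1, y1).

    Returns the grid points visited; consecutive points are joined by edges.
    """
    def Q(u, v):
        return a*u*u + b*u*v + c*v*v + d*u + e*v + f

    def Qx(u, v):
        return 2*a*u + b*v + d

    def Qy(u, v):
        return b*u + 2*c*v + e

    path = [(x, y)]
    if x != x1 and y != y1:
        # T1: load the registers and remember which move each sign selects.
        if x < x1 and y < y1:                       # T2: right or up
            q, r, s = Q(x+1, y+1), Qx(x+1, y+1) + a, Qy(x+1, y+1) + c
            if_neg, if_pos = "R", "U"
        elif x < x1:                                # T3: down or right
            q, r, s = Q(x+1, y), Qx(x+1, y) + a, Qy(x+1, y) - c
            if_neg, if_pos = "D", "R"
        elif y < y1:                                # T4: up or left
            q, r, s = Q(x, y+1), Qx(x, y+1) - a, Qy(x, y+1) + c
            if_neg, if_pos = "U", "L"
        else:                                       # T5: left or down
            q, r, s = Q(x, y), Qx(x, y) - a, Qy(x, y) - c
            if_neg, if_pos = "L", "D"
        while True:
            move = if_neg if q < 0 else if_pos
            if move == "U":                         # T6
                y += 1
                path.append((x, y))
                if y == y1:
                    break
                q, r, s = q + s, r + b, s + 2*c
            elif move == "D":                       # T7
                y -= 1
                path.append((x, y))
                if y == y1:
                    break
                q, r, s = q - s, r - b, s - 2*c
            elif move == "L":                       # T8
                x -= 1
                path.append((x, y))
                if x == x1:
                    break
                q, r, s = q - r, r - 2*a, s - b
            else:                                   # T9
                x += 1
                path.append((x, y))
                if x == x1:
                    break
                q, r, s = q + r, r + 2*a, s + b
    # T10 and T11: at most one coordinate still differs; finish in a straight line.
    while x != x1:
        x += 1 if x < x1 else -1
        path.append((x, y))
    while y != y1:
        y += 1 if y < y1 else -1
        path.append((x, y))
    return path

edges = algorithm_t(20, 0, 0, 10, 4, 0, 16, -4, -16, -1595)
print(edges[:10])
```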
Movement to the right in step T9 is conveniently implemented by setting $H(y) \gets H(y) \oplus (1 \ll (x_{\max} - x))$ before $x$ is increased, using the $H$ vectors of (166) and (167). Movement to the left is similar, but we set $x \gets x - 1$ first. Step T10 could set

$$H(y) \gets H(y) \oplus \bigl((1 \ll (x_{\max} + 1 - \min(x,x'))) - (1 \ll (x_{\max} + 1 - \max(x,x')))\bigr); \qquad (172)$$

but one move at a time might be just as good, because $|x' - x|$ is often small. Movement up or down needs no action, because vertical edges are redundant.
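Here is a small sketch of that bookkeeping, under the convention that the edge between $(x,y)$ and $(x+1,y)$ toggles bit $x_{\max} - x$ of $H(y)$. The helper names and the dictionary representation of the $H$ vectors are assumptions made for illustration. A run of horizontal edges between $x$ and $x'$ then toggles the $|x' - x|$ consecutive bits from $x_{\max} + 1 - \max(x,x')$ through $x_{\max} - \min(x,x')$, and the loop at the end checks that single mask against one-edge-at-a-time updates.

```python
def toggle_edge(H, y, x_low, xmax):
    """Record the horizontal edge from (x_low, y) to (x_low + 1, y).

    A right move calls this before x += 1; a left move does x -= 1 first.
    """
    H[y] = H.get(y, 0) ^ (1 << (xmax - x_low))

def toggle_run(H, y, x, x1, xmax):
    """Record all |x1 - x| horizontal edges between (x, y) and (x1, y) at once."""
    lo, hi = min(x, x1), max(x, x1)
    H[y] = H.get(y, 0) ^ ((1 << (xmax + 1 - lo)) - (1 << (xmax + 1 - hi)))

# Compare the one-shot mask with moving one step at a time.
XMAX = 40
for x in range(XMAX + 1):
    for x1 in range(XMAX + 1):
        H1, H2 = {}, {}
        toggle_run(H1, 0, x, x1, XMAX)
        for x_low in range(min(x, x1), max(x, x1)):
            toggle_edge(H2, 0, x_low, XMAX)
        assert H1.get(0, 0) == H2.get(0, 0)
```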
Notice that the algorithm runs somewhat faster in the special case when $b = 0$; circles always belong to this case. The even more special case of straight lines, when $a = b = c = 0$, is of course faster yet; then we have a simple one-register algorithm (see exercise 185).

[Fig. 18. Pixels change from white to black and back again, at the edges of digitized circles.]

When many contours are filled in the same image, using $H$ vectors, the pixel values change between black and white whenever we cross an odd number of edges. Figure 18 illustrates a tiling of the hyperbolic plane by equilateral $45^\circ$-$45^\circ$-$45^\circ$ triangles, obtained by superimposing the results of several hundred applications of Algorithm T.
[Fig. 19. Squines that define the outline contour of an `S'.]

Algorithm T applies only to conic curves. But that's not really a limitation in practice, because just about every shape we ever need to draw can be well approximated by "piecewise conics" called quadratic Bézier splines or squines. For example, Fig. 19 shows a typical squine curve with 40 points $(z_0, z_1, \ldots, z_{39}, z_{40})$, where $z_{40} = z_0$. The even-numbered points $(z_0, z_2, \ldots, z_{40})$ lie on the curve; the others, $(z_1, z_3, \ldots, z_{39})$, are called "control points," because they regulate local bending and flexing. Each section $S(z_{2j}, z_{2j+1}, z_{2j+2})$ begins at point $z_{2j}$, traveling in direction $z_{2j+1} - z_{2j}$. It ends at point $z_{2j+2}$, traveling in direction $z_{2j+2} - z_{2j+1}$. Thus if $z_{2j}$ lies on the straight line from $z_{2j-1}$ to $z_{2j+1}$, the squine passes smoothly through point $z_{2j}$ without changing direction.
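Exercise 186 gives the precise definition of $S(z_{2j}, z_{2j+1}, z_{2j+2})$. As a sketch only, the code below assumes the standard quadratic Bézier parametrization $(1-t)^2 z_0 + 2t(1-t) z_1 + t^2 z_2$, with points represented as Python complex numbers and with hypothetical function names. It illustrates the tangent-direction property just described: the velocity is $2(z_1 - z_0)$ at $t = 0$ and $2(z_2 - z_1)$ at $t = 1$, so two sections meet smoothly when the shared on-curve point lies on the segment between the neighboring control points.

```python
def squine_point(z0, z1, z2, t):
    """Point at parameter t (0 <= t <= 1) on the assumed quadratic Bezier section."""
    return (1 - t)**2 * z0 + 2 * t * (1 - t) * z1 + t**2 * z2

def squine_velocity(z0, z1, z2, t):
    """Derivative of squine_point with respect to t."""
    return 2 * (1 - t) * (z1 - z0) + 2 * t * (z2 - z1)

# Two consecutive sections sharing the on-curve point z2.
z0, z1, z2, z3, z4 = 0j, 1 + 1j, 2 + 1j, 3 + 1j, 4 + 0j
end_dir = squine_velocity(z0, z1, z2, 1)     # equals 2*(z2 - z1)
start_dir = squine_velocity(z2, z3, z4, 0)   # equals 2*(z3 - z2)
# z2 lies on the segment from z1 to z3, so the two directions are parallel.
print(end_dir, start_dir)
```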
Exercise 186 defines $S(z_{2j}, z_{2j+1}, z_{2j+2})$ precisely, and exercise 187 explains how to digitize any squine curve using Algorithm T. The region inside the digitized edges can then be filled with black pixels.

Incidentally, the task of drawing lines and curves on a bitmap turns out to be much more difficult than the task of filling a digitized contour, because we want diagonal strokes to have the same apparent thickness as vertical and horizontal strokes do. An excellent solution to the line-drawing problem was found by John D. Hobby, JACM 36 (1989), 209–229.
*Branchless computation. Modern computers tend to slow down when a program contains conditional branch instructions, because an uncertain flow of control can interfere with predictive lookahead circuitry. Therefore we've used MMIX's conditional-set instructions like CSNZ in programs like (56). Indeed, the four instructions `SRU z,y,16; ADD t,lam,16; CSNZ y,q,z; CSNZ lam,q,t' found in (56) are probably faster than their three-instruction counterpart

    BZ q,@+12; SRU y,y,16; ADD lam,lam,16                     (173)

when the actual running time is measured on a highly pipelined machine, even though the rule-of-thumb cost of (173) is only $3\upsilon$ according to Table 1.3.1–1.
Bitwise operations can help diminish the need for costly branching. For example, if MMIX didn't have a CSNZ instruction we could write

    NEG m,q; OR m,m,q; SR m,m,63;
    SRU t,y,16; XOR t,t,y; AND t,t,m; XOR y,y,t;               (174)
    ADD t,lam,16; XOR t,t,lam; AND t,t,m; XOR lam,lam,t;

here the first line creates the mask $m = -[q \ne 0]$. On some computers these eleven branchless instructions would still run faster than the three instructions in (173).
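To see why (174) does the job, it helps to simulate 64-bit registers. The Python sketch below (an illustration with hypothetical helper names) models `NEG`, signed `SR`, and unsigned `SRU` on register values $0 \le v < 2^{64}$, and the result can be compared with the branching version (173) for a few values of `q`.

```python
MASK64 = (1 << 64) - 1

def to_signed(v):
    """Interpret a 64-bit register value as a two's-complement integer."""
    return v - (1 << 64) if v >> 63 else v

def step_174(q, y, lam):
    m = (-q) & MASK64                   # NEG m,q
    m |= q                              # OR m,m,q   (sign bit set iff q != 0)
    m = (to_signed(m) >> 63) & MASK64   # SR m,m,63  (all 1s iff q != 0)
    t = y >> 16                         # SRU t,y,16
    t ^= y                              # XOR t,t,y
    t &= m                              # AND t,t,m
    y ^= t                              # XOR y,y,t  (y becomes y >> 16 iff q != 0)
    t = (lam + 16) & MASK64             # ADD t,lam,16
    t ^= lam                            # XOR t,t,lam
    t &= m                              # AND t,t,m
    lam ^= t                            # XOR lam,lam,t
    return y, lam

def step_173(q, y, lam):
    if q != 0:                          # BZ q,@+12 skips both when q = 0
        y >>= 16
        lam = (lam + 16) & MASK64
    return y, lam

print(step_174(5, 0x123456789ABCDEF0, 7) == step_173(5, 0x123456789ABCDEF0, 7))
```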
The inner loop of a merge sort algorithm provides an instructive example. Suppose we want to do the following operations repeatedly:
If $x_i < y_j$, set $z_k \gets x_i$, $i \gets i + 1$, and go to x_done if $i = i_{\max}$.
Otherwise set $z_k \gets y_j$, $j \gets j + 1$, and go to y_done if $j = j_{\max}$.
Then set $k \gets k + 1$ and go to z_done if $k = k_{\max}$.
If we implement them in the "obvious" way, four conditional branches are involved, three of which are active on each path through the loop:
    1H  CMP  t,xi,yj; BNN t,2F     Branch if xi ≥ yj.
        STO  xi,zbase,kk           zk ← xi.
        ADD  ii,ii,8               i ← i + 1.
        BZ   ii,X_Done             To x_done if i = imax.
        LDO  xi,xbase,ii           Load xi into register xi.
        JMP  3F                    Join the other branch.
    2H  STO  yj,zbase,kk           zk ← yj.
        ADD  jj,jj,8               j ← j + 1.
        BZ   jj,Y_Done             To y_done if j = jmax.
        LDO  yj,ybase,jj           Load yj into register yj.
    3H  ADD  kk,kk,8               k ← k + 1.
        PBNZ kk,1B                 Repeat if k ≠ kmax.
        JMP  Z_Done                To z_done.
(Here ii $= 8(i - i_{\max})$, jj $= 8(j - j_{\max})$, and kk $= 8(k - k_{\max})$; the factor of 8 is needed because $x_i$, $y_j$, and $z_k$ are octabytes.) Those four branches can be reduced to just one:
    1H  CMP  t,xi,yj               t ← sign(xi − yj).
        CSN  yj,t,xi               yj ← min(xi, yj).
        STO  yj,zbase,kk           zk ← yj.
        AND  t,t,8                 t ← 8[xi < yj].
        ADD  ii,ii,t               i ← i + [xi < yj].
        LDO  xi,xbase,ii           Load xi into register xi.
        XOR  t,t,8                 t ← t ⊕ 8.
        ADD  jj,jj,t               j ← j + [xi ≥ yj].
        LDO  yj,ybase,jj           Load yj into register yj.
        ADD  kk,kk,8               k ← k + 1.
        AND  u,ii,jj; AND u,u,kk   u ← ii & jj & kk.
        PBN  u,1B                  Repeat if i < imax, j < jmax, and k < kmax.
When the loop stops in this version, we can readily decide whether to continue at x_done, y_done, or z_done. These instructions load both $x_i$ and $y_j$ from memory each time, but the redundant value will already be present in the cache.
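The two loops can be compared by simulating them. In the Python sketch below (illustrative only), `merge_branchless` follows the one-branch version step by step: a three-way comparison, a minimum, increments derived from `t & 8`, and a single sign test on the AND of the three negative offsets. The function names, the counting of offsets in items rather than bytes, and the padding element that stands in for the harmless extra load at $i = i_{\max}$ or $j = j_{\max}$ are assumptions; both inputs are assumed to be nonempty.

```python
def merge_branching(xs, ys, kmax):
    """The 'obvious' loop; returns (z, exit_label)."""
    i = j = 0
    z = []
    while True:
        if xs[i] < ys[j]:
            z.append(xs[i])
            i += 1
            if i == len(xs):
                return z, "x_done"
        else:
            z.append(ys[j])
            j += 1
            if j == len(ys):
                return z, "y_done"
        if len(z) == kmax:
            return z, "z_done"

def merge_branchless(xs, ys, kmax):
    """Mimics the one-branch loop, with offsets ii, jj, kk kept negative."""
    X, Y = xs + [0], ys + [0]                # padding: the extra load is harmless
    ii, jj, kk = -len(xs), -len(ys), -kmax   # offsets counted in items, not bytes
    z = []
    xi, yj = X[0], Y[0]
    while True:
        t = (xi > yj) - (xi < yj)            # CMP: sign(xi - yj)
        yj = min(xi, yj)                     # CSN yj,t,xi
        z.append(yj)                         # STO
        di = (t & 8) >> 3                    # [xi < yj], from t & 8
        ii += di
        xi = X[len(xs) + ii]                 # LDO xi
        jj += di ^ 1                         # [xi >= yj]
        yj = Y[len(ys) + jj]                 # LDO yj
        kk += 1
        if (ii & jj & kk) >= 0:              # PBN u,1B falls through
            break
    if ii == 0:
        return z, "x_done"
    if jj == 0:
        return z, "y_done"
    return z, "z_done"

print(merge_branchless([1, 3, 5, 7], [2, 4, 6, 8], 8))
```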
*More applications of MOR and MXOR. Let's finish off our study of bitwise manipulation by taking a look at two operations that are specifically designed for 64-bit work. MMIX's instructions MOR and MXOR, which essentially carry out matrix multiplication on $8 \times 8$ Boolean matrices, turn out to be extremely flexible and powerful, both by themselves and in combination with other bitwise operations.
If $x = (x_7 \ldots x_1 x_0)_{256}$ is an octabyte and $a = (a_7 \ldots a_1 a_0)_2$ is a single byte, the instruction MOR t,x,a sets $t \gets a_7 x_7 \mathbin{|} \cdots \mathbin{|} a_1 x_1 \mathbin{|} a_0 x_0$, while MXOR t,x,a sets $t \gets a_7 x_7 \oplus \cdots \oplus a_1 x_1 \oplus a_0 x_0$. For example, MOR t,x,2 and MXOR t,x,2 both set $t \gets x_1$; MOR t,x,3 sets $t \gets x_1 \mathbin{|} x_0$; and MXOR t,x,3 sets $t \gets x_1 \oplus x_0$.
In general, of course, MOR and MXOR are functions of octabytes. When $y = (y_7 \ldots y_1 y_0)_{256}$ is a general octabyte, the instruction MOR t,x,y produces the octabyte $t$ whose $j$th byte $t_j$ is the result of MOR applied to $x$ and $y_j$.
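A direct, deliberately slow emulation makes these definitions concrete. The Python functions below (hypothetical names, written from the description above rather than from the MMIX hardware specification) compute byte $t_j$ as the OR or XOR of those bytes $x_k$ for which bit $k$ of $y_j$ is 1.

```python
def byte(v, k):
    """Byte k of the octabyte v (byte 0 is least significant)."""
    return (v >> (8 * k)) & 0xFF

def mor(x, y):
    """t_j = OR of all x_k such that bit k of y_j is 1."""
    t = 0
    for j in range(8):
        yj, tj = byte(y, j), 0
        for k in range(8):
            if (yj >> k) & 1:
                tj |= byte(x, k)
        t |= tj << (8 * j)
    return t

def mxor(x, y):
    """t_j = XOR of all x_k such that bit k of y_j is 1."""
    t = 0
    for j in range(8):
        yj, tj = byte(y, j), 0
        for k in range(8):
            if (yj >> k) & 1:
                tj ^= byte(x, k)
        t |= tj << (8 * j)
    return t

x = 0x0C0A                   # x1 = #0c, x0 = #0a, all other bytes zero
print(hex(mor(x, 2)), hex(mor(x, 3)), hex(mxor(x, 3)))   # prints 0xc 0xe 0x6
```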
Suppose $x = -1 =$ #ffffffffffffffff. Then MOR t,x,y computes the mask $t$ in which byte $t_j$ is #ff whenever $y_j \ne 0$, while $t_j$ is zero when $y_j = 0$. This simple special case is quite useful, because it accomplishes in just one instruction what we previously needed seven operations to achieve in situations like (92).
We observed in (66) that two MORs will suffice to reverse the bits of any 64-bit word, and many other important bit permutations also become easy when MOR is in a computer's repertoire. Suppose $\pi$ is a permutation of $\{0, 1, \ldots, 7\}$ that takes $0 \mapsto 0\pi$, $1 \mapsto 1\pi$, …, $7 \mapsto 7\pi$. Then the octabyte $p = (2^{7\pi} \ldots 2^{1\pi} 2^{0\pi})_{256}$ corresponds to a permutation matrix that makes MOR do nice tricks: MOR t,x,p will permute the bytes of $x$, setting $t_j \gets x_{j\pi}$. Furthermore, MOR u,p,y will permute the bits of each byte of $y$, according to the inverse permutation; it sets $u_j \gets (a_{7\pi^{-1}} \ldots a_{1\pi^{-1}} a_{0\pi^{-1}})_2$ when $y_j = (a_7 \ldots a_1 a_0)_2$.
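Both claims can be checked with the same kind of emulation, repeated here so that the sketch stands alone; `perm_octabyte` and the sample permutation are illustrative, and the list `pi` holds $j\pi$ in position $j$. Byte $j$ of $p$ has the single bit $j\pi$ set, so MOR with $p$ as selector picks $x_{j\pi}$, while MOR with $p$ as the row operand sends bit $k$ of each byte to position $k\pi$.

```python
def byte(v, k):
    return (v >> (8 * k)) & 0xFF

def mor(x, y):
    """t_j = OR of the bytes x_k for which bit k of y_j is 1."""
    t = 0
    for j in range(8):
        tj = 0
        for k in range(8):
            if (byte(y, j) >> k) & 1:
                tj |= byte(x, k)
        t |= tj << (8 * j)
    return t

def perm_octabyte(pi):
    """p = (2^(7 pi) ... 2^(1 pi) 2^(0 pi))_256."""
    return sum((1 << pi[j]) << (8 * j) for j in range(8))

pi = [3, 0, 7, 1, 6, 2, 5, 4]
p = perm_octabyte(pi)
x = int.from_bytes(bytes(0x10 + k for k in range(8)), "little")   # x_k = #10 + k
t = mor(x, p)            # byte j of t is x_{j pi}
u = mor(p, 0b00000101)   # y_0 has bits 0 and 2 set; they move to bits 0pi and 2pi
print(hex(u))            # prints 0x88
```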
With a little more skullduggery we can also expedite further permutations such as the perfect shuffle (76), which transforms a given octabyte $z = 2^{32}x + y = (x_{31} \ldots x_1 x_0 y_{31} \ldots y_1 y_0)_2$ into the "zippered" octabyte

$$w = x \ddagger y = (x_{31} y_{31} \ldots x_1 y_1 x_0 y_0)_2. \qquad (175)$$

With appropriate permutation matrices $p$, $q$, and $r$, the intermediate results

$$t = (x_{31}x_{27}x_{30}x_{26}x_{29}x_{25}x_{28}x_{24}\,y_{31}y_{27}y_{30}y_{26}y_{29}y_{25}y_{28}y_{24} \ldots x_7x_3x_6x_2x_5x_1x_4x_0\,y_7y_3y_6y_2y_5y_1y_4y_0)_2, \qquad (176)$$

$$u = (y_{27}y_{31}y_{26}y_{30}y_{25}y_{29}y_{24}y_{28}\,x_{27}x_{31}x_{26}x_{30}x_{25}x_{29}x_{24}x_{28} \ldots y_3y_7y_2y_6y_1y_5y_0y_4\,x_3x_7x_2x_6x_1x_5x_0x_4)_2 \qquad (177)$$

can be computed quickly via the four instructions

    MOR t,z,p; MOR t,q,t; MOR u,t,r; MOR u,r,u;                (178)

see exercise 204. So there's a mask $m$ for which `PUT rM,m; MUX w,t,u' completes the perfect shuffle in just six cycles altogether. By contrast, the traditional method in exercise 53 requires 30 cycles (five δ-swaps).
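For comparison with the MOR approach, here is a sketch of the zippering (175) done two ways: a bit-by-bit reference loop, and five δ-swaps with shifts 16, 8, 4, 2, 1. The masks below are the standard ones for this "outer" shuffle (they appear, for instance, in Hacker's Delight). The assumption that they coincide with the method of exercise 53 is ours, as are the function names. Each level moves the $x$ half of a block above the $y$ half, until every 2-bit group holds $x_i y_i$.

```python
MASK64 = (1 << 64) - 1

def zip_reference(z):
    """w = x zip y for z = 2^32 x + y: x_i goes to bit 2i+1, y_i to bit 2i."""
    x, y = z >> 32, z & 0xFFFFFFFF
    w = 0
    for i in range(32):
        w |= ((x >> i) & 1) << (2 * i + 1)
        w |= ((y >> i) & 1) << (2 * i)
    return w

def delta_swap(v, mask, delta):
    """Exchange bit k with bit k + delta for every bit k that is set in mask."""
    t = ((v >> delta) ^ v) & mask
    return (v ^ t ^ (t << delta)) & MASK64

def zip_delta_swaps(z):
    z = delta_swap(z, 0x00000000FFFF0000, 16)
    z = delta_swap(z, 0x0000FF000000FF00, 8)
    z = delta_swap(z, 0x00F000F000F000F0, 4)
    z = delta_swap(z, 0x0C0C0C0C0C0C0C0C, 2)
    z = delta_swap(z, 0x2222222222222222, 1)
    return z

print(hex(zip_delta_swaps(0xFFFFFFFF00000000)))   # x all 1s, y all 0s: 0xaaaaaaaaaaaaaaaa
```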
The analogous instruction MXOR is especially useful when binary linear algebra is involved. For example, exercise 1.3.1–37 shows that XOR and MXOR directly implement addition and multiplication in a finite field of $2^k$ elements, for $k \le 8$.
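One way to see why MXOR can multiply: multiplication by a fixed field element $c$ is linear over $\{0,1\}$, so it is determined by the images $c \cdot 2^k$ of the eight basis bytes. Placing those images in bytes $x_0, \ldots, x_7$ of an octabyte lets a single MXOR compute $c \cdot a$ for any byte $a$. The sketch below is only an illustration, not necessarily the construction of exercise 1.3.1–37. It assumes the field $\mathrm{GF}(2^8)$ defined by the polynomial $x^8 + x^4 + x^3 + x + 1$ (the one used in AES), and its helper names are hypothetical.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8) = GF(2)[x] / (x^8 + x^4 + x^3 + x + 1)."""
    r = 0
    while b:
        if b & 1:
            r ^= a            # addition in the field is XOR
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly         # reduce modulo the defining polynomial
    return r

def mxor(x, y):
    """t_j = XOR of the bytes x_k for which bit k of y_j is 1."""
    t = 0
    for j in range(8):
        yj = (y >> (8 * j)) & 0xFF
        tj = 0
        for k in range(8):
            if (yj >> k) & 1:
                tj ^= (x >> (8 * k)) & 0xFF
        t |= tj << (8 * j)
    return t

def times_matrix(c):
    """Octabyte whose byte k is c * 2^k: the 8x8 bit matrix of a -> c*a."""
    return sum(gf_mul(c, 1 << k) << (8 * k) for k in range(8))

M = times_matrix(0x57)
print(hex(mxor(M, 0x83)))     # same as gf_mul(0x57, 0x83), namely 0xc1
```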
The problem of cyclic redundancy checking provides an instructive example of another case where MXOR shines. Streams of data are often accompanied by "CRC bytes" in order to detect common types of transmission errors [see W. W. Peterson and D. T. Brown, Proc. IRE 49 (1961), 228–235]. One popular method, used for example in MP3 audio files, is to regard each byte $\alpha = (a_7 \ldots a_1 a_0)_2$ as if it were the polynomial

$$\alpha(x) = (a_7 \ldots a_1 a_0)_x = a_7 x^7 + \cdots + a_1 x + a_0. \qquad (179)$$

When transmitting $n$ bytes $\alpha_{n-1} \ldots \alpha_1 \alpha_0$, we then compute the remainder

$$\beta = \bigl(\alpha_{n-1}(x)\, x^{8(n-1)} + \cdots + \alpha_1(x)\, x^8 + \alpha_0(x)\bigr)\, x^{16} \bmod p(x), \qquad (180)$$

where $p(x) = x^{16} + x^{15} + x^2 + 1$, using polynomial arithmetic mod 2, and append the coefficients of $\beta$ as a 16-bit redundancy check.
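The usual way to compute $\beta$ is to process one byte at a time, according to classical methods like Algorithm 4.6.1D. The basic idea is to define the partial result

$$\beta_m = \bigl(\alpha_{n-1}(x)\, x^{8(n-1-m)} + \cdots + \alpha_{m+1}(x)\, x^8 + \alpha_m(x)\bigr)\, x^{16} \bmod p(x),$$

so that $\beta_n = 0$, and then to use the recursion

$$\beta_m = \bigl((\beta_{m+1} \ll 8) \mathbin{\&} \#\text{ff00}\bigr) \oplus \mathit{crc\_table}\bigl[(\beta_{m+1} \gg 8) \oplus \alpha_m\bigr] \qquad (181)$$

to decrease $m$ by 1 until $m = 0$. Here $\mathit{crc\_table}[\alpha]$ is a 16-bit table entry that holds the remainder of $\alpha(x)\, x^{16}$, modulo $p(x)$ and mod 2, for $0 \le \alpha < 256$. [See A. Perez, IEEE Micro 3, 3 (June 1983), 40–50.]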
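Here is a byte-at-a-time sketch of recursion (181), checked against a direct evaluation of (180). It assumes the data arrive as a Python `bytes` object with $\alpha_{n-1}$ first; `polymod`, `crc_direct`, and `crc_bytewise` are illustrative names, and `P = 0x18005` encodes $p(x) = x^{16} + x^{15} + x^2 + 1$ with bit $k$ standing for the coefficient of $x^k$.

```python
P = 0x18005                       # x^16 + x^15 + x^2 + 1

def polymod(v, p=P):
    """Remainder of the polynomial v (bit k = coefficient of x^k) modulo p, mod 2."""
    dp = p.bit_length() - 1
    while v.bit_length() - 1 >= dp:
        v ^= p << (v.bit_length() - 1 - dp)
    return v

# crc_table[a] = a(x) x^16 mod p(x), for every byte a
crc_table = [polymod(a << 16) for a in range(256)]

def crc_direct(data):
    """beta of (180): treat the whole message as one polynomial."""
    return polymod(int.from_bytes(data, "big") << 16)

def crc_bytewise(data):
    """Start from beta_n = 0 and apply (181) for m = n-1, ..., 0."""
    beta = 0
    for alpha in data:            # alpha_{n-1} comes first
        beta = ((beta << 8) & 0xFF00) ^ crc_table[(beta >> 8) ^ alpha]
    return beta

print(hex(crc_bytewise(b"123456789")))
```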
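But of course we'd prefer to process 64 bits at once instead of 8. The solution is to find $8 \times 8$ matrices $A$ and $B$ such that

$$\alpha(x)\, x^{64} \equiv (\alpha A)(x) + (\alpha B)(x)\, x^{-8} \pmod{p(x) \text{ and } 2}, \qquad (182)$$

for arbitrary bytes $\alpha$, considering $\alpha$ to be a $1 \times 8$ vector of bits. Then we can pad the given data bytes $\alpha_{n-1} \ldots \alpha_1 \alpha_0$ with leading zeros so that $n$ is a multiple of 8, and use the following efficient reduction method:

$$\begin{aligned}
&\text{Begin with } \beta \gets 0,\ n \gets n - 8,\ \text{and } t \gets (\alpha_{n+7} \ldots \alpha_n)_{256}.\\
&\text{While } n > 0, \text{ set } u \gets t \bullet A,\ v \gets t \bullet B,\ n \gets n - 8,\\
&\qquad t \gets (\alpha_{n+7} \ldots \alpha_n)_{256} \oplus u \oplus (v \gg 8) \oplus (\beta \ll 56),\ \text{and } \beta \gets v \mathbin{\&} \#\text{ff}.
\end{aligned} \qquad (183)$$

Here $t \bullet A$ and $t \bullet B$ denote matrix multiplication via MXOR. The desired CRC bytes, $(t(x)\, x^{16} + \beta(x)\, x^8) \bmod p(x)$, are then readily obtained from the 64-bit quantity $t$ and the 8-bit quantity $\beta$. Exercise 213 contains full details; the total running time for $n$ bytes comes to only $(\mu + 10\upsilon)\,n/8 + O(1)$.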
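A sketch of the eight-bytes-at-a-time method (183), with MXOR emulated in software. Multiplying (182) by $x^8$ gives $\alpha(x)\,x^{72} \equiv (\alpha A)(x)\,x^8 + (\alpha B)(x)$, so here byte $k$ of $A$ is taken to be the high byte, and byte $k$ of $B$ the low byte, of $x^{k+72} \bmod p(x)$. That construction, the function names, and the input convention (a `bytes` object with $\alpha_{n-1}$ first) are our assumptions, not necessarily the details of exercise 213; the result is compared with a direct evaluation of (180).

```python
P = 0x18005                               # x^16 + x^15 + x^2 + 1

def polymod(v, p=P):
    """Remainder of the polynomial v modulo p, with coefficients mod 2."""
    dp = p.bit_length() - 1
    while v.bit_length() - 1 >= dp:
        v ^= p << (v.bit_length() - 1 - dp)
    return v

def mxor(x, y):
    """t_j = XOR of the bytes x_k for which bit k of y_j is 1."""
    t = 0
    for j in range(8):
        yj = (y >> (8 * j)) & 0xFF
        tj = 0
        for k in range(8):
            if (yj >> k) & 1:
                tj ^= (x >> (8 * k)) & 0xFF
        t |= tj << (8 * j)
    return t

# Rows of A and B: alpha(x) x^72 = (alpha A)(x) x^8 + (alpha B)(x)  (mod p, mod 2).
A = sum((polymod(1 << (k + 72)) >> 8) << (8 * k) for k in range(8))
B = sum((polymod(1 << (k + 72)) & 0xFF) << (8 * k) for k in range(8))

def crc_mxor(data):
    """Method (183); data is a bytes object with alpha_{n-1} first."""
    data = bytes((-len(data)) % 8) + data     # pad with leading zeros
    if not data:
        data = bytes(8)
    N = len(data)

    def chunk(n):                             # (alpha_{n+7} ... alpha_n)_256
        return int.from_bytes(data[N - n - 8:N - n], "big")

    beta, n = 0, N - 8
    t = chunk(n)
    while n > 0:
        u, v = mxor(A, t), mxor(B, t)         # t . A and t . B
        n -= 8
        t = chunk(n) ^ u ^ (v >> 8) ^ (beta << 56)
        beta = v & 0xFF
    return polymod((t << 16) ^ (beta << 8))

def crc_direct(data):
    """beta of (180), computed all at once for comparison."""
    return polymod(int.from_bytes(data, "big") << 16)

print(crc_mxor(b"123456789") == crc_direct(b"123456789"))
```

Each pass of the loop does two MXORs per eight data bytes, in place of the eight table lookups that (181) would need.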
The exercises below contain many more instances where MOR and MXOR lead to substantial economies. New tricks undoubtedly remain to be discovered.
For further reading. The book Hacker's Delight by Henry S. Warren, Jr. (Addison–Wesley, 2002) discusses bitwise operations in depth, emphasizing the great variety of options that are available on real-world computers that are not as ideal as MMIX.
EXERCISES
► 1. [15] What is the net effect of setting $x \gets x \oplus y$, $y \gets y \oplus (x \mathbin{\&} m)$, $x \gets x \oplus y$?
2. [16] (H. S. Warren, Jr.) Are any of the following relations valid for all integers $x$ and $y$? (i) $x \oplus y \le x \mathbin{|} y$; (ii) $x \mathbin{\&} y \le x \mathbin{|} y$; (iii) $|x - y| \le x \oplus y$.
3. [M20] If $x = (x_{n-1} \ldots x_1 x_0)_2$ with $x_{n-1} = 1$, let $x^M = (\bar x_{n-1} \ldots \bar x_1 \bar x_0)_2$. Thus we have $0^M, 1^M, 2^M, 3^M, \ldots = -1, 0, 1, 0, 3, 2, 1, 0, 7, 6, \ldots$, if we let $0^M = -1$. Prove that $(x \oplus y)^M < |x - y| \le x \oplus y$ for all $x, y \ge 0$.
► 4. [M16] Let $x^C = \bar x$, $x^N = -x$, $x^S = x + 1$, and $x^P = x - 1$ denote the complement, the negative, the successor, and the predecessor of an infinite-precision integer $x$. Then we have $x^{CC} = x^{NN} = x^{SP} = x^{PS} = x$. What are $x^{CN}$ and $x^{NC}$?
5. [M21] Prove or disprove the following conjectured laws concerning binary shifts:
a) $(x \ll j) \ll k = x \ll (j + k)$;
b) $(x \gg j) \mathbin{\&} (y \ll k) = ((x \gg (j + k)) \mathbin{\&} y) \ll k = (x \mathbin{\&} (y \ll (j + k))) \gg j$.
6. [M22] Find all integers $x$ and $y$ such that (a) $x \gg y = y \gg x$; (b) $x \ll y = y \ll x$.

7. [M22] (R. Schroeppel, 1972.) Find a fast way to convert the binary number $x = (\ldots x_2 x_1 x_0)_2$ to its negabinary counterpart $x = (\ldots x'_2 x'_1 x'_0)_{-2}$, and vice versa. Hint: Only two bitwise operations are needed!
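► 8. [M22] Given a finite set $S$ of nonnegative integers, the "minimal excludant" of $S$ is defined to be
$$\operatorname{mex}(S) = \min\{k \mid k \ge 0 \text{ and } k \notin S\}.$$
Let $x \oplus S$ denote the set $\{x \oplus y \mid y \in S\}$. Prove that if $x = \operatorname{mex}(S)$ and $y = \operatorname{mex}(T)$ then $x \oplus y = \operatorname{mex}((S \oplus y) \cup (x \oplus T))$.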
9. [M26] (Nim.) Two people play a game with $k$ piles of sticks, where there are $a_j$ sticks in pile $j$. If $a_1 = \cdots = a_k = 0$ when it is a player's turn to move, that player loses; otherwise the player reduces one of the piles by any desired amount, throwing away the removed sticks, and it is the other player's turn. Prove that the player to move can force a victory if and only if $a_1 \oplus \cdots \oplus a_k \ne 0$.
10. [HM40] (Conway's field.) Continuing exercise 8, define the operation $x \otimes y$ of "nim multiplication" recursively by the formula
$$x \otimes y = \operatorname{mex}\{(x \otimes j) \oplus (i \otimes y) \oplus (i \otimes j) \mid 0 \le i < x,\ 0 \le j < y\}.$$
Prove that $\oplus$ and $\otimes$ define a field over the set of all nonnegative integers. Prove also that if $0 \le x, y < 2^{2^n}$ then $x \otimes y < 2^{2^n}$, and $x \otimes 2^{2^n} = 2^{2^n} x$. (In particular, this field contains subfields of size $2^{2^n}$ for all $n \ge 0$.) Explain how to compute $x \otimes y$ efficiently.
► 11. [M26] (H. W. Lenstra, 1978.) Find a simple way to characterize all pairs of positive integers $(m, n)$ for which $m \otimes n = mn$ in Conway's field.
12. [M26] Devise an algorithm for division in Conway's field. Hint: If $x < 2^{2^{n+1}}$ then we have $x \otimes (x \oplus (x \gg 2^n)) < 2^{2^n}$.
13. [M32] (Second-order nim.) Extend the game of exercise 9 by allowing two kinds of moves: Either $a_j$ is reduced for some $j$, as before; or $a_j$ is reduced and $a_i$ is replaced by an arbitrary nonnegative integer, for some $i < j$. Prove that the player to move can now force a victory if and only if the pile sizes satisfy either $a_2 \ne a_3 \oplus \cdots \oplus a_k$ or $a_1 \ne a_3 \oplus (2 \otimes a_4) \oplus \cdots \oplus ((k-2) \otimes a_k)$. For example, when $k = 4$ and $(a_1, a_2, a_3, a_4) = (7, 5, 0, 5)$, the only winning move is to $(7, 5, 6, 3)$.
14. [M30] Suppose each node of a complete, infinite binary tree has been labeled with 0 or 1. Such a labeling is conveniently represented as a set $T = \{t, t_0, t_1, t_{00}, t_{01}, t_{10}, t_{11}, t_{000}, \ldots\}$, with one bit $t_\alpha$ for every binary string $\alpha$; the root is labeled $t$, the left subtree labels are $T_0 = \{t_0, t_{00}, t_{01}, t_{000}, \ldots\}$, and the right subtree labels are $T_1 = \{t_1, t_{10}, t_{11}, t_{100}, \ldots\}$. Any such labeling can be used to transform a 2-adic integer $x = (\ldots x_2 x_1 x_0)_2$ into the 2-adic integer $y = (\ldots y_2 y_1 y_0)_2 = T(x)$ by setting $y_0 = t$, $y_1 = t_{x_0}$, $y_2 = t_{x_0 x_1}$, etc., so that $T(x) = 2T_{x_0}(\lfloor x/2 \rfloor) + t$. (In other words, $x$ defines an infinite path in the binary tree, and $y$ corresponds to the labels on that path, from right to left in the bit strings as we proceed from top to bottom of the tree.) A branching function is the mapping $x^T = x \oplus T(x)$ defined by such a labeling. For example, if $t_{01} = 1$ and all of the other $t_\alpha$ are 0, we have $x^T = x \oplus 4[x \bmod 4 = 2]$.
a) Prove that every branching function is a permutation of the 2-adic integers.
b) For which integers $k$ is $x \oplus (x \ll k)$ a branching function?
c) Let $x \mapsto x^T$ be a mapping from 2-adic integers into 2-adic integers. Prove that $x^T$ is a branching function if and only if $\rho(x \oplus y) = \rho(x^T \oplus y^T)$ for all 2-adic $x$ and $y$.
d) Prove that compositions and inverses of branching functions are branching functions. (Thus the set $\mathcal{B}$ of all branching functions is a permutation group.)
e) A branching function is balanced if the labels satisfy $t_\alpha = t_{\alpha 0} \oplus t_{\alpha 1}$ for all $\alpha$. Show that the set of all balanced branching functions is a subgroup of $\mathcal{B}$.
► 15. [M21] J. H. Quick noticed that $((x + 2) \oplus 3) - 2 = ((x - 2) \oplus 3) + 2$ for all $x$. Find all constants $a$ and $b$ such that $((x + a) \oplus b) - a = ((x - a) \oplus b) + a$ is an identity.
16. [M31] A function of $x$ is called animating if it can be written in the form
$$((\ldots((((x + a_1) \oplus b_1) + a_2) \oplus b_2) + \cdots) + a_m) \oplus b_m$$
for some integer constants $a_1$, $b_1$, $a_2$, $b_2$, …, $a_m$, $b_m$, with $m > 0$.
a) Prove that every animating function is a branching function (see exercise 14).
b) Furthermore, prove that it is balanced if and only if $b_1 \oplus b_2 \oplus \cdots \oplus b_m = 0$. Hint: What binary tree labeling corresponds to the animating function $((x \oplus c) - 1) \oplus c$?
c) Let $\lfloor x \rceil = x \oplus (x - 1) = 2^{\rho(x)+1} - 1$. Show that every balanced animating function can be written in the form
$$x \oplus \lfloor x - p_1 \rceil \oplus \lfloor x - p_2 \rceil \oplus \cdots \oplus \lfloor x - p_l \rceil, \qquad p_1 < p_2 < \cdots < p_l,$$
for some integers $\{p_1, p_2, \ldots, p_l\}$, where $l \ge 0$, and this representation is unique.
d) Conversely, show that every such expression defines a balanced animating function.
17. [HM36] The results of exercise 16 make it possible to decide whether or not any two given animating functions are equal. Is there an algorithm that decides whether any given expression is identically zero, when that expression is constructed from a finite number of integer variables and constants using only the binary operations $+$ and $\oplus$? What if we also allow $\&$?
18. [M25] The curious pixel pattern shown here has $(x^2 y \gg 11) \mathbin{\&} 1$ in row $x$ and column $y$, for $1 \le x, y \le 256$. Is there any simple way to explain some of its major characteristics mathematically?
► 19. [M37] (Paley's rearrangement theorem.) Given three vectors $A = (a_0, \ldots, a_{2^n-1})$, $B = (b_0, \ldots, b_{2^n-1})$, and $C = (c_0, \ldots, c_{2^n-1})$ of nonnegative numbers, let
$$f(A, B, C) = \sum_{j \oplus k \oplus l = 0} a_j b_k c_l.$$
For example, if $n = 2$ we have $f(A,B,C) = a_0b_0c_0 + a_0b_1c_1 + a_0b_2c_2 + a_0b_3c_3 + a_1b_0c_1 + a_1b_1c_0 + a_1b_2c_3 + \cdots + a_3b_3c_0$; in general there are $2^{2n}$ terms, one for each choice of $j$ and $k$. Our goal is to prove that $f(A,B,C) \le f(A^*, B^*, C^*)$, where $A^*$ denotes the vector $A$ sorted into nonincreasing order, $a^*_0 \ge a^*_1 \ge \cdots \ge a^*_{2^n-1}$.
a) Prove the result when all elements of $A$, $B$, and $C$ are 0s and 1s.
b) Show that it is therefore true in general.
c) Similarly, $f(A,B,C,D) = \sum_{j \oplus k \oplus l \oplus m = 0} a_j b_k c_l d_m \le f(A^*, B^*, C^*, D^*)$.
► 20. [21] (Gosper's hack.) The following seven operations produce a useful function $y$ of $x$, when $x$ is a positive integer. Explain what this function is and why it is useful.
$$u \gets x \mathbin{\&} -x; \qquad v \gets x + u; \qquad y \gets v + (((v \oplus x)/u) \gg 2).$$
21. [22] Construct the reverse of Gosper's hack: Show how to compute $x$ from $y$.
22. [21] Implement Gosper's hack efficiently with MMIX code, assuming that $x < 2^{64}$, without using division.


► 23. [27] A sequence of nested parentheses can be represented as a binary number by putting a 1 in the position of each right parenthesis. For example, `(())()' corresponds in this way to $(001101)_2$, the number 13. Call such a number a parenthesis trace.
a) What are the smallest and largest parenthesis traces that have exactly $m$ 1s?
b) Suppose $x$ is a parenthesis trace and $y$ is the next larger parenthesis trace with the same number of 1s. Show that $y$ can be computed from $x$ with a short chain of operations analogous to Gosper's hack.
c) Implement your method on MMIX, assuming that $\nu x \le 32$.
► 24. [M30] Program 1.3.2P instructed MMIX to produce a table of the first five hundred prime numbers, using trial division to establish primality. Write an MMIX program that uses the "sieve of Eratosthenes" (exercise 4.5.4–8) to build a table of all odd primes that are less than $N$, packed into octabytes $Q_0$, $Q_1$, …, $Q_{N/128}$ as in (27). Assume that $N \le 2^{32}$, and that it's a multiple of 128. What is the running time when $N = 3584$?
► 25. [15] Four volumes sit side by side on a bookshelf. Each of them contains exactly 500 pages, printed on 250 sheets of paper 0.1 mm thick; each book also has a front and back cover whose thicknesses are 1 mm each. A bookworm gnaws its way from page 1 of Volume 1 to page 500 of Volume 4. How far does it travel while doing so?
26. [22] Suppose we want random access to a table of 12 million items of 5-bit data. We could pack 12 such items into one 64-bit word, thereby fitting the table into 8 megabytes of memory. But random access then seems to require division by 12, which is rather slow; we might therefore prefer to let each item occupy a full byte, thus using 12 megabytes altogether.
Show, however, that there's a memory-efficient approach that avoids division.
27. [21] In the notation of Eqs. (32)–(43), how would you compute (a) $(\alpha 1 0^a 0 1^b)_2$? (b) $(\alpha 1 0^a 1 1^b)_2$? (c) $(\alpha 0 0^a 0 1^b)_2$? (d) $(0^\infty 1 1^a 0 0^b)_2$? (e) $(0^\infty 0 1^a 0 0^b)_2$? (f) $(0^\infty 1 1^a 1 1^b)_2$?
28. [16] What does the operation $(x + 1) \mathbin{\&} \bar x$ produce?
29. [20] (V. R. Pratt.) Express the magic mask $\mu_k$ of (47) in terms of $\mu_{k+1}$.
30. [20] If $x = 0$, the MMIX instructions (46) will set $\rho \gets 64$ (which is a close enough approximation to $\infty$). What changes to (50) and (51) will produce the same result?
► 31. [20] A mathematician named Dr. L. I. Presume decided to calculate the ruler function with a simple loop as follows: "Set $\rho \gets 0$; then while $x \mathbin{\&} 1 = 0$, set $\rho \gets \rho + 1$ and $x \gets x \gg 1$." He reasoned that, when $x$ is a random integer, the average number of right shifts is the average value of $\rho$, which is 1; and the standard deviation is only $\sqrt 2$, so the loop almost always terminates quickly. Criticize his decision.
32. [20] What is the execution time for $\rho x$ when (52) is programmed for MMIX?
► 33. [26] (Leiserson, Prokop, and Randall, 1998.) Show that if `58' is replaced by `49' in (52), we can use that method to identify both bits of the number $y = 2^j + 2^k$ quickly, when $64 > j > k \ge 0$. (Altogether $\binom{64}{2} = 2016$ cases need to be distinguished.)
34. [M23] Let $x$ and $y$ be 2-adic integers. True or false: (a) $\rho(x \mathbin{\&} y) = \max(\rho x, \rho y)$; (b) $\rho(x \mathbin{|} y) = \min(\rho x, \rho y)$; (c) $\rho x = \rho y$ if and only if $x \oplus y = (x - 1) \oplus (y - 1)$.
► 35. [M26] According to Reitwiesner's theorem, exercise 4.1–34, every integer $n$ has a unique representation $n = n^+ - n^-$ such that $\nu(n^+) + \nu(n^-)$ is minimized. Show that $n^+$ and $n^-$ can be calculated quickly with bitwise operations. Hint: Prove the identity $(x \oplus 3x) \mathbin{\&} ((x \oplus 3x) \gg 1) = 0$.
36. [20] Given $x = (x_{63} \ldots x_1 x_0)_2$, suggest efficient ways to calculate the quantities
i) $x^\oplus = (x^\oplus_{63} \ldots x^\oplus_1 x^\oplus_0)_2$, where $x^\oplus_k = x_k \oplus \cdots \oplus x_1 \oplus x_0$ for $0 \le k < 64$;
ii) $x^\& = (x^\&_{63} \ldots x^\&_1 x^\&_0)_2$, where $x^\&_k = x_k \wedge \cdots \wedge x_1 \wedge x_0$ for $0 \le k < 64$.
37. [16] What changes to (55) and (56) will make $\lambda 0$ come out $-1$?

38. [17] How long does the leftmost-bit-extraction procedure (57) take when implemented on MMIX?
► 39. [20] Formula (43) shows how to remove the rightmost run of 1 bits from a given number $x$. How would you remove the leftmost run of 1 bits?
► 40. [21] Prove (58), and find a simple way to decide if $\lambda x < \lambda y$, given $x$ and $y \ge 0$.
41. [M22] What are the generating functions of the integer sequences (a) $\rho n$, (b) $\lambda n$, and (c) $\nu n$?
42. [M21] If $n = 2^{e_1} + \cdots + 2^{e_r}$, with $e_1 > \cdots > e_r \ge 0$, express the sum $\sum_{k=0}^{n-1} \nu k$ in terms of the exponents $e_1$, …, $e_r$.
► 43. [20] How sparse should $x$ be, to make (63) faster than (62) on MMIX?
► 44. [23] (E. Freed, 1983.) What's a fast way to evaluate the weighted bit sum $\sum_j j x_j$?
► 45. [20] (T. Rokicki, 1999.) Explain how to test if $x^R < y^R$, without reversing $x$ and $y$.
46. [22] Method (68) uses six operations to interchange two bits $x_i \leftrightarrow x_j$ of a register. Show that this interchange can actually be done with only three MMIX instructions.
47. [10] Can the general δ-swap (69) also be done with a method like (67)?
48. [M21] How many different δ-swaps are possible in an $n$-bit register? (When $n = 4$, a δ-swap can transform 1234 into 1234, 1243, 1324, 1432, 2134, 2143, 3214, 3412, 4231.)
► 49. [M30] Let $s(n)$ denote the fewest δ-swaps that suffice to reverse an $n$-bit number.
a) Prove that $s(n) \ge \lceil \log_3 n \rceil$ when $n$ is odd, $s(n) \ge \lceil \log_3 3n/2 \rceil$ when $n$ is even.
b) Evaluate $s(n)$ when $n = 3^m$, $2 \cdot 3^m$, $(3^m + 1)/2$, and $(3^m - 1)/2$.
c) What are $s(32)$ and $s(64)$? Hint: Show that $s(5n + 2) \le s(n) + 2$.
50. [M37] Continuing exercise 49, prove that $s(n) = \log_3 n + O(\log \log n)$.
51. [23] Let $c$ be a constant, $0 \le c < 2^d$. Find all sequences of masks $(\theta_0, \theta_1, \ldots, \theta_{d-1}, \hat\theta_{d-2}, \ldots, \hat\theta_1, \hat\theta_0)$ such that the general permutation scheme (71) takes $x \mapsto x^\pi$, where the bit permutation $\pi$ is defined by either (a) $j\pi = j \oplus c$; or (b) $j\pi = (j + c) \bmod 2^d$. [The masks should satisfy $\theta_k \subseteq \mu_{d,k}$ and $\hat\theta_k \subseteq \mu_{d,k}$, so that (71) corresponds to Fig. 12; see (48). Notice that reversal, $x^\pi = x^R$, is the special case $c = 2^d - 1$ of part (a), while part (b) corresponds to the cyclic right shift $x^\pi = (x \gg c) + (x \ll (2^d - c))$.]
52. [22] Find hexadecimal constants $(\theta_0, \theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \hat\theta_4, \hat\theta_3, \hat\theta_2, \hat\theta_1, \hat\theta_0)$ that cause (71) to produce the following important 64-bit permutations, based on the binary representation $j = (j_5 j_4 j_3 j_2 j_1 j_0)_2$: (a) $j\pi = (j_0 j_5 j_4 j_3 j_2 j_1)_2$; (b) $j\pi = (j_2 j_1 j_0 j_5 j_4 j_3)_2$; (c) $j\pi = (j_1 j_0 j_5 j_4 j_3 j_2)_2$; (d) $j\pi = (j_0 j_1 j_2 j_3 j_4 j_5)_2$. [Case (a) is called a "perfect shuffle" because it takes $(x_{63} \ldots x_{33} x_{32} x_{31} \ldots x_1 x_0)_2$ into $(x_{63} x_{31} \ldots x_{33} x_1 x_{32} x_0)_2$; case (b) transposes an $8 \times 8$ matrix of bits; case (c), similarly, transposes a $16 \times 4$ matrix; and case (d) arises in connection with "fast Fourier transforms," see exercise 4.6.4–14.]
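When hunting for the constants, a slow but obviously correct oracle is helpful. This Python sketch assumes that bit $j$ of $x^\pi$ is bit $j\pi$ of $x$, which matches the perfect-shuffle description in case (a). The helper names are made up for illustration, and nothing here uses the masked scheme (71) itself.

```python
def permute_bits(x, pi, n_bits=64):
    """Return x^pi: bit j of the result is bit pi(j) of x."""
    z = 0
    for j in range(n_bits):
        z |= ((x >> pi(j)) & 1) << j
    return z

def perfect_shuffle_index(j):
    """Case (a): j pi = (j0 j5 j4 j3 j2 j1)_2, a right rotation of the 6 index bits."""
    return (j >> 1) | ((j & 1) << 5)

def transpose_8x8_index(j):
    """Case (b): j pi = (j2 j1 j0 j5 j4 j3)_2, swapping the two 3-bit halves of j."""
    return ((j & 7) << 3) | (j >> 3)

def bit_reverse_index(j):
    """Case (d): j pi = (j0 j1 j2 j3 j4 j5)_2, reversing the 6 index bits."""
    return int(format(j, "06b")[::-1], 2)
```

For example, shuffling a word whose upper half is all 1s gives alternating bits, because bit $j$ of the result comes from the upper half exactly when $j$ is odd.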
▶ 53. [M25] The permutations in exercise 52 are said to be "induced by a permutation of index digits," because we obtain $j\pi$ by permuting the binary digits of $j$. Suppose $j\pi = (j_{(d-1)\sigma} \ldots j_{1\sigma} j_{0\sigma})_2$, where $\sigma$ is a permutation of $\{0, 1, \ldots, d-1\}$. Prove that if $\sigma$ has $t$ cycles, the $2^d$-bit permutation $x \mapsto x^\pi$ can be obtained with only $d - t$ swaps. In particular, show that this observation speeds up all four cases of exercise 52.
54. [22] (R. W. Gosper, 1985.) If an $m \times m$ bit matrix is stored in the rightmost $m^2$ bits of a register, show that it can be transposed by doing $(2^k(m-1))$-swaps for $0 \le k < \lceil \lg m \rceil$. Write out the method in detail when $m = 7$.
▶ 55. [26] Suppose an $n \times n$ bit matrix is stored in the rightmost $n^2$ bits of an $n^3$-bit register. Prove that $18d + 2$ bitwise operations suffice to multiply two such matrices, when $n = 2^d$; the matrix multiplication can be either Boolean (like MOR) or mod 2 (like MXOR).
56. [24] Suggest a way to transpose a $7 \times 9$ bit matrix in a 64-bit register.
57. [22] Prove that any permutation of $2^d$ elements can be realized with the network $P(2^d)$ of Fig. 12 by some setting in which at most $d/(2d-1)$ of the crossbars are active.
▶ 58. [M27] The first $d$ columns of crossbar modules in the permutation network $P(2^d)$ perform a 1-swap, then a 2-swap, ..., and finally a $2^{d-1}$-swap, when the wires of the network are stretched into horizontal lines as shown here for $d = 3$.
[Figure: Omega router for $d = 3$, with inputs 0 through 7 at the left and outputs $0\varphi$ through $7\varphi$ at the right.]
Let $N = 2^d$. These $N$ lines, together with the $Nd/2$ crossbars, form a so-called "Omega router." The purpose of this exercise is to study the set $\Omega$ of all permutations $\varphi$ such that we can obtain $(0\varphi, 1\varphi, \ldots, (N-1)\varphi)$ as outputs on the right of an Omega router when the inputs at the left are $(0, 1, \ldots, N-1)$.
a) Prove that $|\Omega| = 2^{Nd/2}$. (Thus $\lg |\Omega| = Nd/2 \approx \frac12 \lg N!$.)
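b) Prove that a permutation $\varphi$ of $\{0, 1, \ldots, N-1\}$ belongs to $\Omega$ if and only if
$$i \bmod 2^k = j \bmod 2^k \text{ and } i\varphi \gg k = j\varphi \gg k \text{ implies } i\varphi = j\varphi \qquad (*)$$
for all $0 \le i, j < N$ and all $0 \le k \le d$.
c) Simplify condition $(*)$ to the following, for all $0 \le i, j < N$:
$$\lambda(i\varphi \oplus j\varphi) < \rho(i \oplus j) \text{ implies } i = j.$$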
d) Let $T$ be the set of all permutations $\tau$ of $\{0, 1, \ldots, N-1\}$ such that $\rho(i\tau \oplus j\tau) = \rho(i \oplus j)$ for all $i$ and $j$. (This is the set of branching functions considered in exercise 14, modulo $2^d$; so it has $2^{N-1}$ members, $2^{N/2+d-1}$ of which are the animating functions modulo $2^d$.) Prove that $\varphi \in \Omega$ if and only if $\tau\varphi \in \Omega$ for all $\tau \in T$.
e) Suppose $\varphi$ and $\psi$ are permutations in $\Omega$ that operate on different elements; that is, $j\varphi \ne j$ implies $j\psi = j$, for $0 \le j < N$. Prove that $\varphi\psi \in \Omega$.
59. [M30] Given $0 \le a < b < N = 2^d$, how many Omega-routable permutations operate only on the interval $[a \,.\,.\, b]$? (Thus we want to count the number of $\varphi \in \Omega$ such that $j\varphi \ne j$ implies $a \le j \le b$. Exercise 58(a) is the special case $a = 0$, $b = N - 1$.)
60. [HM28] Given a random permutation of $\{0, 1, \ldots, 2^n - 1\}$, let $p_{nk}$ be the probability that there are $2^k$ ways to set the crossbars in the first and last columns of the permutation network $P(2^n)$ when realizing this permutation. In other words, $p_{nk}$ is the probability that the associated graph has $k$ cycles (see (75)). What is the generating function $\sum_{k \ge 0} p_{nk} z^k$? What are the mean and variance of $2^k$?
61. [46] Is it NP-hard to decide whether a given permutation is realizable with at least one mask $\theta_j = 0$, using the recursive method of Fig. 12 as implemented in (71)?
▶ 62. [22] Let $N = 2^d$. We can obviously represent a permutation $\pi$ of $\{0, 1, \ldots, N-1\}$ by storing a table of $N$ numbers, $d$ bits each. With this representation we have instant access to $y = x\pi$, given $x$; but it takes $\Omega(N)$ steps to find $x = y\pi^{-1}$ when $y$ is given.
Show that, with the same amount of memory, we can represent an arbitrary permutation in such a way that $x\pi$ and $y\pi^{-1}$ are both computable in $O(d)$ steps.
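63. [16] For what integers $w$, $x$, $y$, and $z$ does the zipper function satisfy (i) $x \ddagger y = y \ddagger x$? (ii) $(x \ddagger y) \gg z = (x \gg \lceil z/2 \rceil) \ddagger (y \gg \lfloor z/2 \rfloor)$? (iii) $(w \ddagger x) \mathbin{\&} (y \ddagger z) = (w \mathbin{\&} y) \ddagger (x \mathbin{\&} z)$?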
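For experimenting with these identities, a direct implementation of the zipper function is handy. This sketch assumes that (76) defines $x \ddagger y = (\ldots x_2 y_2 x_1 y_1 x_0 y_0)_2$, so the bits of $y$ land in the even positions. It works one bit at a time on nonnegative Python integers, and the name `zipper` is illustrative.

```python
def zipper(x, y):
    """x ‡ y = (... x2 y2 x1 y1 x0 y0)_2: y's bits go to even positions, x's to odd."""
    z, j = 0, 0
    while x or y:
        z |= (y & 1) << (2 * j) | (x & 1) << (2 * j + 1)
        x >>= 1
        y >>= 1
        j += 1
    return z

print(bin(zipper(0b101, 0b011)))  # 0b100111
```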
64. [22] Find a "simple" expression for the zipper-of-sums $(x + x') \ddagger (y + y')$, as a function of $z = x \ddagger y$ and $z' = x' \ddagger y'$.
65. [M16] The binary polynomial $u(x) = u_0 + u_1 x + \cdots + u_{n-1} x^{n-1} \pmod 2$ can be represented by the integer $u = (u_{n-1} \ldots u_1 u_0)_2$. If $u(x)$ and $v(x)$ correspond to integers $u$ and $v$ in this way, what polynomial corresponds to $u \ddagger v$?
▶ 66. [M26] Suppose the polynomial $u(x)$ has been represented as an $n$-bit integer $u$ as in exercise 65, and let $v = u \oplus (u \ll \delta) \oplus (u \ll 2\delta) \oplus (u \ll 3\delta) \oplus \cdots$ for some integer $\delta$.
a) What's a simple way to describe the polynomial $v(x)$?
b) Suppose $n$ is large, and the bits of $u$ have been packed into 64-bit words. How would you compute $v$ when $\delta = 1$, using bitwise operations in 64-bit registers?
c) Consider the same question as (b), but when $\delta = 64$.
d) Consider the same question as (b), but when $\delta = 3$.
e) Consider the same question as (b), but when $\delta = 67$.
67. [M31] If $u(x)$ is a polynomial of degree $< n$, represented as in exercise 65, discuss the computation of $v(x) = u(x)^2 \bmod (x^n + x^m + 1)$, when $0 < m < n$ and both $m$ and $n$ are odd. Hint: This problem has an interesting connection with perfect shuffling.
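A straightforward reference computation, useful for testing a faster method, squares the polynomial and then reduces it. Over GF(2) the cross terms of $u(x)^2$ cancel, so squaring just moves bit $i$ to bit $2i$. The reduction replaces each $x^i$ with $i \ge n$ by $x^{i-n}(x^m + 1)$. This Python sketch does not exploit the perfect-shuffle connection mentioned in the hint, and its function names are hypothetical.

```python
def square_gf2(u):
    """u(x)^2 over GF(2): bit i of u moves to bit 2i (cross terms cancel mod 2)."""
    v, i = 0, 0
    while u:
        v |= (u & 1) << (2 * i)
        u >>= 1
        i += 1
    return v

def reduce_trinomial(v, n, m):
    """Reduce v(x) modulo x^n + x^m + 1 over GF(2), assuming 0 < m < n."""
    for i in range(v.bit_length() - 1, n - 1, -1):   # top degree down to n
        if (v >> i) & 1:
            # x^i = x^(i-n) * x^n = x^(i-n) * (x^m + 1); the new terms have degree < i
            v ^= (1 << i) | (1 << (i - n + m)) | (1 << (i - n))
    return v

# (x^4)^2 = x^8 = x^4 + x^3 + x  (mod x^5 + x^3 + 1)
print(bin(reduce_trinomial(square_gf2(0b10000), 5, 3)))  # 0b11010
```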
68. [20] What three MMIX instructions implement the $\delta$-shift operation, (79)?
69. [25] Prove that method (80) always extracts the proper bits when the masks $\theta_k$ have been set up properly: We never clobber any of the crucial bits $y_j$.
▶ 70. [31] (Guy L. Steele Jr., 1994.) What's a good way to compute the masks $\theta_0, \theta_1, \ldots, \theta_{d-1}$ that are needed in the general compression procedure (80), given $\chi \ne 0$?
71. [17] Explain how to reverse the procedure of (80), going from the compact value $y = (y_{r-1} \ldots y_1 y_0)_2$ to a number $z = (z_{63} \ldots z_1 z_0)_2$ that has $z_{j_i} = y_i$ for $0 \le i < r$.
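The effect of the compression procedure, and of the reverse operation that this exercise asks for, can be specified one bit at a time. The Python sketch below assumes that (80) produces $y_i = x_{j_i}$, where $j_0 < j_1 < \cdots < j_{r-1}$ are the positions of the 1 bits in the mask $\chi$. The names `compress` and `expand` are illustrative. (On x86 processors with BMI2, the PEXT and PDEP instructions compute these same two functions in hardware.)

```python
def compress(x, chi):
    """Pack the bits of x that lie under the 1 bits of chi into the low end, in order."""
    y, i = 0, 0
    while chi:
        low = chi & -chi          # lowest remaining 1 bit of the mask
        if x & low:
            y |= 1 << i
        i += 1
        chi ^= low
    return y

def expand(y, chi):
    """Inverse of compress: send bit i of y to the position of the i-th 1 bit of chi."""
    z = 0
    while chi:
        low = chi & -chi
        if y & 1:
            z |= low
        y >>= 1
        chi ^= low
    return z
```

Note that `expand(compress(x, chi), chi)` equals `x & chi`. It is not `x` itself, because bits outside the mask are discarded.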
72. [10] Simplify the result of compressing $x \ddagger y$ with the mask $\mu_0$, when $x, y < 2^{2^{d-1}}$. (See Eqs. (76) and (81).)
73. [22] Prove that $d$ sheep-and-goats steps will implement any $2^d$-bit permutation.
74. [22] Given counts $(c_0, c_1, \ldots, c_{2^d-1})$ for the Chung–Wong procedure, explain why an appropriate cyclic 1-shift can always produce new counts $(c'_0, c'_1, \ldots, c'_{2^d-1})$ for which $\sum c'_{2l} = \sum c'_{2l+1}$, thus allowing the recursion to proceed.
▶ 75. [32] The method of Chung and Wong replicates bit $l$ of a register exactly $c_l$ times, but it produces results in scrambled order. For example, the case $(c_0, \ldots, c_7) = (1, 2, 0, 2, 0, 2, 0, 1)$ illustrated in the text produces $(x_7 x_0 x_1 x_5 x_5 x_3 x_1 x_3)_2$. In some applications this can be a disadvantage; we might prefer to have the bits retain their original order, namely $(x_7 x_5 x_5 x_3 x_3 x_1 x_1 x_0)_2$ in that example.
Prove that the permutation network $P(2^d)$ of Fig. 12 can be modified to achieve this goal, given any sequence of counts $(c_0, c_1, \ldots, c_{2^d-1})$, if we replace the $d \times 2^{d-1}$ crossbar modules in the right-hand half by general $2 \times 2$ mapping modules. (A crossbar module with inputs $(a, b)$ produces either $(a, b)$ or $(b, a)$ as output; a mapping module can also produce $(a, a)$ or $(b, b)$.)
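The order-preserving target of this exercise is easy to state as a slow reference. The Python sketch below, whose function name is hypothetical, builds the result from the low end and emits bit $l$ of $x$ exactly $c_l$ times. With the example counts $(1, 2, 0, 2, 0, 2, 0, 1)$ it yields $(x_7 x_5 x_5 x_3 x_3 x_1 x_1 x_0)_2$.

```python
def replicate_in_order(x, counts):
    """Repeat bit l of x exactly counts[l] times, keeping the original bit order."""
    z, pos = 0, 0
    for l, c in enumerate(counts):
        bit = (x >> l) & 1
        for _ in range(c):
            z |= bit << pos
            pos += 1
    return z

# x1 = x7 = 1 and all other bits 0: result (x7 x5 x5 x3 x3 x1 x1 x0)_2 = (10000110)_2
print(bin(replicate_in_order(0b10000010, [1, 2, 0, 2, 0, 2, 0, 1])))
```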
76. [47] A mapping network is analogous to a sorting network or a permutation network, but it uses $2 \times 2$ mapping modules instead of comparators or crossbars, and it is supposed to be able to output all $n^n$ possible mappings of its $n$ inputs. Exercise 75, in conjunction with Fig. 12, shows that a mapping network for $n = 2^d$ exists with only $4d - 2$ levels of delay, and with $n/2$ modules on each level; furthermore, this construction needs general $2 \times 2$ mapping modules (instead of simple crossbars) in only $d$ of those levels.
To within $O(n)$, what is the smallest number $G(n)$ of modules that are sufficient to implement a general $n$-element mapping network?
77. [26] (R. W. Floyd and V. R. Pratt.) Design an algorithm that tests whether or not a given standard $n$-network is a sorting network, as defined in the exercises of Section 5.3.4. When the given network has $r$ comparator modules, your algorithm should use $O(r)$ bitwise operations on words of length $2^n$.
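One natural approach rests on the 0–1 principle of Section 5.3.4: a comparator network sorts all inputs if and only if it sorts all $2^n$ binary inputs. The following Python sketch is an illustration, not the Floyd–Pratt solution itself. It keeps one $2^n$-bit word per line, with bit $x$ recording that line's value on input $x$, so each comparator costs one AND and one OR. Comparators are written as 0-indexed pairs $(i, j)$ with $i < j$, which is an assumed input format. (Building the initial words is extra work here; in a word-parallel setting they would be fixed constants.)

```python
def is_sorting_network(n, comparators):
    """Bit-parallel 0-1 test of a standard n-network.

    w[k] has bit x equal to the value on line k when the input is the binary vector x
    (line k starts out carrying bit k of x). A comparator (i, j) with i < j puts the
    minimum (AND) on line i and the maximum (OR) on line j.
    """
    size = 1 << n
    w = [sum(((x >> k) & 1) << x for x in range(size)) for k in range(n)]
    for i, j in comparators:
        w[i], w[j] = w[i] & w[j], w[i] | w[j]
    # Sorted output means line k <= line k+1 on every input, i.e. w[k] is a subset of w[k+1].
    return all(w[k] & ~w[k + 1] == 0 for k in range(n - 1))

print(is_sorting_network(3, [(0, 1), (1, 2), (0, 1)]))  # True
print(is_sorting_network(3, [(0, 1), (1, 2)]))          # False
```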
78. [M27] (Testing disjointness.) Suppose the binary numbers $x_1, x_2, \ldots, x_m$ each represent sets in a universe of $n - k$ elements, so that each $x_j$ is less than $2^{n-k}$. J. H. Quick (a student) decided to test whether the sets are disjoint by testing the condition
$$x_1 \mid x_2 \mid \cdots \mid x_m = (x_1 + x_2 + \cdots + x_m) \bmod 2^n.$$
Prove or disprove: Quick's test is valid if and only if $k \ge \lg(m - 1)$.
▶ 79. [20] If $x \ne 0$ and $x \subseteq \chi$, what is an easy way to determine the largest integer $x' < x$ such that $x' \subseteq \chi$? (Thus this operation undoes, and is undone by, the step from a subset to its successor in (84).)
80. [20] Suggest a fast way to find all maximal proper subsets of a set. More precisely, given $\chi$ with $\nu\chi = m$, we want to find all $x \subseteq \chi$ such that $\nu x = m - 1$.
81. [21] Find a formula for "scattered difference," to go with the "scattered sum" (86).
82. [21] Is it easy to shift a scattered accumulator to the left by 1, for example to change $(y_2 x_4 x_3 y_1 x_2 y_0 x_1 x_0)_2$ to $(y_1 x_4 x_3 y_0 x_2 0 x_1 x_0)_2$?
▶ 83. [28] Continuing exercise 82, find a way to shift a scattered $2^d$-bit accumulator to the right by 1, given $z$ and $\chi$, in $O(d)$ steps.
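84. [25] Given $n$-bit numbers $z = (z_{n-1} \ldots z_1 z_0)_2$ and $\chi = (\chi_{n-1} \ldots \chi_1 \chi_0)_2$, explain how to calculate the "stretched" quantities $z \leftharpoondown \chi = (z_{(n-1)\leftharpoondown\chi} \ldots z_{1\leftharpoondown\chi} z_{0\leftharpoondown\chi})_2$ and $z \rightharpoondown \chi = (z_{(n-1)\rightharpoondown\chi} \ldots z_{1\rightharpoondown\chi} z_{0\rightharpoondown\chi})_2$, where
$$j \leftharpoondown \chi = \max\{k \mid k \le j \text{ and } \chi_k = 1\}, \qquad j \rightharpoondown \chi = \min\{k \mid k \ge j \text{ and } \chi_k = 1\};$$
we let $z_{j\leftharpoondown\chi} = 0$ if $\chi_k = 0$ for $0 \le k \le j$, and $z_{j\rightharpoondown\chi} = 0$ if $\chi_k = 0$ for $n > k \ge j$. For example, if $n = 11$ and $\chi = (01101110010)_2$, then $z \leftharpoondown \chi = (z_9 z_9 z_8 z_6 z_6 z_5 z_4 z_1 z_1 z_1 0)_2$ and $z \rightharpoondown \chi = (0 z_9 z_8 z_8 z_6 z_5 z_4 z_4 z_4 z_1 z_1)_2$.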
85. [22] (K. D. Tocher, 1954.) Imagine that you have a vintage 1950s computer with a drum memory for storing data, and that you need to do some computations with a $32 \times 32 \times 32$ array $a[i, j, k]$, whose subscripts are 5-bit integers in the range $0 \le i, j, k < 32$. Unfortunately your machine has only a very small high-speed memory: You can access only 128 consecutive elements of the array in fast memory at any time. Since your application usually moves from $a[i, j, k]$ to a neighboring position $a[i', j', k']$, where $|i - i'| + |j - j'| + |k - k'| = 1$, you have decided to allocate the array so that, if $i = (i_4 i_3 i_2 i_1 i_0)_2$, $j = (j_4 j_3 j_2 j_1 j_0)_2$, and $k = (k_4 k_3 k_2 k_1 k_0)_2$, the array entry $a[i, j, k]$ is stored in drum location $(k_4 j_4 i_4 k_3 j_3 i_3 k_2 j_2 i_2 k_1 j_1 i_1 k_0 j_0 i_0)_2$. By interleaving the bits in this way, a small change to $i$, $j$, or $k$ will cause only a small change in the address.
Discuss implementation of this addressing function: (a) How does it change when $i$, $j$, or $k$ changes by 1? (b) How would you handle a random access to $a[i, j, k]$, given $i$, $j$, and $k$? (c) How would you detect a "page fault" (namely, the condition that a new segment of 128 elements must be swapped into fast memory from the drum)?
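The addressing function itself is easy to state one bit at a time. This Python sketch builds $(k_4 j_4 i_4 \ldots k_0 j_0 i_0)_2$ from three 5-bit subscripts, and the name `drum_address` is hypothetical. It is a reference for checking faster methods, not an answer to parts (a)–(c).

```python
def drum_address(i, j, k):
    """Interleave 5-bit subscripts into (k4 j4 i4 k3 j3 i3 ... k0 j0 i0)_2."""
    addr = 0
    for b in range(5):
        addr |= ((i >> b) & 1) << (3 * b)        # i_b goes to bit 3b
        addr |= ((j >> b) & 1) << (3 * b + 1)    # j_b goes to bit 3b+1
        addr |= ((k >> b) & 1) << (3 * b + 2)    # k_b goes to bit 3b+2
    return addr

print(drum_address(31, 31, 31))  # 32767, the last of the 2^15 locations
```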
86. [M25] An array of $2^p \times 2^q \times 2^r$ elements is to be allocated by putting $a[i, j, k]$ into a location whose bits are the $p + q + r$ bits of $(i, j, k)$, permuted in some fashion. Furthermore, this array is to be stored in an external memory using pages of size $2^s$. (Exercise 85 considers the case $p = q = r = 5$ and $s = 7$.) What allocation strategy of this kind minimizes the number of times that $a[i, j, k]$ is on a different page from $a[i', j', k']$, summed over all $i$, $j$, $k$, $i'$, $j'$, and $k'$ such that $|i - i'| + |j - j'| + |k - k'| = 1$?
▶ 87. [20] Suppose each byte of a 64-bit word $x$ contains an ASCII code that represents either a letter, a digit, or a space. What three bitwise operations will convert all the lowercase letters to uppercase?
88. [20] Given $x = (x_7 \ldots x_0)_{256}$ and $y = (y_7 \ldots y_0)_{256}$, compute $z = (z_7 \ldots z_0)_{256}$, where $z_j = (x_j - y_j) \bmod 256$ for $0 \le j < 8$. (See the addition operation in (87).)
89. [23] Given $x = (x_{31} \ldots x_1 x_0)_4$ and $y = (y_{31} \ldots y_1 y_0)_4$, compute $z = (z_{31} \ldots z_1 z_0)_4$, where $z_j = \lfloor x_j / y_j \rfloor$ for $0 \le j < 32$, assuming that no $y_j$ is zero.
90. [20] The bytewise averaging rule (88) always rounds downward when $x_j + y_j$ is odd. Make it less biased by rounding to the nearest odd integer in such cases.
▶ 91. [26] (Alpha channels.) Recipe (88) is a good way to compute bytewise averages, but applications to computer graphics often require a more general blending of 8-bit values. Given three octabytes $x = (x_7 \ldots x_0)_{256}$, $y = (y_7 \ldots y_0)_{256}$, $\alpha = (a_7 \ldots a_0)_{256}$, show that bitwise operations allow us to compute $z = (z_7 \ldots z_0)_{256}$, where each byte $z_j$ is a good approximation to $((255 - a_j)x_j + a_j y_j)/255$, without doing any multiplication. Implement your method with MMIX instructions.
▶ 92. [21] What happens if the second line of (88) is changed to '$z \leftarrow (x \mid y) - z$'?
93. [18] What basic formula for subtraction is analogous to formula (89) for addition?
94. [21] Let $x = (x_7 \ldots x_1 x_0)_{256}$ and $t = (t_7 \ldots t_1 t_0)_{256}$ in (90). Can $t_j$ be nonzero when $x_j$ is nonzero? Can $t_j$ be zero when $x_j$ is zero?
95. [22] What's a bitwise way to tell if all bytes of $x = (x_7 \ldots x_1 x_0)_{256}$ are distinct?
96. [21] Explain (93), and find a similar formula that sets test flags $t_j \leftarrow 128[x_j \le y_j]$.
97. [23] Leslie Lamport's paper in 1975 presented the following "problem taken from an actual compiler optimization algorithm": Given octabytes $x = (x_7 \ldots x_0)_{256}$ and $y = (y_7 \ldots y_0)_{256}$, compute $t = (t_7 \ldots t_0)_{256}$ and $z = (z_7 \ldots z_0)_{256}$ so that $t_j \ne 0$ if and only if $x_j \ne 0$, $x_j \ne$ '*', and $x_j \ne y_j$; and $z_j = (x_j = 0\,?\; y_j : (x_j \ne \text{'*'} \wedge x_j \ne y_j\,?\; \text{'*'} : x_j))$.
98. [20] Given $x = (x_7 \ldots x_0)_{256}$ and $y = (y_7 \ldots y_0)_{256}$, compute $z = (z_7 \ldots z_0)_{256}$ and $w = (w_7 \ldots w_0)_{256}$, where $z_j = \max(x_j, y_j)$ and $w_j = \min(x_j, y_j)$ for $0 \le j < 8$.
▶ 99. [28] Find hexadecimal constants $a$, $b$, $c$, $d$, $e$ such that the six bitwise operations
$$y \leftarrow x \oplus a, \qquad t \leftarrow ((((y \mathbin{\&} b) + c) \mid y) \oplus d) \mathbin{\&} e$$
will compute the flags $t = (f_7 \ldots f_1 f_0)_{256} \ll 7$ from any bytes $x = (x_7 \ldots x_1 x_0)_{256}$, where
$$f_0 = [x_0 = \text{'!'}],\; f_1 = [x_1 \ne \text{'*'}],\; f_2 = [x_2 < \text{'A'}],\; f_3 = [x_3 > \text{'z'}],\; f_4 = [x_4 \ge \text{'a'}],$$
$$f_5 = [x_5 \in \{\text{'0'}, \text{'1'}, \ldots, \text{'9'}\}],\; f_6 = [x_6 \le 168],\; f_7 = [x_7 \in \{\text{'<'}, \text{'='}, \text{'>'}, \text{'?'}\}].$$
100. [25] Suppose $x = (x_{15} \ldots x_1 x_0)_{16}$ and $y = (y_{15} \ldots y_1 y_0)_{16}$ are binary-coded decimal numbers, where $0 \le x_j, y_j < 10$ for each $j$. Explain how to compute their sum $u = (u_{15} \ldots u_1 u_0)_{16}$ and difference $v = (v_{15} \ldots v_1 v_0)_{16}$, where $0 \le u_j, v_j < 10$ and
$$(u_{15} \ldots u_1 u_0)_{10} = \bigl((x_{15} \ldots x_1 x_0)_{10} + (y_{15} \ldots y_1 y_0)_{10}\bigr) \bmod 10^{16},$$
$$(v_{15} \ldots v_1 v_0)_{10} = \bigl((x_{15} \ldots x_1 x_0)_{10} - (y_{15} \ldots y_1 y_0)_{10}\bigr) \bmod 10^{16},$$
without bothering to do any radix conversion.
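A slow specification that uses radix conversion is exactly what the exercise wants to avoid, but it still makes a useful test oracle for a bitwise solution. This Python sketch, with hypothetical function names, reads the hexadecimal digits of a valid BCD word as decimal digits.

```python
def bcd_to_int(x):
    """Read (x15 ... x0)_16 as the decimal number (x15 ... x0)_10; assumes valid BCD."""
    return int(format(x, "x"))

def int_to_bcd(n):
    """Write 0 <= n < 10**16 as sixteen BCD digits."""
    return int(str(n), 16)

def bcd_add(x, y):
    return int_to_bcd((bcd_to_int(x) + bcd_to_int(y)) % 10**16)

def bcd_sub(x, y):
    return int_to_bcd((bcd_to_int(x) - bcd_to_int(y)) % 10**16)

print(hex(bcd_add(0x15, 0x27)))  # 0x42
```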


▶ 101. [22] Two octabytes $x$ and $y$ contain amounts of time, represented in five fields that respectively signify days (3 bytes), hours (1 byte), minutes (1 byte), seconds (1 byte), and milliseconds (2 bytes). Can you add and subtract them quickly, without converting from this mixed-radix representation to binary and back again?
102. [25] Discuss routines for the addition and subtraction of polynomials modulo 5, when (a) 16 4-bit coefficients or (b) 21 3-bit coefficients are packed into a 64-bit word.
▶ 103. [22] Sometimes it's convenient to represent small numbers in unary notation, so that 0, 1, 2, 3, ..., $k$ appear respectively as $(0)_2$, $(1)_2$, $(11)_2$, $(111)_2$, ..., $2^k - 1$ inside the computer. Then max and min are easily implemented as $\mid$ and $\&$.
Suppose the bytes of $x = (x_7 \ldots x_0)_{256}$ are such unary numbers, while the bytes of $y = (y_7 \ldots y_0)_{256}$ are all either 0 or 1. Explain how to "add" $y$ to $x$ or "subtract" $y$ from $x$, giving $u = (u_7 \ldots u_0)_{256}$ and $v = (v_7 \ldots v_0)_{256}$ where
$$u_j = 2^{\min(8,\, \lg(x_j+1) + y_j)} - 1 \quad\text{and}\quad v_j = 2^{\max(0,\, \lg(x_j+1) - y_j)} - 1.$$
104. [22] Use bitwise operations to check the validity of a date represented in "year-month-day" fields $(y, m, d)$ as in (22). You should compute a value $t$ that is zero if and only if $1900 < y < 2100$, $1 \le m \le 12$, and $1 \le d \le \mathit{max\_day}(m)$, where month $m$ has at most $\mathit{max\_day}(m)$ days. Can it be done in fewer than 20 operations?
105. [30] Given $x = (x_7 \ldots x_0)_{256}$ and $y = (y_7 \ldots y_0)_{256}$, discuss bitwise operations that will sort the bytes into order, so that $x_0 \le y_0 \le \cdots \le x_7 \le y_7$ afterwards.
106. [27] Explain the Fredman–Willard procedure (95). Also show that a simple modification of their method will compute $2^{\lambda x}$ without doing any left shifts.
▶ 107. [22] Implement Algorithm B on MMIX when $d = 4$, and compare it with (56).
108. [26] Adapt Algorithm B to cases where $n$ does not have the form $d \cdot 2^d$.
109. [20] Evaluate $\rho x$ for $n$-bit numbers $x$ in $O(\log \log n)$ broadword steps.
▶ 110. [30] Suppose $n = 2^{2^e}$ and $0 \le x < n$. Show how to compute $1 \ll x$ in $O(e)$ broadword steps, using only shift commands that shift by a constant amount. (Together with Algorithm B we can therefore extract the most significant bit of an $n$-bit number in $O(\log \log n)$ such steps.)
111. [23] Explain the $01^r$ pattern recognizer, (98).
112. [46] Can all occurrences of the pattern $1^r 0$ be identified in $O(1)$ broadword steps?
113. [23] A strong broadword chain is a broadword chain of a specified width $n$ that is also a 2-adic chain, for all $n$-bit choices of $x_0$. For example, the 2-bit broadword chain $(x_0, x_1)$ with $x_1 = x_0 + 1$ is not strong because $x_0 = (11)_2$ makes $x_1 = (00)_2$. But $(x_0, x_1, \ldots, x_4)$ is a strong broadword chain that computes $(x_0 + 1) \bmod 4$ for all $0 \le x_0 < 4$ if we set $x_1 = x_0 \oplus 1$, $x_2 = x_0 \mathbin{\&} 1$, $x_3 = x_2 \ll 1$, and $x_4 = x_1 \oplus x_3$.
Given a broadword chain $(x_0, x_1, \ldots, x_r)$ of width $n$, construct a strong broadword chain $(x'_0, x'_1, \ldots, x'_{r'})$ of the same width, such that $r' = O(r)$ and $(x_0, x_1, \ldots, x_r)$ is a subsequence of $(x'_0, x'_1, \ldots, x'_{r'})$.
114. [16] Suppose $(x_0, x_1, \ldots, x_r)$ is a strong broadword chain of width $n$ that computes the value $f(x) = x_r$ whenever an $n$-bit number $x = x_0$ is given. Construct a broadword chain $(X_0, X_1, \ldots, X_r)$ of width $mn$ that computes $X_r = (f(\xi_1) \ldots f(\xi_m))_{2^n}$ for any given $mn$-bit value $X_0 = (\xi_1 \ldots \xi_m)_{2^n}$, where $0 \le \xi_1, \ldots, \xi_m < 2^n$.
▶ 115. [24] Given a 2-adic integer $x = (\ldots x_2 x_1 x_0)_2$, we might want to compute $y = (\ldots y_2 y_1 y_0)_2 = f(x)$ from $x$ by zeroing out all blocks of consecutive 1s that (a) are not immediately followed by two 0s; or (b) are followed by an odd number of 0s before the next block of 1s begins; or (c) contain an odd number of 1s. For example, if $x$ is $(\ldots 01110111001101000110)_2$ then $y$ is (a) $(\ldots 00000111000001000110)_2$; (b) $(\ldots 00000111000000000110)_2$; (c) $(\ldots 00000000001100000110)_2$. (Infinitely many 0s are assumed to appear at the right of $x_0$. Thus, in case (a) we have
$$y_j = x_j \wedge \bigl((\bar x_{j-1} \wedge \bar x_{j-2}) \vee (x_{j-1} \wedge \bar x_{j-2} \wedge \bar x_{j-3}) \vee (x_{j-1} \wedge x_{j-2} \wedge \bar x_{j-3} \wedge \bar x_{j-4}) \vee \cdots\bigr)$$
for all $j$, where $x_k = 0$ for $k < 0$.) Find 2-adic chains for $y$ in each case.
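The three target functions can be pinned down by scanning the blocks of 1s directly. This Python sketch treats $x$ as a finite nonnegative integer, with infinitely many 0s at the right of $x_0$ as the exercise assumes, and it reproduces the three example values. Its function names are hypothetical. It is a specification for testing, not a 2-adic chain.

```python
def runs(x):
    """Yield (lo, length) for each maximal block of 1 bits in x >= 0, from low to high."""
    j = 0
    while x >> j:
        if (x >> j) & 1:
            lo = j
            while (x >> j) & 1:
                j += 1
            yield lo, j - lo
        else:
            j += 1

def zeros_below(x, lo):
    """Count the 0s just below bit lo, up to the next lower 1 bit.
    Returns None when no 1 bit lies below (the 0s then continue forever)."""
    below = x & ((1 << lo) - 1)
    if below == 0:
        return None
    return lo - below.bit_length()

def keep_blocks(x, keep):
    """Keep each block of 1s for which keep(length, zeros) is true; zero the others."""
    y = 0
    for lo, length in runs(x):
        if keep(length, zeros_below(x, lo)):
            y |= ((1 << length) - 1) << lo
    return y

def case_a(x):  # zero the blocks not immediately followed by two 0s
    return keep_blocks(x, lambda length, z: z is None or z >= 2)

def case_b(x):  # zero the blocks followed by an odd number of 0s
    return keep_blocks(x, lambda length, z: z is None or z % 2 == 0)

def case_c(x):  # zero the blocks that contain an odd number of 1s
    return keep_blocks(x, lambda length, z: length % 2 == 0)
```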
116. [HM30] Suppose $x = (\ldots x_2 x_1 x_0)_2$ and $y = (\ldots y_2 y_1 y_0)_2 = f(x)$, where $y$ is computable by a 2-adic chain having no shift operations. Let $L$ be the set of all binary strings such that $y_j = [\,x_j \ldots x_1 x_0 \in L\,]$, and assume that all constants used in the chain are rational 2-adic numbers. Prove that $L$ is a regular language. What languages $L$ correspond to the functions in exercise 115(a) and 115(b)?
117. [HM46] Continuing exercise 116, is there any simple way to characterize the regular languages $L$ that arise in shift-free 2-adic chains? (The language $L = 0^*(10^*10^*)^*$ does not seem to correspond to any such chain.)
118. [30] According to Lemma A, we cannot compute the function $x \gg 1$ for all $n$-bit numbers $x$ by using only additions, subtractions, and bitwise Boolean operations (no shifts or branches). Show, however, that $O(n)$ such operations are necessary and sufficient if we also include the "monus" operator $y \mathbin{\dot-} z$ in our repertoire.
119. [20] Evaluate the function $f_{py}(x)$ in (102) with four broadword steps.
▶ 120. [M25] There are $2^{n 2^{mn}}$ functions that take $n$-bit numbers $(x_1, \ldots, x_m)$ into an $n$-bit number $f(x_1, \ldots, x_m)$. How many of them can be implemented with addition, subtraction, multiplication, and non-shift bitwise Boolean operations (modulo $2^n$)?
▶ 121. [M25] By exercise 3.1–6, a function from $[0 \,.\,.\, 2^n)$ into itself is eventually periodic.
a) Prove that if $f$ is any $n$-bit broadword function that can be implemented without shift instructions, the lengths of its periods are always powers of 2.
b) However, for every $p$ between 1 and $n$, there's an $n$-bit broadword chain of length 3 that has a period of length $p$.
122. [M22] Complete the proof of Lemma B.
123. [M23] Let $a_q$ be the constant $1 + 2^q + 2^{2q} + \cdots + 2^{(q-1)q} = (2^{q^2} - 1)/(2^q - 1)$. Using (104), show that there are infinitely many $q$ such that the operation of multiplying by $a_q$, modulo $2^{q^2}$, requires $\Omega(\log q)$ steps in any $n$-bit broadword chain with $n \ge q^2$.
124. [M38] Complete the proof of Theorem R$'$ by defining an $n$-bit broadword chain $(x_0, x_1, \ldots, x_f)$ and sets $(U_0, U_1, \ldots, U_f)$ such that, for $0 \le t \le f$, all inputs $x \in U_t$ lead to an essentially similar state $Q(x, t)$, in the following sense: (i) The current instruction in $Q(x, t)$ does not depend on $x$. (ii) If register $r_j$ has a known value in $Q(x, t)$, it holds $x_{j'}$ for some definite index $j' \le t$. (iii) If memory location $M[z]$ has been changed, it holds $x_{z''}$ for some definite index $z'' \le t$. (The values of $j'$ and $z''$ depend on $j$, $z$, and $t$, but not on $x$.) Furthermore $|U_t| \ge n/2^{2^t - 1}$, and the program cannot guarantee that $r_1 = \rho x$ when $t < f$. Hint: Lemma B implies that a limited number of shift amounts and memory addresses need to be considered when $t$ is small.
125. [M33] Prove Theorem P$'$. Hint: Lemma B remains true if we replace '$= 0$' by '$= s_l$' in (103), for any values $s_l$.
126. [M46] Does the operation of extracting the most significant bit, $2^{\lambda x}$, require $\Omega(\log \log n)$ steps in an $n$-bit basic RAM? (See exercise 110.)
127. [20] Prove that if there's a way to carry out sideways addition of $n$-bit numbers in $O(\log \log n)$ broadword steps, then every symmetric function of a number's $n$ bits can also be done in $O(\log \log n)$ broadword steps.
128. [M46] Does sideways addition require $\Omega(\log n)$ broadword steps?
129. [M46] Can the parity function $(\nu x) \bmod 2$ be computed in $O(1)$ broadword steps?
130. [M46] Is there an $n$-bit constant $a$ such that the function $(a \times x) \bmod 2^n$ requires $\Omega(\log n)$ $n$-bit broadword steps?
▶ 131. [23] Write an MMIX program for Algorithm R when the graph is represented by arc lists. Vertex nodes have at least two fields, called LINK and ARCS, and arc nodes have TIP and NEXT fields, as explained in Section 7. Initially all LINK fields are zero, except in the given set of vertices $Q$, which is represented as a circular list. Your program should change that circular list so that it represents the set $R$ of all reachable vertices.
▶ 132. [M27] A clique in a graph is a set of mutually adjacent vertices; a clique is maximal if it's not contained in any other. The purpose of this exercise is to discuss an algorithm due to J. K. M. Moody and J. Hollis, which provides a convenient way to find every maximal clique of a not-too-large graph, using bitwise operations.
Suppose $G$ is a graph with $n$ vertices $V = \{0, 1, \ldots, n-1\}$. Let $\alpha_v = \sum\{2^u \mid u \mathrel{-\!\!-} v \text{ or } u = v\}$ be row $v$ of $G$'s reflexive adjacency matrix (where $u \mathrel{-\!\!-} v$ means that $u$ is adjacent to $v$), and let $\delta_v = \sum\{2^u \mid u \ne v\} = 2^n - 1 - 2^v$. Every subset $U \subseteq V$ is representable as an $n$-bit integer $\sigma(U) = \sum_{u \in U} 2^u$; for example, $\delta_v = \sigma(V \setminus v)$. We also define the bitwise intersection
$$\tau(U) = \mathop{\&}_{0 \le u < n} (u \in U\,?\; \alpha_u : \delta_u).$$
For example, if $n = 5$ we have $\tau(\{0, 2\}) = \alpha_0 \mathbin{\&} \delta_1 \mathbin{\&} \alpha_2 \mathbin{\&} \delta_3 \mathbin{\&} \delta_4$.
a) Prove that $U$ is a clique if and only if $\tau(U) = \sigma(U)$.
b) Show that if $\tau(U) = \sigma(T)$ then $T$ is a clique.
c) For $1 \le k \le n$, consider the $2^k$ bitwise intersections
$$C_k = \Bigl\{ \mathop{\&}_{0 \le u < k} (u \in U\,?\; \alpha_u : \delta_u) \Bigm| U \subseteq \{0, 1, \ldots, k-1\} \Bigr\},$$
and let $C_k^+$ be the maximal elements of $C_k$. Prove that $U$ is a maximal clique if and only if $\sigma(U) \in C_n^+$.
d) Explain how to compute $C_k^+$ from $C_{k-1}^+$, starting with $C_0^+ = \{2^n - 1\}$.
▶ 133. [20] Given a graph $G$, how can the algorithm of exercise 132 be used to find (a) all maximal independent sets of vertices? (b) all minimal vertex covers (sets that hit every edge)?
134. [15] Nine classes of mappings for ternary values appear in (119), (123), and (124). To which class does the representation (128) belong, if $a = 0$, $b = *$, $c = 1$?
135. [22] Łukasiewicz included a few operations besides (127) in his three-valued logic: $\neg x$ (negation) interchanges 0 with 1 but leaves $*$ unchanged; $\Diamond x$ (possibility) is defined as $\neg x \Rightarrow x$; $\Box x$ (necessity) is defined as $\neg\Diamond\neg x$; and $x \Leftrightarrow y$ (equivalence) is defined as $(x \Rightarrow y) \wedge (y \Rightarrow x)$. Explain how to perform these operations using representation (128).
136. [29] Suggest two-bit encodings for binary operations on the set $\{a, b, c\}$ that are defined by the following "multiplication tables":

a b  
a b 
a b a
(a) b ; (b) b a ; ( ) a a :
b a a b
137. [21] Show that the operation in exercise 136(c) is simpler with packed vectors like (131) than with the unpacked form (130).
138. [24] Find an example of three-state-to-two-bit encoding where class Va is best.
139. [25] If $x$ and $y$ are signed bits 0, $+1$, or $-1$, what 2-bit encoding is good for calculating their sum $(z_1 z_2)_3 = x + y$, where $z_1$ and $z_2$ are also required to be signed bits? (This is a "half adder" for balanced ternary numbers.)
140. [27] Design an economical full adder for balanced ternary numbers: Show how to compute signed bits $u$ and $v$ such that $3u + v = x + y + z$ when $x, y, z \in \{0, +1, -1\}$.
▶ 141. [30] The Ulam numbers $\langle U_1, U_2, \ldots\rangle = \langle 1, 2, 3, 4, 6, 8, 11, 13, 16, 18, 26, \ldots\rangle$ are defined for $n \ge 3$ by letting $U_n$ be the smallest integer $> U_{n-1}$ that has a unique representation $U_n = U_j + U_k$ for $0 < j < k < n$. Show that a million Ulam numbers can be computed rapidly with the help of bitwise techniques.
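One bitwise approach, offered as a sketch rather than as the intended solution, keeps three bit vectors: the Ulam numbers found so far, the sums seen exactly once, and the sums seen more than once. Adding a new term $u$ contributes all the sums $u + U_j$ at once, as a single shift of the first vector. Python's unbounded integers stand in for long bit vectors here; a program aiming at a million terms would use arrays of 64-bit words instead. The function name is hypothetical.

```python
def ulam_numbers(count):
    """First `count` Ulam numbers, tracking pairwise sums with Python-int bitmaps."""
    if count <= 2:
        return [1, 2][:count]
    seq = [1, 2]
    have = (1 << 1) | (1 << 2)   # bit u set iff u is an Ulam number found so far
    once = 1 << 3                # sums U_j + U_k (j < k) seen exactly once
    many = 0                     # sums seen two or more times
    u = 2
    while len(seq) < count:
        cand = once >> (u + 1)                            # unique sums above u
        u = u + 1 + ((cand & -cand).bit_length() - 1)     # the smallest of them
        seq.append(u)
        new = have << u          # u + U_j for every earlier U_j, all distinct
        many |= once & new
        once = (once ^ new) & ~many
        have |= 1 << u
    return seq

print(ulam_numbers(11))  # [1, 2, 3, 4, 6, 8, 11, 13, 16, 18, 26]
```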
▶ 142. [33] A subcube such as $*10*1*01$ can be represented by asterisk codes 10010100 and bit codes 01001001, as in (85); but many other encodings are also possible. What representation scheme for subcubes works best, for finding prime implicants by the consensus-based algorithm of exercise 7.1.1–31?
143. [20] Let $x$ be a 64-bit number that represents an $8 \times 8$ chessboard, with a 1 bit in every position where a knight is present. Find a formula for the 64-bit number $f(x)$ that has a 1 in every position reachable in one move by a knight of $x$. For example, the white knights at the start of a game correspond to $x = \#42$; then $f(x) = \#a51800$.
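A standard bitboard formula shifts $x$ by each of the eight knight offsets and masks off the moves that would wrap around an edge of the board. This Python sketch assumes that bit $8r + c$ represents rank $r$ and file $c$, with file a in bit 0 of each byte. Because the example position is symmetric from left to right, the sketch reproduces the stated value $f(\#42) = \#a51800$ under either file ordering.

```python
MASK64 = (1 << 64) - 1
NOT_A  = 0xFEFEFEFEFEFEFEFE   # every square except file a (bit 0 of each byte)
NOT_AB = 0xFCFCFCFCFCFCFCFC   # every square except files a and b
NOT_H  = 0x7F7F7F7F7F7F7F7F   # every square except file h (bit 7 of each byte)
NOT_GH = 0x3F3F3F3F3F3F3F3F   # every square except files g and h

def knight_moves(x):
    """Squares reachable by one knight move from the knights in x (a1 = bit 0)."""
    return (((x << 17) & NOT_A) | ((x << 15) & NOT_H) |    # two ranks up
            ((x << 10) & NOT_AB) | ((x << 6) & NOT_GH) |   # one rank up
            ((x >> 15) & NOT_A) | ((x >> 17) & NOT_H) |    # two ranks down
            ((x >> 6) & NOT_AB) | ((x >> 10) & NOT_GH)     # one rank down
            ) & MASK64

print(hex(knight_moves(0x42)))  # 0xa51800
```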
144. [16] What node is the sibling of node $j$ in a sideways heap? (See (134).)
145. [17] Interpret (137) when $h$ is less than the height of $j$.
▶ 146. [M20] Prove Eq. (138), which relates the $\rho$ and $\lambda$ functions.
▶ 147. [M20] What values of $\beta v$, $\pi v$, $\alpha v$, and $\tau j$ occur in Algorithm V when the forest is
a) the empty digraph with vertices $\{v_1, \ldots, v_n\}$ and no arcs?
b) the oriented path $v_n \to \cdots \to v_2 \to v_1$?
148. [M21] When preprocessing for Algorithm V, is it possible to have $x_3 \to y_2 \to x_2 \to y_1 \to x_1$ in $S$ when $x_3 \to x_2 \to x_1 \to \Lambda$ and $y_2 \to y_1 \to \Lambda$ in the forest? (If so, two different trees are "entangled" in $S$.)
▶ 149. [23] Design a preprocessing procedure for Algorithm V.
▶ 150. [25] Given an array of elements $A_1, \ldots, A_n$, the range minimum query problem is to determine $k(i, j)$ such that $A_{k(i,j)} = \min(A_i, \ldots, A_j)$ for any given indices $i$ and $j$ with $1 \le i \le j \le n$. Prove that Algorithm V will solve this problem, after $O(n)$ steps of preprocessing on the array $A$ have prepared the necessary tables $(\beta, \pi, \alpha, \tau)$. Hint: Consider the binary search tree constructed from the sequence of keys $(p(1), p(2), \ldots, p(n))$, where $p$ is a permutation of $\{1, 2, \ldots, n\}$ such that $A_{p(1)} \le A_{p(2)} \le \cdots \le A_{p(n)}$.
151. [22] Conversely, show that any algorithm for range minimum queries can be used to find nearest common ancestors, with essentially the same efficiency.
152. [M21] Prove that Algorithm V is correct.
▶ 153. [M20] The pointers in a navigation pile like (144) can be packed into a binary string such as
0100100000101000000000:
2 4 6 8 10 12 14 16 18 20 22 24
At what bit position (from the left) does the pointer for node j end?
154. [20] The gray lines in Fig. 14 show how each pentagon is composed of ten triangles. What decomposition of the hyperbolic plane is defined by those gray lines alone, without the black pentagon edges?
▶ 155. [M21] Prove that (x) mod 1 = ( 0)1= when  is the negaFibonacci code for x.
156. [21] Design algorithms (a) to convert a given integer $x$ to its negaFibonacci code $\alpha$, and (b) to convert a given negaFibonacci code $\alpha$ to $x = N(\alpha)$.
157. [M21] Explain the recursion (148) for negaFibonacci predecessor and successor.
158. [M26] Let $\alpha = a_n \ldots a_1$ be the binary code for $F(\alpha 0) = a_n F_{n+1} + \cdots + a_1 F_2$ in the standard Fibonacci number system (146). Develop methods analogous to (148) and (149) for incrementing and decrementing such codewords.
159. [M34] Exercise 7 shows that it's easy to convert between the negabinary and binary number systems. Discuss conversion between negaFibonacci codewords and the ordinary Fibonacci codes in exercise 158.
160. [M29] Prove that (150) and (151) yield consistent code labels for the pentagrid.
161. [20] The cells of a chessboard can be colored black and white, so that neighboring cells have different colors. Does the pentagrid also have this property?
▶ 162. [HM37] Explain how to draw the pentagrid, Fig. 14. What circles are present?
163. [HM41 ℄ Devise a way to navigate through the triangles in the tiling of Fig. 18.
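164. [23] The original definition of custerization in 1957 was not (157) but
$$\mathrm{custer}'(X) = X \mathbin{\&} \overline{(X_{NW} \mathbin{\&} X_N \mathbin{\&} X_{NE} \mathbin{\&} X_W \mathbin{\&} X_E \mathbin{\&} X_{SW} \mathbin{\&} X_S \mathbin{\&} X_{SE})}.$$
Why is (157) preferable?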
165. [21] (R. A. Kirsch.) Discuss the computation of the $3 \times 3$ cellular automaton with
$$X^{(t+1)} = \mathrm{custer}\bigl(\bar X^{(t)}\bigr) = \bar X^{(t)} \mathbin{\&} \bigl(X_N^{(t)} \mid X_W^{(t)} \mid X_E^{(t)} \mid X_S^{(t)}\bigr).$$
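The following minimal C sketch illustrates the kind of row-parallel computation the exercise has in mind. It performs one step of the automaton, reading the rule as $X^{(t+1)} = \bar X^{(t)} \mathbin{\&} (X_N \mid X_W \mid X_E \mid X_S)$. The text does not specify a data layout, so this sketch assumes a hypothetical one: an $M \times 64$ bitmap stored one row per 64-bit word, with bit 63 as the leftmost column and pixels outside the border treated as white. The name `kirsch_step` is illustrative only.

```c
#include <stdint.h>
#include <stddef.h>

/* One step of X' = ~X & (X_N | X_W | X_E | X_S) on an m x 64 bitmap.
   Assumed layout: x[i] is row i, bit 63 is column 0 (leftmost),
   and pixels beyond the border count as white (0). */
void kirsch_step(const uint64_t *x, uint64_t *next, size_t m) {
    for (size_t i = 0; i < m; i++) {
        uint64_t north = (i > 0) ? x[i - 1] : 0;      /* row above */
        uint64_t south = (i + 1 < m) ? x[i + 1] : 0;  /* row below */
        uint64_t west  = x[i] >> 1;  /* column j-1 lands on column j */
        uint64_t east  = x[i] << 1;  /* column j+1 lands on column j */
        next[i] = ~x[i] & (north | west | east | south);
    }
}
```

For a single black pixel, one step turns on its four rook-neighbors and clears the pixel itself.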
166. [M23] Let $f(M,N)$ be the maximum number of black pixels in an $M \times N$ torus bitmap $X$ for which $X = \mathrm{custer}(X)$. Prove that $f(M,N) = \frac45 MN + O(M+N)$.
167. [24] (Life.) If the bitmap $X$ represents an array of cells that are either dead (0) or alive (1), the Boolean function
$$f(x_{NW}, \ldots, x, \ldots, x_{SE}) = \bigl[\,2 < x_{NW} + x_N + x_{NE} + x_W + \tfrac12 x + x_E + x_{SW} + x_S + x_{SE} < 4\,\bigr]$$
can lead to astonishing life histories when it governs a cellular automaton as in (158).
a) Find a way to evaluate $f$ with a Boolean chain of 26 steps or less.
b) Let $X_j^{(t)}$ denote row $j$ of $X$ at time $t$. Show that $X_j^{(t+1)}$ can be evaluated in at most 23 broadword steps, as a function of the three rows $X_{j-1}^{(t)}$, $X_j^{(t)}$, and $X_{j+1}^{(t)}$.
▶ 168. [23] To keep an image finite, we might insist that a $3 \times 3$ cellular automaton treats an $M \times N$ bitmap as a torus, wrapping around seamlessly between top and bottom and between left and right. The task of simulating its actions efficiently with bitwise operations is somewhat tricky: We want to minimize references to memory, yet each new pixel value depends on old values that lie on all sides. Furthermore the shifting of bits between neighboring words tends to be awkward, taxing the capacity of a register.
Show that such difficulties can be surmounted by maintaining an array of $n$-bit words $A_{jk}$ for $0 \le j \le M$ and $0 \le k \le N' = \lceil N/(n-2) \rceil$. If $j \ne M$ and $k \ne 0$, word $A_{jk}$ should contain the pixels of row $j$ and columns $(k-1)(n-2)$ through $k(n-2)+1$, inclusive; the other words $A_{Mk}$ and $A_{j0}$ provide auxiliary buffer space. (Notice that some bits of the raster appear twice.)
169. [22] Continuing the previous two exercises, what happens to the Cheshire cat of Fig. 17(a) when it is subjected to the vicissitudes of Life, in a $26 \times 31$ torus?
▶ 170. [21] What result does the Guo–Hall thinning automaton produce when given a solid black rectangle of $M$ rows and $N$ columns? How long does it take?
171. [24] Find a Boolean chain of length $\le 25$ to evaluate the local thinning function $g(x_{NW}, x_N, x_{NE}, x_W, x_E, x_{SW}, x_S, x_{SE})$ of (159), with or without the extra cases in (160).
172. [M29] Prove or disprove: If a pattern contains three black pixels that are king-neighbors of each other, the Guo–Hall procedure extended by (160) will reduce it, unless none of those pixels can be removed without destroying the connectivity.
▶ 173. [M30] Raster images often need to be cleaned up if they contain noisy data. For example, accidental specks of black or white may well spoil the results when a thinning algorithm is used for optical character recognition.
Say that a bitmap $X$ is closed if every white pixel is part of a $2 \times 2$ square of white pixels, and open if every black pixel is part of a $2 \times 2$ square of black pixels. Let
$$X^D = \mathop{\&}\{\, Y \mid Y \supseteq X \text{ and } Y \text{ is closed} \,\}, \qquad X^L = \mathop{|}\{\, Y \mid Y \subseteq X \text{ and } Y \text{ is open} \,\}.$$
A bitmap is called clean if it equals $X^{DL}$ for some $X$. We might, for example, have $X =$ [bitmap], $X^D =$ [bitmap], $X^{DL} =$ [bitmap]. In general $X^D$ is “darker” than $X$, while $X^L$ is “lighter”: $X^D \supseteq X \supseteq X^L$.
a) Prove that $(X^{DL})^{DL} = X^{DL}$. Hint: $X \subseteq Y$ implies $X^D \subseteq Y^D$ and $X^L \subseteq Y^L$.
b) Show that $X^D$ can be computed with one step of a $3 \times 3$ cellular automaton.
174. [M46] (M. Minsky and S. Papert.) Is there a three-dimensional shrinking algorithm that preserves connectivity, analogous to (161)?
175. [15] How many rookwise connected black components does the Cheshire cat have?
176. [M24] Let $G$ be the graph whose vertices are the black pixels of a given bitmap $X$, with $u \mathrel{-\!\!-} v$ when $u$ and $v$ are a king move apart. Let $G'$ be the corresponding graph after the shrinking transformation (161) has been applied. The purpose of this exercise is to show that the number of connected components of $G'$ is the number of components of $G$ minus the number of isolated vertices of $G$.
Let $N_{(i,j)} = \{(i,j), (i-1,j), (i-1,j+1), (i,j+1)\}$ be pixel $(i,j)$ together with its north and/or east neighbors. For each $v \in G$ let $S(v) = \{\, v' \in G' \mid v' \in N_v \,\}$.
a) Prove that $S(v)$ is empty if and only if $v$ is isolated in $G$.
b) If $u \mathrel{-\!\!-} v$ in $G$, $u' \in S(u)$, and $v' \in S(v)$, prove that $u'$ and $v'$ belong to the same component of $G'$.
c) For each $v' \in G'$ let $S'(v') = \{\, v \in G \mid v' \in N_v \,\}$. Is $S'(v')$ always nonempty?
d) If $u' \mathrel{-\!\!-} v'$ in $G'$, $u \in S'(u')$, and $v \in S'(v')$, prove that $u$ and $v$ belong to the same component of $G$.
e) Hence there's a one-to-one correspondence between the nontrivial components of $G$ and the components of $G'$.
177. [M22] Continuing exercise 176, prove an analogous result for the white pixels.
178. [20] If $X$ is an $M \times N$ bitmap, let $X^*$ be the $M \times (2N+1)$ bitmap $X \ddagger (X \mid (X \ll 1))$. Show that the kingwise connected components of $X^*$ are also rookwise connected, and that bitmap $X^*$ has the same “surroundedness tree” (162) as $X$.
▶ 179. [34] Design an algorithm that constructs the surroundedness tree of a given $M \times N$ bitmap, scanning the image one row at a time as discussed in the text. (See (162) and (163).)
▶ 180. [M24] Digitize the hyperbola $y^2 = x^2 + 13$ by hand, for $0 < y \le 7$.
181. [HM20] Explain how to subdivide a general conic (168) with rational coefficients into monotonic parts so that Algorithm T applies.
182. [M31] Why does the three-register method (Algorithm T) digitize correctly?
▶ 183. [M29] (G. Rote.) Explain why Algorithm T might fail if condition (v) is false.
▶ 184. [M22] Find a quadratic form $Q'(x,y)$ so that, when Algorithm T is applied to $(x', y')$, $(x, y)$, and $Q'$, it produces exactly the same edges as it does from $(x, y)$, $(x', y')$, and $Q$, but in the reverse order.
▶ 185. [22] Design an algorithm that properly digitizes a straight line from $(\xi, \eta)$ to $(\xi', \eta')$, when $\xi$, $\eta$, $\xi'$, and $\eta'$ are rational numbers, by simplifying Algorithm T.
186. [HM22] Given three complex numbers $(z_0, z_1, z_2)$, consider the curve traced out by
$$B(t) = (1-t)^2 z_0 + 2(1-t)t\,z_1 + t^2 z_2, \qquad \text{for } 0 \le t \le 1.$$
a) What is the approximate behavior of $B(t)$ when $t$ is near 0 or 1?
b) Let $S(z_0, z_1, z_2) = \{\, B(t) \mid 0 \le t \le 1 \,\}$. Prove that all points of $S(z_0, z_1, z_2)$ lie on or inside the triangle whose vertices are $z_0$, $z_1$, and $z_2$.
c) True or false? $S(w + z_0, w + z_1, w + z_2) = w + S(z_0, z_1, z_2)$.
d) Prove that $S(z_0, z_1, z_2)$ is part of a straight line if and only if $z_0$, $z_1$, and $z_2$ are collinear; otherwise it is part of a parabola.
e) Prove that if $0 \le \theta \le 1$, we have the recurrence
$$S(z_0, z_1, z_2) = S\bigl(z_0, (1-\theta)z_0 + \theta z_1, B(\theta)\bigr) \cup S\bigl(B(\theta), (1-\theta)z_1 + \theta z_2, z_2\bigr).$$
187. [M29] Continuing exercise 186, show how to digitize $S(z_0, z_1, z_2)$ using the three-register method (Algorithm T). For best results, the digitizations of $S(z_2, z_1, z_0)$ and $S(z_0, z_1, z_2)$ should produce the same edges, but in reverse order.
188. [25] Given a $64 \times 64$ bitmap, what's a good way (a) to transpose it, or (b) to rotate it by 90°, using operations on 64-bit numbers?
▶ 189. [25] Bitmap images can often be viewed conveniently using pixels that are shades of gray instead of just black or white. Such gray levels typically are 8-bit values that range from 0 (black) to 255 (white); notice that the black/white convention is traditionally reversed with respect to the 1-bit case. An $m \times n$ bitmap whose resolution is 600 dots per inch corresponds nicely to the $(m/8) \times (n/8)$ grayscale image with 75 pixels per inch that is obtained by mapping each $8 \times 8$ subarray of 1-bit pixels into the gray level $\lfloor 255(1 - k/64)^{1/\gamma} + \frac12 \rfloor$, where $\gamma = 1.3$ and $k$ is the number of 1s in the subarray.
Write an MMIX routine that converts a given $m \times n$ array BITMAP into the corresponding $(m/8) \times (n/8)$ image GRAYMAP, assuming that $m = 8m'$ and $n = 64n'$.
190. [23] A parity pattern of length $m$ and width $n$ is an $m \times n$ matrix of 0s and 1s with the property that each element is the sum of its neighbors, mod 2. For example,
$$\begin{matrix}1&1\\0&0\\1&1\end{matrix}\,;\qquad
\begin{matrix}0&0&1&1\\0&1&0&0\\1&1&0&1\\0&1&0&1\end{matrix}\,;\qquad
\begin{matrix}0&1&0&1&0\\1&1&0&1&1\\0&1&0&1&0\end{matrix}\,;\qquad
\begin{matrix}1&0&0\\1&1&0\\1&0&1\\0&1&1\\0&0&1\end{matrix}\,;\qquad\text{and}\qquad
\begin{matrix}0&1&1&1&0\\1&0&1&0&1\\1&1&0&1&1\\1&0&1&0&1\\0&1&1&1&0\end{matrix}$$
are parity patterns of sizes $3 \times 2$, $4 \times 4$, $3 \times 5$, $5 \times 3$, and $5 \times 5$.
a) If the binary vectors $\alpha_1, \alpha_2, \ldots, \alpha_m$ are the rows of a parity pattern, show that $\alpha_2, \ldots, \alpha_m$ can all be computed from the top row $\alpha_1$ by using bitwise operations. Thus at most one $m \times n$ parity pattern can begin with any given bit vector.
b) True or false: The sum (mod 2) of two $m \times n$ parity patterns is a parity pattern.
c) A parity pattern is called perfect if it contains no all-zero row or column. For example, three of the matrices above are perfect, but the $3 \times 2$ and $3 \times 5$ examples are not. Show that every $m \times n$ parity pattern contains a perfect parity pattern as a submatrix. Furthermore, all such submatrices have the same size, $m' \times n'$, where $m' + 1$ is a divisor of $m + 1$ and $n' + 1$ is a divisor of $n + 1$.
d) There's a perfect parity pattern whose first row is 0011, but there is no such pattern beginning with 01010. Is there a simple way to decide whether a given binary vector is the top row of a perfect parity pattern?
e) Prove that there's a unique perfect parity pattern that begins with $1\overbrace{0\ldots0}^{n-1}$.
191. [M30] A wraparound parity pattern is analogous to the parity patterns of exercise 190, except that the leftmost and rightmost elements of each row are also neighbors.
a) Find a simple relation between the parity pattern of width $n$ that begins with $\alpha$ and the wraparound parity pattern of width $2n + 2$ that begins with $0\alpha 0\alpha^R$.
b) The Fibonacci polynomials $F_j(x)$ are defined by the recurrence
$$F_0(x) = 0, \quad F_1(x) = 1, \quad \text{and} \quad F_{j+1}(x) = xF_j(x) + F_{j-1}(x) \ \text{ for } j \ge 1.$$
Show that there's a simple relation between the wraparound parity patterns that begin with $10\ldots0$ ($N - 1$ zeros) and the Fibonacci polynomials modulo $x^N + 1$. Hint: Consider $F_j(x^{-1} + 1 + x)$, and do arithmetic mod 2 as well as mod $x^N + 1$.
c) If $\alpha$ is the binary string $a_1 \ldots a_n$, let $f_\alpha(x) = a_1 x + \cdots + a_n x^n$. Show that
$$f_{(\alpha_j 0 \alpha_j^R)}(x) = \bigl(f_\alpha(x) + f_\alpha(x^{-1})\bigr)\,F_j(x^{-1} + 1 + x) \bmod (x^N + 1) \text{ and mod } 2,$$
when $N = 2n + 2$ and $\alpha_j$ is row $j$ of a width-$n$ parity pattern that begins with $\alpha$.
d) Consequently we can compute $\alpha_j$ from $\alpha$ in only $O(n^2 \log j)$ steps. Hints: See exercise 4.6.3–26; and use the identity $F_{m+n}(x) = F_m(x)F_{n+1}(x) + F_{m-1}(x)F_n(x)$, which generalizes Eq. 1.2.8–(6).
192. [HM38] The shortest parity pattern that begins with a given string can be quite long; for example, it turns out that the perfect pattern of width 120 whose first row is $10\ldots0$ has length 36,028,797,018,963,966(!). The purpose of this exercise is to consider how to calculate the interesting function
$$c(q) = 1 + \max\{\, m \mid \text{there exists a perfect parity pattern of length } m \text{ and width } q - 1 \,\},$$
whose initial values $(1, 3, 4, 6, 5, 24, 9, 12, 28)$ for $1 \le q \le 9$ are easy to compute by hand.
a) Characterize $c(q)$ algebraically, using the Fibonacci polynomials of exercise 191.
b) Explain how to calculate $c(q)$ if we know a number $M$ such that $c(q)$ divides $M$, and if we also know the prime factors of $M$.
c) Prove that $c(2^e) = 3 \cdot 2^{e-1}$ when $e > 0$. Hint: $F_{2^e}(y)$ has a simple form, mod 2.
d) Prove that when $q$ is odd and not a multiple of 3, $c(q)$ is a divisor of $2^{2e} - 1$, where $e$ is the order of 2 modulo $q$. Hint: $F_{2^e - 1}(y)$ has a simple form, mod 2.
e) What happens when $q$ is an odd multiple of 3?
f) Finally, explain how to handle the case when $q$ is even.
▶ 193. [M21] If a perfect $m \times n$ parity pattern exists, when $m$ and $n$ are odd, show that there's also a perfect $(2m+1) \times (2n+1)$ parity pattern. (Intricate fractals arise when this observation is applied repeatedly; for example, the $5 \times 5$ pattern in exercise 190 leads to Fig. 20.)
194. [24] Find all $n \le 383$ for which there exists a perfect $n \times n$ parity pattern with 8-fold symmetry, such as the example in Fig. 20. Hint: The diagonal elements of all such patterns must be zero.
Fig. 20. A perfect $383 \times 383$ parity pattern.
▶ 195. [HM25] Let $A$ be a binary matrix having $m$ rows of length $n$. Explain how to use bitwise operations to compute the rank $m - r$ of $A$ over the binary field $\{0, 1\}$, and to find $r$ linearly independent binary vectors of length $m$ whose product with $A$ is $0\ldots0$. Hint: See the “triangularization” algorithm for null spaces, Algorithm 4.6.2N.
196. [21] (K. Thompson, 1992.) Integers in the range $0 \le x < 2^{31}$ can be encoded as a string of up to six bytes $\alpha(x) = \alpha_1 \ldots \alpha_l$ in the following way: If $x < 2^7$, set $l \leftarrow 1$ and $\alpha_1 \leftarrow x$. Otherwise let $x = (x_5 \ldots x_1 x_0)_{64}$; set $l \leftarrow \lceil (\lambda x)/5 \rceil$, $\alpha_1 \leftarrow 2^8 - 2^{8-l} + x_{l-1}$, and $\alpha_j \leftarrow 2^7 + x_{l-j}$ for $2 \le j \le l$. Notice that $\alpha(x)$ contains a zero byte if and only if $x = 0$.
a) What are the encodings of #a, #3a3, #7b97, and #1D141?
b) If $x \le x'$, prove that $\alpha(x) \le \alpha(x')$ in lexicographic order.
c) Suppose a sequence of values $x^{(1)} x^{(2)} \ldots x^{(n)}$ has been encoded as a byte string $\alpha(x^{(1)})\,\alpha(x^{(2)}) \ldots \alpha(x^{(n)})$, and consider the $k$th byte in that string. Show that it's easy to determine the value $x^{(i)}$ from which that byte came, by looking at a few of the neighboring bytes if necessary.
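The definition above translates almost line by line into code. The C sketch below assumes $0 \le x < 2^{31}$ and writes the bytes $\alpha_1 \ldots \alpha_l$ into a caller-supplied array. The names `thompson_encode` and `floor_lg` are our own; the helper computes $\lambda x = \lfloor \lg x \rfloor$. The sketch makes no attempt at the efficiency questions raised in the neighboring exercises.

```c
#include <stdint.h>
#include <stddef.h>

/* lambda(x) = floor(lg x), for x > 0 */
static int floor_lg(uint32_t x) {
    int k = -1;
    while (x) { x >>= 1; k++; }
    return k;
}

/* Encode 0 <= x < 2^31 as in exercise 196; returns the length l (1..6). */
size_t thompson_encode(uint32_t x, uint8_t out[6]) {
    if (x < 0x80) { out[0] = (uint8_t)x; return 1; }
    size_t l = (size_t)((floor_lg(x) + 4) / 5);        /* ceil(lambda(x) / 5) */
    /* alpha_1 = 2^8 - 2^(8-l) + x_{l-1}, where x_{l-1} is the top radix-64 digit */
    out[0] = (uint8_t)(256u - (1u << (8 - l)) + (x >> (6 * (l - 1))));
    for (size_t j = 2; j <= l; j++)                     /* alpha_j = 2^7 + x_{l-j} */
        out[j - 1] = (uint8_t)(0x80u + ((x >> (6 * (l - j))) & 0x3f));
    return l;
}
```

For example, `thompson_encode(0x80, buf)` returns 2 with `buf` holding #c2 #80.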
197. [22] The Universal Character Set (UCS), also known as Unicode, is a standard mapping of characters to integer codepoints $x$ in the range $0 \le x < 2^{20} + 2^{16}$. An encoding called UTF-16 represents such integers as one or two wydes $\beta(x) = \beta_1$ or $\beta(x) = \beta_1 \beta_2$, in the following way: If $x < 2^{16}$ then $\beta(x) = x$; otherwise
$$\beta_1 = \#\mathrm{d800} + \lfloor y/2^{10} \rfloor \quad\text{and}\quad \beta_2 = \#\mathrm{dc00} + (y \bmod 2^{10}), \qquad \text{where } y = x - 2^{16}.$$
Answer questions (a), (b), and (c) of exercise 196 for this encoding.
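Here is a matching C sketch for UTF-16. It assumes valid input ($0 \le x < 2^{20} + 2^{16}$) and follows the two cases above directly. `utf16_encode` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

/* UTF-16 as described in exercise 197; returns the number of wydes (1 or 2). */
size_t utf16_encode(uint32_t x, uint16_t out[2]) {
    if (x < 0x10000) { out[0] = (uint16_t)x; return 1; }
    uint32_t y = x - 0x10000;                    /* y = x - 2^16, so y < 2^20 */
    out[0] = (uint16_t)(0xd800 + (y >> 10));     /* #d800 + floor(y / 2^10) */
    out[1] = (uint16_t)(0xdc00 + (y & 0x3ff));   /* #dc00 + (y mod 2^10)    */
    return 2;
}
```

Codepoints below $2^{16}$ come back as a single wyde. Larger ones produce a pair whose first wyde lies in #d800–#dbff and whose second lies in #dc00–#dfff.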
▶ 198. [21] Unicode characters are often represented as strings of bytes using a scheme called UTF-8, which is the encoding of exercise 196 restricted to integers in the range $0 \le x < 2^{20} + 2^{16}$. Notice that UTF-8 efficiently preserves the standard ASCII character set (the codepoints with $x < 2^7$), and that it is quite different from UTF-16.
Let $\alpha_1$ be the first byte of a UTF-8 string $\alpha(x)$. Show that there are reasonably small integer constants $a$, $b$, and $c$ such that only four bitwise operations
$$(a \gg ((\alpha_1 \gg b) \mathbin{\&} c)) \mathbin{\&} 3$$
suffice to determine the number $l - 1$ of bytes between $\alpha_1$ and the end of $\alpha(x)$.
▶ 199. [23] A person might try to encode #a as #c08a or #e0808a or #f080808a in UTF-8, because the obvious decoding algorithm produces the same result in each case. But such unnecessarily long forms are illegal, because they could lead to security holes.
Suppose $\alpha_1$ and $\alpha_2$ are bytes such that $\alpha_1 \ge$ #80 and #80 $\le \alpha_2 <$ #c0. Find a branchless way to decide whether $\alpha_1$ and $\alpha_2$ are the first two bytes of at least one legitimate UTF-8 string $\alpha(x)$.
200. [20] Interpret the contents of register $3 after the following three MMIX instructions have been executed: MOR $1,$0,#94; MXOR $2,$0,#94; SUBU $3,$2,$1.
201. [20] Suppose $x = (x_{15} \ldots x_1 x_0)_{16}$ has sixteen hexadecimal digits. What one MMIX instruction will change each nonzero digit to f, while leaving zeros untouched?
202. [20] What two instructions will change an octabyte's nonzero wydes to #ffff?
203. [22] Suppose we want to convert a tetrabyte $x = (x_7 \ldots x_1 x_0)_{16}$ to the octabyte $y = (y_7 \ldots y_1 y_0)_{256}$, where $y_j$ is the ASCII code for the hexadecimal digit $x_j$. For example, if $x =$ #1234abcd, $y$ should represent the 8-character string "1234abcd". What clever choices of five constants $a$, $b$, $c$, $d$, and $e$ will make the following MMIX instructions do the job?
    MOR t,x,a; SLU s,t,4; XOR t,s,t; AND t,t,b;
    ADD t,t,c; MOR s,d,t; ADD t,t,e; ADD y,t,s.
▶ 204. [22] What are the amazing constants $p$, $q$, $r$, $m$ that achieve a perfect shuffle with just six MMIX commands? (See (175)–(178).)
▶ 205. [22] How would you perfectly unshuffle on MMIX, going from $w$ in (175) back to $z$?
206. [20] The perfect shuffle (175) is sometimes called an “outshuffle,” by comparison with the “inshuffle” that takes $z \mapsto y \ddagger x = (y_{31} x_{31} \ldots y_1 x_1 y_0 x_0)_2$; the outshuffle preserves the leftmost and rightmost bits of $z$, but the inshuffle has no fixed points. Can an inshuffle be performed as efficiently as an outshuffle?
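The following portable C sketch makes the two bit orders concrete. It does not use the MMIX method of (175)–(178). Instead it substitutes the familiar mask-and-shift cascade that spreads 32 bits into alternate positions. It assumes that $z$ holds $x$ in its upper half and $y$ in its lower half, as the notation $(y_{31} x_{31} \ldots y_0 x_0)_2$ suggests. The function names are illustrative.

```c
#include <stdint.h>

/* Move bit i of v to bit 2i (classic mask-and-shift spreading). */
static uint64_t spread32(uint32_t v) {
    uint64_t t = v;
    t = (t | (t << 16)) & 0x0000FFFF0000FFFFULL;
    t = (t | (t << 8))  & 0x00FF00FF00FF00FFULL;
    t = (t | (t << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    t = (t | (t << 2))  & 0x3333333333333333ULL;
    t = (t | (t << 1))  & 0x5555555555555555ULL;
    return t;
}

/* z = (x31...x0 y31...y0)_2.  Outshuffle: (x31 y31 ... x0 y0)_2. */
uint64_t outshuffle(uint64_t z) {
    return (spread32((uint32_t)(z >> 32)) << 1) | spread32((uint32_t)z);
}

/* Inshuffle: (y31 x31 ... y0 x0)_2. */
uint64_t inshuffle(uint64_t z) {
    return (spread32((uint32_t)z) << 1) | spread32((uint32_t)(z >> 32));
}
```

With $x = 2^{32} - 1$ and $y = 0$, the outshuffle gives #aaaaaaaaaaaaaaaa and the inshuffle gives #5555555555555555.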
207. [22] Use MOR to perform a 3-way perfect shuffle or “triple zip,” taking $(x_{63} \ldots x_0)_2$ to $(x_{21} x_{42} x_{63} x_{20} \ldots x_2 x_{23} x_{44} x_1 x_{22} x_{43} x_0)_2$, as well as the inverse of this shuffle.
▶ 208. [23] What's a fast way for MMIX to transpose an $8 \times 8$ Boolean matrix?
▶ 209. [21] Is the suffix parity operation $x^{\oplus}$ of exercise 36 easy to compute with MXOR?
210. [22] A puzzle: Register x contains a number $8j + k$, where $0 \le j, k < 8$. Registers a and b contain arbitrary octabytes $(a_7 \ldots a_1 a_0)_{256}$ and $(b_7 \ldots b_1 b_0)_{256}$. Find a sequence of four MMIX instructions that will put $a_j \mathbin{\&} b_k$ into register x.
▶ 211. [M25] The truth table of a Boolean function $f(x_1, \ldots, x_6)$ is essentially a 64-bit number
$$f = \bigl(f(0,0,0,0,0,0) \ldots f(1,1,1,1,1,0)\,f(1,1,1,1,1,1)\bigr)_2.$$
Show that two MOR instructions will convert $f$ to the truth table of $\hat f$, the least monotone Boolean function that is greater than or equal to $f$ at each point.
212. [M32] Suppose $a = (a_{63} \ldots a_1 a_0)_2$ represents the polynomial
$$a(x) = (a_{63} \ldots a_1 a_0)_x = a_{63} x^{63} + \cdots + a_1 x + a_0.$$
Discuss using MXOR to compute the product $c(x) = a(x)\,b(x)$, modulo $x^{64}$ and mod 2.
▶ 213. [HM26] Implement the CRC procedure (183) on MMIX.
▶ 214. [HM28] (R. W. Gosper.) Find a short, branchless MMIX computation that computes the inverse of any given $8 \times 8$ matrix $X$ of 0s and 1s, modulo 2, if $\det X$ is odd.
▶ 215. [21] What's a quick way for MMIX to test if a 64-bit number is a multiple of 3?
7.1.3 ANSWERS TO EXERCISES 71
SECTION 7.1.3
1. These operations interchange the bits of $x$ and $y$ in positions where $m$ is 1. (In particular, if $m = -1$, the step ‘$y \leftarrow y \oplus (x \mathbin{\&} m)$’ becomes just ‘$y \leftarrow y \oplus x$’, and the three assignments will swap $x \leftrightarrow y$ without needing an auxiliary register. H. S. Warren, Jr., has located this trick in vintage-1961 IBM programming course notes.)
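This answer quotes only the middle assignment. The C sketch below assumes, as is usual for this trick, that the first and third steps are both $x \leftarrow x \oplus y$. `masked_swap` is a hypothetical name. The sketch misbehaves if both pointers refer to the same word, because the first step then clears it.

```c
#include <stdint.h>

/* Interchange the bits of *x and *y wherever m has a 1. */
void masked_swap(uint64_t *x, uint64_t *y, uint64_t m) {
    *x ^= *y;        /* x <- x XOR y        */
    *y ^= *x & m;    /* y <- y XOR (x & m)  */
    *x ^= *y;        /* x <- x XOR y        */
}
```

With $m =$ #0f, the values $x =$ #ab and $y =$ #cd become #ad and #cb: only the low hexadecimal digits trade places.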
2. All three hold when $x$ and $y$ are nonnegative, or if we regard $x$ and $y$ as “unsigned 2-adic integers” in which $0 < 1 < 2 < \cdots < -3 < -2 < -1$. But if negative integers are less than nonnegative integers, (i) fails if and only if $x < 0$ and $y < 0$; (ii) and (iii) fail if and only if $x \oplus y < 0$, namely, if and only if $x < 0$ and $y \ge 0$ or $x \ge 0$ and $y < 0$.
3. Note that $x - y = (x \oplus y) - 2(\bar x \mathbin{\&} y)$ (see exercise 93). By removing bits common to $x$ and $y$ at the left, we may assume that $x_{n-1} = 1$ and $y_{n-1} = 0$. Then $2(\bar x \mathbin{\&} y) \le 2\bigl((x \oplus y) - 2^{n-1}\bigr) = 2(x \oplus y) - 2^n$.
4. $x^{CN} = -\bar x = x + 1 = x^S$, by (16). Hence $x^{NC} = x^{NCSP} = x^{NCCNP} = x^{NNP} = x^P$.
5. (a) Disproof: Let $x = (\ldots x_2 x_1 x_0)_2$. Then digit $l$ of $x \ll k$ is $x_{l-k}[l \ge k]$. So digit $l$ of the left-hand side is $x_{l-k-j}[l \ge k][l - k \ge j]$, while digit $l$ of the right-hand side is $x_{l-j-k}[l \ge j + k]$. These expressions agree if $j \ge 0$ or $k \le 0$. But if $j < 0 < k$, they differ when $l = \max(0, j + k)$ and $x_{l-j-k} = 1$.
(We do, however, have $(x \ll j) \ll k \subseteq x \ll (j + k)$ in all cases.)
(b) Proof: Digit $l$ in all three formulas is $x_{l+j}[l \ge -j] \wedge y_{l-k}[l \ge k]$.
6. Since a shifted value $x \gg y$ or $x \ll y$ is nonnegative if and only if $x \ge 0$, we must have $x \ge 0$ if and only if $y \ge 0$. Obviously $x = y$ is always a solution. The solutions with $x > y$ are (a) $x = -1$ and $y = -2$, or $2^y > x > y > 0$; (b) $x = 2$ and $y = 1$, or $2^{-x} \ge -y > -x > 0$.
7. Set $x' \leftarrow (x + \bar\mu_0) \oplus \bar\mu_0$, where $\mu_0$ is the constant in (47). Then $x = (\ldots x'_2 x'_1 x'_0)_{-2}$, since
$$(x' \oplus \bar\mu_0) - \bar\mu_0 = (\ldots \bar x'_3 x'_2 \bar x'_1 x'_0)_2 - (\ldots 1010)_2 = (\ldots 0 x'_2 0 x'_0)_2 - (\ldots x'_3 0 x'_1 0)_2 = x.$$
[This is Hack 128 in HAKMEM; see answer 20 below. An alternative formula, $x' \leftarrow (\mu_0 - x) \oplus \mu_0$, has also been suggested by D. P. Agrawal, IEEE Trans. C-29 (1980), 1032–1035. The results are correct modulo $2^n$ for all $n$, but overflow or underflow can occur. For example, two's complement binary numbers in an $n$-bit register range from $-2^{n-1}$ to $2^{n-1} - 1$, inclusive, but negabinary numbers range from $-\frac23(2^n - 1)$ to $\frac13(2^n - 1)$ when $n$ is even. In general the formula $x' \leftarrow (x + \mu) \oplus \mu$ converts from binary notation to the general number system with binary basis $\langle 2^n (-1)^{m_n} \rangle$ discussed in exercise 4.1–30(c), when $\mu = (\ldots m_2 m_1 m_0)_2$.]
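Here is the conversion in C for 64-bit words. It takes $\bar\mu_0 = (\ldots 1010)_2$, as in the derivation above, truncated to #aaaaaaaaaaaaaaaa. Unsigned arithmetic in C wraps modulo $2^{64}$. As the answer warns, the results are therefore exact mod $2^{64}$, but values outside the representable negabinary range overflow silently. The function and macro names are illustrative only.

```c
#include <stdint.h>

/* (...1010)_2, the complement of mu0 = (...0101)_2, truncated to 64 bits */
#define MU0_BAR 0xAAAAAAAAAAAAAAAAULL

/* two's complement x -> word whose bits are the negabinary digits of x */
uint64_t to_negabinary(uint64_t x) { return (x + MU0_BAR) ^ MU0_BAR; }

/* negabinary digits -> two's complement, x = (x' XOR mu0bar) - mu0bar */
uint64_t from_negabinary(uint64_t d) { return (d ^ MU0_BAR) - MU0_BAR; }
```

For example, `to_negabinary(2)` yields 6, whose bits 110 are the negabinary digits of 2 ($4 - 2$), and `from_negabinary` undoes it.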
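8. First, $x \oplus y \notin (S \oplus y) \cup (x \oplus T)$. Second, suppose that $0 \le k < x \oplus y$, and let $x \oplus y = (\alpha 1 \alpha')_2$, $k = (\alpha 0 \alpha'')_2$, where $\alpha$, $\alpha'$, and $\alpha''$ are strings of 0s and 1s with $|\alpha'| = |\alpha''|$. Assume by symmetry that $x = (\beta 1 \beta')_2$ and $y = (\gamma 0 \gamma')_2$, where $|\beta| = |\gamma| = |\alpha|$. Then $k \oplus y = (\beta 0 \beta'')_2$, where $\beta'' = \alpha'' \oplus \gamma'$, is less than $x$. Hence $k \oplus y \in S$, and $k = (k \oplus y) \oplus y \in S \oplus y$. [See R. P. Sprague, Tôhoku Math. J. 41 (1936), 438–444; P. M. Grundy, Eureka 2 (1939), 6–8.]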
9. The Sprague–Grundy theorem in the previous exercise shows that two piles of $x$ and $y$ sticks are equivalent in play to a single pile of $x \oplus y$ sticks. (There is a nonnegative integer $k < x \oplus y$ if and only if there either is a nonnegative $i < x$ with $i \oplus y < x \oplus y$ or a nonnegative $j < y$ with $x \oplus j < x \oplus y$.) So the $k$ piles are equivalent to a single pile of size $a_1 \oplus \cdots \oplus a_k$. [See C. L. Bouton, Annals of Math. (2) 3 (1901–1902), 35–39.]
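The theorem also shows how to win. When the nim-sum $s = a_1 \oplus \cdots \oplus a_k$ is nonzero, some pile has $a_i \oplus s < a_i$; any pile containing the leading 1 bit of $s$ works. Reducing that pile to $a_i \oplus s$ leaves nim-sum zero. Here is a short C sketch of this standard strategy, with a hypothetical function name.

```c
#include <stddef.h>

/* Returns the index of a pile to reduce (storing its new size in *new_size),
   or -1 if the nim-sum is already 0, i.e. the player to move is losing. */
int nim_winning_move(const unsigned *a, size_t k, unsigned *new_size) {
    unsigned s = 0;
    for (size_t i = 0; i < k; i++) s ^= a[i];   /* nim-sum a_1 ^ ... ^ a_k */
    if (s == 0) return -1;
    for (size_t i = 0; i < k; i++)
        if ((a[i] ^ s) < a[i]) { *new_size = a[i] ^ s; return (int)i; }
    return -1;  /* not reached when s != 0 */
}
```

For piles (3, 4, 5) the nim-sum is 2, and the sketch reduces the first pile from 3 to 1.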
10. For clarity and brevity we shall write simply $xy$ for $x \otimes y$ and $x + y$ for $x \oplus y$, in parts (i) through (iv) of this answer only.
(i) Clearly $0y = 0$ and $x + y = y + x$ and $xy = yx$. Also $1y = y$, by induction on $y$.
(ii) If $x \ne x'$ and $y \ne y'$ then $xy + xy' + x'y + x'y' \ne 0$, because the definition of $xy$ says that $xy' + x'y + x'y' \ne xy$ when $x' < x$ and $y' < y$. In particular, if $x \ne 0$ and $y \ne 0$ then $xy \ne 0$. Another consequence is that, if $x = \operatorname{mex}(S)$ and $y = \operatorname{mex}(T)$ for arbitrary finite sets $S$ and $T$, we have $xy = \operatorname{mex}\{\, xj + iy + ij \mid i \in S,\ j \in T \,\}$.
(iii) Consequently, by induction on the (ordinary) sum of $x$, $y$, and $z$, $(x + y)z$ is
$$\operatorname{mex}\{\, (x+y)z' + (x'+y)z + (x'+y)z',\ (x+y)z' + (x+y')z + (x+y')z' \mid 0 \le x' < x,\ 0 \le y' < y,\ 0 \le z' < z \,\},$$
which is $\operatorname{mex}\{\, xz' + x'z + x'z' + yz,\ xz + yz' + y'z + y'z' \,\} = xz + yz$. In particular, there's a cancellation law: If $xz = yz$ then $(x + y)z = 0$, so $x = y$ or $z = 0$.
(iv) By a similar induction, $(xy)z = \operatorname{mex}\{(xy)z' + (xy' + x'y + x'y')(z + z')\} = \operatorname{mex}\{(xy)z' + (xy')z + (xy')z' + \cdots\} = \operatorname{mex}\{x(yz') + x(y'z) + x(y'z') + \cdots\} = \operatorname{mex}\{(x + x')(yz' + y'z + y'z') + x'(yz)\} = x(yz)$.
(v) If $0 \le x, y < 2^{2^n}$ we shall prove that $x \otimes y < 2^{2^n}$, $2^{2^n} \otimes y = 2^{2^n} y$, and $2^{2^n} \otimes 2^{2^n} = \frac32 2^{2^n}$. By the distributive law (iii) it suffices to consider the case $x = 2^a$ and $y = 2^b$ for $0 \le a, b < 2^n$. Let $a = 2^p + a'$ and $b = 2^q + b'$, where $0 \le a' < 2^p$ and $0 \le b' < 2^q$; then $x = 2^{2^p} \otimes 2^{a'}$ and $y = 2^{2^q} \otimes 2^{b'}$, by induction on $n$. If $p < n - 1$ and $q < n - 1$ we've already proved that $x \otimes y < 2^{2^{n-1}}$. If $p < q = n - 1$, then $x \otimes 2^{b'} < 2^{2^q}$, hence $x \otimes y < 2^{2^n}$. And if $p = q = n - 1$, we have $x \otimes y = 2^{2^p} \otimes 2^{2^p} \otimes 2^{a'} \otimes 2^{b'} = (\frac32 2^{2^p}) \otimes z$, where $z < 2^{2^p}$. Thus $x \otimes y < 2^{2^n}$ in all cases.
By the cancellation law, the nonnegative integers less than $2^{2^n}$ form a subfield. Hence in the formula
$$2^{2^n} \otimes y = \operatorname{mex}\{\, 2^{2^n} y' \oplus \bigl(x' \otimes (y \oplus y')\bigr) \mid 0 \le x' < 2^{2^n},\ 0 \le y' < y \,\}$$
we can choose $x'$ for each $y'$ to exclude all numbers between $2^{2^n} y'$ and $2^{2^n}(y' + 1) - 1$; but $2^{2^n} y$ is never excluded.
Finally in
$$2^{2^n} \otimes 2^{2^n} = \operatorname{mex}\{\, 2^{2^n}(x' \oplus y') \oplus (x' \otimes y') \mid 0 \le x', y' < 2^{2^n} \,\},$$
choosing $x' = y'$ will exclude all numbers up to and including $2^{2^n} - 1$, since $x \otimes x = y \otimes y$ implies that $(x \oplus y) \otimes (x \oplus y) = 0$, hence $x = y$. Choosing $x' = y' \oplus 1$ excludes numbers from $2^{2^n}$ to $\frac32 2^{2^n} - 1$, since $(x \otimes x) \oplus x = (y \otimes y) \oplus y$ implies that $x = y$ or $x = y \oplus 1$, and since the most significant bit of $x \otimes x$ is the same as that of $x$. This same observation shows that $\frac32 2^{2^n}$ is not excluded. QED.
Consider, for example, the subfield $\{0, 1, \ldots, 15\}$. By the distributive law we can reduce $x \otimes y$ to a sum of $x \otimes 1$, $x \otimes 2$, $x \otimes 4$, and/or $x \otimes 8$. We have $2 \otimes 2 = 3$, $2 \otimes 4 = 8$, $4 \otimes 4 = 6$; and multiplication by 8 can be done by multiplying first by 2 and then by 4 or vice versa, because $8 = 2 \otimes 4$. Thus $2 \otimes 8 = 12$, $4 \otimes 8 = 11$, $8 \otimes 8 = 13$.
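The small products quoted above can be checked mechanically. The C sketch below fills a $16 \times 16$ table directly from the mex definition of nim multiplication, the one invoked in part (ii):
$$x \otimes y = \operatorname{mex}\{(x' \otimes y) \oplus (x \otimes y') \oplus (x' \otimes y') \mid x' < x,\ y' < y\}.$$
It relies on part (v) only to know that every entry stays below 16. The table and function names are illustrative. The method is deliberately brute force and is no substitute for the matrix method described in the text.

```c
#include <string.h>

static unsigned char nim16[16][16];   /* nim16[x][y] = x (*) y for x, y < 16 */

static void nim16_init(void) {
    for (unsigned x = 0; x < 16; x++)
        for (unsigned y = 0; y < 16; y++) {
            unsigned char seen[256];
            memset(seen, 0, sizeof seen);
            /* all earlier entries are already filled in, row by row */
            for (unsigned xp = 0; xp < x; xp++)
                for (unsigned yp = 0; yp < y; yp++)
                    seen[nim16[xp][y] ^ nim16[x][yp] ^ nim16[xp][yp]] = 1;
            unsigned v = 0;
            while (seen[v]) v++;                   /* minimum excluded value */
            nim16[x][y] = (unsigned char)v;
        }
}

unsigned nim_mul16(unsigned x, unsigned y) {
    static int ready = 0;
    if (!ready) { nim16_init(); ready = 1; }
    return nim16[x & 15][y & 15];
}
```

Calling `nim_mul16(8, 8)` returns 13, in agreement with the text.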
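In general, for $n > 0$, let $n = 2^m + r$ where $0 \le r < 2^m$. There is a $2^{m+1} \times 2^{m+1}$ matrix $Q_n$ such that multiplication by $2^n$ is equivalent to applying $Q_n$ to blocks of $2^{m+1}$ bits and working mod 2. For example, $Q_1 = \begin{pmatrix}1&1\\1&0\end{pmatrix}$, and $(\ldots x_4 x_3 x_2 x_1 x_0)_2 \otimes 2 = (\ldots y_4 y_3 y_2 y_1 y_0)_2$, where $y_0 = x_1$, $y_1 = x_1 \oplus x_0$, $y_2 = x_3$, $y_3 = x_3 \oplus x_2$, $y_4 = x_5$, etc.
The matrices are formed recursively as follows: Let $Q_0 = R_0 = (1)$ and
$$Q_{2^m + r} = \begin{pmatrix} I & R_m \\ I & 0 \end{pmatrix} \begin{pmatrix} Q_r & & 0 \\ & \ddots & \\ 0 & & Q_r \end{pmatrix}, \qquad R_{m+1} = \begin{pmatrix} R_m & R_m^2 \\ R_m & 0 \end{pmatrix} = Q_{2^{m+1} - 1},$$
where $Q_r$ is replicated enough times to make $2^{m+1}$ rows and columns. For example,
$$Q_2 = \begin{pmatrix}1&0&1&1\\0&1&1&0\\1&0&0&0\\0&1&0&0\end{pmatrix}; \qquad Q_3 = Q_2 \begin{pmatrix} Q_1 & 0 \\ 0 & Q_1 \end{pmatrix} = \begin{pmatrix}1&1&0&1\\1&0&1&1\\1&1&0&0\\1&0&0&0\end{pmatrix} = R_2.$$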
If register x holds any 64-bit number, and if $1 \le j \le 7$, the MMIX instruction MXOR y,qj,x will compute $y = x \otimes 2^j$, given the hexadecimal matrix constants
    q1 = #c08030200c080302;   q2 = #b06080400b060804;   q3 = #d0b0c0800d0b0c08;
    q4 = #8d4b2c1880402010;   q5 = #c68d342cc0803020;   q6 = #b9678d4bb0608040;
    q7 = #deb9c68dd0b0c080.
[For further information, see J. H. Conway, On Numbers and Games (1976), Chapter 6, where it is shown that these definitions actually yield an algebraically closed field over the ordinal numbers.]
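11. Let $m = 2^{a_s} + \cdots + 2^{a_1}$ with $a_s > \cdots > a_1 \ge 0$ and $n = 2^{b_t} + \cdots + 2^{b_1}$ with $b_t > \cdots > b_1 \ge 0$. Then $m \otimes n = mn$ if and only if $(a_s \mid \cdots \mid a_1) \mathbin{\&} (b_t \mid \cdots \mid b_1) = 0$.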
12. If $x = 2^{2^n} a + b$ where $0 \le a, b < 2^{2^n}$, let $x' = x \otimes (x \oplus a)$. Then
$$x' = \bigl((2^{2^n} \otimes a) \oplus b\bigr) \otimes \bigl((2^{2^n} \otimes a) \oplus a \oplus b\bigr) = (2^{2^n - 1} \otimes a \otimes a) \oplus \bigl(b \otimes (a \oplus b)\bigr) < 2^{2^n}.$$
To nim-divide by $x$ we can therefore nim-divide by $x'$ and multiply by $x \oplus a$. [This algorithm is due to H. W. Lenstra, Jr.; see Séminaire de Théorie des Nombres (Université de Bordeaux, 1977–1978), exposé 11, exercise 5.]
13. If $a_2 \oplus \cdots \oplus a_k = a_1 \oplus a_3 \oplus \cdots \oplus ((k-2) \otimes a_k) = 0$, every move breaks this condition; we can't have $(a \otimes x) \oplus (b \otimes y) = (a \otimes x') \oplus (b \otimes y')$ when $a \ne b$ unless $(x, y) = (x', y')$.
Conversely, if $a_2 \oplus \cdots \oplus a_k \ne 0$ we can reduce some $a_j$ with $j \ge 2$ to make this sum zero; then $a_1$ can be set to $a_3 \oplus \cdots \oplus ((k-2) \otimes a_k)$. If $a_2 \oplus \cdots \oplus a_k = 0$ and $a_1 \ne a_3 \oplus \cdots \oplus ((k-2) \otimes a_k)$, we simply reduce $a_1$ if it is too large. Otherwise there's a $j \ge 3$ such that equality will occur if $(j-2) \otimes a_j$ is replaced by an appropriate smaller value $((j-2) \otimes a'_j) \oplus ((i-2) \otimes (a_j \oplus a'_j))$, for some $2 \le i < j$ and $0 \le a'_j < a_j$, because of the definition of nim multiplication; hence both of the desired equalities are achieved by setting $a_j \leftarrow a'_j$ and $a_i \leftarrow a_i \oplus a_j \oplus a'_j$. [This game was introduced in Winning Ways by Berlekamp, Conway, and Guy, at the end of Chapter 14.]
14. (a) Each $y = (\ldots y_2 y_1 y_0)_2 = x^T$ determines $x = (\ldots x_2 x_1 x_0)_2$ uniquely, since $x_0 = y_0 \oplus t$ and $\lfloor y/2 \rfloor = \lfloor x/2 \rfloor^{T_{x_0}}$.
(b) When $k > 0$, it is a branching function with labels $t_{\alpha a \beta} = a$ for $|\beta| = k - 1$, and $t_\alpha = 0$ for $|\alpha| < k$. But when $k \le 0$, the mapping is not a permutation; in fact, it sends infinitely many 2-adic integers into 0.
[The case $k = 1$ is particularly interesting: Then $x^T$ takes nonnegative integers into nonnegative integers of even parity, negative integers into nonnegative integers of odd parity, and $-1/3 \mapsto -1$. Furthermore $\lfloor x^T/2 \rfloor$ is “Gray binary code,” 7.2.1.1–(9).]
(c) If $\rho(x \oplus y) = k$ we have $T(x) \equiv T(y)$ and $x \equiv y + 2^k$ (modulo $2^{k+1}$). Hence $\rho(x^T \oplus y^T) = \rho(x \oplus y \oplus T(x) \oplus T(y)) = k$. Conversely, if $\rho(x^T \oplus y^T) = k$ whenever $y = x + 2^k$, we obtain a suitable bit labeling by letting $t_\alpha = (x^T \gg |\alpha|) \bmod 2$ when $x = (\alpha^R)_2$.
(d) This statement follows immediately from (a) and (c). For if we always have $\rho(x \oplus y) = \rho(x^U \oplus y^U) = \rho(x^V \oplus y^V)$, then $\rho(x \oplus y) = \rho(x^U \oplus y^U) = \rho(x^{UV} \oplus y^{UV})$. And if $x^{TU} = x$ for all $x$, $\rho(x^U \oplus y^U) = \rho(x \oplus y)$ is equivalent to $\rho(x \oplus y) = \rho(x^T \oplus y^T)$.
We can also construct the labelings explicitly: If $W = UV$, note that when $a, b, c \in \{0, 1\}$ we have $W_a = U_a V_{a'}$, $W_{ab} = U_{ab} V_{a'b'}$, and $W_{abc} = U_{abc} V_{a'b'c'}$, where $a' = a \oplus u$, $b' = b \oplus u_a$, $c' = c \oplus u_{ab}$, and so on; hence $w = u \oplus v$, $w_a = u_a \oplus v_{a'}$, $w_{ab} = u_{ab} \oplus v_{a'b'}$, etc. The labeling $T$ inverse to $U$ is obtained by swapping left and right subtrees of all nodes labeled 1; thus $t = u$, $t_a = u_{a'}$, $t_{ab} = u_{a'b'}$, etc.
(e) The explicit constructions in (d) demonstrate that the balance condition is preserved by compositions and inverses, because $\{0', 1'\} = \{0, 1\}$ at each level.
Notes: Hendrik Lenstra observes that branching functions can profitably be viewed as the isometries (distance-preserving permutations) of the 2-adic integers, when we use the formula $1/2^{\rho(x \oplus y)}$ to define the “distance” between 2-adic integers $x$ and $y$. Moreover, the branching functions mod $2^d$ turn out to be the Sylow 2-subgroup of the group of all permutations of $\{0, 1, \ldots, 2^d - 1\}$, namely the unique (up to isomorphism) subgroup that has maximum 2-power order among all subgroups of that group. They also are equivalent to the automorphisms of the complete binary tree with $2^d$ leaves.
15. Equivalently, (x + 2a)  b = (x  b) + 2a; so we might as well find all b and c such that (x  b) + c = (x + c)  b. By (89), the latter is equivalent to (x  b  c) + 2((x  b) & c) = (x   b) + 2(x & c), so the condition b & c = 0 is necessary and sufficient. Thus the condition ((a  1) & b) = 0 is necessary and sufficient for the original problem.
16. (a) If $\rho(x \oplus y) = k$ we have $x \equiv y + 2^k$ (modulo $2^{k+1}$); hence $x + a \equiv y + a + 2^k$ and $\rho((x + a) \oplus (y + a)) = k$. And $\rho((x \oplus b) \oplus (y \oplus b))$ is obviously $k$.
(b) The hinted labeling, call it $P(c)$, has 1s on the path corresponding to $c$, and 0s elsewhere; thus it is balanced. The general animating function can be written
$$x^{P(c_0)^{a_1} P(c_1)^{a_2} \ldots P(c_{m-1})^{a_m}} \oplus c_m, \qquad \text{where } c_j = b_1 \oplus \cdots \oplus b_j;$$
so it is balanced if and only if $c_m = 0$.
[Incidentally, the set $S = \{P(0)\} \cup \{\, P(k) \oplus P(k + 2^e) \mid k \ge 0 \text{ and } 2^e > k \,\}$ provides an interesting basis for all possible balanced labelings: A labeling is balanced if and only if it is $\bigoplus\{\, q \mid q \in Q \,\}$ for some $Q \subseteq S$. This exclusive or is well defined even though $Q$ might be infinite, because only finitely many 1s appear at each node.]
(c) The function P( ) in (b) has this form, because xP ( ) = x  bx  e. Its inverse, x = ((x  ) + 1)  , is x  bx  e = xP ( ). Furthermore we have xP ( )P (d) = xP ( )bxP ( )de = xbx ebxdS( )e, because bxye = bxT yT e for any branching function xT. Similarly xP ( )P (d)P (e) = xbx ebxdS( )ebxeS(d)S( )e, etc. After discarding equal terms we obtain the desired form. The resulting numbers pj are unique because they are the only values of x at which the function changes sign.
(d) We have, for example, x  bx  ae  bx  be  bx  e = xP (a′)P (b′)P (′), where a′ = a, b′ = bP (a′), and ′ = bP (a′)P (b′).
[The theory of animating functions was developed by J. H. Conway in Chapter 13 of his book On Numbers and Games (1976), inspired by previous work of C. P. Welter in Indagationes Math. 14 (1952), 304–314; 16 (1954), 194–200.]
17. (Solution by M. Slanina.) Su h equations are de idable even if we also allow opera-
tions su h as x & y, x, x  1, x  1, 2x, and 2x , and even if we allow Boolean ombina-
tions of statements and quanti ations over integer variables, by translating them into
formulas of se ond-order monadi logi with one su essor (S1S). Ea h 2-adi variable
x = ( : : : x2 x1 x0 )2 orresponds to an S1S set variable X, where j 2 X means xj = 1:
z = x be omes 8t(t 2 Z , t 2= X );
z = x & y be omes 8t(t 2 Z , (t 2 X ^ t 2 Y ));
z = 2 x be omes 8t(t 2 Z , (t 2 X ^ 8s(s < t ) s 2= X )));
z = x + y be omes 9 C 8t(0 2= C ^ (t 2 Z , (t2X )  (t2Y )  (t2C ))
^ (t+1 2 C , h(t2X )(t2Y )(t2C )i)):
An identity su h as x & ( x) = 2 is equivalent to the translation of
x

8X 8Y 8Z ((integer(X ) ^ 0 = x + y ^ z = x & y) ) z = 2x);


7.1.3 ANSWERS TO EXERCISES 75
where integer(X ) stands for 9 t 8s(s > t ) (s 2 X , t 2 X )). We an in lude rational magi
2-adi onstants as well; for example, z = 0 is equivalent to 0 2 Z ^ 8t(t 2 Z , Bu hi
Hamburg
t + 1 2= Z ). But of ourse we annot in lude arbitrary (un omputable) onstants. unsolvable
J. R. Bu hi proved that all formulas of S1S are de idable, in Logi , Methodology, Welter
and Philosophy of S ien e: Pro eedings (Stanford, 1960), 1{11. If we restri t attention hyperbolas
Pi kover
to equations, one an show in fa t that exponential time suÆ es. Lakhtakia
On the other hand M. Hamburg has shown that the problem would be unsolvable sorting
if x, x, or 1  x were added to the repertoire; multipli ation ould then be en oded. Hardy
Littlewood
In identally, many nontrivial identities exist, even if we use only the operations Polya
x  y and x + 1. For example, C. P. Welter noti ed in 1952 that Paley
HAKMEM
((x  (y + 1)) + 1)  (x + 1) = ((((x + 1)  y) + 1)  x) + 1: ombinations+
set repr
18. Of ourse row x is entirely blank when x is a multiple of 64. The ne details
of this image are apparently \ haoti " and omplex, but there is a fairly peasy way to
understand what happens near the points where the straight lines x = 64 j interse t
the hyperbolas xy = 211 k, for integers j; k  1 that aren't2 too large.
Indeed, when x and y are integers, p the value of x y  11 is odd if and only if
x2 y=212 mod 1  12 . Thus, if x = 64 j + Æ and xy = 211 (k + ) we have
p
x2 y mod 1 =  128 jÆ + Æ 2 y mod 1 =  2Æx Æ 2 y mod 1 = (k + )Æ Æ 2 y  mod 1;
212 4096 4096 4096
and this quantity has a known relation to 12 when, say, Æ is lose to a small integer.
[See C. A. Pi kover and A. Lakhtakia, J. Re reational Math. 21 (1989), 166{169.℄
19. (a) When n = 1, f (A; B; C ) has the same value under all arrangements ex ept
when a0 6= a1 , b0 6= b1 , and 0 6= 1 ; and then it annot ex eed 1. For larger values of n
we argue by indu tion, assuming that n = 3 in order to avoid umbersome notation. Let
P0 = (a0 ; a1 ; a2 ; a3 ), A1 = P
A (a4 ; a5 ; a6 ; a7 ),: : : , C1= ( 4 ; 5 ; 6 ; 7 ). Then f (A; B; C ) =
j kl=0 f (Aj ; Bk ; C l )  j kl=0 f (Aj ; Bk ; C l ) by indu tion. Thus we an assume
that a0  a1  a2 0 a3 , a4  a5  a6 0 a7 , : : : , 4  5  6  0 7 . We an also
sort the subve tors A0 = (a0 ; a1 ; a4 ; a5 ), A1 = (a2 ; a3 ; a6 ; a7 ), : : : , C1 = ( 2 ; 3 ; 6 ; 7 )
in00a similar way. Finally, we an sort A000 = (a0 ; a1 ; a6 ; a7 ), A001 = (a2 ; a3 ; a4 ; a5 ), : : : ,
C1 = ( 2 ; 3 ; 4 ; 5 ), be ause in ea h term aj bk l the number of subs ripts fj; k; lg with
leading bits 01, 10, and 11 must satisfy s01  s10  s11 (modulo 2). And these three
sorting operations leave A, B, C fully sorted, by exer ise 5.3.4{48. P n
(b) Suppose A = A , B = B , and C = C . Then we have aj = t2=0 1 t [ j  t ℄,
where j = aj aj+1  0 and we set a2n = 0; similar formulas hold for bk and l . Let
A(p) denote the ve tor (ap(0) ; : : : ; ap(2n 1) ) when p is a permutation of f0; 1; : : : ; 2n 1g.
Then by part (a) we have
P P
f (A(p) ; B(q) ; C(r) ) = j kl=0 t;u;v t u v [ p(j )  t ℄[ q(k)  u ℄[ r(l)  v ℄
 Pjkl=0 Pt;u;v t u v [ j  t ℄[ k  u ℄[ l  v ℄ = f (A; B; C ):
[This proof is due to Hardy, Littlewood, and Polya, Inequalities (1934), x10.3.℄
( ) The same proof te hnique extends to any number of ve tors. [R. E. A. C.
Paley, Pro . London Math. So . (2) 34 (1932), 263{279, Theorem 15.℄
20. The given steps ompute the least integer y greater than x su h that y = x.
They're useful for generating all ombinations of n obje ts, taken m at a time (that is,
all m-element subsets of an n-element set, with elements represented by 1 bits).
[This tidbit is Ha k 175 in HAKMEM, Massa husetts Institute of Te hnology
Arti ial Intelligen e Laboratory Memo No. 239 (29 February 1972).℄
76 ANSWERS TO EXERCISES 7.1.3
21. Set t y + 1, u t  y, v t & y, x v (v & v)=(u + 1). If y = 2m 1 is mone
the rst m- ombination, these eight operations set x to zero. (The fa t that x = f (y) magi mask
SADD
does not seem to yield any shorter s heme.) Arndt
22. Sideways addition avoids the division: SUBU t,x,1; ANDN u,x,t; SADD k,t,x;
Wada
NEG
ADDU v,x,u; XOR t,v,x; ADDU k,k,2; SRU t,t,k; ADDU y,v,t. But we an a tually little-endian
save a step by judi iously using the onstant mone = 1: SUBU t,x,1; XOR u,t,x;
ADDU y,x,u; SADD k,t,y; ANDN y,y,u; SLU t,mone,k; ORN y,y,t.
23. (a) (0 : : : 01 : : : 1)2 = 2
m 1 and (0101 : : : 01)2 = (22m 1)=3.
(b) This solution uses the 2-adi onstant 0 = ( : : : 010101)2 = 1=3:
t x  0 ; u (t 1)  t; v x j u; w v + 1; y w + pv & w :
u+1
If x = (22m 1)=3, the operations produ e a strange result be ause u = 22m+1 1.
( ) XOR t,x,m0; SUBU u,t,1; XOR u,t,u; OR v,x,u; SADD y,u,m0; ADDU w,v,1;
ANDN t,v,w; SRU y,t,y; ADDU y,w,y. [This exer ise was inspired by Jorg Arndt.℄
24. It's expedient to \prime the pump" by initializing the array to the state that it
should have after all multiples of 3, 5, 7, and 11 have been sieved out. We an ombine
3 with 11 and 5 with 7, as suggested by E. Wada:
LOC Data Segment
qbase GREG  ;N IS 3584 ;n GREG N ;one GREG 1
Q OCTA #816d129a64b4 b6e Q0 (little-endian)
LOC Q+N/16
qtop GREG  End of the Q table
Init OCTA #9249249249249249|#4008010020040080 Multiples of 3 or 11 in [129 : : 255℄
OCTA #8421084210842108|#0408102040810204 Multiples of 5 or 7
t IS $255 ;x33 IS $0 ;x35 IS $1 ;j IS $4
LOC #100
Main LDOU x33,Init; LDOU x35,Init+8
LDA j,qbase,8; SUB j,j,qtop Prepare to set Q1 .
1H NOR t,x33,x33; ANDN t,t,x35
STOU t,qtop,j Initialize 64 sieve bits.
SLU t,x33,2; SRU x33,x33,31; OR x33,x33,t Prepare for the next 64 values.
SLU t,x35,6; SRU x35,x35,29; OR x35,x35,t
ADD j,j,8; PBN j,1B Repeat until rea hing qtop.
Then we ast out nonprimes p , p + 2p, : : : , for p = 13, 17, : : : , until p2 > N :
2 2

p IS $0 ;pp IS $1 ;m IS $2 ;mm IS $3 ;q IS $4 ;s IS $5
LDOU q,qbase,0; LDA pp,qbase,8
SET p,13; NEG m,13*13,n; SRU q,q,6 Begin with p = 13.
1H SR m,m,1 m b p2 ( N )=2 .
2H SR mm,m,3; LDOU s,qtop,mm
AND t,m,#3f; SLU t,one,t
ANDN s,s,t; STOU s,qtop,mm Zero out a bit.
ADD m,m,p; PBN m,2B Advan e by p bits.
SRU q,q,1; PBNZ q,3F Move to next potential prime.
2H LDOU q,pp,0; INCL pp,8 Read in another bat h
OR p,p,#7f; PBNZ q,3F of potential primes.
ADD p,p,2; JMP 2B Skip past 128 nonprimes.
7.1.3 ANSWERS TO EXERCISES 77
2H SRU q,q,1 Prit hard
3H ADD p,p,2; PBEV q,2B Set p p + 2 2until p is prime. a he
segmented sieve
MUL m,p,p; SUB m,m,n; PBN m,1B Repeat until p > N . Lander
The running time, 1172 +5166, is of ourse mu h less than the time needed for steps Parkin
Bays
P1{P8 of Program 1.3.2P, namely 10037 + 641543 (improved to 10096 + 215351 Hudson
in exer ise 1.3.2{14). [See P. Prit hard, S ien e of Computer Programming 9 (1987), Ni ely
gap
17{35, for several instru tive variations. In pra ti e, a program like this one tends Knuth
to slow down dramati ally when the sieve is too big for the omputer's a he. Better little-endian
results are obtained by working with a segmented sieve, whi h ontains bits for numbers Loyd
big-endian
between N0 + kÆ and N0 + (k + 1)Æ, as suggested by L. J. Lander and T. R. Parkin, little-endian
Math. Comp. 21 (1967), 483{488; C. Bays and R. H. Hudson, BIT 17 (1977), 121{127. transposed
Here N0 an be quite large, but Æ is limited by the a he size; al ulations are done large megabytes
Woodrum
separately for k = 0, 1, : : : . Segmented sieves have be ome highly developed; see, for zip
example, T. R. Ni ely, Math. Comp. 68 (1999), 1311{1315, and the referen es ited CSZ
there. The author used su h a program in 2006 to dis over an unusually large gap of
length 1370 between 418032645936712127 and the next larger prime.℄
25. (1 + 1 + 25 + 1 + 1 + 25 + 1 + 1 = 56)mm; the worm never sees pages 2{500 of
Volume 1 or 1{499 of Volume 4. (Unless the books have been pla ed in little-endian
fashion on the bookshelf; then the answer would be 106mm.) This lassi brain-teaser
an be found in Sam Loyd's Cy lopedia (New York: 1914), pages 327 and 383.
26. We ould multiply by aa:::ab instead of dividing by 12 (see exer ise 1.3.1{17); but
#
multipli ation is slow too. Instead we an use a s heme that is neither big-endian nor
little-endian but transposed : Put item k into o tabyte 8(k mod 220 ), where it is shifted
left by 5bk=220 . Sin e k < 12000000, the amount of shift is always less than 60.
The MMIX ode to put item k into register $1 is AND $0,k,[#fffff℄; SLU $0,$0,3;
LDOU $1,base,$0; SRU $0,k,20; 4ADDU $0,$0,$0; SRU $1,$1,$0; AND $1,$1,#1f.
[This solution uses 8 large megabytes (223 bytes). Any onvenient s heme for
onverting item numbers to o tabyte addresses and shift amounts will work, as long as
the same method is used onsistently. Of ourse, just `LDBU $1,base,k' would be faster.℄
27. (a) ((x 1)  x) + x. [This exer ise is based on an idea of Luther Woodrum, who
noti ed that ((x 1) j x) + 1 = (x & x) + x.℄
(b) (y + x) j y, where y = (x 1)  x.
( ,d,e) ((y  x) + x) & y, ((y  x) + x)  y, and ((y  x) + x) & y, where y = x 1.
(f) x  (a); alternatively, y  (y+1), where y = x j (x 1). [The number (01 01a 11b)2
looks simpler, but it apparently requires ve operations: ((y + 1) & y) 1.℄
28. A 1 bit indi ates x's rightmost 0 (for example, (101011)2 7! (000100)2 ); 1 7! 0.

29. k = k+1  (k+1  2 ) [see 6 (1974), 125℄. This relation holds also for
k STOC

the onstants d;k of (48), when 0  k < d, if we start with d;d = 22d 1. (There is,
however, no easy way to go from k to k+1, unless we use the \zip" operation; see (77).)
30. Append `CSZ rho,x,64' to (50), thereby adding 1 to its exe ution time. For (51),
we simply need to make sure that rhotab [0℄ = 8.
31. In the rst pla e, his ode loops forever when x = 0. But even after that bug is
pat hed, his assumption that x is a random integer is highly questionable. In most
appli ations when we want to ompute x for a nonzero 64-bit number x, a more
reasonable assumption would be that ea h of the out omes f0; 1; : : : ; 63g is equally
likely. The average and standard deviation then be ome 31.5 and  18:5.
78 ANSWERS TO EXERCISES 7.1.3
32. `NEGU y,x; AND y,x,y; MULU y,debruijn,y; SRU y,y,58; LDB rho,de ode,y ' has perfe t hash fun tion
estimated ost  + 14, although multipli ation by a power of 2 might well be faster Prodinger
signed right shift
than a typi al multipli ation. Add 1 for the orre tion in answer 30. CSZ
33. In fa t, an exhaustive al ulation shows that exa tly 94727 suitable onstants a
Warren
smearing
yield a \perfe t hashj fun tion" for this problem, 90970 of whi h also identify the power- Warren
of-two ases y = 2 ; 90918 of those also distinguish the ase y = 0. The multiplier
# 208b2430 8 82129 is uniquely best, in the sense that it doesn't need to refer to table
Diri hlet generating fun tion
zeta fun tion
magi masks
entries above de ode [32400℄ when y is known to be a valid input. Allou he
Shallit
34. Identities (a) and (b) are obviously true, also when xy = 0. Proof of ( ): If x = y 
we have either x = y = 0 or x = 10k and y = 10k+2 k ; hen e x  y = (  )01k = 
(x 1)  (y 1). If x > y = k we have (x  y) mod 2 6= ((x 1)  (y 1)) mod 2k+2 . Knuth
fra tal
35. Let f (x) = x  3x. Clearly f (2x) = 2f (x), and f (4x + 1) = 4f (x) + 2. We also
have f (4x 1) = 4f (x) + 2, by exer ise 34( ). The hinted+ identity follows.
Given n, set u n  1, v u + n, t u  v, n v & t, and n u & t.
Clearly u = bn=2 and v =+ b3n=2 , so n+ n = v u = n. And this is Reitwiesner's
representation, be ause n j n has no onse utive 1s. [H. Prodinger, Integers 0 (2000),
paper a8, 14 pp. In identally we also have f ( x) = f (x).℄
36. (i) The ommands x x(x1), x x(x2), x x(x4), x x(x8),
x x  (x  16), x x  (x  32) hange x to x . (ii) x& = ((x + 1) & x) 1.
(See exer ises 66, 70, and 117 for appli ations of x ; see also exer ise 209.)
37. Insert `CSZ y,x,half ' after the FLOTU in (55), where half = 3fe0000000000000 ;
#
note that (55) says `SR' (not `SRU'). No hange is needed to (56), if lamtab [0℄ = 1.
38. ` SRU t,x,1; OR y,x,t; SRU t,y,2; OR y,y,t; SRU t,y,4; OR y,y,t; ...;
SRU t,y,32; OR y,y,t; SRU y,y,1; CMPU t,x,0; ADDU y,y,t' takes 15 .
39. (Solution by H. S. Warren, Jr.) Let  (x) denote the result of smearing x to the
right, as in the rst line of (57). Compute x & ((x  1) & x).
40. Suppose x = y = k . If x = y = 0, (58) ertainly holds, regardless of how we
de ne  0. Otherwise x = (1 k)2 and y = (1 )2, for some binary strings and with
j j = j j = k; and x  y < 2  x & y. On the other hand if x < y = k, we have
x  y  2k > x & y. And H. S. Warren, Jr., notes that x < y if and only if x < y & x.
P1 P1 P1
41. (a) 2k 2k ) = z=(1 2k 2k
n=1 (n)z = k=1 z =(1 zP 1 z) k=0z z =(1 + z ). The
n
Diri hletPgenerating fun tion is simpler: n=1 (n)=n =  (z )=(2
z 1).
(b) P n1=1 (n)zn = P P1
k=1 z 2k=(1 z ).
( ) n1=1 (nk )zn = k1=0k+1z2k=((1 z)(1 + z2k )) = Pk1=0 z2k k (z), where k (z) =
(1 + z +    + z2 1)=(1 z2 ). (The \magi masks" of (47) orrespond to k (2).)
[SeeAutomati Sequen es by J.-P. Allou he and J. Shallit (2003), Chapter 3, for
further information about the fun tions  and  , whi h they denote by 2 and s2 .℄
42. e1 2 1
e 1 +(e +2)2e2 1 +    +(e +2r 2)2er 1 , by indu tion on r. [D. E. Knuth,
2 r
Pro . IFIP Congress (1971), 1, 19{27. The fra tal aspe ts of this sum are illustrated
in Figs. 3.1 and 3.2 of the book by Allou he and Shallit.℄
43. The straightforward implementation of (63), `SET nu,0; SET y,x; BZ y,Done;
1H ADD nu,nu,1; SUBU t,y,1; AND y,y,t; PBNZ y,1B' osts (5 + 4x) ; it beats the
implementation of (62) when x < 4, ties when x = 4, and loses when x > 4.
But we an save 4 from the implementation of (62) if we repla e the nal
multipli ation-and-shift by `y y + (y  8), y y + (y  16), y y + (y  32),
 y & # ff '. [Of ourse, MMIX's single instru tion `SADD nu,x,0' is mu h better.℄
7.1.3 ANSWERS TO EXERCISES 79
44. Let this sum be  x. If we an solve the problem for 2 -bit numbers, we an
(2) d MMIX
solve it for 2d+1-bit numbers, be ause  (2)(2d x + x0 ) =  (2) x +  (2)x0 +2d x. Therefore sideways addition
SADD
a solution analogous to (62) suggests itself, on a 64-bit ma hine: 2ADDU
4ADDU
Set z (x  1) & 0 and y x z. 8ADDU
Set z ((z + (z  2)) & 1 ) + ((y & 1 )  1) and y (y & 1 ) + ((y  2) & 1). 16ADDU
Set z (((2)z + (z  4)) & 2 )64+ ((y & 2 )  2) and y 64(y + (y  4)) & 2 . Roki ki
randomized data stru tures
Finally  (((Az) mod 2 )  56) + ((((By) mod 2 )  65)  3), binary sear h trees
where A = (11111111)256 and B = (01234567)256. Cartesian trees
treaps
But another approa h is better on MMIX, whi h has sideways addition built in: Patent
parity
SADD nu2,x,m0 SADD t,x,m2 8ADDU nu2,t,nu2 SADD t,x,m5 SADD
NXOR
SADD t,x,m1 4ADDU nu2,t,nu2 SADD t,x,m4 SLU t,t,5 CSOD
2ADDU nu2,t,nu2 SADD t,x,m3 16ADDU nu2,t,nu2 ADD nu2,nu2,t balan ed ternary notation
P k
[In general,  (2)x = 2 ( &  ). See
k  x k Dr. Dobb's Journal 8,4 (April 1983), 24{37.℄

45. Let d = (x y) & (y x); test if d & p 6= d. [Roki ki found that this idea an be
used with node addresses to near-randomize binary sear h trees or Cartesian trees as
if they were treaps, without needing an additional random \priority key" in ea h node.
See U.S. Patent 6347318 (12 February 2002).℄
46. SADD t,x,m; NXOR y,x,m; CSOD x,t,y; the mask m is ~(1<<i|1<<j) . (In general,
these instru tions omplement the bits spe i ed by m if those bits have odd parity.)
47. y (x  Æ ) & , z (xQ& )  Æ, x (x & m) j y j z, where m =  j (  Æ ).
48. Given Æ , there are sÆ = j =0 b(n+j )=Æ +1 di erent Æ -swaps, in luding the identity
Æ 1F
Pn 1
permutation. (See exer ise 4.5.3{32.) Summing over Æ gives 1+ Æ=1 (sÆ 1) altogether.
49. (a) The set S = fa1 d1 +  +am dm j fa1 ; : : : ; am g  f 1; 0; +1gg for displa ements
Æ1 , : : : , Æm must ontain fn 1; n 3; : : : ; 1 ng, be ause the kth bit must be ex hanged
with the (n + 1 2k)thm bit1 for 1  k  n. Hen e jS j  n. And S ontains at most 3m
numbers, at most 2  3 of whi h are odd.
(b) Clearly s(mn)  s(m) + s(n), be ause we an reverse m elds of n bits ea h.
Thus s(3m)  m and s(2  3m )  m + 1. Furthermore the reversal of 3m bits uses
only Æ-swaps with even values of Æ; the orresponding (Æ=2)-swaps prove that we have
s((3m  1)=2)  m. These upper bounds mat h the lower bounds of (a) when m > 0.
( ) The string a  z! with j j = j j = jj = j j = j!j = n an be hanged to
!z  a with a (3n + 1)-swap followed by an (n + 1)-swap. Then s(n) further swaps
reverse all. Hen e s(32)  s(6) + 2 = 4, and s(64)  5. Again, equality holds by (a).
In identally, s(63) = 4 be ause s(7) = s(9) = 2. The lower bound in (a) turns out
to be the exa t value of s(n) for 1  n  22, ex ept that s(16) = 4.
50. Express n = (tm : : : t1 t0 )3 in balan ed ternary notation. Let nj = (tm : : : tj )3 and
Æj = 2nj + tj 1 , so that nj 1 Æj = nj and 2Æj nj 1 = nj + tj for 1  j  m. Let
E0 = f0g and Ej +1 = Ej [ (tj Ej ) for 0  j < m. (Thus, for example, E1 = f0; t0 g
and E2 = f0; t0 ; t1 ; t1 t0 g.)
Assume by indu tion on j that Æ-swaps for Æ = Æ1 , : : : , Æj have hanged the n-
bit word 1 : : : 3j to 3j : : : 1 , where ea h subword k has length nj + "k for some
"k 2 Ej . If j < m, a Æj +1 -swap within ea h subword will preserve this assumption. If
j = m, ea h k has j k j  m + 1, be ause " 2 Ej implies j"j  j . Therefore 2k -swaps
for blg m  k  0 will reverse them all. (Note that a 2k -swap on a subword of size t,
where 2k < t  2k+1 , redu es it to three subwords of sizes t 2k , 2k+1 t, t 2k .)
80 ANSWERS TO EXERCISES 7.1.3
51. (a) If = ( d 1 : : : 0 )2 , we must have d 1 = d 1 d;d 1 . But for 0  k < d 1 zipping
we an take k = k d;k  ^k , where ^k is any mask  d;k . magi masks
inshue
(b) Let (d; ) be the set of all su h mask sequen es. Clearly (1; ) = f g. When Lenfant
d > 1 we will have, re ursively, transpositions
perfe t shue
(d; ) = f(0 ; : : : ; d 2 ; d 1 ; ^d 2 ; : : : ; ^0 ) j k = k0 1 z k00 1 ; ^k = ^k0 1 z ^k00 1 g; Steele
matrix transposition
by \zipping together" two sequen es (00 ; : : : ; d0 3 ; d0 2 ; ^d0 3 ; : : : ; ^00 ) 2 (d 1; 0 ) and
(000 ; : : : ; d00 3 ; d00 2 ; ^d00 3 ; : : : ; ^000 ) 2 (d 1; 00 ) for some appropriate 0 , ^0 , 0 , and 00 .
When is odd, the bigraph orresponding to (75) has only one y le; so (0; ^0 ;
0 ; 00 ) is either (d;0 ; 0; d =2e; b =2 ) or (0; d;0 ; b =2 ; d =2e). But when is even, the
bigraph has 2d 1 double bonds; so 0 = ^0 is any mask  d;0 , and 0 = 00 = =2.
[In identally, lg j(d; )j = 2d 1 (d 1) Pdk=11 (2k 1) (2k 1 j2k 1 mod 2k j).℄
In both ases we an therefore let ^d 2 =    = ^0 = 0 and omit the se ond half
of (71) entirely. Of ourse in ase (b) we would do the y li shift dire tly, instead of
using (71) at all. But exer ise 58 proves that many other useful permutations, su h as
sele tive reversal followed by y li shift, an also be handled by (71) with ^k = 0 for
all k. The inverses of those permutations an be handled with k = 0 for 0  k < d 1.
52. The following solutions make  ^j = 0 whenever possible. We shall express the
 masks in terms of the 's, for example by writing 6 & 6;5 instead of stating the
requested hexade imal form # 55555555 ; the  form is shorter and more instru tive.
(a) k = 5 & 6;k and ^k = (k+1  k 1 ) & 6;k for 0  k < 5; 5 = 4 . (Here
 1 = 0. To get the \other" perfe t shue, (x31 x63 : : : x1 x33 x0 x32 )2 , let ^0 = 6;0 &1 .)
(b)  = 3 = ^0 = 6;0 & 3 ; 1 = 4 = ^1 = 6;1 & 4 ; 2 = 5 = ^2 = 6;2 & 5 ;
^3 = ^4 =00. [See J. Lenfant,IEEE Trans. C-27 (1978), 637{647, for a general theory.℄
( ) 0 = 6;0 & 4 ; 1 = 6;1 & 5; 2 = 4 = 6;2 & 4 ; 3 = 5 = 6;3 & 5 ;
^0 = 6;0 & 2 ; ^1 = 6;1 & 3 ; ^2 = ^0  2 ; ^3 = ^1  3 ; ^4 = 0.
(d) k = 6;k & 5 k for 0  k  5; ^k = k for 0  k  2; ^3 = ^4 = 0.
53. We an write as a produ t of d t transpositions, (u1 v1 ) : : : (ud t vd t) (see
exer ise 5.2.2{2). The permutation indu ed by a single transposition (uv) on the index
digits, when u < v, orresponds to a (2v 2u )-swap with mask d;v & u . We should
do su h a swap for (u1 v1) rst, : : : , (ud 1 vd 1 ) last.
In parti ular, the perfe t shue in a 2d -bit register orresponds to the ase where
= (01 : : : (d 1)) is a one- y le; so it an be a hieved by doing su h (2v 2u )-swaps
for (u; v) = (0; 1), : : : , (0; d 1). For example, when d = 3 the two-step pro edure is
12345678 7! 13245768 7!k15263748. [Guy Steele suggests an alternative (d 1)-step
pro edure: We an do a 2 -swap with mask d;k+1 & k for d 1 > k  0. When d = 3
his method takes 12345678 7! 12563478 7! 15263748.℄
The matrix transposition in exer ise 52(b) orresponds to d = 6 and (u; v) = (0; 3),
(1; 4), (2; 5). These operations are the 7-swap, 14-swap, and 28-swap steps for 8  8
matrix transposition illustrated in the text; they an be done in any order.
For exer ise 52( ), use d = 6 and (u; v) = (0; 2), (1; 3), (0; 4), (1; 5). Exer ise 52(d)
is as easy as 52(b), with (u; v) = (0; 5), (1; 4), (2; 3).
54. Transposition amounts to reversing the bits of the minor diagonals. Su essive
elements of those diagonals are m 1 apart in the register. Simultaneous reversal of
all diagonals orresponds to simultaneous reversal of subwords of sizes 1, : : : , m, whi h
an be done with 2k -swaps for 0  k < dlg me (be ause su h transposition is easy
7.1.3 ANSWERS TO EXERCISES 81
when m is a power of 2, as illustrated in the text). Here's the pro edure for m = 7: Pratt
Sto kmeyer
Given 6-swap 12-swap 24-swap permutation networks
00 01 02 03 04 05 06 00 10 02 12 04 14 06 00 10 20 30 04 14 24 00 10 20 30 40 50 60 banyan
10 11 12 13 14 15 16 01 11 03 13 05 15 25 01 11 21 31 05 15 25 01 11 21 31 41 51 61 Lawrie
20 30 22 32 24 16 26 02 12 22 32 06 16 26 02 12 22 32 42 52 62 mapping
20 21 22 23 24 25 26 don't- are
30 31 32 33 34 35 36 21 31 23 33 43 35 45 03 13 23 33 43 53 63 03 13 23 33 43 53 63 notation
40 41 42 43 44 45 46 40 50 42 34 44 36 46 40 50 60 34 44 54 64 04 14 24 34 44 54 64
50 51 52 53 54 55 56 41 51 61 53 63 55 65 41 51 61 35 45 55 65 05 15 25 35 45 55 65
60 61 62 63 64 65 66 60 52 62 54 64 56 66 42 52 62 36 46 56 66 06 16 26 36 46 56 66
55. Given x and y , rst set x x j (x  2k ) and y y j (y  2k ) for 2d  k < 3d. Then
set x (22d+k 2k )-swap of x with mask 2d+k & k and y (22d+k 2d+k )-swap ofky
with mask 2d+k &d+k for 0  k < d. Finally set z x & y, then either z z j (z  2 )
or z z  (z  2k ) for 2d  k < 3d, and z z & (2n2 1). [The idea is to form two
n  n  n arrays x = (x000 : : : x(n 1)(n 1)(n 1) )n and y = (y000 : : : y(n 1)(n 1)(n 1) )n
with xijk = ajk and yijk = bjk3 , then transpose oordinates so that xijk = aji and
yijk = bik ; now x & y does all n bitwise multipli ations at on e. This method is due to
V. R. Pratt and L. J. Sto kmeyer, J. Computer and System S i. 12 (1976), 210{213.℄

56. Use (71) with 0 =  ^0 = 0, 1 = # 0010201122113231 , 2 = # 00080e0400080 06 ,


3 = # 00000092008100a2 , 4 = # 0000000000000f16 , 5 = # 0000000003199 26 , ^4 =
# 00000 9f0000901a , ^ = # 003a00b50015002b , ^ = # 000103080 0d0f0 , and ^ =
3 2 1
# 0020032033233333 .

57. The two hoi es for ea h y le when d > 1 have omplementary settings. So we
an hoose a setting in whi h at least half of the rossbars are ina tive, ex ept in the
middle olumn. (See exer ise 5.3.4{55 for more about permutation networks.)
58. (a) Every di erent setting of the rossbars gives a di erent permutation, be ause
there is exa tly one path from input line i to output line j for all 0  i; j < N . (A net-
work with that property is alled a \banyan.") The unique su h path arries input i
on line l(i; j; k) = ((i  k)  k) + (j mod 2k ) after k swapping steps have been made.
(b) We have l(i'; i; k) = l(j'; j; k) if and only if i mod 2k = j mod 2k and i'  k =
j'  k; so () is ne essary. And it is also suÆ ient, be ause a mapping ' that sat-
is es () an always be routed in su h a way that j' appears on line l = l(j'; j; k)
after k steps: If k > 1, j' will appear on line l(j'; j; k 1), whi h is one of the inputs
to l. Condition () says that we an route it to l without on i t, even if l is l(i'; i; k).
[In
IEEE Transa tions C-24 (1975), 1145{1155, Dun an Lawrie proved that ondi-
tion () is ne essary and suÆ ient for an arbitrary mapping ' of the set f0; 1; : : : ; N 1g
into itself, when the rossbar modules are allowed to be general 2  2 mapping modules
as in exer ise 63. Furthermore the mapping ' might be only partially spe i ed, with
j' =  (\wild ard" or \don't- are") for some values of j . The proof that appears in
the previous paragraph a tually demonstrates Lawrie's more general theorem.℄
( ) i mod 2k = j mod 2k if and only if k  (i  j ); i  k = j  k if and only if
k > (i  j ); and i' = j' if and only if i = j , when ' is a permutation.
(d) (i'  j') < (i  j ) for all i and j if and only if (i'  j') < (i  j ) =
(i  j ) for all i and j , be ause  is a permutation. [Note that the notation an be
onfusing: Bit j appears in bit position j if permutation  is applied rst, then  .℄
(e) Given i 6= j we must prove that (i'  j' )  (i  j ). Case 1, i and j are
xed by both ' and : Then (i'  j' ) = (i  j )  (i  j ). Case 2, i' 6= i and
j = j : Then (i'  j' ) = (i'  j')  (i  j ). Case 3, i' 6= i and j 6= j : Then
(i'  j' ) = (i'  j ). Let k = (i  j ), and suppose (i'  j ) < k. Then
82 ANSWERS TO EXERCISES 7.1.3
i mod 2k = j mod 2k and i'  k = j  k. Hen e l(i'; i; k) = l(j ; j; k), and that line monus
arries both i' and j . But those two values annot be equal. nu(k) summed
He kel
59. It is 2 d
M (a;b) , where M (a; b) is the number of rossbars that have both endpoints S hroeppel
d
in [a : : b℄.k To 0 ount0 them, let k = (ab), a00 =ka mod 0
2k , and b0 = b mod 2k ; noti e that magi mask
b a = 2 + b a , and Md (a; b) = Mk+1 (a ; 2 + b ). Counting the rossbars in the top
half and0
bottom half, plus 0
those0 that. jump
0
between halves, gives M0 k+1 (a0 ; 02k + b0 ) =
Mk (a ; 2 1) + Mk (0; b ) + ((b + 1) a ). Finally, we have Mk (0; b ) = S (b + 1); and
k
Mk (a0 ; 2k 1) = Mk (0; 2k 1 a0 ) = S (2k a0 ) = k2k 1 ka0 + S (a0 ), where S (n) is
evaluated in exer ise 42.
60. A y le of length 2l orresponds to a pattern u0 v0 $ v1 ! u1 $ u2 v2 $
   $ v2l 1 ! u2l 1 $ u2l , where u2l = u0 and `u v' or `v ! u' means that the
permutation sends u to v, `x $ y' means that x = y  1.
We an generate a random permutation as follows: Given u0 , there are 2n hoi es
for v0 , then 2n 1 hoi es for u1 only one of whi h auses u2 = u0 , then 2n 2 hoi es
for v2 , then 2n 3 hoi es for u3 only one of whi h loses a y le, et .
Consequently the generating fun tion is G(z) = Qjn=1 22nn 22jj++1z . The expe ted
number of y les, k, is G0 (1) = Hp2n 12 Hn = 21 ln n + ln 2 21 + O(n 1 ). The mean
of 2k is G(2) = (2n n!)2=p(2n)!3==2 n + O(n 1=2); and the varian e is G(4) G(2)2 =
(n + 1 G(2)) G(2) =  n + O(n).
62. The rossbar settings in P (2 ) an be stored in (2d 1)2
d d 1 = Nd 1 N bits. To get
2
the inverse permutation pro eed from right to left. [See P. He kel and R. S hroeppel,
Ele troni Design 28, 8 (12 April 1980), 148{152. Note that any way to represent an
arbitrary permutation requires at least lg N ! > Nd N=ln 2 bits of memory; so this
representation is nearly optimum, spa ewise.℄
63. (i) x = y . (ii) z must be even. (When z is odd we have (x z y )  z = (y  dz=2e) z
(x  bz=2 ), even when z < 0.) (iii) This identity holds for all w, x, y, and z (and also
with any other binary bitwise Boolean operator in pla e of &).
64. (((z & 0 ) + (z j 
0 0 )) & 0 ) j (((z & 0 ) + (z 0 j 0 )) & 0 ). (See (86).)
65. xu(x ) + v (x ) = xu(x) + v (x) .
2 2 2 2

66. (a) v (x) = (u(x)=(1+ x )) mod x ; it's the unique polynomial of degree less than n
Æ n
su h that (1+ x ) v(x)  u(x) n(modulo xn ). (Equivalently, v is the unique n-bit integer
Æ
su h that (v  (v  Æ)) mod 2 = u.)
(b) We may as well assume that n = 64m, and that u = (um 1 : : : u1 u0 )264 ,
v = (vm 1 : : : v1 v0 )264 . Set 0; then, using exer ise 36, set vj uj  ( ) and
vj  63 for j = 0, 1, : : : , m 1.
( ) Set v0 u0 ; then vj uj  and vj , for j = 1, 2, : : : , m 1.
(d) Start with 0 and do the following for j = 0, 1, : : : , m 1: Set t uj ,
t t  (t  3), t t  (t  6), t t  (t  12), t t  (t  24), t t  (t  48),
vj t  , (t  61)  # 9249249249249249 .
(e) Start with v u. Then, for j = 1, 2, : : : , m 1, set vj vj  (vj 1  3) and
(if j < m 1) vj+1 vj+1  (vj 1  61).
67. Let n = 2l 1 and m = n 2d. If 21 n < k < n we have x2k  xm+t + xt (modulo
x + x +1), where t = 2k n is odd. Consequently, if v = (vn 1 : : : v1 v0 )2 , the number
n m

w = u  (((u  d)  (u  2d)  (u  3d)     ) & 2l d )


turns out to equal (vn 2 : : : v3 v1vn 1 : : : v2 v0 )2 . For example, when l = 4 and d = 2,
the square of u6 x6 +    + u1 x + u0 modulo (x7 + x3 +1) is u6 x5 + u5 x3 +(u6  u4 )x1 +
7.1.3 ANSWERS TO EXERCISES 83
(u5  u3 )x6 + (u6 l u4  u2 )x4l+ u1 x2 + u0 . To ompute v, we therefore do a perfe t zipper
shue, v = bw=2 z (w mod 2 ). The number w an be al ulated by methods like Brent
Larvala
those of the previous exer ise. [See R. P. Brent, S. Larvala, and P. Zimmermann, Zimmermann
Math. Comp. 72 (2003), 1443{1452; 74 (2005), 1001{1002.℄ MUX
Warren
68. SRU t,x,delta; PUT rM,theta; MUX x,t,x. ompress
69. Noti e that the pro edure might fail if we attempt to do the 2
d 1 -shift rst instead Steele Jr.
Patent
of last. The key to proving that a small-shift- rst strategy works orre tly is to wat h Shi
the spa es between sele ted bits; we will prove that the lengths of these spa es are Lee
multiples of 2k+1 after the 2k -shift. BESM-6 omputer
gather
Consider the in nite string k = : : : 1t4 02k 1tk3 02k 1t2 02k 1t1 02k 1t0 , whi hk represents pa k
the situation where tl  0 items need to move 2 l pla es to the right. A 2 -shift with ompress
s atter
any mask of the form k = : : : 0t4 2k+1 1t3 0kt+12 2k+1 1kt1+10t0 leaves us with the situation unpa k
represented by the string k+1 =k+1: : : 1T2 02 1T1 02 1T0 , where exa tly Tl = t2l +
t2l+1 items need to move right 2 l pla es. So the laim holds by indu tion on k.

70. Let k = k  (k  1), so that k = k in the notation of exer ise 36. If we take
 = 0 1 in the previous answer, we have 0 =  and k+1 = ( k & k )  2k .
2 k +1 2 k +1 2k+1
Therefore we an pro eed as follows:
Set , k 0, and repeat the following steps while 6= 0: Set x , then
x x  (x  2 l ) for 0  l < d, then k x, ( & x)  2k , and k k + 1.
The omputation ends with k =   + 1; the remaining masks k , : : : , d 1 , if
any, are zero and those steps an be omitted from (80). Sometimes this pro edure gives
nonzero masks k that a tually do nothing useful, be ause t1 = t3 =    = 0. To avoid
su h redundan y, hange `k x' to `k x & (x + (x & & (  2k )))'.
[See ompress in H. S. Warren, Jr., Ha ker's Delight (Addison{Wesley, 2002), x7{4;
also G. L. Steele Jr., U.S. Patent 6715066 (30 Mar h 2004).℄
71. Start with x y. Do a ( 2k )-shift of x with mask k , for k = d 1, : : : , 1, 0, using
the masks of exer ise 70. Finally set z x (or z x & , if you want a \ lean" result).
2d 1
72. 2 x + y.
73. Equivalently, $d$ sheep-and-goats operations must be able to transform the word $x = (x_{(2^d-1)\pi}\ldots x_{1\pi}x_{0\pi})_2$ into $(x_{2^d-1}\ldots x_1x_0)_2$, for any permutation $\pi$ of $\{0, 1, \ldots, 2^d-1\}$. And this can be done by radix-2 sorting (Algorithm 5.2.5R): First bring the odd-numbered bits to the left, then bring the bits $j$ for odd $\lfloor j/2\rfloor$ left, and so on. For example, when $d = 3$ and $x = (x_3x_1x_0x_7x_5x_2x_6x_4)_2$, the three operations yield successively $(x_3x_1x_7x_5x_0x_2x_6x_4)_2$, $(x_3x_7x_2x_6x_1x_5x_0x_4)_2$, $(x_7x_6x_5x_4x_3x_2x_1x_0)_2$. [See Z. Shi and R. Lee, Proc. IEEE Conf. ASAP'00 (IEEE CS Press, 2000), 138–148.]
Historical note: The BESM-6 computer, designed in 1965, implemented half of the sheep-and-goats operation: Its <sborka> ("gather" or "pack") command produced the bits of $z$ selected by $\chi$, packed together, and its <razborka> command ("scatter" or "unpack") went the other way.


74. If $\bigl|\sum_l(\alpha_{2l} - \alpha_{2l+1})\bigr| = 2\Delta > 0$, we must rob $\Delta$ from the rich half and give it to the poor. There's a position $l$ in the poor half with $\alpha_l = 0$; otherwise that half would sum to at least $2^{d-1}$. A cyclic 1-shift that modifies positions $l$ through $(l+t+1) \bmod 2^d$ makes $\alpha'_{l+k} = \alpha_{l+k+1}$ for $0 \le k < t$, $\alpha'_{l+t} = \alpha_{l+t+1} - \delta$, $\alpha'_{l+t+1} = \delta$, and $\alpha'_{l+k} = \alpha_{l+k}$ for all other $k$; here $\delta$ can be any desired value in the range $0 \le \delta \le \alpha_{l+t+1}$. (We've treated all subscripts modulo $2^d$ in these formulas.) So we can use the smallest even $t$ such that $\alpha_{l+1} + \alpha_{l+3} + \cdots + \alpha_{l+t+1} = \alpha_l + \alpha_{l+2} + \cdots + \alpha_{l+t} + \Delta + \delta$ for some $\delta \ge 0$.
(The 1-shift need not be cyclic, if we allow ourselves to shift left instead of right. But the cyclic property may be needed in subsequent steps.)
84 ANSWERS TO EXERCISES 7.1.3
75. Equivalently, given indices $0 \le i_0 < i_1 < \cdots < i_{s-1} < i_s = 2^d$ and $0 = j_0 < j_1 < \cdots < j_{s-1} < j_s = 2^d$, we want to map $(x_{2^d-1}\ldots x_1x_0)_2 \mapsto (x_{(2^d-1)\varphi}\ldots x_{1\varphi}x_{0\varphi})_2$, where $j\varphi = i_r$ for $j_r \le j < j_{r+1}$ and $0 \le r < s$. If $d = 1$, a mapping module does this. When $d > 1$, we can set the left-hand crossbars so that they route input $i_r$ to line $i_r \oplus ((i_r + r) \bmod 2)$. If $s$ is even, we recursively ask one of the networks $P(2^{d-1})$ inside $P(2^d)$ to solve the problem for indices $\lfloor\{i_0, i_2, \ldots, i_s\}/2\rfloor$ and $\lfloor\{j_0, j_2, \ldots, j_s\}/2\rfloor$, while the other solves it for $\lceil\{i_1, i_3, \ldots, i_{s-1}, 2^d\}/2\rceil$ and $\lceil\{j_0, j_2, \ldots, j_s\}/2\rceil$. At the right of $P(2^d)$, one can now check that when $j_r \le j < j_{r+1}$, the mapping module for lines $j$ and $j \oplus 1$ has input $i_r$ on line $j$ if $j \equiv r$ (modulo 2); otherwise $i_r$ is on line $j \oplus 1$. A similar proof works when $s$ is odd.
Notes: This network is a slight improvement over a construction by Yu. P. Ofman, Trudy Mosk. Mat. Obshchestva 14 (1965), 186–199. We can implement the corresponding network by substituting a "δ-map" for a δ-swap; instead of (69), we use two masks and do seven operations instead of six: $y \leftarrow x \oplus (x \gg \delta)$, $x \leftarrow x \oplus (y \,\&\, \theta) \oplus ((y \,\&\, \theta') \ll \delta)$. This extension of (71) therefore takes only $d$ additional units of time.
76. When a mapping network realizes a permutation, all of its modules must act as crossbars; hence $G(n) \ge \lg n!$. Ofman proved that $G(n) \le 2.5\,n\lg n$, and remarked in a footnote that the constant 2.5 could be improved (without giving any details). We have seen that in fact $G(n) \le 2n\lg n$. Note that $G(3) = 3$.
77. Represent an $n$-network by $(x_{2^n-1}\ldots x_1x_0)_2$, where $x_k = [$the binary representation of $k$ is a possible configuration of 0s and 1s when the network has been applied to all $2^n$ sequences of 0s and 1s$]$, for $0 \le k < 2^n$. Thus the empty network is represented by $2^{2^n} - 1$, and a sorting network for $n = 3$ is represented by $(10001011)_2$. In general, $x$ represents a sorting network for $n$ elements if and only if it represents an $n$-network and $\nu x = n + 1$, if and only if $x = 2^0 + 2^1 + 2^3 + 2^7 + \cdots + 2^{2^n-1}$.
If $x$ represents $\alpha$ according to these conventions, the representation of $\alpha[i{:}j]$ is $(x \oplus y) \mid (y \gg (2^{n-i} - 2^{n-j}))$, where $y = x \,\&\, \mu_{n,n-j} \,\&\, \bar\mu_{n,n-i}$.
[See V. R. Pratt, M. O. Rabin, and L. J. Stockmeyer, STOC 6 (1974), 122–126.]
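Here is a brute-force check of the comparator formula. It rests on two assumptions drawn from the example $(10001011)_2$. First, element 1 is the most significant bit of a configuration $k$. Second, $\mu_{n,k}$ marks the positions $t$ whose bit $k$ is 0, the convention in which $\mu_0 = \#\ldots5555$. The helpers `mu`, `comparator` and `brute` are ours.

```python
import itertools

def mu(n: int, k: int) -> int:
    """2**n-bit mask: bit t is 1 when bit k of the index t is 0."""
    return sum(1 << t for t in range(1 << n) if not (t >> k) & 1)

def comparator(x: int, n: int, i: int, j: int) -> int:
    """Apply [i:j] (1 <= i < j <= n) to the set of configurations coded by x."""
    y = x & mu(n, n - j) & ~mu(n, n - i)      # element i is 1 and element j is 0
    return (x ^ y) | (y >> ((1 << (n - i)) - (1 << (n - j))))

def brute(n: int, net) -> int:
    """Same set, by running all 2**n zero-one inputs through the network."""
    x = 0
    for bits in itertools.product((0, 1), repeat=n):
        b = list(bits)
        for i, j in net:
            if b[i - 1] > b[j - 1]:
                b[i - 1], b[j - 1] = b[j - 1], b[i - 1]
        x |= 1 << int("".join(map(str, b)), 2)   # b[0] is the leading bit
    return x
```

Starting from the empty network $x = 2^{8} - 1$ and applying $[1{:}2][2{:}3][1{:}2]$ for $n = 3$ should leave $(10001011)_2$.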

78. If $k \ge \lg(m-1)$ the test is valid, because we always have $x_1 + x_2 + \cdots + x_m \ge x_1 \mid x_2 \mid \cdots \mid x_m$, with equality if and only if the sets are disjoint. Moreover, we have $(x_1 + \cdots + x_m) - (x_1 \mid \cdots \mid x_m) \le (m-1)(2^{n-k-1} + \cdots + 1) < (m-1)2^{n-k} \le 2^n$.
Conversely, if $m \ge 2^k + 2$ and $n > 2k$, the test is invalid. We might have, for example, $x_1 + \cdots + x_m = (2^k+1)(2^{n-k} - 2^{n-2k-1}) + 2^{n-k-1} = 2^n + (2^{n-k} - 2^{n-2k-1})$. But if $n \le 2k$ the test is still valid when $m = 2^k + 2$, because our proof shows that $x_1 + \cdots + x_m - (x_1 \mid \cdots \mid x_m) \le (2^k+1)(2^{n-k} - 1) < 2^n$ in that case.
79. $x' = (x - 1) \,\&\, \chi$. (And the formula $x' = ((x - b - 1) \,\&\, a) + b$ corresponds to (85).) These recipes are part of Jörg Arndt's "bit wizardry" routines (2001); their origin is unknown.
80. Perhaps the nicest way is to start with $x \leftarrow \chi - 1$ as a signed number; then while $x \ge 0$, set $x \leftarrow x \,\&\, \chi$, visit $x$, and set $x \leftarrow 2x - \chi$. (The operation $2x - \chi$ can in fact be performed with a single MMIX instruction, `2ADDU x,x,minuschi'.)
But that trick fails if $\chi$ is so large as to be already "negative." A slightly slower but more general method starts with $x \leftarrow \chi$ and does the following while $x \ne 0$: Set $t \leftarrow x \,\&\, {-x}$, visit $\chi - t$, and set $x \leftarrow x - t$.
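Both loops of answer 80 visit $\chi$ with one of its 1 bits removed, lowest bit first. The sketch below transcribes them into Python. Python integers never overflow, so the "already negative" failure of the first method cannot arise here; on a 64-bit machine it needs $\chi < 2^{63}$. The function names are ours.

```python
def visit_fast(chi: int) -> list:
    """First method: x <- chi - 1 (signed), then x <- 2x - chi each time."""
    visited = []
    x = chi - 1
    while x >= 0:
        x &= chi
        visited.append(x)        # visit x
        x = 2 * x - chi          # a single 2ADDU on MMIX
    return visited

def visit_general(chi: int) -> list:
    """Second method: peel off the lowest remaining bit t, visit chi - t."""
    visited = []
    x = chi
    while x != 0:
        t = x & -x               # lowest 1 bit still present in x
        visited.append(chi - t)  # visit chi - t
        x -= t
    return visited
```

For $\chi = (1011)_2$ both produce `[0b1010, 0b1001, 0b0011]`.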
81. $((z \,\&\, \chi) - (z' \,\&\, \chi)) \,\&\, \chi$. (One way to verify this formula is to use (18).)
82. Yes, by letting $z' = z$ in (86): $w \mid (z \,\&\, \bar\chi)$, where $w = ((z \,\&\, \chi) + (z \mid \bar\chi)) \,\&\, \chi$.
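Scattered arithmetic treats the bits under $\chi$ as one packed number. The sketch below checks answers 81 and 82 against that packed view. We assume that (86) is the scattered sum $((z \mid \bar\chi) + (z' \,\&\, \chi)) \,\&\, \chi$, which is consistent with answer 82. The function names are ours.

```python
MASK64 = (1 << 64) - 1

def scattered_add(z: int, zp: int, chi: int) -> int:
    """((z | ~chi) + (zp & chi)) & chi: carries ripple through the gaps of chi."""
    return ((z | (~chi & MASK64)) + (zp & chi)) & chi

def scattered_sub(z: int, zp: int, chi: int) -> int:
    """Answer 81: borrows ripple through the gaps in the same way."""
    return ((z & chi) - (zp & chi)) & chi

def scattered_double(z: int, chi: int) -> int:
    """Answer 82: double the chi-part of z and leave the other bits of z alone."""
    not_chi = ~chi & MASK64
    w = ((z & chi) + (z | not_chi)) & chi
    return w | (z & not_chi)
```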
83. (The following iteration propagates bits of y to the right, in the gaps of a scattered accumulator t. Auxiliary variables u and v respectively mark the left and right of each gap; they double in size until being wiped out by w.) Set t ← z & , u ← (  1) & , v ← ((  1) + 1) & , w ← 3(u & v), u ← 3u, v ← 3v, and k ← 1. Then, while u ≠ 0, do the following steps: t ← t | ((t  k) & u), k ← k  1, u ← u & w, v ← v & w, w ← ((v & (u  1) & u)  (k + 1)) ((u & (v  1) & v)  k), u ← u + ((u & v)  k), v ← v + ((v & u)  k). Finally return the answer ((t  1) & ) | (z & ).
84. z )  = w (z & ), where w = (((z & )  1) + ) &  appears in answer 82; z +  is the quantity t computed (with more difficulty) in the answer to exercise 83.
85. (a) If $x = \mathrm{LOC}(a[i,j,k])$ is the drum location corresponding to interleaved bits as stated, then $\mathrm{LOC}(a[i+1,j,k]) = x \oplus ((x \oplus ((x \,\&\, \mu) - \mu)) \,\&\, \mu)$ and $\mathrm{LOC}(a[i-1,j,k]) = x \oplus ((x \oplus ((x \,\&\, \mu) - 1)) \,\&\, \mu)$, where $\mu = (11111)_8$, by (84) and answer 79. The formulas for $\mathrm{LOC}(a[i,j\pm1,k])$ and $\mathrm{LOC}(a[i,j,k\pm1])$ are similar, with masks $2\mu$ and $4\mu$.
(b) For random access, let's hope there is room for a table of length 32 giving $f[(i_4i_3i_2i_1i_0)_2] = (i_4i_3i_2i_1i_0)_8$. Then $\mathrm{LOC}(a[i,j,k]) = (((f[k] \ll 1) + f[j]) \ll 1) + f[i]$. (On a vintage machine, bitwise computation of $f$ would be much worse than table lookup, because register operations used to be as slow as fetches from memory.)
(c) Let $p$ be the location of the page currently in fast memory, and let $z = -128$. When accessing location $x$, if $x \,\&\, z \ne p$ it is necessary to read 128 words from drum location $x \,\&\, z$ (after saving the current data to drum location $p$ if it has changed); then set $p \leftarrow x \,\&\, z$. [See J. Royal Stat. Soc. B-16 (1954), 53–55. This scheme of array allocation for external storage was devised independently by E. W. Dijkstra, circa 1960, who called it the "zip-fastener" method. It has often been rediscovered, for example in 1966 by G. M. Morton and later by developers of quadtrees; see Hanan Samet, Applications of Spatial Data Structures (Addison–Wesley, 1990). See also R. Raman and D. S. Wise, IEEE Trans. C57 (2008), to appear, for a contemporary perspective. Georg Cantor had considered interleaving the digits of decimal fractions in Crelle 84 (1878), 242–258, §7; but he observed that this idea does not lead to an easy one-to-one correspondence between the unit interval [0 . . 1] and the unit square [0 . . 1] × [0 . . 1].]
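Parts (a) and (b) can be exercised together. The sketch below builds the 32-entry table $f$, interleaves $(i, j, k)$ by lookup, and steps a single coordinate in place with the masked increment and decrement of part (a). We assume 5-bit coordinates that wrap modulo 32, as in the table. The names `F`, `MU`, `loc`, `step_up` and `step_down` are ours.

```python
# f[(i4 i3 i2 i1 i0)_2] = (i4 i3 i2 i1 i0)_8: spread five bits into octal digits
F = [sum(((v >> b) & 1) << (3 * b) for b in range(5)) for v in range(32)]
MU = 0o11111   # bit positions occupied by i in the interleaved address

def loc(i: int, j: int, k: int) -> int:
    """Part (b): interleave the bits of i, j, k by table lookup."""
    return (((F[k] << 1) + F[j]) << 1) + F[i]

def step_up(x: int, mu: int = MU) -> int:
    """Part (a): add 1 (mod 32) to the field selected by mu, keeping the rest."""
    return x ^ ((x ^ ((x & mu) - mu)) & mu)

def step_down(x: int, mu: int = MU) -> int:
    """Part (a): subtract 1 (mod 32) from the field selected by mu."""
    return x ^ ((x ^ ((x & mu) - 1)) & mu)
```

Passing `2 * MU` or `4 * MU` moves along $j$ or $k$ instead of $i$.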
86. If $(p', q', r')$ bits of $(i, j, k)$ are in the part of the address that does not affect the page number, the total number of page faults is $2\bigl((2^{p-p'} - 1)2^{q+r} + (2^{q-q'} - 1)2^{p+r} + (2^{r-r'} - 1)2^{p+q}\bigr)$. Hence we want to minimize $2^{-p'} + 2^{-q'} + 2^{-r'}$ over the set $0 \le p' \le p$, $0 \le q' \le q$, $0 \le r' \le r$, $p' + q' + r' = s$. Since $2^a + 2^b > 2^{a-1} + 2^{b+1}$ when $a$ and $b$ are integers with $a > b + 1$, the minimum (for all $s$) occurs when we select bits from right to left cyclically until running out. For example, when $(p, q, r) = (2, 6, 3)$ the addressing function would be $(j_5j_4j_3k_2j_2k_1j_1i_1k_0j_0i_0)_2$. In particular, Tocher's scheme is optimal.
[But such a mapping is not necessarily best when the page size isn't a power of 2. For example, consider a $16 \times 16$ matrix; the addressing function $(j_3i_3i_2i_1i_0j_2j_1j_0)_2$ is better than $(j_3i_3j_2i_2j_1i_1j_0i_0)_2$ for all page sizes from 17 to 62, except for size 32 when they are equally good.]
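The cyclic right-to-left selection in answer 86 is easy to generate mechanically. The sketch below builds the address-bit order and confirms, by brute force, that each prefix of length $s$ minimizes $2^{-p'} + 2^{-q'} + 2^{-r'}$. The helper `addressing_order` is hypothetical.

```python
def addressing_order(p: int, q: int, r: int) -> list:
    """Address-bit labels from least to most significant, taking bits of
    i, j, k cyclically and skipping a coordinate once it runs out."""
    limit = {"i": p, "j": q, "k": r}
    used = {"i": 0, "j": 0, "k": 0}
    order = []
    while len(order) < p + q + r:
        for c in "ijk":
            if used[c] < limit[c]:
                order.append(f"{c}{used[c]}")
                used[c] += 1
    return order
```

For $(p, q, r) = (2, 6, 3)$, reading `addressing_order(2, 6, 3)` from its last element to its first gives `j5 j4 j3 k2 j2 k1 j1 i1 k0 j0 i0`, the addressing function quoted above.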
87. Set $x \leftarrow x \,\&\, \overline{((x \,\&\, \texttt{"@@@@@@@@"}) \gg 1)}$; each byte $(a_7\ldots a_0)_2$ is thereby changed to $(a_7a_6(a_5 \wedge \bar a_6)a_4\ldots a_0)_2$. The same transformation works also on 30 additional letters in the Latin-1 supplement to ASCII (for example, à ↦ À); but there's one glitch, ÿ ↦ ß.
[Don Woods used this trick in his original program for the game of Adventure (1976), uppercasing the user's input words before looking them up in a dictionary.]
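The sketch below applies answer 87 to eight bytes at once. The constant string "@@@@@@@@" is our reconstruction of the mask: each of its bytes is #40, so only bit 6 is set, and shifting right by 1 lands on bit 5 of the same byte. The names `AT` and `upcase8` are ours.

```python
AT = int.from_bytes(b"@" * 8, "big")   # #4040404040404040: bit 6 of every byte

def upcase8(x: int) -> int:
    """Clear bit 5 of every byte whose bit 6 is set: a5 becomes a5 AND NOT a6."""
    return x & ~((x & AT) >> 1)

def upcase_bytes(s: bytes) -> bytes:
    return upcase8(int.from_bytes(s, "big")).to_bytes(8, "big")
```

For instance `upcase_bytes(b"adventur")` returns `b"ADVENTUR"`, while spaces and digits pass through unchanged because their bit 6 is 0.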
88. Set $z \leftarrow (x \equiv y) \,\&\, h$, then $z \leftarrow ((x \mid h) - (y \,\&\, \bar h)) \oplus z$.
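Answer 88 is a bytewise subtraction. Subtracting $y \,\&\, \bar h$ from $x \mid h$ keeps every borrow inside its own byte, because each byte of $x \mid h$ is at least 128 and each byte of $y \,\&\, \bar h$ is at most 127. The XOR then repairs the high bits. Here $x \equiv y$ is the bitwise complement of $x \oplus y$. The function name is ours.

```python
MASK64 = (1 << 64) - 1
H = 0x8080808080808080

def bytewise_sub(x: int, y: int) -> int:
    """Bytewise (x_j - y_j) mod 256 for 64-bit x, y."""
    z = ~(x ^ y) & H                  # high bits where x and y agree
    return ((x | H) - (y & ~H)) ^ z   # each byte: 128 + x_l - y_l, then fix bit 7
```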
89. x′ ← x  1, y′ ← y  1, t ← (x′ & (x | y)) | (x & y′), z ← (x & y & 0 ) | (t & 0 ). [From the "nasty" test program for H. G. Dietz and R. J. Fisher's SWARC compiler (1998).]
90. Insert `$z \leftarrow z \mid ((x \oplus y) \,\&\, l)$' either before or after `$z \leftarrow (x \,\&\, y) + z$'. (The ordering makes no difference, because x+y  xy (modulo 4) when x+y is even. Therefore MMIX can round to odd at no additional cost, using MOR. Rounding to even in the ambiguous cases is more difficult, and with fixed point arithmetic it is not advantageous.)
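The sketch below computes bytewise averages. We assume that (88) computes $\lfloor(x_j + y_j)/2\rfloor$ as $(x \,\&\, y) + (((x \oplus y) \,\&\, \bar l) \gg 1)$. That form is consistent with the statement `$z \leftarrow (x \,\&\, y) + z$' quoted in answer 90. The round-to-odd variant inserts answer 90's extra step, and the test confirms that its position does not matter. The function names are ours.

```python
L = 0x0101010101010101   # l: the low bit of every byte

def avg_floor(x: int, y: int) -> int:
    """Bytewise floor((x_j + y_j) / 2)."""
    z = ((x ^ y) & ~L) >> 1       # half of the XOR, no bit crosses a byte
    return (x & y) + z

def avg_round_to_odd(x: int, y: int) -> int:
    """Same, but an odd sum x_j + y_j is rounded to the odd neighbor."""
    z = ((x ^ y) & ~L) >> 1
    z |= (x ^ y) & L              # answer 90's extra step, inserted before
    return (x & y) + z
```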
unbiased rounding
91. If 2 [x; y ℄ denotes the average as in (88), the desired result is obtained by repeating
1
Warren
the following operations seven times, then on luding with z 12 [x; y℄ on e more: identity
Borrows
z 1 [x; y℄; t & h; m (t  1) (t  7);
arries
2 borrow
MOR
x (m & z ) j (m & x); y (m & z ) j (m & y);  1: BDIF
y li shift
Although rounding errors a umulate through eight levels, the resulting absolute error medians
never ex eeds 807/255. Moreover, it is  1:13 if we average over all 2563 ases, and
it is less than 2 with probability  94:2%. If we round to odd as in exer ise 90, the
maximum and average error are redu ed to 616/255 and  0:58; the probability of error
< 2 rises to  99:9%. Therefore the following MMIX ode uses su h unbiased rounding:
8 9
x GREG ;y GREG ;z GREG >
>XOR t,x,y MOR m,ffhi,alf >
>
>
> >
>
alf GREG ;m GREG ;t IS $255 <MOR z,rodd,t PUT rM,m =
ffhi GREG -1<<56 repeat seven times: >
AND t,x,y MUX x,z,x
>
l GREG #0101010101010101 > >
>ADDU z,z,t
>
:
MUX y,y,z >
>
;
rodd GREG #4020100804020101 SLU alf,alf,1
after whi h the rst four instru tions are repeated again. The total time for eight
-blends (67) is less than the ost of eight multipli ations.
92. We get $z_j = \lceil (x_j + y_j)/2 \rceil$ for each $j$. (This fact, noticed by H. S. Warren, Jr., follows from the identity $x + y = ((x \mid y) \ll 1) - (x \oplus y)$. See also the next exercise.)
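Halving Warren's identity gives a bytewise ceiling average. We assume the formula in exercise 92 has the shape $(x \mid y) - (((x \oplus y) \,\&\, \bar l) \gg 1)$; the exercise statement itself is not reproduced here. No borrow leaves a byte, because $x_j \mid y_j \ge x_j \oplus y_j$. The function name is ours.

```python
L = 0x0101010101010101

def avg_ceil(x: int, y: int) -> int:
    """Bytewise ceil((x_j + y_j) / 2), from x + y = ((x | y) << 1) - (x ^ y)."""
    return (x | y) - (((x ^ y) & ~L) >> 1)
```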
93. $x - y = (x \oplus y) - ((\bar x \,\&\, y) \ll 1)$. ("Borrows" instead of "carries.")
94. $(x - l)_j = (x_j - 1 - b_j) \bmod 256$, where $b_j$ is the "borrow" from fields to the right. So $t_j$ is nonzero if and only if $(x_j\ldots x_0)_{256} < (1\ldots1)_{256} = (256^{j+1} - 1)/255$. (The answers to the stated questions are therefore "yes" and "no.")
In general if the constant $l$ is allowed to have any value $(l_7\ldots l_1l_0)_{256}$, operation (90) makes $t_j \ne 0$ if and only if $(x_j\ldots x_0)_{256} < (l_j\ldots l_0)_{256}$ and $x_j < 128$.
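The characterization in answer 94 can be checked directly. We take (90) to be the zero-byte test $t = h \,\&\, (x - l) \,\&\, \bar x$; the equation itself is not shown in this section. Flags can appear above a genuine zero byte, but never without one. The function name is ours.

```python
MASK64 = (1 << 64) - 1
L = 0x0101010101010101
H = 0x8080808080808080

def zero_byte_flags(x: int) -> int:
    """t = h & (x - l) & ~x for a 64-bit x."""
    return H & ((x - L) & MASK64) & ~x
```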
95. Use (90): Test if $h \,\&\, \bigl(t(x \oplus ((x \ll 8) + (x \gg 56))) \mid t(x \oplus ((x \ll 16) + (x \gg 48))) \mid t(x \oplus ((x \ll 24) + (x \gg 40))) \mid t(x \oplus ((x \ll 32) + (x \gg 32)))\bigr) = 0$, where $t(x) = (x - l) \,\&\, \bar x$. (These 28 steps reduce to 20 if cyclic shift or MOR is available, or 15 with BDIF and MOR.)
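Answer 95 compares each byte with the bytes 1, 2, 3 and 4 positions away, cyclically, which covers every pair. Two bytes are equal exactly when their XOR is a zero byte. The sketch below transcribes the test; the names `t`, `rotl` and `all_bytes_distinct` are ours.

```python
MASK64 = (1 << 64) - 1
L = 0x0101010101010101
H = 0x8080808080808080

def t(x: int) -> int:
    """(x - l) & ~x: its h-bits are nonzero iff x has a zero byte."""
    return ((x - L) & MASK64) & ~x

def rotl(x: int, k: int) -> int:
    """(x << k) + (x >> (64 - k)): cyclic left shift of a 64-bit word."""
    return ((x << k) & MASK64) + (x >> (64 - k))

def all_bytes_distinct(x: int) -> bool:
    acc = 0
    for k in (8, 16, 24, 32):
        acc |= t(x ^ rotl(x, k))
    return (H & acc) == 0
```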
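96. Suppose $0 \le x, y < 256$, $x_h = \lfloor x/128 \rfloor$, $x_l = x \bmod 128$, $y_h = \lfloor y/128 \rfloor$, $y_l = y \bmod 128$. Then $[x < y] = \langle \bar x_h\, y_h\, [x_l < y_l] \rangle$; see exercise 7.1.1–106. And $[x_l < y_l] = [y_l + 127 - x_l \ge 128]$. Hence $[x < y] = \lfloor \langle \bar x y z \rangle / 128 \rfloor$, where $z = (\bar x \,\&\, 127) + (y \,\&\, 127)$. It follows that $t = h \,\&\, \langle \bar x y z \rangle$ has the desired properties, when $z = (\bar x \,\&\, \bar h) + (y \,\&\, \bar h)$. This formula can also be written $t = h \,\&\, \overline{\langle x \bar y z \rangle}$, where $z = \overline{(\bar x \,\&\, \bar h) + (y \,\&\, \bar h)} = (x \mid h) - (y \,\&\, \bar h)$ by (18).
To get a similar test function for $[x_j \le y_j] = 1 - [y_j < x_j]$, we just interchange $x \leftrightarrow y$ and take the complement: $t \leftarrow h \,\&\, \overline{\langle \bar y x z \rangle} = h \,\&\, \langle y \bar x \bar z \rangle$, where $z = (\bar y \,\&\, \bar h) + (x \,\&\, \bar h)$.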
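Here is a direct transcription of the first form of answer 96, with $\langle abc \rangle$ computed as the bitwise majority. Each byte of $z$ is $(127 - x_l) + y_l \le 254$, so nothing carries between bytes. The function names are ours.

```python
MASK64 = (1 << 64) - 1
H = 0x8080808080808080

def med(a: int, b: int, c: int) -> int:
    """Bitwise median (majority) <abc>."""
    return (a & b) | (a & c) | (b & c)

def bytewise_less(x: int, y: int) -> int:
    """High bit of byte j of the result is [x_j < y_j]."""
    nx = ~x & MASK64
    z = (nx & ~H) + (y & ~H)   # per byte: 127 - x_l + y_l
    return H & med(nx, y, z)
```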
97. Set x′ ← x  "********", y′ ← x  y, t ← h & (x | ((x | h) l)) & (y′ | ((y′ | h) l)), m ← (t1) (t7), t ← t & (x′ | ((x′ | h) l)), z ← (m & "********") | (m & y). (20 steps.)
98. Set $u \leftarrow x \oplus y$, $z \leftarrow (\bar x \,\&\, \bar h) + (y \,\&\, \bar h)$, $t \leftarrow h \,\&\, (x \oplus (u \mid (x \oplus z)))$, $v \leftarrow ((t \ll 1) - (t \gg 7)) \,\&\, u$, $z \leftarrow x \oplus v$, $w \leftarrow y \oplus v$. [This 14-step procedure invokes answer 96 to compute $t = h \,\&\, \langle \bar x y z \rangle$, using the footprint method of Section 7.1.2 to evaluate the median in only three steps when $x \oplus y$ is known. Of course the MMIX solution is much quicker, if available: BDIF t,x,y; ADDU z,y,t; SUBU w,x,t.]
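The sketch below uses the answer-96 comparison word $t$ to exchange bytes, producing the bytewise max in $z$ and min in $w$, as the BDIF sequence does. For readability it evaluates the median with the plain majority formula instead of the three-step footprint shortcut, so it does not reproduce the 14-step count. The function names are ours.

```python
MASK64 = (1 << 64) - 1
H = 0x8080808080808080

def bytewise_less(x: int, y: int) -> int:
    nx = ~x & MASK64
    z = (nx & ~H) + (y & ~H)
    return H & ((nx & y) | (nx & z) | (y & z))   # h & <~x y z>

def bytewise_max_min(x: int, y: int):
    """Return (z, w) = (bytewise max, bytewise min) of x and y."""
    t = bytewise_less(x, y)
    m = ((t << 1) - (t >> 7)) & MASK64   # each flagged byte becomes #ff
    v = (x ^ y) & m                      # swap exactly the bytes where x_j < y_j
    return x ^ v, y ^ v
```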
99. In this potpourri, each of the eight bytes appears to be solving a different kind of problem; we must recast the conditions so that they fit into a common framework:
f0 = [ x0  '!'  0 ], f1 = [ x1  '*' > 0 ], f2 = [ x2  'A' − 1 ], f3 = [ x3 > 'z' ], f4 = [ x4 > 'a' − 1 ], f5 = [ x5  '0'  9 ], f6 = [ x6  255 > 86 ], f7 = [ x7  '?'  3 ]. Aha! We can use the formulas in answer 96, adjusting d to switch between ≥ and > as needed:
a = ('?'(255)'0'00'*''!')256 = #3fff300000002a21; b = h̄ = #7f7f7f7f7f7f7f7f; c = h̄ & ¬(3(86)9('a' − 1)'z'('A' − 1)00)256 = #7c29761f053f7f7f (the hardest one); d = #8000800000800080; and e = h = #8080808080808080.
100. We want $u_j = x_j + y_j + c_j - 10c_{j+1}$ and $v_j = x_j - y_j - b_j + 10b_{j+1}$, where $c_j$ and $b_j$ are the "carry" and "borrow" into digit position $j$. Set $u' \leftarrow (x + y + (6\ldots66)_{16}) \bmod 2^{64}$ and $v' \leftarrow (x - y) \bmod 2^{64}$. Then we find $u'_j = x_j + y_j + c_j + 6 - 16c_{j+1}$ and $v'_j = x_j - y_j - b_j + 16b_{j+1}$ for $0 \le j < 16$, by induction on $j$. Hence $u'$ and $v'$ have the same pattern of carries and borrows as if we were working in radix 10, and we have $u = u' - 6(\bar c_{16}\ldots\bar c_2\bar c_1)_{16}$, $v = v' - 6(b_{16}\ldots b_2b_1)_{16}$. The following computation schemes therefore provide the desired results (10 operations for addition, 9 for subtraction):
$y' \leftarrow y + (6\ldots66)_{16}$;  $u' \leftarrow x + y'$;  $v' \leftarrow x - y$;
$t \leftarrow \langle \bar x \bar y' u' \rangle \,\&\, (8\ldots88)_{16}$;  $t \leftarrow \langle \bar x y v' \rangle \,\&\, (8\ldots88)_{16}$;
$u \leftarrow u' - t + (t \gg 2)$;  $v \leftarrow v' - t + (t \gg 2)$.
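The BCD schemes of answer 100 can be tested against ordinary decimal arithmetic modulo $10^{16}$. Bit 3 of digit $j$ in $\langle \bar x \bar y' u' \rangle$ is 1 when that digit did not carry. Bit 3 of digit $j$ in $\langle \bar x y v' \rangle$ is 1 when that digit borrowed. In both cases the correction $t - (t \gg 2)$ removes 6 from the flagged digits. The function names are ours.

```python
MASK64 = (1 << 64) - 1
SIXES = 0x6666666666666666    # (6...66)_16
EIGHTS = 0x8888888888888888   # (8...88)_16

def med(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)

def bcd_add(x: int, y: int) -> int:
    y1 = (y + SIXES) & MASK64                           # y'
    u1 = (x + y1) & MASK64                              # u'
    t = med(~x & MASK64, ~y1 & MASK64, u1) & EIGHTS     # digits with no carry out
    return u1 - t + (t >> 2)                            # subtract 6 from each

def bcd_sub(x: int, y: int) -> int:
    v1 = (x - y) & MASK64                               # v'
    t = med(~x & MASK64, y, v1) & EIGHTS                # digits that borrowed
    return v1 - t + (t >> 2)
```

Subtraction yields the ten's complement when $x < y$, just as binary subtraction wraps modulo $2^{64}$.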
101. For subtraction, set $z \leftarrow x - y$; for addition, set $z \leftarrow x + y + $ #e8c4c4fc18, where this constant is built from 256 − 24 = #e8, 256 − 60 = #c4, and 65536 − 1000 = #fc18. Borrows and carries will occur between fields as if mixed-radix subtraction or addition were being performed. The remaining task is to correct for cases in which borrows occurred or carries did not; we can do this easily by inspecting individual digits, because the radices are less than half of the field sizes: Set $t \leftarrow z \,\&\, $ #8080808000, $t \leftarrow (t \ll 1) - (t \gg 7) - ((t \gg 15) \,\&\, 1)$, $z \leftarrow z - (t \,\&\, $ #e8c4c4fc18$)$. [See Stephen Soule, CACM 6 (1975), 344–346. We're lucky that the `c' in `fc18' is even.]
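The sketch below runs answer 101 on a time-of-day word. The field layout is our reading of the constant #e8c4c4fc18 and is an assumption: 8-bit hours (radix 24), minutes and seconds (radix 60), then 16-bit milliseconds (radix 1000). The widened mask for the millisecond field is #feff, which misses bit 8; that bit of #fc18 is 0, which is the "lucky" even `c'. The names `pack_time`, `time_add` and `time_sub` are hypothetical.

```python
C = 0xE8C4C4FC18          # 256-24, 256-60, 256-60, 65536-1000
MASK40 = (1 << 40) - 1

def pack_time(h: int, m: int, s: int, ms: int) -> int:
    """Assumed layout: hours, minutes, seconds in bytes; ms in 16 bits."""
    return (h << 32) | (m << 24) | (s << 16) | ms

def _fix(z: int) -> int:
    t = z & 0x8080808000                          # fields whose top bit is on
    t = (t << 1) - (t >> 7) - ((t >> 15) & 1)     # widen to field masks
    return z - (t & C)                            # remove the radix complement

def time_add(x: int, y: int) -> int:
    return _fix((x + y + C) & MASK40)

def time_sub(x: int, y: int) -> int:
    return _fix((x - y) & MASK40)
```

For example, adding one millisecond to 23:59:59.999 wraps every field and returns `0`.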
102. (a) We assume that $x = (x_{15}\ldots x_0)_{16}$ and $y = (y_{15}\ldots y_0)_{16}$, with $0 \le x_j, y_j < 5$; the goal is to compute $u = (u_{15}\ldots u_0)_{16}$ and $v = (v_{15}\ldots v_0)_{16}$, with components $u_j = (x_j + y_j) \bmod 5$ and $v_j = (x_j - y_j) \bmod 5$. Here's how:
$u \leftarrow x + y$;  $v \leftarrow x - y + 5l$;
$t \leftarrow (u + 3l) \,\&\, h$;  $t \leftarrow (v + 3l) \,\&\, h$;
$u \leftarrow u - ((t - (t \gg 3)) \,\&\, 5l)$;  $v \leftarrow v - ((t - (t \gg 3)) \,\&\, 5l)$.
Here $l = (1\ldots1)_{16} = (2^{64} - 1)/15$ and $h = 8l$. (Addition in 7 operations, subtraction in 8.)
(b) Now $x = (x_{20}\ldots x_0)_8$, etc., and we must be more careful to confine carries:
t x + h ;
z (x j h) (y & h );
z (t & h ) + (y & h );
t (y j z) & x & h;
t (y j z ) & t & h;
v x y + t + (t  2):
u x + y (t + (t  2));
Here $h = (4\ldots4)_8 = (2^{65} - 4)/7$. (Addition in 11 operations, subtraction in 10.)
Similar procedures work, of course, for other moduli. In fact we can do multibyte arithmetic on the coordinates of toruses in general, with different moduli in each component (see 7.2.1.3–(66)).
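Part (a) of answer 102 can be checked digit by digit. Adding $3l$ sets bit 3 of a nybble exactly when the nybble is at least 5. The expression $t - (t \gg 3)$ turns each such 8 into 7, and masking with $5l$ leaves 5 to subtract. No carry or borrow crosses a nybble, because every intermediate nybble stays below 16. The function names are ours.

```python
L4 = 0x1111111111111111   # l = (1...1)_16
H4 = 8 * L4               # h = (8...8)_16

def add_mod5(x: int, y: int) -> int:
    u = x + y                                  # each nybble in 0..8
    t = (u + 3 * L4) & H4                      # 8 where the nybble is >= 5
    return u - ((t - (t >> 3)) & (5 * L4))

def sub_mod5(x: int, y: int) -> int:
    v = x - y + 5 * L4                         # each nybble in 1..9
    t = (v + 3 * L4) & H4
    return v - ((t - (t >> 3)) & (5 * L4))
```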
103. Let h and l be the constants in (87) and (88). Addition is easy: u ← x | ((x & h ) + y). For subtraction, take away 1 and add x_j & (1 − y_j): t ← (x & l)  1, v ← t | (t + (x & (y  l))).
104. Yes, in 19: Let $a = (((1901 \ll 4) + 1) \ll 5) + 1$, $b = (((2099 \ll 4) + 12) \ll 5) + 28$. Set m ← (x  5) & #f (the month), #10 & ((x | (x  1))  5) (the leap year correction), u ← b + #3 & ((#3bbee + ) #(m + m)) (the max day adjustment), and t ← ((x  a  (x − a)) | (x  u  (u − x))) & 1000220 (the test for unwanted carries).
105. Exercise 98 explains how to compute bytewise min and max; a simple modification will compute min in some byte positions and max in others. Thus we can "sort by perfect shuffles" as in Section 5.3.4, Fig. 57, if we can permute bytes between x and y appropriately. And such permutation is easy, by exercise 1. [Of course there are much simpler and faster ways to sort 16 bytes. But see S. Albers and T. Hagerup, Inf. and Computation 136 (1997), 25–51, for asymptotic implications of this approach.]

106. The n bits are regarded as g fields of g bits each. First the nonzero fields are detected ($t_1$), and we form a word y that has $(y_{g-1}\ldots y_0)_2$ in each g-bit field, where $y_j = [$field j of x is nonzero$]$. Then we compare each field with the constants $2^{g-1}, \ldots, 2^0$ ($t_2$), and form a mask m that identifies the most significant nonzero field of x. After putting g copies of that field into z, we test z as we tested y ($t_3$). Finally an appropriate sideways addition of $t_2$ and $t_3$ (g-bit-wise) yields $\lambda x$. (Try the case g = 4, n = 16.)
To compute $2^{\lambda x}$ without shifting left, replace `t2  1' by `t2 + t2', and replace the final line by w ← (((a  (t3  (t3  g))) mod 2^n)  (n − g))  l; then w & m is $2^{\lambda x}$.
107.
h    GREG #8000800080008000
ms   GREG #00ff0f0f33335555
1H   ANDN q,x,m5
     SRU  z,x,32
     CSNZ x,q,z
     ZSNZ lam,q,32
     ANDN q,x,m4
     SRU  z,x,16
     ADD  t,lam,16
     CSNZ x,q,z
     CSNZ lam,q,t
2H   SLU  q,x,16
     ADDU x,x,q
     SLU  q,x,32
     ADDU x,x,q
3H   ANDN y,x,ms
4H   XOR  t,x,y
     OR   q,y,h
     SUBU t,q,t
     OR   t,t,y
     AND  t,t,h
5H   SLU  q,t,15
     ADDU t,t,q
     SLU  q,t,30
     ADDU t,t,q
6H   SRU  q,t,60
     ADDU lam,lam,q
The total time 25υ (and no mems) should be increased by υ for a fair comparison with (56), because (56) doesn't clobber x.
108. For example, let e be minimum so that $n \le 2^e \cdot 2^{2^e}$. If n is a multiple of $2^e$, we can use $2^e$ fields of size $n/2^e$, with e reductions in step B1; otherwise we can use $2^{e+1}$ fields of size $2^{\lceil \lg n \rceil - e - 1}$, with e + 1 reductions in step B1. In either case there are e iterations in steps B2 and B5, so the total running time is O(e) = O(log log n).
109. Start with $x \leftarrow x \,\&\, {-x}$ and apply Algorithm B. (Step B4 of that algorithm can be slightly simplified in this special case, using a constant l instead of x  y.)
110. Let s = 2 where d = 2 d e e. We will use s-bit fields in n-bit words.
K1. [Stretch x mod s.] Set y ← x & (s − 1). Then set t ← y & j and y ← y  t  (t2j (s − 1)) for e > j ≥ 0. Finally set y ← (yss) y. [If x = (x_{2^e−1} … x_0)_2 we now have y = (y_{2^e−1} … y_0)_{2^s}, where y_j = (2^s − 1)x_j [j < d].]
K2. [Set up minterms.] Set y ← y (a_{2^e−1} … a_0)_{2^s}, where a_j = d;j for 0 ≤ j < d and a_j = 2^s − 1 for d ≤ j < 2^e.
K3. [Compress.] Set y ← (y  2j s) for e > j ≥ 0, then y ← y & (2^s − 1). [Now y = 1 ≪ (x mod s). This is the key point that makes the algorithm work.]
K4. [Finish.] Set y ← y | (y  2j se) for 0 ≤ j < e. Finally set y ← y & (2e;j  ((x  j) & 1)) for d ≤ j < 2 .
Pratt
111. The n bits are divided into elds of s bits ea h, although the leftmost eld might nite state automaton
be shorter. First y is set to ag the all-1 elds. Then t = (k: : : t1 t0 )2s ontains andidate Gray binary ode
bits for q, in luding \false drops" for ertain patterns 01 with s  k < r. We always
have tj  1, and tj 6= 0 implies tj 1 = 0. The bits of u and v subdivide t into two
parts so that we an safely ompute m = (t  1) j (t  2) j    j (t  r), before making
a nal test to eliminate the false drops.
112. Notice that if q = x & (x  1) & ⋯ & (x  (r − 1)) & (x  r) then we have x & x + q = x & (x  1) & ⋯ & (x  (r − 1)).
If we can solve the stated problem in O(1) steps, we can also extract the most significant bit of an r-bit number in O(1) steps: Apply the case n = 2r to the number 22n 1 x. Conversely, a solution to the extraction problem can be shown to yield a solution to the 1^r 0 problem. Exercise 110 therefore implies a solution in O(log log r) steps.
113. Let $0' = 0$, $x'_0 = x_0$, and construct $x'_{i'} = x_i$ for $1 \le i \le r$ as follows: If $x_i = a \circ_i b$ and $\circ_i \notin \{+, -, \gg\}$, let $i' = (i-1)' + 1$ and $x'_{i'} = a' \circ_i b'$, where $a' = x'_{j'}$ if $a = x_j$ and $a' = a$ if $a = c_i$. If x_i = a  , let $i' = (i-1)' + 2$ and (x′_{i′−1}, x′_{i′}) = (a & (b2 1), x′_{i′−1}  ). If $x_i = a + b$, let $i' = (i-1)' + 6$ and let $(x'_{(i-1)'+1}, \ldots, x'_{i'})$ compute $((a' \,\&\, \bar h) + (b' \,\&\, \bar h)) \oplus ((a' \oplus b') \,\&\, h)$, where $h = 2^{n-1}$. And if $x_i = a - b$, do the similar computation $((a' \mid h) - (b' \,\&\, \bar h)) \oplus ((a' \equiv b') \,\&\, h)$. Clearly $r' \le 6r$.
114. Simply let $X_i = X_{j(i)} \circ_i X_{k(i)}$ when $x_i = x_{j(i)} \circ_i x_{k(i)}$, $X_i = C_i \circ_i X_{k(i)}$ when $x_i = c_i \circ_i x_{k(i)}$, and $X_i = X_{j(i)} \circ_i C_i$ when $x_i = x_{j(i)} \circ_i c_i$, where $C_i = c_i$ when $c_i$ is a shift amount, otherwise $C_i = (c_i\ldots c_i)_{2^n} = (2^{mn} - 1)c_i/(2^n - 1)$. This construction is possible thanks to the fact that variable-length shifts are prohibited.
[Notice that if $m = 2^d$, we can use this idea to simulate $2^d$ instances of $f(x, y_i)$; then O(d) further operations allow "quantification."]
115. (a) z ← x & (x  1) & (x  2), y ← x & (x + z ). [This problem was posed to the author by Vaughan Pratt in 1977.]
(b) First find x_l ← (x  1) & x and x_r ← x & (x  1), the left and right ends of x's blocks; and set x′_r ← x_r & (x_r − 1). Then z_e ← x′_r & (x′_r − (x_l & 0 )) and z_o ← x′_r & (x′_r − (x_l & 0 )) are the right ends that are followed by a left end in even or odd position, respectively. The answer is y ← x & (x + (z_e & 0 ) + (z_o & 0 )); it can be simplified to y ← x & (x + (z_e  (x′_r & 0))).
(c) This case is impossible, by Corollary I.
116. The language L is well defined, by Lemma A (except that the presence or absence of the empty string is irrelevant). A language is regular if and only if it can be defined by a finite state automaton, and a 2-adic integer is rational if and only if it can be defined by a finite state automaton that ignores its inputs. The identity function corresponds to the language L = 1(0 ∪ 1)*, and a simple construction will define an automaton that corresponds to the sum, difference, or Boolean combination of the numbers defined by any two given automata acting on the sequence x_0 x_1 x_2 … . Hence L is regular.
In exercise 115, L is (a) 11 (000 1(0 ∪ 1) ∪ 0 ); (b) 11(00(00)1(0 ∪ 1) ∪ 0).
117. Incidentally, the stated language L corresponds to an inverse Gray binary code: It defines a function with the property that $f(2x) = f(2x+1)$, and $g(f(2x)) = g(f(2x+1)) = x$, where $g(x) = x \oplus (x \gg 1)$ (see Eq. 7.2.1.1–(9)).
118. If $x = (x_{n-1}\ldots x_1x_0)_2$ and $0 \le a_j \le 2^j$ for $0 \le j < n$, we have $\sum_{j=0}^{n-1} a_jx_j = \sum_{j=0}^{n-1}\bigl(a_j \mathbin{\dot-} (\bar x \,\&\, 2^j)\bigr)$. Take $a_j = \lfloor 2^{j-1} \rfloor$ to get $x \gg 1$.
Conversely, the following argument by M. S. Paterson proves that monus must be used at least n − 1 times: Consider any chain for f(x) that uses addition, subtraction, bitwise Booleans, and k occurrences of the "underflow" operation y / z = (2^n − 1)[y < z]. If k < n − 1 there must be two n-bit numbers x′ and x″ such that x′ mod 2 = x″ mod 2 = 0 and such that all k of the underflow operations yield the same result for both x′ and x″. Then f(x′) mod 2^j = f(x″) mod 2^j when j = ρ(x′ ⊕ x″). So f(x) is not the function x ≫ 1.
119. z ← x  y, f ← 2^p & z & (z − 1). (See (90).)
120. Generalizing Corollary W, these are the functions such that $f(x_1, \ldots, x_m) \equiv f(y_1, \ldots, y_m)$ (modulo $2^k$) whenever $x_j \equiv y_j$ (modulo $2^k$) for $1 \le j \le m$, for $0 \le k \le n$. The least significant bit is a binary function of m variables, so it has $2^{2^m}$ possibilities. The next-to-least is a binary function of 2m variables, namely the bits of $(x_1 \bmod 4, \ldots, x_m \bmod 4)$, so it has $2^{2^{2m}}$; and so on. Thus the answer is $2^{2^m + 2^{2m} + \cdots + 2^{nm}}$.
121. (a) If f has a period of length pq, where q > 1 is odd, its p-fold iteration $f^{[p]}$ has a period of length q, say $y_0 \mapsto y_1 \mapsto \cdots \mapsto y_q = y_0$ where $y_{j+1} = f^{[p]}(y_j)$ and $y_1 \ne y_0$. But then, by Corollary W, we must have $y_0 \bmod 2^{n-1} \mapsto y_1 \bmod 2^{n-1} \mapsto \cdots \mapsto y_q \bmod 2^{n-1}$ in the corresponding (n − 1)-bit chain. Consequently $y_1 \equiv y_0$ (modulo $2^{n-1}$), by induction on n. Hence $y_1 = y_0 \oplus 2^{n-1}$, and $y_2 = y_0$, etc., a contradiction.
(b) $x_1 = x_0 + x_0$, $x_2 = x_0 \gg (p-1)$, $x_3 = x_1 \mid x_2$; a period of length p starts with the value $x_0 = (1 + 2^p + 2^{2p} + \cdots) \bmod 2^n$.
122. Subtraction is analogous to addition; Boolean operations are even simpler; and constants have only one bit pattern. The only remaining case is $x_r = x_j \gg c$, where we have $S_r = S_j + c$; the shift goes left when $c < 0$. Then $V_{pqr} = V_{(p+c)(q+c)j}$, and
$x_r \,\&\, \lfloor 2^p - 2^q \rfloor = ((x_j \,\&\, \lfloor 2^{p+c} - 2^{q+c} \rfloor) \gg c) \,\&\, (2^n - 1)$.
Hence $|X_{pqr}| \le |X_{(p+c)(q+c)j}| \le B_j = B_r$ by induction.
123. If $x = (x_{g-1}\ldots x_0)_2$, note first that $t = 2^{g-1}(x_0\ldots x_{g-1})_{2^g}$ in (104); hence $y = (x_0\ldots x_{g-1})_2$ as claimed. Theorem P now implies that $\lfloor \frac13 \lg g \rfloor$ broadword steps are needed to multiply by $a_{g+1}$ and by $a_{g-1}$. At least one of those multiplications must require $\lfloor \frac16 \lg g \rfloor$ or more steps.
124. Initially t 0, x0 = x, U0 = f1; 2; : : : ; 2n 1g, and 10 0. When advancing
t t + 1, if the current instruction is ri rj  rk we simply define xt = xj 0  xk0 and
i0 t. The cases ri rj Æ rk and ri c are similar.
If the current instruction branches when ri  rj , define xt = xt 1 and let V1 =
fx 2 Ut 1 j xi0  xj0 g, V0 = Ut n V1 . Let Ut be the larger of V0 and V1 ; branch if
Ut = V1 . Notice that jUt j  jUt 1 j=2 in this case.
If the current instruction is ri rj rk , let W =t 1+ fxe+2fUt 1 j x&b2lg n+s 2s ≠ 0
for some s 2 Sk0 g, and note that jW j  jSk0 jSlg n  2 . Let V = fx 2 Ut 1 n W j
xk0 = g for j j < n, and Vn = Ut 1 n W n j j<n V . Lemma B tells us that at most
Bk0 +1  22t 1 1 +1 of the sets V are nonempty. Let Ut be the largest; and if it is V ,
define xt = xj0  , i0 t. In this case jUt j  (jUt 1 j 2t 1+e+f )=(22t 1 1 + 1).
Similarly for ri M [rj mod 2m ] or M [rj mod 2m ] ri , let W = fx 2 Ut 1 j
x & b2m+s 2s ≠ 0 for some s 2 Sj 0 g, and Vz = fx 2 Ut 1 n W j xj 0 mod 2m = z g,
for 0  z < 2m . By Lemma B, at most Bj0  22t 1 1 of the sets00 Vz are nonempty; let
Ut = Vz be the largest. To write ri in M [z ], define xt = xt 1 , z i0 ; to read ri from
M [z ], set i0 t and put xt = xz00 if z 00 is defined, otherwise let xt be the precomputed
constant M [z]. In both cases jUt j  (jUt 1j 2t 1m)=22t 1 1 is sufficiently large.
If t < f we cannot be sure that r1 = x. The reason is that the set W =
fx 2 Ut j x & b2lgen++fs t 2s ≠ 0 for somet s 2 S10 g has size jW j  jS10 j lg n  2t+e+f ,
and jUt n W j  22 2 +1 2t+e+f > 22 1  jfx10 & b2lg n 1 lgj nx0 2 Ut n W gj. Two
elements of Ut n W cannot have the same value of x = x10 & b2 1 .
[The same lower bound applies even if we allow the RAM to make arbitrary
22t 1-way branches based on the contents of (r1, …, rl) at time t.]
125. Start as in answer 124, but with $U_0 = [0 \,.\,.\, 2^g)$. Simplifying that argument by eliminating the sets W will yield sets such that $|U_t| \ge 2^g/\max(2^m, 2n)^t$; for example, at most 2n different shift instructions can occur.
Suppose we can stop at time t with t < blg(h + 1)p+ s. Theq+sproof of Theorem P
yields p and q with xR & b2p 2q independent of x & b2t 2(h 1) .=2Hence the hinted
extension of Lemma B shows that xR takesp+ons at most 22 1  2 different values,
for every setting of the otherRbits fx & b2 (h 1)2q=+2+s g j hs 2 St g. Consequently r1 = x10
can be the correct value of x for at most 2 values of x. But 2(h 1)=2+g h
is less than jUt j, by (106).
126. M. S. Paterson has proposed a related (but different) conjecture: For every 2-adic chain with k addition-subtraction operations, there is a (possibly huge) integer x with νx = k + 1 such that the chain does not calculate $2^{\lambda x}$.
127. Use exercise 110 to compute [(1  (x)) & c ≠ 0] for a suitable constant c. (The special case x = n may also need to be handled separately.)
128. The weaker lower bound Ω(log log n) follows from Theorem R, because x =  (x0 1) when x0 = x & x 1.
129. Note that the suffix parity function x is considered in exercises 36 and 117.
130. If the answer is "no," the analogous question with variable a suggests itself.
131. This program does a typical "breadth-first search," keeping LINK(q) = r. Register u is the vertex currently being examined; v is one of its successors.

0H LDOU r,q,link   1          r ← LINK(q).
   SET  u,r        1          u ← r.
1H LDOU a,u,arcs   |R|        a ← ARCS(u).
   BZ   a,4F       |R|        Is S[u] = ∅?
2H LDOU v,a,tip    S          v ← TIP(a).
   LDOU a,a,next   S          a ← NEXT(a).
   LDOU t,v,link   S          t ← LINK(v).
   PBNZ t,3F       S          Is v ∈ R?
   STOU v,q,link   |R| − |Q|  LINK(q) ← v.
   STOU r,v,link   |R| − |Q|  LINK(v) ← r.
   SET  q,v        |R| − |Q|  q ← v.
3H PBNZ a,2B       S          Loop on a.
4H LDOU u,u,link   |R|        u ← LINK(u).
   CMPU t,u,r      |R|        Is u ≠ r?
   PBNZ t,1B       |R|        Continue if so.
132. (a) We always have  (U )  &u= 2U Æu = (U ). And equality holds if and only if 2u  (u0 ) for all u ∈ U and u′ ∈ U .
(b) We've proved that  (U )  (U ); hence T  U . And if t ∈ T we have 2t  u for all u ∈ U . Therefore (T )   (T ).
(c) Parts (a) and (b) prove that the elements of C_n represent the cliques.
(d) If u  v then u & k  v & k and u & Æk  v & Æk ; so we can work entirely with maximal entries. The following algorithm uses cache-friendly sequential (rather than linked) allocation, in a manner analogous to radix exchange sort (Algorithm 5.2.2R). We assume that w_1 … w_s is a workspace of s unsigned words, bounded by w_0 = 0 and w_{s+1} = 2^n − 1. The elements of C⁺_{k−1} appear initially in positions w_1 … w_m, and our goal is to replace them by the elements of C⁺_k.
M1. [Initialize.] Terminate if k = 2 n 1. Otherwise set v ← 2^k, i ← 1, j ← m.
M2. [Partition on v.] While w_i & v = 0, set i ← i + 1. While w_j & v ≠ 0, set j ← j − 1. Then if i > j, go to M3; otherwise swap w_i ↔ w_j, set i ← i + 1, j ← j − 1, and repeat this step.
M3. [Split w_i … w_m.] Set l ← j, p ← s + 1. While i ≤ m, do subroutine Q with u = w_i and set i ← i + 1.
M4. [Combine maximal elements.] Set m ← l. While p ≤ s, set m ← m + 1, w_m ← w_p, and p ← p + 1.
Subroutine Q uses global variables j, k, l, p, and v. It essentially replaces the word u by u′ = u & k and u″ = u & Æk, retaining them if they are still maximal. If so, u′ goes into the upper workspace w_p … w_s but u″ stays below.
Q1. [Examine u′.] Set w ← u & k and q ← s. If w = u, go to Q4.
Q2. [Is it comparable?] If q < p, go to Q3. Otherwise if w & w_q = w , go to Q7. Otherwise if w & w_q = w_q , go to Q4. Otherwise set q ← q − 1 and repeat Q2.
Q3. [Tentatively accept u′.] Set p ← p − 1 and w_p ← w. Memory overflow occurs if p ≤ m + 1. Otherwise go to Q7.
Q4. [Prepare for loop.] Set r ← p and w_{p−1} ← 0.
Q5. [Remove nonmaximals.] While w | w_q ≠ w , set q ← q − 1. While w | w_r = w, set r ← r + 1. Then if q < r, go to Q6; otherwise set w_q ← w_r, w_r ← 0, q ← q − 1, r ← r − 1, and repeat this step.
Q6. [Reset p.] Set w_q ← w and p ← q. Terminate the subroutine if w = u.
Q7. [Examine u″.] Set w ← u & v. If w = w_q for some q in the range 1 ≤ q ≤ j, do nothing. Otherwise set l ← l + 1 and w_l ← w.
In practice this algorithm performs quite well; for example, when it is applied to the 8 × 8 queen graph (exercise 7–129), it finds the 310 maximal cliques after only 57283 mems of computation, using 397 words of workspace. It finds the 10188 maximal independent sets of that same graph after about 26 megamems, using 15090 words; there are respectively (728, 6912, 2456, 92) such sets of sizes (5, 6, 7, 8), including the 92 famous solutions to the eight queens problem.
Reference: N. Jardine and R. Sibson, Mathematical Taxonomy (Wiley, 1971), Appendix 5. Many other algorithms for listing maximal cliques have also been published. See, for example, W. Knodel, Computing 3 (1968), 239–240, 4 (1969), 75; C. Bron and J. Kerbosch, CACM 16 (1973), 575–577; S. Tsukiyama, M. Ide, H. Ariyoshi, and I. Shirakawa, SICOMP 6 (1977), 505–517; E. Loukakis, Computers and Math. with Appl. 9 (1983), 583–589; D. S. Johnson, M. Yannakakis, and C. H. Papadimitriou, Inf. Proc. Letters 27 (1988), 119–123. See also exercise 5–23.


133. (a) An independent set is a clique of Ḡ; so complement G. (b) A vertex cover is
the complement of an independent set; so complement G, then complement the outputs.
134. a ↦ 00, b ↦ 01, c ↦ 11 is the first mapping of class II.
135. The unary operators are simple: :(xl xr ) = x r xl ; (xlxr ) = xr xr ; (xl xr ) = xl xl .
And xl xr , yl yr = (zl _ zr )(zl ^ zr ), where zl = xl  yl and zr = xr  yr .
136. (a) Classes II, III, IV_a, and IV_c all have the optimum cost 4. Curiously the
functions z_l = x_l ∨ y_l ∨ (x_r ∧ y_r), z_r = x_r ∨ y_r work for the mapping (a, b, c) ↦ (00, 01, 11)
of class II as well as for the mapping (a, b, c) ↦ (00, 01, 1∗) of class IV_c. [This operation
is equivalent to saturated addition, when a = 0, b = 1, and c stands for “more than 1.”]
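The claim in part (a) is small enough to check exhaustively. This sketch makes the following assumptions:
- each bit is a Python int, 0 or 1;
- a, b, and c are read as 0, 1, and “more than 1”;
- in class IV_c, both 10 and 11 are accepted as codes for c.

The dictionaries and function names are illustrative.

```python
def sat_add(xl, xr, yl, yr):
    """Answer 136(a): z_l = x_l | y_l | (x_r & y_r), z_r = x_r | y_r."""
    return xl | yl | (xr & yr), xr | yr


CLASS_II = {0: [(0, 0)], 1: [(0, 1)], 2: [(1, 1)]}
CLASS_IV_C = {0: [(0, 0)], 1: [(0, 1)], 2: [(1, 0), (1, 1)]}   # c -> 1*


def is_saturated_addition(codes):
    """True if sat_add yields a code of min(x + y, 2) for every code of x and of y."""
    for x, xcodes in codes.items():
        for y, ycodes in codes.items():
            for xl, xr in xcodes:
                for yl, yr in ycodes:
                    if sat_add(xl, xr, yl, yr) not in codes[min(x + y, 2)]:
                        return False
    return True
```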
(b) The symmetry between a, b, and c implies that we need only try classes I,
IV_a, and V_a; and those classes turn out to cost 6, 7, and 8. One winner for class I, with
7.1.3 ANSWERS TO EXERCISES 93
(a, b, c) ↦ (00, 01, 10), is z_l = v_r ∧ u_l, z_r = v_l ∧ u_r, where u_l = x_l ≡ y_l, u_r = x_r ≡ y_r,
v_l = y_r ≡ u_l, and v_r = y_l ≡ u_r. [See exercise 7.1.2–60, which gives the same answer but
with z_l ↔ z_r. The reason is that we have (x + y + z) mod 3 = 0 in this problem but
(x + y − z) mod 3 = 0 in that one; and z_l ↔ z_r is equivalent to negation. The binary
operation z = x ∘ y in this case can also be characterized by the fact that the elements
(x, y, z) are all the same or all different; thus it is familiar to people who play the game
of SET. It is the only binary operation on n-element sets that has n! automorphisms
and differs from the trivial examples x ∘ y = x or x ∘ y = y.]
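The six-step chain of part (b) can be verified over all nine input pairs. Here ≡ is bitwise equivalence, the complement of ⊕. The sketch below checks both characterizations given in the answer: (x + y + z) mod 3 = 0, and "all the same or all different". The names are illustrative.

```python
CODE = {0: (0, 0), 1: (0, 1), 2: (1, 0)}   # (a, b, c) -> (00, 01, 10), as (l, r)


def eqv(p, q):
    """Bitwise equivalence p ≡ q of 0/1 values, i.e. the complement of p ^ q."""
    return 1 ^ p ^ q


def set_op(x, y):
    """z = x ∘ y by the six operations of answer 136(b)."""
    xl, xr = x
    yl, yr = y
    ul, ur = eqv(xl, yl), eqv(xr, yr)
    vl, vr = eqv(yr, ul), eqv(yl, ur)
    return vr & ul, vl & ur                # (z_l, z_r)
```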
(c) Cost 3 is achieved only with class I: Let (a, b, c) ↦ (00, 01, 10) and z_l =
(x_l ∨ x_r) ∧ y_l, z_r = x_r ∧ y_r.
137. In fact, z = (x + 1) & y when (a, b, c) ↦ (00, 01, 10). [It's a contrived example.]

138. The simplest case known to the author requires the calculation of two binary
operations, such as
a b b 
a b a
a b b and a b a ;
a a a
each has cost 2 in class V_a, but the costs are (3, 2) and (2, 3) in classes I and II.
139. The calculation of z_2 is essentially equivalent to exercise 136(b); so the natural
representation (111) wins. Fortunately this representation also is good for z_1, with
z_{1l} = x_l ∧ y_l, z_{1r} = x_r ∧ y_r.
140. With representation (111), first use full binary adders to compute (a_1a_0)_2 =
x_l + y_l + z_l and (b_1b_0)_2 = x_r + y_r + z_r in 5 + 5 = 10 steps. Now the “greedy footprint”
method shows how to compute the four desired functions of (a_1, a_0, b_1, b_0) in eight
further steps: ul = a1 ^ b0 , ur = a0 ^ b1 ; t1 = a1  b0 , t2 = a0  b1 , t3 = a1  t2 ,
t4 = a0  t1 , vl = t3 ^ t1 , vr = t4 ^ t2 . [Is this method optimum?℄
141. Suppose we've computed bits a = a_0 a_1 … a_{2m−1} and b = b_0 b_1 … b_{2m−1} such that
a_s = [s = 1 or s = 2 or s is a sum of distinct Ulam numbers ≤ m in exactly one way];
b_s = [s is a sum of distinct Ulam numbers ≤ m in more than one way];
for some integer m = U_n ≥ 2. For example, when m = n = 2 we have a = 0111 and
b = 0000. Then {s | s ≤ m and a_s = 1} = {U_1, …, U_n}; and U_{n+1} = min{s | s > m
and a_s = 1}. (Notice that a_s = 1 when s = U_{n−1} + U_n.) The following simple bitwise
operations preserve these conditions: n ← n + 1, m ← U_n, and
(a_m … a_{2m−1}, b_m … b_{2m−1}) ← ((a_m … a_{2m−1} ⊕ a_0 … a_{m−1}) & ¬(b_m … b_{2m−1}),
(a_m … a_{2m−1} & a_0 … a_{m−1}) | b_m … b_{2m−1}),
where a_s = b_s = 0 for 2U_{n−1} ≤ s < 2U_n on the right side of this assignment.
[See M. C. Wunderlich, BIT 11 (1971), 217–224; Computers in Number Theory
(1971), 249–257. These mysterious numbers, which were first defined by S. Ulam in
SIAM Review 6 (1964), 348, have baffled number theorists for many years. The ratio
U_n/n appears to converge to a constant, ≈ 13.52; for example, U_20000000 = 270371127
and U_40000000 = 540752349. Furthermore, D. W. Wilson has observed empirically that
the numbers form quasi-periodic “clusters” whose centers differ by multiples of another
constant, ≈ 21.6016. Calculations by Jud McCranie and the author for U_n < 640000000
indicate that the largest gap U_n − U_{n−1} may occur between U_24576523 = 332250401 and
U_24576524 = 332251032; the smallest gap U_n − U_{n−1} = 1 apparently occurs only when
U_n ∈ {2, 3, 4, 48}. Certain small gaps like 6, 11, 14, and 16 have never been observed.]
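The bitwise update above translates directly into Python if bit s of an arbitrary-precision integer holds a_s (or b_s). This is an illustrative sketch, not a tuned implementation: it searches for U_{n+1} one bit at a time and rebuilds whole integers at every step.

```python
def ulam_numbers(count):
    """First `count` Ulam numbers, keeping a_s and b_s in bit s of the integers a and b."""
    a, b = 0b1110, 0           # m = n = 2: a = 0111 (a_1 = a_2 = a_3 = 1), b = 0000
    m, ulam = 2, [1, 2]
    while len(ulam) < count:
        s = m + 1                              # U_{n+1} = min{s | s > m and a_s = 1}
        while not (a >> s) & 1:
            s += 1
        ulam.append(s)
        m = s                                  # n <- n + 1, m <- U_n
        window = ((1 << m) - 1) << m           # positions m .. 2m-1
        low = (a & ((1 << m) - 1)) << m        # a_0 .. a_{m-1}, moved up by m
        hi_a, hi_b = a & window, b & window
        a = (a & ~window) | ((hi_a ^ low) & ~hi_b & window)
        b = (b & ~window) | (hi_a & low) | hi_b
    return ulam[:count]
```

For example, `ulam_numbers(10)` gives 1, 2, 3, 4, 6, 8, 11, 13, 16, 18.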
142. Algorithm E in that exercise performs the following operations on subcubes:
(i) Count the ∗s in a given subcube α. (ii) Given α and α′, test if α ⊆ α′. (iii) Given α
and α′, compute α ⊔ α′ (if it exists). Operation (i) is simple with sideways addition;
let's see which of the nine classes of two-bit encodings (119), (123), (124) works best
for (ii) and (iii). Suppose a = 0, b = 1, c = ∗; the symmetry between 0 and 1 means
that we need only examine classes I, III, IV_a, IV_c, V_a, and V_c.
For the asterisks-and-bits mapping (0, 1, ∗) ↦ (00, 01, 10), which belongs to
class I, the truth table for α ⊈ α′ is 010∗100∗110∗∗∗∗∗ in each component. (For example,
0 ⊆ ∗ and ∗ ⊈ 1. The ∗s in this truth table are don't-cares for the unused codes 11.)
The methods of Section 7.1.2 tell us that the cheapest such functions have cost 3;
for example, α ⊆ α′ if and only if ((b ⊕ b′) | a) & ā′ = 0. Furthermore the consensus
α ⊔ α′ = α″ exists if and only if νz = 1, where z = (b ⊕ b′) & (a ≡ a′). And in that
case, a″ = (a ⊕ b ⊕ b′) & (a ≡ a′), b″ = (b | b′) & z̄. [The asterisk and bit codes were
used for this purpose by M. A. Breuer in Proc. ACM Nat. Conf. 23 (1968), 241–250.]
But class III works out better, with (0, 1, ∗) ↦ (01, 10, 00). Then α ⊆ α′ if and only
if (ᾱ_l & α′_l) | (ᾱ_r & α′_r) = 0; α ⊔ α′ = α″ exists if and only if νz = 1 where z = x & y,
x = α_l | α′_l, y = α_r | α′_r; and α″_l = x ⊕ z, α″_r = y ⊕ z. We save two operations for each
consensus, with respect to class I, compensating for an extra step when counting asterisks.
Classes IV_a, V_a, and V_c turn out to be far inferior. Class IV_c has some merit,
but class III is best.
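Here is a sketch of the class III operations on subcubes. A subcube is written as a string over 0, 1, and ∗ (typed as '*'), and position i is stored in bit i of the two masks (α_l, α_r). The names `encode`, `contained`, and `consensus` are illustrative. ν is computed with a plain population count.

```python
def encode(cube):
    """Class III code: 0 -> (l, r) = (0, 1), 1 -> (1, 0), * -> (0, 0); position i is bit i."""
    l = r = 0
    for i, ch in enumerate(cube):
        if ch == "1":
            l |= 1 << i
        elif ch == "0":
            r |= 1 << i
    return l, r


def decode(code, n):
    l, r = code
    return "".join("1" if (l >> i) & 1 else "0" if (r >> i) & 1 else "*" for i in range(n))


def contained(alpha, beta):
    """alpha ⊆ beta iff (~alpha_l & beta_l) | (~alpha_r & beta_r) == 0."""
    (al, ar), (bl, br) = alpha, beta
    return ((~al & bl) | (~ar & br)) == 0


def consensus(alpha, beta):
    """alpha ⊔ beta, or None unless exactly one position has 0 against 1."""
    (al, ar), (bl, br) = alpha, beta
    x, y = al | bl, ar | br
    z = x & y                          # positions where one cube says 1, the other 0
    if bin(z).count("1") != 1:         # the condition nu(z) = 1
        return None
    return x ^ z, y ^ z
```

For example, the consensus of 01∗ and 11∗ is ∗1∗, while 01 and 10 (two conflicting positions) have none.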
143. f(x) = ((x & m_1) ≪ 17) | ((x ≫ 17) & m_1) | ((x & m_2) ≪ 15) | ((x ≫ 15) & m_2) |
((x & m_3) ≪ 10) | ((x ≫ 10) & m_3) | ((x & m_4) ≪ 6) | ((x ≫ 6) & m_4), where
m_1 = #7f7f7f7f7f7f, m_2 = #fefefefefefe, m_3 = #3f3f3f3f3f3f3f, m_4 = #fcfcfcfcfcfcfc. [See, for
example, Chess Skill in Man and Machine, edited by Peter W. Frey (1977), page 59.
Five steps suffice to compute f(x) on MMIX (four MOR operations and one OR), since
f (x) = q  x  q0 j q0  x  q with q = # 40a05028140a0502 and q0 = # 2010884422110804 .℄
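The eight-term formula can be checked against a direct enumeration of knight moves. This sketch assumes that bit 8r + c represents the square in row r and column c. Under that assumption, #7f…7f clears column 7 before a shift that moves one column to the right. `knight_moves` is an illustrative name.

```python
M1 = 0x7F7F7F7F7F7F        # m_1: no column 7 (for +17 and -17)
M2 = 0xFEFEFEFEFEFE        # m_2: no column 0 (for +15 and -15)
M3 = 0x3F3F3F3F3F3F3F      # m_3: no columns 6, 7 (for +10 and -10)
M4 = 0xFCFCFCFCFCFCFC      # m_4: no columns 0, 1 (for +6 and -6)


def knight_moves(x):
    """Squares a knight can reach from the squares of the 64-bit board x."""
    return (((x & M1) << 17) | ((x >> 17) & M1) |
            ((x & M2) << 15) | ((x >> 15) & M2) |
            ((x & M3) << 10) | ((x >> 10) & M3) |
            ((x & M4) << 6) | ((x >> 6) & M4))
```

A knight on square 0 (row 0, column 0) reaches bits 10 and 17, and no result ever spills past bit 63.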
144. Node j  (k  1), where k = j & j .
145. It names the an estor of the leaf node j j 1 at height h.
146. By (136) we want to show that (j & i) = l when l 2 < i  l  j < l + 2 .
l l
The desired result follows from (35) be ause l  i < l + 2 . l
147. (a) vj = vj = j , vj = 1  j , and j = , for 1  j  n.
(b) Suppose n = 2e1 +  +2et where e1 >    > et  0, and let nk = 2e1 +  +2ek
for 0  k  t. Then vj = j and vj = vj = nk for nk 1 < j  nk . Also  nk = vnk 1
for 1  k  t, where v0 = ; all other j = .
148. Yes, if y1 = 010000, y2 = 010100, x1 = 010101, x2 = 010110, x3 = 010111,
x3 = 010111, y2 = 010100, x2 = 011000, y1 = 010000, and x1 = 100000.
149. We assume that CHILD(v ) = SIB(v ) = PARENT(v ) =  initially for all verti es v
(in luding v = ), and that there is at least one nonnull vertex.
S1. [Make triply linked tree.℄ For ea h of the n ar s u ! v (perhaps v = ), set
SIB(u) CHILD(v ), CHILD(v ) u, PARENT(u) v . (See exer ise 2.3.3{6.)
S2. [Begin rst traversal.℄ Set p CHILD(), n 0, and  0 1.
S3. [Compute in the easy ase.℄ Set n n + 1, p n,  n , and
n 1+ (n  1). If CHILD(p) 6= , set p CHILD(p) and repeat this step;
otherwise set p n.
S4. [Compute  , bottom-up.℄ Set  p PARENT(p). Then if SIB(p) 6= , set
p SIB(p) and return to S3; otherwise set p PARENT(p).
S5. [Compute in the hard ase.℄ If p 6= , set h (n & p), then p traversal in postorder
((n  h) j 1)  h, and go ba k to S4. Cartesian trees
S6. [Begin se ond traversal.℄ Set p CHILD();  0 n,   0. right-to-left minimum
S7. [Compute , top-down.℄ Set p (PARENT(p)) j ( p & p). Then if triply linked tree
CHILD(p) 6= , set p CHILD(p) and repeat this step. Gabow
S8. [Continue to traverse.℄ If SIB(p) 6= , set p SIB(p) and go to S7. Tarjan
Otherwise set p PARENT(p), and repeat step S8 if p 6= . Fis her
150. We may assume that the elements A_j are distinct, by regarding them as ordered
pairs (A_j, j). The hinted binary search tree, which is a special case of the “Cartesian
trees” introduced by Jean Vuillemin [CACM 23 (1980), 229–239], has the property that
k(i, j) is the nearest common ancestor of i and j. Indeed, the ancestors of any given
node j are precisely the nodes k such that A_k is a right-to-left minimum of A_1 … A_j
or A_k is a left-to-right minimum of A_j … A_n.
The algorithm of the preceding answer does the desired preprocessing, except
that we need to set up a triply linked tree differently on the nodes {0, 1, …, n}. Start
as before with CHILD(v) = SIB(v) = PARENT(v) = 0 for 0 ≤ v ≤ n, and let Λ = 0.
Assume that A_0 ≤ A_j for 1 ≤ j ≤ n. Set t ← 0 and do the following steps for v = n,
n − 1, …, 1: Set u ← 0; then while A_v < A_t set u ← t and t ← PARENT(t). If u ≠ 0,
set SIB(v) ← SIB(u), SIB(u) ← 0, PARENT(u) ← v, CHILD(v) ← u; otherwise simply
set SIB(v) ← CHILD(t). Also set CHILD(t) ← v, PARENT(v) ← t, t ← v.
Continue with step S2 after the tree has been built. The running time is O(n),
because the operation t ← PARENT(t) is performed at most once for each node t. [This
beautiful way to reduce the range minimum query problem to the nearest common
ancestor problem was discovered by H. N. Gabow, J. L. Bentley, and R. E. Tarjan,
STOC 16 (1984), 137–138, who also suggested the following exercise.]
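Here is a sketch of the right-to-left construction in Python. It uses 0 for Λ and a sentinel A_0 = −∞, so that A_0 ≤ A_j holds. The nearest common ancestor is found naively here. That is enough to confirm that it agrees with the position of the minimum of A_i … A_j. Function names are illustrative.

```python
def cartesian_tree(A):
    """PARENT, CHILD, SIB lists on nodes 0..n for A[1..n], where A[0] <= every A[j].

    Node 0 plays the role of the null link, as in the answer.
    """
    n = len(A) - 1
    parent, child, sib = [0] * (n + 1), [0] * (n + 1), [0] * (n + 1)
    t = 0
    for v in range(n, 0, -1):
        u = 0
        while A[v] < A[t]:                # climb past nodes with larger values
            u, t = t, parent[t]
        if u != 0:                        # v takes u's place; u becomes v's child
            sib[v], sib[u] = sib[u], 0
            parent[u], child[v] = v, u
        else:
            sib[v] = child[t]
        child[t] = v
        parent[v] = t
        t = v
    return parent, child, sib


def nearest_common_ancestor(parent, i, j):
    """Naive NCA by climbing PARENT links (node 0 is the root)."""
    seen = set()
    while i != 0:
        seen.add(i)
        i = parent[i]
    while j != 0 and j not in seen:
        j = parent[j]
    return j
```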
151. For node v with k children u_1, …, u_k, define the node sequence S(v) = v if
k = 0; S(v) = v S(u_1) if k = 1; and S(v) = S(u_1) v … v S(u_k) if k > 1. (Consequently
v appears exactly max(k − 1, 1) times in S(v).) If there are k trees in the forest, rooted at
u_1, …, u_k, write down the node sequence S(u_1) Λ … Λ S(u_k) = V_1 … V_N. (The length
of this sequence will satisfy n ≤ N < 2n.) Let A_j be the depth of node V_j, for 1 ≤
j ≤ N, where Λ has depth 0. (For example, consider the forest (141), but add another
child K → D and an isolated node L. Then V_1 … V_15 = CFAGJDHDKΛBEIΛL
and A_1 … A_15 = 231342323012301.) The nearest common ancestor of u and v, when
u = V_i and v = V_j, is then V_{k(i,j)} in the range minimum query problem. [See J. Fischer
and V. Heun, Lecture Notes in Comp. Sci. 4009 (2006), 36–48.]
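The reduction can be tried on a small forest. In this sketch the forest is hypothetical; it is not the forest (141) of the text. None stands for Λ, and each node is located by its first occurrence in V. The function names are illustrative.

```python
def node_sequence(children, roots):
    """V_1..V_N and depths A_1..A_N; None plays the role of Λ (depth 0)."""
    V, A = [], []

    def emit(v, depth):
        V.append(v)
        A.append(depth)

    def S(v, depth):
        kids = children.get(v, [])
        if len(kids) <= 1:                    # S(v) = v  or  v S(u_1)
            emit(v, depth)
            for u in kids:
                S(u, depth + 1)
        else:                                 # S(u_1) v S(u_2) ... v S(u_k)
            for i, u in enumerate(kids):
                if i > 0:
                    emit(v, depth)
                S(u, depth + 1)

    for i, root in enumerate(roots):
        if i > 0:
            emit(None, 0)
        S(root, 1)
    return V, A


def nca_by_rmq(V, A, u, v):
    """Nearest common ancestor of u and v (None if they lie in different trees)."""
    i, j = sorted((V.index(u), V.index(v)))
    k = min(range(i, j + 1), key=lambda t: A[t])
    return V[k]
```

For the hypothetical forest in the test (A with children B, C, D; C with child E; D with children F, G; H with children I, J; and an isolated K), V is B A C E A F D G Λ I H J Λ K.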
152. Step V1 finds the level above which x and y have bits that apply to both of
their ancestors. (See exercise 148.) Step V2 increases h, if necessary, to the level where
they have a common ancestor, or to the top level n if they don't (namely if k = 0).
If x 6= z, step V4 nds the topmost level among x's an estors that leads to level h;
hen e it knows the lowest an estor x^ for whi h x^ = z (or x^ = ). Finally in V5,
preorder tells us whi h of x^ or y^ is an an estor of the other.
153. That pointer has j bits, so it ends after 1 +  2 +    + j = j j bits of the
pa ked string, by (61). [Here j is even. Navigation piles were introdu ed in Nordi

Journal of Computing 10 (2003), 238{262.℄


154. The gray lines define 36°-36°-90° triangles, ten of which make a pentagon with
72° angles at each vertex. These pentagons tile the hyperbolic plane in such a way
that five of them meet at each vertex.
155. Observe rst that 0  ( 0)1= <  1 +  3 +  5 +    = 1, sin e there are no broadword hain
onse utive 1s. Observe next that F n    n (modulo 1), by exer ise 1.2.8{11. Now 2-adi hain
add Fk1 +  +Fkr . For example, (4) mod 1 =  5 + 2 ; ( 2) mod 1 =  4 + 1 . magi mask
This argument also proves the interesting formula bN ( ) = N ( 0). odd Fibona i number system
156. (a) Start with y ← 0, and with k large enough that |x| < F_{k+1}. If x < 0, set
k ← (k − 1) | 1, and while x + F_k > 0 set k ← k − 2; then set y ← y + (1 ≪ k),
x ← x + F_{k+1}; repeat. Otherwise if x > 1, set k ← k & −2, and while x − F_k ≤ 0 set
k ← k − 2; then set y ← y + (1 ≪ k), x ← x − F_{k+1}; repeat. Otherwise set y ← y + x
and terminate with y = (α)_2.
(b) The operations x_1 ← a_1, y_1 ← −a_1, x_k ← y_{k−1} + a_k, y_k ← x_{k−1} − x_k
compute x_k = N(a_1 … a_k) and y_k = N(a_1 … a_k 0). [Does every broadword chain for
N(a_1 … a_n) require Ω(n) steps?]
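Parts (a) and (b) fit together as an encoder and a decoder. In this sketch bit k of y stands for F_{−k−1} = (−1)^k F_{k+1}, so bit 0 has weight 1 and bit 1 has weight −1. That reading of (α)_2 is an assumption, chosen to be consistent with the final step y ← y + x. The helper names are illustrative.

```python
def fib(k):
    """F_k for k >= 0, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def negafib_encode(x):
    """Part (a): y = (alpha)_2, bit k of y standing for F_{-k-1} = (-1)**k * F_{k+1}."""
    k = 0
    while abs(x) >= fib(k + 1):               # make |x| < F_{k+1}
        k += 1
    y = 0
    while True:
        if x < 0:
            k = (k - 1) | 1                   # odd positions carry negative weights
            while x + fib(k) > 0:
                k -= 2
            y += 1 << k
            x += fib(k + 1)
        elif x > 1:
            k &= -2                           # even positions carry positive weights
            while x - fib(k) <= 0:
                k -= 2
            y += 1 << k
            x -= fib(k + 1)
        else:
            return y + x                      # x is now 0 or 1


def negafib_value(code):
    """Part (b): x_k = N(a_1 ... a_k), y_k = N(a_1 ... a_k 0) for a bit string."""
    x = y = 0
    for k, ch in enumerate(code):
        a = int(ch)
        if k == 0:
            x, y = a, -a
        else:
            x, y = y + a, x - (y + a)
    return x
```

For example, `negafib_encode(4)` is 0b10010, since 4 = F_{−5} + F_{−2} = 5 − 1. Feeding `format(0b10010, "b")` to `negafib_value` gives back 4.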
157. The laws are obvious ex ept for the two ases involving ( ). For those we have
N (( )0k ) = N ( 0k ) + F k 2 for all k  0, be ause de rementation never\borrows"
at the right. (But the analogous formula N (( +)0k )= N ( 0k )+ F k 1 does not hold.)
158. In rementation satis es the rules ( 00)+ = 01, ( 10)+ = ( +)00, ( 1)+ =
( +)0. It an be a hieved with six 2-adi operations on the integer x = ( )2 by setting
y x j (x  1), z y & (y + 1), x (x j z ) + 1.
De rementation of a nonzero odeword is more diÆ ult. It satis es ( 102k ) =
0(10)k , ( 102k+1 ) = (01)k+1 ; hen e by Corollary I it annot be omputed by a
2-adi hain. Yet six operations suÆ e, if we allow monus: y x 1, z y & x,
w z & 0 , x y w + (w . (z w)).
159. Besides the Fibona i number system (146) and the negaFibona i number sys-
tem (147), there's also an odd Fibona i number system : Every positive integer x
an

be written uniquely in the form

x = F l1 + F l2 +    + F ls ;
where l1  l2      ls > 0 and ls is odd.
Given a negaFibona i ode , the following 20-step 2-adi hain onverts x = ( )2 to
y = ( )2 to z = ( )2 , where is the odd odeword with N ( ) = F ( ) and is the
standard odeword with F ( ) = F ( 0): x+ x & 0 , x x  x+ ; d x+ x ;
t d j x , t t & (t  1); y (d & 0 )  t  ((t & x )  1); z (y + 1)  1;
w z  (40 ); t w & (w+1); z z  (t & (z  ((w+1)  1))).
Corresponding negaFibona i and odd representations satisfy the remarkable law
Fk1 +m +    + Fkr +m = ( 1)m (F l1 m +    + F ls m ); for all integers m.
For example, if N ( ) < 0 the steps above will onvert x = ( 0)2 to y = ( )2, where
F ((  2)0) = N ( ). Furthermore is the odd ode for negaFibona i if and only
if R is the odd ode for negaFibona i R , when j j = j j is odd and N ( ) > 0.
No nite 2-adi hain will go the other way, by Corollary I, be ause the Fibona i
ode 10k orresponds to negaFibona i 10k+1 when k is odd, (10)k=21 when k is even.
But if is a standard Fibona i odeword we an ompute y = ( )2 from z = ( )2 by
setting y z  1, t y & (y 1) & 0 , y y t + [ t 6= 0℄((t 1) & 0 ). And then
the method above will ompute R from R . The overall running time for onversion
to negaFibona i form will then be of order log j j, for two string reversals.
160. The text's rules are a tually in omplete: They should also de ne the orientation
of ea h neighbor. Let us stipulate that sn = ; en = ; ( 0)wn = 0, ( 1)wo = 1;
( 00)ns = 00, ( 10)nw = 10, ( 1)ne = 1; ( 0)oo = 0, ( 101)oo = 101,
( 1001)oo = 1001, ( 0001)ow = 0001. Then a ase analysis proves that all ells bipartite
within d steps of the starting ell have a onsistent labeling and orientation, by indu - ylinder
tion on the graph distan e d. (Note the identity + = (( 0) )  1.) Furthermore the upper halfplane
labeling remains onsistent when we atta h y oordinates and move when ne essary re e tion
from one strip to another via the Æ-rules of (153). breadth- rst sear h
161. Yes, it is bipartite, because all of its edges are defined by the set of boundary
lines. (The hyperbolic cylinder cannot be bicolored; but two adjacent strips can.)
162. It's convenient to view the hyperbolic plane through another lens,
by mapping its points to the upper halfplane ℑz > 0. Then the “straight
lines” become semicircles centered on the x-axis, together with vertical
halflines as a limiting case. In this representation, the edges |z − 1| = √2,
|z| = r, and ℜz = 0 define a 36°-45°-90° triangle if r² = φ + √φ.
[Figure: a 36°-45°-90° triangle ABC and neighboring triangles with vertices A′, B′, C′]
Every triangle ABC has three neighbors CBA′, ACB′, and BAC′, obtained
by “reflecting” two of its edges about the third, where the reflection of

jz j = r about jz j = r is jz 2 (x1 + x2 )j = 2 jx1 x2 j, xj = r =(  r ).


1

The mapping z 7! (z z0 )=(z z0 ) takes the upper halfplane into the unit ir le;
when z0 = 12 (p 1=)(1 + 51=4i) the entral pentagon will be symmetri . Repeated
re e tions of the initial triangle, using breadth- rst sear h until rea hing triangles that
are invisible, will lead to Fig. 14. To get just the pentagons (without the gray lines),
one an begin with just the entral ell and perform re e tions about its edges, et .
163. (This figure can be drawn as in exercise 162, starting with vertices that project to
the three points ir, irω, and irω², where r² = ½(1 + √2)(4 − √2 − √6) and ω = e^{2πi/3}.
Using a notation devised by L. Schläfli in 1852, it can be described as the infinite tiling
with parameters {3, 8}, meaning that eight triangles meet at every vertex; see Schläfli's
Gesammelte Mathematische Abhandlungen 1 (1950), 212. Similarly, the pentagrid and
the tiling of exercise 154 have Schläfli symbols {5, 4} and {5, 5}, respectively.)
164. The original definition requires more computation, even though it can be factored:
custer′(X) = X & ¬(Y_N & Y & Y_S),   Y = X_W & X & X_E.
But the main reason for preferring (157) is that it produces a thinner, kingwise-connected
border. The rookwise-connected border that results from the 1957 definition is
less attractive, because it's noticeably darker when the border travels diagonally than
when it travels horizontally or vertically. (Try some experiments and you'll see.)
165. The first image X^(1) is the “outer” border of the original black pixels. Fingerprint-like
whorls are formed thereafter. For example, starting with Fig. 15(a) we get
[Figure: a sequence of bitmap images]
in a 120 × 120 bitmap, eventually alternating endlessly between two bizarre patterns.
(Does every nonempty M × N bitmap lead to such a 2-cycle?)
166. If X = uster(X ), the sum of the elements of X +(X 1)+(X 1)+(X 1)+(X 1)



is at most 4MN + 2M + 2N , sin e it is at most 4 in ea h ell of the re tangle and at


most 1 in the adja ent ells. This sum is also ve times the number of bla k pixels.
Hen e f (M; N )  45 MN + 25 M + 52 N . Conversely we get f (M; N )  45 MN 52 by
letting the pixel in row i and column j be black unless (i + 2j) mod 5 = 2. (This
problem is equivalent to finding a minimum dominating set of the M × N grid.)
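The construction in the last paragraph can be checked mechanically. The counting argument above assumes, and this sketch assumes too, that custer(X) keeps exactly the black pixels with at least one white rook-neighbor. Pixels outside the M × N rectangle count as white. The function names are illustrative.

```python
def pattern(M, N):
    """Black pixels (i, j), 1 <= i <= M, 1 <= j <= N, except where (i + 2j) mod 5 = 2."""
    return {(i, j) for i in range(1, M + 1) for j in range(1, N + 1)
            if (i + 2 * j) % 5 != 2}


def custer(X):
    """Black pixels that are not surrounded by black N, S, E, and W neighbours."""
    return {(i, j) for (i, j) in X
            if not {(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)} <= X}
```

Each black pixel's four rook-neighbors have the four residues of i + 2j (mod 5) other than its own. One of them is therefore 2, so that neighbor is white, and `custer(pattern(M, N))` leaves the pattern unchanged.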
167. (a) With 17 steps we can construct a half adder and three full adders (see 7.1.2–(23))
so that (z_1z_2)_2 = x_NW + x_W + x_SW, (z_3z_4)_2 = x_N + x_S, (z_5z_6)_2 = x_NE + x_E + x_SE,
and (z_7z_8)_2 = z_2 + z_4 + z_6. Then f = S_1(z_1, z_3, z_5, z_7) ∧ (x ∨ z_8), where the symmetric
function S_1 needs seven operations by Fig. 9 in Section 7.1.2. [This solution is based
on ideas of W. F. Mann and D. Sleator.]
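The adder network of part (a) can be run on rows stored as integers. In this sketch bit j of rows[i] is the pixel in row i, column j, and pixels outside the array are 0. S_1 is evaluated with a simple two-flag scan, not with the seven-step chain of Fig. 9. The names are illustrative.

```python
def half_adder(a, b):
    """(carry, sum) of a + b, bitwise."""
    return a & b, a ^ b


def full_adder(a, b, c):
    """(carry, sum) of a + b + c, bitwise, in five operations."""
    t = a ^ b
    return (a & b) | (t & c), t ^ c


def life_step(rows, width):
    """One Life generation; rows[i] holds row i with bit j = column j."""
    mask = (1 << width) - 1
    new_rows = []
    for i, x in enumerate(rows):
        up = rows[i - 1] if i > 0 else 0
        down = rows[i + 1] if i + 1 < len(rows) else 0
        west = [(r << 1) & mask for r in (up, x, down)]   # neighbour at column j - 1
        east = [r >> 1 for r in (up, x, down)]            # neighbour at column j + 1
        z1, z2 = full_adder(*west)                        # x_NW + x_W + x_SW
        z3, z4 = half_adder(up, down)                     # x_N + x_S
        z5, z6 = full_adder(*east)                        # x_NE + x_E + x_SE
        z7, z8 = full_adder(z2, z4, z6)
        # Neighbour count = 2(z1 + z3 + z5 + z7) + z8; S_1 means "exactly one of them".
        ones = twos = 0
        for z in (z1, z3, z5, z7):
            twos |= ones & z
            ones |= z
        new_rows.append(ones & ~twos & (x | z8) & mask)
    return new_rows
```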
(b) Given x = Xj(t)1, x = Xj(t), and x+ = Xj(+1 t) , ompute a x & x+ (= z3 ), Guy
b x  x (= z4 ), x  b, d  1 (= z6 ),  1 (= z2 ), e  d, & d,
+ CONWAY
f b & e, f f j (= z7 ), e b  e (= z8 ), x & b, j a, b  1 (= z5 ), lean
 1 (= z1 ), d b & , b j , b a & f , f a j f , f d j f , b j ,
f f  (= S1 (z1 ; z3 ; z5 ; z7 )), e e j x, f f & e.
[For excellent summaries of the joys and passions of Life, including a proof that
any Turing machine can be simulated, see Martin Gardner, Wheels, Life and Other
Mathematical Amusements (1983), Chapters 20–22; E. R. Berlekamp, J. H. Conway,
and R. K. Guy, Winning Ways 4 (A. K. Peters, 2004), Chapter 25.]
At last I've got what I wanted – an apparently unpredictable law of genetics.
. . . Overpopulation, like underpopulation, tends to kill.
A healthy society is neither too dense nor too sparse.
JOHN H. CONWAY, letter to Martin Gardner (March 1970)
168. The following algorithm, which uses four n-bit registers x⁻, x, x⁺, and y, works
properly even when M = 1 or N = 1. It needs only about two reads and two writes
per raster word to transform X^(t) to X^(t+1) in (158):
C1. [Loop on k.] Do step C2 for k = 1, 2, …, N′; then go to C5.
C2. [Loop on j.] Set x ← A_{(M−1)k}, x⁺ ← A_{0k}, and A_{Mk} ← x⁺. Then perform
steps C3 and C4 for j = 0, 1, …, M − 1.


C3. [Move down.℄ Set x x, x x+ , and x+ A(j +1)k . (Now x = Ajk , and
x holds the former value of A(j 1)k .) Compute the bitwise fun tion values
y f (x  1; x ; x  1; x  1; x; x  1; x+  1; x+ ; x+  1).
C4. [Update Ajk .℄ Set x Aj (k 1) & 2, y y & (2n 1 1), Aj (k 1)
x + (y  (n 2)), Ajk y + (x  (n 2)).
C5. [Wrap around.℄ For 0  j < M , set x AjN 0 & 2n 1 d , AjN 0 x+
(Aj1  d), and Aj1 Aj1 + (x  d), where d = 1+ (N 1) mod (n 2).
[An M × N torus is equivalent to an (M − 1) × (N − 1) array surrounded by zeros,
in many cases like (157) and (159) and even (161). For exercise 173 we can clean an
(M − 2) × (N − 2) array that is bordered by two rows and columns of zeros. But Life
images (exercise 167) can grow without bound; they can't safely be confined to a torus.]
169. It quickly morphs into a rabbit, which proceeds to explode. Beginning at time
278, all activity stabilizes to a two-cycle formed from a set of traffic lights and three
additional blinkers, together with three still lifes (tub, boat, and beehive).
170. If M ≥ 2 and N ≥ 2, the first step blanks out the top row and the rightmost
column. Then if M ≥ 3 and N ≥ 3, the next step blanks out the bottom row and the
leftmost column. So in general we're left after t = min(M, N) − 1 steps with a single
row or column of black pixels: The first ⌈t/2⌉ rows, the last ⌈t/2⌉ columns, the last
⌊t/2⌋ rows, and the first ⌊t/2⌋ columns have been set to zero. The automaton will stop
after making two more (nonproductive) cycles.
171.Without (160): x1 xSE & xN , x2 xN & xSE , x3 xE & x1 , x4 xNE & x2 , dual
x5 x3 j x4 , x6 xW & x5 , x7 x1 & xNE , x8 x7 & xNW , x9 xE j xSW , Fu hs
x10 x8 & x9 , x11 x10 j x6 , x12 xS & x11 , x13 x2 & xE , x14 x13 & xW , Knuth
x15 xN & xNE , x16 xSW & xW , x17 x15 j x16 , x18 xNE & xSW , x19 x17 & x18 , Malgouyres
x20 xE j xSE , x21 x20 j xS , x22 xNW & x21 , x23 x22 & x19 , x24 x12 j x14 ,
g x23 j x24 . With (160), set x4 xNE & xN and leave everything else the same.
172. The statement isn't quite true; consider the following examples:
[Figure: example bitmaps before and after thinning]
The ‘I’ and ‘H’ at the left show that pixels are sometimes left intact where paths join,
and that rotating by 90° can make a difference. The next two examples illustrate
a quirky influence of left-right reflection. The diamond example demonstrates that
very thick images can be unthinnable; none of its black pixels can be removed without
changing the number of holes. The final examples, one of which was inspired by the
answer to exercise 166, were processed first without (160), in which case they are
unchanged by the transformation. But with (160) they're thinned dramatically.
173. (a) If X and Y are losed, X & Y is losed; if X and Y are open, X j Y is
open. The hinted statement follows. Furthermore X DD = X D , be ause X D is losed;
similarly X = X . (In fa t we have X = (XDL)D , be ause
LL L L the de nitions are dual,
obtained byL swapping bla k with white.) Now X  X D , so X DLD  X DD = X D .
Dually, X  X LDL . We on lude that there's no reason to launder a lean pi ture:
X DLDL = (X DLD )L  X DL  (X D )LDL = X DLDL .
(b) We have X^D = (X | X_W | X_NW | X_N) & (X | X_N | X_NE | X_E) & (X | X_E | X_SE | X_S) &
(X | X_S | X_SW | X_W). Furthermore, in analogy with answer 167(b), this function can be
omputed from x , x, and x+ in ten broadword steps: f x j (x  1) j ((x j (x  1))&
(x+ j (x+  1))), f f & (f  1). [This answer in orporates ideas of D. R. Fu hs.℄
To get X^L, just interchange | and &. [For further discussion, see C. Van Wyk
and D. E. Knuth, Report STAN-CS-79-707 (Stanford Univ., 1979), 15–36.]
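The formula in (b), its dual, and the conclusion of (a) can be checked on random bitmaps. This sketch makes three assumptions:
- a bitmap is a set of (row, column) pairs;
- pixels outside the R × C array are white;
- X_W, X_NW, … mean "the pixel in that direction is black".

`darken` and `lighten` are illustrative names for X^D and X^L.

```python
DIRS = {"N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
        "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1)}
QUADS = [("W", "NW", "N"), ("N", "NE", "E"), ("E", "SE", "S"), ("S", "SW", "W")]


def _black(X, p, d):
    """Is the neighbour of pixel p in direction d black?"""
    di, dj = DIRS[d]
    return (p[0] + di, p[1] + dj) in X


def darken(X, R, C):
    """X^D: black if already black, or if each of the four quadrants has a black pixel."""
    return {(i, j) for i in range(R) for j in range(C)
            if (i, j) in X or all(any(_black(X, (i, j), d) for d in q) for q in QUADS)}


def lighten(X, R, C):
    """X^L (| and & interchanged): black pixels having some all-black quadrant."""
    return {(i, j) for i in range(R) for j in range(C)
            if (i, j) in X and any(all(_black(X, (i, j), d) for d in q) for q in QUADS)}
```

With these definitions, the test below confirms on random 8 × 9 bitmaps that X ⊆ X^D = X^DD, that X^LL = X^L ⊆ X, and that X^DLDL = X^DL.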
174. Three-dimensional digital topology has been studied by R. Malgouyres, Theoretical
Computer Science 186 (1997), 1–41.


175. There are 25 in the outline, 2 + 3 in the eyes, 1 + 1 in the ears, 4 in the nose, and
1 in the smile, totalling 37. (All white pixels are connected kingwise to the background.)
176. (a) If v isn't isolated, there are eight easy ases to onsider, depending on what
kind of neighbor v has in G.0 0
(b) There's a vertex w 2 G adja ent to ea h vertex of0 N0u [ Nv . (Four ases.)
( ) Yes. In0 fa t, by de nition (161), we always have jS (v )j  2.
(d) Let Nv0 = fv j v0 2 Nv g. If v0 is the east neighbor of u0 , all it u0E , either
u0 2 G or u0S 2 G; this element is adja ent to every vertex of Nu0 0 [ Nv0 0 . A similar
argument
0
applies
0
when v0 = u0N0. If v0 = u0NE0 , there's no problem
0
if u0 2 G. Otherwise
0
uW 2 G, uS 2 G, and either uN 2 G or uE 2 G; hen e Nu0 [ Nv0 is onne ted in G.
Finally if v0 = u0NE , the proof is easy if u0S 2 G; otherwise u0 2 G0 and v0 2 G. 0
(e) Given a nontrivial omponent C of G, with v 2 C and v 2 S (v), let C be the
omponent of G0 that 0 ontains v0 . This omponent C 0 is well de ned, by (a) and (b).
Given a omponent C of G0 , with v0 2 C 0 and v 2 S 0 (v0 ), let C be the omponent of
G that ontains v. This omponent C is nontrivial and well de ned, by ( ) and (d).
Finally, the orresponden e C $ C 0 is one-to-one.
177. Now the vertices of G are the white pixels, adjacent when they are rook-neighbors.
So we define N_(i,j) = {(i, j), (i − 1, j), (i, j + 1)}. Arguments like those of answer 176,
but simpler, establish a one-to-one correspondence between the nontrivial components
of G and the components of G′.
178. Observe that in adjacent rows of X, two pixels of the same value are kingwise
neighbors only if they are rookwise connected.
179. The pixels of each row x_1 … x_N can be “runlength encoded” as a sequence of
integers 0 = θ_0 < θ_1 < ⋯ < θ_{2m+1} = N + 2 so that x_j = 0 for j ∈ [θ_0 . . θ_1) ∪ [θ_2 . . θ_3) ∪
⋯ ∪ [θ_{2m} . . θ_{2m+1}) and x_j = 1 for j ∈ [θ_1 . . θ_2) ∪ ⋯ ∪ [θ_{2m−1} . . θ_{2m}). (The number of
runs per row tends to be reasonably small in most images. Notice that the background
condition x_0 = x_{N+1} = 0 is implicitly assumed.)
The algorithm below uses a modified encoding with a_j = 2θ_j − (j mod 2) for
0 ≤ j ≤ 2m + 1. For example, the second row of the Cheshire cat has (θ_1, θ_2, θ_3, θ_4, θ_5) =
(5, 8, 23, 25, 32); we will use (a_1, a_2, a_3, a_4, a_5) = (9, 16, 45, 50, 63) instead. The reason is
that white runs of adjacent rows are rookwise adjacent if and only if the corresponding
intervals [a_j . . a_{j+1}) and [b_k . . b_{k+1}) overlap, and exactly the same condition characterizes
when black runs of adjacent rows are kingwise adjacent. Thus the modified
encoding nicely unifies both cases (see exercise 178).
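The modified encoding is easy to compute, and its overlap property can be tested against brute force. In this sketch θ_j is computed from the places where the padded row x_0 x_1 … x_{N+1} changes color. `runlength`, `modified`, and the test helper are illustrative names.

```python
def runlength(row):
    """theta_0 < ... < theta_{2m+1} for row = [x_1, ..., x_N], with x_0 = x_{N+1} = 0."""
    x = [0] + list(row) + [0]
    theta = [0] + [j for j in range(1, len(x)) if x[j] != x[j - 1]]
    return theta + [len(x)]                   # theta_{2m+1} = N + 2


def modified(theta):
    """a_j = 2 theta_j - (j mod 2)."""
    return [2 * t - (j % 2) for j, t in enumerate(theta)]
```

For a row of length 30 with black pixels in columns 5–7 and 23–24, `runlength` gives 0, 5, 8, 23, 25, 32. `modified` then gives 0, 9, 16, 45, 50, 63, matching the Cheshire-cat example.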
We construct a triply linked tree of components, where each node has several
fields: CHILD, SIB, and PARENT (tree links); DORMANT (a circular list of all children that
aren't connected to the current row); HEIR (a node that has absorbed this one); ROW and
COL (location of the first pixel); and AREA (the total number of pixels in the component).
The algorithm traverses the tree in double order (see exercise 2.3.1–18), using
pairs of pointers (P, P′), where P′ = P when P is traversed the first time, P′ = PARENT(P)
when P is traversed the second time. The successor of (P, P′) is (Q, Q′) = next(P, P′),
determined as follows: If P = P′, then Q ← Q′ ← CHILD(P) when CHILD(P) ≠ Λ,
otherwise Q ← P and Q′ ← PARENT(Q). If P ≠ P′, then Q ← Q′ ← SIB(P) when SIB(P) ≠ Λ,
otherwise Q ← PARENT(P) and Q′ ← PARENT(Q).
When there are m black runs, the tree will have m + 1 nodes, not counting nodes
that are dormant or have been absorbed. Moreover, the primed pointers P′_1, …, P′_{2m+1}
of the double traversal (P_1, P′_1), …, (P_{2m+1}, P′_{2m+1}) are precisely the components of
the current row, in left-to-right order. For example, in (163) we have m = 5; and
(P′_1, …, P′_11) point respectively to 0, B, 1, B, 0, C, 0, A, 2, A, 0.
I1. [Initialize.] Set t ← 1, ROOT ← LOC(NODE(0)), CHILD(ROOT) ← SIB(ROOT) ←
PARENT(ROOT) ← DORMANT(ROOT) ← HEIR(ROOT) ← Λ; also ROW(ROOT) ←
COL(ROOT) ← 0, AREA(ROOT) ← N + 2, s ← 0, a_0 ← b_0 ← 0, a_1 ← 2N + 3.
I2. [Input a new row.] Terminate if s > M. Otherwise set b_k ← a_k for k = 1, 2,
…, until b_k = 2N + 3; then set b_{k+1} ← b_k as a “stopper.” Set s ← s + 1. If s >
M, set a_1 ← 2N + 3; otherwise let a_1, …, a_{2m+1} be the modified runlength
encoding of row s as discussed above. (This encoding can be obtained with
the help of the ρ function; see (43).) Set j ← k ← 1 and P ← P′ ← ROOT.
I3. [Gobble up short b's.℄ If bk+1  aj , go to I9. Otherwise set (Q; Q )
0
0 0 0
next(P; P ), (R; R ) 0 next(Q; Q0 ), and do a four-way bran h to (I4; I5; I6; I7)
a ording as 2[ Q 6= Q ℄ + [ R 6= R ℄ = (0; 1; 2; 3).
0 0 0
I4. [Case 0.℄ (Now Q = Q is a hild of P , and R = R is the rst hild of Q . Node Q
0
0
will remain a hild of P , but it will be pre eded by any hildren of R.) Absorb
R into P0 (see below). Set CHILD(Q) SIB(R) and Q0 CHILD(R). If Q0 6= ,
set R Q0 , and while R 6=  set PARENT(R) P0 , R0 R, R SIB(R); then Lutz
SIB(R) Q Q , Q0 . Set CHILD(P) Q if P = P0 , SIB(P) Q if P 6= P0 . Go to I8.
I5. [Case 1.℄ (Now omponent Q = R is surrounded by P = R .) If P = P , set
0 0 0
CHILD(P) SIB(Q); otherwise set SIB(P) SIB(Q). Set R DORMANT(R0 ).
0
Then if R = , set DORMANT(R ) SIB(Q) Q; otherwise SIB(Q) SIB(R)
and SIB(R) Q. Go to I8.
0 0
I6. [Case 2.℄ (Now Q is the parent of both P and R. Either P = P is hildless, or
0
P is the last hild of P .) Absorb R into P (see below). Set SIB(P0 )
0 0 SIB(R)
and R CHILD(R). If P = P0 , set CHILD(P) R; otherwise SIB(P) R.
While R 6= , set PARENT(R) P0 and R SIB(R). Go to I8.
0 0
I7. [Case 3.℄ (Node P = Q is the last hild of Q = R, whi h is a hild of R .)
0
0 0 0
Absorb P into R (see below). If P = P , set P R0. Otherwise set P 0
CHILD(P0 ), and while P0 6=  set PARENT(P0 ) R , P0 SIB(P0 ); also
set SIB(P) SIB(Q0 ) and SIB(Q0 ) CHILD(Q). If Q = CHILD(R), set
CHILD(R) . Otherwise set R CHILD(R) , then R SIB(R) until
SIB(R) = Q, then SIB(R) . Finally set P0 R0 .
I8. [Advance k.] Set k ← k + 2 and return to step I3.
I9. [Update the area.] Set AREA(P′) ← AREA(P′) + ⌈a_j/2⌉ − ⌈a_{j−1}/2⌉. Then go
back to I2 if a_j = 2N + 3.
I10. [Gobble up short a.℄ If aj +1  bk , go to I11. Otherwise set Q LOC(NODE(t))
and t t + 1. Set PARENT(Q) P0 , DORMANT(Q) HEIR(Q) ;0 also
ROW(Q) s, COL(Q) daj =2e, AREA(Q) daj +1 =2e daj =2e. If P = P , set
SIB(Q) CHILD(P) and CHILD(P) Q; otherwise set SIB(Q) SIB(P) and
SIB(P) Q. Finally set P Q, j j + 2, and return to I3.
I11. [Move on.] Set j ← j + 1, k ← k + 1, (P, P′) ← next(P, P′), and go to I3.
To “absorb P into Q” means to do the following things: If (ROW(P), COL(P)) is less
than (ROW(Q), COL(Q)), set (ROW(Q), COL(Q)) ← (ROW(P), COL(P)). Set AREA(Q) ←
AREA(P) + AREA(Q). If DORMANT(Q) = Λ, set DORMANT(Q) ← DORMANT(P); otherwise if
DORMANT(P) ≠ Λ, swap SIB(DORMANT(P)) ↔ SIB(DORMANT(Q)). Finally, set HEIR(P) ←
Q. (The HEIR links could be used on a second pass to identify the final component of
each pixel. Notice that the PARENT links of dormant nodes are not kept up to date.)
[A similar algorithm was given by R. K. Lutz in Comp. J. 23 (1980), 262–269.]
180. Let F(x, y) = x² − y² + 13 and Q(x, y) = F(x − ½, y − ½) = x² − y² − x + y + 13.
Apply Algorithm T to digitize the hyperbola from (ξ, η) = (−6, 7) to (ξ′, η′) = (0, √13);
hence x = −6, y = 7, x′ = 0, y′ = 4. The resulting edges are (−6, 7) – (−5, 7) –
(−5, 6) – (−4, 6) – (−4, 5) – (−3, 5) – (−3, 4) – ⋯ – (0, 4). Then apply it again
with ξ = 0, η = √13, ξ′ = 6, η′ = 7, x = 0, y = 4, x′ = 6, y′ = 7; the same edges are
found (in reverse order), but with negated x coordinates.
181. Subdivide at points (;  ) where Fx (;  ) = 0 or Fy (;  ) = 0, namely at the real
roots of 1fQ( (b + d)=(2a);  + 12 ) = 0;  = (b + d1)=(2a) 21 g or the real roots of
fQ( + 2 ; (b + e)=(2 )) = 0;  = (b + e)=(2 ) 2 g, if they exist.
182. By induction on |x′ − x| + |y′ − y|. Consider, for example, the case x′ < x
and y′ > y. We know from (iii) that (ξ, η) lies in the box x − ½ ≤ ξ < x + ½ and
y − ½ ≤ η < y + ½, and from (ii) that the curve travels monotonically as it moves from
(ξ, η) to (ξ′, η′). It must therefore exit the box at the edge (x − ½, y − ½) – (x − ½, y + ½)
or (x − ½, y + ½) – (x + ½, y + ½). The latter holds if and only if F(x − ½, y + ½) < 0,
because the curve can't intersect that edge twice when x′ < x. And F(x − ½, y + ½) is
the value Q(x, y + 1) that is tested in step T3, because of the initialization in step T1.
(We assume that the curve doesn't go exactly through (x − ½, y + ½), by implicitly
adding a tiny positive amount to the function F behind the scenes.)
183. Consider, for example, the ellipse de ned by F (x 2 ) = Q(x; y ) = 13x +
1;y 1 2 urvature
2 Sto kton
7xy + y 2 = 0; this ellipse is a igar-shaped urve that extends roughly between
2
Bresenham
( 2; 5) and (1; 6). Suppose we want to digitize its upper right boundary. Hypotheses
(i){(iv) of Algorithm T hold with
r r r r
 = 8 1;  = 98 1 ; 0 = 98 1 ; 0 = 104 1 ;
3 2 3 2 39 2 3 2
0 0
x = 1, y = 6, x = 2, y = 5. Step T1 sets Q Q(1; 5) = 1, whi h auses step T4
to move left (L); in fa t,3 the4 resulting path is L3 U11 , while the orre t digitization
a ording to (164) is U LU LU3 LU. Failure o urredp be ause Q(x; y) = 0 has two
roots on the edge (1; 5) (2; 5), namely ((35  29)=26; 5), ausing Q(1; 5)
to have the same sign as Q(2; 5). (One of those roots is on the boundary we are not
trying to draw, but it's still there.) Similar failure o urs with the parabola de ned
by Q(x; y) = 9x0 2 + 6xy +0 y2 y = 0,  = 5=12,  = 1=4, 0 = 2 5=2, 0 = 2 19=2,
x = 0, y = 0, x = 2, y = 9. Hyperbolas an fail too ( onsider 6x + 5xy + y = 1).
Algorithms for dis rete geometry are notoriously deli ate; unusual ases tend to
drive them berserk. Algorithm T works properly for portions of any ellipse or parabola
whose maximum urvature is less than 2. The maximum urvature of an ellipse with
semiaxes  is = 2 ; the igar-shaped example has maximum urvature  42:5.
The maximum urvature of the parabola y = x2 is =2; the anomalous parabola above
has maximum urvature  5:27. \Reasonable" oni s don't make su h sharp turns.
To make Algorithm T work orre tly without hypothesis (v), we need to slow it
down a bit, by hanging the tests `Q < 0' to `Q < 0 or X', where X is a test on the
sign of a derivative. Namely, X is respe tively `S > ', `R > a', `R < a', `S < ', in
steps T2, T3, T4, T5.
0
184. Let Q (x; y ) = 1 Q(x; y). The key point is that Q(x; y) < 0 if and only if
0
Q (x; y)  0. (Curiously the algorithm makes the same de isions, ba kwards, although
it probes the values of Q0 and Q in di erent pla es.)
185. Find a positive integer h so that d = ( 0 )h and e = ( 0  )h are integers and
d + e is even. Then arry out Algorithm T with x = b + 21 , y = b + 21 , x0 = b 0 + 12 ,
y0 = b0 + 21 , and Q(x; y) = d(x 21 ) + e(y 12 ) + f , where
f = b(0   0 )h [ d > 0 and (0   0 )h is an integer℄:
(The `d > 0' term ensures that the opposite straight line, from (0 ; 0 ) ba k to (; ), will
have pre isely the same edges; see exer ise 183.) Steps T1 and T6{T9 be ome mu h
simpler than they were in the general ase, be ause R = d and S = e are onstant.
(F. G. Sto kton [ CACM 6 (1963), 161, 450℄ and J. E. Bresenham [ IBM Systems

Journal 4 (1965), 25{30℄ gave similar algorithms, but with diagonal edges permitted.)
186. (a) B () = z0 + 2(z1 z0 ) + O(2 ); B (1 ) = z2 2(z2 z1 ) + O(2 ).
(b) Every point of S (z0 ; z1 ; z2 ) is2 a onvex ombination of z0 , z1 , and z2 .
( ) Obviously true, sin e (1 t) + 2(1 t)t + t2 = 1.
(d) The ollinear ondition follows from (b). Otherwise, by ( ), we need only
onsider the ase z0 = 0 and z2 2z1 = 1, where z1 = x1 + iy1 and y1 6= 0. In that
ase all points lie on the parabola 42 x = (y=y1)2 + 4yx1 =y1.
(e) Note that B(u) = (1 u) z +2u(1 u)((1 )z0 + z1 )+ u2 B() for 0  u  1.
7.1.3 ANSWERS TO EXERCISES 103
[S. N. Bernshten introdu ed Bn (z0 ; z1 ; : : : ; zn ; t) = P n
(1 t)n k tk zk in Bernshten
k k

Soobsh henia Khar'kovskoe matemati heskoe obsh hestvo (2) 13 (1912), 1{2.℄ xed-point
Kaasila
187. We an assume that z0 = (x0 ; y0 ), z1 = (x1 ; y1 ), and z2 = (x2 ; y2 ), where the squines
oordinates are (say) xed-point numbers represented as 16-bit integers divided by 32. TrueType
Bezier
If z0 , z1 , and z2 are ollinear, use the method of exer ise 185 to draw a straight Knuth
line from z0 to z2 . (If z1 doesn't lie between z0 and z2 , the other edges will an el out, METAFONT
six-register algorithm
be ause edges are impli itly XORed by a lling algorithm.) This ase o urs if and Hobby
only if D = x0 y1 + x1 y2 + x2 y0 x1 y0 x2 y1 x0 y2 = 0. Pratt
oni splines
Otherwise the points (x; y) of S (z0 ; z1 ; z2 ) satisfy F (x; y) = 0, where ellipti al
hyperboli
F (x; y) = ((x x0 )(y2 2y1 + y0 ) (y y0 )(x2 2x1 + x0 ))2 magi
4D((x1 x0 )(y y0 ) (y1 y0 )(x x0 )) MOR
MUX
Guibas
and D is de ned above. We multiply by 324 to obtain integer oeÆ ients; then negate Stol
this formula and subtra t 1, if D < 0, to satisfy ondition (iv) of Algorithm T and the lower bounds
redundant representation
reverse-order ondition. (See exer ise 184.) big-endian
The monotoni ity ondition (ii) holds if and only if (x1 x0 )(x2 x1 ) > 0 and
(y1 y0 )(y2 y1 ) > 0. If ne essary, we an use the re urren e of exer ise 186(e)
to break S (z0 ; z1 ; z2 ) into at most three monotoni subsquines; for example, setting
 = (x0 x1 )=(x0 2x1 + x2 ) will a hieve monotoni ity in x. (A slight rounding error
may o ur during this xed point arithmeti , but the re urren e an be performed in
su h a way that the subsquines are de nitely monotoni .)
Notes: When z0 , z1 , and z2 are near ea h other, a simpler and faster method based
on exer ise 186(e) with  = 12 is adequate for most pra ti al purposes, if one doesn't
are about making the exa tly orre t hoi e between lo al edge sequen es like \up-
then-left" versus \left-then-up." In the late 1980s, Sampo Kaasila hose to use squines
as the basi method of shape spe i ation in the TrueType font format, be ause they
an be digitized so rapidly. The hijklmnj system a hieves greater exibility with
ubi Bezier splines [see D. E. Knuth, 89:;<=>: : The Program (Addison{Wesley,
1986)℄, but at the ost of extra pro essing time. A fairly fast \six-register algorithm"
for the resulting ubi urves was, however, developed subsequently by John Hobby
[ACM Trans. on Graphi s 9 (1990), 262{277℄. Vaughan Pratt introdu ed oni splines,
whi h are sort of midway between squines and Bezier ubi s, in Computer Graphi s

9, 3 (July 1985), 151{159. Coni spline segments an be ellipti al and hyperboli as


well as paraboli , hen e they require fewer intermediate points and ontrol points than
squines; furthermore, they an be handled by Algorithm T.
188. If the rows of the bitmap are (X0 ; X1 ; : : : ; X63 ), do the following operations for
k = 0, 1, : : : , 5: For all i su h that 0  i < 64 and i & 2k = 0, let j = i + 2k and either
(a) set t (Xi  (Xj  2k )) & 6;k , Xi kXi  t, Xj Xj  (t  2kk); or (b) set
t Xi & 6;k , u Xj & 6;k , Xi ((Xi  2 )& 6;k ) j u, Xj ((Xj  2 )& 6;k ) j t.
[The basi idea is to transform 2k  2k submatri es for in reasing k, as in exer ise
5{12. Speedups are possible with MMIX, using MOR and MUX as in exer ise 208, and using
LDTU/STTU when k = 5. See L. J. Guibas and J. Stol , ACM Transa tions on Graphi s

1 (1982), 204{207. In identally, Theorem P and answer 54 show that


(n log n)
operations on n-bit numbers are needed to transpose an nn bit matrix. An appli ation
that needs frequent transpositions might therefore be better o using a redundant
representation, maintaining its matri es in both normal and transposed form.℄
189. The following big-endian program assumes that n  74880.
104 ANSWERS TO EXERCISES 7.1.3
LOC Data Segment LDO k,Initk Hunt
BITMAP LOC +M*N/8 0H SET s,N/64 DVIPAGE
base GREG  1H SET a,h A tri k (see below) Knuth
tri k
GRAYMAP LOC +M*N/64 SET r,8 MOR
GTAB BYTE 255,252,249,246,243 2H LDOU t,base,k pigeonhole prin iple
BYTE 240,236,233,230,227 MOR u, 1,t
BYTE 224,221,217,214,211 SUBU t,t,u (Nypwise sums)
BYTE 208,204,201,198,194 MOR u, 2,t
BYTE 191,188,184,181,178 AND t,t,mu1
BYTE 174,171,167,164,160 ADDU t,t,u (Nybblewise sums)
BYTE 157,153,150,146,142 MOR u, 3,t
BYTE 139,135,131,128,124 AND t,t,mu2
BYTE 120,116,112,108,104 ADDU t,t,u (Bytewise sums)
BYTE 100,96,92,88,84 ADDU a,a,t
BYTE 79,75,70,66,61 INCL k,N/8 Move to next row.
BYTE 56,52,46,41,36 SUB r,r,1
BYTE 30,24,18,10,0 PBNZ r,2B Repeat 8 times.
Initk OCTA BITMAP-GRAYMAP 3H SRU t,a,56
orr GREG N-8 LDBU t,gtab,t
1 GREG #4000100004000100 SLU a,a,8
2 GREG #2010000002010000 STBU t,z,0
3 GREG #0804020100000000 INCL z,1
mu1 GREG #3333333333333333 PBN a,3B (The tri k)
mu2 GREG #0f0f0f0f0f0f0f0f SUB k,k, orr
h GREG #8080808080808080 SUB s,s,1
gtab GREG GTAB-#80 PBNZ s,1B Loop on olumns.
LOC #100 INCL k,7*N/8 Loop on groups
MakeGray LDA z,GRAYMAP PBN k,0B of 8 rows.
[Inspired by Neil Hunt's DVIPAGE, the author used su h graymaps extensively
when preparing new editions of The Art of Computer Programming in 1992{1998.℄
190. (a) We must have j +1 = f ( j )  j 1 for j  1, where 0 = 0 : : : 0 and
f ( ) = ((  1) & 1 : : : 1)   (  1). The elements of the bottom row m satisfy
the parity ondition if and only if this rule makes m+1 entirely zero.
(b) True. The parity ondition on matrix entries aij is aij = a(i 1)j  ai(j 1) 
ai(j +1)  a(i+1)j , where aij = 0 if i = 0 or i = m + 1 or j = 0 or j = n + 1. If two
matri es (aij ) and (bij ) satisfy this ondition, so does ( ij ) when ij = aij  bij .
( ) The upper left submatrix onsisting of all rows that pre ede the rst all-zero
row (if any) and all olumns that pre ede the rst all-zero olumn (if any) is perfe t.
And this submatrix determines the entire matrix, be ause the pattern on the other side
of a row or olumn of zeros is the top/bottom or left/right re e tion of its0 neighbor.
For example, if m0 +1 is zero, then m0 +1+j = m0 +1 j for 1  j < m m .
(d) Starting with a given ve tor 1 and using the rule in (a) will always lead to
a row with2n m+1 = 0 : : : 0. Proof: We must have ( j ; j+1 ) = ( k ; k+1 ) for some 0 
j < k  2 , by the pigeonhole prin iple. If j > 0 we also have ( j 1 ; j ) = ( k 1 ; k ),
be ause j 1 = f ( j )  j+1 = f ( k )  k+1 = k 1 . Therefore the rst repeated
pair begins with a row k of zeros. Furthermore we have i = k i for 0  i  k;
hen e the rst all-zero row m+1 o urs when m is k 1 or k=2 1.
Rows 1 , : : : , m will form a perfe t pattern unless there is a olumn of 0s. There
are tR> 0 su h olumns if andRonly if tR+ 1 is a divisor of n + 1 and 1 has the form
0 0 : : : 0 (t even) or 0 0 : : : 0 (t odd), where j j + 1 = (n + 1)=(t + 1).
7.1.3 ANSWERS TO EXERCISES 105
(e) This starting ve tor does not have the form forbidden in (d). ontinuant polynomial
191. (a) The former is 1 , 2 , : : : if and only if the latter is 0 1 0 1 , 0 2 0 2 , : : : .
R R Chebyshev polynomial
nite eld
(b) Let the binary string a0 a1 : : : aN 1 orrespond to the polynomial a0 + a1 x + Lights Out
   + aN 1 xN 1 , and let y = x 1 +1+x. Then 0 = 0 : : : 0 orresponds to F0 (y); Uri
1 = 10 : : : 0 orresponds to F1 (y); and by indu tion j orresponds to Fj (y), mod
xN + 1 and mod 2. For example, when N = 6 we have 2 = 110001 $ 1 + x + x5
be ause x 1 mod (x6 + 1) = x5 , et .
( ) Again, indu tion on j .
(d) The identity in the hint holds by indu tion on m, be ause it is learly true
when m = 1 and m = 2. Working mod 2, this identity yields the simple equations
F2k (y) = yFk (y)2 ; F2k 1 (y) = (Fk 1 (y) + Fk (y))2 :
So we an go from the pair Pk = (Fk 1 (y) mod (x2N+1); Fk (y) mod (xN+1)) to the pair
Pk+1 in O(n) steps, and to the pair P2k in O(n ) steps. We an therefore ompute
Fj (y) mod (xN + 1) after O(log j ) iterations. Multiplying by f (x) + f (x 1 ) and
redu ing mod xN + 1 then allows us to read o the value of j .
In identally, Fn+1 (x) is the spe ial ase Kn (x; x; : : : ; x) of a ontinuant polyno-
mial; see Eq. 4.5.3{(4). We have Fn+1 (x) = Pnk=0 n k kxn 2k = i n Un (ix=2), where
Un is the lassi al Chebyshev polynomial de ned by Un ( os ) = sin((n + 1))=sin .
192. (a) By exer ise 191( ), (q ) is the least j > 0 su h that (x + x )Fj (x +1+x)  0
1 1

(modulo x2q + 1), using polynomial arithmeti mod 2. Equivalently, it's the smallest
positive j for1 whi h Fj (y) is a multiple of (x2q + 1)=(x2 + 1) = (1 + x +    + xq 1 )2 ,
when y = x +1+x.
(b) Use the method of exer ise 191(d) to evaluate ((x + x 1 )Fj (y)) mod (x2q +1)
when j = M=p, for all prime divisors p of M . If the result is zero, set M M=p and
repeat the pro ess. If no su h result eis zero, (q) = M . e 1
( ) We want to show that (2 ) is a divisor e
of 3  2 but not ofe+13  2e 2 or
2 . The latter holds be ause F2e 1(y) = y
e 1 2 1 1
is relatively prime to x2 +1. The
former holds be ause
e 1 1 e 1 e 1 1
F32e 1(y) = y2 F3 (y)2 = y2 (1 + y)2e = y2e 1 1 (x 1 +x)2e ;
e+1 e
whi h is  0 modulo P x2 + 1 but not modulo x2 + 1.
(d) F21e 1 (y) = ke=1 y2e 2k . Sin e y = x 1 (1+x+x2 ) is relatively prime to xq +1,
we have y  a0 + a1 x +    + aq 1 x (modulo xq +1) for some oeÆ ients ai ; hen e
q 1

k k k k+e k+e k+e


y 2  a0 + a1 x2 +    + aq 1 x2 (q 1)  a0 + a1 x2 +    + aq 1 x2 (q 1)  y 2
(modulo xq + 1) for 0  k < e, and it follows that F22e 1 (y) is a multiple of x2q + 1.
(e) In this ase (q) divides 4(22e 1). Proof: Let xq + 1 = f1(x) f2(x) : : : fr (x)
where f1 (x) = x + 1, f2 (x) = x2 + x + 1, and ea h fi (x) is irredu ible mod 2. Sin e
q is odd, these fa tors are distin t. Therefore, in the nite eld of polynomials mod
fj (x) for j  3, we have y 2k = y 2k+e as in (d). Consequently F22e 1 (y) is a multiple
of f3 (x) : : : fr (x) = (xq + 1)=(x3 + 1). So F2(22e 1) (y) = y3 F22e 1 (y)2 is a multiple of
(x2q + 1)=(x2 + 1) as desired.
(f) If F (q) (y) is a multiple of x2q +1,2 it's easy to see that (2q) = 2 (q). Otherwise
F3 (q) (y) is a multiple of F3 (y) = (1 + y) = x 2 (1 + x)4 ; hen e F6 (q) (y) is a multiple
of x4q + 1 and (2q) divides 6 (q). The latter ase an happen only when q is odd.
Notes: Parity patterns are related to a popular puzzle alled \Lights Out,"
whi h was invented in the early 1980s by Dario Uri, also invented independently about
106 ANSWERS TO EXERCISES 7.1.3
the same time by Laszlo Meero and alled . [See David Singmaster's Cubi Meero
Cir ular , issues 7&8 (Summer 1985), 39{42; Dieter Gebhardt, Cubism For Fun 69 XL25
Singmaster
(Mar h 2006), 23{25.℄ Klaus Sutner has pursued further aspe ts of this theory in Gebhardt
Theoreti al Computer S ien e 230 (2000), 49{73. Sutner
Knuth
193. Let b(2i)(2j ) = aij , b(2i+1)(2j ) = aij  a(i+1)j , b(2i)(2j +1) = aij  ai(j +1) , and mikado pattern
b(2i+1)(2j +1) = 0, for 0  i  m and 0  j  n, where we regard aij = 0 when i = 0 Eriksson
or i = m + 1 or j = 0 or j = n + 1. We don't have (b(2i)1 ; b(2i)2 ; : : : ; b(2i)(2n+1) ) = Eriksson
Sjostrand
(0; 0; : : : ; 0) be ause (ai1 ; : : : ; ain ) 6= (0; : : : ; 0) for 1  i  m. And we don't have Wolfram
(b(2i+1)1 ; b(2i+1)2 ; : : : ; b(2i+1)(2n+1) ) = (0; 0; : : : ; 0) be ause adja ent rows (ai1 ; : : : ; ain ) ve tor spa e
and (a(i+1)1 ; : : : ; a(i+1)n ) always di er for 0  i  m when m is odd.
194. Set i (1  (n i)) j (1  (i 1)) for 1  i  m, where m = dn=2e. Also set
i ( 1 & i1 ) + ( 2 & i2 ) +    + ( m & im ), where ij is the j th row of the parity
pattern that begins with i ; ve tor i re ords the diagonal elements of su h a matrix.
Then set r 0 and apply subroutine N of answer 195 for i 1, 2, : : : , m. The resulting
ve tors 1 , : : : , r are a basis for all n  n parity patterns with 8-fold symmetry.
To test if any su h pattern is perfe t, let the pattern starting with i rst be zero
in row i . If any i = n + 1, the answer is yes. If l m( 1 ; : : : ; r ) < n, the answer
is no. If neither rof these onditions de ides the matter, we an resort to brute-for e
examination of 2 1 nonzero linear ombinations of the  ve tors.
For example, when n = 9 we nd 1 = 111101111, 2 = 3 = 010101010, 4 =
000000000, 5 = 001010100; then r = 0, 1 = 011000110, 2 = 000101000, 1 = 2 = 5.
So there is no perfe t solution.
In the author's experiments for n  3000, \brute for e" was needed only when
n = 1709. Then r = 21 and the values of i were all equal to 171 or 855 ex ept that
21 = 342. The solution 1  21 was found immediately.
The answers for 1  n  383 are 4, 5, 11, 16, 23, 29, 30, 32, 47, 59, 62, 64, 65,
84, 95, 101, 119, 125, 126, 128, 131, 154, 164, 170, 185, 191, 203, 204, 239, 251, 254,
256, 257, 263, 314, 329, 340, 341, 371, 383.
[A fra tal similar to Fig. 20, alled the \mikado pattern," appears in a paper by
H. Eriksson, K. Eriksson, and J. Sjostrand, Advan es in Applied Math. 27 (2001), 365.
See also S. Wolfram, A New Kind of S ien e (2002), rule 150R on page 439.℄
195. Set i 1  (m i) and i i for 1  i  m; also set r 0. Then perform
the following subroutine for i = 1, 2, : : : , m:
N1. [Extra t low bit.℄ Set x i & i . If x = 0, go to N4.
N2. [Find j .℄ Find the smallest j  1 su h that j & x 6= 0 and j & (x 1) = 0.
N3. [Dependent?℄ If j < i, set i i  j , i i  j , and return to N1.
(These operations preserve the matrix equation C = BA.) Otherwise termi-
nate the subroutine (be ause i is linearly independent from 1 , : : : , i 1 ).
N4. [Re ord a solution.℄ Set r r + 1 and r i .
At the on lusion, the m r nonzero ve tors i are a basis for the ve tor spa e of all
linear ombinations of 1 , : : : , m ; they're hara terized by their low bits.
196. (a) 0a ; ea3 ; e7ae97 ; f09d8581 .
# # # #
(b) If x = x0 , the result# is lear# be ause l = l0 . Otherwise we(i) have 1 < 01 .
( ) Set j k; while j  80 < 40 , set j j 1. Then (x ) begins with j .
197. (a) 000a ; 03a3 ; 7b97 ; d834dd41 .
# # # #
(b) Lexi ographi order is not preserved when, say, x = # ffff and x0 = # 10000 .
7.1.3 ANSWERS TO EXERCISES 107
( ) To answer this question properly one needs to know that the 2048 integers surrogates
in the range # d800  x < # e000 are not(ilegal odepoints of UCS; they are alled Raynaud-Ri hard
Pournader
surrogates. With this understanding, (x ) ) begins at k if  # d 00  # 0400 , Kuhn
otherwise it begins at k 1 . surrogate
Warren
198. a =
# e50000 , b = 3, = # 16 . (We ould let b = 0, but then a would be swap
huge. This tri k was suggested by P. Raynaud-Ri hard in 1997. The stated onstants, MUX
Boolean matrix produ t
suggested by R. Pournader in 2008, are the smallest possible.)
199. We want 1 > 1 ; 2 1 + 2 < f490 ; and either ( 1 & 1 ) + 1 < 100 or
# 8 # #
1 + 2 > # 17f . These onditions hold if and only if
(# 1 1 )&(28 1 + 2 # f490 )& ((( 1 & 1 )+ 1 # 100 ) j (# 17f 1 2 )) < 0:
Markus Kuhn suggests adding the further lause ` & (# 20 ((28 1 + 2 )  # eda ))', to
ensure that 1 2 doesn't begin the en oding of a surrogate.
200. If $0 = (x7 : : : x1 x0 )256 then $3 = S2 (x7 ; x4 ; x2 ) = (x7 & x4 ) j (x7 & x2 ) j (x4 & x2 ).

201. MOR x, ,x, where = f0f0f0f00f0f0f0f .


#

202. MOR x,x, , where = 0 030300 0 0303 ; then MOR x,mone,x . (See answer 209.)
#

203. a = 0008000400020001 , b = 0f0f0f0f0f0f0f0f , = 0606060606060606 ,


# # #
d = # 0000002700000000 , e = # 2a2a2a2a2a2a2a2a . (The ASCII ode for 0 is 6 + # 2a ;
the ASCII ode for a is 6 + # 2a + 10 + # 27 .)
204. p =
# 8008400420021001 , q = # 8020080240100401 (the transpose of p), r =
# 4080102004080102 (a symmetri matrix), and m = # aa55aa55aa55aa55 .

205. Shue, but with p $ q , r = 0804020180402010 , m = f0f0f0f00f0f0f0f .


# #

206. Just hange p to 0880044002200110 . (In identally, these shues an also be


#
de ned as permutations on z = (z63 : : : z1 z0 )2 in another way: The outshue maps
zj 7! z(2j ) mod 63 while the inshue maps zj 7! z(2j +1) mod 65 .)
207. Do MOR y,p,x; MOR y,y,p; MOR t,y,q; PUT rM,m1; MUX y,y,t; MOR t,t,q;
PUT rM,m2; MUX y,y,t. In both ases p = # 2004801002400801 ; for triple-zip, q =
# 402010080402018 , m = # 4949494949494949 , m = # dbdbdbdbdbdbdbdb ; for the
1 2
inverse, q = # 0402018040201008 , m1 = # 07070707070707 , m2 = # 3f3f3f3f3f3f3f .
208. (Solution by H. S. Warren, Jr.) The text's 7-swap, 14-swap, 28-swap method an
be implemented with only 12 instru tions:
MOR t,x, 1; MOR t, 1,t; PUT rM,m1; MUX y,x,t;
MOR t,y, 2; MOR t, 2,t; PUT rM,m2; MUX y,y,t;
MOR t,y, 3; MOR t, 3,t; PUT rM,m3; MUX y,y,t;
here 1 = # 4080102004080102 , 2 = # 2010804002010804 , # 3 = # 0804020180402010 ,
m1 = aa55aa55aa55aa55 , m2 = 3333 3333 , m3 = f0f0f0f00f0f0f0f .
# #

209. Four instru tions suÆ e: MXOR y,p,x; MXOR x,mone,x; MXOR x,x,q; XOR x,x,y;
here p = # 80 0e0f0f8f feff , mone = 1, and q = p.
210. SLU x,one,x; MOR x,b,x; AND x,x,a; MOR x,x,#ff; here register one = 1.
W
211. In general, element ij of the Boolean matrix produ t AXB is fxkl j aik ^ blj g.
For this problem we hoose aik = [ i  k ℄ and b lj = [ l  j ℄; the answer is ` MOR t,f,a;
MOR t,b,t' where a = # 80 0a0f088 aaff and b = # ff5533110f050301 = aT .
108 ANSWERS TO EXERCISES 7.1.3
(Noti e that this tri k gives a simple test [ f = f^℄ for monotoni ity. Furthermore, multilinear representation
the 64-bit result (t63 : : : t1 t0 )2 gives the oeÆ ients of the multilinear representation MXOR
big-endian
f (x1 ; : : : ; x6 ) = (t63 + t62 x6 +    + t1 x1 x2 x3 x4 x5 + t0 x1 x2 x3 x4 x5 x6 ) mod 2; GULLIVER
SWIFT
if we substitute MXOR for MOR, by the result of exer ise 7.1.1{11.) Clift
MXOR
212. If  denotes MXOR as in (183) and b = ( 7 : : : 1 0 )256 has bytes j , we an evaluate identity matrix
SADD
= (a  B0L )  ((a  8)  (B1L +B0U ))  ((a  16)  (B2L +B1U ))    ((a  56)  (B7L +B6U )); NEG
2ADDU
where# BjU = (q j ) & m, BjL = (((q j )  8) + j ) & m, q = # 0080402010080402 , and table lookup by shifting
m = 7f3f1f0f07030100 . (Here q j denotes ordinary multipli ation of integers.)
213. In this big-endian omputation, register nn holds n, and register data points
to the o tabyte following the given bytes n 1 : : : 1 0 in memory (with n 1 rst).
The onstants aa = # 8381808080402010 and bb = # 339b f6530180 06 orrespond to
matri es A and B, found by omputing the remainders xk mod p(x) for 72  k < 80.
SET ,0 0. LDOU t,data,nn t next o ta.
LDOU t,data,nn t next o ta. XOR u,u, u u  .
ADD nn,nn,8 n n 8. SLU ,v,56 v  56.
BZ nn,2F Done if n = 0. SRU v,v,8 v v  8.
1H MXOR u,aa,t u t  A. XOR u,u,v u u  v.
MXOR v,bb,t v t  B. XOR t,t,u t t  u.
ADD nn,nn,8 n n 8. PBN nn,1B Repeat if n > 0.
A similar method nishes the job, with no auxiliary table needed:
2H SET nn,8 n 8. SRU v,v,8 v v  8.
3H AND x,t,ffooo x high 0byte. XOR t,t,v t t  v.
MXOR u,aaa,x u xA . SUB nn,nn,1 n n 1.
MXOR v,bbb,x v x  B0 . PBP nn,3B Repeat if n > 0.
SLU t,t,8 t t  8. XOR t,t, t t  .
XOR t,t,u t t  u. SRU r ,t,48 Return t  48.
Here aaa = # 8381808080808080 , bbb = # 0383 363331b0f05 , and ffooo = # ff00:::00 .
The Books of the Big-Endians have been long forbidden.
| LEMUEL GULLIVER, Travels Into Several Remote Nations of the World (1726)
214. By onsidering the irredu ible fa tors of the hara teristi polynomial of X ,
we must have X n = I where n = 23  32  5  7  17  31  127 = 168661080. Neill
Clift has shown that l(n 1) = 33 and found the following sequen e of 33 MXOR
instru tions to ompute Y = X 1 =6 X n 1: MXOR t,x,x; MXOR $1,t,x; MXOR $2,t,$1;
MXOR $3,$2,$2; MXOR t,$3,$3; S ; MXOR t,t,$2; S 3 ; MXOR $1,t,$1; MXOR t,$1,$3;
S 13 ; MXOR t,t,$1; S ; MXOR y,t,x; here S stands for `MXOR t,t,t'. To test if X is
nonsingular, do MXOR t,y,x and ompare t to the identity matrix # 8040201008040201 .
215. SADD $0,x,0; SADD $1,x,a; NEG $0,32,$0; 2ADDU $1,$1,$0; SLU $0,b,$1; then
BN $0,Yes; here a = # aaaaaaaaaaaaaaaa and b = # 2492492492492492 .
INDEX AND GLOSSARY
When an index entry refers to a page ontaining a relevant exer ise, see also the answer to
that exer ise for further information. An answer page is not indexed here unless it refers to a
topi not in luded in the statement of the exer ise.

0{1 matri es, 67{70, see also Bitmaps. Allou he, Jean-Paul, 78.
multipli ation of, 50{51, 56. Alpha hannels, 59.
transposing, 15, 56, 67, 69, 80. Alphabeti data, 20, 59.
triangularizing, 68. Analysis of algorithms, 55, 85.
0{1 prin iple, 54. An estors in a forest, 33.
1 (the onstant (    111)2 ), 3, 8, 9, nearest ommon, 33{35, 64.
50, 71, 76, 107. AND (bitwise onjun tion), 2{3.
2-adi hains, 23{27, 37, 61, 91, 96. Animating fun tions, 53, 56.
2-adi fra tions, 9, 75. Ar lists, 62.
2-adi integers: In nite binary strings Ariyoshi, Hiromu ( ), 92.
(: : : x2 x1 x0 )2 subje t to arithmeti and Arndt, Jorg Uwe, 76, 84.
bitwise operations, 2, 16, 21, 53, 55. Array storage allo ation, 16, 22, 54, 59.
as a metri spa e, 74.
2-bit en oding for 3-state data, 28{31, 63. ASCII: Ameri an Standard Code for
2- ube equivalen e, 29{30. Information Inter hange, iv, 59, 69, 118.
2-dimensional data allo ation, 16. Asso iative laws, 3, 72.
2ADDU (times 2 and add unsigned), Asterisk odes for sub ubes, 18, 63.
79, 84, 108. Averages, bytewise, 19, 59.
3-valued logi , 31, 63.
4-neighbors, see Rook-neighbors, 40. Ba kground of an image, 42.
4ADDU (times 4 and add unsigned), 79. Balan ed bran hing fun tions, 53.
8-neighbors, see King-neighbors, 40. Balan ed ternary notation, 63, 79.
8ADDU (times 8 and add unsigned), 79. Banyan networks, 81.
16ADDU (times 16 and add unsigned), 79. Basi RAM (random-a ess ma hine)
1 (in nity), 8, 55. model, 26, 62, 91.
Æ -maps, 84. Baumgart, Bru e Guenther, 12.
Æ -shifts, 16. Bays, John Carter, 77.
Æ -swaps, 13{15, 50, 55{56, 107.
BDIF (byte di eren e), 20, 86{87.
x (blg x ), see Binary logarithm.
Benes, Va lav Edvard, 13.
 (average memory a ess time), 118.
k and d;k , see Magi masks.
Bentley, Jon Louis, 95.
 x, see Sideways addition.
Berlekamp, Elwyn Ralph, 21, 73, 98.
x, see Ruler fun tion.
Bernshten, Serge Natanovi h (Bernxten,
Serge Natanoviq), 103.
 (instru tion y le time), 118.
BESM-6 (BESM-6) omputer, 83.
Beyer, Wendell Terry, 42.
Absorption laws, 3. Bezier, Pierre Etienne, splines, 48,
Abstra t RISC (redu ed-instru tion-set 66{67, 103.
omputer) model, 26. Big-endian onvention, 6{8, 12, 20, 77,
A kland, Bryan David, 44. 103{104, 108.
A y li digraph, 33. Binary basis, 71.
Addition, 3, 19.
bytewise, 19, 87. Binary logarithm (x = blg x ), 10{11,
modulo 5, 60. 21, 25, 33, 55, 60{61, 64.
s attered, 18, 57. Binary re urren e relations, 8, 10, 55.
sideways, 2, 11{12, 55, 62, 94. Binary sear h trees, 64, 79.
unary, 60. Binary tree stru tures, 32.
Adja en y lists, 62. Binary valuation, see Ruler fun tion.
Adja en y matri es of graphs, 28, 62. Binary- oded de imal notation, 60.
Adventure game, 85. Bipartite graphs, 14{15, 97.
Agrawal, Dharma Prakash (Dm þkAf Bit boards, 32, 63.
ag}vAl), 71. Bit odes for sub ubes, 18, 63.
Albers, Susanne, 88. Bit permutations, 13{17, 25, 50.
109
110 INDEX AND GLOSSARY

Bitmaps, 39{48, 64{68. Cir ular lists, 62, 100.


leaning, 65. Cleaning images, 65, 98.
drawing on, 48. Clift, Neill Mi hael, 108.
lling ontours in, 44{48, 66{67. Cliques, maximal, 62{63.
rotation and transposition of, 67. Closed bitmaps, 65.
Bitwise manipulations, 1{108. Collation of bits, 2.
Bla k pixels, 4, 40, 67. Colman, George, the younger, 1.
Bolyai, Janos, 36. Combinations, 75{76.
Bookworm problem, 54. Commutative laws, 3, 71.
Boolean matri es, 50, 69, see also Bitmaps. Comparison of bytes, 21, 60.
multipli ation of, 50, 56, 107. Complementation, 3, 52, 92.
Borkowski, Ludwik Stefan, 31. Complete binary trees, 33, 74.
Borrows, 86{87. in nite, 53.
Boundary urves, digitized, 44{48. Composition of permutations, 53, 56{57.
Bouton, Charles Leonard, 71. Compression of s attered bits, 16, 57, 83.
Bran h instru tions, 10, 48{49, 61. Conditional-set instru tions, 9{10, 48.
Bran hing fun tions, 53, 56. Coni se tions, digitizing, 44{48, 66{67.
Bran hless omputation, 23{26, 48{49, Coni splines, 103.
69, 70. Conjun tion, in 3-valued logi , 31.
Braymore, Caroline, 1. Conne tivity stru ture of an image,
Breadth- rst sear h, 91, 97. 41{43, 66.
Brent, Ri hard Peir e, 83. Consensus of sub ubes, 63.
Bresenham, Ja k Elton, 102. Continuant polynomials, 105.
Breuer, Melvin Allen, 94. Control points, 48.
Broadword hains, 23{27, 60{65, 96. Convex optimization, 85.
strong, 61. Conway, John Horton, 40, 73, 74, 98.
Broadword omputations, 21{27, 60{65. eld, 52.
Brodal, Gerth Stlting, 22. CRC ( y li redundan y he k), 51, 70.
Brodnik, Andrej, 27. Crossbar modules, 14, 58.
Bron, Coenraad, 92. CSNZ ( onditional set if nonzero), 10,
Brooker, Ralph Anthony, 2. 48{49, 88.
Brown, David Trent, 51. CSOD ( onditional set if odd), 79.
Bruijn, Ni olaas Govert de, y les, 10. CSZ ( onditional set if zero), 9, 77, 78.
Bu hi, Julius Ri hard, 75. Curvature: Re ipro al of the radius, 102.
Butter y networks, 56. Custering, 39, 44, 64{65.
Byte: An 8-bit quantity, 69. Cy les in a graph, 15.
Byte permutations, 50. Cy li redundan y he king, 51, 70.
Bytes, parallel operations on, see Multibyte Cy li shifts, 17, 56, 86.
pro essing. Cylinder, hyperboli , 39, 97.
Dallos, Jozsef, 9.
Ca he memory, 5, 35, 49, 77, 91. Dates, pa ked, 4, 60.
Ca he-oblivious array addressing, see Zip. de Bruijn, Ni olaas Govert, y les, 10.
Cahn, Leonard, 40. Depth of a Boolean fun tion, 13.
Can ellation law, 72. Des artes, Rene, oordinates, 44.
Cantor, Georg Ferdinand Ludwig Dietz, Henry Gordon, 19, 86.
Philipp, 85. Digitization of ontours, 44{48, 66{67.
Cardinality of a set, 11. Dijkstra, Edsger Wybe, 85.
Carries, 18{19, 25, 86{87. Dilated numbers, see S attered
Cartesian oordinates, 44. arithmeti , Zip.
Cartesian trees, 79, 95. Diri hlet, Johann Peter Gustav Lejeune,
Cellular automata, 40{43, 65{66. generating fun tion, 78.
Chebyshev (= Ts hebys he ), Pafnutii Dis rete logarithm, see Binary logarithm.
Lvovi h (Qebyxev, Pafnuti Disjointness testing, 58.
L~voviq), polynomials, 105. Disjun tion, in 3-valued logi , 31.
Cheshire at, 42{43, 65, 66, 100. Distan e between 2-adi integers, 74.
Chessboards, 32, 63. Distin t bytes, testing for, 59.
Chung, Kin-Man ( ), 17, 58. Distribution networks, see Mapping
Cigar-shaped urve, 102. networks.
Cir les, digitized, 44, 47. Distributive laws, 3, 72.
INDEX AND GLOSSARY 111
Divide and onquer paradigm, 12, 16. Fixed point arithmeti , 86, 103.
Divisibility by 3, 70. Flag: A 1-bit indi ator, 20, 59, 60.
Division, 54. Floating point arithmeti , 10, 78.
avoiding, 4, 54. Floyd, Robert W, 58.
  by 10, 24.
  by powers of 2, 3–4.
  in Conway's field, 52.
  of 2-bit numbers, 59.
Dominating sets, minimum, 98.
Don't-cares, 18, 29–30, 81, 94.
Dot-minus operation (x ∸ y), v, 20, 24, 61, 82, 96.
Double order for traversing trees, 100–101.
Dovetailing, 16, 59, see also Perfect shuffles, Zip.
Drawing on a bitmap, 48.
Duality between 0 and 1, 99.
Duguid, Andrew Melville, 13.
DVIPAGE program, 104.
Edges between pixels, 44–48, 66–67.
EDSAC computer, 2, 11.
Eight queens problem, 92.
Ellipses, 44–47, 102, 103.
Encoding of ternary data, 28–31, 63.
Eofill (even/odd filling), 47.
Equality of bytes, 20, 59, 60.
Equivalence, in 3-valued logic, 63.
Eratosthenes of Cyrene (Ἐρατοσθένης ὁ Κυρηναῖος); sieve (κόσκινον), 5, 54.
Eriksson, Henrik, 106.
Eriksson, Kimmo, 106.
Escher, Giorgio Arnaldo (= George Arnold), 37.
Escher, Maurits Cornelis, 37.
Euclid (Εὐκλείδης), 36.
Extracting bits, 2, 4, 8.
  and compressing them, 16, 57, 83.
  the least significant only ($2^{\rho x}$), 8–10, 18, 54.
  the most significant only ($2^{\lambda x}$), 11, 60–62, 89.
Fast Fourier transforms, 56.
Ferranti Mercury computer, 2.
Fibonacci, Leonardo, of Pisa (= Leonardo filio Bonacii Pisano), numbers, 36.
Fibonacci number system, 36, 64; see also NegaFibonacci number system.
  odd, 96.
Fibonacci polynomials, 67–68.
Fields, algebraic, 50, 52, 105.
Fields of data, see Packing of data.
Filling a contour in a bitmap, 44–48, 66–67.
Fingerprints, 40.
Finite fields, 50, 105.
Finite state automata, 89.
Fischer, Johannes Christian, 95.
Fisher, Randall James, 19, 86.
Footprints, 87, 93.
Fractals, 68, 78.
Fractional precision, 4, 69.
Fragmented fields, 18, 58.
Fredman, Michael Lawrence, 22, 60.
Freed, Edwin Earl, 55.
Frey, Peter William, 94.
Fuchs, David Raymond, 99.
Full adders, 98.
  for balanced ternary numbers, 63.
Gabow, Harold Neil, 95.
Games, 40, 52, 65, 85.
Gaps, between prime numbers, 77.
  between Ulam numbers, 93.
  in a scattered accumulator, 85.
Garbage collection, 27.
Gardner, Martin, 40, 98.
Gathering bits, 83.
Gauß (= Gauss), Johann Friderich Carl (= Carl Friedrich), 36.
Gebhardt, Dieter, 106.
Generating functions, 55, 57.
  Dirichlet, 78.
Gill, Stanley, 11.
Gillies, Donald Bruce, 11.
Gladwin, Harmon Timothy, 8.
Gosper, Ralph William, Jr., v, 56, 70.
  hack, 4, 54.
Graphs, 14–15.
  algorithms on, 27–28, 62–63.
Gray, Frank, binary code, 73, 89.
Gray levels in image data, 59, 67.
Greedy-footprint heuristic, 87, 93.
GREG (global register definition), 9, 12.
Grid structure, 36, 98.
Group of functions, 53.
Groupoids, multiplication tables for, 31, 63.
Grundy, Patrick Michael, 71.
Guibas, Leonidas John (Γκίμπας, Λεωνίδας Ιωάννου), v, 103.
Gulliver, Lemuel, 108.
Guo, Zicheng Charles, 41, 65.
Guy, Richard Kenneth, 73, 98.
Hacks, 1–108.
Hagerup, Torben, 88.
HAKMEM, 26, 71, 75.
Half adders, 98.
  for balanced ternary numbers, 63.
Hall, Richard Wesley, Jr., 41, 65.
Hamburg, Michael Alexander, 75.
Hardy, Godfrey Harold, 75.
Harel, Dov, 33.
Heaps, 32.
  sideways, 32, 63–64.
112 INDEX AND GLOSSARY

Heckel, Paul Charles, 82.
Herrmann, Francine, 36.
Heun, Volker, 95.
Hexadecimal constants, v.
Hexadecimal digits, 69.
Hobby, John Douglas, 48, 103.
Holes in images, 42–43.
Hollis, Jeffrey John, 62.
Hudson, Richard Howard, 77.
Hunt, Neil, 104.
Hyperbolas, 44, 66, 75, 102, 103.
Hyperbolic plane geometry, 35–39, 47, 64, 97.
Hyperfloor function ($2^{\lambda x}$), 60, 74.
Ide, Mikio, 92.
Identities for bitwise operations, 3, 52, 53, 55, 75, 77, 86.
Identity matrix, 108.
ILLIAC I computer, 11.
Implication, in 3-valued logic, 31.
Implicit data structures, 32–39, 63–64.
Independent sets, maximal, 63.
Infinite binary trees, 53.
Infinite exclusive or, 74.
Infinite-precision numbers, 2, 4, 52.
Inorder of nodes, 33.
Inshuffles, 69, 80.
Inside of a curve, 44.
Interchanging two bits, 55.
Interchanging selected bits, 71.
Interleaving bits, 16, 59, see also Perfect shuffles, Zip.
Internet, ii, iii.
Inverse of a binary matrix, 70.
Inverse of a permutation, 50.
Isometries, 74.
Jardine, Nicholas, 92.
Johnson, David Stifler, 92.
Jordan, Marie Ennemond Camille, curve theorem, 44.
Kaas, Robert, 32.
Kaasila, Sampo Juhani, 103.
Katajainen, Jyrki Juhani, 35.
Kerbosch, Joep A. G. M., 92.
King-neighbors, 40.
Kingwise connected components, 41–43, 65–66.
Kirsch, Russell Andrew, 40, 65.
Knight moves, 63.
Knuth, Donald Ervin, i, v, 22, 77, 78, 93, 99, 103, 104, 106.
Kuhn, Markus Günther, 107.
Lakhtakia, Akhlesh, 75.
Lamport, Leslie B., 19, 20, 59.
Lander, Leon Joseph, 77.
Large megabytes, 77.
Largest element of a set, 11.
Larvala, Samuli Kristian, 83.
Latin-1 supplement to ASCII, 85.
Lauter, Martin, 10.
Lawrie, Duncan Hamish, 81.
Le Corre, J., 13.
Leap year, 88.
Least common ancestors, see Nearest common ancestors.
Least significant 1 bit ($2^{\rho x}$), 8–9, 54.
Lee, Ruby Bei-Loh, 83.
Left-to-right minimum, 95.
Leftmost bits, 10–11, 22, 55.
Lehmer, Derrick Henry, 4.
Leiserson, Charles Eric, 10, 55.
Lenfant, Jacques, 80.
Lenstra, Hendrik Willem, Jr., 52, 73, 74.
Levialdi Ghiron, Stefano, 42–43.
Lexicographic order, 18, 68.
lg, see Binary logarithm.
Life game, 40, 65.
Lights Out puzzle, 105.
Linked allocation, 91.
Little-endian convention, 6–8, 12, 20, 28, 76, 77.
Littlewood, John Edensor, 75.
Lobachevsky, Nikolai Ivanovich (Лобачевский, Николай Иванович), 36.
Loukakis, Emmanuel (Λουκάκης, Μανώλης), 92.
Lower bounds, 23–27, 61–62, 103.
Lowercase letters, 59.
Lowest common ancestor, see Nearest common ancestor.
Loyd, Samuel, 77.
Łukasiewicz, Jan, 31, 63.
Lutz, Rüdiger Karl (= Rudi), 101.
Lynch, William Charles, 11.
Magic masks ($\mu_k$ and $\mu_{d,k}$), 9–12, 16, 22, 37, 54, 71, 75, 76, 78–80, 82, 84, 88, 96, 103.
Majority function, 27, see also Median function.
Malgouyres, Rémy, 99.
Manchester Mark I computer, 2.
Mann, William Fredrik, 98.
Mapping modules, 58, 81.
Mapping networks, 58, 81.
Mapping three items into two-bit codes, 28–31, 63.
Mappings of bits, 17, 58, 81.
Margenstern, Maurice, 36.
Mark II computer (Manchester/Ferranti), 2.
Martin, Monroe Harnish, 10.
Mask: A bit pattern with 1s in key positions, 9, 16–18, 20, 49, 50, 69.
Masked integers, see Scattered arithmetic.
Masking: ANDing with a mask, 31.
Matrices of 0s and 1s, 67–70, see also Bitmaps.
  multiplication of, 50–51, 56.
  transposing, 15, 56, 67, 69, 80.
  triangularizing, 68.
Matrix multiplication, 50–51, 56.
Matrix transposition, 15, 56, 67, 69, 80.
max (maximum) function, 2, 31, 60.
Maximal cliques, 62.
Maximal independent sets, 92.
Maximal proper subsets, 58.
Maybe, 31.
McCranie, Judson Shasta, 93.
Median function, v, 21, 86, 87.
Meero, Laszlo, 106.
Mems: Memory accesses.
Merge sorting, 49.
METAFONT, 103.
mex (minimal excludant) function, 52.
Mikado pattern, 106.
Miller, Jeffrey Charles Percy, 11.
Miltersen, Peter Bro, 27.
min (minimum) function, 2, 31, 60.
Minimal excludant, 52.
Minimum element in subarray, 64.
Minsky, Marvin Lee, 66.
Mixed-radix representation, 60.
MMIX, ii, iv, 5, 7–10, 12, 19, 20, 28, 48–51, 54, 55, 57, 59, 60, 62, 67, 69, 70, 73, 79, 84, 86, 87.
mod (remainder) function, 4.
Mod-5 arithmetic, 60.
Modal logic, 31, 63.
mone, 76, see −1.
Monotone Boolean functions, 70.
Monotonic portions of curves, 45–47, 66.
Monus operation (x ∸ y), v, 20, 24, 61, 82, 96.
Moody, John Kenneth Montague (= Ken), 62.
MOR (multiple or), 12, 19, 50–51, 56, 69–70, 86, 94, 103, 107–108.
Morton, Guy Macdonald, 85.
Most significant 1 bit ($2^{\lambda x}$), 2, 11, 60–62, 89.
MP3 (MPEG-1 Audio Layer III), 51.
Muller, David Eugene, 11.
Multibyte encoding, 68–69.
Multibyte processing, 19–23, 59–61.
  addition, 19, 60, 87.
  comparison, 20–21.
  max and min, 60, 88.
  modulo 5, 60.
  potpourri, 87.
  subtraction, 59, 60, 87.
Multilinear representation of a Boolean function, 108.
Multiple-precision arithmetic, 6.
Multiplication, 4, 10–11, 22, 61, 78.
  avoiding, 21, 22, 59, 78.
  by powers of 2, 3, 78.
  in Conway's field, 52.
  in groupoids, 31, 63.
  lower bound for, 22, 26, 62.
  of 0–1 matrices, 56; see also MOR and MXOR.
  of polynomials mod 2, 70.
  of signed bits, 29–30.
Munro, James Ian, 27.
MUX (multiplex), 50, 83, 86, 103, 107.
MXOR (multiple xor), 50–51, 56, 69–70, 70, 73, 107–108.
Mycroft, Alan, 20.
Navigation piles, 35, 64.
Nearest common ancestors, 33–35, 64.
Necessity, in 3-valued logic, 63.
NEG (negation), 49, 76, 108.
Negabinary number system, 52.
Negadecimal number system, 37.
NegaFibonacci number system, 36–39, 64.
Negation, 3, 52, 63.
Nested parentheses, 54.
Newline symbol, 20.
Nicely, Thomas Ray, 77.
Nim, 2, 52.
  addition, 2, 52.
  division, 52.
  multiplication, 52, 73.
  second-order, 52.
Noisy data, 65.
Non-Euclidean geometry, 35–36, 97.
Nonzero bytes, testing for, 20–21.
NOT (bitwise complementation), 2.
Notational conventions, v, 81.
  ⟨xyz⟩ (median of three), v.
  u → v (transitive closure), 27.
  x̄ or ∼x (bitwise complement), 3.
  $x^{\oplus}$ (suffix parity), 55.
  x & y (bitwise AND), 3.
  x | y (bitwise OR), 3.
  x ⊕ y (bitwise XOR), 3.
  x ≪ y (bitwise left shift), 3.
  x ≫ y (bitwise right shift), 3.
  x ‡ y (zipper function), see Zip.
  x ∸ y (max(x − y, 0)), v, 20, 24, 61, 82, 96.
  x ? y : z = xy + x̄z (mux), 60, 62.
  z (sheep-and-goats), 17, 57.
NP-hard problems, 57.
Null spaces, 68.
NXOR (not xor), 79.
Nybble: A 4-bit quantity, 12, 69.
Nyp: A 2-bit quantity, 12, 69.

Objects in images, 42.
Octabyte or octa: A 64-bit quantity, 69.
Odd Fibonacci number system, 96.
Ofman, Yuri Petrovich (Офман, Юрий Петрович), 84.
Omega network for routing, 56–57.
One-to-many mapping, 17, 30.
Ones counting, see Sideways addition.
Online algorithms, 42–43, 66.
Open bitmaps, 65.
Optical character recognition, 40, 65.
OR (bitwise disjunction), 2–3.
Ordinal numbers, 73.
Oriented forests and trees, 33, 42.
Oriented paths, 27.
Outshuffles, 56, 69.
Outside of a curve, 44.
Overflowing memory, 92.
Packed data, operating on, 4, 19–21, 31, 59–60, 63, 69.
Packing of data, 4–6, 16, 31, 54, 69, 83.
Page faults, 59.
Paley, Raymond Edward Alan Christopher, 54, 75.
Papadimitriou, Christos Harilaos (Παπαδημητρίου, Χρήστος Χαριλάου), 92.
Papert, Seymour Aubrey, 66.
Parabolas, 44, 66, 102.
Parallel processing of subwords, 19–23, 59–61.
Parenthesis traces, 54.
Parity function, 27, 62, 73, 79.
  suffix, 55, 69, 91.
Parity patterns, 67–68.
Parkin, Thomas Randall, 77.
Patents, 79, 83.
Paterson, Michael Stewart, 22, 90, 91.
Pattern recognition, 40.
Patterns, searching for, 20–22, 61.
Pentagrid, 36–39, 64.
Perez, Aram, 51.
Perfect hash functions, 78.
Perfect parity patterns, 67–68.
Perfect shuffles, 16, 50, 56, 57, 69, 80, 88.
  3-way, 69, 85.
Period length, 62.
Permutation matrices, 50.
Permutation networks, 13–15, 57–58, 81.
Permutations,
  induced by index digits, 56.
  of bits within a word, 13–17, 25, 50.
  of bytes within a word, 50.
  of the 2-adic integers, 53.
  Omega-routable, 56–57.
Perpendicular lines, 36.
Peterson, William Wesley, 51.
Phi (φ), 64.
Pi (π), as "random" example, 17.
Pickover, Clifford Alan, 75.
Pigeonhole principle, 104.
Pipelined machine, 48–49.
Pitteway, Michael Lloyd Victor, 45.
Pixel algebra, v, 40.
Pixel patterns, 4, 53.
Pixels, 39–48, 64–68.
  gray, 59, 67.
Pólya, György (= George), 75.
Polynomials modulo 2, 57.
  multiplication of, 70.
  remainders of, 57, 68.
Polynomials modulo 5, 60.
Population count, 11, see Sideways addition.
Portability, 7.
Possibility, in 3-valued logic, 63.
Pournader, Roozbeh, 107.
Pratt, Vaughan Ronald, 54, 58, 81, 84, 89, 103.
Prefix problem, see Suffix parity function.
Preorder of nodes, 33–35.
Presume, Livingstone Irving, 55.
Prime implicants, 63.
Prime numbers, 5, 54.
Printing, 39.
Priority queues, 35.
Pritchard, Paul Andrew, 77.
Prodinger, Helmut, 78.
Program counter, 26.
Projection functions, 9.
Prokop, Harald, 10, 55.
Quadratic forms, 45–47, 66–67.
Quadtrees, 85.
Quantifications, 74, 89.
Queen graph, 92.
Quick, Jonathan Horatio, 53, 58.
Quilt, 4.
Rabin, Michael Oser, 84.
Radix 2, 52.
Radix conversion, 60.
Radix exchange sort, 91.
RAM (random-access machine), 26–27, 62, 91.
Raman, Rajeev, 85.
Ramshaw, Lyle Harold, 21.
Randall, Keith Harold, 10, 55.
Randomized data structures, 79.
Range checking, 60.
Range minimum query problem, 64.
Rank of a binary matrix, 68.
Rasters, 39, see Bitmaps.
Rational 2-adic numbers, 61.
Ray, Louis Charles, 40.
Raynaud-Richard, Pierre, 107.
Reachability problem, 27–28, 33.
Rearrangeable networks, see Permutation networks.
Recurrence relations, 8, 10, 37, 51, 55, 67.
Recursive processes, 15, 17, 32, 52, 72, 84.
Redundant representations, 103.
Reflection of bits, 12–13, 25, 55, 56, 96, 97.
Regular languages, 61.
Reitwiesner, George Walter, 55.
Remainder mod $2^n - 1$, 11.
Remainder mod $2^n$, 4.
Removal of bits, 8.
Replication of bits, 17, 58, 88.
Representation,
  of graphs, 27, 62.
  of permutations, 57.
  of sets as integers, 11, 18, 27–28, 58, 62–63, 75.
  of three states with two bits, 28–31, 63.
Reversal of bits, 12–13, 25, 55, 56, 96, 97.
Right-to-left minimum, 95.
Rightmost bits, 8–10, 54.
Rochdale, Simon, 1.
Rokicki, Tomas Gerhard, v, 55, 79.
Rook-neighbors, 40.
Rookwise connected components, 41–43, 65–66.
Rosenfeld, Azriel, 41.
Rotation of square bitmaps, 67.
Rote, Günter (= Rothe, Günther Alfred Heinrich), 66.
Rounding, 33, 86.
  to an odd number, 2, 59, 86.
Ruler function (ρx), 8, 20, 21, 25, 26, 28, 32, 53, 55, 60, 64, 78, 100.
  summed, 95.
Runlength encoding, 100, see also Edges between pixels.
Runs of 1s, 8, 11, 22–23, 55, 61.
Rutovitz, Denis, 40.
S, the letter, 48.
S1S, 74–75.
Saccheri, Giovanni Girolamo, 36.
SADD (sideways addition), 9, 28, 76, 78, 79, 108.
Samet, Hanan, 85.
Saturated addition, 92.
Saturated subtraction, see Monus.
Scattered arithmetic, 18, 58.
  addition, 18, 57.
  shifting, 58.
  subtraction, 58.
Scattering bits, 83.
Schieber, Baruch Menachem, 33.
Schläfli, Ludwig, 97.
Schroeppel, Richard Crabtree, 26, 52, 82.
Seal, David, 10.
Second-order logic, 74–75.
Security holes, 69.
Segmented broadcasting, see Stretching bits.
Segmented sieves, 77.
Sequential allocation, 91.
SET, the game, 93.
Sets, represented as integers, 11, 18, 27–28, 58, 62–63, 75.
  maximal proper subsets of, 58.
Shades of gray, 67.
Shallit, Jeffrey Outlaw, 78.
Sheep-and-goats operation, 17–18, 57.
Shi, Zhi-Jie Jerry, 83.
Shift instructions, 3, 19, 52, 61.
  signed, 49, 78.
  table lookup via, 5, 23, 69, 85, 88.
Shift sets, 24–25.
Shirakawa, Isao, 92.
Shrinking of images, 42–43, 66.
Shuffle network for routing, 56.
Sibling links, 32, 63.
Sibson, Robin, 92.
Sideways addition, 2, 11–12, 62, 79, 94.
  bytewise, 11, 88.
  function νx, 11, 27, 55, 78.
  summed, 55, 82.
Sideways heaps, 32–35, 63–64.
Sieve of Eratosthenes, 5, 54.
Signed bits, representation of, 29, 55.
Signed right shifts, 49, 78.
SIMD (single instruction, multiple data) architecture, 19.
Simply connected components, 43.
Singmaster, David Breyer, 106.
Six-register algorithm, 103.
Sjöstrand, Jonas Erik, 106.
Slanina, Matteo, 74.
Sleator, Daniel Dominic Kaplan, 4, 98.
Slepian, David, 13.
Smallest element of a set, 11.
Smearing bits to the right, 8, 11, 78.
Sorted data, 54.
Sorting, 60, 75.
  networks for, 58.
Soule, Stephen Parke, 87.
Sprague, Roland Percival, 71.
Squaring a polynomial, 57.
Squines, 48, 66, 103.
SR (shift right, preserving the sign), 10, 49, 76, 78.
SRU (shift right unsigned), 5, 78.
Stanford GraphBase, ii, iii.
Steele, Guy Lewis, Jr., v, 16, 57, 80, 83.
Sterne, Laurence, iii.
Stockmeyer, Larry Joseph, 81, 84.
Stockton, Fred G., 102.
Stolfi, Jorge, v, 103.
Storage allocation, 16, 22, 54, 59.
Strachey, Christopher, 12.
Straight lines, digitizing, 66.
Stretching bits, 58, 88.
Strings, searching for special bytes in, 20.
Strong broadword chains, 61.

Subcubes, 18, 63.
Subsets, 11, 27–28, 62–63, 75.
  generating all, 18.
  maximal proper, 58.
Subtraction, 3, 52, 59.
  bytewise, 59, 87.
  modulo 5, 60.
  saturated, see Monus.
  scattered, 58.
  unary, 60.
Suffix parity function, 55, 69, 91.
Sum of bits, see Sideways addition.
  weighted, 55.
Surrogates, 107.
Surroundedness tree, 43, 66.
Sutner, Klaus, 106.
Swapping bits, 12–15, 55–56, 107.
  between variables, 71.
SWAR methods, 19–23, 59–61.
SWARC compiler, 85.
Swift, Jonathan, 108.
Sylow, Peter Ludvig Mejdell, 2-subgroup, 74.
Symmetric functions, Boolean, 62.
Symmetric group, 74.
Symmetric order of nodes, 33.
Table lookup, 9, 10, 85.
  by shifting, 5, 23, 69, 88.
Tarjan, Robert Endre, 33, 95.
Ternary vectors, 31.
Tessellation, 36, 47, 64.
Tetrabyte or tetra: A 32-bit quantity, 69.
Thinning an image, 40–41, 65.
Thompson, Kenneth Lane, 68.
Three-register algorithm, 45–48, 66.
Three-state encodings, 28–31, 63.
Three-valued logic, 31, 63.
Tiling, 36, 47, 64.
Time, mixed-radix representation of, 60.
Tocher, Keith Douglas, 2, 59, 85.
Toruses, 65, 87.
Trailing zeros, 8, see Ruler function.
Transdichotomous methods, see Broadword computations.
Transitive closure, 27, 33.
Transposing a 0–1 matrix, 15, 56, 67, 69, 80.
Transposed allocation, 77.
Traversal in postorder, 95, 100–101.
Traversal in preorder, 94, 100–101.
Treaps, 79.
Triangularizing a 0–1 matrix, 68.
Tricks versus techniques, 2, 104.
Trinomials, 57.
Triple zipper function, 69, 85.
Triply linked trees, 94, 95, 100.
TrueType, 103.
Truth tables, 9, 70.
Tsukiyama, Shuji, 92.
Turing, Alan Mathison, 2.
  machines, 98.
Two's complement notation, 2, 26, 71.
Typesetting, 39.
UCS (Universal Character Set), 69.
Ulam, Stanisław Marcin, 93.
  numbers, 63.
Ultraparallel lines, 36.
Unary notation, 60.
Unbiased rounding, 59, 86.
Uncompressing bits, 57.
Underflow mask, 90.
Unger, Stephen Herbert, 19.
Unicode, 69.
Universal Character Set, 69.
Unpacking of data, 2, 4–6, 57, 83.
Unsigned 2-adic integers, 71.
Unsolvable problems, 75.
Upper halfplane, 97.
Uppercase letters, 59.
Urban, Genevie Hawkins, 40.
Uri, Dario, 105.
UTF-8: 8-bit UCS Transformation Format, 69.
UTF-16: 16-bit UCS Transformation Format, 69.
van Emde Boas, Peter, 32.
Van Wyk, Christopher John, 99.
Variance, 57.
Veblen, Oswald, 44.
Vector space, basis for, 74, 106.
Vertex covers, minimal, 63.
Vishkin, Uzi Yehoshua, 33.
Vitale, Fabio, 35.
Vuillemin, Jean Étienne, 95.
Wada, Eiiti, 76.
Warren, Henry Stanley, Jr., v, 8, 11, 12, 25, 51, 52, 71, 78, 83, 86, 107.
Wegner, Peter (= Weiden, Puttilo Leonovich = Веден, Путтило Леонович), 8, 12.
Weighted sum of bits, 55.
Welter, Cornelis P., 74, 75.
Weste, Neil Harry Earle, 44.
Wheeler, David John, 11.
White pixels, 4, 40, 67.
Wilkes, Maurice Vincent, 11.
Willard, Dan Edward, 22, 60.
Wilson, David Whitaker, 93.
Wise, David Stephen, 85.
Wolfram, Stephen, 106.
Wong, Chak-Kuen, 17, 58.
Woodrum, Luther Jay, 77.
Woods, Donald Roy, 85.
Wraparound parity patterns, 67.
Wunderlich, Charles Marvin, 93.
Wyde: A 16-bit quantity, 69.
XL25 game, 106.
XOR (bitwise exclusive or), 2.
  identities involving, 3, 53, 55, 75.
Yannakakis, Mihalis (Γιαννακάκης, Μιχάλης), 92.
Z order, see Zip.
Zero-byte test, 20–21.
Zero-one principle, 54.
Zero-or-set instructions, 9, 10, 88.
Zeta function, 78.
Zijlstra, Erik, 32.
Zimmermann, Paul Vincent Marie, 83.
Zip: The zipper function, 16, 50, 57, 66, 77, 80, 83, 85.
  triple (three-way), 69, 85.
Zip-fastener method, 85.
ZSNZ (zero or set if nonzero), 10, 88.
ZSZ (zero or set if zero), 9.
ASCII CHARACTERS
      #0  #1  #2  #3  #4  #5  #6  #7  #8  #9  #a  #b  #c  #d  #e  #f
#2x   ␣   !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /    #2x
#3x   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?    #3x
#4x   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O    #4x
#5x   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _    #5x
#6x   `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o    #6x
#7x   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~        #7x
      #0  #1  #2  #3  #4  #5  #6  #7  #8  #9  #a  #b  #c  #d  #e  #f

MMIX OPERATION CODES

      #0          #1          #2          #3          #4          #5          #6          #7

#0x   TRAP 5υ     FCMP υ      FUN υ       FEQL υ      FADD 4υ     FIX 4υ      FSUB 4υ     FIXU 4υ     #0x
      FLOT[I] 4υ              FLOTU[I] 4υ             SFLOT[I] 4υ             SFLOTU[I] 4υ
#1x   FMUL 4υ     FCMPE 4υ    FUNE υ      FEQLE 4υ    FDIV 40υ    FSQRT 40υ   FREM 4υ     FINT 4υ     #1x
      MUL[I] 10υ              MULU[I] 10υ             DIV[I] 60υ              DIVU[I] 60υ
#2x   ADD[I] υ                ADDU[I] υ               SUB[I] υ                SUBU[I] υ               #2x
      2ADDU[I] υ              4ADDU[I] υ              8ADDU[I] υ              16ADDU[I] υ
#3x   CMP[I] υ                CMPU[I] υ               NEG[I] υ                NEGU[I] υ               #3x
      SL[I] υ                 SLU[I] υ                SR[I] υ                 SRU[I] υ
#4x   BN[B] υ+π               BZ[B] υ+π               BP[B] υ+π               BOD[B] υ+π              #4x
      BNN[B] υ+π              BNZ[B] υ+π              BNP[B] υ+π              BEV[B] υ+π
#5x   PBN[B] 3υ−π             PBZ[B] 3υ−π             PBP[B] 3υ−π             PBOD[B] 3υ−π            #5x
      PBNN[B] 3υ−π            PBNZ[B] 3υ−π            PBNP[B] 3υ−π            PBEV[B] 3υ−π
#6x   CSN[I] υ                CSZ[I] υ                CSP[I] υ                CSOD[I] υ               #6x
      CSNN[I] υ               CSNZ[I] υ               CSNP[I] υ               CSEV[I] υ
#7x   ZSN[I] υ                ZSZ[I] υ                ZSP[I] υ                ZSOD[I] υ               #7x
      ZSNN[I] υ               ZSNZ[I] υ               ZSNP[I] υ               ZSEV[I] υ
#8x   LDB[I] μ+υ              LDBU[I] μ+υ             LDW[I] μ+υ              LDWU[I] μ+υ             #8x
      LDT[I] μ+υ              LDTU[I] μ+υ             LDO[I] μ+υ              LDOU[I] μ+υ
#9x   LDSF[I] μ+υ             LDHT[I] μ+υ             CSWAP[I] 2μ+2υ          LDUNC[I] μ+υ            #9x
      LDVTS[I] υ              PRELD[I] υ              PREGO[I] υ              GO[I] 3υ
#Ax   STB[I] μ+υ              STBU[I] μ+υ             STW[I] μ+υ              STWU[I] μ+υ             #Ax
      STT[I] μ+υ              STTU[I] μ+υ             STO[I] μ+υ              STOU[I] μ+υ
#Bx   STSF[I] μ+υ             STHT[I] μ+υ             STCO[I] μ+υ             STUNC[I] μ+υ            #Bx
      SYNCD[I] υ              PREST[I] υ              SYNCID[I] υ             PUSHGO[I] 3υ
#Cx   OR[I] υ                 ORN[I] υ                NOR[I] υ                XOR[I] υ                #Cx
      AND[I] υ                ANDN[I] υ               NAND[I] υ               NXOR[I] υ
#Dx   BDIF[I] υ               WDIF[I] υ               TDIF[I] υ               ODIF[I] υ               #Dx
      MUX[I] υ                SADD[I] υ               MOR[I] υ                MXOR[I] υ
#Ex   SETH υ      SETMH υ     SETML υ     SETL υ      INCH υ      INCMH υ     INCML υ     INCL υ      #Ex
      ORH υ       ORMH υ      ORML υ      ORL υ       ANDNH υ     ANDNMH υ    ANDNML υ    ANDNL υ
#Fx   JMP[B] υ                PUSHJ[B] υ              GETA[B] υ               PUT[I] υ                #Fx
      POP 3υ      RESUME 5υ   [UN]SAVE 20μ+υ          SYNC υ      SWYM υ      GET υ       TRIP 5υ

      #8          #9          #A          #B          #C          #D          #E          #F

π = 2υ if the branch is taken, π = 0 if the branch is not taken