
Advanced

Complexity
Theory

Markus Bläser & Bodo Manthey


Universität des Saarlandes
Draft—February 27, 2010 and forever
1 Complexity of optimization problems

1.1 Optimization problems


The study of the complexity of solving optimization problems is an important practical aspect of complexity theory. A good textbook on this topic is the one by Ausiello et al. [ACG+ 99]. The book by Vazirani [Vaz01] is also recommended, but its focus is on the algorithms side.

Definition 1.1. An optimization problem P is a 4-tuple (IP , SP , mP , goalP ) where

1. IP ⊆ {0, 1}∗ is the set of valid instances of P ,

2. SP is a function that assigns to each valid instance x the set of feasible solutions SP (x) of x, which is a subset of {0, 1}∗ ,¹

3. mP : {(x, y) | x ∈ IP and y ∈ SP (x)} → N+ is the objective function or measure function. mP (x, y) is the objective value of the feasible solution y (with respect to x).

4. goalP ∈ {min, max} specifies the type of the optimization problem. Either it is a minimization or a maximization problem.

When the context is clear, we will drop the subscript P . Formally,


an optimization problem is defined over the alphabet {0, 1}. But as usual,
when we talk about concrete problems, we want to talk about graphs, nodes,
weights, etc. In this case, we tacitly assume that we can always find suitable
encodings of the objects we talk about.
Given an instance x of the optimization problem P , we denote by S∗P (x)
the set of all optimal solutions, that is, the set of all y ∈ SP (x) such that

mP (x, y) = goal{mP (x, z) | z ∈ SP (x)}.

(Note that the set of optimal solutions could be empty, since the maximum need not exist. The minimum always exists, since mP only attains values in N+ . In the following we will assume that there are always optimal solutions provided that SP (x) ≠ ∅.) The objective value of any y ∈ S∗ (x) is denoted by OPTP (x).²

¹Some authors also assume that for all x ∈ IP , SP (x) ≠ ∅. In this case the class NPO defined in the next section would be equal to the class exp-APX (defined somewhere else).

²The name m∗P (x) would be more consistent, but OPTP (x) is so intuitive and convenient.
Given an optimization problem P , there are (at least) three things one
could do given a valid instance x:

1. compute an optimal solution y ∈ S∗ (x) (construction problem).

2. compute OPT(x) (evaluation problem)

3. given an additional bound B, decide whether OPT(x) ≥ B (if goal =


max) or whether OPT(x) ≤ B (if goal = min) (decision problem).

The first task seems to be the most natural one. Its precise formalization is, however, a subtle task. One could compute the function F : I → P({0, 1}∗ ) mapping each x to its set of optimal solutions S∗ (x). However, S∗ (x) could be very large (or even infinite). Moreover, one is almost always content with only one optimal solution. A cut of F is any function f : I → Σ∗ that maps every x to some y ∈ S∗ (x). We say that we solve the construction problem associated with P if there is a cut of F that we can compute efficiently.³ It turns out to be very useful to call such a cut again P and to assume that positive statements containing P are implicitly ∃-quantified and negative statements are ∀-quantified. (Do not worry too much now, everything will become clear.)

³Note that not every cut is computable, even for very simple optimization problems like computing minimum spanning trees and even on very simple instances. Consider a complete graph Kn , all edges with weight one. Then every spanning tree is optimal. But a cut that maps Kn to a line if the nth Turing machine halts on the empty word and to a star otherwise is certainly not computable.
The second task is easy to model. We want to compute the function
x 7→ OPT(x). We denote this function by Peval .
The third task can be modelled as a decision problem. Let

Pdec = {⟨x, bin(B)⟩ | OPT(x) ≥ B}   if goal = max,
Pdec = {⟨x, bin(B)⟩ | OPT(x) ≤ B}   if goal = min.

Our task is now to decide membership in Pdec .

1.2 PO and NPO


We now define optimization analogs of P and NP.

Definition 1.2. NPO is the class of all optimization problems P = (I, S, m, goal) such that

1. I ∈ P, i.e., we can decide in polynomial time whether a given x is a valid instance,

2. there is a polynomial p such that for all x ∈ I and y ∈ S(x), |y| ≤ p(|x|), and for all y with |y| ≤ p(|x|), we can decide y ∈ S(x) in time polynomial in |x|,

3. m is computable in polynomial time.
Definition 1.3. PO is the class of all optimization problems P ∈ NPO
such that the construction problem P is deterministically polynomial time
computable. (Recall that this means that there is a cut that is polynomial
time computable).
We will see the relation of PO and NPO to P and NP in Section 1.5.
Even though it is not explicit in Definition 1.2, NPO is a nondeterministic
complexity class.
Theorem 1.4. For each P ∈ NPO, Pdec ∈ NP.
Proof. Let p be the polynomial in the definition of NPO. The following
nondeterministic Turing machine M decides Pdec in polynomial time:
Input: instance x ∈ I and bound B
1. M guesses a string y with |y| ≤ p(|x|).
2. M deterministically tests whether y ∈ S(x).
If not, M rejects.
3. If y ∈ S(x), then M computes m(x, y) and tests whether m(x, y) ≤ B
(minimization problem) or m(x, y) ≥ B (maximization problem).
4. If the test is positive, then M accepts, otherwise, M rejects.

It is easy to see that M indeed decides Pdec and that its running time is
polynomial.

1.3 Example: TSP


Problem 1.5 (TSP, ∆-TSP). The Traveling Salesperson Problem (TSP) is
defined as follows: Given a complete loopless (undirected) graph G = (V, E)
and a weight function w : E → N+ assigning each edge a positive weight,
find a Hamiltonian tour of minimum weight. If in addition w fulfills the
triangle inequality, i.e.,

w({u, v}) ≤ w({u, x}) + w({x, v}) for all nodes u, x, v,

then we speak of the Metric Traveling Salesperson Problem (∆-TSP).



In the example of the Traveling Salesperson Problem TSP, we have the


following:

• The set of all valid instances is the set (of suitable encodings) of all
edge-weighted complete loopless graphs G. In the special case ∆-TSP,
the edge weights should also fulfill the triangle inequality (which can
be easily checked).

• Given an instance x, a feasible solution is any Hamiltonian tour of G,


i.e., a permutation of the vertices of G. (Note that for TSP, the set of
feasible solutions only depends on the number of nodes of G.)

• The objective value of a solution y of an instance x is the sum of the


edges used in the tour specified by y. (This can be interpreted as the
length of the tour.)

• Finally, TSP and ∆-TSP are minimization problems.

It is easy to verify that TSP ∈ NPO. However, it is very unlikely that it is in PO. Even finding a very rough approximate solution seems to be very hard.

Exercise 1.1. Assume that there is a polynomial time algorithm that given
an instance x of TSP, returns a Hamiltonian tour whose weight is at most
2p(n) · OPT for some polynomial p, where n is the number of nodes of the
given graph. Then P = NP. (Hint: Show that under this assumption, one
can decide whether a graph has a Hamiltonian circuit.)
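A minimal sketch of the construction behind the hint, assuming the usual reduction from the Hamiltonian circuit problem (all names are illustrative, and `approx_tsp` stands for the hypothetical 2^{p(n)}-approximation algorithm): edges of the input graph get weight 1 and non-edges get a weight so large that even a 2^{p(n)}-approximate tour separates the two cases.

    def tsp_instance_from_graph(n, edges, p_of_n):
        """Build a complete weighted graph on n nodes.  Edges of the input graph
        get weight 1; all other pairs get W = n * 2**p_of_n + 1.  If the graph has
        a Hamiltonian circuit, the optimal tour has weight n; otherwise every tour
        uses a non-edge and has weight > n * 2**p_of_n."""
        edge_set = {frozenset(e) for e in edges}
        W = n * 2**p_of_n + 1
        return {(u, v): (1 if frozenset((u, v)) in edge_set else W)
                for u in range(n) for v in range(u + 1, n)}

    def has_hamiltonian_circuit(n, edges, p_of_n, approx_tsp):
        """Decide Hamiltonicity, given a 2**p(n)-approximation algorithm for TSP
        (assumption of this sketch)."""
        weight = tsp_instance_from_graph(n, edges, p_of_n)
        tour = approx_tsp(n, weight)   # any tour of weight <= 2**p(n) * OPT
        cost = sum(weight[tuple(sorted((tour[i], tour[(i + 1) % n])))] for i in range(n))
        return cost <= n * 2**p_of_n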

1.4 Construction, evaluation, and decision


Let us investigate the relation between the construction, evaluation, and
decision problem associated with a problem P ∈ NPO.

Theorem 1.6. Let P ∈ NPO. Then

1. Pdec ≤_P^T Peval and Peval ≤_P^T Pdec .

2. Peval ≤_P^T P . (Since this is a negative statement about P , it means that Peval ≤_P^T P holds for all cuts P .)

Proof. We start with the first statement: Pdec ≤_P^T Peval is seen easily: On input ⟨x, bin(B)⟩, we can compute OPT(x) using the oracle Peval and compare it with B.
Peval ≤_P^T Pdec is a little trickier: Since m is polynomial time computable, OPT(x) ≤ 2^{q(|x|)} for some polynomial q. Using binary search, we can find OPT(x) with q(|x|) oracle queries.

For the second statement, note that when we have an optimum solution,
then we can compute OPT(x).
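A small sketch of the binary search used for Peval ≤_P^T Pdec in the case of a maximization problem; `decide(x, B)` stands for the oracle query "OPT(x) ≥ B?" and is an assumption of the sketch.

    def opt_via_decision_oracle(x, q_of_x, decide):
        """Compute OPT(x) for a maximization problem with 1 <= OPT(x) <= 2**q(|x|),
        using about q(|x|) oracle queries of the form "OPT(x) >= B?"."""
        lo, hi = 1, 2**q_of_x            # objective values are positive integers
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if decide(x, mid):           # oracle answers: is OPT(x) >= mid ?
                lo = mid
            else:
                hi = mid - 1
        return lo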
If Pdec is NP-complete, then the optimization problem is not harder than
the decision problem.
Theorem 1.7. Let P ∈ NPO such that Pdec is NP-complete. Then P ≤_P^T Pdec .
Proof. Assume that P is a maximization problem; the minimization case is symmetric. Let q be a polynomial such that for every x ∈ I and y ∈ S(x), |y| ≤ q(|x|) and m(x, y) is bounded by 2^{q(|x|)} .
For given x, fix some polynomial time computable total order on the set {0, 1}^{≤q(|x|)} . For y ∈ {0, 1}^{≤q(|x|)} , let λ(y) be the rank that y has with respect to this order.
We derive a new problem P̂ from P by defining a new objective function. The objective function of P̂ is given by

m̂(x, y) = 2^{q(|x|)+1} m(x, y) + λ(y).

Note that the first summand is always at least 2^{q(|x|)+1} > λ(y). This implies that for all x and y1 , y2 ∈ S(x) with y1 ≠ y2 , m̂(x, y1 ) ≠ m̂(x, y2 ). Furthermore, if m̂(x, y1 ) ≥ m̂(x, y2 ), then m(x, y1 ) ≥ m(x, y2 ). Thus if y ∈ Ŝ∗ (x), then y ∈ S∗ (x), too. (Here Ŝ∗ (x) is the set of optimum solutions of x as an instance of P̂ .)
An optimal solution y ∈ Ŝ∗ (x) can easily be derived from OPTP̂ (x): We compute the remainder of the division of OPTP̂ (x) by 2^{q(|x|)+1} . This remainder is λ(y), from which we can obtain y. Thus P ≤_P^T P̂ ≤_P^T P̂eval .
By Theorem 1.6, P̂eval ≤_P^T P̂dec . Since P̂dec ∈ NP and Pdec is NP-complete by assumption, P̂dec ≤P Pdec . Using transitivity, we get P ≤_P^T Pdec .
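The arithmetic at the end of the proof can be made concrete with a tiny illustrative sketch (q stands for q(|x|) and opt_hat for the optimal value of P̂ on x; names are not from the text):

    def recover_rank_and_value(opt_hat, q):
        """Split the optimal value of the modified problem P-hat back into the
        original objective value m(x, y) and the rank lambda(y) of an optimal y.
        Assumes opt_hat = 2**(q+1) * m(x, y) + lambda(y) with lambda(y) < 2**(q+1)."""
        lam = opt_hat % 2**(q + 1)    # rank of the optimal solution y
        m = opt_hat // 2**(q + 1)     # its objective value under the original measure
        return m, lam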

1.5 NP-hard optimization problems


Definition 1.8. An optimization problem P is NP-hard if for all L ∈ NP, L ≤_P^T P .
Theorem 1.9. If P is NP-hard and P ∈ PO, then P = NP.
Exercise 1.2. Prove Theorem 1.9.
Theorem 1.10. Let P ∈ NPO. If Pdec is NP-hard, then P is NP-hard.
Proof. Since Pdec is NP-hard, L ≤P Pdec for all L ∈ NP. Since many-one reducibility is a special case of Turing reducibility, ≤_P^T is transitive, and Pdec ≤_P^T Peval ≤_P^T P by Theorem 1.6, we get L ≤_P^T P .
Some authors prefer to call an optimization problem NP-hard if Pdec
is NP-hard. Theorem 1.10 states that this definition is potentially more
restrictive than our definition.

Corollary 1.11. If P ≠ NP, then PO ≠ NPO.

Proof. There is a problem P in NPO such that Pdec is NP-hard, for instance ∆-TSP. If P belonged to PO, then also Pdec ∈ P by Theorem 1.6, a contradiction.
2 Approximation algorithms and
approximation classes

In the most general sense, an approximation algorithm is an algorithm that


given a valid instance x is able to compute some feasible solution.
Definition 2.1. A deterministic Turing machine A is an approximation
algorithm for an optimization problem P = (I, S, m, goal) if
1. the running time of A is polynomial,

2. A(x) ∈ S(x) for all x ∈ I.


Of course, there are good and not so good approximation algorithms
and we develop a framework to measure the quality or approximation per-
formance of such an algorithm.
Definition 2.2. 1. Let P be an optimization problem, x ∈ I, and y ∈ S(x). The performance ratio of y with respect to x is defined as

PR(x, y) = max{ m(x, y)/OPT(x), OPT(x)/m(x, y) }.¹

2. Let α : N → Q. An approximation algorithm A is an α-approximation


algorithm, if for all x ∈ I,

PR(x, A(x)) ≤ α(|x|).

The definition of PR(x, y) basically means that in the case of a mini-


mization problem, we measure how many times the objective value of the
computed solution exceeds the objective value of an optimum solution. In
the case of a maximization problem, we do the same but we take the recip-
rocal. This may seem strange at a first glance but it has the advantage that
we can treat minimization and maximization problems in a uniform way.
(Be aware though that some authors use m(x, y)/ OPT(x) to measure the
approximation performance in case of maximization problems. But this is
merely a question of faith.)
Definition 2.3. 1. Let F be some set of functions N → Q. An optimiza-
tion problem P ∈ NPO is contained in the class F -APX if there is an
f ∈ F such that there exists an f -approximation algorithm for P .
¹Note that m only attains positive values. Thus, the quotient is always defined.


2. APX := O(1)-APX.
(I hope that the elegant definition above clarifies why PR was defined
for maximization problems as it is.) There is a well-known 2-approximation
algorithm for ∆-TSP that is based on minimum spanning trees, thus

∆-TSP ∈ APX.
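The 2-approximation referred to above is the classical tree-doubling algorithm: compute a minimum spanning tree and shortcut a preorder walk of it into a tour; by the triangle inequality the tour costs at most twice the MST weight, which in turn is at most OPT. A minimal sketch, assuming the weights are given as a symmetric matrix (Prim's algorithm plus a preorder shortcut; all names are illustrative):

    def double_tree_tsp(w):
        """2-approximation for metric TSP.  w is a symmetric n x n matrix of
        positive edge weights obeying the triangle inequality; returns a tour
        as a list of the n vertices."""
        n = len(w)
        # Prim's algorithm: grow a minimum spanning tree from vertex 0.
        in_tree = [False] * n
        parent = [0] * n
        best = [float("inf")] * n
        best[0] = 0
        for _ in range(n):
            u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
            in_tree[u] = True
            for v in range(n):
                if not in_tree[v] and w[u][v] < best[v]:
                    best[v], parent[v] = w[u][v], u
        children = [[] for _ in range(n)]
        for v in range(1, n):
            children[parent[v]].append(v)
        # Preorder walk of the tree; visiting each vertex once is the shortcut step.
        tour, stack = [], [0]
        while stack:
            u = stack.pop()
            tour.append(u)
            stack.extend(reversed(children[u]))
        return tour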

Even stronger is the concept of a polynomial time approximation scheme.


Definition 2.4. A deterministic Turing machine A is a polynomial time approximation scheme (PTAS) for an optimization problem P = (I, S, m, goal) if on input ⟨x, ε⟩ for all small enough ε > 0,
1. the running time of A is polynomial in the size of x (but not necessarily in 1/ε), and
2. A(x, ε) is a feasible solution for x with performance ratio 1 + ε.
We do not have to distinguish between minimization and maximization problems. If a solution y has performance ratio 1 + ε in the case of a maximization problem, then we know that m(x, y) ≥ OPT(x)/(1 + ε). We have

1/(1 + ε) = 1 − ε/(1 + ε) ≥ 1 − ε,

which is exactly what we want.
Definition 2.5. PTAS is the class of all problems in NPO that have a PTAS.
We have
PO ⊆ PTAS ⊆ APX.
If P ≠ NP, then both inclusions are strict. Under this assumption, a problem in APX \ PTAS is Maximum Satisfiability (see the next chapters for a proof), and a problem in PTAS \ PO is Knapsack (solve the next exercise for a proof).
Problem 2.6. Knapsack is the following problem:
Instances: rational numbers w1 , . . . , wn (weights), p1 , . . . , pn (profits), and B (capacity bound) such that wν ≤ B for all ν
Solutions: I ⊆ {1, . . . , n} such that Σ_{i∈I} wi ≤ B
Measure: Σ_{i∈I} pi , the total profit of the items packed
Goal: max
We may assume w.l.o.g. that all the pν are natural numbers. If this is not
the case, assume that pν = xν /yν with gcd(xν , yν ) = 1. Let Y = y1 · · · yn .
We now replace pν by pν · Y ∈ N. Any knapsack that maximizes the old
objective function also maximizes the new one. The size of the instance is
only polynomially larger. (Note that we encode all inputs in binary.)

Exercise 2.1. 1. Show that there is an algorithm for Knapsack with run-
ning time polynomial in n and P := max1≤ν≤n pν . (Compute by dy-
namic programming sets of indices I(i, p) such that

• ν ≤ i for all ν ∈ I(i, p),


• the sum of the pν with ν ∈ I(i, p) is exactly p, and
• the sum of all wν with ν ∈ I(i, p) is minimum among all such sets of indices.)

2. Show that we get a PTAS out of this pseudopolynomial algorithm as


follows:

• Let S = εP/n and p̂ν = ⌊pν /S⌋ for 1 ≤ ν ≤ n.


• Find an optimum solution for the instance w1 , . . . , wn , p̂1 , . . . , p̂n ,
and B.
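A compact sketch of both parts of the exercise under the stated rounding (names and the exact data layout are illustrative, not the only way to organize the dynamic program):

    def knapsack_by_profit(w, p, B):
        """Exact solver with running time polynomial in n and P = max profit
        (part 1): for every achievable total profit, keep a minimum-weight index set."""
        best = {0: (0, frozenset())}               # total profit -> (min weight, items)
        for i in range(len(w)):
            for profit, (weight, items) in list(best.items()):
                if weight + w[i] <= B:
                    cand = (weight + w[i], items | {i})
                    if profit + p[i] not in best or cand[0] < best[profit + p[i]][0]:
                        best[profit + p[i]] = cand
        return best[max(best)][1]

    def knapsack_ptas(w, p, B, eps):
        """PTAS via scaling (part 2): round profits down to multiples of S = eps*P/n
        and solve the rounded instance exactly."""
        n, P = len(p), max(p)
        S = eps * P / n
        p_hat = [int(pi // S) for pi in p]
        return knapsack_by_profit(w, p_hat, B)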

Remark 2.7. The running time of the PTAS constructed in the previous exercise is also polynomial in 1/ε. Such a scheme is called a fully polynomial time approximation scheme (FPTAS). The corresponding complexity class is denoted by FPTAS.

Exercise 2.2. A super fully polynomial time approximation scheme is a PTAS whose running time is polynomial in log(1/ε). Show that if Knapsack has a super fully polynomial time approximation scheme, then P = NP.

2.1 Gap problems


A promise problem is a tuple of languages Q = (L, U ) with L ⊆ U . (Think of U as the universe of admissible inputs.) A Turing machine M decides a promise problem if for all x ∈ U , M (x) = 1 if x ∈ L and M (x) = 0 if x ∈ U \ L. On inputs not in U , M may output whatever it wants. Since we do not have to care about the behaviour of M on inputs not in U , we can also think that we get an input with the additional promise that it is in U . The elements in L are often called yes-instances, the elements in U \ L are called no-instances, and the elements not in U are called don't care-instances. "Ordinary" decision problems are a special case of promise problems; we just set U = {0, 1}∗ .
Many-one reductions can be extended to promise problems in a natural way. Let Q = (L, U ) and Q′ = (L′ , U ′ ) be two promise problems. Q is polynomial time many-one reducible to Q′ if there is a polynomial time computable function f such that

x ∈ L =⇒ f (x) ∈ L′ and
x ∈ U \ L =⇒ f (x) ∈ U ′ \ L′ .

That means that yes-instances are mapped to yes-instances and no-instances


are mapped to no-instances. A promise problem Q is C-hard for some class
C of decision or promise problems, if every problem in C is polynomial time
many-one reducible to Q.

Definition 2.8. Let P = (IP , SP , mP , goal) be an optimization problem and a < b. gap(a, b)-P is the promise problem (L, U ) where

U = {x | OPT(x) ≤ a or OPT(x) ≥ b}

and

L = {x | OPT(x) ≥ b}   if goal = max,
L = {x | OPT(x) ≤ a}   if goal = min.

That is, we get an instance x and the promise that the objective value is at most a or at least b, and we shall decide which of these two options is the case. There is a difference in the definition of L for maximization and minimization problems because the yes-instances shall be the inputs with solutions that have a "good" objective value. We will also allow a and b to be functions N → N that depend on |x|.

Theorem 2.9. If gap(a, b)-P is NP-hard for polynomial time computable


functions a and b with input given in unary and output given in binary,
then there is no α-approximation algorithm for P with α < b/a, unless
P = NP.

Proof. Suppose on the contrary that such an algorithm A exists. We


only show the case goalP = min, the other case is treated similarly. Since
gap(a, b)-P is NP-hard, there is a polynomial time many-one reduction f
from SAT to gap(a, b)-P . We design a polynomial time algorithm for SAT as
follows:
Input: formula φ in CNF

1. Compute x = f (φ) and y = A(x).

2. If mP (x, y) < b(|x|), then accept, else reject.

Let us see why this algorithm is correct. If φ ∈ SAT, then OPTP (x) ≤ a(|x|) and

mP (x, y) ≤ α(|x|) · OPTP (x) < b(|x|).

If φ ∉ SAT, then OPTP (x) ≥ b(|x|) and

mP (x, y) ≥ OPTP (x) ≥ b(|x|).

Thus the algorithm works correctly. It is obviously polynomial time. Therefore, P = NP.

In Exercise 1.1, we have seen that there is no polynomial p such that TSP can be approximated within 2^{p(n)} , where n is the number of nodes of the given graph (unless P = NP). However, note that n is not the size of the instance; the size is O(p(n)n²). Thus gap(n, 2^{n^{1−ε}} )-TSP is NP-hard.
Since TSP ∈ NPO, we get the following result.
Since TSP ∈ NPO, we get the following result.

Theorem 2.10. If P ≠ NP, then APX ⊊ NPO.

We can always approximate TSP within 2^{O(|x|)} , where x is the given instance, since with |x| symbols we can encode integers up to 2^{O(|x|)} . Thus TSP is contained in the class exp-APX, as defined below.

Definition 2.11. exp-APX = {2^p | p is a polynomial}-APX.

Thus the theorem above can be strengthened to the following statement.

Theorem 2.12. If P ≠ NP, then APX ⊊ exp-APX.

Exercise 2.3. What is the difference between exp-APX and NPO?

2.2 Approximation preserving reductions and hardness


Let P and P ′ be two optimization problems. If P is reducible to P ′ (in some sense to be defined), then we would like to turn approximate solutions of P ′ back into approximate solutions of P . That is, we do not only need a function that maps instances of P to instances of P ′ ; we also need to transfer solutions of P ′ back to solutions of P , like we did for #P functions. Many of the reductions between NP-complete problems also give this second function for free. What they usually do not do, however, is preserve approximation factors.

Problem 2.13. Maximum Clique (Clique) is the following problem:


Instances: graph G = (V, E)
Solutions: all cliques of G, i.e., all C ⊆ V such that for all u, v ∈ C with u ≠ v, {u, v} ∈ E
Measure: #C, the size of the clique
Goal: max

Problem 2.14. Vertex Cover (VC) is the following problem:


Instances: graph G = (V, E)
Solutions: all subsets C of V such that for each {u, v} ∈ E, C ∩ {u, v} ≠ ∅

Measure: #C
Goal: min
Exercise 2.4. There is an easy reduction Cliquedec ≤P VCdec that simply
maps G to its complement.

1. How does one get a clique of G from a vertex cover of the complement?

2. Assume we have a vertex cover that is a 2-approximation. What ap-


proximation do we get for Clique from this?

Definition 2.15. Let P, P ′ ∈ NPO. P is reducible to P ′ by an approximation preserving reduction (short: P is AP-reducible to P ′ or, even shorter, P ≤AP P ′ ) if there are two functions f, g : {0, 1}∗ × Q+ → {0, 1}∗ and an α ≥ 1 such that

1. for all x ∈ IP and β > 1, f (x, β) ∈ IP ′ ,

2. for all x ∈ IP and β > 1, if SP (x) ≠ ∅ then SP ′ (f (x, β)) ≠ ∅,

3. for all x ∈ IP , y ∈ SP ′ (f (x, β)), and β > 1,

g(x, y, β) ∈ SP (x),

4. f and g are deterministically polynomial time computable for fixed β > 1,

5. for all x ∈ IP and all y ∈ SP ′ (f (x, β)), if y is a β-approximate solution of f (x, β), then g(x, y, β) is a (1 + α(β − 1))-approximate solution of x.

(f, g, α) is called an AP-reduction from P to P ′ .²

Lemma 2.16. If P ≤AP P ′ and P ′ ∈ APX, then P ∈ APX.

Proof. Let (f, g, α) be an AP-reduction from P to P ′ and let A′ be a β-approximation algorithm for P ′ . Given x ∈ IP , A(x) := g(x, A′ (f (x, β)), β) is a (1 + α(β − 1))-approximate solution for x. This follows directly from the definition of AP-reduction. Furthermore, A is polynomial time computable.

Exercise 2.5. Let P ≤AP P ′ . Show that if P ′ ∈ PTAS, then so is P .

The reduction in Exercise 2.4 is not an AP-reduction. This has a deeper reason. While there is a 2-approximation algorithm for VC, Clique is much harder to approximate. Håstad [Hås99] shows that any approximation algorithm with performance ratio n^{1−ε0} for some ε0 > 0 would imply ZPP = NP (which is almost as unlikely as P = NP).

²The functions f, g depend on the quality β of the solution y. I am only aware of one example where this dependence seems to be necessary, so usually, f and g will not depend on β.

Problem 2.17. Maximum Independent Set (IS) is the following problem:


Instances: graph G = (V, E)
Solutions: independent sets of G, i.e., all S ⊆ V such that for all u, v ∈ S with u ≠ v, {u, v} ∉ E
Measure: #S
Goal: max

Exercise 2.6. Essentially the same idea as in Exercise 2.4 gives a reduction
from Clique to IS. Show that this is an AP-reduction.
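A sketch of the reduction idea behind this exercise, with graphs given as vertex and edge sets (an illustrative representation): complementing the graph and returning the solution unchanged preserves the objective value exactly, which is what makes the reduction approximation preserving (α = 1).

    def complement_graph(vertices, edges):
        """f: map a Clique instance to an IS instance by taking the complement."""
        all_pairs = {frozenset((u, v)) for u in vertices for v in vertices if u != v}
        return vertices, all_pairs - {frozenset(e) for e in edges}

    def solution_back(independent_set):
        """g: an independent set of the complement graph is a clique of G, unchanged."""
        return independent_set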

Definition 2.18. Let C ⊆ NPO. A problem P is C-hard (under AP-reductions) if for all P ′ ∈ C, P ′ ≤AP P . P is C-complete if it is in C and C-hard.

Lemma 2.19. ≤AP is transitive.

Proof. Let P ≤AP P ′ and P ′ ≤AP P ″ . Let (f, g, α) and (f ′ , g ′ , α′ ) be the corresponding reductions. Let γ = 1 + α′ (β − 1). We claim that (F, G, αα′ ) is an AP-reduction from P to P ″ where

F (x, β) = f ′ (f (x, γ), β),
G(x, y, β) = g(x, g ′ (f (x, γ), y, β), γ).

We verify that (F, G, αα′ ) is indeed an AP-reduction by checking the five conditions in Definition 2.15:

1. Obvious.

2. Obvious, too.

3. Almost obvious, thus we give a proof. Let x ∈ IP and y ∈ SP ″ (F (x, β)). We know that g ′ (f (x, γ), y, β) ∈ SP ′ (f (x, γ)), since (f ′ , g ′ , α′ ) is an AP-reduction. But then also g(x, g ′ (f (x, γ), y, β), γ) ∈ SP (x), since (f, g, α) is an AP-reduction.

4. Obvious.

5. Finally, if y is a β-approximation to f ′ (f (x, γ), β), then g ′ (f (x, γ), y, β) is a (1 + α′ (β − 1))-approximation to f (x, γ). But then g(x, g ′ (f (x, γ), y, β), γ) is a (1 + αα′ (β − 1))-approximation to x, as

1 + α(1 + α′ (β − 1) − 1) = 1 + αα′ (β − 1).

Lemma 2.20. Let C ⊆ NPO. If P ≤AP P ′ and P is C-hard, then P ′ is also C-hard.

Proof. Let Q ∈ C be arbitrary. Since P is C-hard, Q ≤AP P . Since ≤AP is transitive, Q ≤AP P ′ .

Thus once we have identified one APX-hard problem, we can prove the APX-hardness of further problems using AP-reductions. A canonical candidate is of course the following problem:
Problem 2.21 (Max-SAT). The Maximum Satisfiability problem (Max-SAT)
is defined as follows:
Instances: formulas in CNF
Solutions: Boolean assignments to the variables
Measure: the number of clauses satisfied
Goal: max
Proposition 2.22. Max-SAT is APX-hard.
The proof of this proposition is very deep; we will spend the next few weeks on it.
Exercise 2.7. Give a simple 2-approximation algorithm for Max-SAT.
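One possible approach to this exercise (a sketch, not necessarily the intended solution): every clause of a CNF formula is satisfied by the all-true assignment or by the all-false assignment, so the better of the two satisfies at least half of the clauses and hence at least OPT/2 of them.

    def satisfied(clauses, assignment):
        """Count satisfied clauses; a clause is a list of literals, literal +i / -i
        meaning variable i is true / false; assignment maps i -> bool."""
        return sum(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

    def max_sat_2_approx(clauses, variables):
        a = {i: True for i in variables}    # all-true assignment
        b = {i: False for i in variables}   # all-false assignment
        return a if satisfied(clauses, a) >= satisfied(clauses, b) else b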

2.3 Further exercises


Here is an NPO-complete problem.
Problem 2.23. Maximum Weighted Satisfiability is the following problem:
Instances: Boolean formula φ with variables x1 , . . . , xn having nonnegative weights w1 , . . . , wn
Solutions: Boolean assignments α : {x1 , . . . , xn } → {0, 1} that satisfy φ
Measure: max{1, Σ_{i=1}^n wi α(xi )}
Goal: max
Exercise 2.8. 1. Show that every maximization problem in NPO is AP-
reducible to Maximum Weighted Satisfiability. (Hint: Construct an
NP-machine that guesses a solution y to input x and computes m(x, y).
Use a variant of the proof of the Cook-Karp-Levin Theorem to produce
an appropriate formula in CNF. Assign only nonzero weights to vari-
ables that contain the bits of m(x, y).)

2. Show that every minimization problem in NPO is AP-reducible to Min-


imum Weighted Satisfiability.

3. Show that Maximum Weighted Satisfiability is AP-reducible to Mini-


mum Weighted Satisfiability and vice versa.

4. Conclude that Maximum (Minimum) Weighted Satisfiability is NPO-complete.

The world of optimization classes

PO ⊆ PTAS ⊆ APX ⊆ exp-APX ⊆ NPO


All of these inclusions are strict, provided that P ≠ NP. Under this assumption, we have for instance

• Knapsack ∈ PTAS \ PO

• TSP ∈ exp-APX \ APX

• Weighted Satisfiability ∈ NPO \ exp-APX.

The goal of the next chapters is to prove that Max-SAT is in APX \ PTAS provided that P ≠ NP.
3 Probabilistically checkable proofs
and inapproximability

3.1 Probabilistically checkable proofs (PCPs)


3.1.1 Probabilistic verifiers
A polynomial time probabilistic verifier is a polynomial time probabilistic Turing machine that has oracle access to a proof π ∈ {0, 1}∗ in the following way: The proof π induces a function {0, 1}^{log(|π|)} → {0, 1} by mapping b ∈ {0, 1}^{log(|π|)} to the bit of π that stands in the position encoded by the binary representation b. By abuse of notation, we will call this function again π. If the verifier queries a bit outside the range of π, then the answer will be 0.
A verifier described above may query π several times and each query
may depend on previous queries. Such a behavior is called adaptive. We
need a more restricted kind of verifiers, called nonadaptive: A nonadaptive
verifier gets the proof π again as an oracle, but in a slightly different form:
The verifier can write down several positions of π at one time. If it enters
the query state, it gets the values of all the positions that it queries. But the
verifier may enter the query state only once, i.e., the verifier has to decide
in advance which bits it wants to query.
A nonadaptive probabilistic verifier is called (r(n), q(n))-restricted if it uses r(n) bits of randomness and queries q(n) bits of π for all n and all inputs x of length n.

Definition 3.1. Let r, q : N → N. A language L belongs to the class PCP[r, q] if there exists an (r, q)-restricted nonadaptive polynomial time probabilistic verifier V such that the following holds:

1. For any x ∈ L, there is a proof π such that

Pr_y [V^π (x, y) = 1] = 1.

2. For any x ∉ L and for all proofs π,

Pr_y [V^π (x, y) = 0] ≥ 1/2.

The probabilities are taken over the random strings y.


In other words, if x is in L, then there is a proof π that convinces the


verifier regardless of the random string y. If x is not in L, then the verifier
will detect a “wrong” proof with probability at least 1/2, that is, for half of
the random strings.
Since the verifier is r(n)-restricted, there are only 2^{r(n)} (relevant) random strings. For any fixed random string, the verifier queries at most q(n) bits of the proof. Therefore, for an input x of length n, we only have to consider proofs of length q(n)2^{r(n)} , since the verifier cannot query more bits than that.

3.1.2 A different characterization of NP


Once we have defined the PCP classes, the obvious question is: What is this
good for and how is it related to other classes? While complexity theorists
also like to answer the second part of the question without knowing an
answer to the first part, here the answer to the second part also gives the
answer to the first part.
Let R and Q denote sets of functions N → N. We generalize the notion of PCP[r, q] in the obvious way:

PCP[R, Q] = ∪_{r∈R, q∈Q} PCP[r, q].

The characterization of NP by polynomial time verifiers immediately yields


the following result.

Proposition 3.2. NP = PCP[0, poly(n)].

In the theorem above, we do not use the randomness at all. The next result, the celebrated PCP theorem [ALM+ 98], shows that allowing a little bit of randomness reduces the number of queries dramatically.

Theorem 3.3 (PCP Theorem). NP = PCP[O(log n), O(1)].

What does this mean? By allowing a little randomness—note that O(log n) random bits are barely sufficient to choose O(1) bits of the proof at random—and a bounded probability of failure, we can check the proof π by just reading a constant number of bits of π! This is really astonishing.

Exercise 3.1. Show that PCP[O(log n), O(1)] ⊆ NP. (Hint: How many
random strings are there?)

The other direction is way more complicated, we will spend the next few
lectures with its proof. We will not present the original proof by Arora et
al. [ALM+ 98] but a recent and—at least compared to the first one—elegant
proof by Irit Dinur [Din07].

3.2 PCPs and gap problems

The PCP theorem is usually used to prove hardness of approximation results. Dinur's proof goes the other way around: we show that the statement of the PCP theorem is equivalent to the NP-hardness of some gap problem.

Theorem 3.4. The following two statements are equivalent:

1. NP = PCP[O(log n), O(1)].

2. There is an ε > 0 such that gap(1 − ε, 1)-Max-3-SAT is NP-hard.¹

¹Instead of stating the absolute bounds (1 − ε)m and m, where m is the number of clauses of the given instance, we just state the relative bounds 1 − ε and 1. This is very convenient here, since there is an easy upper bound on the objective value, namely m.

Proof. "=⇒": Let L be any NP-complete language. By assumption, there is an (r(n), q)-restricted nonadaptive polynomial time probabilistic verifier V with r(n) = O(log n) and q = O(1). We can assume that V always queries exactly q bits.
Let x be an input for L of length n. We will construct a formula φ in 3-CNF in polynomial time such that if x ∈ L, then φ is satisfiable, and if x ∉ L, then every assignment can satisfy at most a fraction of 1 − ε of the clauses for some fixed ε > 0.
For each position i in the proof, there will be one Boolean variable vi . If vi is set to 1, this will mean that the corresponding ith bit is 1; if it is set to zero, then this bit is 0. Since we can restrict ourselves to proofs of length ≤ q · 2^{r(n)} = poly(n), the number of these variables is polynomial.
For a random string y, let i(y, 1), . . . , i(y, q) denote the positions of the bits that the verifier will query. (Note that the verifier is nonadaptive, hence these positions can only depend on y.) Let Ay be the set of all q-tuples (b1 , . . . , bq ) ∈ {0, 1}^q such that if the i(y, j)th bit of the proof is bj for 1 ≤ j ≤ q, then the verifier will reject (with random string y).
For each tuple (b1 , . . . , bq ) ∈ Ay , we construct a clause of q literals that is true iff the variables v_{i(y,1)} , . . . , v_{i(y,q)} do not take the values b1 , . . . , bq , i.e.,

v_{i(y,1)}^{1−b1} ∨ · · · ∨ v_{i(y,q)}^{1−bq} .

(Here, for a Boolean variable v, v^1 = v and v^0 = v̄.) The formula φ has ≤ |Ay | · 2^{r(n)} ≤ 2^{q+r(n)} = poly(n) many clauses. These clauses have length q. Like in the reduction of SAT to 3SAT, for each such clause c, there are q − 2 clauses c1 , . . . , c_{q−2} of length three in the variables of c and some additional variables such that any assignment that satisfies c can be extended to an assignment that satisfies c1 , . . . , c_{q−2} and, conversely, the restriction of any assignment that satisfies c1 , . . . , c_{q−2} satisfies c, too.
This replacement can be computed in polynomial time.
The formula φ can be computed in polynomial time: We enumerate all (polynomially many) random strings. For each such string y, we simulate the verifier V to find out which bits it will query. Then we can give it all the possible answers to the bits it queried to compute the sets Ay .
If x ∈ L, then there will be a proof π such that V π (x, y) = 1 for every
random string y. Therefore, if we set the variables of φ as given by this
proof π, then φ will be satisfied.
If x ∉ L, then for any proof π, there are at least 2^{r(n)} /2 random strings y for which V^π (x, y) = 0. For each such y, one clause corresponding to a tuple in Ay will not be satisfied. In other words, for any assignment, 2^{r(n)} /2 clauses will not be satisfied. The total number of clauses is bounded by (q − 2)2^{q+r(n)} . The fraction of unsatisfied clauses therefore is at least

(2^{r(n)} /2) / ((q − 2)2^{q+r(n)}) ≥ 2^{−q−1} /(q − 2),
which is a constant.
"⇐=": By Exercise 3.1, it suffices to show that NP ⊆ PCP[O(log n), O(1)]. Let L ∈ NP. By assumption, there is a polynomial time computable function f such that

x ∈ L =⇒ f (x) is a satisfiable formula in 3-CNF,
x ∉ L =⇒ f (x) is a formula in 3-CNF such that every assignment satisfies at most a fraction of 1 − ε of the clauses.

We construct a probabilistic verifier as follows:
Input: input x, proof π
1. Compute f (x).
2. Randomly select a clause c from f (x).
3. Interpret π as an assignment to f (x) and read the bits that belong to the variables in c.
4. Accept if the selected clause c is satisfied. Reject otherwise.

Let m be the number of clauses of f (x). To select a clause at random, the verifier reads ⌈log m⌉ random bits and interprets them as a number. If it "selects" a nonexisting clause, then it will accept. So we can think of m being a power of two at the expense of replacing ε by ε/2.
Now assume x ∈ L. Then f (x) is satisfiable and therefore, there is a proof that will make the verifier always accept, namely a satisfying assignment of f (x). If x ∉ L, then no assignment will satisfy more than a fraction of 1 − ε of the clauses. In particular, the probability that the verifier selects a clause that is satisfied is at most 1 − ε. By repeating this process a constant number of times, we can bring the error probability down to 1/2.
Since f (x) is in 3-CNF, the verifier needs O(log m) = O(log |x|) random bits, and it only queries O(1) bits of the proof.

Exercise 3.2. Let c be a clause of length q. Construct clauses c1 , . . . , cq−2 of


length three in the variables of c and some additional variables such that any
assignment that satisfies c can be extended to an assignment that satisfies
c1 , . . . , cq−2 and conversely, the restriction of any assignment that satisfies
c1 , . . . , cq−2 satisfies c, too.
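For reference, a sketch of the standard splitting gadget that the exercise asks for (literals are signed integers; `fresh()` is an assumed supply of new auxiliary variables, not something defined in the text):

    def split_clause(literals, fresh):
        """Split a clause of length q >= 4 into q-2 clauses of length 3 using
        q-3 fresh auxiliary variables; negation of a literal l is -l."""
        q = len(literals)
        if q <= 3:
            return [list(literals)]
        z = [fresh() for _ in range(q - 3)]
        clauses = [[literals[0], literals[1], z[0]]]
        for i in range(2, q - 2):
            clauses.append([-z[i - 2], literals[i], z[i - 1]])
        clauses.append([-z[q - 4], literals[q - 2], literals[q - 1]])
        return clauses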

Note that we get an explicit value for ε in terms of q. Thus, in order to get good nonapproximability results from the PCP theorem, we want q to be as small as possible.

3.3 Further exercises


Exercise 3.3. Show that PCP[O(log n), 2] = P.

It can be shown—tadah!—that three queries are enough to capture NP;


however, it is not possible to get error probability 1/2 and one-sided error,
see [GLST98] for further discussions.
A Max-3-SAT is APX-hard

In this chapter, we will strengthen the result of the previous one by showing
that Max-3-SAT is in fact APX-hard. We do this in several steps. First, we
show that any maximization problem in APX is AP-reducible to Max-3-SAT.
Second, we show that for every minimization problem P , there is a maxi-
mization problem P 0 such that P ≤AP P 0 . This will conclude the proof.
Our proof of the PCP-Theorem will also yield the following variant,
which we will use in the following.

Theorem A.1 (PCP-Theorem'). There are ε > 0 and polynomial time computable functions fPCP and gPCP such that for every formula ψ in 3-CNF:

1. fPCP (ψ) is a formula in 3-CNF,

2. if ψ is satisfiable, so is fPCP (ψ),

3. if ψ is not satisfiable, then any assignment can satisfy at most a fraction of 1 − ε of the clauses in fPCP (ψ),

4. if a is an assignment for fPCP (ψ) that satisfies more than a fraction of 1 − ε of the clauses, then gPCP (ψ, a) is an assignment that satisfies ψ.

Theorem A.2. Let P = (IP , SP , mP , max) be a maximization problem in


APX. Then P ≤AP Max-3-SAT.

Proof. Our goal is to construct an AP-reduction (f, g, α) from P to Max-3-SAT. Let fPCP and gPCP be the functions constructed in Theorem A.1 and let ε be the corresponding constant. Let A be a b-approximation algorithm for P . Let

α = 2(b log b + b − 1) · (1 + ε)/ε .

Our goal is to define the functions f and g given β. Let r = 1 + α(β − 1). If r < b, then

β = (r − 1)/α + 1 = ε/(2(1 + ε)) · (r − 1)/(b log b + b − 1) + 1 < ε/(2k(1 + ε)) + 1    (A.1)

where k = ⌈log_r b⌉. The last inequality follows from

k ≤ log b/log r + 1 ≤ r log b/(r − 1) + 1 ≤ (b log b + b − 1)/(r − 1).


Let µ(x) = mP (x, A(x)). Since A is a b-approximation algorithm, µ(x) ≤ OPTP (x) ≤ bµ(x).
The following Turing machine computes f :
Input: x ∈ {0, 1}∗ , β ∈ Q+

1. Construct formulas φ_{x,i} in 3-CNF that are satisfiable iff OPTP (x) ≥ i. (These formulas φ_{x,i} can be uniformly constructed in polynomial time, cf. the proof of Cook's theorem.)

2. Let ψ_{x,κ} = fPCP (φ_{x,µ(x)r^κ} ) for 1 ≤ κ ≤ k. By padding with dummy clauses, we may assume that all the ψ_{x,κ} have the same number of clauses c.

3. Return ψx = ⋁_{κ=1}^{k} ψ_{x,κ} .

The function g is computed as follows:

Input: x ∈ {0, 1}∗ , assignment a with performance ratio β

1. If b ≤ 1 + α(β − 1), then return A(x).¹

2. Else let κ0 be the largest κ such that gPCP (φ_{x,µ(x)r^κ} , a) satisfies φ_{x,µ(x)r^κ} . (We restrict a to the variables of ψ_{x,κ} = fPCP (φ_{x,µ(x)r^κ} ).)

3. This satisfying assignment corresponds to a feasible solution y with mP (x, y) ≥ µ(x)r^{κ0} . Return y.

If b ≤ 1 + α(β − 1), then we return A(x). This is a b-approximation by assumption. Since b ≤ 1 + α(β − 1), we are done.
Therefore, assume that b > 1 + α(β − 1). We have

OPTMax-3-SAT (ψx ) − mMax-3-SAT (ψx , a) ≤ (β − 1)/β · OPTMax-3-SAT (ψx ) ≤ kc(β − 1)/β .

Let βκ denote the performance ratio of a with respect to ψ_{x,κ} , i.e., we view a as an assignment of ψ_{x,κ} . We have

OPTMax-3-SAT (ψx ) − mMax-3-SAT (ψx , a) ≥ OPTMax-3-SAT (ψ_{x,κ} ) − mMax-3-SAT (ψ_{x,κ} , a)
  = (βκ − 1)/βκ · OPTMax-3-SAT (ψ_{x,κ} )
  ≥ c/2 · (βκ − 1)/βκ .

¹Here is the promised dependence on β.

The last inequality follows from the fact that any formula in CNF has an assignment that satisfies at least half of the clauses. This yields

c/2 · (βκ − 1)/βκ ≤ kc(β − 1)/β

and finally

βκ ≤ 1 / (1 − 2k(β − 1)/β) .

Exploiting (A.1), we get, after some routine calculations,

βκ ≤ 1 + ε.

This means that a satisfies at least a fraction of 1/βκ ≥ 1 − ε of the clauses of ψ_{x,κ} . Then gPCP (φ_{x,µ(x)r^κ} , a) satisfies φ_{x,µ(x)r^κ} if and only if φ_{x,µ(x)r^κ} is satisfiable. This is equivalent to the fact that OPTP (x) ≥ µ(x)r^κ . By the definition of κ0 ,

µ(x)r^{κ0 +1} > OPTP (x) ≥ µ(x)r^{κ0} .

This means that mP (x, y) ≥ µ(x)r^{κ0} . But then y is an r-approximate solution. Then we are done, since r = 1 + α(β − 1) by definition.

Theorem A.3. For every minimization problem P ∈ APX, there is a maximization problem P ′ ∈ APX such that P ≤AP P ′ .

Proof. Let A be a b-approximation algorithm for P . Let µ(x) = mP (x, A(x)) for all x ∈ IP . Then µ(x) ≤ b OPTP (x). P ′ has the same instances and feasible solutions as P . The objective function is however different:

mP ′ (x, y) = (k + 1)µ(x) − k mP (x, y)   if mP (x, y) ≤ µ(x),
mP ′ (x, y) = µ(x)                        otherwise,

where k = ⌈b⌉. We have µ(x) ≤ OPTP ′ (x) ≤ (k + 1)µ(x). This means that A is a (k + 1)-approximation algorithm for P ′ . Hence, P ′ ∈ APX.
The AP-reduction (f, g, α) from P to P ′ is defined as follows: f (x, β) = x for all x ∈ IP . (Note that we do not need any dependence on β here.) Next, we set

g(x, y, β) = y      if mP (x, y) ≤ µ(x),
g(x, y, β) = A(x)   otherwise.

And finally, α = k + 1.
Let y be a β-approximate solution to x under mP ′ , that is, RP ′ (x, y) = OPTP ′ (x)/ mP ′ (x, y) ≤ β. We have to show that RP (x, y) ≤ 1 + α(β − 1).

We distinguish two cases: The first one is mP (x, y) ≤ µ(x). In this case,

mP (x, y) = ((k + 1)µ(x) − mP ′ (x, y)) / k
          ≤ ((k + 1)µ(x) − OPTP ′ (x)/β) / k
          ≤ ((k + 1)µ(x) − (1 − (β − 1)) OPTP ′ (x)) / k
          ≤ OPTP (x) + (β − 1)/k · OPTP ′ (x)
          ≤ OPTP (x) + (β − 1)/k · (k + 1)µ(x)
          ≤ OPTP (x) + (β − 1)(k + 1)µ(x)/k
          ≤ (1 + α(β − 1)) OPTP (x).

This completes the first case.

For the second case, note that

mP (x, g(x, y, β)) = mP (x, A(x)) ≤ b OPTP (x) ≤ (1 + α(β − 1)) OPTP (x).

Thus, P ≤AP P ′ .
Now Theorems A.2 and A.3 imply the following result.

Theorem A.4. Max-3-SAT is APX-hard.

A.1 Further exercises


Exercise A.1. Show that Max-3-SAT ≤AP Clique. (In particular, Clique
does not have a PTAS, unless P = NP.)

The kth cartesian product of a graph G = (V, E) is a graph with nodes


V k and there is an edge between (u1 , . . . , uk ) and (v1 , . . . , vk ) if either ui = vi
or {ui , vi } ∈ E for all 1 ≤ i ≤ k.

Exercise A.2. 1. Prove that if G has a clique of size s, then Gk has a


clique of size sk .

2. Use this to show that if Clique ∈ APX, then Clique ∈ PTAS. Now
apply Exercise A.1.

Håstad [Hås99] shows that any approximation algorithm with performance ratio n^{1−ε0} for some ε0 > 0 would imply ZPP = NP. On the other hand, achieving a performance ratio of n is trivial.
4 The long code

The next chapters follow the ideas of Dinur quite closely. The Diplomarbeit
by Stefan Senitsch [Sen07] contains a polished proof with many additional
details which was very helpful in preparing the next chapters.
Let Bn denote the set of all Boolean functions {0, 1}^n → {0, 1} and Bn− the set of all functions {−1, 1}^n → {−1, 1}. These are essentially the same objects: x ↦ −2x + 1 (or x ↦ (−1)^x ) is a bijection that maps 1 to −1 and 0 to 1. Note that we identify 1 (true) with −1 and 0 (false) with 1. For our purposes, it is more convenient to work with Bn− and this interpretation of true and false.

Definition 4.1. Let x ∈ {−1, 1}n . The long code of x is the function
LCx : Bn− → {−1, 1} given by LCx (f ) = f (x) for all f ∈ Bn− .
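To make the doubly exponential length concrete, here is a tiny illustrative tabulation of LCx for n = 2: it enumerates all 2^{2^n} = 16 functions in Bn− by their truth tables and records f (x) for each (the representation by truth tables is an assumption of this sketch).

    from itertools import product

    def long_code(x, n):
        """Return the long code of x in {-1,1}^n as a dict f |-> f(x), where each
        f in B_n^- is represented by its truth table over the points of {-1,1}^n."""
        points = list(product((-1, 1), repeat=n))        # domain {-1,1}^n
        index = points.index(tuple(x))
        codeword = {}
        for table in product((-1, 1), repeat=len(points)):   # all 2^(2^n) functions
            codeword[table] = table[index]                    # LC_x(f) = f(x)
        return codeword

    lc = long_code((-1, 1), 2)
    assert len(lc) == 2 ** (2 ** 2)      # 16 positions for n = 2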

The long code was invented by Bellare, Goldreich, and Sudan [BGS98].
By ordering the functions in Bn− , we can view LCx as a vector in {−1, 1}^{2^{2^n}} , and we will tacitly switch between these two views.
The relative distance between two elements A, B ∈ {−1, 1}^{2^{2^n}} is

δ(A, B) = Pr_{f ∈Bn−} [A(f ) ≠ B(f )],

i.e., it is the probability that the vectors A and B differ at a random position.
Furthermore, we define a scalar product on {−1, 1}^{2^{2^n}} by

⟨A, B⟩ = E_{f ∈Bn−} [A(f )B(f )] = 2^{−2^n} Σ_{f ∈Bn−} A(f ) · B(f ).

Note that ⟨A, A⟩ = 1 for all A.


For a set S ⊆ {−1, 1}^n , let χS : Bn− → {−1, 1} be defined by

χS (f ) = ∏_{x∈S} f (x).

Let Vn = {A : Bn− → R}. Vn is a vector space of dimension 2^{2^n} .

Lemma 4.2. {χS | S ⊆ {−1, 1}^n } is an orthonormal basis of Vn .

Proof. Let S, T ⊆ {−1, 1}^n with S ≠ T . First,

⟨χS , χS ⟩ = 2^{−2^n} Σ_{f ∈Bn−} χS (f )^2 = 1.

Second,

⟨χS , χT ⟩ = 2^{−2^n} Σ_{f ∈Bn−} χS (f )χT (f ) = 2^{−2^n} Σ_{f ∈Bn−} ∏_{x∈S} f (x) · ∏_{x∈T} f (x) = 2^{−2^n} Σ_{f ∈Bn−} ∏_{x∈S∆T} f (x).

Choose an x ∈ S∆T . Note that such an x exists, since S ≠ T . Consider the mapping on Bn− that maps a function f to the function g with f (x) = −g(x) and f (y) = g(y) for all y ≠ x. This mapping is an involution (i.e., self-inverse) that does not have any fixed points. Such an involution separates the functions in Bn− into two sets of the same size such that for all functions f , the corresponding function g is in the other set.¹ Hence

Σ_{f ∈Bn−} ∏_{x∈S∆T} f (x) = Σ_{f ∈Bn−} f (x) ∏_{y∈S∆T \{x}} f (y) = 0 .

Thus, the χS form an orthonormal family. Since its size equals the dimension 2^{2^n} of Vn , it is also spanning.
Once we have an orthonormal family, we can look at Fourier expansions. The Fourier coefficients of a function A : Bn− → {−1, 1} are given by

ÂS = ⟨A, χS ⟩ = 2^{−2^n} Σ_{f ∈Bn−} A(f ) ∏_{x∈S} f (x).

The Fourier expansion of A is

A(f ) = Σ_{S⊆{−1,1}^n} ÂS χS (f ) = Σ_{S⊆{−1,1}^n} ÂS ∏_{x∈S} f (x).

Furthermore, Parseval's identity holds, that is,

Σ_{S⊆{−1,1}^n} ÂS^2 = ⟨A, A⟩ = 1.
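As a small sanity check on these definitions: LCa = χ{a} , so the only nonzero Fourier coefficient of a long code word is Â{a} = 1. The following numeric sketch (same truth-table representation as in the previous example, n = 2) verifies this for one point; all names are illustrative.

    from itertools import product

    def fourier_coefficient(A, points, S):
        """Â_S = 2^{-2^n} * sum_f A(f) * prod_{x in S} f(x); functions are given
        by their truth tables over `points`."""
        total = 0
        for table in product((-1, 1), repeat=len(points)):
            chi = 1
            for x in S:
                chi *= table[points.index(x)]
            total += A[table] * chi
        return total / 2 ** len(points)

    points = list(product((-1, 1), repeat=2))
    a = (-1, 1)
    A = {t: t[points.index(a)] for t in product((-1, 1), repeat=len(points))}  # LC_a
    assert fourier_coefficient(A, points, [a]) == 1.0
    assert fourier_coefficient(A, points, []) == 0.0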

In what follows, we will usually consider folded strings. A ∈ {−1, 1}^{2^{2^n}} is called folded over true if for all f , A(−f ) = −A(f ). Let ψ : {−1, 1}^n → {−1, 1}. A is called folded over ψ if for all f , A(f ) = A(f ∧ ψ). A is simply called folded if it is folded over true and over ψ. (This assumes that ψ is clear from the context.) If a string is folded, then we only need to specify it on a smaller set of positions Dψ defined as follows: Let D be a set of functions that contains exactly one function of every pair f and −f and let Dψ = {f ∈ D | f = f ∧ ψ}.
¹To achieve this, pick a function, put it into the one set and its image under the involution into the other. This is possible, since the involution has no fixed points. Repeat until all functions are put into one of the two sets. I am sorry that you had to read this, but I was puzzled.

4.1 Properties of folded strings


Lemma 4.3. If A = LCa for some a ∈ {−1, 1}n and ψ(a) = −1 then A is
folded.

Proof. Let f : {−1, 1}n → {−1, 1}. We have

A(f ) = LCa (f ) = f (a) = (f ∧ ψ)(a) = LCa (f ∧ ψ) = A(f ∧ ψ)

and
A(−f ) = LCa (−f ) = −f (a) = − LCa (f ) = −A(f ).

Lemma 4.4. Let ψ ∈ Bn− , let A ∈ Vn be folded, and let S ⊆ {−1, 1}^n . We have:

1. E_{f ∈Bn−} [A(f )] = 0.

2. If |S| is even, then ÂS = 0.

3. If there is a y ∈ S with ψ(y) = 1, then ÂS = 0.

Proof. We start with 1: Let f ∈ Bn− . Since A is folded, A(f ) + A(−f ) = 0. From this, it follows easily that the expected value is 0.
Next comes 2: We have

χS (−f ) = ∏_{x∈S} (−f (x)) = (−1)^{|S|} ∏_{x∈S} f (x) = χS (f ).

Let f ∈ Bn− . We have

A(f )χS (f ) + A(−f )χS (−f ) = (A(f ) + A(−f ))χS (f ) = 0.

Thus ÂS = E[AχS ] = 0.
Finally, we show 3: Let f ∈ Bn− and let g ∈ Bn− be the function that differs from f only at y. We have

χS (g) = ∏_{x∈S} g(x) = g(y) ∏_{x∈S\{y}} g(x) = −f (y) ∏_{x∈S\{y}} f (x) = −χS (f ).

Since A is folded and f ∧ ψ = g ∧ ψ, A(f ) = A(g). Thus A(f )χS (f ) + A(g)χS (g) = 0.
A(g)χS (g) = 0.
5 Long code tests

We will use the long code to encode assignments of a formula ψ ∈ Bn− . We will design a test T that gets a string A and tests whether A is the long code of a satisfying assignment of ψ. The test will only query three bits of A! However, the long code is simply too long . . . but this will not matter in the end.

5.1 First test

Input: folded string A : Bn− → {−1, 1}, ψ : {−1, 1}n → {−1, 1}.

1. Let τ = 1/100.

2. Choose f, g ∈ Bn− uniformly at random.

3. Define µ ∈ Bn− as follows: If f (x) = 1, then let µ(x) = −1. If f (x) = −1, then let µ(x) = 1 with probability 1 − τ and µ(x) = −1 with probability τ (independently for each x).

4. Let h = µ · g.

5. If A(f ) = A(g) = A(h) = 1, then reject. Else accept.
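A sketch that simulates one round of the test on a table A (functions are represented by their truth tables over {−1, 1}^n , as in Chapter 4); it only illustrates how f , g, µ, and h are sampled and which three positions of A are read. The representation is illustrative, not part of the text.

    import random
    from itertools import product

    def run_test_T(A, n, tau=0.01):
        """One round of test T.  A maps truth tables (tuples over {-1,1}^n)
        to {-1,1}.  Returns True if the test accepts."""
        points = list(product((-1, 1), repeat=n))
        f = tuple(random.choice((-1, 1)) for _ in points)
        g = tuple(random.choice((-1, 1)) for _ in points)
        mu = tuple(-1 if fx == 1 else (1 if random.random() < 1 - tau else -1)
                   for fx in f)
        h = tuple(mx * gx for mx, gx in zip(mu, g))
        return not (A[f] == A[g] == A[h] == 1)   # reject iff all three bits are 1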

The following lemma shows that if the test T rejects only with small probability, then A is close to the long code of a satisfying assignment of ψ or to its negation.

Lemma 5.1. There exists a constant K∗ such that the following holds: If Pr[T rejects (A, ψ)] ≤ ε for small enough ε > 0, then there is an a ∈ {−1, 1}^n with ψ(a) = −1 such that either δ(A, LCa ) < K∗ ε or δ(A, − LCa ) < K∗ ε.

Proof. T accepts iff not all of A(f ), A(g), and A(h) equal 1. This is equivalent to 1 − (1/8)(1 + A(f ))(1 + A(g))(1 + A(h)) = 1 (and not 0; the left-hand side is {0, 1}-valued). Thus

Pr[T accepts (A, ψ)] = Pr[ 1 − (1/8)(1 + A(f ))(1 + A(g))(1 + A(h)) = 1 ]
  = E[ 1 − (1/8)(1 + A(f ))(1 + A(g))(1 + A(h)) ]
  = 7/8 − (1/8)( E[A(f )] + E[A(g)] + E[A(h)]
      + E[A(f )A(g)] + E[A(f )A(h)] + E[A(g)A(h)]
      + E[A(f )A(g)A(h)] )                                    (5.1)

As A is folded, E[A(f )] = 0 by Lemma 4.4. Since f , g, and h (check the


latter!) are drawn uniformly at random,

E[A(f )] = E[A(g)] = E[A(h)] = 0.

The pairs (f, g) and (f, h) are independent ((g, h) is, however, not!), thus

E[A(f )A(g)] = E[A(f )] E[A(g)] = 0,

and in the same way E[A(f )A(h)] = 0.


Therefore, it remains to estimate E[A(g)A(h)] and E[A(f )A(g)A(h)] in (5.1). We start with E[A(g)A(h)] and will use Fourier analysis:

E[A(g)A(h)] = E[ Σ_{S,T ⊆{−1,1}^n} ÂS χS (g) ÂT χT (h) ] = Σ_{S,T} ÂS ÂT E[χS (g)χT (h)].    (5.2)

So we should analyze the terms E[χS (g)χT (h)]. We start with the case S ≠ T . Let z ∈ S \ T ; the other case is symmetric. We have

E[χS (g)χT (h)] = E[ ∏_{x∈S} g(x) ∏_{y∈T} h(y) ]
  = E[ g(z) ∏_{x∈S\{z}} g(x) ∏_{y∈T} h(y) ]
  = E[g(z)] · E[ ∏_{x∈S\{z}} g(x) ∏_{y∈T} h(y) ]
  = 0,

since g(z) and the remaining product are independent and E[g(z)] = 0, because g is random. If T = S, then

E[χS (g)χS (h)] = E[ ∏_{x∈S} g(x)h(x) ]
  = E[ ∏_{x∈S} g^2(x)µ(x) ]
  = E[ ∏_{x∈S} µ(x) ]
  = ∏_{x∈S} E[µ(x)]
  = ∏_{x∈S} ( Pr[µ(x) = 1] − Pr[µ(x) = −1] )
  = ∏_{x∈S} ( (1/2)(1 − τ ) − (1/2)(1 + τ ) )
  = ∏_{x∈S} (−τ )
  = (−τ )^{|S|} ,                                   (5.3)


Above, we used g 2 (x) = 1 for all x and the independence of µ(x) and µ(y)
for x 6= y. If we plug everything into (5.2), we get
X
E[A(g)A(h)] = Â2S (−τ )|S| .
S⊆{−1,1}n

Because Â∅ = 0 by Lemma 4.4,


X
|E[A(g)A(h)]| ≤ τ Â2s = τ, (5.4)
S⊆{−1,1}n

where the last inequality follows from Parceval’s identity.


Next comes E[A(f )A(g)A(h)] =: W . Like before, it can be shown that

W = Σ_{R⊆S⊆{−1,1}^n} ÂS^2 ÂR E[χS (µ)χR (f )],

see Exercise 5.1. Now,


 
E[χS (µ)χR (f )] = E[ ∏_{x∈R} f (x)µ(x) ∏_{y∈S\R} µ(y) ]
  = ∏_{x∈R} ( Pr[f (x)µ(x) = 1] − Pr[f (x)µ(x) = −1] ) · (−τ )^{|S\R|}
  = ∏_{x∈R} ( (1/2)τ − ((1/2) + (1/2)(1 − τ )) ) · (−τ )^{|S\R|}
  = (τ − 1)^{|R|} (−τ )^{|S\R|} .



Note that E[ ∏_{y∈S\R} µ(y) ] = (−τ )^{|S\R|} has already been shown, see (5.3). Thus,

W = Σ_{R⊆S⊆{−1,1}^n} ÂS^2 ÂR (τ − 1)^{|R|} (−τ )^{|S\R|}

and
|W | ≤ Σ_{R⊆S⊆{−1,1}^n} ÂS^2 |ÂR | (1 − τ )^{|R|} τ^{|S\R|}
     ≤ Σ_S ÂS^2 Σ_{R⊆S} |ÂR | (1 − τ )^{|R|} τ^{|S\R|} .          (5.5)

By the Cauchy–Schwarz inequality,

Σ_{R⊆S} |ÂR | (1 − τ )^{|R|} τ^{|S\R|} ≤ √( Σ_{R⊆S} ÂR^2 ) · √( Σ_{R⊆S} ((1 − τ )^{|R|} τ^{|S\R|})^2 )
  ≤ √( Σ_{i=0}^{|S|} (|S| choose i) ((1 − τ )^2)^i (τ^2)^{|S|−i} )
  = √( (τ^2 + (1 − τ )^2)^{|S|} )
  ≤ (1 − τ )^{|S|/2} .

The last inequality follows by the choice of τ . Thus

|W | ≤ Σ_{S⊆{−1,1}^n} ÂS^2 (1 − τ )^{|S|/2}
     ≤ Σ_{|S|=1} ÂS^2 (1 − τ ) + Σ_{|S|≥3} ÂS^2 (1 − τ )^{|S|/2} ,

since ÂT = 0 for all T of even cardinality. We get the better bound for the first sum by analyzing (5.5), since |ÂR | ≤ 1.
Set ε = Pr[T rejects]. Then (5.1) yields −1 + τ + 8ε ≥ W . For small enough ε, this yields W < 0 and therefore we get

1 − τ − 8ε ≤ |W | ≤ (1 − ρ)(1 − τ ) + ρ(1 − τ )^{3/2}

where ρ = Σ_{|S|≥3} ÂS^2 . From this, we get

ρ ≤ 8ε / ((1 − τ )(1 − √(1 − τ ))) ≤ Kε

with K = 8 / ((1 − τ )(1 − √(1 − τ ))), which is a constant.
, which is constant.
Finally, we will apply Theorem 5.2. For small enough ε, 1 − Lρ will be greater than 0. Since Â∅ = 0, the first case in the theorem cannot happen. Hence Â{a}^2 ≥ 1 − Lρ for some a ∈ {−1, 1}^n . Thus, either Â{a} ≥ √(1 − Lρ) ≥ 1 − Lρ or −Â{a} ≥ √(1 − Lρ) ≥ 1 − Lρ. In the first case, we get

1 − Lρ ≤ Â{a} = ⟨A, χ{a} ⟩ = ⟨A, LCa ⟩,

because χ{a} (f ) = f (a) = LCa (f ). Thus

δ(A, LCa ) ≤ Lρ/2 ≤ (KL/2) · ε.

In the second case, we get δ(A, − LCa ) ≤ (KL/2) · ε in the same way. By Lemma 4.4, ψ(a) = −1, since Â{a} ≠ 0.

Here is the theorem that we used in the proof above. It is essentially the only result that we will not prove. The theorem says that whenever a function A only has small Fourier coefficients corresponding to sets S with |S| > 1, then most of the mass is concentrated in one Fourier coefficient with |S| ≤ 1.

Theorem 5.2 (Friedgut, Kalai & Naor [FKN02]). There is a constant L > 0 such that for all ρ > 0 and A : Bn− → {−1, 1} with ρ ≥ Σ_{|S|>1} ÂS^2 the following holds: Either Â∅^2 ≥ 1 − Lρ or Â{a}^2 ≥ 1 − Lρ for some a ∈ {−1, 1}^n .

Exercise 5.1. Show that W = Σ_{R⊆S⊆{−1,1}^n} ÂS^2 ÂR E[χS (µ)χR (f )].

Theorem 5.3. The long code test T has the following properties:

1. If a ∈ {−1, 1}n with ψ(a) = −1, then T accepts LCa and ψ with
probability 1.

2. There is a constant c > 0 such that for all 0 < δ ≤ 1, if A is folded


and δ(A, LCa ) ≥ δ for all a ∈ {−1, 1}n with ψ(a) = −1, then T rejects
A and ψ with probability ≥ cδ.

Proof. We first prove 1: Let a ∈ {−1, 1}^n with ψ(a) = −1 and let A = LCa . By Lemma 4.3, A is folded. If A(f ) = f (a) = −1, then T accepts. If A(f ) = f (a) = 1, then µ(a) = −1. Hence, A(h) = h(a) ≠ g(a) = A(g). Thus one of these two values equals −1 and T accepts, too.
Now we come to 2: Assume that the assertion does not hold. Then for all c > 0, there is a δc such that δ(A, LCa ) ≥ δc for all a ∈ {−1, 1}^n with ψ(a) = −1 and T rejects with probability ε < cδc .
We choose c < 1/K∗ small enough and apply Lemma 5.1. There is an a ∈ {−1, 1}^n such that ψ(a) = −1 and δ(A, LCa ) < K∗ ε ≤ cK∗ δc < δc or δ(A, − LCa ) < K∗ ε. The first possibility is ruled out by the assumption about δc . Hence δ(A, − LCa ) < K∗ ε.

We have

Pr[T rejects (− LCa , ψ)] = Pr[LCa (f ) = LCa (g) = LCa (h) = −1]
  = Pr[f (a) = g(a) = h(a) = −1]
  = Pr[f (a) = −1] · Pr[g(a) = −1] · Pr[h(a) = −1 | f (a) = g(a) = −1]
  = (1/4) Pr[µ(a) = 1 | f (a) = −1]
  = (1/4)(1 − τ ).

This implies that

ε = Pr[T rejects (A, ψ)]
  = 1 − Pr[T accepts (A, ψ)]
  ≥ 1 − Pr[T accepts (− LCa , ψ) or − LCa (f ) ≠ A(f ) or − LCa (g) ≠ A(g) or − LCa (h) ≠ A(h)]
  ≥ 1 − Pr[T accepts (− LCa , ψ)] − 3δ(A, − LCa )
  ≥ 1 − (1 − (1/4)(1 − τ )) − 3K∗ ε
  = (1/4)(1 − τ ) − 3K∗ ε.

For ε small enough, the right hand side is about 1/4 and therefore greater than ε, a contradiction.

5.2 Second test


Let ej : {−1, 1}^n → {−1, 1} be the projection on the jth component. We define a second test T ′ that is based on T :
Input: a ∈ {−1, 1}^n , folded A : Bn− → {−1, 1}, ψ ∈ Bn−

1. With probability 1/2, run T on (A, ψ).

2. With probability 1/2, choose j ∈ {1, . . . , n} and f ∈ Bn− at random. Accept if aj = A(f ) · A(f · ej ). Else reject.

Theorem 5.4. 1. If ψ(a) = −1, then there is an A : Bn− → {−1, 1} such that Pr[T ′ accepts (a, A, ψ)] = 1.

2. There is a c > 0 such that for all 0 < δ ≤ 1, the following holds: If δ(a, a′ ) ≥ δ for all a′ with ψ(a′ ) = −1, then for all folded A : Bn− → {−1, 1}, Pr[T ′ rejects (a, A, ψ)] ≥ c · δ.

Proof. We start with 1: Let A = LCa . Then A is folded. If T is executed, then T ′ accepts by Theorem 5.3. Otherwise

A(f ) · A(f · ej ) = f (a) · (f · ej )(a) = f (a) · f (a) · ej (a) = aj .

Thus T ′ accepts with probability 1.

Next comes 2: Let δ(a, a′ ) ≥ δ for all a′ with ψ(a′ ) = −1 and let A be folded.
First case: δ(A, LC_{a′} ) ≥ δ/4 for all a′ with ψ(a′ ) = −1. With probability 1/2, T is executed. By Theorem 5.3, T rejects with probability ≥ cδ/4.
Second case: There is an a′ ∈ {−1, 1}^n with δ(A, LC_{a′} ) < δ/4 and ψ(a′ ) = −1. If a′_j ≠ aj and A(f )A(f ej ) = a′_j , then T ′ will reject. Thus,

Pr[T ′ rejects] ≥ Pr[2. is executed] · Pr[2. rejects]
  ≥ (1/2) Pr_{j,f} [a′_j ≠ aj ∧ A(f )A(f ej ) = a′_j ]
  ≥ (1/2) Pr_j [a′_j ≠ aj ] · Pr_{j,f} [A(f )A(f ej ) = a′_j | a′_j ≠ aj ]
  = (1/2) Pr_j [a′_j ≠ aj ] · Pr_f [A(f )A(f ej ) = a′_j ]
  ≥ (1/2) δ (1 − δ/2)
  ≥ δ/4.
For the second-to-last inequality, we have to show that Pr_f [A(f )A(f ej ) = a′_j ] ≥ 1 − δ/2. The equation A(f )A(f ej ) = a′_j is implied by A(f ) = f (a′ ) and A(f ej ) = (f ej )(a′ ) (multiply the two equations). Therefore,

Pr[A(f )A(f ej ) = a′_j ] ≥ Pr[A(f ) = f (a′ ) ∧ A(f ej ) = (f ej )(a′ )]
  = 1 − Pr[A(f ) ≠ f (a′ ) ∨ A(f ej ) ≠ (f ej )(a′ )]
  ≥ 1 − Pr[A(f ) ≠ f (a′ )] − Pr[A(f ej ) ≠ (f ej )(a′ )]
  = 1 − Pr[A(f ) ≠ LC_{a′} (f )] − Pr[A(f ej ) ≠ LC_{a′} (f ej )]
  ≥ 1 − δ/2.

The last inequality holds since we can bound both probabilities by δ/4: the first one by assumption, and also the second one, since f ↦ f ej is a bijection of Bn− .
6 Assignment Tester

6.1 Constraint graph satisfiability


A constraint graph G over some alphabet Σ is a directed graph (V, E) together with a mapping c : E → P(Σ × Σ). An assignment is a mapping a : V → Σ. The assignment a satisfies the (constraint at the) edge e = (u, v) if (a(u), a(v)) ∈ c(e). The unsatisfiability value of a is the number of constraints not satisfied by a divided by the number of constraints (edges). This value is denoted by UNSATa (G). The unsatisfiability value of G is UNSAT(G) = min_a UNSATa (G).
Problem 6.1. Maximum Constraint Graph Satisfiability (Max-CGS) is the following problem:
Instances: constraint graphs G = ((V, E), Σ, c)
Solutions: assignments a : V → Σ
Measure: (1 − UNSATa (G)) · |E|, the number of constraints satisfied by a
Goal: max
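A minimal sketch of the objects in Problem 6.1 with a direct computation of UNSATa (G); the representation (an edge list plus a set of allowed value pairs per edge) is illustrative.

    def unsat_value(edges, constraints, assignment):
        """UNSAT_a(G): fraction of edges whose constraint is violated by `assignment`.
        `constraints[e]` is the set of allowed pairs (a(u), a(v)) for edge e = (u, v)."""
        violated = sum(1 for (u, v) in edges
                       if (assignment[u], assignment[v]) not in constraints[(u, v)])
        return violated / len(edges)

    # Example: a single inequality constraint over Σ = {0, 1}.
    edges = [("u", "v")]
    constraints = {("u", "v"): {(0, 1), (1, 0)}}
    assert unsat_value(edges, constraints, {"u": 0, "v": 1}) == 0.0
    assert unsat_value(edges, constraints, {"u": 1, "v": 1}) == 1.0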
Exercise 6.1. The following two statements are equivalent:
1. There is an ε > 0 such that gap(1 − ε, 1)-Max-3-SAT is NP-hard.

2. There is an ε > 0 such that gap(1 − ε, 1)-Max-CGS is NP-hard.


Thus, to prove the PCP-Theorem, we can show the NP-hardness of gap(1 − ε, 1)-Max-CGS instead of gap(1 − ε, 1)-Max-3-SAT. The former has the advantage that it is easier to apply results from graph theory, in particular expander graphs, which we will introduce in the next chapter.

6.2 Assignment testers


Definition 6.2. An assignment tester over Σ is a deterministic algorithm that, given a Boolean formula ψ in variables X = {x1 , . . . , xn }, outputs a constraint graph G = ((V, E), Σ, c) with X ⊆ V such that there is an ε > 0 such that for all a : X → {−1, 1}¹:

1. If ψ(a) = −1, then there is a b : V \ X → Σ with UNSATa∪b (G) = 0.

2. If ψ(a) = 1, then for all b : V \ X → Σ, we have UNSATa∪b (G) ≥ ε · δ(a, a′ ) for every a′ with ψ(a′ ) = −1.

¹We identify two values of Σ with −1 and 1.

ε is called the rejection probability.

Above, a ∪ b : V → Σ is the assignment that maps a v ∈ X to a(v) and


a v ∈ V \ X to b(v). Again, we do not care for running times, since we will
apply the assignment tester only to constant size instances. Given a Boolean
formula, an assignment tester constructs a graph such that every satisfying
assignment of the formula can be extended to a satisfying assignment of the
graph. Every non-satisfying assignment a, however, cannot be extended to
an assignment of the graph that fulfills a fraction > 1 − δ of the constraints,
where δ is the distance of a to any satisfying assignment.
Our construction takes the test T 0 and models its behaviour in a graph.
n
We set Σ = {−1, 1}3 and let Y be a set of 22 Boolean variables. These
variables correspond to the bits of the string A. The test T 0 makes some
random experiments. It first flips a coin and then, depending on its outcome,
either chooses f , g, and h or chooses j and f at random. For each of the
possible outcomes r, there will be one variable zr . Let Z be the set of all
these variables. We will interpret these three values as “guesses” of the bits
queried. The variables in X and Y will get Boolean values, that is, whenever
they get values from Σ that do not represent −1 or 1, then all constraints
containing them will not be satisfied.
If in the outcome r, T 0 queries A(f ), A(g), and A(h), then zr will be
connected to the three nodes in Y that correspond to these positions. (More
precisely, since we only consider folded strings A, we will consider only
positions in Dψ and might replace f , g, and h by the corresponding elements
of Dψ .) The constraints on these three edges are satisfied, if the three bits
at zr correspond to the bits of A(f ), A(g), and A(h) and T 0 would accept
when reading these three bits. If in the outcome r, T 0 queries A(f ), A(f ej ),
and aj , then zr will be connected to two nodes of Y and one of X. The rest
of the construction is essentially the same.

Theorem 6.3. The construction above is an assignment tester.

Proof. First we have to show that if a is a satisfying assignment of ψ,


then we can extend it to a satisfying assignment of G. To the variable of Y ,
we will assign the values according to LCa . By Theorem 5.4, T 0 will accept
with probability 1. This means that if we assign to Z values matching the
values of X ∪ Y , then all constraints will be satisfied.
If a does not satisfy ψ, then T 0 will reject every A with probability ≥ c·δ,
where δ(a, a0 ) ≥ δ for all satisfying assignments a. This means that for a
fraction of c · δ of the zr ’s at least one constraint is not satisfied. (Either we
choose the values consistently with the values of X and Y , then all three
constraints are not satisfied, or we try to cheat but then at least one is not
satisfied.) Thus UNSATa∪b (G) ≥ 3c · δ.
7 Expander graphs

Throughout this chapter, we are considering undirected multigraphs G =


(V, E) with self-loops. The degree d(v) of a node v is the number of edges
that v belongs to. This particularly means that a node with a self-loop and
no other edges has degree 1 (and not 2, which is a meaningful definition,
too). This definition of degree will be very convenient in the following. A
graph is called d-regular if d(v) = d for all v ∈ V .
It is a well-known fact that for graphs without self-loops, the sum of the
degrees of the nodes is twice the number of edges (proof by double-counting).
With self-loops, the following bounds hold.
P
Fact 7.1. 1. We have |E| ≤ v∈V d(v) ≤ 2|E|.
2. If G is d-regular, then |E| ≤ d|V | ≤ 2|E|.
A walk in a graph G = (V, E) is a sequence (v0 , e1 , v1 , e2 , . . . , e` , v` ) such
that eλ = {vλ−1 , vλ } for all 1 ≤ λ ≤ `. v0 is the start node, v` is the end
node of the walk. Its length is `. A walk can visit the same node or edge
several times, i.e., it is allowed that vi = vj or ei = ej for some i 6= j.
A graph is connected if for all pairs of nodes u and v, there is a walk
from u to v. The neighbourhood N (v) of v is the set of all nodes u such
that {v, u} ∈ E. In general, the t-neighbourhood is the set of all nodes u
such that there is a walk from v to u of length t.

7.1 Algebraic graph theory


The adjacency matrix of G is the |V | × |V |-matrix
A = (au,v )u,v∈V
where au,v is the number of edges between u and v. We will usually index the
rows and columns by the nodes itself and not by indices from {1, . . . , |V |}.
But we will assume that the nodes have some ordering, so that when we
need it, we can also index the rows by 1, . . . , |V |.
We will now apply tools from linear algebra to A in order to study
properties of G. This is called algebraic graph theory. The book by Biggs
[Big93] is an excellent introduction to this field. Everything you want to
know about expander graphs can be found in [HLW06].
Because G is undirected, A is symmetric. Therefore, A has n real eigen-
values λ1 ≥ λ2 ≥ · · · ≥ λn and there is a orthonormal basis consisting of
eigenvectors.

39
40 7. Expander graphs

Lemma 7.2. Let G be a d-regular graph with adjacency matrix A and eigen-
values λ1 ≥ λ2 ≥ · · · ≥ λn .

1. λ1 = d and 1n = (1, . . . , 1)T is a corresponding eigenvector.

2. G is connected if and only if λ2 < d.

Proof. We start with 1: Since G is d-regular,


X
d = d(v) = av,u for all v
u∈V

and
A · 1n = d · 1n .
Thus d is an eigenvalue and 1n is an associated eigenvector.
Let λ be any eigenvalue and b be an associated eigenvector. We can scale
b in such a way that the largest entry of b is 1. Let this entry by bv . Then
X X
λ = λ · bv = av,u bu ≤ av,u = d.
u∈V u∈V

Therefore, d is also the largest eigenvector.


Now comes 2. “=⇒”: Let b be an eigenvector associated with the eigen-
value d. As above, we scale b such that the largest entry is 1. Let bv be
this entry. We next show that for every node u ∈ N (v), bu = 1, too. Since
G is connected, b = 1n follows by induction. But this means that d has
multiplicity 1 and λ2 < d.
A · b = d · b implies
X X
d = dbv = av,u bu = av,u bu .
u∈V u∈N (v)
P
Since bu ≤ 1 for all u and since d = u∈N (v) av,u , this equation above can
only be fulfilled if bu = 1 for all u ∈ N (v).  
A1 0
“⇐=”: If the graph G is not connected, then A = . There-
0 A2
fore (1, . . . , 1, 0, . . . , 0) and (0, . . . , 0, 1, . . . , 1) (with the appropriate number
of 1’s and 0’s) are linearly independent eigenvectors associated with d.
Let k.k denote the Euclidean norm of R|V | , that is kbk =
pP
2
v∈V bv .

Definition 7.3. Let G be a graph with adjacency matrix A. Then

kAbk
λ(G) = max .
b⊥1n kbk

Theorem 7.4. Let G be a d-regular graph with adjacency matrix A and


eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn .
7.2. Edge expansion 41

1. λ(G) = |λj | for some j.


kAbk
2. λ(G) = maxb⊥1n kbk is attained for any eigenvector b associated with
λj .
3. λ(G) = max{|λ2 |, |λn |}.
4. λ(G) ≤ d.
Proof. Let b be a vector for which the maximum is attained in the defi-
nition of λ(G). W.l.o.g. let kbk = 1. Let c1 , . . . , cn be an orthonormal basis
consisting of eigenvectors of A. W.l.o.g. let c1 = 1n . Since b is orthogonal
to 1n , we have
b = β2 c2 + · · · + βn cn ,
Since c1 , . . . , cn is a orthonormal family,
1 = kbk = b22 + · · · + b2n .
Let λj be the eigenvalue cj is associated with. We have
λ(G) = kAbk
= kβ2 Ac2 + . . . βn Acn k
= (β2 λ2 )2 + · · · + (βn λn )2 .
Since b is a vector for which the maximum is attained, βj can only be nonzero
for a λj whose absolute value is maximal among λ2 , . . . , λn .
It is an easy exercise to derive the statements 1–4 from this.

Exercise 7.1. Prove statements 1–4 of Theorem 7.4.


λ(G) is also called the second largest eigenvalue. (More correctly, it
should be called the second largest absolute value of the eigenvalues, but
this is even longer.)

7.2 Edge expansion


Definition 7.5. Let G be a d-regular graph. The edge expansion h(G) of
G is defined as
E(S, S̄)
h(G) = min .
S⊆V :|S|≤|V |/2 |S|
E(S, S̄) is the set of all edges with one endpoint in S and one endpoint in
S̄. G is called an h-expander if h(G) ≥ h.
Large edge expansion means that any set S has many neighbours that
are not in S. This will be a very useful property. Families of expanders can
be constructed in polynomial time, one construction is [RVW02]. We will
not prove it here.
42 7. Expander graphs

Theorem 7.6. There are constants d0 ∈ N and h0 > 0 and a deterministic


algorithm that given n constructs in time polynomial in n a d0 -regular graph
Gn with h(G) > h0 .
Large edge expansion means small second largest eigenvalue and vice
versa. We will need the following bound.
Theorem 7.7. Let G be a d-regular graph. If λ(G) < d1 then

h(G)2
λ(G) ≤ d − .
2d
To prove the theorem, it is sufficient to prove

h(G)2 ≤ 2d(d − λ) (7.1)


2
h(G) ≤ 2d(d + λ) (7.2)

by Theorem 7.4. The proofs of both inequalities are very similar, we will
only show the first one. Let

B = dI − A
B 0 = dI + A

where I is the n × n-identity matrix. Let f ∈ Rn . Later, we will derive f


from an eigenvector of A. In the following, a summation over “e = {u, v}”
is a sum over all edges in E with two end nodes and a summation over
“e = {v}” is a sum over all self-loops in E.
Lemma 7.8.
X
f T Bf = (fu − fv )2
e={u,v}
X
T 0
f Bf≥ (fu + fv )2
e={u,v}

Proof. We have
X
f T Bf = dfv2 − f T Af
v∈V
   
X X X X
= (fu2 + fv2 ) + fv2  −  2fu fv + fv2 
e={u,v} e={v} e={u,v} e={v}
X
2
= (fu − fv ) .
e={u,v}

1
We have to exclude bipartite graphs, which have λn = −d but can have edge expansion
> 0. Our prove will break down if λn = −d, because (d+λ) must not be zero when proving
the counter part of (7.3).
7.2. Edge expansion 43

The second inequality is proven in a similar manner.


To a given f , let X
F = |fu2 − fv2 |.
e={u,v}

Let β0 < β1 < · · · < βr be the different values that f attains. Let
Uj = {u ∈ V | fu ≥ βj },
Uj0 = {u ∈ V | fu ≤ βj }
be the set of all nodes whose value fu is at least or at most βj , respectively.
Lemma 7.9.
r
X
F = |E(Uj , Ūj )|(βj2 − βj−1
2
)
j=1
r−1
X
F = |E(Uj0 , Ūj0 )|(βj+1
2
− βj2 )
j=0

Proof. Let e = {u, v} ∈ E be an edge that is no self-loop. Assume that


fu = βi ≥ βj = fv . The contribution of e to F is βi2 − βj2 . On the other
hand, e crosses Uk and Ūk for j ≤ k ≤ i − 1. Thus the contribution of e to
right-hand side of the first equation in the statement of the lemma is
(βi2 − βi−1
2 2
) + (βi−1 2
− βi−2 2
+ · · · + (βj+1 − βj2 ) = βi2 − βj2 .
Thus both sides of the equation are equal.

Lemma 7.10. We have


√ p
F ≤ 2d f T Bf kf k .
If f (v) ≤ 0 for all v, then
√ p
F ≤ 2d f T B 0 f kf k .
Proof. We have
X
F = |fu2 − fv2 |
e={u,v}
X
= |fu − fv | · |fu + fv |
e={u,v}
s X s X
≤ (fu − fv )2 · (fu + fv )2
e={u,v} e={u,v}
p s X
= f T Bf · (fu + fv )2
e={u,v}
44 7. Expander graphs

by the Cauchy–Schwartz and Lemma 7.8 We can bound the second factor
by

s X s X
(fu + fv )2 ≤ 2 (fu2 + fv2 )
e={u,v} e={u,v}
s X
≤ 2d fv2
v∈V

≤ 2dkf k .

The second inequality is proven in a similar manner.

Lemma 7.11. Let fv ≥ 0 for all v ∈ V or fv ≤ 0 for all v ∈ V . If


| supp(f )| ≤ n/2, then F ≥ h(G)kf k.

Proof. We only show the statement for fv ≥ 0, the other case is com-
pletely similar. Since | supp(f )| ≤ n/2, we have β0 = 0 and |Uj | ≤ n/2 for
j > 0. We have |E(Uj , Ūj )| ≥ h(g)|Uj |. By Lemma 7.9,

r
X
F = |E(Uj , Ūj )|(βj2 − βj−1
2
)
j=1
r
X
≥ h(G) |Uj |(βj2 − βj−1
2
)
j=1
r−1
X
= h(G) βj2 (|Uj | − |Uj+1 |) +βr2 |Ur |
| {z }
j=1
=|{v|fv =βj }|
2
= h(G)kf k .

Finally, we will now prove (7.1) and (7.2). We only show (7.1), (7.2) is
proven in the same manner. Let λ < d be an eigenvector of A. d − λ is an
eigenvector of B = dI − A and every eigenvector of A associated with λ is
an eigenvector of B associated with d − λ. Let g be such an eigenvector. g
is orthogonal to 1n . We can assume that g has at most n/2 entries that are
≥ 0, otherwise we consider −g instead. We define
(
gv if gv ≥ 0
fv =
0 otherwise
7.2. Edge expansion 45

and W = supp(f ). By construction, |W | ≤ n/2. We have


X
(Bf )v = dfv − av,u fu
u∈V
X
= dgv − av,u gu
u∈W
X
≤ dgv − av,u gu
u∈V
= (d − λ)gv .

Since fv = 0 for v ∈
/ W , this implies
X
f T Bf = fv (Bf )v
v∈V
X
≤ (d − λ) fv gv
v∈V
X
≤ (d − λ) fv2
v∈V
= (d − λ)kf k2 . (7.3)

By Lemmas 7.10 and 7.11,


√ p
h(G)kf k2 ≤ 2d f T Bf kf k .

Squaring this and exploiting the inequality before, we get

h(G)2 kf k2 ≤ 2d · f T Bf · kf k2 ≤ 2d(d − λ)kf k4 .

Because g is orthogonal to 1n and nonzero, it has at least one entry > 0.


Therefore, kf k > 0 and we get

h(G)2 ≤ 2d(d − λ).

The second inequality is proven in the same manner.


8 Random walks on expanders

Consider the following method RW to generate a walk in a d-regular graph


G = (V, E).
Input: d-regular graph G, ` ∈ N
1. Randomly choose a vertex v0 .
2. For λ = 1, . . . , ` choose vλ ∈ N (vλ−1 ) uniformly at random.
3. Return (v0 , . . . , v` ).

Let W` be the set of all `-walks in G. We have |W` | = nd` , since a path
is uniquely specified by its start node (n choices) and a vector {1, . . . , d}`
which tells us which of the d edges of the current node is the next in the
given walk. For this, we number the d edges incident with a node arbitrarily
from 1 to d. It is now clear that method RW generates the walks according
to the uniform distribution on RW.
Lemma 8.1. Method RW returns each walk W ∈ W` with a probability of
1/(nd` ).
Instead of choosing a start node, we can choose a node in the middle.
This modified method RW0 also generates the uniform distribution on all
walks of length `.
Input: d-regular graph G, ` ∈ N, 0 ≤ j ≤ `
1. Randomly choose a vertex vj .
2. For λ = j − 1, . . . , 0 choose vλ ∈ N (vλ+1 ) uniformly at random.
3. For λ = j + 1, . . . , ` choose vλ ∈ N (vλ−1 ) uniformly at random.
4. Return (v0 , . . . , v` ).

In the following, let G be a d-regular graph with adjacency matrix A.


Let à = d1 · A be the normalized adjacency matrix. à is doubly stochastic,
i.e., all entries are nonnegative and all row sums and all column sums are 1.
λ is an eigenvalue of à iff d · λ is an eigenvalue of A.
Let x = (xv )v∈V be a probability distribution on the nodes v and consider
x as an element of Rn . If we now select a node v according to x, then select
an edge {v, u} (u = v is allowed) incident with v uniformly at random, and
then go to u, the probability distribution that we get is given by à · x.
Applying induction we get the following result.

46
47

Lemma 8.2. Let x be a probability distribution on V . If we run method


RW for ` steps and draw the start node according to x, then this induces a
probability distribution on V given by Ã` x.

Let F ⊆ E be a set of edges. We want to estimate the probability that


a random walk of length j that starts in an edge of F ends in an edge of F ,
too.
To this aim, we first calculate the probability xv that a random walk
that starts with an edge of F starts in v. The distribution x = (xv )v∈V is
generated by the following process: First choose an edge f ∈ F at random.
Then choose one of its nodes uniformly at random as the start node (and
the other node as the second node). Here it makes a difference whether f is
a self-loop or not. We have
 
1 1
xv = · |{e = {u, v} | e ∈ F, u 6= v}| + |{e = {v} | e ∈ F }|
|F | 2
d
≤ (8.1)
|F |

By symmetry, this is also the probability that v is the second node in a walk
that starts with an edge in F .
Second we estimate the probability yv that if we choose a random edge
incident to v, we have chosen an edge in F . This is simply

1
yv = · |{e = {u, v} | e ∈ F }|
d
2|F | 1
= · |{e = {u, v} | e ∈ F }|
d 2|F |
2|F |
≤ · xν . (8.2)
d
Now the probability distribution on the nodes after performing a walk
of length j that starts in F is given by Ãj−1 xv . (Note that xv is also the
probability that v is the second node in the walk.) The probability that the
(j + 1)th edge is in F is then given by
D E
y, Ãj−1 x (8.3)

where h., .i is the ordinary scalar product in Rn .


To estimate (8.3), we will exploit the Cauchy-Schwartz inequality. For
this, we need estimate kxk. x1 = n1 1n is an eigenvector of à associated with
1. Let x⊥ = x − x1 . x⊥ is orthogonal to x1 , because
X
h1n , x⊥ i = (xv − 1/n) = 1 − 1 = 0.
v∈V
48 8. Random walks on expanders

Ãk x⊥ is also orthogonal to x1 for every k, since


h1n , Ãk x⊥ i = h(Ãk )T 1n , x⊥ i = hÃk 1n , x⊥ i = h1n , x⊥ i = 0.
Let λ̃ = λ(G)/d. We have
kÃj−1 x⊥ k ≤ |λ̃|j−1 kx⊥ k
and
kx⊥ k2 = kx − x1 k2
= kxk2 − 2hx, x1 i + kx1 k2
2X 1
= kxk2 − xv +
n n
v∈V
2
< kxk .
By (8.1),
X d
kxk2 ≤ max xv · xv = max xv ≤ .
v∈V v∈V |F |
v∈V
Altogether, we have
2|F |
hy, Ãj−1 x⊥ i ≤ kyk · kÃj−1 x⊥ k ≤ kxk · |λ̃j−1 |kxk ≤ 2kλ̃kj−1 .
d
Finally, the probability that ej+1 ∈ F can be bounded by
hy, Ãj−1 xi = hy, Ãj−1 x1 i + hy, Ãj−1 x⊥ i
≤ hy, x1 i + 2|λ̃|j−1
1X
= yv + 2|λ̃|j−1
n
v∈V
2|F |
≤ + 2|λ̃|j−1
dn
 j−1 !
|F | λ
≤2 + .
|E| d

Lemma 8.3. Consider a random walk on a d-regular graph G = (V, E)


starting with an edge from a set F ⊆ E. Then the probability that the
(j + 1)th-edge of the walk is again in F is bounded by
 !
λ(G) j−1

|F |
2 + .
|E| d
d
If F does not contain any self-loops, then (8.1) can be bounded by 2|F |
and we can get rid of the 2 in the estimate. Then this bound says that even
after a logarithmic number of steps, the (j + 1)the edge is almost drawn at
random.
9 The final proof

Finally, we can start with the proof of the PCP theorem. We begin with
1
the observation that gap(1 − |E| , 1)-Max-CGS is NP-hard (over the alphabet
3
Σ = {0, 1} . Let G be a given constraint graph. We apply three procedures
to G:

G
↓ Preprocessing (G becomes an expander)
Gprep
↓ Amplification (UNSAT value gets larger, but also Σ)
Gamp
↓ Alphabet reduction (Σ = {0, 1}3 again)
Gred

If we do this O(log |E|) times, then we bring the (relative) size of the
1
gap from |E| to constant and we are done.

9.1 Preprocessing
Throughout this chapter, d0 and h0 will be “global” constants that come
out of the construction of a constant degree d0 expander Xn with constant
edge expansion h0 , see Theorem 7.6.

Lemma 9.1. Let G = ((V, E), Σ, c) be a constraint graph. There is a con-


stant γ1 > 0 such that we can construct in polynomial time a (d0 +1)-regular
graph G1 = ((V1 , E1 ), Σ : c1 ) with size(G1 ) = O(size(G)) and

γ1 · UNSAT(G) ≤ UNSAT(G1 ) ≤ UNSAT(G).

Proof. Let Xn be the expander from Theorem 7.6. G1 is constructed as


follows:

1. Replace each v ∈ V by a copy Yv of Xd(v) .

2. For each edge {u, v} ∈ E insert an edge from Yu to Yv . Do this in such


a way that every node of Yv is incident with exactly one such extra
edge. In this way, the resulting graph will be (d0 + 1)-regular.

3. Let Eint be the edges within the copies Yv and Eext be the edges
between two different copies. For all e ∈ Eint , c1 (e) is an equality

49
50 9. The final proof

constraint that is satisfied iff both nodes have the same value (“internal
constraints”). For all e ∈ Eext , c1 (e) is the same constraint as the
original edge has (“external constraint”).
P
We have |V1 | ≤ v∈V d(v) ≤ 2|E| and |E1 | ≤ |V1 |(d0 + 1) ≤ 2|E|(d0 + 1).
Thus size(G1 ) = O(size(G)).
Next, we show that UNSAT(G1 ) ≤ UNSAT(G). Chose an assignment
σ : V → Σ with UNSAT(G) = UNSATσ (G) (i.e., an optimal assignment).
We define σ1 : V1 → Σ by σ1 (u) = σ(v) iff u belongs to V (Yv ), the vertex
set of the copy Yv that replaces v. In this way, all internal constraints
are fulfilled by construction. Every external constraint is fulfilled iff it was
fulfilled under σ in G. Therefore,

UNSAT(G1 ) ≤ UNSATσ (G1 ) ≤ UNSAT(G),

where the second equation follows from the fact that |E| ≤ |E1 |.
The interesting case is γ · UNSAT(G) ≤ UNSAT(G1 ). Let σ1 : V1 → Σ
be an optimum assignment. We define σ : V → Σ by a majority vote:
σ(v) is the value a ∈ Σ that is the most common among all values σ1 (u)
with u ∈ V (Yv ). Ties are broken arbitrarily. Let F ⊆ E be the set of
all unsatisfied constraints under σ and F1 ⊆ E1 the set of all unsatisfied
constraints under σ1 . Let S = {u ∈ V (Yv ) | v ∈ V, σ1 (u) 6= σ(v)} and
S v = S ∩ V (Yv ), i.e., all the looser nodes that voted for a different value for
σ(v). Let α := |F |/|E| = UNSATσ (G). We have

α|E| = |F | ≤ |F1 | + |S|,

since, if a constraint in F is not satisfied, then either the corresponding


external constraint in |F1 | is not satisfied or one of the nodes is a looser
node.
Case 1: |F1 | ≥ α2 · |E|. We have

|F1 | α |E| α UNSAT(G)


UNSAT(G1 ) = ≥ · = ≥ .
|E1 | 2 |E1 | 4(d0 + 1) 4(d0 + 1)

Case 2: |F1 | < α2 |E|. In this case, we have

α
|E| + |S| > |F1 | + |S| ≥ α|E|.
2

Thus |S| ≥ α2 |E|. Consider some v ∈ V and let Sav = {u ∈ S v | σ1 (u) = a}.
We have S v = a6=σ(v) Sav . Because we took a majority vote, |Sav | ≤ 12 |V (Yv )|
S
for all a 6= σ1 (u). As Yv is an expander,

|E(Sav , S̄av )| ≥ h0 · |Sav |,


9.1. Preprocessing 51

where the complement is taken “locally”, i.e., S̄av = V (Yv ) \ Sav . Since we
have equality constraints on all internal edges, all edges in |E(Sav , S̄av )| are
not satisfied. Thus,
1X X
|F1 | ≥ |E(Sav , S̄av )|
2
v∈V a6=σ(v)
X1 X
≥ h0 · |Sav |
2
v∈V a6=σ(v)
1 X
≥ h0 |S v |
2
v∈V
1
= h0 |S|
2
α
> h0 |E|.
4
Thus
|F1 |
UNSAT(G1 ) =
|E1 |
αh0 |E|
> ·
4 |E1 |
αh0

8(d0 + 1)
h0
≥ UNSAT(G).
8(d0 + 1)

We set γ1 to be the minimum of the constants in the two cases.

Lemma 9.2. Let G be a d-regular constraint graph. We can construct in


polynomial time a constraint graph G2 such that

• G2 is (d + d0 + 1)-regular,

• every node of G2 has a self loop,


h20
• λ(G2 ) ≤ d + d0 + 1 − 2(d+d0 +1) ,

• size(G2 ) = O(size(G)),
d
• 2+2(d0 +1) · UNSAT(G) ≤ UNSAT(G2 ) ≤ UNSAT(G).

Proof. Assume that G has n nodes. We take the union of G and Xn


(both graphs have the same node set) and attach to each node a self-loop.
The edges from Xn and the self loops get trivial constraints that are always
fulfilled. G2 = ((V, E2 ), Σ, c2 ) is clearly d + d0 + 1-regular.
52 9. The final proof

We have h(G) ≥ h(Xn ) ≥ h0 . Since G is not bipartite,


h20
λ(G2 ) ≤ d + d0 + 1 − .
2(d + d0 + 1)
Finally,
d + 2(d0 + 1)
|E2 | = |E| + |E(Xn )| + n ≤ |E| + (d0 + 1)|V | ≤ |E|.
d
Thus the size increase is linear. Furthermore, the UNSAT value can at most
shrink by this factor.
By combining these two lemmas, we get the following result.
Theorem 9.3. There is are constants βprep > 0 and 0 < λ < δ such that for
all constraint graphs G, we can construct in polynomial time a constraint
graph Gprep over the same alphabet with
• Gprep is δ-regular,
• every node in Gprep has a self-loop,
• λ(Gprep ) ≤ λ,
• size(Gprep ) = O(size(G)),
• βprep · UNSAT(G) ≤ UNSAT(Gprep ) ≤ UNSAT(G).
d h20
We set δ = d + d0 + 1, βprep = γ · d+2(d0 +1) , and λ = δ − 2δ .

9.2 Gap amplification


Definition 9.4. Let G = ((V, E), Σ, c) be a d-regular constraint graph such
that every node has a self loop. Let t ∈ N be even. The t-fold amplification
t/2
product Gt = ((V, E t ), Σd , ct ) is defined as follows:
• For every walk W of length t from u to v, there is an edge {u, v} in
E t . If there are several walks between u and v, we introduce several
edges between u and v. But we disregard the directions of the walks,
that is, for every walk W and its reverse, we put only one edge into
Et.
t/2
• An assignment σ̂ maps every node to a vector from Σd . We index
the entries with walks of length t/2 starting in v. (There are exactly
dt/2 such walks. Let W be such a walk and let u be the other end node.
σ̂(v)W is called “the opinion of v about u with respect to w”. Since
there might be many walks from v to u, v can have many opinions
about u. We will usually assume that nodes are not “schizophrenic”,
i.e., that they always have the same opinion about u. In this case, we
will also write σ̂(v)u for the opinion of v about u.
9.2. Gap amplification 53

• It remains to define ct . Let e = {u, v} ∈ E t and σ̂ be an assignment.


Let Ge be the subgraph of G induced by Nt/2 (u) ∪ Nt/2 (v). ct (e) is
satisfied by σ̂ iff all opinions (of u and v) about every x ∈ Ge are
consistent and all constraints in Ge are satisfied. (Since G will be an
expander, if one constraint of G “appears” in many constraints of Gt .)

If t is a constant, Gt is polynomial time computable from G and we have


size(Gt ) = O(size(G)).

Theorem 9.5. Let λ < d be two constants, Σ an alphabet. There is a


constant βamp solely depending on λ, d, and |Σ| such that for all d-regular
constraint graphs G with self loops at every node and λ(G) ≤ λ:
√ 1
1. UNSAT(Gt ) ≥ βamp · t · min{UNSAT(G), 2t }

2. UNSAT(G) = 0 ⇒ UNSAT(Gt ) = 0.

Proof. We start with showing 2: Let σ be a satisfying assignment for


t/2
G. We define σ̂ : V → Σd by setting σ̂(v)W = σ(u) where W is a walk of
length t/2 starting in v and u is the other end node of W . By construction,
σ̂ fulfills all constraints of Ĝt .
For 1, let σ̂ be an optimum assignment for Gt . We can assume that
there are not any schizophrenic nodes v because otherwise all constraints
involving v are not satisfied and therefore, we cannot increase the UNSAT
value by changing the assignment to v.
We will define an assignment σ with
√ 1
UNSATσ̂ (Gt ) ≥ Ω( t) · min{UNSATσ (G), }.
2t
σ is again defined by a majority vote. σ(v) is the majority of all opinions of
the nodes u that are reachable by a walk of length t/2 from v. (These are
exactly the nodes that have an opinion about v.) If several paths go from v
to u, then each paths contributes one opinion.
We choose an F ⊆ E as large as possible such that all constraints in F
are not satisfied by σ and |F |/|E| ≤ 1/t. Then

1 |F | 1
min{UNSATσ (G), }≤ ≤ .
2t |E| t

Let Wt denote the set of all walks of length t.

Definition 9.6. W = (v0 , e1 , v1 , . . . , vt ) ∈ Wt is “ hit at j” if ej ∈ F and


the opinion of v0 about vj−1 and of vt about vj are equal to σ(vj−1 ) and
to σ(vj ), respectively. (In particular, both nodes have an opinion about the
corresponding node.)
54 9. The final proof

If an edge is hit, then it is not satisfied and it is not satisfied because it


is really not satisfied and not just√because σ̂ and σ√were inconsistent.
We set I = {j ∈ N | t/2 − t < j ≤ t/2 + t}, the set of “middle
indices”. For a walk W , we set
N (W ) = |{j ∈ I | W is hit at j}|.
Let eW be the edge in Gt corresponding to W If N (W ) > 0, then eW is not
satisfied by σ̂, since ej is not satisfied in G under σ and σ is consistent with
σ̂ on vj and vj−1 . In formulas,
Pr[N (W ) > 0] ≤ Pr [σ̂ does not satisfy ê]
ê∈E t
= UNSATσ̂ (Gt )
= UNSAT(Gt ).
√ |F |
We will show that Ω( t) |E| ≤ Pr[N (W ) > 0]. This will finish the proof.
Let (
1 if W is hit in j,
Nj (W ) =
0 otherwise.
P
Then j∈I Nj (W ) = N (W ). Lemma 9.7 below shows PrW ∈Wt [Nj (W ) =
|F |
1] = Ω( |E| ). With this, we can bound Pr[N (W ) > 0]. We use the following
conditional expectation inequality:
X Pr[Nj (W ) = 1)]
Pr[N (W ) > 0] ≥ .
E[N (W ) = 1|Nj (W ) = 1]
j∈I

By linearity of expectation,
X
E[N (W ) = 1|Nj (W ) = 1] = E[Nk (W ) = 1|Nj (W ) = 1].
k∈I

For every summand on the righthand side, we have


E[Nk (W ) = 1 | Nj (W ) = 1]
= Pr[a random walk of length |k − j + 1| ends in F | it started in F ]
 |k−j| !
|F | λ
≤2 +
|E| d
by Lemma 8.3. Hence,
 |k−j| !
X |F |λ
E[N (W ) = 1|Nj (W ) = 1] = 2 +
|E| d
k∈I
 
|F | 2
≤ 2 |I| · +
|E| 1 − λ/d
 
2 2
≤2 √ + .
t 1 − λ/d
9.2. Gap amplification 55

Thus
|F |
X Ω( |E| ) 
√ |F |

Pr[N (W ) > 0] ≥ 2 2 ≥Ω t·
√ + 1−λ/d
|E|
j∈I t

by the exercise below.

Exercise 9.1. Show that for every c > 0, there is a constant a such that
x
≥a·x
2/x + c

for all x ≥ 1.
|F |
Lemma 9.7. For all j ∈ I, PrW ∈Wt [Nj (W ) = 1] = Ω( |E| ).

Proof. Fix j ∈ I. We generate a walk W = (v0 , e1 , v1 , . . . , vt ) uniformly


at random by using the method RW’ with parameter j. Then, the edge ej
is chosen uniformly at random. Furthermore, v0 only depends on vj−1 and
vt only depends on vt . Therefore,

|F |
P rW ∈Wt [Nj = 1] = pq
|E|

where p = PrW ∈calWt [σ̂(v0 )vj−1 = σ(vj−1 )] and q = PrW ∈calWt [σ̂(vt )vj =
σ(vj ). We are done if we can show that p and q are constant. Since both
cases are symmetric, we will only present a proof for p.
Let Xj be the random variable generated by the following process. We
start in vj−1 and perform a random walk of length j − 1. Let u be the node
that we reach. We output the opinion σ(u)vj−1 of u about vj−1 . If u has no
opinion about vj−1 (this can happen, since j can be greater than t/2; but
this will not happen too often) then we output some dummy value not in
Σ. Obviously, p = Pr[Xj−1 = σ(vj−1 )].
If j = t/2, then we reach all the nodes that have an opinion about vj−1 .
We start with the case j = t/2. In this case, both nodes v0 and vt have
an opinion about vj−1 and vj . Since σ(vj−1 ) is chosen by a majority vote,
1
p ≥ |Σ| in this case.
We will now show that for all j ∈ I, the probability Pr[Xj−1 = σ(vj−1 )]
cannot differ by too much from this, in particular, it is Ω(1/|Σ|). The self
loops will play a crucial role here, since they ensure that a random paths
with ` edges visit not more than (1 − 1/d)` different nodes. We leave the
rest of this proof as an exercise.

Exercise 9.2. Show that Pr[Xj−1 = σ(vj−1 )] ≥ Ω(1/|Σ|) for j ∈ I.


56 9. The final proof

9.3 Alphabet reduction


In the last section, we increased the UNSAT value of the constraint graph
but also enlarged the alphabet. To apply the construction iteratively, we
need that in the end, the alphabet is again Σ = {0, 1}3 . This is achieved by
the procedure in this section.
For this, we need a little coding theory. An encoding of {0, 1}k is an
injective mapping E : {0, 1}k → {0, 1}` . Its image C is called a code, an
element of C a code word. The (relative) distance of a code is δ(C) =
min{δ(x, y) | x, y ∈ C, x 6= y}.
For our purposes, we need an encoding E : {0, 1}k → {0, 1}O(k) with
relative distance ≥ ρ for some constant ρ > 0. For a construction of such
a code, see e.g. [SS96]. If we have a relative distance of ρ, then this in
particular means that whenever we take a code word and change an arbitrary
fraction of less than ρ/2 of the bits, then we can recover the original code
word, since there is only one that has relative distance less than ρ/2.

Lemma 9.8. There is a constant βred such that for all constraint graphs
G = ((V, E), Σ̂, c) we can construct in polynomial time a constraint graph
Gred = ((V 0 , E 0 ), {0, 1}3 , c0 ) such that

1. size(Gred ) ≤ O(size(G)) where the constant only depends on |Σ̂|,

2. βred · UNSAT(G) ≤ UNSAT(Gred ) ≤ UNSAT(G).

Proof. Let k = |Σ̂|. We identify every element of Σ̂ with a string {0, 1}k
with k = log |Σ̂|. Then we map each string to {0, 1}` with ` = O(k) using
the code from the beginning of this section. We replace every node v ∈ V by
a sequence of nodes v1 , . . . , v` . With every edge e = (u, v) ∈ E, we identify
a function φe : {0, 1}` × {0, 1}` → {0, 1}. φe (x, y) is true iff x and y are
code words corresponding to values a, b ∈ Σ̂ such that (a, b) ∈ c(e). For
each such φe , we construct an assignment tester (see Theorem 6.3) Ge =
((Ve , Ee ), {0, 1}3 , ce ). The graph Gred is the union of all these assignment
testers. The ` nodes v1 , . . . , v` representing v, v ∈ V , are shared by all
assignment tester corresponding to an edge that contains v. The constraints
of each edge in Gpre are the constraints of the Ge . We can assume that each
Ge has the same number of edges, say, r. Thus Gpre has r|E| edges.
Each assignment tester is a constant size graph whose size only depends
on |Σ̂|. This immediately yields the upper bound on the size of Gpre .
For the second statement of theorem, consider an optimal assignment σ
of G. We construct an assignment σ 0 for G0 as follows: If σ satisfies the
constraint ce , then, by the properties of assignment testers, we can extend
σ in such a way that all constraints of Ge are satisfied. If σ does not satisfy
ce , then we extend σ in any way. In the worst case, no constraints of Ge are
satisfied. Thus for every constraint satisfied in G, at least r constraints are
9.4. Putting everything together 57

satisfied in Gpre . Thus


r · |E| · UNSAT(G)
UNSAT(Gpre ) ≤ = UNSATσ (G) = UNSAT(G).
|E 0 |
For the other inequality, let σ 0 be an optimum assignment for G0 . For
each set of ` nodes v1 , . . . , v` , representing v, v ∈ V , we interpret the as-
signment σ 0 to these nodes as a string x ∈ {0, 1}` . We set σ(v) to be the
element a ∈ Σ̂ whose encoding x̂ minimizes δ(x, x̂). We will now show that
for every constraint ce that is not satisfied by σ, a constant fraction of the
constraints of Ge is not satisfied. This will complete the proof. Let v and w
be the two nodes of e and x and y be the strings given by σ 0 and x̂ and ŷ
be the nearest code words. Since e is not satisfied, either x or y differs from
each satisfying assignment of φe in a least ρ/2 of the bits. If this were not
the case, then x and y would have been decoded to an assignment satisfying
e. Thus x and y in total differ from any satisfying assignment of φe in a
fraction of ρ/4 of the bits. But then, by the properties of an assignment
tester, also a fraction of  · ρ/4 of the constraints of Ge are not satisfied.

9.4 Putting everything together


If we put together the constructions of the three previous sections, we get
the following result.
Lemma 9.9. There are constants C > 0 and 1 > a > 0 such that for
every constraint graph G over the alphabet Σ = {0, 1}3 , we can construct a
constraint graph G0 over the same alphabet in polynomial time such that
1. size(G0 ) ≤ C · size(G),
2. If UNSAT(G) = 0, then UNSAT(G0 ) = 0,
3. UNSAT(G0 ) ≥ min{2 · UNSAT(G), a}.
Proof. We start with G, make it an expander, then amplify the gap (the
value t is yet to choose) and finally reduce the alphabet. It is clear that if
we choose t to be a constant, then the first two statements are fulfilled.
It remains to choose t in such a way that the third statement is fulfilled.
We have
√ 1
UNSAT(G0 ) ≥ βred · βamp · t · min{UNSAT(Gpre ), }
2t
√ 1
≥ βred · βamp · t · min{βpre · UNSAT(G), }
2t
 2
1
If we now set t = 4 βpre βamp βred , we get

UNSAT(G0 ) ≥ min{2 · UNSAT(G), a}


58 9. The final proof

2
βpre βamp 2
βred
with a = 4 .
With this lemma, the proof of the PCP theorem follows easily. We start
with the observation that the decision version of constraint graph satisfac-
tion is NP-complete, i.e., gap(1 − 1/|E|, 1)-Max-CGS is NP-hard. Let G be
an input graph. If we now apply the above lemma log |E| times, we get
an graph G0 that can be computed in time polynomial in size(G) with the
property that
1
UNSAT(G0 ) ≥ min{2log |E| · , a} = a
|E|

is constant. Thus we have a reduction from gap(1 − 1/|E|, 1)-Max-CGS to


gap(1 − a, 1)-Max-CGS. In particular, the latter problem is also NP-complete.
But this is equivalent to the statement of the PCP theorem.
10 Average-case complexity

Being intractable, e.g., NP-complete, does not completely reflect the dif-
ficulty of a problem. Approximability is one way of refining the notion of
intractability: We have seen some NP-hard optimization problems, for which
finding a close-to-optimal solution is easy, and others, for which finding even
a very weak approximation is as hard as solving an NP-complete problem.
Average-case complexity is another way of refining the intractability of
problems: Unless P = NP, no efficient algorithm exists to solve an NP-
complete problem efficiently on all instances. However, we may still hope
that we can solve the problem efficiently on most instances or on typical
instances, where “typical” here means something like “sampled from a dis-
tribution that reflects practical instances.”
Another motivation for studying average-case complexity is cryptogra-
phy: A cryptographic system is secure only if any efficient attempt to break
it succeeds only with a very small probability. Thus, it does not help if a
cryptographic system is hard to break only in the worst case. In fact, most
of cryptography assumes that NP problems exist that are intractable not
only in the worst but even in the average case, i.e., on random inputs.
Bogdanov and Trevisan give a well-written survey about average-case
complexity [BT06]. They also cover connections of average-case complexity
to areas like cryptography.

10.1 Probability distributions and distributional prob-


lems
What probability distribution should we take? What probability distribu-
tions should we allow? How should we model that we have inputs of various
sizes?
There are essentially two ways to deal with inputs of various sizes: First,
for each n ∈ N, we can have a distribution Dn , from which we draw the
instances of “size” n (e.g., n can be the length of the strings). Combining
D1 , D2 , D3 , . . ., we get an ensemble D = (Dn )n≥1 of probability distribu-
tions.
The second possibility is to have a single probability distribution D for
strings of all lengths. This is convenient in some applications, and it leads to
a simple notion of reducibility that preserves average-case tractability. But
it is difficult, e.g., to define circuit complexity for this case, and it is also
sometimes counterintuitive. For instance, since {0, 1}? is a countable infi-

59
60 10. Average-case complexity

nite set, there is no “uniform” distribution on {0, 1}? . Instead, the distribu-
tion that is commonly called the “uniform distribution” assigns probability
6 −2 −n
π2
n 2 to every string of length n ≥ 1 (or, to get rid of the factor 6/π 2 ,
1
we assign probability n(n+1) 2−n to strings of length n). We will use ensem-
bles of distributions as it will be more convenient. However, most results
hold independent of which possibility we choose. The uniform distribution
U = (Un )n∈N is then given by Un (x) = 2−n for x ∈ {0, 1}n .
In order to allow for different probability distributions, we do not fix
one, but we will consider distributional problems in the following.

Definition 10.1. A distributional decision problem is a pair Π = (L, D),


where L is a language and D = (Dn )n≥1 is an ensemble of probability dis-
tributions, where each Dn has finite support.

By supp(D) = {x | D(x) > 0}, we denote the support of a probability


distribution. You can think of supp(Dn ) ⊆ {0, 1}n , but this will not al-
ways be the case. However, we will have supp(Dn ) ⊆ {0, 1}≤p(n) for some
polynomial p for the distributions that we consider.
What we would now like to have is something like an “average-case NP-
hard” distributional problem Π = (L, D): If Π is average-case tractable, then
(L0 , D0 ) is average-case tractable for each L ∈ NP and each ensemble D0 .
However, as we will show, a statement like “every (L0 , D0 ) is average-case
tractable” is the same as “P = NP”. (The previous sentence is non-trivial:
For the same language L0 , we can use different algorithms for different dis-
tributions D0 .) Thus, the average-case analog of NP-completeness cannot
refer to arbitrary probability distributions. We will restrict ourselves to
two possible sets of distributions, namely polynomial time samplable and
polynomial time computable ensembles.

Definition 10.2. An ensemble D = (Dn ) is polynomial-time samplable


if there exists a randomized algorithm A that takes an input n ∈ N and
produces a string in A(n) ∈ {0, 1}? with the following properties:

• There exists a polynomial p such that A on input n is p(n) time


bounded, regardless of the random bits A reads.

• For every n ∈ N and every x ∈ {0, 1}? , we have Pr(A(n) = x) =


Dn (x).

PSamp denotes the set of all polynomial-time samplable ensembles.

Several variants of definitions of polynomial-time samplable distributions


exist. For instance, one can relax the strict bound on the running-time and
require only that A runs in expected polynomial time. Such finer distinctions
are important, e.g., in the study of zero-knowledge proofs, but we will not
elaborate on this.
10.2. Average polynomial time and heuristics 61

To define the second set of distributions, let ≤ denote the lexicographic


ordering between bit strings. (We have x ≤ y if |x| < |y| or |x| = |y| and x
appears before y in lexicographic order.) Then the cumulative probability of
x with respect to a probability distribution D is defined by
X
fD (x) = D(y) .
y≤x

Definition 10.3. An ensemble D = (Dn ) is polynomial-time computable if


there exists a deterministic algorithm A that, on input n ∈ N and x ∈ {0, 1}? ,
runs in time poly(n) and computes fDn (x).
PComp denotes the set of all polynomial-time computable ensembles.

If a distribution fDn (x) is computable in time poly(n), then also the


density function Dn (x) is computable in time poly(n). The converse does
not hold unless P = NP.

Exercise 10.1. Show that there exists an ensemble D = (Dn )n∈N such that
the density functions Dn are polynomial-time computable but D ∈ / PComp
unless P = NP. The latter means that if the functions fDn are computable
in polynomial time, then P = NP.

In the following, we will focus on ensembles from PComp. Many results


will nevertheless carry over to the wider class PSamp.

Exercise 10.2. Show that PComp ⊆ PSamp.

The converse is unlikely to be true.

Exercise 10.3. Show that PComp = PSamp if and only if P = P#P .


Hint: For ⇒, define an appropriate distribution on hϕ, ai (choose pairing
function and encoding carefully), where ϕ is a Boolean formula and a is an
assignment for ϕ. Given ϕ, the probability of hϕ, ai should essentially depend
on whether a satisfies ϕ. Show that the cumulative distribution function can
be used to compute the number of satisfying assignments.

One can argue that PSamp is the class of natural distributions: A distri-
bution is natural if we can efficiently sample from it. This, however, does not
mean that their density or distribution function is efficiently computable.

10.2 Average polynomial time and heuristics


When can we say that a problem can be solved efficiently on average? A
first attempt might be to consider the expected running-time with respect
to a given probability distribution. However, this definition, although very
natural, does not yield a robust class of average tractable problems.
62 10. Average-case complexity

Exercise 10.4. Show that the class


{(L, D) | L can be solved in expected polynomial time with respect to D}
is not invariant under changes of the machine model: Let A be an algorithm
with running-time t, and let B be a simulation of A that is, say, quadratically
slower. Give a function t such that A has expected polynomial running-time
but B has not.
Another way would be to consider the median of the running-time. At
least, this would be a definition that is robust against changes of the machine
model. However, you probability would not call an algorithm efficient if it
requires linear time on 70% of the instances and exponential time on the
rest. So what about requiring that the algorithm runs in polynomial time
on 99% of the instances? The threshold 99% would be arbitrary, there is no
reason why 99% is preferable to 98% or 99.9%. But with any such threshold,
there would still be a constant fraction of the inputs, for which the algorithm
performs poorly.
But what would be a natural way of defining “average tractability”?
Intuitively, one would probably say that an algorithm is efficient on average
if instances that require longer and longer running time show up with smaller
and smaller probability: An algorithm A has polynomial average running-
time if there exists a constant c > 0 such that the probability that A requires
more than time T is at most poly(n)/T c . In this way, we have a polynomial
trade-off between running-time and fraction of inputs.
Definition 10.4. An algorithm A with running-time tA has average poly-
nomial running-time with respect to the ensemble D if there exists an ε > 0
and a polynomial p such that, for all n and t,
p(n)
Pr (tA (x, n) ≥ t) ≤ .
x∼Dn tε
If an algorithm A has average polynomial running-time, then its median
running-time is polynomial, it runs in polynomial time on all but a 1/ poly(n)
fraction of inputs, and in time npolylog(n) on all but an n− polylog(n) fraction
of the inputs, and so on.
The whole field of average-case complexity theory was essentially founded
by Leonid Levin [Lev86]. (Leonid Levin is the guy who did not get a Tur-
ing award for inventing the Russian analog of NP completeness. His pa-
per [Lev86] is a good candidate for the shortest paper in theoretical com-
puter science that founded an area: The conference version of the paper is
one page long, the final journal paper has two pages.) Levin’s original defi-
nition is different, but turns out to be equivalent: An algorithm has average
polynomial running-time with respect to D if there exists an ε > 0 such that
E (tA (x, n)ε ) = O(n) .
x∼Dn
10.2. Average polynomial time and heuristics 63

Exercise 10.5. Prove that Definition 10.4 and Levin’s definition are indeed
equivalent.
Exercise 10.6. Actually, Levin’s original definition uses a single distribu-
tion D : {0, 1}+ → [0, 1]. Given an ensemble D = (Dn )n≥1 with supp(Dn ) ⊆
{0, 1}n , we obtain a single distribution D : {0, 1}+ by setting D(x) =
6
D (x).
π 2 |x|2 |x|
A function t : {0, 1}+ → N is called polynomial on D-average if there
exists constants k and c such that
X t(x)1/k
D(x) ≤ c .
|x|
x∈{0,1}+

Prove that t is polynomial on D-average if and only if Prx∼Dn t(x, n) ≥


 p(n)
t ≤ tε for all n, some polynomial p, and some ε > 0.
In addition to how to measure time, we have the choice to use determin-
istic or randomized algorithms or even non-uniform families of circuits.
“In practice”, we would probably not run an algorithm forever, but only
for a polynomial number of steps. We can model this by an algorithm with
worst-case polynomial running-time that “fails” on some inputs. This leads
to the following definition, where “failure” means that the algorithm says “I
don’t know.”
Definition 10.5. Let Π = (L, D) be a distributional problem with D = (Dn ).
An algorithm A is a (fully polynomial-time) errorless heuristic scheme for
Π if there is a polynomial p such that the following holds:
• For every n, every δ > 0, and every x ∈ supp(Dn ), A(x, n, δ) outputs
either L(x) or the failure symbol ⊥.

• For every n, every δ > 0, and every x ∈ supp(Dn ), A(x, n, δ) runs in


time p(n, 1/δ).

• For every n and every δ > 0, we have Prx∼Dn (A(x, n, δ) = ⊥) ≤ δ.


This, however, is yet another way of defining average-polynomial time.
Exercise 10.7. Show that a distributional problem Π admits an errorless
heuristic scheme if and only if it admits an algorithm whose running-time
is average-polynomial according to Definition 10.4.
Now that we have three equivalent definitions of “tractable on average”,
we will finally define a complexity class of distributional problems that are
tractable on average.
Definition 10.6. The class AvgP is the set of all distributional problems
that admit an errorless heuristic scheme.
64 10. Average-case complexity

Exercise 10.8. Let

3COL = {G | G is 3-colorable} ,

let Gn,1/2 be the uniform distribution on graphs with n vertices, and let
G1/2 = (Gn,1/2 )n∈N . Show that (3COL, G1/2 ) ∈ AvgP.
Note: If we include every edge of n2 with a probability of p, i.e., the
n
probability of getting G = (V, E) is p|E| (1 − p)( 2 )−|E| , then this probability
distribution is called Gn,p .
Instead of having δ as part of the input, we can also have a function
δ : N → (0, 1] of failure probabilities. This leads to the following definition.
Definition 10.7. Let Π = (L, D) be a distributional problem, and let δ :
N → (0, 1]. An algorithm A is an errorless heuristic algorithm for Π with
failure probability at most δ if the following properties are fulfilled:
• For every n ∈ N and every x ∈ supp(Dn ), A(x, n) outputs either L(x)
or ⊥.

• For every n ∈ N, we have Prx∼Dn (A(x, n) = ⊥) ≤ δ(n).


For a time bound t, we say that Π ∈ Avgδ DTime(t) if there exists an
errorless heuristic deterministic algorithm A that, for every n, runs on time
at most t(n) for Sall x ∈ supp(Dn ) and has failure probability at most δ.
Let Avgδ P = p:polynomial Avgδ DTime(p).
So far, all algorithms that we considered never produced wrong answers.
They might return “don’t know”, but if they give an answer, it is the correct
answer. Weakening the requirement yields the following definition.
Definition 10.8. An algorithm A is called a (fully polynomial-time) heuris-
tic scheme for Π = (L, D) if there exists a polynomial p such that the fol-
lowing holds:
1. For every n, every x ∈ supp(Dn ), and every δ > 0, A(x, n, δ) runs in
time p(n, 1/δ).

2. For every δ > 0, A(·, ·, δ) is a heuristic algorithm for Π with error


probability at most δ.
Let HeurP be the set of all distributional problems that admit a heuristic
scheme.
Definition 10.9. Let Π = (L, D) be a distributional problem, and let δ :
N → (0, 1]. An algorithm A is a heuristic algorithm with error probability
at most δ for Π if, for all n,

Pr A(x, n) 6= L(x) ≤ δ(n).
x∼Dn
10.3. A distribution for which worst case equals average case 65

For a time bound t and δ : N → (0, 1], we have Π ∈ Heurδ DTime(t) if


there exists a heuristic deterministic algorithm A such that, for every n and
every x ∈ supp(Dn ), A(x, n) runs in time t(n) with failure probability at
most δ(n). S
Let Heurδ P = p:polynomial Heurδ DTime(p).

Exercise 10.9. Prove the following: For every constant c, AvgP ⊆ Avgn−c P
and HeurP ⊆ Heurn−c P.

So far, we have defined classes of problems that are tractable (to various
degrees) on average. Now we define the average-case analog of NP, which is
called DistNP.

Definition 10.10. DistNP is the class of all distributional problems Π =


(L, D) with L ∈ NP and D ∈ PComp.

Similar to P versus NP, the central question in average-case complexity


is whether DistNP ⊆ AvgP. Note that AvgP = DistNP does not hold: First,
there is no restriction on the probability distributions for the problems in
AvgP. Second, AvgP might contain problems that are not even in NP.

10.3 A distribution for which worst case equals aver-


age case
There exists an ensemble for which worst case and average case are equiva-
lent. Thus, the study of average-case complexity with respect to all (instead
of only samplable or computable) distributions reduces to worst-case com-
plexity. To get meaningful results in average-case complexity, we thus have
to restrict ourselves to sufficiently simple sets of ensembles like PComp or
PSamp.

Theorem 10.11. There exists an ensemble K such that if L is a decidable


language and the distributional problem (L, K) is in Heuro(1) P, then L ∈ P.

For the proof of this theorem, we need Kolmogorov complexity (see also
the script of the lecture “Theoretical Computer Science”). To define Kol-
mogorov complexity, let us fix a universal Turing machine U . The Kol-
mogorov complexity of a string x ∈ {0, 1}? is the length of the shortest
string c such that U on input c = hg, yi outputs x. Here, g is a Gödel num-
ber of a Turing machine and y is an input for the Turing machine Mg . We
denote the length |c| by K(x).
The conditional Kolmogorov complexity K(x|z) is the length of the short-
est string c = hg, yi such that Mg on input y and z outputs x.

Example 10.12. We have K(0n ) = log n + O(1) and K(0n | bin(n)) = O(1)
and K(x) ≤ |x| + O(1).
66 10. Average-case complexity

By abusing notation, we write K(x|n) instead of K(x| bin(n)) for |x| = n.


This is also called the length-conditioned Kolmogorov complexity.
Proof of Theorem 10.11. The universal probability distribution K =
(Kn )n∈N assigns everyP string x −K(x|n)
of length nPa probability proportional to
2−K(x|n) : We have x∈{0,1} n 2 = x Kn (x) ≤ 1 since h·, ·i is a
prefix-free code (this follows from Kraft’s inequality). By scaling with an
appropriate constant ≥ 1, we make sure that the sum equals 1.
Since L ∈ Heuro(1) P, there exists a heuristic algorithm A for L that fails
with probability at most o(1). Consider a string x of length n such that
A(x) 6= L(x). Since the overall probability of such strings is at most o(1),
we have Kn (x) = o(1). Thus,

K(x|n) = − log Kn (x) + O(1) = ω(1).

This lower bound for the Kolmogorov complexity holds for all strings on
which A fails.
Now let x0 be the lexicographically first string of length n on which
A fails. Since L is decidable, x0 can be computed given n. This implies
K(x0 |n) = O(1). To conclude, we observe that for sufficiently large n, no
string exists on which A fails. Thus, A fails only on finitely many strings,
which proves L ∈ P.
Exercise 10.9 immediately yields the following result.

Corollary 10.13. If (L, K) ∈ HeurP, then L ∈ P.

Exercise 10.10. An interesting feature of the universal distribution is the


following: Let A be any algorithm with running-time t : {0, 1}? → N. Then
the expected running-time with respect to K is asymptotically equal to the
worst-case running-time:
 
E t(x) = Θ max t(x) ,
x∼Kn x∈{0,1}n

where the constant hidden by Θ depends only on the algorithm A.


Prove this!

Exercise 10.11. Let M = {1, . . . , m}. An encoding of M is an injective


mapping c : M → {0, 1, . . . , γ − 1}? . The code c is called prefix-free if there
are no i 6= j in M such that c(i) is a prefix of c(j).
Prove the following: Assume that we are given lengths `1 , . . . , `m ∈ N.
Then there exists a prefix-free code c for M with |c(i)| = `i if and only if
P m −`i ≤ 1. (This is called Kraft’s inequality.)
i=1 γ
11 Average-case completeness

The goal of this section is to prove that DistNP = (NP, PComp) contains a
complete problem. There are three issues that we have to deal with: First,
we need an appropriate notion of reduction. Second, we have to take care
of the different probability distributions, which can differ vastly. Third, and
this is the easiest task, we need an appropriate problem with an appropriate
probability distribution.

11.1 Reductions
For reductions between distributional problems, the usual many-one reduc-
tions do not suffice: A feature that a suitable reduction should have is that
if (A, D) reduces to (A0 , D0 ) and (A0 , D0 ) ∈ AvgP, then (A, D) ∈ AvgP. Let
us consider the following example: We use the identity mapping as reduc-
tion between (A, D) and (A, D0 ). But D assigns high probability to the hard
instances, whereas D0 assigns small probability to hard instances. Then
(A, D0 ) is tractable on average, but (A, D) is not.
What we learn is that a meaningful notion of reduction must take into
account the probability distributions.

Definition 11.1. Let Π = (L, D) and Π0 = (L0 , D0 ) be distributional prob-


lems. Then Π reduces to Π0 , denoted by Π ≤AvgP Π0 , if there is a function
f that, for every n and every x in the support of Dn , can be computed in
time polynomial in n such that

1. (correctness) x ∈ L if and only if f (x, n) ∈ L0 and

2. (domination) there exist polynomials p and m such that, for every n


0
and every y in the support of Dm(n) ,
X
0
Dn (x) ≤ p(n) · Dm(n) (y) .
x:f (x,n)=y

The first condition is usual for many-one reductions. The second condi-
tion forbids the scenario sketched above: Drawing strings according to Dn
and then using f (·, n) yields a probability distribution of instances for L0 .
(Of course, we just get binary strings again. But we view them as instances
for L0 .) Then the second property makes sure that no string y is generated
0
with much larger a probability than if y had been drawn according to Dm(n) .

67
68 11. Average-case completeness

Lemma 11.2. Let C ∈ {AvgP, HeurP}. If Π ≤AvgP Π0 and Π0 ∈ C, then


Π ∈ C.
Proof. We only consider the case C = AvgP. The other case is similar.
Suppose Π0 ∈ AvgP, and let A0 be a fully polynomial-time errorless heuristic
scheme for Π0 . Let f be a reduction from Π to Π0 , and let p and m be the
polynomials of Definition 11.1.
We claim that A(x, n, δ) = A0 f (x, n), m(n), δ/p(n) is a fully poly-


nomial-time errorless heuristic scheme for Π. To prove this, let Bn = {y ∈


0
supp(Dm(n) ) | A0 (y, m(n), δ/p(n)) = ⊥} be the set of strings on which A0
fails.
Since A0 is a fully polynomial-time errorless heuristic scheme, we have
0
Dm(n) (Bn ) ≤ δ/p(n). With this, we get

Pr A(x, n, δ) = ⊥ = Pr A0 f (x, n), m(n), δ/p(n) = ⊥


  
x∼Dn x∼Dn
X
= Dn (x)
x:f (x,n)∈Bn
X
0
≤ p(n)Dm(n) (y)
y∈Bn
0
= p(n) · Dm(n) (Bn ) ≤ δ .

The inequality holds because the reduction must fulfill the domination con-
dition. Thus, Π ∈ AvgP.
It is also not hard to show that ≤AvgP is transitive.
Lemma 11.3. Let Π = (L, D), Π0 = (L0 , D0 ), and Π00 = (L00 , D00 ) be distri-
butional problems with Π ≤AvgP Π0 and Π0 ≤ Π00 . Then Π ≤AvgP Π00 .
Proof. Let f be a reduction from Π to Π′, and let g be a reduction from Π′ to Π″. Let p and m be the polynomials of Definition 11.1 for f, and let p′ and m′ be the polynomials for g. Obviously, h given by h(x, n) = g(f(x, n), m(n)) is polynomial-time computable and a many-one reduction from L to L″. It remains to prove that h fulfills domination. To this end, let q(n) = p(n) · p′(m(n)), and let ℓ(n) = m′(m(n)). The functions q and ℓ are obviously polynomials. Now let n be arbitrary, and let z ∈ supp(D″_{ℓ(n)}). Then

    Σ_{x : h(x,n)=z} D_n(x) = Σ_{y : g(y,m(n))=z} Σ_{x : f(x,n)=y} D_n(x)
                            ≤ Σ_{y : g(y,m(n))=z} p(n) · D′_{m(n)}(y)
                            ≤ p(n) · p′(m(n)) · D″_{m′(m(n))}(z) = q(n) · D″_{ℓ(n)}(z).

11.2 Bounded halting


In this section, we present a problem that is complete for DistNP. It is the
bounded halting problem, which is the generic NP-complete problem:

    BH = {⟨g, x, 1^t⟩ | M_g is a nondeterministic Turing machine that accepts x in at most t steps}.

We will show that (BH, U^BH) is DistNP-complete, where U^BH = (U_n^BH)_{n∈N} is some kind of uniform distribution on the inputs for BH.
The main challenge in the proof that (BH, U BH ) is DistNP-complete is
that problems Π = (L, D) from DistNP have very different distributions.
Thus, just using the many-one reduction from L to BH is unlikely to work.
The key idea is to find an injective mapping C with the property that
C(x) is almost uniformly distributed if x is distributed according to D. The
following lemma makes this more precise and proves that such a function
exists.

Lemma 11.4. Let D = (Dn )n∈N ∈ PComp be an ensemble. Then there


exists an algorithm C with the following properties:

1. C(x, n) runs in time polynomial in n for all x ∈ supp(Dn ),

2. for every n and x, x′ ∈ supp(D_n), C(x, n) = C(x′, n) implies x = x′ (so C is somewhat “injective”), and

3. |C(x, n)| ≤ 1 + min{|x|, log(1/D_n(x))}.

Proof. Consider any x ∈ supp(D_n). If D_n(x) ≤ 2^{−|x|}, then let C(x, n) = 0x. If D_n(x) > 2^{−|x|}, then let y be the string that precedes x in lexicographic order, and let p = f_{D_n}(y). Then we set C(x, n) = 1z, where z is the longest common prefix of the binary representations of p and f_{D_n}(x) = p + D_n(x). Since D ∈ PComp, the string z can be computed in polynomial time. Thus, C can be computed in polynomial time. (This also shows that |C(x, n)| is bounded by a polynomial in |x|.)
It remains to prove that C is injective and fulfills the length condition. C is injective because no three strings can have the same longest common prefix: If z is the longest common prefix of x_1 and x_2 and z is also a prefix of x_3, then either z0 or z1 is a prefix of x_3, and it is also a prefix of either x_1 or x_2.
Finally, we observe that either C(x, n) = 0x or C(x, n) = 1z for the z described above. In the former case, we have |C(x, n)| ≤ 1 + |x| and D_n(x) ≤ 2^{−|x|}, so also 1 + log(1/D_n(x)) ≥ |x| + 1 = |C(x, n)|. In the latter case, we have D_n(x) ≤ 2^{−|z|}. Thus, |C(x, n)| = 1 + |z| ≤ 1 + log(1/D_n(x)) and, since D_n(x) > 2^{−|x|}, |C(x, n)| ≤ 1 + |x|.
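To make the construction of C more concrete, the following Python sketch (not part of the original notes) implements it for a toy ensemble on {0, 1}²; the helper names prob and cdf, standing for D_n and f_{D_n}, are assumptions of the sketch.

    from fractions import Fraction

    def common_prefix_bits(a, b, max_bits):
        """Longest common prefix of the binary expansions of a, b in [0, 1]
        (the value 1 is read as 0.111...)."""
        bits = []
        for _ in range(max_bits):
            da = 1 if a >= 1 else int(2 * a)
            db = 1 if b >= 1 else int(2 * b)
            if da != db:
                break
            bits.append(str(da))
            a = a if a >= 1 else 2 * a - da
            b = b if b >= 1 else 2 * b - db
        return "".join(bits)

    def compress(x, prob, cdf):
        """C(x, n): prob(x) = D_n(x), cdf(x) = f_{D_n}(x)."""
        if prob(x) <= Fraction(1, 2 ** len(x)):
            return "0" + x                    # the cheap case C(x, n) = 0x
        lo = cdf(x) - prob(x)                 # f_{D_n} of the predecessor of x
        return "1" + common_prefix_bits(lo, cdf(x), len(x) + 2)

    # Toy ensemble on {0,1}^2 concentrated on "01" and "10".
    probs = {"00": Fraction(1, 16), "01": Fraction(7, 16),
             "10": Fraction(7, 16), "11": Fraction(1, 16)}
    order = sorted(probs)                     # lexicographic order
    cdf = lambda x: sum(probs[y] for y in order if y <= x)
    for x in order:
        print(x, "->", compress(x, probs.__getitem__, cdf))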

Now let us focus on U^BH. The instances of BH are triples ⟨g, x, 1^t⟩ of length 2 log|g| + 2 log|x| + 2 log t + |x| + |g| + t + Θ(1). Note that this representation is prefix-free. We draw such instances of length at most N as follows: We flip random bits b_1, b_2, . . . until either i = N or b_1 . . . b_i has the form ⟨g, x⟩. In the former case, we output b_1 . . . b_N, in the latter we output ⟨g, x, 1^{N−i}⟩. We denote this distribution by U_N^BH. The probability of an instance ⟨g, x, 1^t⟩ under this distribution is

    U_N^BH(⟨g, x, 1^t⟩) = 2^{−(2 log|g| + 2 log|x| + |g| + |x| + Θ(1))},

where t = N − |⟨g, x⟩|.
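The sampling procedure for U_N^BH can be written down directly. Below is a small Python sketch (not from the original notes); the helper try_parse_pair, which checks whether a bit string is exactly a prefix-free encoding ⟨g, x⟩, is a hypothetical assumption of the sketch.

    import random

    def sample_U_BH(N, try_parse_pair, rng=random):
        """Draw one instance from U_N^BH: flip bits until the prefix read so far
        parses as <g, x> or until N bits have been flipped."""
        bits = ""
        while len(bits) < N:
            bits += str(rng.randint(0, 1))
            parsed = try_parse_pair(bits)     # returns (g, x) or None
            if parsed is not None:
                g, x = parsed
                return ("instance", g, x, N - len(bits))   # encodes <g, x, 1^t>
        return ("garbage", bits)              # no valid prefix <g, x> appeared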


With this preparation, we can prove the main theorem of this section.
The key idea is to use C to “compress” the inputs: While Dn can be any
distribution, the images C(x, n) with x drawn according to Dn are, more or
less, uniformly distributed in the sense of U BH : If x is likely, then C(x, n) is
short. If x is unlikely, then C(x, n) is long. Thus, if we draw random bits
until we have seen an image y = C(x, n), then the probability of seeing y is
roughly Dn (x).
Theorem 11.5. (BH, U^BH) is DistNP-complete with respect to ≤_AvgP.
Proof. Let Π = (L, D) ∈ DistNP be arbitrary, i.e., L ∈ NP and D ∈
PComp. Let M be a nondeterministic Turing machine that accepts an input
string y if and only if there exists a string x ∈ L with C(x, n) = y. Since C
is polynomial-time computable and L ∈ NP, we can assume that M obeys
a polynomial time bound q. Let g be the Gödel number of M .
Let us describe the reduction from (L, D) to (BH, U^BH): On input x and parameter n, the reduction outputs an instance ⟨g, C(x, n), 1^{t(x)}⟩ of length N = N(n). We choose N to be a sufficiently large polynomial to make sure that t(x) ≥ q(n).
Obviously, we have x ∈ L if and only if ⟨g, C(x, n), 1^{t(x)}⟩ ∈ BH. To verify the domination condition, we exploit that C is injective. Thus, it suffices to check that, for every n and every x ∈ supp(D_n), we have D_n(x) ≤ poly(n) · U_N^BH(⟨g, C(x, n), 1^{t(x)}⟩).

Let ℓ = |g| be the length of the encoding of M, which is fixed. Then

    U_N^BH(⟨g, C(x, n), 1^{t(x)}⟩) = 2^{−(2 log ℓ + 2 log|C(x,n)| + ℓ + |C(x,n)| + Θ(1))}.

Now log|C(x, n)| ≤ log(m(n)) + 1 and |C(x, n)| ≤ log(1/D_n(x)) + 1 yield

    U_N^BH(⟨g, C(x, n), 1^{t(x)}⟩) ≥ 2^{−(2 log ℓ + ℓ)} · (1/(m(n) + 1)²) · D_n(x) · Ω(1),

where the factor 2^{−(2 log ℓ + ℓ)} is Θ(1). This proves that domination is fulfilled.


Let us make a final remark on the function C. This function C is some-
times called a compression function for D. It plays a crucial role in the

completeness proof, as it makes the reductions, which have to meet the


domination requirement, possible in the first place. Why is C called com-
pression function? Assume that we are given samples drawn according to D.
If we compress them using C, we have a compression with close to optimal
compression rate.

11.3 Heuristic algorithms vs. heuristic schemes


In the previous chapter, we distinguished between algorithms and schemes:
An algorithm has a fixed failure probability (fixed means a fixed function,
not a fixed constant), whereas a scheme works for all failure probabilities δ,
but the running-time depends on δ.
By Exercise 10.9, if a problem Π admits a heuristic scheme, then it
admits heuristic algorithms with error probabilities n−c for every constant c.
The containment in the other direction does not hold. For instance,
Avg1/n P contains undecidable problems, whereas AvgP does not.
Exercise 11.1. 1. Show that there exists an undecidable problem L with
(L, U) ∈ Avg1/n P.
2. Show that AvgP does not contain undecidable problems (L, U).
But if we restrict ourselves to problems in DistNP, the other containment
can be proved: DistNP as a whole admits heuristic schemes if and only if it
admits heuristic algorithms.
Theorem 11.6. Let c > 0 be arbitrary. If (BH, U^BH) ∈ Avg_{n^{−c}}P, then DistNP ⊆ AvgP. The same holds for Heur_{n^{−c}}P and HeurP.
Proof. For simplicity, we will only consider AvgP and c = 1. By the
completeness of (BH, U BH ), it suffices to show (BH, U BH ) ∈ AvgP.
Let A be an errorless heuristic algorithm for (BH, U BH ) with error prob-
ability 1/n. We will use A to construct an errorless heuristic scheme A0 .
The idea is to use padding to map short instances of BH to longer instances.
Then we exploit that the error probability of A decreases with growing input
length.
Let N be the length of an instance I = ⟨g, x, 1^t⟩ of BH. Then we set

    A′(I, N, δ) = A(⟨g, x, 1^{t+1/δ}⟩, N + 1/δ).

(A′ immediately rejects inputs that are not syntactically correct.) Note that

    U_N^BH(I) = U_{N+1/δ}^BH(⟨g, x, 1^{t+1/δ}⟩)

by the definition of U_N^BH. On inputs from U_{N+1/δ}^BH, algorithm A outputs ⊥ with a probability of at most 1/(N + 1/δ) < δ. Thus, A′ outputs ⊥ on at most a δ fraction of the instances obtained from U_N^BH.
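The padding step of this proof can be phrased as a two-line wrapper. The following Python sketch (not part of the notes) assumes an errorless heuristic algorithm A for BH with failure probability 1/n and represents ⟨g, x, 1^t⟩ as a triple (g, x, t); 1/δ is rounded up to an integer in the sketch.

    import math

    def A_prime(I, N, delta, A):
        """Errorless heuristic scheme for BH built from the heuristic algorithm A,
        as in the proof of Theorem 11.6 (sketch)."""
        g, x, t = I                           # the instance <g, x, 1^t>
        pad = math.ceil(1 / delta)
        # Padding: on the longer instance, A fails with probability at most
        # 1/(N + pad) <= delta.
        return A((g, x, t + pad), N + pad)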

11.4 More DistNP-complete problems


Here, we list some more DistNP-complete problems without proving that
they are. For the proofs as well as some other problems, we refer to Wang’s
survey of DistNP-complete problems [Wan97].
There are not too many DistNP-complete distributional problems (L, D),
where both L and D are natural. The main issue is that we lack a powerful tool to prove that a problem is hard-on-average, like the PCP theorem
for (in)approximability. So, in some sense, average-case complexity is in a
similar state as the complexity of optimization was before the PCP theorem.

Tiling
A tile is a square with a symbol on each of its four sides. Tiles must not be
rotated or turned over. If we have a set T of tiles, we assume that we have
an infinite number of tiles of any kind in T . A tiling of an n × n square is
an arrangement of n² tiles that cover the square such that the symbols of
the adjacent sides of the tiles agree. The size of a tile is the length of the
binary representation of its four symbols.

Instance: A finite set T of tiles, an integer n > 0, a sequence s1 , . . . , sk ∈ T


of tiles that match each other (the right side of s_i matches the left side of s_{i+1}). The size of the instance is n plus the sizes of the tiles in T
plus the sizes of s1 , . . . , sk .

Question: Can s1 , . . . , sk be extended to a tiling of the n × n square using


only tiles from the set T ?

Distribution: Given n, select T using your favorite probability distribution


(this really does not matter much; for the reduction, T just represents
the Turing machine deciding a language, and this Turing machine has
constant size for any fixed language). Select k uniformly at random
from {1, . . . , n}. Finally, select s1 uniformly from T and select si+1
randomly from T such that it matches si .

Levin’s original DistNP-complete problem was a variant of tiling, where


the corners instead of the sides of the tiles had to match.

Post correspondence
The Post correspondence problem is one of the better known undecidable
problems. In a restricted variant, it becomes NP-complete. Together with
an appropriate probability distribution, it becomes DistNP-complete.

Instance: A positive integer n, and a list ⟨x_1, y_1⟩, . . . , ⟨x_m, y_m⟩ of pairs of strings. The length N of the instance is n + Σ_{i=1}^m (|x_i| + |y_i|).

Question: Is there a sequence i1 , . . . , in ∈ {1, . . . , m} of indices such that


x_{i_1} x_{i_2} · · · x_{i_n} = y_{i_1} y_{i_2} · · · y_{i_n}?

Distribution: Draw m according to Pr(m = µ) = Θ(1/µ²). Then draw


x1 , . . . , xm and y1 , . . . , ym according to the uniform distribution on
{0, 1}+ defined in Section 10.1.

Arbitrary NP-complete problems


If some problem (L, D) with L ∈ NP and D ∈ PComp is hard-on-average,
then every NP-complete language A is hard-on-average with respect to some
samplable ensemble E. The ensemble E, however, might look a bit unnatural.
In particular, for every NP-complete language A, there exists an ensemble
E ∈ PSamp such that (A, E) is DistNP-hard.
12 Average case versus worst case

In this section, we will show some connections between average-case and


worst-case complexity. We will first provide a condition under which a distri-
butional problem is not DistNP-complete unless EXP = NEXP. Second, we
will show that DistNP is not contained in AvgP unless E = NE. (Recall that
E = DTime(2^{O(n)}) and NE = NTime(2^{O(n)}), whereas EXP = DTime(2^{poly(n)}) and NEXP = NTime(2^{poly(n)}).)

12.1 Flatness and DistNP-complete problems


So under which conditions is a distributional problem DistNP-complete?
Gurevich gave a partial answer: Π = (L, D) cannot be DistNP-complete
if D assigns only very little weight to all strings in supp(Dn ). The intuition
is the following: Assume that we have a distributional problem Ψ = (A, E)
that reduces to Π. Assume further that E assigns high weight to few strings.
Then, in order to satisfy the domination requirement, also D must assign
somewhat high weight to some strings. The following definition makes the
notion of “assigns very little weight to all strings” precise.

Definition 12.1. An ensemble D = (D_n)_{n∈N} is flat if there exists an ε > 0 such that, for all n and x, D_n(x) ≤ 2^{−n^ε}.

Exercise 12.1. Show that G1/2 (introduced in Exercise 10.8) is flat.

Exercise 12.2. Show that U BH (see Section 11.2) is not flat.

Theorem 12.2 (Gurevich [Gur91]). If there is a DistNP-complete problem


(L, D) with a flat ensemble D, then NEXP = EXP.

Proof overview: Assume that Π = (L, D) is DistNP-complete, D is flat,


and there exists a reduction from Ψ = (A, E) to Π, where E = (En )n∈N is
non-flat. Let f be a reduction from Ψ to Π. In order to maintain domination,
f must map strings x ∈ supp(En ) to very short strings f (x). Short strings,
however, mean (relatively) short running-time.

Proof. Assume that there exists a DistNP-complete distributional prob-


lem Π = (L, D), where D is a flat ensemble. Obviously, L ∈ EXP. Now let
A ∈ NEXP be arbitrary. Let p be a polynomial such that A ∈ NTime(2^{p(n)}). For a string x ∈ {0, 1}* with |x| = n, let x′ = x01^{2^{p(n)}−n−1}. Let A′ = {x′ |


x ∈ A}. Since A ∈ NEXP, the language A′ is in NP. Let E = (E_n)_{n∈N} be the following ensemble:

    E_{2^{p(n)}}(z) = 2^{−|x|} if z = x′ for some string x, and E_{2^{p(n)}}(z) = 0 otherwise.

Since E is computable, we have (A′, E) ∈ DistNP. Thus, there exists a reduction f from (A′, E) to (L, D). Let us make a few observations.

• Given x of length n, f(x′, 2^{p(n)}) can be computed in time 2^{q(n)} for some polynomial q.

• The function x ↦ f(x′, 2^{p(n)}) is a many-one reduction from A to L, i.e., x ∈ A if and only if f(x′, 2^{p(n)}) ∈ L.

• There exist polynomials m and r such that

    Σ_{z : f(z, 2^{p(n)}) = f(x′, 2^{p(n)})} E_{2^{p(n)}}(z) ≤ r(2^{p(n)}) · D_{m(2^{p(n)})}(f(x′, 2^{p(n)}))

  for all n and x. (This might look confusing at first glance since the strings x′, z and f(x′) are exponentially long, but it is just the domination condition.)
Now we have E_{2^{p(n)}}(x′) = 2^{−n}. Thus, domination implies that

    D_{m(2^{p(n)})}(f(x′, 2^{p(n)})) ≥ 2^{−n} / r(2^{p(n)}) = 2^{−s(n)}     (12.1)

for some polynomial s. Since D is flat, there exists an ε > 0 such that

    D_{m(2^{p(n)})}(f(x′, 2^{p(n)})) ≤ 2^{−(m(2^{p(n)}))^ε}.     (12.2)

From the two bounds (12.1) and (12.2) on D_{m(2^{p(n)})}, we get

    s(n) ≥ m(2^{p(n)})^ε.     (12.3)

Now we are almost done. Since the images f(x′, 2^{p(n)}) are in supp(D_{m(2^{p(n)})}), their lengths are polynomially bounded in m(2^{p(n)}). By (12.3), m(2^{p(n)}) ≤ s(n)^{1/ε}, which is polynomial in n. Hence, A ∈ EXP: (1) x ∈ A if and only if f(x′, 2^{p(n)}) ∈ L. (2) We can compute y = f(x′, 2^{p(n)}) in time 2^{q(n)}. (3) We can decide whether y ∈ L in time 2^{poly(|y|)} ≤ 2^{poly(m(2^{p(n)}))} ≤ 2^{poly(s(n))} = 2^{poly(n)}, where the second inequality holds because of (12.3).
Since the uniform distribution U = (U_n)_{n∈N} is flat, we immediately get the following result as a special case.
Corollary 12.3. There is no L ∈ NP such that (L, U) is DistNP-complete
unless NEXP = EXP.

12.2 Collapse in exponential time


Our second result concerning connections between average-case and worst-
case complexity shows that it is unlikely that DistNP is a subset of AvgP. If
this is the case, then nondeterministic exponential time collapses to deter-
ministic exponential time.
To show this, we need the following two lemmas.
Lemma 12.4. E ≠ NE if and only if there exists a unary language L ∈
NP \ P.
Proof. “=⇒”: Assume that NE ≠ E, and let L′ ∈ NE \ E. Let L = {1^{cod(x)} | x ∈ L′} be a unary language. Let us first show that L ∈ NP: Given y = 1^{cod(x)}, x can be computed in polynomial time. We have |x| = O(log|y|). Since L′ ∈ NE, there exists a nondeterministic Turing machine that decides L′ in time 2^{O(m)} on inputs of length m. On input x, this machine needs time 2^{O(log|y|)} = |y|^{O(1)}.
Now let us prove that L ∉ P. Assume to the contrary that L ∈ P. Then, given any string x, we can compute y = 1^{cod(x)} in time 2^{O(|x|)}. By definition, y ∈ L if and only if x ∈ L′. Since L ∈ P, we can decide in time |y|^{O(1)} = 2^{O(|x|)} if y ∈ L. This would imply L′ ∈ E – a contradiction.
“⇐=”: Assume that there exists a unary language L ⊆ {1}* in NP \ P. Consider the language L′ = {bin(n) | 1^n ∈ L}. We will show that L′ ∈ NE \ E. On input y, we can compute x = 1^n with bin(n) = y in time 2^{O(|y|)}. Then we can use the nondeterministic polynomial-time Turing machine that witnesses L ∈ NP to decide x ∈ L in time n^{O(1)} = 2^{O(|y|)}. Thus, L′ ∈ NE.
Lastly, we have to show that L′ ∉ E. Assume to the contrary that L′ ∈ E. Then there is a deterministic Turing machine that decides L′ in time 2^{O(m)} on inputs of length m. Now, on input x = 1^n, we can compute y = bin(n) in polynomial time. We have |y| = O(log n). Thus, y ∈ L′ can be decided in time 2^{O(log n)} = n^{O(1)}. Since y ∈ L′ if and only if x ∈ L, this would imply L ∈ P – again a contradiction.

Lemma 12.5. Let Q = (Q_n)_{n∈N} be given by Q_n(1^n) = 1. Then, for every unary language L ⊆ {1}*, we have L ∈ P if and only if (L, Q) ∈ AvgP.

Proof. Clearly, if L ∈ P, then (L, Q) ∈ AvgP. To see that the converse also holds, consider any algorithm A that witnesses (L, Q) ∈ AvgP. Let t be the running-time of A. Then we have, for some ε > 0, E_{x∼Q_n}(t^ε(x, n)) = O(n). Since supp(Q_n) = {1^n}, this is equivalent to t(1^n, n) = O(n^{1/ε}). Thus, A runs in worst-case polynomial time.
With these two lemmas, the main theorem of this section can be proved
easily.
Theorem 12.6 (Ben-David, Chor, Goldreich, Luby [BDCGL92]). If E ≠ NE, then DistNP ⊄ AvgP.

Proof. Let Q be the ensemble of Lemma 12.5. Obviously, Q ∈ PComp. Thus, L ∈ NP if and only if (L, Q) ∈ DistNP.
We have E ≠ NE if and only if there exists a unary language L ∈ NP \ P by Lemma 12.4. This in turn holds if and only if (L, Q) ∈ DistNP \ AvgP by Lemma 12.5.
13 Decision versus search

A search algorithm for an NP relation V is an algorithm that, on input x, computes a witness w of length poly(|x|) such that ⟨x, w⟩ ∈ V. Recall that the corresponding language L ∈ NP is L = {x | ∃w of polynomial length : ⟨x, w⟩ ∈ V}. By abusing notation, we will also call such an algorithm a search algorithm for the language L ∈ NP. This is ambiguous since L does not uniquely define a corresponding NP relation.
Obviously, if we have an efficient search algorithm for a language L ∈ NP,
then we can use it to get an efficient decision algorithm for L. What about
the opposite? If L is NP-complete, then we can use an efficient decision
algorithm for L to efficiently compute witnesses (see the script of the lecture
“Computational Complexity Theory”). Thus, if P = NP, then every problem
in NP admits efficient search algorithms. So for NP as a whole, decision and
search are equally hard. Nevertheless, it is believed that in general, efficient
decision does not imply efficient search. For instance, one-way permutations,
if they exist, yield problems for which decision is easy but search is hard.

Exercise 13.1. A one-way permutation is a bijective function f : {0, 1}* → {0, 1}* such that

• |f(x)| = |x| for every x ∈ {0, 1}*,

• given x, f(x) can be computed in polynomial time, and

• the problem of finding an x with f(x) = y for a given y cannot be solved in polynomial time. (Since f is bijective, we can equivalently say that f^{−1} cannot be computed in polynomial time.)

Show that the existence of one-way permutations implies that there are prob-
lems for which search is harder than decision.

In this section, we consider the question of decision versus search in the


average-case setting: Assume that all DistNP problems admit efficient-on-
average decision algorithms, i.e., DistNP ⊆ AvgP. Do then all problems in
DistNP also have efficient-on-average search algorithms? We will give a par-
tial answer to this question: If all languages in NP with the uniform distri-
bution admit efficient-on-average randomized algorithms, then all languages
in NP with the uniform distribution admit efficient-on-average randomized
search algorithms.
We have not yet defined what an efficient-on-average randomized algo-
rithm is. (Here, the instances are drawn at random and, in addition, also the


algorithm itself is allowed to use randomness to solve the instance.) Further-


more, we also do not know yet what an efficient-on-average (randomized)
search algorithm is. We will define all this in the next section and postpone
the main theorem of this section to Section 13.3.

13.1 Randomized decision algorithms


We first generalize AvgP, Avg_δP, and so on to randomized algorithms.

Definition 13.1. Let Π = (L, D) be a distributional problem. An algorithm A is a randomized errorless heuristic scheme for Π if A runs in time polynomial in n and 1/δ for every δ > 0 and x ∈ supp(D_n) and

    Pr_A(A(x, n, δ) ∉ {L(x), ⊥}) ≤ 1/4     (13.1)

(the probability is taken over A's coin tosses) and

    Pr_{x∼D_n}( Pr_A(A(x, n, δ) = ⊥) ≥ 1/4 ) ≤ δ     (13.2)

(the inner probability is again over A's coin tosses, the outer probability over the random instances).
AvgBPP is the class of all distributional problems that admit a randomized errorless heuristic scheme.
We stress that “errorless” refers to the random input, not to the internal
coin tosses of the algorithm.
Definition 13.1 probably needs some explanation. Fix some input x ∈ supp(D_n), and consider running A(x, n, δ) k times for some large k. If significantly more than k/4 of these runs return ⊥, then we can interpret this as A not knowing the answer for x. This follows from the second condition of Definition 13.1. On the other hand, if ⊥ is returned fewer than k/4 times, then the first condition guarantees that we will see the right answer at least k/2 times (with high probability due to Chernoff's bound). The choice of the constant 1/4 in Definition 13.1 is arbitrary: Any constant strictly smaller than 1/3 serves well.
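The repetition argument just described can be made explicit. Here is a small Python sketch (an illustration, not part of the notes) that runs a given scheme A several times and aggregates the answers; A is assumed to be a randomized Python callable returning True, False, or the string "bot" for ⊥.

    from collections import Counter

    def interpret_runs(A, x, n, delta, k):
        """Run A(x, n, delta) k times and aggregate as described above."""
        answers = Counter(A(x, n, delta) for _ in range(k))
        if answers["bot"] > k / 4:
            return "bot"                      # interpret as "A does not know"
        # Few failures: by a Chernoff bound, the majority of the remaining
        # answers is the correct one with high probability.
        return max((count, val) for val, count in answers.items()
                   if val != "bot")[1]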

Excursus: Chernoff bounds


Chernoff bounds are frequently used to bound large deviations from the expected value of random variables that are sums of independent indicator variables. The rough statement is: If we toss n unbiased coins, we see n/2 ± O(√n) heads with high probability.
More precisely: Let X_1, . . . , X_n be independent random variables that assume only values in {0, 1}. Let Pr(X_i = 1) = p_i, let X = Σ_{i=1}^n X_i, and let E(X) = Σ_{i=1}^n p_i = µ. Then

    Pr(X > E(X) + a) < exp(−2a²/n)

for all a > 0. By symmetry, we have the same bound for Pr(X < E(X) − a).
There are many variants of Chernoff bounds. Sometimes, they lead to slightly
different bounds. For most applications, however, it does not matter which version
we use.
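For a feel for the numbers, the following Python snippet (not part of the notes) compares the bound exp(−2a²/n) with an empirical estimate for fair coins; the parameters are arbitrary.

    import math, random

    def chernoff_bound(n, a):
        """The bound exp(-2a^2/n) on Pr(X > E(X) + a) for a sum of n indicator
        variables."""
        return math.exp(-2 * a * a / n)

    n, a, trials = 2000, 2 * math.sqrt(2000), 1000
    hits = sum(sum(random.random() < 0.5 for _ in range(n)) > n / 2 + a
               for _ in range(trials))
    print("empirical:", hits / trials, " bound:", chernoff_bound(n, a))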

Exercise 13.2. Let A be a randomized errorless heuristic scheme. Let A′ be the algorithm that executes A k = k(n) times on inputs from supp(D_n) and outputs the majority vote. Prove that

    Pr_{A′}(A′(x, n, δ) ∉ {L(x), ⊥}) ≤ 2^{−Ω(k(n))}

and

    Pr_{x∼D_n}( Pr_{A′}(A′(x, n, δ) = ⊥) ≥ 2^{−Ω(k(n))} ) ≤ δ.

As in Definition 10.2, it is also possible to define randomized errorless


heuristic algorithms or randomized heuristics that are allowed to make er-
rors, but we will not do so here. We can also replace the constant 1/4
in (13.1) by 0. Then we obtain zero-error randomized errorless heuristic
schemes.
Now what is the difference between errorless and zero-error? Note that
we have two types of “errors” or “failure”: We can be unlucky to get a hard
instance, and the algorithm, since randomized, may fail. Errorless means
that there is no instance on which the algorithm A errs. It is just allowed
to produce ⊥. However, if A is randomized it may still have bad luck with
its coin tosses, which may cause it to output a wrong answer. If this is not
the case, then A is called zero-error.
Exercise 13.3. We can also define a non-uniform variant of AvgP: A dis-
tributional problem Π = (L, D) is in AvgP/poly if there exists an algorithm
A and an advice function a : N × (0, 1] → {0, 1}* with |a(n, δ)| ≤ poly(n, 1/δ)
such that the following holds:
1. For every n, every δ > 0, and every x ∈ supp(Dn ), A(x, n, δ, a(n, δ))
outputs either L(x) or the failure symbol ⊥.

2. For every n, every δ > 0, and every x ∈ supp(Dn ), A(x, n, δ, a(n, δ))
runs in time p(n, 1/δ).

3. For every n and every δ > 0, we have Prx∼Dn (A(x, n, δ, a(n, δ)) =
⊥) ≤ δ.
(One might prefer to define AvgP/poly in terms of circuits rather than Tur-
ing machines that take advice. But we want to have one circuit for each n
and δ, and supp(Dn ) can contain strings of different lengths. This technical
problem can be solved, but why bother?)
Prove that AvgBPP ⊆ AvgP/poly.

13.2 Search algorithms


Now we turn to the definition of search algorithms. In order to avoid confu-
sion, we will first define deterministic search algorithms that are efficient on
average, although we will never use them. After that, we allow our search
algorithms to use randomness.

Definition 13.2. Let Π = (L, D) be a distributional problem with L ∈ NP.


An algorithm A is a deterministic errorless search scheme for Π if there is
a polynomial p such that the following holds:

1. For every n, δ > 0, and every x ∈ supp(Dn ), A runs in time at most


p(n, 1/δ).

2. For every n, δ > 0, and every x ∈ L ∩ supp(Dn ), A(x, n, δ) outputs a


witness for x ∈ L or ⊥.

3. For every n and δ > 0, we have Prx∼Dn (A(x, n, δ) = ⊥) ≤ δ.

For x ∉ L, the algorithm A(x, n, δ) can output anything. The above definition is not completely precise since the witness language is not unique. However, this really makes no difference here.
In the next definition, we allow our search algorithm to use randomness.

Definition 13.3. Let Π = (L, D) be a distributional problem with L ∈ NP.


An algorithm A is a randomized errorless search scheme for Π if there is a
polynomial p such that the following is true:

1. For every n and δ > 0, A runs in time p(n, 1/δ) and outputs either a
string w or ⊥.

2. For every n, δ > 0, and x ∈ L ∩ supp(D_n),

    Pr_A(A(x, n, δ) outputs a witness for x or A(x, n, δ) = ⊥) > 1/2.

3. For every n and δ > 0,

    Pr_{x∼D_n}( Pr_A(A(x, n, δ) = ⊥) > 1/4 ) ≤ δ.

What does this definition mean? For any x ∈ L, A may output a non-
witness w. According to item (2), this happens with bounded probability.
This is an internal failure of the algorithm A and not due to x being a hard
instance. Item (3) bounds the probability that A outputs ⊥. Intuitively, A
outputs ⊥ not because of internal failure, but because x is a hard instance.

This is called an external failure. However, there is at most a δ fraction


of strings x (measured with respect to Dn ) on which A outputs ⊥ with
significant probability.
The constants 1/2 and 1/4 in the definition are to some extent arbitrary. We can replace them by any constants c and c′ with 1 > c > c′ > 0. Furthermore, these two failure probabilities can be decreased to 2^{−Ω(k)} by executing the algorithm k times: If we ever get a witness, we output this witness. If we see ⊥ more than c′k times, then we output ⊥. Otherwise, we output an arbitrary string.
Definition 13.3 allows the algorithm A to output anything on input x ∉ L (but even then ⊥ only with bounded probability). Thus, a randomized errorless search scheme can be used as a randomized decision algorithm: If we get a witness, then we know that x ∈ L. If we get neither a witness nor ⊥, then we claim that x ∉ L. If the answer is ⊥, then we do not know. By amplifying probabilities, we can make sure that the probability of claiming x ∉ L although there exists a witness for x is small.

13.3 Search-to-decision reduction


Recall that U = (U_n)_{n∈N} is what we call the uniform distribution on {0, 1}*. Namely, U_n(x) = 2^{−n} for |x| = n and U_n(x) = 0 otherwise.
In the following, we will reduce search to decision in the average-case
setting. Let us first see why the usual approach from worst-case complexity
does not work. Let L = {x | ∃w of polynomial length : ⟨x, w⟩ ∈ V} ∈ NP with V being the corresponding witness language. Then, given x and y, deciding if there exists a witness w that is lexicographically smaller than y is an NP language as well. Let

    W = {⟨x, y⟩ | ∃w : w ≤ y ∧ ⟨x, w⟩ ∈ V},

where w ≤ y means “lexicographically smaller”. Assuming that decision is easy, namely P = NP, we can use binary search to find a witness w with ⟨x, w⟩ ∈ V.
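In the worst-case setting, the binary search just mentioned looks as follows. The Python sketch below (not from the notes) uses a decision oracle in_W for W and, as a stand-in for a real NP relation, a toy verifier V; both the toy V and the brute-force oracle are assumptions made only for illustration.

    def find_witness(x, p, in_W):
        """Binary search for the lexicographically smallest witness of length p,
        given an oracle in_W(x, y) deciding whether some witness w <= y exists."""
        lo, hi = 0, 2 ** p - 1
        if not in_W(x, format(hi, "0%db" % p)):
            return None                       # x has no witness at all
        while lo < hi:
            mid = (lo + hi) // 2
            if in_W(x, format(mid, "0%db" % p)):
                hi = mid                      # some witness is <= mid
            else:
                lo = mid + 1                  # every witness is > mid
        return format(lo, "0%db" % p)

    # Toy relation: (x, w) in V iff w, read as a number, is a positive multiple of x.
    V = lambda x, w: int(w, 2) > 0 and int(w, 2) % x == 0
    in_W = lambda x, y: any(V(x, format(w, "0%db" % len(y)))
                            for w in range(int(y, 2) + 1))
    print(find_witness(7, 5, in_W))           # prints 00111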
What about the average-case? Let wx be the lexicographically smallest
witness for x. Suppose our efficient-on-average algorithm for W works well
on all instances hx, yi except for those y that are close to wx . Then our
algorithm is able to find the most significant bits of wx , but it fails to find a
few least significant bits of wx . Since most strings are not close to wx , our
algorithm can still be efficient on average.
Our goal in the remainder of this section is still to prove that search-to-
decision reduction is possible in the average-case setting, despite what we
sketched above.
To do this, let us first consider the scenario where every x ∈ L has a
unique witness wx . Then we can ask NP questions like “is the ith bit of
the witness for x a 1?” Let p be the (polynomial) length of witnesses. By

querying the above for i ∈ {1, 2, . . . , p(|x|)}, we can find the witness. In the
following, let |x| = n and p = p(n).
Of course, we cannot assume in general that witnesses are unique. But we know tools to make witnesses unique (recall the Valiant–Vazirani theorem from the lecture “Computational Complexity Theory” [VV86]). We use a family H of pairwise independent hash functions h : {0, 1}^p → {0, 1}^p: H consists of all functions x ↦ Ax + b, where A ∈ {0, 1}^{p×p} and b ∈ {0, 1}^p. By restricting h ∈ H to the first j bits, we obtain h|_j. Also {h|_j | h ∈ H} is a family of pairwise independent hash functions.
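The hash family H and the restriction h|_j are easy to write down explicitly. The following Python sketch (not part of the notes) draws h(w) = Aw + b over GF(2) and evaluates its first j bits; bit strings are represented as lists of 0/1.

    import random

    def random_affine_hash(p, rng=random):
        """Draw h(w) = Aw + b over GF(2), a pairwise independent hash function
        from {0,1}^p to {0,1}^p."""
        A = [[rng.randint(0, 1) for _ in range(p)] for _ in range(p)]
        b = [rng.randint(0, 1) for _ in range(p)]
        def h(w, j=None):
            """Evaluate h(w); with j given, return only the first j bits h|_j(w)."""
            out = [(sum(A[i][k] * w[k] for k in range(p)) + b[i]) % 2
                   for i in range(p)]
            return out if j is None else out[:j]
        return h

    h = random_affine_hash(4)
    print(h([1, 0, 1, 1]), h([1, 0, 1, 1], j=2))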
Now we consider the language

    W′ = {⟨x, h, i, j⟩ | ∃w : ⟨x, w⟩ ∈ V ∧ w_i = 1 ∧ h|_j(w) = 0^j}.

We build the quadruple ⟨x, h, i, j⟩ such that, for |x| = n, it always has length q = q(n) for some polynomial q. Furthermore, we make sure that x, h, i, and j are independent. This means that we can equivalently draw x ∈ {0, 1}^n, h ∈ H as well as i, j ∈ {1, . . . , p} uniformly and independently at random and pair them to ⟨x, h, i, j⟩. In this way, we get the same distribution. This can be done since the lengths of x and h are fixed once n is known. (We can, for instance, assume that p is a power of 2. Then we can write i and j as binary strings of length log p, possibly with leading 0s.)
It can be shown (see again the script of the lecture “Computational Complexity Theory”) that if j is the logarithm of the number of witnesses for x, then, with at least constant, positive probability (taken over the choice of h ∈ H), there is a unique witness w for x that also satisfies h|_j(w) = 0^j.
Now we proceed as follows:

1. Draw h ∈ H uniformly at random.

2. If, for some j ∈ {1, . . . , p}, the sequence of answers to the queries
hx, h, 1, ji, . . . , hx, h, p, ji yields a witness w for x ∈ L, then we output
this witness. (Note that hx, wi ∈ V can be checked in polynomial
time.)

3. If an answer to some hx, h, i, ji is ⊥, then we also output ⊥. Otherwise,


we output an arbitrary string.

We call this algorithm B. Apart from technical details, which we will


prove below, this proves the following theorem. The essential subtlety is that h is part of the (random) input for W′, whereas h is part of the internal randomness of B for L. This means that h appears in the outer probability in (13.2) of Definition 13.1 and in the inner probability of item (3) of Definition 13.3.

Theorem 13.4 (Ben-David et al. [BDCGL92]). If (NP, U) ⊆ AvgBPP, then


every problem in (NP, U) has an errorless randomized search algorithm.

Proof. We have already done the lion’s share of the work. It remains to
estimate the failure probabilities. Let L ∈ NP be arbitrary such that wit-
nesses for L have length p for some polynomial p, and let A be a randomized
errorless heuristic scheme for (W′, U) with

    W′ = {⟨x, h, i, j⟩ | ∃w : ⟨x, w⟩ ∈ V ∧ w_i = 1 ∧ h|_j(w) = 0^j}.

Let x ∈ L ∩ supp(U_n) = L ∩ {0, 1}^n, and let δ > 0 be arbitrary. Our search algorithm B proceeds as described above. We call A with a failure probability of α to achieve a failure probability of δ for B. Furthermore, we amplify condition (13.1) of Definition 13.1 to

    Pr_A(A(y, q, α) ∉ {W′(y), ⊥}) ≤ β     (13.3)

for every y = ⟨x, h, i, j⟩ and condition (13.2) to

    Pr_{y=⟨x,h,i,j⟩∼U_q}( Pr_A(A(y, q, α) = ⊥) ≥ γ ) ≤ α.

We will specify α, β, and γ later on.


The algorithm B described above obviously runs in polynomial time. The failure probabilities remain to be analyzed: We have to find constants c′ and c with 0 < c′ < c < 1 such that

    Pr_B(B(x, n, δ) yields a witness or ⊥) ≥ c     (13.4)

for each x ∈ L ∩ {0, 1}^n and

    Pr_{x∼U_n}( Pr_B(B(x, n, δ) = ⊥) > c′ ) ≤ δ     (13.5)
for every n and δ > 0.


To show (13.4), consider any x ∈ L. The probability that we draw a hash function h with the property that there exists a j such that x possesses a unique witness w with h|_j(w) = 0^j is at least 1/8 (script of the lecture “Computational Complexity Theory”, Lemma 17.3). We call such an h good for x. Fix an arbitrary good h. If B draws this h, then a sufficient condition for B to output a witness or ⊥ is that A never outputs a wrong answer. The probability that A outputs a wrong answer (i.e., neither correct nor ⊥) is at most p²β by a union bound over all i and j. Thus, the probability that

1. B samples an h that is good for x and

2. A never gives a wrong answer



is at least c = (1/8) · (1 − p²β). We choose β = 1/(5p²), which yields c = 1/10.


Before specifying the parameters α and γ, let us also analyze (13.5). To do this, let

    Z = {x ∈ {0, 1}^n | Pr_h(∃i, j : Pr_A(A(⟨x, h, i, j⟩, q, α) = ⊥) ≥ γ) ≥ φ}

be the set of bad strings. Let us analyze the probability Pr_{x∼U_n}(x ∈ Z) that a random x is bad. We have

    Pr_{x,h}(∃i, j : Pr_A(A(⟨x, h, i, j⟩, q, α) = ⊥) ≥ γ) ≥ φ · Pr_{x∼U_n}(x ∈ Z).

Thus,

    Pr_{y=⟨x,h,i,j⟩}( Pr_A(A(y, q, α) = ⊥) ≥ γ ) ≥ (φ · Pr_{x∼U_n}(x ∈ Z)) / p².

The left-hand side is at most α by the amplified version of condition (13.2). From this, we learn Pr_{x∼U_n}(x ∈ Z) ≤ αp²/φ. We want

    Pr_{x∼U_n}(x ∈ Z) ≤ δ,     (13.6)

thus we put the constraint αp²/φ ≤ δ. For x ∉ Z, we have

    Pr_B(B(x, n, δ) = ⊥) ≤ φ + (1 − φ)p²γ.

Now we choose φ = 1/40 and γ = 1/(40p²). This also specifies α to α = δφ/p², which satisfies our constraint αp²/φ ≤ δ. This specification of φ and γ yields

    Pr_B(B(x, n, δ) = ⊥) ≤ 1/20     (13.7)

for x ∉ Z. We set c′ = 1/20 < c. Then item (3) of Definition 13.3 follows from (13.6) and (13.7).
14 Hardness amplification

Assume that we have a function f such that f is hard-on-average in a weak


sense. This means that every algorithm has a non-negligible chance of mak-
ing a mistake when evaluating f on a random input. But there still might be
algorithms that get a huge portion (for instance, a 1 − 1/ poly(n) fraction)
of the inputs right. The goal of hardness amplification is the following: If
there is such a function f , then we can get a related problem g from f such
that g is hard-on-average in the strongest possible sense: No algorithm can
do significantly better than simply tossing a fair coin. To put it the other
way round: If, for some class of functions, we can compute every function in this class with a non-trivial error probability (i.e., significantly less than 1/2), then we can amplify this to make the error probability very small (i.e., 1/poly(n), which means a success probability of 1 − 1/poly(n)). (Note that this does not work by simple Chernoff bounds:
it is the hardness of the instance that causes the algorithm to fail, not bad
luck with its random bits. In fact, our algorithms in this section will always
be deterministic.)
Yao's XOR lemma is a powerful tool for hardness amplification. The idea is simple: If f is slightly hard on average, then g, given by g(x_1, . . . , x_k) = f(x_1) ⊕ . . . ⊕ f(x_k), is very hard on random x_1, . . . , x_k. The intuitive reason is as follows: Although the probability that a specific x is hard is small, the probability that at least one of x_1, . . . , x_k is hard is much higher. However, intuition says that we need all of f(x_1), . . . , f(x_k) to compute g(x_1, . . . , x_k) correctly.
Exercise 14.1. Let X_1, . . . , X_n ∈ {0, 1} be independent random variables with Pr(X_i = 1) = p. Prove that

    Pr(Σ_{i=1}^n X_i is even) = (1 + (1 − 2p)^n) / 2.
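A quick brute-force check of this identity for small n (a numeric illustration, not part of the notes):

    from itertools import product

    def parity_even_prob(n, p):
        """Exact Pr(X_1 + ... + X_n is even) for independent Bernoulli(p) bits."""
        return sum(p ** sum(x) * (1 - p) ** (n - sum(x))
                   for x in product([0, 1], repeat=n) if sum(x) % 2 == 0)

    n, p = 6, 0.3
    print(parity_even_prob(n, p), (1 + (1 - 2 * p) ** n) / 2)   # both 0.502048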

For simplicity, we restrict ourselves to (non-uniform) circuits in the sec-


tion. In the next section, we will state hardness amplification results for NP
languages.
Definition 14.1. We say that a Boolean function f : {0, 1}^n → {0, 1} is (s, δ)-hard with respect to a distribution D if, for every circuit C of size at most s, we have

    Pr_{x∼D}(f(x) ≠ C(x)) > δ.

What does this mean? For every circuit C of size at most s, there exists a set H of size 2δ·2^n such that using C to compute f(x) for x ∈ H is about as good as tossing a fair coin.


We also need the advantage that a circuit has in computing a certain function.

Definition 14.2. Let f be a function, C be a circuit, and D be a distribution on inputs for f and C. If

    Pr_{x∼D}(f(x) = C(x)) = (1 + ε)/2,

then we say that C has an advantage of ε with respect to D. (By definition, the advantage ε is a number in the interval [0, 1].)

We will prove Yao’s XOR lemma in Section 14.2. There are several
different proofs of this lemma. An elegant (and quite intuitive) proof is
via Impagliazzo’s hard-core set lemma (Section 14.1). This lemma is also
interesting in its own right and a somewhat surprising result. There are at
least two different proofs of this lemma. An elegant (and quite intuitive)
proof is via von Neumann’s min-max theorem. Impagliazzo attributes this
proof to Nisan.

14.1 Impagliazzo’s hard-core set lemma


Note the quantifiers in Definition 14.1: For all circuits C, a hard set H
exists. Impagliazzo’s hard-core set lemma states that we can switch the
quantifiers: There exists a set H such that for all C computing f on H is
hard. We will prove the hard-core set lemma in two steps: First, we show
that there is a probability distribution over {0, 1}n such that f is hard with
respect to this probability distribution. Second, we show how to get a set
from this distribution.

Lemma 14.3. Let f : {0, 1}^n → {0, 1} be an (s, δ)-hard function with respect to the uniform distribution on {0, 1}^n, and let ε > 0. Then there is a probability distribution D on {0, 1}^n with the following properties:

1. f is (s · (−ε²/(8·log(εδ))), 1/2 − ε)-hard with respect to D.

2. D(x) ≤ (1/δ) · 2^{−n} for all x ∈ {0, 1}^n.

Proof overview: We have to switch quantifiers: We have “for all circuits,


there exists a hard set”, and we want “there exists a hard set such that
for all circuits”. We model this by a zero-sum game (see excursus below):
One player’s strategies are circuits C, the other player’s strategies are sets
H. The amount that the first player (who plays C) gets from the second
player (who plays H) is proportional to the number of inputs of H that C
gets right. Then we can use von Neumann’s min-max theorem to switch
quantifiers.

Proof. Consider the following two-player game: Player D picks a set T of δ2^n strings from {0, 1}^n. Player C picks a circuit of size s′ = s · (−ε²/(8·log(εδ))). The payoff for C is Pr_{x∼T}(f(x) = C(x)). This is a zero-sum game, and we can apply the min-max theorem: Either player D has a mixed strategy such that there is no mixed strategy for player C with which C achieves a payoff of at least 1/2 + ε, or there is a mixed strategy for player C with which C gets a payoff of at least 1/2 + ε for any (mixed) strategy of player D.
Consider the first case. This means the following: There exists a distribution D′ on sets of size δ2^n such that every circuit C of size at most s′, which corresponds to the pure strategies of player C, achieves only

    E_{T∼D′}( Pr_{x∼T}(f(x) = C(x)) ) ≤ 1/2 + ε.

This is the same as the probability that f(x) = C(x) if x is drawn according to the following distribution D: First, draw T ∼ D′. Second, draw x ∈ T uniformly at random. This probability distribution D is as stated in the lemma: For each x_0 ∈ {0, 1}^n, we have Pr_{x∼D}(x = x_0) = Pr_{T∼D′}(x_0 ∈ T) · 1/(δ2^n) ≤ (1/δ) · 2^{−n}. Thus, f is (s′, 1/2 − ε)-hard with respect to D, which is exactly what the lemma claims (note that s′ ≤ s).
Now consider the second case. There exists a probability distribution C on circuits of size s′ such that, for every subset T ⊆ {0, 1}^n of cardinality δ2^n, we have

    Pr_{x∼T, C∼C}(C(x) = f(x)) ≥ 1/2 + ε,

which corresponds to an average advantage of 2ε.
Let U be the set of inputs x for which the distribution C on circuits achieves an advantage of at most ε in computing f. (“Advantage” is generalized to distributions over circuits in the obvious way.)

Claim 14.4. |U| ≤ δ(1 − ε)2^n.

Proof of Claim 14.4. Assume to the contrary that |U| > δ(1 − ε)2^n. If |U| ≥ δ2^n, then U would give rise to a strategy of player D to keep the payoff to at most (1 + ε)/2, which contradicts the assumption.
Otherwise, consider any set T′ ⊇ U of cardinality δ2^n for which C achieves the smallest advantage. Since |U| > δ(1 − ε)2^n, we have |T′ \ U| < εδ2^n. Then the advantage of C on T′ would be smaller than

    (1/(δ2^n)) · (|T′ ∩ U| · ε + |T′ \ U|) < (1/(δ2^n)) · (εδ2^n + εδ2^n) = 2ε.

This contradicts the average advantage of at least 2ε on this set T′ (which was the assumption of the second case).

Now we construct a circuit C̃ of size s that gets f(x) right for more than a 1 − δ fraction of the inputs, which contradicts the assumption that f is (s, δ)-hard. The idea is as follows: On U, we might have only little chance to get the right answer. Thus, we ignore inputs from U. For inputs from {0, 1}^n \ U, however, we have a non-trivial chance of computing the right answer if we sample circuits according to the distribution C, which is an optimal strategy for player C. Then we amplify probabilities by sampling more than one circuit and taking the majority outcome.
More precisely, let us draw t independent random circuits according to the distribution C. Our new circuit C̃ outputs the majority output of these t circuits. Fix any x ∉ U. We can bound the probability that C̃ gets f(x) wrong using Chernoff bounds: C̃ errs only if at most t/2 of its subcircuits give the correct output. The expected number of correct outputs is at least t(1 + ε)/2. Thus, the Chernoff bound yields an upper bound of exp(−2(tε/2)²/t) = exp(−tε²/2). We set t = −4·log(εδ)/ε² ≥ −2·log(εδ/2)/ε². This gives a probabilistic construction of C̃ that errs on only an εδ/2 fraction of the inputs not from U. (Note that we do not need to construct C̃ explicitly. Its existence suffices.) Since |U| ≤ δ(1 − ε)2^n, C̃ errs with a probability of at most δ(1 − ε) + εδ/2 ≤ δ for random x ∈ {0, 1}^n.
Since C̃ consists of t circuits of size s′, its size is at most 2ts′ = s. This contradicts the assumption that f is (s, δ)-hard.

Excursus: Min-max theorem


A zero-sum game is a game between two players such that the loss of one player is the gain of the other. A zero-sum game can be modeled by a matrix A = (a_{i,j})_{1≤i≤m, 1≤j≤n} ∈ R^{m×n}. The game consists of one player, called the maximizer, choosing an i ∈ {1, . . . , m} and the other player, called minimizer, choosing a j ∈ {1, . . . , n}. Then the minimizer has to pay a_{i,j} to the maximizer. (If a_{i,j} < 0, then the maximizer has to pay −a_{i,j} to the minimizer.) The set {1, . . . , m} is the set of pure strategies of the maximizer. The set {1, . . . , n} is the set of pure strategies of the minimizer.
The order in which the players choose matters, as can be seen easily from the simple game with m = n = 2 and a_{i,j} = (−1)^{i+j}.
However, if we allow the players to use randomized strategies (so-called mixed strategies), then the order of play does not matter. This is what the min-max theorem by von Neumann [vN28] says.
More precisely: A mixed strategy is a probability distribution over the pure strategies of a player. In our case, it is simply a vector p ∈ [0, 1]^n with Σ_{j=1}^n p_j = 1 for the minimizer and a vector q ∈ [0, 1]^m with Σ_{i=1}^m q_i = 1 for the maximizer. The outcome of the game is then q^T A p. Then the min-max theorem says

    min_p max_q q^T A p = max_q min_p q^T A p,

where p ranges over all mixed strategies of the minimizer and q over all mixed strategies of the maximizer. The number min_p max_q q^T A p = max_q min_p q^T A p is called the value of the game.
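The value of a (small) zero-sum game can be computed by linear programming. The following Python sketch is an illustration only (it is not part of the notes and assumes NumPy and SciPy are available): it maximizes v subject to (q^T A)_j ≥ v for all columns j.

    import numpy as np
    from scipy.optimize import linprog

    def game_value(A):
        """Value and an optimal maximizer strategy of the zero-sum game with
        payoff matrix A (rows = maximizer, columns = minimizer)."""
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        # Variables (q_1, ..., q_m, v): minimize -v subject to
        # v - (q^T A)_j <= 0 for every j, sum(q) = 1, q >= 0, v free.
        c = np.zeros(m + 1); c[-1] = -1.0
        A_ub = np.hstack([-A.T, np.ones((n, 1))])
        b_ub = np.zeros(n)
        A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * m + [(None, None)])
        return res.x[-1], res.x[:m]

    # Matching pennies, a_ij = (-1)^(i+j): value 0, optimal strategy (1/2, 1/2).
    print(game_value([[1, -1], [-1, 1]]))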

The goal of the next lemma is to get a hard-core set from the hard-core
distribution just constructed.
Lemma 14.5. Let D : {0, 1}^n → [0, 1] be a probability distribution such that D(x) ≤ (1/δ)·2^{−n} for all x ∈ {0, 1}^n. Let f : {0, 1}^n → {0, 1} be a function such that f is (s, 1/2 − ε/2)-hard with respect to D for 2n < s < (1/(16n))·2^n·(εδ)².
Then there exists a set H ⊆ {0, 1}^n such that f is (s, 1/2 − ε)-hard with respect to the uniform distribution on H.

Proof overview: We use the probabilistic method: We draw a set according


to the hard-core distribution. Then we take a union bound over all possible
circuits to bound the probability that there exists a circuit that achieves
a significant advantage on this (random) set. Since this probability will
be bounded away from 1, there exists a set such that no circuit achieves a
significant advantage on this set. This will be our hard-core set.

Proof. The construction of our set H will again be probabilistic. Let H be the random set obtained by placing x into H with a probability of δ2^n·D(x). The expected number of elements in H is δ2^n. With non-zero probability, this set H will have the desired property.
The number of circuits of size s is upper-bounded by

    (2(2n + s))^{2s} ≤ 2^{2ns} < (1/4)·exp(ε²δ²2^n/8).
Let C be any circuit of size s. Let

    A_C(H) = |{x ∈ H | f(x) = C(x)}|

be the number of strings x ∈ H that C gets right. We have

    E_H(A_C(H)) ≤ (1/2 + ε/2) · δ2^n

by the assumption that Pr_{x∼D}(C(x) = f(x)) ≤ 1/2 + ε/2 for every C.
The random variable A_C(H) consists of 2^n independent indicator random variables, one for each string x. This brings Chernoff bounds into play:

    Pr_H(A_C(H) ≥ (1/2 + 3ε/4) · δ2^n) ≤ Pr_H(A_C(H) ≥ E_H(A_C(H)) + εδ2^n/4)
                                       < exp(−2(εδ2^n/4)²/2^n)
                                       = exp(−ε²δ²2^n/8).

Furthermore, also by Chernoff bounds, H is unlikely to be small:

    Pr_H(|H| < δ2^n · (1 − ε/4)) < exp(−ε²δ²2^n/8).

Now we take a union bound over all circuits of size s: The probability that, for a random set H, there exists a circuit C with A_C(H) ≥ (1/2 + 3ε/4) · δ2^n or |H| < δ2^n · (1 − ε/4) is bounded by

    Pr_H(∃C : A_C(H) ≥ (1/2 + 3ε/4) · δ2^n ∨ |H| < δ2^n · (1 − ε/4))
        ≤ (1/4)·exp(ε²δ²2^n/8) · 2 · exp(−ε²δ²2^n/8) = 1/2.
We can conclude that a set H with the following properties exists:

• |H| ≥ δ2^n · (1 − ε/4).

• There is no circuit C of size s that gets more than (1/2 + 3ε/4) · δ2^n strings from H right.

If |H| = δ2^n, then we are done. If |H| < δ2^n, we add δ2^n − |H| arbitrary elements to H and call the new set again H. No circuit gets more than (1/2 + ε) · δ2^n strings of H right. If |H| > δ2^n, then we remove |H| − δ2^n elements from H and again call the new set H. Since we only remove elements, no circuit gets more strings of the new set right than it got for the old set. Thus, this set also meets our requirements.
From the two lemmas above, the main result of this section follows easily.

Theorem 14.6 (Impagliazzo [Imp95]). Let f : {0, 1}^n → {0, 1} be a function that is (s, δ)-hard with respect to the uniform distribution. Then, for every ε > 0, there exists a set H ⊆ {0, 1}^n (called hard-core set for f) of cardinality at least δ2^n such that f is (s · (−ε²/(64 log(εδ))), 1/2 − ε)-hard with respect to the uniform distribution over H.

Proof. By Lemma 14.3, there exists a distribution D such that f is (s · (−ε²/(64·log(εδ))), 1/2 − ε/2)-hard with respect to D and D's density is bounded by (1/δ)·2^{−n}. (Note that we use ε/2 instead of ε, which yields the worse constant.) Now Lemma 14.5 shows the existence of a set H of cardinality δ2^n such that f is (s · (−ε²/(64·log(εδ))), 1/2 − ε)-hard with respect to the uniform distribution on H.

Exercise 14.2. It is, in fact, possible to show an even stronger statement:


Assume that f : {0, 1}^n → {0, 1} is (s, δ)-hard, and let η > 0 be an arbitrary constant. Then there exists a set H ⊆ {0, 1}^n of cardinality at least (2 − η)δ2^n such that f is (s · poly(ε, δ, η), 1/2 − ε)-hard with respect to the uniform distribution.
Prove this!
Hint: Modify Lemma 14.3, then the rest follows.
This is close to optimal: Assume that there is a circuit C that errs only
with a probability of δ, and consider any set H of cardinality significantly larger than 2δ·2^n. Then the probability that C errs on a random input from H is significantly smaller than 1/2.

14.2 Yao’s XOR lemma


The XOR lemma is attributed to Yao [Yao82]. The version that we present
here is due to Impagliazzo [Imp95].
Theorem 14.7 (Yao's XOR lemma). Let f : {0, 1}^n → {0, 1} be (s, δ)-hard with respect to the uniform distribution. Let k ≥ 1, and let g : {0, 1}^{kn} → {0, 1} be given by

    g(x_1, . . . , x_k) = f(x_1) ⊕ f(x_2) ⊕ . . . ⊕ f(x_k).

Then, for every ε > 0, the function g is (s · (−ε²/(100 log(εδ))), 1/2 − ε − (1 − δ)^k)-hard with respect to the uniform distribution.

Proof overview: Let H be a hard-core set of f as in Theorem 14.6. The


probability that one specific xi is in H is δ. We ignore ε for the moment. If
x_i ∈ H, then the probability of computing g(x_1, . . . , x_k) correctly stays about the same if we replace f(x_i) by a random bit b and compute
f (x1 ) ⊕ . . . ⊕ f (xi−1 ) ⊕ b ⊕ f (xi+1 ) ⊕ . . . ⊕ f (xk ) .
A random bit xor-ed with something is still a random bit. Thus, we get the
right answer in this case only with probability 1/2.
Our only hope is that none of x1 , . . . , xk is in H. This happens with a
probability of (1−δ)k . Thus, we compute g correctly only with a probability
of 21 + (1 − δ)k .

The problem with this proof idea is that a circuit for computing g does not necessarily proceed by first computing f(x_1), . . . , f(x_k). It is allowed
to do anything. Nevertheless this idea can be turned into a proof of the
XOR lemma.
Proof. Let H be a hard-core set for f of cardinality at least δ2^n as in Theorem 14.6. Assume to the contrary that there exists a circuit C of size s′ = s · (−ε²/(100 log(εδ))) such that

    Pr_{x_1,...,x_k}(C(x_1, . . . , x_k) = g(x_1, . . . , x_k)) > 1/2 + (1 − δ)^k + ε.     (14.1)

Let D be the uniform distribution over (x_1, . . . , x_k) with x_i ∈ {0, 1}^n and conditioned on at least one x_i being in the hard-core set H. This yields

    Pr_{(x_1,...,x_k)∼D}(C(x_1, . . . , x_k) = g(x_1, . . . , x_k))     (14.2)
        ≥ Pr_{x_1,...,x_k}(C(x_1, . . . , x_k) = g(x_1, . . . , x_k)) − Pr_{x_1,...,x_k}(no x_i is in H)
        > 1/2 + ε,

where the last inequality uses (14.1) and Pr(no x_i is in H) ≤ (1 − δ)^k.
Let us take a different view on the distribution D: First, we pick a non-empty set T ⊆ {1, . . . , k} with an appropriate distribution. Then, we choose x_i ∈ H for i ∈ T uniformly at random and x_i ∈ {0, 1}^n \ H for i ∉ T. Let the latter distribution be D_T. Thus, we can rewrite (14.2) as

    E_T( Pr_{(x_1,...,x_k)∼D_T}(C(x_1, . . . , x_k) = g(x_1, . . . , x_k)) ) > 1/2 + ε.

Fix a set T that maximizes the inner probability. Without loss of generality, we assume that 1 ∈ T. Then we can further rewrite the probability as

    E_{x_2,...,x_k}( Pr_{x_1∼H}(C(x_1, . . . , x_k) = g(x_1, . . . , x_k)) ) > 1/2 + ε,

where, by abusing notation, x_1 ∼ H means that x_1 is drawn uniformly at random from H. Now let a_j for j > 1 be the assignment for x_j that maximizes the above probability. This yields

    Pr_{x_1∼H}(C(x_1, a_2, . . . , a_k) ⊕ f(a_2) ⊕ . . . ⊕ f(a_k) = f(x_1)) > 1/2 + ε,

where we have rearranged terms to isolate f(x_1). To get a circuit for f from C, we replace x_2, . . . , x_k by the constants a_2, . . . , a_k. Then we observe that f(a_2) ⊕ . . . ⊕ f(a_k) is a constant. Thus, we possibly have to negate the output of C. This increases the size by at most 1. Thus, we have a circuit C′ of size s′ + 1 ≤ s · (−ε²/(64 log(εδ))) for f that has a success probability greater than 1/2 + ε on the hard-core set H for f. This would contradict Theorem 14.6.
Using Exercise 14.2, we can even improve the hardness almost to 1/2 − ε − (1 − 2δ)^k.
Analogously to Exercise 13.3, we can generalize Heur_δP to non-uniform circuits of polynomial size in a straight-forward way. In this way, we obtain Heur_δP/poly.
Corollary 14.8. Let C be a class of Boolean functions with the following property: If f = (f_n)_{n∈N} ∈ C, then also g ∈ C with g(x_1, . . . , x_k) = ⊕_{i=1}^k f(x_i), where k can be a function of |x_i|.
Suppose there exists a family of functions f = (f_n)_{n∈N} ∈ C such that f ∉ Heur_{1/p(n)}P/poly. Then, for every constant c > 0, there exists a family g of functions such that g ∈ C and g ∉ Heur_{1/2 − n^{−c}}P/poly.

Exercise 14.3. Prove Corollary 14.8.

We can phrase the XOR lemma and Corollary 14.8 also the other way
round: Assume that a class C of functions is closed under ⊗, and suppose that every function in C can be computed with a success probability of at least 1/2 + ε for some not too small ε > 0. (Say, ε = 1/poly(n).) Then we can reduce the error probability to 1/poly(n). (If this were not the case, then we would be able to amplify the hardness of 1/poly(n) to 1/2 − ε – a
contradiction.) So if we are able to compute a function with a non-trivial
advantage, then we can bring the advantage close to 1. This is closely related
to boosting, which is a concept in computational learning theory. Klivans
and Servedio [KS03] explain the connections between boosting and hard-core
sets.
15 Amplification within NP

Our goal in this section is to show a statement of the form “if there is a
language in NP that is mildly hard on average, then there is a language in
NP that is very hard on average.” Unfortunately, the XOR lemma does not
yield such a result: If L ∈ NP, then it is unclear if computing L(x) ⊕ L(y) is
also in NP. For instance, if L is co-NP-complete, then L(x) ⊕ L(y) can only
be computed in NP if NP = co-NP.
We circumvent this problem by replacing parity by a monotone function
g : {0, 1}k → {0, 1}. If L ∈ NP, then computing g(L(x1 ), . . . , L(xk )) from
x1 , . . . , xk is also in NP.

Exercise 15.1. Prove the above statement. More precisely, prove the fol-
lowing stronger statement:
Assume that NP 6= co-NP. Prove that the following two statements are
equivalent for any function g : {0, 1}k → {0, 1}:

1. For all L ∈ NP, also {(x1 , . . . , xk ) | g(L(x1 ), . . . , L(xk )) = 1} ∈ NP.

2. g is monotonically increasing. This means that for any y, z ∈ {0, 1}^k with y ≤ z (component-wise), we have g(y) ≤ g(z).

15.1 The main idea


For the results of this chapter, which are mainly due to O'Donnell [O'D04], we need some preparation. Let f : {0, 1}^n → {0, 1}, and let g : {0, 1}^k → {0, 1}. Then g ⊗ f : ({0, 1}^n)^k → {0, 1} denotes the function given by (g ⊗ f)(x_1, . . . , x_k) = g(f(x_1), . . . , f(x_k)). Our goal is to analyze the hardness of g ⊗ f in terms of properties of g and the hardness of f. The property of g that we need is the bias of g or, more precisely, the expected bias of g subject to a random restriction.

Definition 15.1. The bias of a Boolean function h is

    bias(h) = max{Pr_x(h(x) = 0), Pr_x(h(x) = 1)} ∈ [1/2, 1].

The function h is called balanced if bias(h) = 1/2.

In fact, it is not the bias of g itself that plays a role, but the expected bias of g with respect to a random restriction.


Definition 15.2. A restriction ρ of a function h : {0, 1}^m → {0, 1} is a mapping ρ : {1, 2, . . . , m} → {0, 1, ?}. Then h_ρ denotes the subfunction of h obtained by substituting each coordinate i with ρ(i) ∈ {0, 1} by ρ(i).
For a δ ∈ [0, 1], we denote by P_δ^m the probability space over all restrictions, where a restriction ρ is drawn according to the following rules:

• ρ(1), . . . , ρ(m) are independent.

• Pr(ρ(i) = ?) = δ.

• Pr(ρ(i) = 0) = Pr(ρ(i) = 1) = (1 − δ)/2.

If ρ ∼ P_δ^m, then ρ is called a random restriction with parameter δ.
The expected bias of h at δ is

    EBias_δ(h) = E_{ρ∼P_δ^m}(bias(h_ρ)).
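For small m, the expected bias can be computed exactly by enumerating all 3^m restrictions. The Python sketch below (not part of the notes) does this for the majority function on three bits; the choice of function and parameter is arbitrary.

    from itertools import product

    def bias(h, m):
        """bias(h) = max(Pr[h = 0], Pr[h = 1]) over a uniform input from {0,1}^m."""
        ones = sum(h(x) for x in product([0, 1], repeat=m))
        return max(ones, 2 ** m - ones) / 2 ** m

    def expected_bias(h, m, delta):
        """Exact EBias_delta(h), enumerating all restrictions rho in P_delta^m."""
        total = 0.0
        for rho in product([0, 1, "?"], repeat=m):
            p = 1.0
            for v in rho:
                p *= delta if v == "?" else (1 - delta) / 2
            free = [i for i, v in enumerate(rho) if v == "?"]
            def h_rho(y):                      # the subfunction h restricted by rho
                z = list(rho)
                for i, b in zip(free, y):
                    z[i] = b
                return h(z)
            total += p * bias(h_rho, len(free))
        return total

    maj3 = lambda x: int(sum(x) >= 2)          # majority of three bits
    print(expected_bias(maj3, 3, 0.25))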

Exercise 15.2. Let parity_m be the parity function of m bits. Compute EBias_δ(parity_m).
Give estimates for EBias_δ(and_m), where and_m is the AND of m bits.
The main result of this chapter is the following theorem, which is a
generalized version of the XOR lemma. It states the hardness of g ⊗ f
in terms of the hardness of f and the expected bias of g. The technical
restriction is that we require that the function f be balanced.
Theorem 15.3. Let f : {0, 1}^n → {0, 1} be an (s, δ)-hard balanced function, and let g : {0, 1}^k → {0, 1} be arbitrary. Then, for every η > 0 and every ε > 0, the function g ⊗ f is (s′, 1 − EBias_{(2−η)δ}(g) − ε)-hard, where s′ = Ω(s · ε²/(k·log(1/δ))).
We will not give a full proof of the result. Rather, we will give an
intuition why it should be true. In the next sections, we will discuss which
functions g are suitable to amplify hardness.

Proof overview: Suppose that x1 , . . . , xk are drawn at random. Our


task is to compute (g ⊗ f )(x1 , . . . , xk ), where f is both balanced and δ-
hard. We model the hardness of f by computing g(y1 , . . . , yk ) with imperfect
information about y1 , . . . , yk . This means that yi = f (xi ), but the hardness
of f obscures the true values and we see only corrupted values z1 , . . . , zk .
Since f is balanced, Pr(yi = 1) = Pr(yi = 0) = 1/2, i.e., the values y1 , . . . , yk
are drawn uniformly and independently at random. Since f is δ-hard, we
have Pr(zi = yi ) ≤ 1 − δ.
We abstract away f and just use the δ-hardness of f . We model this by
setting drawing zi according to Pr(zi = yi ) = 1 − δ and Pr(zi 6= yi ) = δ.
Now we might take simply output g(z1 , . . . , zk ). Then we would compute
the correct value g(y1 , . . . , yk ) with a probability of NStabδ (g). (The noise
stability, denoted by NStab, is defined below.) More sophisticated, we might
15.2. Noise stability and expected bias 97

compute g(z 0 ) for different z 0 close to z = (z1 , . . . , zk ) and output a maximum


likelihood answer.
However, in the true setting involving f , we might not only have zi = yi ,
but we may also know that zi is correct. Taking this to its extreme, we get
the following scenario: With a probability of 1 − 2δ, we have zi = yi , and
we know for sure that zi = yi . With a probability of 2δ, zi is a random bit,
and we know that zi is a corrupted bit. (But, of course, we do not know if
zi = yi or zi 6= yi . Question: Why 2δ, where f is only δ-hard?)
So what can we do now? We take the values zi for which we are certain
that zi = yi for granted, and we replace the corrupted values by ?. In
this way, we obtain a restriction ρ. Then we compute Pra (gρ (a) = 0) and
Pra (gρ (a) = 1). If the first is larger, then we output 0, otherwise, we output
1. The error probability is thus 1 − bias(gρ ).
To compute the overall probability that this strategy succeeds, we must
take into account that ρ is in fact a random restriction, drawn from Pk2δ .
Thus, the probability of outputting the correct answer is nothing else but
EBias(g). The other way round, this looks as if g ⊗ f be EBias2δ (g)-hard.

The idea sketched above can be turned into a proof. It proceeds as


follows: First, one can show that f does not only possess a hard-core set,
but a balanced hard-core set H. This is not surprising: If a hard-core set
H is not balanced, then either always outputting 1 or always outputting 0
gives a non-trivial advantage. Then we transfer the idea using arguments
similar to those of the proof of Theorem 14.7.

Exercise 15.3. Use Theorem 15.3 and Exercise 15.2 to derive a weaker
form of the XOR lemma (Theorem 14.7), which holds only for balanced
functions.

15.2 Noise stability and expected bias


The expected bias is closely related to another measure for Boolean func-
tions, called noise stability. Lemma 15.5 states this connection precisely.
We use noise stability since it is sometimes easier to compute, although the
expected bias is the “right” parameter for Theorem 15.3.

Definition 15.4. The noise stability of a Boolean function h : {0, 1}m →


{0, 1} is defined as

NStabδ (h) = Pr f (x) = f (y) ,
x∼{0,1}m ,y∼Nδ (x)

where x is drawn uniformly at random and y is obtained from x by flipping


each bit of x independently with a probability of δ.
98 15. Amplification within NP

The quantity

NSensδ (h) = 1 − NStabδ (h) = Pr f (x) 6= f (y)
x∼{0,1}m ,y∼N δ (x)

is called the noise sensitivity of h.


Exercise 15.4. Compute NStabδ (paritym ) and NStabδ (andm ).
Depending on the context, either noise stability or noise sensitivity will
be more convenient to use or to analyze.
In the following, let x? = 2x − 1 ∈ [0, 1] for any quantity x ∈ [1/2, 1].
Lemma 15.5. For any Boolean function h : {0, 1}m → {0, 1}, we have
p
NStab?δ (h) ≤ EBias?2δ (h) ≤ NStab?δ (h) .
Proof. We exploit the following fact.
Exercise 15.5. Prove the following: For any Boolean function h and in-
dependently and uniformly drawn x and y, we have h(x) = h(y) with a
probability of 21 + 12 bias(h)2 . In other words,
1 1 2
+ bias(h)? = NStab1/2 (h) .
2 2
We take a different view on NStabδ (h): First, we draw ρ ∈ P2δ m . We set

xi = yi = ρi if ρi 6= ?. For ρi = ?, we draw xi , yi ∈ {0, 1} uniformly and


independently at random. Then x is drawn uniformly at random and yi
differs from xi with a probability of δ. Furthermore, x and y are identically
distributed. And given ρ, they are drawn independently. Let x0 and y 0 be
the vectors obtained by removing all positions i with ρi 6= ?.
Together with Exercise 15.5, we get
0 0

NStabδ (h) = E m Pr hρ (x ) = hρ (y )
ρ∼P2δ x0 ,y 0
 
1 1
= Em + bias(hρ )2 .
ρ∼P2δ 2 2
Thus,
NStab?δ (h) = E m bias? (hρ )2 .

ρ∼P2δ

by linearity of expectation.
Since bias? (hρ ) ∈ [0, 1], we have bias? (hρ )2 ≤ bias? (hρ ), which yields the
first inequality. The second inequality follows from Jensen’s inequality since
squaring is a convex function:
p r
NStab?δ (h) = E m (bias? (hρ )2 )
ρ∼P2δ
q 
bias? (hρ )2 E m bias? (hρ )

≥ Em =
ρ∼P2δ ρ∼P2δ

= EBias?2δ (h) .
15.3. Recursive majority-of-three and tribes 99

The following lemma will be very useful to compute the noise stability
and noise sensitivity of the functions g that we use to amplify hardness.
Lemma 15.6. Let h : {0, 1}m → {0, 1} be a balanced Boolean function, and
let g : {0, 1}k → {0, 1} be an arbitrary Boolean function. Then

NSensδ (g ⊗ h) = NSensNSensδ (h) (g) .

Proof. Let us take a closer look at

NSensδ (g ⊗ h)
 
= Pr g h(x1 ), . . . , h(xk ) 6= g h(y1 ), . . . , h(yk ) .
{0, 1}m
x1 , . . . , xk ∈
y1 ∼ Nδ (x1 ), . . . ,
yk ∼ Nδ (xk )

Let zi = h(xi ) and zi0 = h(yi ). Then Pr(zi = 1) = Pr(zi = 0) = 1/2 since h
is balanced. Furthermore, the probability that zi0 6= zi is just NSensδ (h) =
1 − NStabδ (h). Thus,

g(z) 6= g(z 0 )

NSensδ (g ⊗ h) = Pr
z ∈ {0, 1}k
z 0 ∼ NNSensδ (h)

= NSensNSensδ (h) (g)

as claimed.

15.3 Recursive majority-of-three and tribes


The function g should be nearly balanced subject to a random restriction in
order to keep EBiasδ (g) close to 1/2.
Two functions that turn out to be very useful: The first one is the
“recursive majority of 3” function, which we will define recursively: Let
k
Mk : {0, 1}3 → {0, 1}. Them M1 (x, y, z) = 1 if and only if at least two of
x, y, and z are set to 1. For k ≥ 1, we have Mk+1 = M1 ⊗ Mk . This means
that

Mk+1 (x1 , . . . , x3k , y1 , . . . , y3k , y1 , . . . , y3k )



= M1 Mk (x1 , . . . , x3k ), Mk (y1 , . . . , y3k ), Mk (y1 , . . . , y3k ) .

Lemma 15.7. For ` ≥ log1.1 (1/δ), we have NStab?δ (M` ) ≤ δ −1.1 · (3` )−0.15 .
Exercise 15.6. Prove Lemma 15.7. You can also show a slightly weaker
variant: Prove that there exist constants a > 1, b ≥ 1, c > 0 such that for
` ≥ loga (1/δ), we have NStabδ (M` )? ≤ δ −b (3` )−c .
Hint: Calculate NSensδ (M1 ) explicitly. Then use Lemma 15.6. Make a
case distinction whether NSensδ (Mk ) is large or small for k ≤ `.
100 15. Amplification within NP

Majority-of-three is helpful to amplify a (1/ poly(n))-hard language to


become somewhat hard, namely to ( 12 − n−α )-hard for some small constant
α > 0.
To amplify further to ( 21 −n−1/2+η )-hardness, we use our second function,
which is called “tribes”. Tribes does particularly well if the function whose
hardness we want to amplify is already somewhat hard.
Let w ∈ N, and let n = n(w) ∈ N be the smallest multiple of w such that
(1 − 2−w )n/w ≤ 1/2. Then the tribes function Tn of n variables is defined as

n/w−1 w
_ ^
Tn (x1 , . . . , xn ) = xiw+j .
i=0 j=1

If we write w as a function of n, we get w = log n − log ln n + o(1).


To estimate the expected bias of tribes is technically more challenging
than it was for majority-of-three, and we will omit a proof here.

Lemma 15.8. For every constant η > 0, there is a constant r > 0 such that
EBias1−r (Tn ) ≤ 21 + n−1/2+η .

Exercise 15.7. Let f : {0, 1}n → {0, 1} be a Boolean function. The influ-
ence of the j-th variable is defined as

f (x) 6= f (x(j) ) ,

Ij = Pr
x∼{0,1}n

where x(j) is obtainedPfrom x by flipping the j-th entry of x. The total


n
influence of f is I = j=1 Ij .
Compute (or give as good as possible estimates for) I(andm ), I(paritym ),
I(majoritym ), I(Mk ), I(Tk ). (majoritym (x1 , . . . , xm ) is 1 if and only if at
least m/2 of the xi are 1.)

Exercise 15.8. Show that

NSensδ (f ) ≤ δ · I(f ) .

15.4 Hardness within NP


Recursive majority-of-three turns out to be very useful to amplify δ-hardness
for relatively small values of δ. More precisely, if f is 1/ poly(n)-hard, then
recursive majority-of-three we can amplify its hardness to 12 − n−α for some
small constant α. Then tribes comes into play, which can, if δ is not too
small, bring the hardness to 12 − n−1/2+ε for arbitrarily small ε > 0.
Using Lemma 15.7 and Theorem 15.3, we can show that majority-of-
three can amplify hardness close to 1/2.
15.4. Hardness within NP 101

Lemma 15.9. If there is a family of functions in NP which is infinitely often


balanced and (poly(n), 1/ poly(n))-hard. (This means that this function is
1/ poly(n)-hard for circuits of polynomial size.) Then there is a family of
functions (hm ) in NP that is infinitely often balanced and (1/2+m−0.07 )-hard
for circuits of polynomial size.
Proof. Suppose a family f = (fn ) of functions is infinitely often balanced
and 1/nc -hard for polynomial-size circuits. Choose k = nC for some suffi-
ciently large constant C, and set ` = log3 (k) = C log3 n for some sufficiently
large constant C.
Let hm = M` ⊗ fn . The function hm has input length m = nk = nC+1 .
The family h = (hm )m∈N is in NP since f is in NP and M` is monotone and
in P. Moreover, hm is balanced whenever fn is balanced.
We have to show that hm is (1/2 − m−0.07 )-hard whenever fn is hard
and balanced. We apply Theorem 15.3 with η = 1, ε = n−C , and δ = n−c .
Lemmas 15.5 (for converting noise stability to expected bias) and 15.7
(for the noise stability of M` ) yield

1 1 δ −0.55 −0.075`
 
EBiasδ (M` ) ≤ + 3 .
2 2 2
This assumes that ` is sufficiently large, which can be ensured by choosing
C large.
Now we observe that
 C −0.075
1 δ −0.55 −0.075`
 
c 0.55 n
3 ≤ (2n ) + n−C ≤ n−0.074C
2 2 3

for large enough (but still constant) C. Finally, n−0.074C ≤ m−0.07 for large
enough C since m = nC+1 .
Using the tribes function, we can further improve the hardness.
Theorem 15.10. Suppose that there is a family f = (fn )n∈N of functions in
NP which is infinitely often balanced and (poly(n), 1/ poly(n))-hard. (This
means that fn is 1/ poly(n)-hard for circuits of polynomial size.) Then there
is a family of functions in NP which is infinitely often (poly(n), 21 −n−1/2+ε )-
hard for any ε > 0.
Exercise 15.9. Prove Theorem 15.10 using the tools Lemma 15.8, Lemma 15.9,
and Theorem 15.3.
At the expense of a small loss in the final hardness, we get even get rid
of the requirement that the initial function is balanced.
Theorem 15.11. Suppose that there is a family of functions in NP which
is infinitely often (poly(n), 1/ poly(n))-hard. Then there is a family of func-
tions in NP which is infinitely often (poly(n), 21 − n−1/3+ε )-hard.
102 15. Amplification within NP

We can rephrase Theorem 15.11 in “boosting form”, which sounds more


positive.

Theorem 15.11’. Suppose that (L, U) ∈ Heur 1 −n−0.33 P/poly for all L ∈
2
NP. Then (L, U) ∈ Heur1/p P/poly for every polynomial p and every language
L ∈ NP.

Exercise 15.10. If we allow arbitrary Boolean functions (which are not


required to be from, say, NP or PSPACE), we can even find a function that
is exponentially close to 1/2-hard for circuits of exponential size.
Prove the following: There exists a universal constant γ ≥ 1/8 such that,
for all sufficiently large n ∈ N, there exists a function h : {0, 1}n → {0, 1}
which is (2γn , 12 − 2−γn )-hard.
This is almost as hard as possible: No function is harder than 1/2-hard
even for very small circuits. And just by hard-wiring the correct function
value for one input and outputting either 0 or 1 on all other inputs (depend-
ing on whether more function values are 0 or 1), we can bring the hardness
down to 21 − 2−n .
16 RL and undirected connectivity

The problem

CONN = {(G, s, t) | G is a directed graph with a path from s to t}

is NL-complete. What about its undirected counter part

UCONN = {(G, s, t) | G is a undirected graph with a path from s to t}?

It is of course in NL, but the NL-hardness proof for CONN does not work for
undirected G, since the configuration graph of a nondeterministic Turing
machine is a directed graph.
In this chapter, we will show that UCONN can be decided in randomized
logarithmic space. We define RSpace(s(n)) to be the class of all languages
that can be decided by an s(n)-space bounded probabilistic Turing machine
with one-sided error. The Turing machine has a separate input tape and the
space used on the random tape is not counted. In the same way, we define
BPSpace(s(n)), the only difference is that we allow two-sided errors.

Definition 16.1. 1. RL = RSpace(O(log n)).

2. BPL = BPSpace(O(log n)).

Both RL and BPL allow probability amplification. Obviously

RL ⊆ NL.

For randomized computations with small space, it is important that the


randomness is created “on the fly”, that is, that the random tape is oneway.
For instance, one can show that BP- L = BPP (the BP-operator applied to
the class L) but BPL is not likely to be BPP.

Theorem 16.2. UCONN ∈ RL

The algorithm showing that UCONN ∈ RL is very simple. We perform a


random walk starting in s. If we reach the node t, we accept. If we do not
reach t after a polynomial number of steps, we reject.
Input: undirected graph G = (V, E), nodes s, t ∈ V .

1. Let v := s.

2. For i := 1 to poly(n) do

103
104 16. RL and undirected connectivity

(a) Replace v by a random neighbour of v.


(b) If v = t, then accept.
3. Reject.

The algorithm is obviously logarithmic space bounded, since we only


have to store one node and a counter that counts to a polynomially large
value. It is clear that if there is no path between s and t, then the algorithm
is always right. The hard part is to show that if there is a path, then it is
also right with constant probability. Along the proof, we will also give an
explicit bound for the poly(n) term.
Let G = (V, E) be a d-regular graph and A be its adjacency matrix.
Recall that à = d1 · A is the normalized adjacency matrix. It is a doubly
stochastic matrix. If p is a probability distribution on V , then Ãt p is the
probability distribution that we get when drawing a starting vertex accord-
ing to p and then performing a random walk of length t. As a first step, we
will show that Ãt p converges to the uniform distribution 1̃ on V . We will
need the following relation between 1-norm and 2-norm:

kxk2 ≤ kxk1 ≤ n · kxk2 for all x ∈ Rn .
Whenever we write just kxk, we mean the 2-norm in the following.
Lemma 16.3. Let G = (V, E) be a d-regular connected graph with adjacency
matrix A and let p be a probability distribution on V . Then
λ(G) t
 
t
kà p − 1̃k2 ≤ for all t ∈ N.
d
Proof. Let λ = λ(G)/d. By the definition of λ, we have kÃxk ≤ λkxk if
x⊥1̃. If x⊥1̃, then Ax⊥1̃, since x belongs to the direct sum of the eigenspaces
of the eigenvalues γ 6= 1. Therefore, by induction we get:
1. kÃt xk ≤ λt kxk if x⊥1̃ and
2. kÃt 1̃k = 1.

Pn decompose p = α1̃ + q with q⊥1̃. q⊥1̃ means that hq, 1̃i = 0, that is,
We
i=1 qi = 0. This means that α = 1, since p is a probability distribution.
Thus
Ãt p = Ãt (1̃ + q) = 1 + Ãt q.
We have kpk2 = k1̃k2 + kqk2 , since 1̃ and q are orthogonal. Therefore,
kqk ≤ kpk. Since p is a probability distribution, kpk ≤ kpk1 ≤ 1. Hence,
kÃt p − 1̃k = kÃt qk ≤ λt kqk ≤ λt .

Next we will show that many graphs are “slight” expanders, that is,
λ(G)/d is bounded away from 1 by 1/ poly(n).
105

Lemma 16.4. Let G be a connected d-regular graph with self-loops at each


node. Then
λ(G) 1
≤1− .
d 8dn3
Proof. Let x⊥1̃ with kxk = 1 and let y = Ãx. We have

1 − kyk2 = kxk2 − kyk2


= kxk2 − 2kyk2 + kyk2
= kxk2 − 2hÃx, yi + kyk2
Xn X n n X
X n n X
X n
2
= Ãi,j xj − 2 Ãi,j xj yi + Ãi,j yj2
i=1 j=1 i=1 j=1 i=1 j=1
Xn X n
= Ãi,j (xj − vi )2 .
i=1 j=1

We now claim that there are indices i and j such that

1
Ãi,j (xj − yi )2 ≥ .
4dn3
Since the sum above only contains nonnegative terms, this will also be a
lower bound for 1P − kyk2 . We sort the nodes (indices) such that x1 ≥ x2 ≥
· · · ≥ xn . Since ni=1 xi = 0, we have x1 ≥ 0 ≥ xn . Because kxk2 = 1,
√ √
x1 ≥ 1/ n or xn ≤ −1/ n. Thus

1
x1 − xn ≥ √ .
n
1
Thus there is an i0 such that xi0 − xi0 +1 ≥ n1.5 . Set U = {1, . . . , i0 } and
Ū = {i0 + 1, . . . , n}. Since G is connected, there is and edge {j, i} with
j ∈ U and i ∈ Ū . Then

|xj − yi | ≥ |xj − xi | −|xi − yi |.


| {z }
≥xi0 −xi0 +1 ≥1/n1.5

If |xi − yi | ≤ 2n11.5 , then |xj − yi | ≥ 2n11.5 and Ãi,j (xj − yi )2 ≥ 4dn


1
3 , because
1
Ãi,j ≥ 1/d since there is an edge {j, i}. If |xi − yi | ≥ 2n1.5 , then Ãi,i (xi −
1
yi )2 ≥ 4dn 3 , because Ãi,i ≥ 1/d since the graph has all self loops.

Thus
1
kyk2 ≤ 1 −
4dn3
and
1
kyk ≤ 1 − .
8dn3
106 16. RL and undirected connectivity

Since this holds for all y = Ãx with kxk = 1 and x⊥1̃, this is also an upper
bound for λ(G)/d.
Assume G = (V, E) is a connected d-regular graph such that every node
has a self loop. Let p be any probability distribution on V . By Lemmas 16.3
and 16.4,  t
1 3 1
t
kà p − 1̃k ≤ 1 − 3
≤ e−t/(8dn ) ≤ 1.5
8dn 2n
for t ≥ 12dn3 ln n + 8dn3 ln 2. Thus
1
kÃt p − 1̃k1 ≤
2n

and (Ãt p)i ≥ 1/n − 1/(2n) = 1/(2n). Thus the probability that we hit
any particular node i is at least 1/(2n). If we repeat this for 2n times, the
probability that we hit i is at least 1 − 1/e ≥ 1/2.
This proves the correctness of the algorithm in the beginning of our
chapter. The input graph G need not to be regular or have self loops at
every node. But we can make it regular with self loops by attaching an
appropriate number of self loops to each node. The degree of the resulting
graph is at most n. This does not change the connectivity properties of
the graph. (In fact, we even do not have to do the preprocessing, since
if a node is hit in the new graph, it is only hit earlier in the old graph.)
Then we apply the analysis above to the connected component that contains
s. If t is in this component, too, then we hit it with probability at least
1/2. Note that instead of restarting the random walk, we can perform one
longer random walk, since the analysis does not make any assumption on
the starting probability except that the mass should be in the component
of s.
17 Explicit constructions of expanders

We call a family of (multi)graphs (Gn )n∈N a family of d-regular λ-expanders


if
1. Gn has n nodes
2. Gn is d-regular
3. λ(Gn ) ≤ λ
for all n. Here, d and λ are constants.
The family is called explicit if the function
1 n → Gn
is polynomial time computable. It is called strongly explicit if
(n, v, i) 7→ the ith neighbour of v in Gn
is polynomial time computable. Here the input and output size is only
O(log n), so the algorithm runs in time only poly(log n). In our case, it is
also possible to return the whole neighbourhood, since d is constant.
Let G be a d-regular graph with adjacency matrix In this chapter, it will
we very convenient to work with the normalized adjacency matrices à = d1 A.
These matrices are also called random walk matrices, since they describe the
transition probabilities of one step of a random walk. λ̃(G) is the second
largest (absolute value of an) eigenvalue of Ã. Obviously, λ̃(G) = λ(G)/d.
We will now describe three graph transformations. One of them increases
the number of nodes. This will be used to construct larger expanders from
smaller ones. The second one will reduce the degree. This is used to keep
the degree of our family constant. An the last one reduces the second largest
eigenvalue. This is needed to keep λ(G) below λ.

17.1 Matrix products


Let G be a d-regular graph with normalized adjacency matrix Ã. The k-
fold matrix product Gk of G is the graph given by the normalized adjacency
matrix Ãk . This transformation is also called path product, since there is
an edge between u and v in Gk if there is path of length k in G between u
and v.
It is obvious that the number of nodes stays the same and the degree
becomes dk .

107
108 17. Explicit constructions of expanders

Lemma 17.1. λ̃(Gk ) = λ̃(G)k for all k ≥ 1.

Proof. Let x be an eigenvector of à associated with the eigenvalue λ


such that λ = λ̃(G). Then Ãk x = λk x (induction in k). Thus λ̃(Gk ) ≥ λk .
It cannot be larger, since otherwise λ̃(G) > λ.

Matrix product
nodes degree λ̃(G)
G n d λ
Gk n dk λk

Given oracle access to the neighbourhoods of G, that is, we may ask


queries “Give me a list of all neighbours of v!”, we can compute the neigh-
bourhood of a node v in Gk in time O(dk log n) by doing a breadth first
search starting in v. From v, we can reach at most dk vertices and the
description size of a node is O(log n).

17.2 Tensor products


Let G be a d-regular graph with n nodes and normalized adjacency matrix
à and let G0 be a d0 -regular graph with n0 nodes and normalized adjacency
matrix Ã0 . The tensor product G ⊗ G0 is the graph given by the normalized
adjacency matrix à ⊗ Ã0 . Here à ⊗ Ã0 denotes the Kronecker product of the
two matrices, which is given by

a1,1 Ã0 . . . a1,n Ã0


 

à ⊗ Ã0 =  .. .. ..
,
 
. . .
0
an,1 Ã . . . an,n Ã0

where A = (ai,j ).
The new graph has nn0 nodes and its degree is dd0 .

Lemma 17.2. Let A be a m × m-matrix and B be a n × n-matrix with


eigenvalues λ1 , . . . , λm and µ1 , . . . , µn . The eigenvalues of A ⊗ B are λi µj ,
1 ≤ i ≤ m, 1 ≤ j ≤ n.

Proof. Let x be an eigenvector of A associated with the eigenvalue λ.


and y be an eigenvector of B associated with the eigenvalue µ. Let z := x⊗y
be the vector  
x1 y
 .. 
 . .
xn y
17.3. Replacement product 109

where x = (xi ). z is an eigenvector of A ⊗ B associated with λµ:


 
a1,1 x1 By + · · · + a1,m xm By
A⊗B·z =
 .. 
. 
am,1 x1 By + · · · + am,m xm By
 
(a1,1 x1 + · · · + a1,m xm )y
=µ·
 .. 
. 
(am,1 x1 + · · · + am,m xm )y
 
x1 y
= λµ ·  ... 
 

xm y
= λµz.

These are all eigenvalues, since one can show that if x1 , . . . , xm and y1 , . . . , yn
are bases, then xi ⊗ yj , 1 ≤ i ≤ m, 1 ≤ j ≤ n, form a basis, too.
From the lemma, it follows that λ̃(G ⊗ G0 ) = max{λ̃(G), λ̃(G0 )}, since
1 · λ̃(G0 ) and λ̃(G) · 1 are eigenvalues of à ⊗ Ã0 , but the eigenvalue 1 · 1 is
excluded in the definition of λ̃(G ⊗ G0 ).

Tensor product
nodes degree λ̃(G)
G n d λ
G0 n0 d0 λ0
G ⊗ G0 nn0 dd0 max{λ, λ0 }

Given oracle access to the neighbourhoods of G and G0 , we can compute


the neighbourhood of a node v in G⊗G0 in time O(d2 log max{n, n0 }). (This
assume that from the names of the nodes v in G and v 0 in G0 we can compute
in linear time a name of the node that corresponds to v ⊗ v 0 .)

17.3 Replacement product


Let G be a D-regular graph with n nodes and adjacency matrix A and H be
a d-regular graph with D nodes and adjacency matrix B. The replacement
product G r H is defined as follows:

• For every node v of G, we have one copy Hv of H.

• For every edge {u, v} of G, there are d parallel edges between node i
in Hu and node j in Hv where v is the ith neighbour of u and u is the
jth neighbour of v.
110 17. Explicit constructions of expanders

We assume that the nodes of H are the number from 1 to D and that the
neighbours of each node of G are ordered. Such an ordering can for instance
be induced by an ordering of the nodes of G.
We can think of G r H of having an inner and an outer structure. The
inner structures are the copies of H and the outer structure is given by G.
For every edge of G, we put d parallel edges into G r H. This ensures that
when we choose a random neighbour of some node v, the probability that we
stay in Hv is the same as the probability that we go to another Hu . In other
words, with probability 1/2, we perform an inner step and with probability
1/2, we perform an outer step. The normalized adjacency matrix of G r H
is given by
1 1
à + I ⊗ B,
2 2
where I is the n × n-identity matrix. The nD × nD-matrix  is defined as
follows: Think of the rows and columns labeled with pairs (v, j), v is a node
of G and j is a node of H. Then there is a 1 in the position ((u, i), (v, j)) if v
is the ith neighbour of u and u is the jth neighbour of v. Â is a permutation
matrix.
Obviously, G r H has nD nodes and it is 2d-regular.

Excursus: Induced matrix norms


For a norm k.k on Rn , the induced matrix norm on Rn×n is defined by

kAxk
kAk = sup = max kAxk.
x6=0 kxk kxk=1

It is a norm that is subadditive and submultiplicative. By definition, it is compatible


with the vector norm, that is,

kAxk ≤ kAk · kxk.

It is the “smallest” norm that is compatible with the given vector norm.
For the Euclidian norm k.k2 on Rn , then induced norm is the so-called spectral
norm, the square root of the largest of the absolute values of the eigenvalues of
AH A. If A is symmetric, then this is just the largest of the absolute values of the
eigenvalues of A. In particular,

λ(G) ≤ kAk2 .

If A is symmetric and doubly stochastic, then kAk2 ≤ 1.

Lemma 17.3. If λ̃(G) ≤ 1 −  and λ̃(H) ≤ 1 − δ, then λ̃(G r H) ≤ 1 −


δ 2 /24.
17.3. Replacement product 111

Proof. By Bernoulli’s inequality, it is sufficient to show that λ̄(G r H)3 ≤


1 − δ 2 /8. Since λ̄(G r H)3 = λ̄((G r H)3 ), we analyze the threefold matrix
power of G r H. Its normalized adjacency matrix is given by
 3
1 1
 + I ⊗ B̃ . (17.1)
2 2

 and I ⊗ B̃ are doubly stochastic, so their spectral norm is bounded by 1.


Since the spectral norm is submultiplicative, we can expand (17.1) into
1 
= sum of seven matrices of spectral norm ≤ 1 + (I ⊗ B̃)Â(I ⊗ B̃)
8
7 1
= M + (I ⊗ B̃)Â(I ⊗ B̃)
8 8| {z }
=:(∗)

with kM k ≤ 1. By Exercise 17.1, we can write B̃ = (1 − δ)C + δJ with


kCk ≤ 1. Thus
(∗) = (I ⊗ (1 − δ)C + I ⊗ δJ)Â(I ⊗ (1 − δ)C + I ⊗ δJ)
= (1 − δ 2 )M 0 + δ 2 (I ⊗ J)Â(I ⊗ J)
with kM 0 k ≤ 1. A direct calculation shows that
(I ⊗ J)Â(I ⊗ J) = A ⊗ J 0
where the entries of J 0 are all equal to 1/D2 . Thus, the second largest
eigenvalue of
λ((I ⊗ J)Â(I ⊗ J)) = λ(A ⊗ J 0 )
≤ λ(Ã).
Hence,
3
δ2 δ2

1 1
 + I ⊗ B̃ = (1 − )M 00 + (A ⊗ J 0 )
2 2 8 8
with kM 00 k ≤ 1 and
3
δ2 δ2

1 1
λ Â + I ⊗ B̃ ≤ 1 − + (1 − )
2 2 8 8
2
δ 
=1− ,
8
because λ(M 00 ) ≤ kM 00 k.
The only term in the analysis that we used was the (I ⊗ B̃)Â(I ⊗ B̃)
term. This corresponds to doing an “inner” step in H, then an “outer step”
in G and again an “inner” step in H. The so-called zig-zag product is a
product similar to the replacement product that only allows such steps.
112 17. Explicit constructions of expanders

Exercise 17.1. Let A be the normalized adjacency matrix of a d-regular


λ-expander. Let  1 1

n ... n
 .. . . ..  .
J = . . . 
1 1
n ... n
Then
A = (1 − λ)J + λC
for some matrix C with kCk ≤ 1.

Replacement product
nodes degree λ̃(G)
G n D 1−
H D d 1−δ
GrH nD 2d 1 − δ 2 /24

Given oracle access to the neighbourhoods of D and H, we can compute


the neighbourhood of a node v in G ⊗ G0 in time O((D + d) log n). (This
assume that the oracle gives us the neighbourhoods in the same order than
the one used when building the replacement product.)

17.4 Explicit construction


We first construct a family of expanders (Gm ) such that Gm has cm nodes.
In a second step (Exercise!), we will show that we can get expanders from
Gm of all sizes between cm−1 + 1 and cm . The constants occurring in the
proof are fairly arbitrary, they are just chosen in such a way that the proof
works. We have taken them from the book by Arora and Barak.
For the start, we need the following constant size expanders. Since they
have constant size, we do not need a constructive proof, since we can simply
enumerate all graphs of the particular size and check whether they have the
mentioned properties.

Exercise 17.2. For large enough d, there are

1. a d-regular 0.01-expander with (2d)100 nodes.


1
2. a 2d-regular (1 − 50 )-expander with (2d)200 nodes

We now construct the graphs Gk inductively:

1. Let H be a d-regular 0.01-expander with (2d)100 nodes.


1
2. Let G1 be a 2d-regular (1 − 50 )-expander with (2d)100 nodes and G2
1
be a 2d-regular (1 − 50 )-expander with (2d)200 nodes.
17.4. Explicit construction 113

3. For k ≥ 3, let
Gk := ((Gb k−1 c ⊗ Gd k−1 e )50 ) r H
2 2

1
Theorem 17.4. Every Gk is a 2d-regular (1 − 50 )-expander with (2d)100k
nodes. Furthermore, the mapping

(bin k, bin i, bin j) 7→ jth neighbour of node i in Gk

is computable in time polynomial in k. (Note that k is logarithmic in the


size of Gk !)

Proof. The proof of the first part is by induction in k. Let nk denote


the number of nodes of Gk .
Induction base: Clear from construction.
Induction step: The number of nodes of Gk is

nb k−1 c · nd k−1 e · (2d)100 = (2d)100(k−1) · (2d)100 · (2d)100k .


2 2

The degree of Gb k−1 c and Gd k−1 e is 2d by the induction hypothesis.


2 2
The degree of their tensor product is (2d)2 and of the 50th matrix power is
(2d)100 . Then we take the replacement product with H and get the graph
Gk of degree 2d.
1
Finally, the second largest eigenvalue of Gb k−1 c ⊗ Gd k−1 e is ≤ 1 − 50 .
2 2
Thus,
1 1 1
λ̃((Gb k−1 c ⊗ Gd k−1 e )50 ) ≤ (1 − )50 ≤ ≤
2 2 50 e 2
Thus λ̃(Gk ) ≤ 1 − 21 · 0.992 /24 ≤ 1 − 50
1
.
For the second part note that the definition of Gk gives a recursive
scheme to compute the neighbourhood of a node. The recursion depth is
log k. We have shown how to compute the neighbourhoods of G50 , G ⊗ G0 ,
and G r H from the neighbourhoods of the given graphs. The total size of
the neighbourhood of a node in Gk is Dlog k = poly(k) for some constant D.
18 UCONN ∈ L

We modify the transition relation of k-tape nondeterministic Turing ma-


chines as follows: A transition is a tuple (p, p0 , t1 , . . . , tk ) where p and p0 are
states and tκ are triples of the form (αβ, d, α0 β 0 ). The interpretation is the
following: if d = 1, the head of M stands on α, and β is the symbol to the
right of the head, then M may go to the right and replace the two symbols
by α0 and β 0 . If d = −1, then the head has to be on β and M goes to the
left. In both cases, the machine changes it state from p to p0 . An “ordinary”
Turing machine can simulate such a Turing machine by always first looking
at the symbols to the left and right of the current head position and storing
them in its finite control.
By defining a transition like above, every transition T has a reverse
transition T −1 that undoes what T did. M is now called symmetric if for
every T in the transition relation ∆, T −1 ∈ ∆.

Definition 18.1.

SL = {L | there is a logarithmic space bounded symmetric


Turing machine M such that L = L(M ) }

L is a subset of SL. We simply make the transition relation of a deter-


ministic Turing machine M symmetric by adding T −1 to it for every T in it.
Note that the weakly connected components of the configuration graph of
M are directed trees that converge into a unique accepting or rejecting con-
figuration. We cannot reach any other accepting or rejecting configuration
by making edges in the configuration graph bidirectional, so the accepted
language is the same.
In the same way, we can see that UCONN ∈ SL: Just always guess a neigh-
bour of the current node until we reach the target t. The guessing step can
be made reversible and the deterministic steps between the guessing steps
can be made reversible, too. UCONN is also hard for SL under determinis-
tic logarithmic space reductions. The NL-hardness proof CONN works, we
use the fact that the configuration graph of a symmetric Turing machine is
undirected. Finally, if A ∈ SL and B ≤log A, then B ∈ SL.
Less obvious are the facts that

• planarity testing is in SL,

• bipartiteness testing is in SL,

114
18.1. Connectivity in expanders 115

• a lot of other interesting problems are contained in SL, see the com-
pendium by [AG00].

• SL is closed under complementation [NTS95].

In this chapter, we will show that UCONN ∈ L. This immediately also


yields space efficient algorithms for planarity or bipartiteness testing.

18.1 Connectivity in expanders


Lemma 18.2. Let c < 1 and d ∈ N. The following promise problem can be
decided by a logarithmic space bounded deterministic Turing machine:

Input: a d-regular graph, such that every connected component is a


λ-expander with λ/d ≤ c, nodes s and t.
Output: accept if there is a path between s and t, otherwise reject.

Proof. The Turing machine enumerates all paths of length O(log n)


starting in s. If it sees the node t, it accepts; after enumerating all the
paths without seeing t, it rejects.
Since G has constant degree, we can enumerate all paths in space O(log n).
Every path is described by a sequence {1, . . . , d}O(log n) . Such a sequence
δ0 , δ1 , . . . is interpreted as “Take the δ0 th neighbour of s, then the δ1 th
neighbour of this node, . . . ”.
If the machine accepts, then there certainly is a path between s and t.
For the other direction note that, by Lemma 16.3, a random walk on G that
starts in s converges to the uniform distribution on the connected compo-
nent containing s. After O(log n) steps, every node in the same connected
component of s has a positive probability of being reached. In particular
there is some path of length O(log n) to it.

18.2 Converting graphs into expanders


Lemma 18.3. There is a logarithmic space computable transformation that
transforms any graph G = (V, E) into a cubic regular graph G0 = (V 0 , E 0 )
such that V ⊆ V 0 and for any pair of nodes s, t ∈ V , there is a path between
s and t in G iff there is one in G0 .

Proof. If a node v in G

1. has degree d > 3, then we replace v by a cycle of length d and connect


every node of the cycle to one of the neighbours of v.

2. has degree d ≤ 3, then we add 3 − d self loops.


116 18. UCONN ∈ L

For every node v with degree > 3, we identify one of the new nodes of the
cycle with v. Let the resulting graph be G0 . By construction, G0 is cubic
and if there is a path between s and t in G then there is one between in G0
and vice versa.
With a little care, the construction can be done in logarithmic space.
(Recall that the Turing machine has a separate output tape that is write-
only and oneway, so once it decided to output an edge this decision is not
reversible.) We process each node in the order given by the representation
of G. For each node v, we count the number m of its neighbours. If m ≤ 3,
then we just copy the edges containing v to the output tape and output the
additional self loops. If m > 3, then we output the edges {(v, i), (v, i + 1)},
1 ≤ i < m and {(v, m), (v, 1)}. Then we go through all neighbours of v. If
u is the ith neighbour of v, then we determine which neighbour v of u is,
say the jth, and output the edge {(v, i), (u, j)}. (We only need to do this if
v is processed before u because otherwise, the edge is already output.)
Let d be large enough such that there is a d/2-regular 0.01-expander H
with d50 nodes. (Again, the constants are chosen in such a way that the
proof works; they are fairly arbitrary and we have taken them from the
book by Arora and Barak.) We can make our cubic graph G d50 -regular by
adding d50 − 3 self loops per node. Recursively define

G0 := G
Gk := (Gk−1 r H)50 .

Lemma 18.4. For all k ≥ 1,

1. Gk has d50k · n nodes,

2. Gk is d50 -regular,
1 k
3. λ̃(Gk ) ≤ 1 − k , where k = min{ 20 , 8d1.5
50 n3 }.

Proof. The proof is by induction in k. Let nk be the number of nodes of


Gk .
Induction base: G0 has n nodes and degree d50 . By Lemma 16.4, λ̃(G0 ) ≤
1 − 8d501 n3 ≤ 1 − 0 .
Induction step: The replacement product Gk r H has nk · d50 = nk+1 nodes.
Its degree is d. Gk+1 has the same number of nodes and the degree becomes
d50 . We have
k k
λ̃(Gk r H) ≤ 1 − · 0.992 ≤ 1 −
24 25
and
 k 50
λ̃(Gk ) ≤ 1 − ≤ e−2k ≤ 1 − 2k + 22k = 1 − 2k (1 − k ).
25
18.2. Converting graphs into expanders 117

1 1 1.5k 1
If k = 20 , then λ̃(Gk ) ≤ 1 − 20 . If k = 8d50 n3
< 20 , then

λ̃(Gk ) ≤ 1 − 1.5k = 1 − k+1 .

If we set k = O(log n), then Gk is a constant degree expander with


19
λ̄(Gk ) ≤ 20 . For such graphs, connectivity can be decided in deterministic
logarithmic space by Lemma 18.2. So we could first make our input graph
cubic, then compute Gk for k = O(log n) and finally use the connectivity
algorithm for expander graphs. Since L is closed under logarithmic space
computable reductions, this would show UCONN ∈ SL.
But there one problem: To compute Gk , we cannot compute G0 , then
G1 , then G2 , and so on, since L is only closed under application of a constant
number of many-one-reductions. Thus we have to compute Gk from G0 in
one step.
Lemma 18.5. The mapping G0 → Gk with k = O(log n) is deterministic
logarithmic space computable.
Proof. Assume that G0 has nodes {1, . . . , n}. Then the nodes of Gk are
from {1, . . . , n} × {1, . . . , d50 }k . The description length of a node of Gk is
log n + 50 log d · k = O(log n). We will identify {1, . . . , d50 } with {1, . . . , d}50 ,
since an edge in Gk corresponds to a path of length 50 in Gk−1 r H.
Now given a node v = (i, δ1 , . . . , δk ) of Gk and j ∈ {1, . . . , d50 }, we want
to compute the jth neighbour of v in Gk . We interpret j as a sequence
(j1 , . . . , j50 ) ∈ {1, . . . , d}50 .
Input: node v = (i, δ1 , . . . , δk ) of Gk , index j = (j1 , . . . , j50 )
Output: the jth neighbour of v in Gk
1. For h = 1, . . . , 50 compute the jh neighbour of the current node in
Gk−1 r H.

So it remains to compute the neighbours in Gk−1 r H.


Input: node v = (i, δ1 , . . . , δk ) of Gk , index j
Output: the jth neighbour of v in Gk−1 r H
1. If j ≤ d/2, then return (i, δ1 , . . . , δk−1 , δ 0 ) where δ 0 is the j neighbour
of δk in H. Since H is constant, this can be hard-wired.
(We perform an internal step inside a copy of H.)
2. Otherwise, recursively compute the δk th neighbour of (i, δ1 , . . . , δk−1 )
in Gk−1 .
(We perform an external step between two copies of H.)

Note that we can view (v, δ1 , . . . , δk ) as a stack and all the recursive calls
operate on the same step. Thus we only have to store one node at a time.
118 18. UCONN ∈ L

Theorem 18.6 (Reingold [Rei08]). UCONN ∈ SL.

Corollary 18.7. L = SL.


19 Extractors

Extractors are a useful tool for randomness efficient error probability am-
plification. To define extractors, we first have to be able to measure the
closeness of probability distributions.
Definition 19.1. Let X and Y two random variables with range S. The
statistical difference of X and Y is Diff(X, Y ) = maxT ⊆S | Pr[X ∈ T ] −
Pr[Y ∈ T ]|. X and Y are called -close if Diff(X, Y ) ≤ .
In the same way, we can define the statistical difference of two probability
distributions.
We can think of T as a statistical test which tries to distinguish the
distributions of X and Y . The L1 -distance of X and Y is defined as
X
|X − Y |1 = | Pr[X = s] − Pr[Y = s]|
s∈S

L1 -distance and statistical difference are related as stated below.


Exercise 19.1. Prove the following: Two random variables X and Y are
-close if and only if |X − Y |1 ≤ 2.
Statistical closeness is preserved under application of functions.
Exercise 19.2. Prove the following statements:
1. Let X and Y be random variables with range S that are -close. Let f
be a function with domain S. Then f (X) and f (Y ) are -close.
2. If Z is a random variable independent of X and Y , then the random
variables (X, Z) and (Y, Z) are -close.
A classical measure for the amount of randomness
P contained in a random
source X is the Shannon entropy H(X) = − s∈S Pr[X = s] log Pr[X = s].
This is however not a suitable measure in our context. Consider for instance
the following source: With probability 0.99 it returns the all-zero string.
With probability 0.01 is returns a string in {0, 1}N chosen uniformly at
random. The Shannon entropy of this source is ≥ 0.01N which is quite
large, in particular unbounded. If we want to use this source for simulating
randomized algorithms, we will take one sample from this source. But with
probability 0.99, we see a string that contains no randomness at all which
is not very useful for derandomization. The Shannon entropy measures
“randomness on the average” and particularly does not talk about variance.
It is useful when one draws many samples from a source. For our purposes,
the following definition is more useful.

119
120 19. Extractors

Definition 19.2. Let X be a random variable with range S.


1. The min-entropy of X is mins∈S − log Pr[X = s].
2. If X has min-entropy at least k, then X will be called a k-source. If in
addition its range is contained in {0, 1}N , then X is an (N, k)-source.
Note that the min-entropy of the source above is only log 1/0.99 which
is constant. In some sense, the min-entropy measures “randomness in the
worst-case”.
Definition 19.3. Let Ud be the uniform distribution on {0, 1}d . A function
Ext : {0, 1}N × {0, 1}d → {0, 1}m is called a (k, )-extractor if for any
(N, k)-source X, Ext(X, Ud ) is -close to uniform.
Above, we call a source -close to uniform, if it and Um are -close.
Our aim is to construct extractors with small d and large m. An extractor
extracts the randomness of the weak source in the sense that given a sample
of the weak random source and a short truly random string, it produces a
string that is nearly uniformly distributed.
Sometimes it is convenient to view an extractor Ext as a bipartite multi-
graph. The nodes are {0, 1}N on the one and {0, 1}m on the other side. Each
node v ∈ {0, 1}N has degree 2d . It is incident with the edges (v, Ext(v, i))
for all i ∈ {0, 1}d .
A family of extractors Extm : {0, 1}N (m) × {0, 1}d(m) → {0, 1}m is
called explicit, if the mapping (m, v, e) → Extm (v, e) is computable in time
poly(N (m), d(m), m). (Usually, N ≥ m for an extractor. Therefore, we
parameterize the family by the size of the image.)

19.1 Extractors from expanders


Lemma 19.4. Let  > 0. Let k(n) ≤ n for all n. There is an explicit family
of (k, )-extractors Extn : {0, 1}n × {0, 1}t → {0, 1}n with t = O(n − k −
log 1/).
Proof. Let X be an (n, k)-source, and let v be a sample drawn from
X. Let G = (V, E) be a d-regular 21 -expander with 2n nodes. (We do not
construct this graph, since it is too large. We just perform a random walk
on it. This is possible, since strongly explicit expanders exist.) Let z be a
truly random string of length
n k 1 1
t = log d · ( − + log + 2) = O(n − k + log ).
2 2  
n k
We interpret z as a random walk in G of length ` = 2 − 2 + log 1 + 1 and
set

Ext(v, z) = label of the node reached from v by a walk as given by z


19.2. Randomness efficient probability amplification 121

Let p be the probability distribution on V induced by X and A be the


adjacency matrix of G. Let p = 1̃ + p0 with 1̃⊥p0 . We have

kÃ` p − 1k ≤ kÃ` (p − 1̃)k ≤ kÃ` kkp − 1k ≤ 2−` kp − 1̃k.

Since X is an (n, k)-source, we have Pr[X = s] ≤ 2−k for every s in the


range of X. Thus kpk2 ≤ 2−k . Therefore,

kp − 1̃k ≤ kpk + k1̃k ≤ 2−k/2 + 2−n/2 ≤ 2−k/2+1

and
kÃ` − 1̃k ≤ 2−n/2+k/2−log 1/−2 · 2−k/2+1 ≤  · 2−n/2−1 .
Finally,

Diff(Ã` p, Un ) = 2kÃ` p − 1̃k1 ≤ 2kÃ` p − 1̃k2 · 2n/2 ≤ .

The extractor constructed above is only efficient if the k is large, at least


(1 − )n. For small k, better constructions are known.

19.2 Randomness efficient probability amplification


Lemma 19.5. If there is an explicit family of (k(r), 1/8)-extractors Extr :
{0, 1}N (r) × {0, 1}d(r) → {0, 1}r , then for any BPP-Turing machine M that
runs in time t, uses r random bits, and has error probability 1/3, there is a
BPP-machine M 0 with L(M ) = L(M 0 ) that runs in time poly(N (r), 2d(r) , t),
uses N (r) random bits, and has error probability bounded by 2k(r)−N (r) .

Proof. M 0 uses its N (r) random bits and interprets it as a string x ∈


{0, 1}N (r) . Let yi = Ext(x, i) for all i ∈ {0, 1}d(r) . M 0 now simulates 2d(r)
runs of M , each one with a different string yi as random string. M 0 accepts
if the majority of these runs lead to an accepting configuration and rejects
otherwise.
The bound on the running time is clear from the construction. We
have to estimate the error probability. Assume that a given input u is in
L(M ), i.e., M accepts u with probability at least 2/3. The case u ∈ / L(m)
is symmetric. To show the bound on the error probability, it is sufficient
to show that less than 2k(r) of the random strings x lead to a rejecting
configuration. Suppose on the contrary that this is not the case. Let S be
the set of all such x. Then the uniform distribution X on S has min-entropy
at least k(r). Thus Ext(X, Ud(r) ) is 1/6-close to uniform. Let T ⊆ {0, 1}r be
the statistical test that consists of all random strings that make M accept.
The probability that a string drawn uniformly at random from {0, 1}r is
in T is at least 2/3. By definition, the probability that the yi are in T is
≥ 2/3 − 1/8 > 1/2.
122 19. Extractors

This is a contradiction, since for each choice of x that makes M 0 reject,


more than half of the string Ext(x, i) lead to a rejecting configuration, i.e.,
are not in T .
If we take the extractor from the previous section, we have N (r) = r. To
achieve d(r) = O(log n) (and get polynomial running time), we have to set
k(r) = r−log r. To get a k(r) source, we can use k(r) random bits and fill the
remaining log r bits with zeroes. The error probability is 2r−log r−r = 1/r.
So we get a polynomial error reduction with less random bits! (Note that
one can always save log r random bits by trying all possibilities for them
and then making a majority vote. But it is not clear that this reduces the
error probability, since the trials are not independent.)
Extractors can also be used to run PTMs with a weak random source
instead of a prefect random string. The proof of the following lemma is
similar to the proof of the previous one and is left as an exercise.

Lemma 19.6. If there is an explicit family of (k(r), 1/6)-extractors Extr :


{0, 1}N (r) × {0, 1}d(r) → {0, 1}r then for any BPP-machine M that runs in
time t, uses r random bits, and has error probability 1/3, there is a Turing
machine M 0 with L(M ) = L(M 0 ) that runs in time poly(N (r), 2d(r) , t), uses
one sample of an (N (r), k(r)+`(r))-source, and has error probability bounded
by 2−`(r) .

Exercise 19.3. Prove Lemma 19.6


20 Circuits and first-order logic

One can (quite easily) find AC0 circuits for addition. Multiplication seems a
little harder, but there are constant depth circuits with unbounded fan-in for
multiplication, if we use not only and, or, and not gates, but also threshold
gates. But for a long time, it was not even known how to divide even in
logspace, let alone with constant-depth circuits of polynomial size. This
changed The goal of this and the following section is to develop threshold
circuits for division.
It will sometimes be more convenient to do this in the framework of
logic. Thus, we will show the equivalence of constant-depth circuits to first-
order logic in this section, which has been proved by Barrington and Im-
merman [BI90]. In the next section, we will then show that division can be
performed by constant-depth circuits.
This chapter is far from being a complete introduction to complexity
theory in terms of logic, which is called descriptive complexity. We will only
cover first-order logic with some extensions. For a more detailed introduc-
tion, we refer to Immerman [Imm99].

20.1 First-order logic


First-order logic is logic, where the quantifiers range only over elements of
the domain and not (as in second-order logic) over sets of elements.
Since we want to express properties of strings over {0, 1}, we introduce
a unary predicate X. For an input string x = x0 . . . xn−1 , we have X(i) =
1 if and only if xi = 1. (We will use 1 and true as well as 0 and false
interchangeably.) We will have constants 0, 1, and |x| = n and binary
predicates = and ≤ on numbers {0, . . . , n}. Finally, we include the binary
predicate BIT, where BIT(i, m) = 1 if and only if the ith bit of the binary
expansion of m is 1. (For instance, BIT(0, m) = 1 if and only if m is odd.
The role of BIT will soon become clear.)
Our first-order language is the set of formulas that we can build using
0, n, ≤, =, BIT, X() as well as ∧, ∨, ¬ and variables x, y, z, . . . and the
quantifiers ∀ and ∃. The quantifier always range over {0, 1, . . . , n − 1}, i.e.,
∃x means ∃x ∈ {0, 1, . . . , n − 1}. To make the notation less cumbersome,
we add syntactic sugar like →, where a → b means ¬a ∨ b. Furthermore,
we abbreviate (a ∧ b) ∨ (¬a ∧ ¬b) by a = b, knowing well that this might
cause confusion with the “official” binary predicate =. The exclusive-or is
denoted by ⊕. We will also use the binary predicate < on numbers. Not

123
124 20. Circuits and first-order logic

surprisingly, a < b if and only of a ≤ b and ¬(a = b). Analogously, we will


use 6=, >, and ≥. To increase confusion, we identify 0 and false as well as 1
and true, we will thus treat true and false also as numbers.
A sentence is a closed formula of the first-order language that contains
no free variables. (A variable x is free if there is no quantifier ∀x or ∃x to the
left of it that binds this variable.) Sentences express properties of binary
strings: A string x specifies X and n = |x|. Then the sentence is either
true or false. In the former case, x has the property (or is in the language
specified by the sentence). In the latter, not.
We denote by FO the set of all languages that can be expressed by first-
order sentences. In particular, the predicates ≤ and BIT are allowed for FO.
If we want to make clear that only built-in predicates P1 , . . . , Pc are allowed,
we call the corresponding class FO[P1 , . . . , Pc ]. Thus, FO = FO[≤, BIT].
Furthermore, FO[], which forbids ≤ and BIT, is a strict subclass of FO. In
FO[], only =, 0, 1, and n are available.

Example 20.1. The regular language L((00 + 11)? ) can be expressed by

∀i : BIT(0, i) → ∃j : succ(i, j) ∧ X(i) = X(j) .

We have to specify succ(i, j), which should be 1 if and only if i + 1 = j:

succ(i, j) ≡ ∀k : (k < j) → (k ≤ i) .

The class FO suffices to perform (or describe) addition. We assume that


three inputs x, y, and z (each an n digit number) are each given as separate
unary predicates X, Y , and Z, respectively.
We have x + y = z if and only if X(0) ⊕ Y (0) = Z(0) and X(i) ⊕ Y (i) ⊕
C(i) = Z(i) for i ∈ {1, 2, 3, . . . , n − 1} and C(n) = 0. Here, C(i) denotes
the ith carry bit. We can express this as

X(0) ⊕ Y (0) = Z(0)

∧ ∀i : i = 0 ∨ X(i) ⊕ Y (i) ⊕ C(i) = Z(i)
∧ ¬C(n) .

This leaves us with two problem of how to compute C? As a first attempt,


one might try
 
C(i) ≡ X(i − 1) ∧ Y (i − 1) ∨ C(i − 1) ∧ (X(i − 1) ∨ Y (i − 1)) .

(We can compute i − 1 using succ.) But this does not work: C is not
a predicate that we are allowed to use. Instead, it is a placeholder for
something, and we are only allowed to replace it by a first-order formula.
Thus, we replace C(i) by the first-order sentence

∃j < i : X(j) ∧ Y (j) ∧ ∀k ∈ {j, . . . , i − 1} : X(k) ∨ Y (k) .


20.1. First-order logic 125

We call something like C, which looks like a predicate, will be used like a
predicate, but is no predicate, a pseudo predicate in the following.
In the same way, we can add numbers from {0, 1, . . . , n − 1} using BIT.
We call the corresponding ternary predicate +, written in the usual way
x + y = z.
Just to get a better feeling, assume that our input consists of three parts,
representing n/3-bit numbers a, b, and c. We want to test if a + b = c. First,
we compute m = n/3 and m0 = 2n/3 = 2m, which can be done by

∃m∃m0 : m + m = m0 ∧ m + m0 = n .

Then we add three pseudo predicates A, B, C as follows:

A(i) ≡ (i < m) ∧ X(i) ,


B(i) ≡ ∃j : i + m = j ∧ (i < m) ∧ X(j) , and
C(i) ≡ ∃j : i + m0 = j ∧ (i < m) ∧ X(j) .

Now we can use addition using the pseudo predicates.

Exercise 20.1. Give a first-order sentence for the language COPY = {ww |
w ∈ {0, 1}? }.

Exercise 20.2. Give a first-order sentence for the following variant of par-
ity: n o
Lblog nc
x ∈ {0, 1}? | x = x1 . . . xn ∧ i=1 2 xi = 1 .

(We will soon prove that FO equals AC0 (with appropriate uniformity). We
already know that parity is not in AC0 (not even in non-uniform AC0 ). In-
ε
deed, every constant depth circuit for parity has to be of size 2n for some
constant ε > 0. Why is this not a contradiction?)
Do the proof without the following Lemma 20.2.

Using BIT, we can also perform multiplication of numbers between 0 and


n − 1. To do this, we need the following result, which is called the bitsum
lemma.

Lemma 20.2. The binary predicate BSUM, which is defined by BSUM(x, y) =


1 if and only if y is equal to the number of 1s in the binary representation
of x, is in FO.

Proof overview: The idea is to keep a running-sum s1 , s2 , s3 , . . . of the


first, second, third, . . . log log n bits of x. Then we only have to compare
whether si equals si−1 plus the ith block of log log n bits of x. This reduces
the problem of counting 1s in blocks of log log n bits. We apply the same
idea again.
126 20. Circuits and first-order logic

Now we encode an array containing the prefix sums within each block
into a single variable. Furthermore, we have a variable that represents an
array containing s1 , s2 , . . .

Proof. It is fairly easy to express a predicate Pow2 that some number m


is a power of two:

Pow2(m) ≡ ∃i : BIT(i, m) ∧ ∀j : j 6= i → ¬ BIT(j, m) .

The number x consists of at most dlog2 ne bits. Let L be the smallest power
of two larger than dlog2 ne. The number L can be expressed as follows:

∃L : Pow2(L) ∧ BIT(L − 1, n) = 1 .

(Strictly speaking, we have to translate something like BIT(L − 1, n) into


∃q : q + 1 = L ∧ BIT(q, n). But will not do so for the sake of readability.) In
the following, we assume that a variable can hold up to L bits. This is not
precisely true, but we can use a fixed number c of variables to store c · log n
bits. Addressing them is not too complicated since c is a constant and log n
can be computed.
Given any power of two A = 2a , we can multiply with and divide by A.
We express x = Ay as

∀i : BIT(i, y) = BIT(i + A, x) ∧ (i < a → BIT(i, x) = 0 .

We have to add less than L bits. This is a number with at most dlog2 Le
bits. Let L0 be the smallest power of two larger than dlog2 Le. The number
L0 can be expressed in the same way as L.
The idea is to keep a running-sum: Using one existentially quantified
variable s (which represents L bits), we can guess (roughly) L/L0 numbers
s1 , s2 , . . . , sL/L0 such that si = si−1 + ti , where ti is the number of 1s of
x(i−1)L0 +1 , . . . , xiL0 . Furthermore, s1 = t1 . Given i, the bits of si are the
bits s(i−1)L0 , . . . , siL0 −1 . We can address them since L0 is a power of two.
Thus, for instance, we can add them or compare them. We assume for the
moment that the numbers ti are given. Then we can express this as

∃s : s1 = t1 ∧ ∀i : (i > 1) → si = ti + si−1 .

Thus, we can express BSUM:

BSUM(x, y) = ∃s : s1 = t1 ∧ ∀i : (i > 1) → si = ti + si−1 ∧ sL0 = y .

The t1 , t2 , . . . remain to be computed. We can compute ti by a running-sum,


this time over single bits. The numbers of roughly at most log log log n < L0
bits, and there are only L0 partial sums. We assume that L0 · L0 < L, which
is true for sufficiently large n. Then all partial sums fit into a single variable.
20.2. First-order logic with majority 127

We call the kth partial sums tkj . We know how to address since L0 is a power
of two:

∃tj : BIT(0, t1j ) = BIT(jL0 , x) ∧ ∀k ∈ {1, . . . , L − 1} BIT(i, t1j ) = 0∧


∀i∀k : tkj + BIT(jL0 + k, x) = tk+1
j .

(We have used + with a Boolean value BIT(), but it should be clear how to
interpret this.)
Noting that we can deal with not-large-enough values of n by hard-wiring
the results completes the proof.

Lemma 20.3. The ternary multiplication predicate ×, which is true if and


only if x · y = z, is first-order definable.

Proof. Multiplication is equivalent to adding log n numbers a1 , . . . , alog n


of O(log n) bits, where ai = 2i xyi .
Let L = Θ(log n) and L0 = Θ(log log n) be as in the proof of Lemma 20.2.
We add the numbers a1 , a2 , . . . as follows: First, we split ai = bi +ci such that
the binary representations of bi and ci consist each of L0 bits of ai separated
by L0 bits of 0s: bi contains the first, third, fifth, . . . block of L0 bits of ai , ci
the second, fourth, sixth, . . . block. From now on, we treat b1 , b2 , b3 , . . . and
c1 , c2 , . . . separately. By symmetry, we only have to describe how to add the
bi s. If we have the two sums of the bi s and ci ’s, then we can add them sums
using a final +.
We consider the bi s as written below each other. Then we count the
number of 1s in each column using BSUM. Due to the structure of the
bi s, no carry is propagated more than L0 bits. This reduces the problem to
adding log n numbers (L0 for each of the log n/L0 blocks), each of length L0
(because of possible carries). But if fact, we have only L/L0 blocks of L0
numbers of L0 bits to add. We can guess all sums in a single variable. This
is then the sum of the bi s. Then we can verify that we guessed correctly
using BSUM and running-sums, as we did for Lemma 20.2.

Remark 20.4. Instead of BIT and ≤, we can directly use + and ×. This
is equivalent: We can implement BIT using + and ×, and we have already
seen that BIT suffices to implement + and ×. Thus, FO = FO[≤, BIT] =
FO[+, ×].

Exercise 20.3. Show that FO[≤] ⊆ FO[+].

20.2 First-order logic with majority


We can extend first-order logic in different ways. First, we can add new
predicates and constants, which would allow us to specify properties of more

complicated structures. For instance, to specify properties of graphs, it is


more convenient to specify a graph G = (V, E) on n vertices by a binary
relation E with E(i, j) = 1 if and only if {i, j} ∈ E.
Exercise 20.4. Show that using a binary input predicate E instead of the
unary predicate X does not increase the expressive power of first-order logic.
Second, and more importantly for our purposes, we can introduce new
quantifiers. We will make heavy use of the threshold quantifier M. It has
the following interpretation: Mx : P (x) is true if and only if P (x) = 1 for
more than half of the possible x, i.e., for at least ⌈(n + 1)/2⌉ of the n possible
values for x.
We denote by FOM the set of all languages that can be expressed by
first-order sentences with ≤ and BIT as well as M.
Another quantifier, which is only of temporary use for us, is H: Hx : P (x) is true if and only if P (x) is true for exactly ⌊n/2⌋ of the n possible values of x.
“Hx : P (x)” can be expressed by saying “P (x) is not true for the majority
of x, but it becomes true if we add one more element x0 for which P (x0 ) is
true”:

Hx : P (x) ≡ ∃x0 : (Mx : (P (x) ∨ x = x0 )) ∧ ¬(Mx : P (x)) .

The quantifier H is useful to express the following predicates:


1. F (x, y)P (x): “There are exactly y values of x with x ≤ ⌊n/2⌋ and
P (x).”

2. S(x, y)P (x): “There are exactly y values of x with x > ⌊n/2⌋ and
P (x).”

3. y = #x : P (x): “There are exactly y values of x for which P (x).”


We only show how to express the first expression:

F (x, y)P (x) ≡ Hx : (x ≤ ⌊n/2⌋ ∧ P (x)) ∨ (⌊n/2⌋ < x ≤ n − y) .

The second expression follows by symmetry. For the third, we add the two counts from F and S, using the addition of variables that we have seen in Section 20.1.
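
To see the new quantifiers in action, the following brute-force Python check (an illustration only; the function names M_quant, H_quant, and H_via_M are ours) verifies the semantics of M and H and the stated equivalence on a small universe {0, . . . , n − 1}:

from itertools import product

def M_quant(pred, n):
    # Mx : P(x) -- P holds for at least ceil((n+1)/2) of the n values
    return sum(pred(x) for x in range(n)) >= (n + 2) // 2

def H_quant(pred, n):
    # Hx : P(x) -- P holds for exactly floor(n/2) of the n values
    return sum(pred(x) for x in range(n)) == n // 2

def H_via_M(pred, n):
    # the equivalence from the text: some x0 turns "P(x) or x = x0" into
    # a majority, while P alone is not a majority
    return (any(M_quant(lambda x: pred(x) or x == x0, n) for x0 in range(n))
            and not M_quant(pred, n))

# exhaustive check over all predicates on a universe of size 5
n = 5
for bits in product([False, True], repeat=n):
    P = lambda x, b=bits: b[x]
    assert H_quant(P, n) == H_via_M(P, n)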
Using #, we can add not only two numbers, but a sequence of num-
bers: Let ITADD(X1 , . . . , Xn , Y ) be the predicate that is true if the sum
of the numbers x1 , . . . , xn , represented by the unary predicates X1 , . . . , Xn ,
is equal to y, represented by Y . (ITADD stands for iterated addition. For
convenience, we assume that we have a binary predicate X(i, j) = Xi (j).)
As we have seen, multiplication reduces to adding sequences of numbers.
Let MULT(X, Y, Z) be true if and only if x · y = z.

Lemma 20.5. ITADD, and hence also MULT, can be expressed in FOM.

Proof. The lemma can be proved in a similar way as Lemma 20.3. We


already have # because we are allowed to use M, which is in this setting the
equivalent of BSUM.

Exercise 20.5. Fill in the details of the proof of Lemma 20.5.

For ∀ and ∃, quantification over pairs of variables can be replaced by


two quantifiers. For instance, ∃xy is equivalent to ∃x∃y. For M, it is not
immediately clear how to get rid of quantifiers over two or more variables.
However, we can do so by using BIT.

Lemma 20.6. Mxy can be expressed using FOM.

Exercise 20.6. Prove Lemma 20.6.


Hint: First express the predicate ⟨u, v⟩ = #⟨x, y⟩ : P (x, y) with the meaning “there are exactly n(u − 1) + v pairs x, y for which P (x, y) is true.”

Exercise 20.7. Show that PARITY ∈ FOM.

In the next chapter, we will show that division is in FOM.

20.3 Uniformity
The circuit complexity classes NCi and ACi all come in different flavors
corresponding to different conditions on uniformity.
Recall: A family C = (Cn )n∈N of circuits is called polynomial-time uni-
form, if the mapping 1n 7→ Cn (where we identify Cn and an appropriate
encoding of Cn ) can be computed in polynomial time. C is log-space uniform
if the mapping can be computed in logarithmic space.
From the two uniformity conditions, we get ptime-u ACi and ptime-u NCi
as well as logspace-u ACi and logspace-u NCi . However, both uniformity
conditions have drawbacks if we want to analyze subclasses of L or NC1 .
The reason is that, for ptime-u AC0 for instance, constructing the circuit can be much harder than evaluating it. Thus, in order to study subclasses of NC1 ,
we need a more restrictive variant of uniformity.
It turns out that a good choice is DLOGTIME uniformity:

1. It is restricted in the sense that constructing the circuit is very easy.


In particular, it allows us to distinguish subclasses of NC1 .

2. It yields circuit complexity classes equal to FO and FOM. So it is


robust.

3. Many ptime-u or logspace-u circuits are actually DLOGTIME uniform.



Of course, in logarithmic time, we cannot construct a circuit of poly-


nomial size. But we are able to answer questions about specific gates. To
make this more precise, we need the following definition.
Definition 20.7. The connection language of a family C = (Cn )n∈N of
circuits is the set of all tuples z = ⟨t, a, b, y⟩, such that a and b are numbers
(in binary) of gates of Cn such that b is an input for a and gate a has type
t. The string y is arbitrary such that the whole string z has length n.
In time logarithmic in n (which is by construction also the input length), we are able to read the relevant parts of the instance ⟨t, a, b, y⟩. (We will see below that the necessary steps can be performed in logarithmic time.)
Let us make more precise what we mean by deterministic logarithmic
time: A log-time Turing machine has a read-only input tape, a constant
number of work-tapes, and a read-write address tape. The address tape is
used to select bits from the input. On a given time step, the Turing machine
has access to the input bit specified by the contents of its address tape. If
the number on the address tape is too large, the Turing machine will get
the information that this is the case.
Deterministic logarithmic time Turing machines look quite limited at
first glance, but they can perform some non-trivial basic tasks:
1. They can determine the length of their input.
2. They can add and subtract numbers of O(log n) bits.
3. They can compute the logarithm of numbers of O(log n) bits.
4. They can decode simple pairing functions.
This suffices to recognize the relevant parts of connection languages.
Exercise 20.8. Prove that deterministic log-time Turing machines can in-
deed do what we claimed above.
We denote by DLOGTIME the set of languages that can be decided in
deterministic logarithmic time.
Definition 20.8. A family C = (Cn )n∈N of circuits is DLOGTIME uni-
form (DLOGTIME-u), if the connection language of C can be decided in
DLOGTIME.
In the following, we are mainly concerned with AC0 and TC0 . If nothing
else is said, we assume DLOGTIME uniformity. AC0 contains all languages
that can be decided by DLOGTIME-u families of circuits with unbounded
fan-in, constant depth, and polynomial size. TC0 is defined similarly. The
exception is that we also have threshold gates of unbounded fan-in. A threshold gate with m inputs outputs a 1 if and only if at least ⌈(m + 1)/2⌉ of its inputs are 1. (This means that more than half of its inputs must be 1.)

20.4 Circuits vs. logic


The main result of this section is that AC0 = FO and TC0 = FOM. To
show this, we first prove that our uniformity condition is restricted enough
to allow for “construction in FO”.

Lemma 20.9. DLOGTIME ⊆ FO.

Proof. Let M be a DLOGTIME Turing machine with k work tapes.


We will have to write down a first-order sentence ϕ such that for all input
strings x, M accepts x if and only if ϕ is satisfied by X.
The main idea is simple: Since the machine runs in logarithmic time,
we can encode its behavior into a constant number of variables. (Recall
that a variable can hold values between 0 and n − 1, thus log n bits. Using
BIT, we can specify individual bits.) Each step t is described by a constant
number of bits: M ’s state qt , the k symbols w1 , . . . , wk that M writes on its
work tapes, the k directions d1 , . . . , dk to which M moves its head on tape
1, . . . , k, respectively, as well as the position It of the input head, which is
controlled by the address tape.
The sentence ϕ begins with existential quantifiers over the variables
z1 , . . . , zc (c is a suitable constant) that describe the behavior of M . This
means that ϕ = ∃z1 ∃z2 . . . ∃zc ψ(z1 , . . . , zc ) for some first-order sentence ψ.
The sentence ψ must assert that z = (z1 , . . . , zc ) forms a valid accepting
computation of M . To do this, we define two first-order formulas: C(p, t, a)
is true if and only if the contents of cell p at time t is a. P (p, t) is true if
and only if the appropriate head is at position p at time t. (The position p
also contains the information on which tape the position is.) Given C and P , we can write ψ as follows. Let us fix a time t.

1. We have to assert that It is correct for all t. To do this, we have a


variable y with an existential quantifier, and we condition y to be equal
to the contents of the address tape at time t, which can be verified
using C. Then we set It = y.

2. The step from time t to time t + 1 should be according to M ’s finite


control. The step depends on It , the current state qt , k work tape
symbols (a current tape symbol is an a such that there exists a p with
C(p, t, a) ∧ P (p, t)).

Using P , we can write C: The contents of cell p of tape i at time t is wi,t′ , where t′ is the most recent visit of head i to position p. If M has not yet visited p, then the cell contains the blank symbol.
Finally, to get P , we have to sum up O(log n) values of dt′ for t′ < t. This can be done as we have already seen.

Theorem 20.10. AC0 = FO and TC0 = FOM.



Proof. We only prove the first statement. The second follows with
an almost identical proof, where we have to replace the quantifier M by
threshold gates and vice versa. (Since threshold gates can take more than
n inputs, we need Lemma 20.6 for TC0 = FOM.)
Let us first prove FO ⊆ AC0 . Let ϕ be any first-order formula of quantifier
depth d. Without loss of generality, we can assume that ϕ is in prenex
normal form. This means that ϕ = Q1 y1 Q2 y2 . . . Qd yd : ψ(y1 , . . . , yd ), where
ψ is quantifier-free and Q1 , . . . , Qd are any quantifiers.
For such a ϕ, there is a canonical constant-depth circuit Cn for every n.
A tree of fan-out n and depth d corresponds to the quantifiers. At each leaf,
there is a constant-size circuit corresponding to ψ(y1 , . . . , yd ). This circuit
consists of Boolean operators, input nodes, and constants corresponding to
the value of atomic formulas (=, ≤, and BIT), where the constants depend
on y1 , . . . , yd .
What remains to be done is to show that this canonical circuit family is
indeed DLOGTIME uniform. The address of a node will consist of log n bits
for each quantifier (this is needed to specify the respective value of yi ) as well
as a constant number of bits specifying which node of the respective copy
of the constant-size circuit we are considering. In order to answer queries
for the connection language, our DLOGTIME machine has to be able to (i)
compare O(log n)-bit numbers and do arithmetic with them (like dividing
them into their parts for the several quantifiers) and (ii) compute from the
numbers y1 , . . . , yd to which input nodes the respective constant-size circuit
has to be connected. The latter is possible because the DLOGTIME machine
can perform BIT and all the other operations on O(log n) bit numbers that
a first-order formula can do. It is not difficult but tedious to work out the
details.
To prove AC0 ⊆ FO, we first observe that the connection language is in
FO by Lemma 20.9. Let C = (Cn )n∈N be a DLOGTIME uniform family
of constant-depth, polynomial-size circuits. Since Cn is of polynomial size,
we can refer to its nodes by tuples of (a constant number of) variables. In
order to devise a first-order formula for Cn , we will express the predicate
AccGate(a), which is true if and only if gate a accepts, i.e., outputs 1. If we
have AccGate, then we just have to evaluate it for the output gate.
To get AccGate, we define inductively predicates AccGated (a) with the
meaning “gate a on level d outputs 1”. For level 0, AccGate0 (a) is true if
and only if (i) a is a number of a gate on level 0 and (ii) a is connected to
some input xi = 1.
To express AccGated (a), we have to evaluate if a is a gate on level d
in the first place. Since d is constant, this can be expressed. If gate a is
indeed on level d, then we proceed as follows. If a is a NOT gate, then
AccGated (a) = ¬ AccGated−1 (b) for some gate b on level d − 1. (We can
easily find out which gate b we need using the connection language, which is
in FO by assumption.) If a is an AND gate, we use a subformula ∀b : ξ(b) to range over

all other gates b. The expression ξ(b) is true if and only if (i) b is not a gate
on level d − 1, or (ii) b is not a predecessor of a, or (iii) b is a predecessor of
a and AccGated−1 (b) = 1.
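
The recursion behind AccGated can be pictured as a level-by-level evaluation of the circuit. The following Python sketch (illustrative only; the dictionary encoding of the circuit plays the role of the connection language, and all names are ours) evaluates a small leveled circuit in exactly this bottom-up fashion:

def acc_gate(gates, out, x):
    # gates: name -> (level, type, predecessors); type is one of "INPUT",
    # "NOT", "AND", "OR". For an INPUT gate, the predecessor list holds
    # indices into the input x. Evaluation proceeds level by level,
    # mirroring the predicates AccGate_0, AccGate_1, ...
    value = {}
    for g, (level, typ, preds) in sorted(gates.items(), key=lambda kv: kv[1][0]):
        if typ == "INPUT":                  # AccGate_0: connected to some x_i = 1
            value[g] = any(bool(x[i]) for i in preds)
        elif typ == "NOT":
            value[g] = not value[preds[0]]
        elif typ == "AND":                  # the forall-b step of the proof
            value[g] = all(value[b] for b in preds)
        else:                               # "OR": some predecessor accepts
            value[g] = any(value[b] for b in preds)
    return value[out]

# (x0 AND x1) OR NOT x2
gates = {
    "g0": (0, "INPUT", [0]), "g1": (0, "INPUT", [1]), "g2": (0, "INPUT", [2]),
    "a": (1, "AND", ["g0", "g1"]), "n": (1, "NOT", ["g2"]),
    "out": (2, "OR", ["a", "n"]),
}
assert acc_gate(gates, "out", [1, 1, 0]) is True
assert acc_gate(gates, "out", [0, 1, 1]) is False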

Exercise 20.9. Let LH be the class of languages that can be decided by


an alternating log-time Turing machine. (Such machines work similarly to
deterministic log-time Turing machines, except that they are alternating.)
Show that FO = LH = AC0 . (This requires only little extra work, given
that we know FO = AC0 and DLOGTIME ⊆ FO.)
Thus, DLOGTIME-u AC0 is indeed a very robust class: We can define
it in terms of circuits, logic, or using Turing machines.

Exercise 20.10. Construct a family of circuits of polynomial size and depth


O(log n/ log log n) for parity.
Note: This is asymptotically optimal (see Corollary 12.7 of the lecture
notes of “Computational Complexity Theory”).
21 Threshold circuits for division

For addition and multiplication, and also for subtraction, it is not too hard
to come up with AC0 circuits. But division seems to be much harder. It
is fairly easy to see that division can be done in polynomial time. But for
a long time, it was unknown if division is possible in logarithmic space. In
the 1980s, it was shown that division has polynomial-time uniform TC0 circuits. (Recall the definition of TC0 from Section 20.3.) However, even this does
not prove that division can be done in logarithmic space. This was shown
by Chiu et al. [CDL01], who proved that division lies in log-space uniform
TC0 (which is a subclass of L). Finally, it was shown that division is in
DLOGTIME uniform TC0 , which is optimal: the problem is complete for
DLOGTIME uniform TC0 . (DLOGTIME uniformity was defined in Section 20.3; recall that it is even weaker than log-space uniformity.)
The goal of this chapter is to prove that division is in DLOGTIME
uniform TC0 = FOM.
We will do so in three steps: First, we reduce division to iterated multi-
plication. Second, we will introduce a new predicate POW (see below). Let
us write FOMP = FOM[BIT, <, POW] and FOP = FO[BIT, <, POW] for
short. We will show that division can be described in FOMP. This places
division in L since FOM = TC0 ⊆ L and POW can easily be seen to be in L.
Third, we will show that in fact POW can also be expressed in FOM, which
places division in FOM = TC0 .
In the remainder of this chapter, variables in capital letters, such as X
and Y , denote numbers of polynomial length. We also call them long num-
bers. Small letters represent short numbers, which are of length O(log n).
If there are numbers of length poly(log n), we will mention their lengths
explicitly.

21.1 Division, iterated multiplication, and powering


Division is closely related to two other problems: iterated multiplication and
powering:

Division: The predicate Division(X, Y, i) is 1 if and only if the ith bit of ⌊X/Y ⌋ is 1.

Powering: Powering(X, k, i) is 1 if and only if the ith bit of X^k is 1. (Note that X has n bits and k has length O(log n).)


Iterated multiplication: ItMult(X1 , X2 , . . . , Xn , i) = 1 if and only if the ith bit of ∏_{j=1}^{n} Xj is 1.

If we want to compute X/Y and have 1/Y with sufficient precision, then
division reduces to multiplication. And we already know how to multiply in
TC0 . Now observe that

∑_{i=0}^{∞} (1 − α)^i = 1/α

for α ∈ (0, 1). If we assume further that α ∈ [1/2, 1), then we have
1/α = ∑_{i=0}^{n} (1 − α)^i + O(2^{−n}) .

Now let j = ⌈log2 Y ⌉ be roughly the number of bits of Y . Then use 2^{−j} Y ∈ [1/2, 1) as α in the preceding equation. This yields
2^{nj} · X/Y = X · ∑_{i=0}^{n} (1 − 2^{−j} Y )^i + O(X · 2^{nj−n})
             = X · ∑_{i=0}^{n} (2^j − Y )^i · (2^j )^{n−i} + O(X · 2^{nj−n}) .        (21.1)

This is equivalent to
X/Y = X · ∑_{i=0}^{n} (2^j − Y )^i · 2^{−ij} + O(X · 2^{−n}) .

Thus, X/Y is approximated within an additive error of O(X · 2^{−n}). If we can evaluate the sum in (21.1), then we can proceed as follows: We calculate X/Y with a precision of O(1), and then we compute the exact value of ⌊X/Y ⌋ by hand. (There is only a constant number of candidates, and multiplication can be done in FOM.)
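
The following Python sketch makes this reduction explicit (it is an illustration only; the function name is ours, and, unlike the display above, it keeps the scaling factor 2^{−j} of the series explicit and works with exact integer arithmetic):

def floor_div_via_series(X: int, Y: int) -> int:
    # Approximate X/Y via a truncated geometric series for 1/Y, then
    # correct the result among a constant number of candidates (sketch).
    assert Y >= 1
    j = Y.bit_length()              # 2^{-j} * Y lies in [1/2, 1)
    n = X.bit_length() + 1          # enough terms to push the error below 1
    # S = sum_{i=0}^{n} (2^j - Y)^i * 2^{(n-i) j}, the series with all
    # denominators cleared
    S = sum((2**j - Y)**i * 2**((n - i) * j) for i in range(n + 1))
    approx = (X * S) >> ((n + 1) * j)   # underestimates X/Y by less than 1
    # at most two candidates remain; one multiplication decides
    for Q in (approx + 1, approx):
        if Q * Y <= X:
            return Q
    return 0

assert floor_div_via_series(1000, 7) == 1000 // 7
assert floor_div_via_series(2**40 + 123, 3) == (2**40 + 123) // 3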
So far, we have reduced division to computing an iterated sum of powers.
Of course, powering reduces to iterated multiplication. Thus, we mainly
focus on iterated multiplication in the following.

21.2 Division in FOM + POW


The central tool for iterated multiplication, and thus also for division, is
the Chinese remainder representation (CRR): An n-bit number is uniquely
determined by its residues modulo polynomially many prime numbers, each of
length O(log n). (There are enough such primes.)
Assume that we are given primes m1 , . . . , mk , each a short number, and let M = ∏_{i=1}^{k} mi be their product. Any number X ∈ {0, 1, . . . , M − 1} can

be represented uniquely as (x1 , . . . , xk ) with X ≡ xi (mod mi ) for each i.


Let Ci = M/mi , and let hi be the inverse of Ci modulo mi , i.e., Ci hi ≡ 1 (mod mi ). Then we have

X ≡ ∑_{i=1}^{k} xi hi Ci (mod M ) .

Even more,

X = ∑_{i=1}^{k} xi hi Ci − rM

for some number r = rankM (X), called the rank of X with respect to M .
Note that r is a short number, equal to the sum of the integer parts of
xi hi Ci /M = xi hi /mi , which is in {0, 1, . . . , mi − 1}.
How does CRR help? We have reduced iterated multiplication to iter-
ated multiplication of short numbers, which is considerably easier.
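
The following Python sketch illustrates the representation and, in particular, the identity X = ∑ xi hi Ci − rM (illustration only; the function names are ours, and pow(·, −1, m) computes a modular inverse):

from math import prod

def to_crr(X, ms):
    return [X % m for m in ms]

def from_crr(xs, ms):
    # Reconstruct X from its CRR via X = sum_i x_i h_i C_i - r*M (sketch)
    M = prod(ms)
    total = 0
    for x, m in zip(xs, ms):
        C = M // m                 # C_i = M / m_i
        h = pow(C, -1, m)          # h_i = inverse of C_i modulo m_i
        total += x * h * C
    r, X = divmod(total, M)        # r = rank_M(X), X = total - r*M
    return X, r

ms = [3, 5, 7, 11]                 # pairwise distinct (short) primes
X = 823
xs = to_crr(X, ms)
Y, r = from_crr(xs, ms)
assert Y == X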
The algorithm for iterated multiplication is now easy to describe:

1. Convert the input from binary to CRR.

2. Compute the iterated product in CRR.

3. Convert the answer from CRR back to binary.

As a tool, we assume that the following predicate is given:

POW(a, i, b, p) ≡ a^i ≡ b (mod p) .

(All four numbers here are short numbers.)

21.2.1 The second step


If p is a prime, then the multiplicative group Z*_p is cyclic and of order p − 1. This allows us to take discrete logarithms: First, we find g, the smallest generator of Z*_p : g is the smallest number with g^i ≢ 1 (mod p) for 0 < i < p − 1. This gives a FOP formula GEN(g, p) that is true if and only if g is the smallest generator of Z*_p . If g is a generator, then g^i ≡ a (mod p) has a unique solution for every a ∈ Z*_p . Using POW and GEN, we can take discrete
logarithms:
GEN(g, p) ∧ POW(g, i, a, p)
is a FOP predicate that states that i is the discrete logarithm of a.
Now, if the input is in CRR, then iterated multiplication simply reduces
to iterated addition: We just have to add the discrete logarithms (modulo mi − 1 in the component corresponding to mi ). Since
iterated addition is in FOM, this would put iterated multiplication in FOMP.
This gives us the second step of our algorithm. However, we still have to be
able to perform the first and third step of the algorithm.
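
For a single CRR component, the reduction from multiplication to addition of discrete logarithms looks as follows in Python (a brute-force illustration; GEN and the discrete logarithm are simply searched for here, whereas the FOP formulas obtain them via POW; all function names are ours):

def smallest_generator(p):
    # brute-force GEN(g, p): the smallest generator of Z_p^*
    for g in range(2, p):
        if all(pow(g, i, p) != 1 for i in range(1, p - 1)):
            return g
    raise ValueError("p must be prime")

def dlog(a, g, p):
    # the i with g^i = a (mod p), found by brute force
    for i in range(p - 1):
        if pow(g, i, p) == a:
            return i
    raise ValueError("a is not a unit modulo p")

def iterated_product_mod_p(values, p):
    # multiply residues modulo p by adding their discrete logs modulo p - 1
    g = smallest_generator(p)
    e = sum(dlog(v % p, g, p) for v in values) % (p - 1)   # iterated addition
    return pow(g, e, p)

p = 101
vals = [17, 55, 3, 99, 42]
expected = 1
for v in vals:
    expected = expected * v % p
assert iterated_product_mod_p(vals, p) == expected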

21.2.2 The first step


The first step of our algorithm is easy to accomplish in FOMP, as we see
from the following lemma.

Lemma 21.1. If X, m1 , . . . , mk are given in binary and X < M = ∏_{i=1}^{k} mi , then we can compute CRR(X) = (x1 , . . . , xk ) in FOMP.

Proof. For each mi and each j < n, we can compute 2^j mod mi using POW. In this way, we obtain values yi,j ∈ {0, 1, . . . , mi − 1} for 1 ≤ i ≤ k and 0 ≤ j < n. Then we add those yi,j for which the jth bit of X is 1, using iterated addition (which is in FOM), and take the sum modulo mi to obtain xi .
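
In Python, the conversion of Lemma 21.1 reads as follows (a sketch with names of our choosing; the built-in pow(2, j, m) plays the role of POW):

def binary_to_crr(X, ms):
    # Convert X to CRR by summing 2^j mod m_i over the positions j where
    # the jth bit of X is 1 (sketch of Lemma 21.1).
    residues = []
    for m in ms:
        # y_{i,j} = 2^j mod m_i, used only where bit j of X is 1
        y = [pow(2, j, m) if (X >> j) & 1 else 0 for j in range(X.bit_length())]
        residues.append(sum(y) % m)    # iterated addition, one final reduction
    return residues

ms = [3, 5, 7, 11, 13]
X = 12345
assert binary_to_crr(X, ms) == [X % m for m in ms]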
In the lemma above, the prime numbers m1 , . . . , mk are given. You
might wonder how we actually get them since, of course, they are not part
of the input. This is not very difficult, but we will nevertheless deal with it
later on.

21.2.3 The third step


We will prove that we can perform the third step of our algorithm in FOMP
by a series of lemmas. First, we observe that we, in fact, already know how
to divide, albeit only by short primes.

Lemma 21.2. Let p be a short prime. Then the binary representation of


1/p can be computed to nO(1) bits of accuracy in FOP.

Proof. We can assume that p is odd. Let s ∈ N be arbitrary. We write 2^s = ap + b with b = 2^s mod p. Then the sth bit of the binary expansion of 1/p is equal to the low-order bit of a: We are interested in the low-order bit of ⌊2^s/p⌋. Now we have 2^s/p = a + b/p. Since b ∈ {0, 1, . . . , p − 1}, we have ⌊b/p⌋ = 0. Thus,

⌊2^s/p⌋ = ⌊a + b/p⌋ = a .

We observe that ap + b = 2^s is even. Thus, because p is odd, b mod 2 = a mod 2. Therefore, the low-order bit of b is the sth bit of 1/p.
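
The statement of the proof can be checked directly in Python (illustration only):

def bit_of_inverse(p, s):
    # the sth bit after the binary point of 1/p, for odd p (Lemma 21.2)
    return pow(2, s, p) & 1            # low-order bit of b = 2^s mod p

# cross-check against the definition: the sth bit is floor(2^s / p) mod 2
for p in (3, 7, 11, 101):
    for s in range(1, 40):
        assert bit_of_inverse(p, s) == (2**s // p) & 1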
In binary representation, it is very easy to test if a number is smaller
than another. In Chinese remainder representation, this is more difficult,
although it can be done.

Lemma 21.3. Let X, Y ∈ {0, 1, . . . , M − 1} be given in CRRM form. Testing whether X < Y is in FOMP.

Proof. Of course, X < Y if and only if X/M < Y /M . Thus, it suffices


to show that we can compute X/M to polynomially many bits of accuracy.

Recall that X = ∑_{i=1}^{k} xi hi Ci − rankM (X) · M and that Ci = M/mi . Thus,

X/M = ∑_{i=1}^{k} xi hi /mi − rankM (X) .

The numbers x1 , . . . , xk are given to us since CRRM (X) is part of the in-
put. The numbers C1 , . . . , Ck can be computed in FOMP: For Ci , we add
the discrete logarithms of mj for j ≠ i. And h1 , . . . , hk are the inverses of C1 , . . . , Ck , respectively, which can also easily be computed in FOMP.
By Lemma 21.2, each summand can be computed to
polynomially many bits of accuracy. We know that iterated addition is in
FOM, thus we can compute polynomially many bits of the binary represen-
tation of
∑_{i=1}^{k} xi hi /mi = X/M + rankM (X) .

Since rankM (X) is an integer, X/M is just the fractional part of this sum,
of which we have sufficiently many bits.
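
In Python, the comparison looks as follows (a sketch; exact rational arithmetic via fractions.Fraction stands in for the fixed-precision binary expansions of the proof, and the function name is ours):

from fractions import Fraction
from math import prod

def less_than_in_crr(xs, ys, ms):
    # Compare X < Y given only CRR_M(X) and CRR_M(Y): the fractional part
    # of sum_i x_i h_i / m_i equals X / M (sketch of Lemma 21.3).
    M = prod(ms)
    def ratio(zs):
        s = sum(Fraction(z * pow(M // m, -1, m), m) for z, m in zip(zs, ms))
        return s - int(s)              # fractional part, i.e., Z / M
    return ratio(xs) < ratio(ys)

ms = [3, 5, 7, 11, 13]
for X, Y in [(10, 4000), (4000, 10), (123, 124)]:
    assert less_than_in_crr([X % m for m in ms], [Y % m for m in ms], ms) == (X < Y)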
A useful consequence of being able to compare numbers in CRR is that it allows us to change the CRR basis: If we have primes p1 , . . . , pℓ with ∏_{i=1}^{ℓ} pi = P , then we can get CRRP (X) from CRRM (X). The crucial ingredient for this is that, given CRRM (X), we can compute X mod p for a short prime p.

Lemma 21.4. Given CRRM (X) and a short prime p, we can compute
X mod p in FOMP.

Proof. If p = mi for some i, then we know the answer from the input.
Thus, we can assume that p does not divide M . Let P = M p. If we can
compute CRRP (X), then this gives us X mod p.
We turn to brute-force: We try all p ≤ poly(n) possible values q for
X mod p. This gives us the CRRP of numbers X0 , . . . , Xp−1 . One of these
numbers is X. Moreover, X is the only number among X0 , . . . , Xp−1 that
is smaller than M . (This follows since all X0 , . . . , Xp−1 agree in their CRRM part, and numbers smaller than M are uniquely determined by their CRRM representation.)
We can compute CRRP (M ) by adding the discrete logarithms of the
primes m1 , . . . , mk modulo p. We carry out comparisons with M in CRRP .
Thus, we can compute X mod p by finding the unique Xi that is smaller
than M . All of this can be done in FOMP.
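
A small Python sketch of this brute-force base extension (illustration only; exact CRT reconstruction replaces the CRR comparison of Lemma 21.3, and the function names are ours):

from math import prod

def crt(residues, moduli):
    # the unique solution modulo prod(moduli); used here only as a
    # stand-in for comparing numbers in CRR
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        C = M // m
        x += r * pow(C, -1, m) * C
    return x % M

def extend_basis(xs, ms, p):
    # Given CRR_M(X) and a short prime p not dividing M, find X mod p by
    # trying all p candidates (sketch of Lemma 21.4).
    M = prod(ms)
    for q in range(p):
        candidate = crt(xs + [q], ms + [p])   # the unique X_q below M*p
        if candidate < M:                     # only the true X is below M
            return q
    raise AssertionError("unreachable for valid input")

ms = [3, 5, 7, 11]
X = 823
assert extend_basis([X % m for m in ms], ms, 13) == X % 13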
The last lemma towards implementing the third step of our division
algorithm is dividing by products of short primes.

Lemma 21.5. Let b1 , . . . , bℓ be distinct short primes, let B = ∏_{i=1}^{ℓ} bi , and let CRRM (X) be given. Then we can compute CRRM (⌊X/B⌋) in FOMP.

Proof. We can assume that B divides M . Otherwise we apply Lemma 21.4


and extend our CRR basis. Let M = BP . By dropping the primes of P
from our basis, we can compute CRRB (X mod B) in FOMP. From this, we
can compute CRRM (X mod B) by extending the basis again according to
Lemma 21.4. Finally, we compute X − (X mod B) = B · ⌊X/B⌋ in CRRM .
By assumption, B and P are relatively prime. Thus, there exists a
B^{−1} with B · B^{−1} ≡ 1 (mod P ). We can find CRRP (B^{−1}) in FOMP: this amounts to finding the inverse of each component of the CRRP representation of B with respect to the corresponding prime of P , using discrete logarithms. Now we have
   
B^{−1} · (B · ⌊X/B⌋) ≡ ⌊X/B⌋ (mod P )        (21.2)

in CRRP representation. The final step is to observe that X < M implies ⌊X/B⌋ < P . Thus, we can extend the basis to get the CRRM representation of ⌊X/B⌋.
Finally, we are able to prove that also the third step of our algorithm,
converting CRR numbers into binary representation, can be expressed in
FOMP.

Theorem 21.6. Given CRRM (X), with 0 ≤ X < M , we can compute the
binary representation of X in FOMP.

Proof. It suffices to compute ⌊X/2^s⌋ for any s. Then the sth bit of X is given as ⌊X/2^s⌋ − 2 · ⌊X/2^{s+1}⌋. We get this number in CRRM , but it is
easy to distinguish 0 from 1, even in CRR.
First, we create numbers A1 , . . . , As . Each Aj is the product of polynomially many short distinct primes that do not divide M , and we want Aj > M . Recall that M = ∏_{i=1}^{k} mi for short primes. Let m1 < m2 < . . . < mk be the first k odd primes. (We can assume this without loss of generality.) Then we set Aj = ∏_{i=1}^{k} p_{jk+i+1} (so that the Aj are built from pairwise disjoint sets of primes, all different from m1 , . . . , mk ), where p_ℓ denotes the ℓth smallest prime number. The prime number theorem guarantees that there are enough short primes for our purposes, and these Aj fulfill our conditions. Furthermore, a list of all (short) primes smaller than poly(n) can easily be computed by a TC0 circuit, hence also in FOM. Thus, we know how to get these primes.
Assume that the Aj are very large. Then (1 + Aj )/(2Aj ) ≈ 1/2. Thus, X/2^s ≈ X · ∏_{j=1}^{s} (1 + Aj )/(2Aj ). It might look as if we are complicating the problem, but it turns out that, on the one hand, this quantity involving the Aj s is easier to compute and, on the other hand, it is precise enough to give us ⌊X/2^s⌋.
Let P = M · ∏_{j=1}^{s} Aj . We extend the basis to get CRRP (X). Since M < Aj for all j, we have

∏_{j=1}^{s} (1 + Aj )/Aj < (1 + 1/M)^s .

Furthermore, for every K ≥ 1,

(1 + 1/M)^s < exp(s/M) < (1 + 1/K)^{(K+1)·s/M} .

Setting K = M/(s + 1) and exploiting that s ≪ M (so that the last exponent is smaller than 1), this yields

∏_{j=1}^{s} (1 + Aj )/Aj < 1 + (s + 1)/M .        (21.3)

Using Lemma 21.5, we can compute the CRRP of

Q = ⌊X · ∏_{j=1}^{s} ((1 + Aj )/2) / ∏_{j=1}^{s} Aj ⌋ ≥ ⌊X/2^s⌋ .

By (21.3), we have
X · ∏_{j=1}^{s} ((1 + Aj )/2) / ∏_{j=1}^{s} Aj < (X/2^s) · (1 + (s + 1)/M) < (X/2^s) · (1 + 2^s/X) = X/2^s + 1 .

Thus, Q ∈ {⌊X/2^s⌋, ⌊X/2^s⌋ + 1}. We determine which one of Q, Q − 1 is correct by checking whether Q · 2^s > X (using CRRP ).

Exercise 21.1. Using Lemma 21.1 and Theorem 21.6, one can convert in
FOMP numbers from any base to any other base.
Prove this!
Corollary 21.7. Division, iterated multiplication, and powering can be ex-
pressed in FOMP.

21.3 POW is in FO
21.3.1 Two special cases in FO
The first step towards proving POW ∈ FO will be to show that POW as well as
division and iterated multiplication of very short numbers can be performed
in FO. We start by showing that this is true for POW.
Lemma 21.8. POW(a, r, b, p), where a, r, b, and p have O(log log n) bits
each, is in FO.
Proof. Let us assume that a, b, p, and r have k log log n bits each. We can compute a^r mod p in FO by using repeated squaring. To do this, we consider the sequence r0 , r1 , . . . , r_{k log log n} of exponents with ri = ⌊r/2^i⌋. Thus, r0 = r and r_{k log log n} = 0. Moreover, ri = 2r_{i+1} or ri = 2r_{i+1} + 1, depending on the corresponding bit of r.

Now we compute all values ai = a^{ri} mod p. We have to check that a_{k log log n} = 1 and that ai = a_{i+1}^2 mod p or ai = a · a_{i+1}^2 mod p, depending on the corresponding bit of r. Each check can be performed easily in FO. Since
each ai needs at most k log log n bits and there are k log log n such numbers,
all ai easily fit into a single variable for sufficiently large n. Thus, we can
perform all checks in parallel, which completes the proof of the lemma.
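
The repeated-squaring scheme of the proof corresponds to the following Python sketch (illustration only; Python's built-in pow(a, r, p) does the same thing):

def pow_mod(a, r, p):
    # compute a^r mod p via the sequence r_i = floor(r / 2^i), where
    # r_i = 2 r_{i+1} + (bit i of r), as in the proof of Lemma 21.8
    k = r.bit_length()                 # r_k = 0
    result = 1                         # a^{r_k} = a^0 = 1
    for i in range(k - 1, -1, -1):     # from r_{k-1} down to r_0 = r
        result = result * result % p   # a_i = a_{i+1}^2 ...
        if (r >> i) & 1:
            result = result * a % p    # ... or a * a_{i+1}^2
    return result

assert pow_mod(7, 129, 1009) == pow(7, 129, 1009)
assert pow_mod(5, 0, 13) == 1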
Now we use that POW for very short numbers can be expressed in FO
to show that division and iterated multiplication of short numbers can be
done in FO as well.

Theorem 21.9. ItMult and Division, where the inputs have (log n)O(1) bits,
are in FO.

Proof. We know from Corollary 21.7 that division and iterated mul-
tiplication with inputs of length r can be done in FOMP over the universe
0, . . . , r −1. We set r = (log n)k . Then Division and ItMult can be expressed
in FOMP over the universe 0, . . . , (log n)k −1. We will show that such FOMP
formulas can be expressed in FO over the universe 0, 1, . . . , n − 1.
Note that in these formulas, POW is only applied to inputs of O(log((log n)^k)) = O(log log n) bits. Thus, we can replace POW by FO
formulas according to Lemma 21.8. In the same way, the threshold quantifier
can be replaced by a FO formula since the range of the quantified variables
is 0, . . . , (log n)k − 1. This is because BSUM can be expressed in FO as
long as there are at most (log n)O(1) ones to count, which is the result of Exercise 21.2. This completes the proof.

Exercise 21.2. Prove the following: In FO over the universe 0, 1, . . . , n − 1,


we can count the number of 1s in binary strings of length (log n)O(1) . Even
more, we can count the number of 1s in a binary string of length n if
this number is at most (log n)O(1) .

Remark 21.10. Beyond being a tool for showing that division is in FOM,
this theorem is also interesting in its own right: It gives tight bounds for
the size of the numbers for which Division and ItMult are in FO: On the one hand, the theorem shows that this is the case for numbers consisting of (log n)O(1) bits. On the other hand, we have FO = AC0 . And any circuit of constant depth d that computes parity of m bits must be of size 2^{Ω(m^{1/(2d)})} . For parity of m bits to be in FO, we need 2^{Ω(m^{1/(2d)})} ≤ poly(n), which implies m ≤ poly(log n).

21.3.2 POW is in FO
What remains to be done is to show that POW is in FO. In order to prove
this, we first show something slightly more general: powering in groups of

order n is FO Turing reducible to finding the product of log n elements of


this group.
This needs some clarification: First, FO Turing reducible essentially
means that we are allowed to use a predicate for the product of log n el-
ements of this group. Second, we restrict ourselves to groups that can be
represented in FO. This means that group elements can be labeled by num-
bers 0, . . . , n − 1 such that the product operation is FO definable.

Exercise 21.3. Show that for any group that can be represented in FO, the
inverse and the neutral element can be defined in FO.

Lemma 21.11. Finding small powers in any group of order n, i.e., comput-
ing a^r for a group element a and a small number r, is FO Turing reducible
to finding products of log n elements.

Proof. Our goal is to compute a^r . The way we do this is to compute group elements a1 , . . . , ak as well as numbers u, u1 , . . . , uk for k = o(log n) such that a^r = a^u · ∏_{i=1}^{k} a_i^{u_i} . In addition, we want ui < 2 log n and u < 2(log n)^2 .
Given these elements and numbers, we can easily compute a^r : Computing a_i^{u_i} for 1 ≤ i ≤ k as well as a^u amounts to computing products of a small number of group elements: For a_i^{u_i} , this follows from ui < 2 log n. And for a^u , we use two rounds of multiplying at most 2 log n elements. The result a^r is then also just a product of k + 1 group elements.
We will choose the group elements ai to be d-th roots of unity for a small
prime d. The numbers ui can then be computed using Chinese remaindering.
Our first step consists of finding a CRR basis D consisting of primes, each of which is at most O(log n). More precisely, we choose a set of k = o(log n) primes d1 , . . . , dk such that di < 2 log n for each i and each di is relatively prime to n. Furthermore, we want n < D = ∏_{i=1}^{k} di < n^2 . We can compute these di s by a FO formula that finds the first D > n that is square-free, relatively prime to n, and such that all its prime factors are smaller than 2 log n. To compute the number k and the relation between the di and i, we count, for each prime p0 , the number of primes dividing D that are smaller than p0 . We can do this using BSUM.
The second step consists of computing ai = a^{⌊n/di⌋} . We do this as follows: First, we compute a^{−1} (see Exercise 21.3). Second, we compute ni = n mod di . Third, we compute a^{−ni} by multiplying ni copies of a^{−1} . We can do this by the assumption of this lemma because ni < di < 2 log n. Now we come to computing a^{⌊n/di⌋} . To do this, we observe that

(a^{⌊n/di⌋})^{di} = a^{di·⌊n/di⌋} = a^{n−ni} = a^{−ni} .

The last equality holds because a^n = 1 in any group of order n.



Now let d_i^{−1} be the multiplicative inverse of di modulo n, i.e., there exists a number m with di · d_i^{−1} = mn + 1. There exists exactly one group element x with x^{di} = a^{−ni} , and this group element x is the one we are looking for: We have

x = x^{mn+1} = (x^{di})^{d_i^{−1}} = (a^{−ni})^{d_i^{−1}} .

Thus, we can express ai as

∃ai : a_i^{di} = a^{−ni} .

Note that we can compute a_i^{di} using multiplication of O(log n) elements, but we cannot compute a^{⌊n/di⌋} directly since it might happen that d_i^{−1} is not O(log n).
Our third step consists of finding the exponents u, u1 , . . . , uk . By the
choice of the ai in the second step, we have

a_1^{u_1} · . . . · a_k^{u_k} = a^{∑_{i=1}^{k} ui ⌊n/di⌋} .

Thus, we have to choose u1 , . . . , uk such that

u ≡ r − ∑_{i=1}^{k} ui · ⌊n/di⌋ (mod n) .        (21.4)

In order to get a small value for u, we have to choose ∑_{i=1}^{k} ui · ⌊n/di⌋ close to r. To achieve this, we approximate r as a linear combination of the ⌊n/di⌋: Compute f = ⌊rD/n⌋. (We can compute this since r has only O(log n) bits, by Theorem 21.9.) Let Di = D/di . Then we compute ui = f · D_i^{−1} mod di . This gives us

∑_{i=1}^{k} ui Di ≡ f (mod D) .

Let m be a number that satisfies ∑_{i=1}^{k} ui Di = f + mD. Now we can calculate u from u1 , . . . , uk according to (21.4). (This is a sum of k short numbers, which can be computed in FO since k = o(log n).)
What remains to be done is to show that u < 2(log n)^2 . To show this,

we calculate the difference between r and the sum of the ui ⌊n/di⌋:


∑_{i=1}^{k} ui · ⌊n/di⌋
  = ∑_{i=1}^{k} ui · n/di − ∑_{i=1}^{k} ui · (n/di − ⌊n/di⌋)
  = (n/D) · ∑_{i=1}^{k} ui Di − ∑_{i=1}^{k} ui · (n/di − ⌊n/di⌋)
  = (n/D) · (f + mD) − ∑_{i=1}^{k} ui · (n/di − ⌊n/di⌋)
  = (n/D) · ⌊rD/n⌋ + nm − ∑_{i=1}^{k} ui · (n/di − ⌊n/di⌋)
  = r − (n/D) · (rD/n − ⌊rD/n⌋) + nm − ∑_{i=1}^{k} ui · (n/di − ⌊n/di⌋) .

This yields

u = (n/D) · (rD/n − ⌊rD/n⌋) + ∑_{i=1}^{k} ui · (n/di − ⌊n/di⌋) .

For any number x, we have x − ⌊x⌋ ∈ [0, 1). Furthermore, n/D < 1 by our choice of D, ui < 2 log n for each i, and k = o(log n). Thus, we have u < 2(log n)^2 , which finishes the proof.
Now we note that, first, FO is closed under polynomial changes of the
input size and, second, the product of log(n^k) = k log n group elements is FO Turing reducible to the product of log n group elements. This yields that finding powers in any group of order n^k is FO Turing reducible to finding
the product of log n elements.
We now apply the above result that powering reduces to iterated mul-
tiplication to the groups of integers modulo p for a prime p = O(n^k). The multiplicative group Z*_p contains the integers 1, . . . , p − 1. Multiplication in Z*_p is FO-definable since integer multiplication is FO-definable.
For evaluating POW(a, r, b, p), we proceed now as follows: If a = 0, then we just have to check whether also b = 0. Otherwise, we can find a^r in Z*_p , provided that the product of log n group elements can be computed with inputs of size log^2 n. However, this can be done according to Theorem 21.9.
This immediately yields the main results of this section and of this chapter.

Theorem 21.12. POW is in FO.

Theorem 21.13. Division is in FO.


Bibliography

[ACG+ 99] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-


Spaccamela, and M. Protasi. Complexity and Approximation.
Springer, 1999.

[AG00] Carme Alvarez and Raymond Greenlaw. A compendium of


problems complete for symmetric logarithmic space. Comput.
Complexity, 9:73–95, 2000.

[ALM+ 98] Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan,
and Mario Szegedy. Proof verification and hardness of approx-
imation problems. J. ACM, 45(3):501–555, 1998.

[BDCGL92] Shai Ben-David, Benny Chor, Oded Goldreich, and Michael


Luby. On the theory of average case complexity. J. Comput.
Syst. Sci, 44(2):193–219, 1992.

[BGS98] Mihir Bellare, Oded Goldreich, and Madhu Sudan. Free bits,
PCPs, and nonapproximability—towards tight results. SIAM
J. Comput, 27(3):804–915, 1998.

[BI90] David A. Mix Barrington and Neil Immerman. On uniformity


within NC1 . J. Comput. Syst. Sci, 41:274–306, 1990.

[Big93] Norman Biggs. Algebraic graph theory. Cambridge University


Press, second edition, 1993.

[BT06] Andrej Bogdanov and Luca Trevisan. Average-case complex-


ity. Foundations and Trends in Theoretical Computer Science,
2(1):1–106, 2006.

[CDL01] Andrew Chiu, George I. Davida, and Bruce E. Litow. Division


in logspace-uniform NC1 . RAIRO Theoretical Informatics and
Applications, 35(3):259–275, 2001.

[Din07] Irit Dinur. The PCP theorem by gap amplification. J. ACM,


54(3), 2007.

[FKN02] E. Friedgut, G. Kalai, and A. Naor. Boolean functions whose


Fourier transform is concentrated on the first two levels. Adv.
in Applied Math., 29:427–437, 2002.


[GLST98] Venkatesan Guruswami, Daniel Lewin, Madhu Sudan, and


Luca Trevisan. A tight characterization of NP with 3-query
PCPs. In Proc. 39th Ann. IEEE Symp. on Foundations of
Comput. Sci. (FOCS), pages 8–17, 1998.

[Gur91] Yuri Gurevich. Average case completeness. J. Comput. Syst.


Sci, 42(3):346–398, 1991.

[Hås99] Johan Håstad. Clique is hard to approximate within n^{1−ε} . Acta


Mathematica, 182:105–142, 1999.

[HLW06] S. Hoory, N. Linial, and A. Wigderson. Expander graphs and


their applications. Bull. Amer. Math. Soc., pages 439–561,
2006.

[Imm99] Neil Immerman. Descriptive Complexity. Springer, 1999.

[Imp95] Russell Impagliazzo. Hard-core distributions for somewhat


hard functions. In Proc. 36th Ann. IEEE Symp. on Founda-
tions of Comput. Sci. (FOCS), pages 538–545, 1995.

[KS03] Adam R. Klivans and Rocco A. Servedio. Boosting and hard-


core set construction. Machine Learning, 51(3):217–238, 2003.

[Lev86] Leonid A. Levin. Average case complete problems. SIAM J.


Comput, 15(1):285–286, 1986.

[NTS95] Noam Nisan and Amnon Ta-Shma. Symmetric logspace is


closed under complement. Chicago Journal of Theoretical Com-
puter Science, 1995.

[O’D04] Ryan O’Donnell. Hardness amplification within NP. J. Com-


put. Syst. Sci, 69(1):68–94, 2004.

[Rei08] Omer Reingold. Undirected connectivity is in log-space. J.


ACM, 55(4), 2008.

[RVW02] Omer Reingold, Salil Vadhan, and Avi Wigderson. Entropy


waves, the zig-zag graph product and new constant degree ex-
panders and extractors. Annals of Mathematics, 155(1):157–
187, 2002.

[Sen07] Stefan Senitsch. Ein kombinatorischer Beweis für das PCP-


Theorem. Diplomarbeit, TU Ilmenau, 2007.

[SS96] Michael Sipser and Daniel Spielman. Expander codes. IEEE


Trans. Inform. Theory, 42:1710–1722, 1996.

[Vaz01] Vijay V. Vazirani. Approximation Algorithms. Springer, 2001.



[vN28] John von Neumann. Zur Theorie der Gesellschaftsspiele. Math-


ematische Annalen, 100:295–320, 1928.

[VV86] Leslie G. Valiant and Vijay V. Vazirani. NP is as easy as


detecting unique solutions. Theoret. Comput. Sci., 47(1):85–
93, 1986.

[Wan97] Jie Wang. Average-case intractable NP problems. In Ding-Zhu


Du and Ker-I Ko, editors, Advances in Languages, Algorithms,
and Complexity, pages 313–378. Kluwer, 1997.

[Yao82] A. C. Yao. Theory and applications of trapdoor functions. In


Proc. 23rd Ann. IEEE Symp. on Foundations of Comput. Sci.
(FOCS), pages 80–91, 1982.
