
Advanced Algorithms Course.

Lecture Notes. Part 11

Chernoff Bounds
This is a very useful general tool to bound the probabilities that certain
random variables deviate much from their expected values. Here we will
derive one version of this bound and then apply it to a simple load balancing
problem. (You do not need the proof when you apply the bound, but why
not see it once? It is pretty nice and elegant.)
Let X be the sum of n independent 0-1 valued random variables X_i taking
value 1 with probability p_i. Clearly E[X] = Σ_i p_i. For any μ ≥ E[X] and any
ε > 0 we ask how likely it is that X > (1+ε)μ, in other words, that X
exceeds the expected value by more than 100ε percent.
Since the function exp is monotone, X > (1+ε)μ is equivalent to exp(tX) >
exp(t(1+ε)μ) for any t > 0. Exponentiation and this free extra parameter t
seem to make things more complicated, but we will see very soon why they
are useful.
For any nonnegative random variable Y and any number a > 0 we have
a · Pr[Y > a] ≤ E[Y]. This is known as Markov's inequality and follows directly
from the definition of E[Y]. For Y := exp(tX) and a = exp(t(1+ε)μ) this yields
Pr[X > (1+ε)μ] ≤ exp(−t(1+ε)μ) · E[exp(tX)].
Due to independence of the terms X_i we have

E[exp(tX)] = E[exp(Σ_i t·X_i)] = E[Π_i exp(t·X_i)] = Π_i E[exp(t·X_i)]
= Π_i (p_i·e^t + 1 − p_i) = Π_i (1 + p_i·(e^t − 1)) ≤ Π_i exp(p_i·(e^t − 1))
= exp((e^t − 1)·Σ_i p_i) ≤ exp((e^t − 1)·μ).
This gives us the bound exp(−t(1+ε)μ) · exp((e^t − 1)μ). We can arbitrarily
choose t. With t := ln(1+ε) our bound reads as (e^ε / (1+ε)^{1+ε})^μ.
Using e^x ≥ 1 + x one can see that the base is smaller than 1. For any fixed
deviation ε the base is constant, and the bound decreases exponentially in
μ. The more independent summands X_i we have in X, the smaller is the
probability of large deviations. A direct application of the simple Markov
inequality would be much weaker (therefore the detour via the exponential
function).
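
As a small numerical illustration (added here, not part of the original derivation), the
following Python sketch estimates the tail probability Pr[X > (1+ε)μ] by simulation and
compares it with the bound (e^ε / (1+ε)^{1+ε})^μ; the concrete values of n, p, ε and the
number of trials are arbitrary choices.

import math
import random

# Estimate Pr[X > (1+eps)*mu] for X = sum of n independent Bernoulli(p) variables
# and compare it with the Chernoff bound (e^eps / (1+eps)^(1+eps))^mu.
# All concrete parameters below are arbitrary illustrative choices.
n, p, eps, trials = 200, 0.1, 0.5, 100000
mu = n * p                                   # here mu = E[X] = sum of the p_i

exceed = 0
for _ in range(trials):
    x = sum(1 for _ in range(n) if random.random() < p)
    if x > (1 + eps) * mu:
        exceed += 1

empirical = exceed / trials
bound = (math.exp(eps) / (1 + eps) ** (1 + eps)) ** mu
print("empirical tail:", empirical, " Chernoff bound:", round(bound, 4))
# The empirical frequency stays below the (not necessarily tight) bound.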
In order to show at least one application, consider the following simple
load balancing problem: m jobs shall be assigned to n processors, in such a
way that no processor gets a high load. In contrast to the Load Balancing
problem we studied earlier, no central authority assigns jobs to processors,
but every job chooses a processor by itself. We want to install a simple
rule yet obtain a well balanced allocation. (An application is distributed
processing of independent tasks in networks.) To make the rule as light-
weight as possible, let us choose for every job a processor randomly and
independently. The jobs need not even talk to each other and negotiate
places. How good is this policy?
We analyze only the case m = n. What would you guess: How many
jobs end up on the same processor? To achieve clarity, consider the random
variable Xi defined as the number of jobs assigned to processor i. Clearly
E[Xi ] = 1. The quantity we are interested in is P r[Xi > c], for a given
bound c. Since Xi is a sum of independent 0-1 valued random variables
(every job chooses processor i or not), we can apply the Chernoff bound.
With ε = c − 1 and μ = 1 we immediately get the bound e^{c−1}/c^c < (e/c)^c.
But this is only the probability bound for one processor. To bound the
probability that Xi > c holds for some of the n processors, we can apply the
union bound and multiply the above probability by n. Now we ask: For
which c will n·(e/c)^c be small?
At least, we must choose c large enough to make c^c > n. As an auxiliary
calculation consider the equation x^x = n. For such x we can say (1) x log x =
log n and (2) log x + log log x = log log n; we have just taken the logarithm
twice. Equation (2) easily implies log x < log log n < 2 log x. Division by
(1) yields 1/x < log log n / log n < 2/x. In other words, x^x = n holds for
some x = Θ(log n / log log n).
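
A quick numerical check of this claim (added for illustration; the chosen values of n are
arbitrary): solve x·ln x = ln n by bisection and compare the solution x with ln n / ln ln n.

import math

# Solve x^x = n, i.e. x*ln(x) = ln(n), by bisection and compare the solution x
# with ln(n)/ln(ln(n)).  The chosen values of n are arbitrary illustrative examples.
def solve_xx(n):
    lo, hi = 1.0, 64.0              # the root lies in this range for all n used below
    target = math.log(n)
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid * math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return lo

for n in (1e3, 1e6, 1e12, 1e24):
    x = solve_xx(n)
    print("n = %.0e:  x = %5.2f,  x / (ln n / ln ln n) = %.2f"
          % (n, x, x / (math.log(n) / math.log(math.log(n)))))
# The last ratio stays bounded by constants, matching x = Theta(log n / log log n).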
Thus, if we choose c := e·x, our Chernoff bound for every single processor
simplifies to 1/x^{ex} < 1/(x^x)^2 = 1/n^2. This finally shows: With probability
1 − 1/n, each processor gets O(log n / log log n) jobs. This answers our ques-
tion: Under random assignment, the maximum load can be nearly logarithmic,
but it is unlikely to be worse.
For m = Θ(n log n) or more jobs, the random load balancing becomes
really good. Then the load is larger than twice the expected value Θ(log n)
only with probability below 1/n^2. Calculations are similar.
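
The following small simulation (added as an illustration; n and the choice m = n·⌈ln n⌉
for the second regime are arbitrary) assigns m jobs to n processors uniformly at random
and reports the maximum load.

import math
import random
from collections import Counter

# Assign m jobs to n processors uniformly at random and report the maximum load.
# n is an arbitrary illustrative choice.
def max_load(m, n):
    counts = Counter(random.randrange(n) for _ in range(m))
    return max(counts.values())

n = 10000
print("m = n:            max load =", max_load(n, n))
print("m = n*ceil(ln n): max load =", max_load(n * math.ceil(math.log(n)), n))
# Typically the first maximum is a small number of order log n / log log n,
# while the second stays within about a factor 2 of the average load m/n,
# in line with the Chernoff bound argument above.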

Helpful Input Structure
Small Vertex Covers
(The presentation of this topic differs somewhat from the book.)
The Vertex Cover problem in graphs is NP-complete, but if the graph
is already known (or expected) to have some vertex cover with a small
number k of nodes, we can still solve it exactly and efficiently in practice.
(Some motivations of this case are mentioned in class.)
Let n always denote the number of nodes in the given graph. A naive
way to find a small vertex cover is to test all subsets of k nodes exhaustively.
Elementary combinatorics tells us that this costs O(k·n^{k+1}/k!) time: Note
that O(kn) time is sufficient to test whether a given set of k nodes is a vertex
cover, and the other factor is the number of candidate sets, (n choose k) ≤ n^k/k!.
This time bound is feasible only for very small k. The bad thing is that k
appears in the exponent of n. It would be much better to have a time bound
of the form O(b^k · p(n)), where
b is a constant base, and p some fixed polynomial. (To get a feeling of the
tremendous difference you may try some concrete figures and compare the
naive time bound for Vertex Cover with the bounds we will obtain below.)
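
For instance, the following little computation (added as an illustration; the concrete
values of n and k are arbitrary) compares the naive bound k·n^{k+1}/k! with the bound
2^k·kn obtained below.

import math

# Compare the naive bound k * n^(k+1) / k! with the FPT bound 2^k * k * n
# (derived below) for some arbitrary illustrative values of n and k.
n = 1000
for k in (5, 10, 20):
    naive = k * n ** (k + 1) / math.factorial(k)
    fpt = float(2 ** k * k * n)
    print("k = %2d:  naive ~ %.1e steps,  2^k * k * n ~ %.1e steps" % (k, naive, fpt))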
A problem with input length n and another input parameter k is said
to be in the complexity class XP if it can be solved in O(n^{f(k)}) time, where
f is any computable function. A problem with input length n and another
input parameter k is called fixed-parameter tractable (FPT) if it can be
solved in O(f(k) · p(n)) time, where f is any computable function (usually
exponential) and p some polynomial. We may write O*(f(k)) instead of
O(f(k) · p(n)) if we want to suppress the polynomial factor and stress the
more important parameterized part of the complexity.
In the following we show that Vertex Cover is not only an XP problem
but an FPT problem. The basic algorithm is: Take an uncovered edge (i, j)
and put node i or node j in the solution. Repeat this step recursively in
both branches, until k nodes are chosen or all edges are covered.
Upon every decision (i or j) we create new branches, hence the whole
process has the form of a recursion tree that we call a bounded search tree.
Since at most k nodes of the graph are allowed in a solution, the tree has
depth at most k, thus at most 2^k leaves and O(2^k) nodes. If some leaf
represents a vertex cover, we have found a solution, otherwise we know
that there is no solution. To bound the time complexity, it remains to
check how much time we need to process any node of the search tree: In a
simple implementation we may copy the whole graph, delete in one copy all
edges incident to i, and delete in one copy all edges incident to j (because
these edges are covered). The main work is copying. Here we observe that
the whole graph can have at most kn edges, otherwise no vertex cover of
size k can exist. Hence copying costs O(kn) time, and the overall time is
O(2^k · kn) = O*(2^k).
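
For concreteness, here is a short Python sketch of this bounded search tree algorithm
(an illustration added to the notes; the graph representation as a set of edges and the
function name are my own choices). It copies the remaining edge set in every branch,
just as in the simple implementation described above.

# Bounded search tree for Vertex Cover.  Given a graph as a set of edges,
# return a vertex cover with at most k nodes, or None if no such cover exists.
# Illustrative sketch; representation and names are arbitrary choices.
def vertex_cover(edges, k):
    if not edges:                    # all edges are covered
        return set()
    if k == 0:                       # edges remain but no budget left: branch fails
        return None
    i, j = next(iter(edges))         # pick any uncovered edge (i, j)
    for v in (i, j):                 # branch: put node i or node j into the cover
        remaining = {e for e in edges if v not in e}   # edges covered by v disappear
        sub = vertex_cover(remaining, k - 1)
        if sub is not None:
            return sub | {v}
    return None

# Example: the 4-cycle 0-1-2-3-0 has a vertex cover of size 2, e.g. {0, 2}.
print(vertex_cover({(0, 1), (1, 2), (2, 3), (3, 0)}, 2))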
Although this is already much better than naive exhaustive search, fur-
ther improvements would still be desirable. Here, the more important part
is the exponential factor 2^k. Can we improve the base 2 and thus make the
algorithm practical for somewhat larger k?
The weakness of the search tree algorithm above is that it considers
single edges and selects only one vertex at a time. If we could select more
vertices, we could generate our solutions faster. Now observe: For any node
i, we have to take i or all its neighbors, in order to cover all edges incident to
i. It might be good to apply this branching rule on nodes i of high degree.
But what if the graph has no high-degree nodes?
If all degrees are at most 2, the graph consists of simple paths and cycles,
and the problem is trivial. Thus we can assume (worst case!) that there is
always a node of degree 3 or larger. In a branching step we take either 1
node or 3 nodes (or more). How large is our search tree?
This can be analyzed by recurrence equations, similar to the analysis
of divide-and-conquer algorithms. Let T (k) be the number of leaves of a
search tree for vertex covers of size k. Due to our branching rule we have
T(k) = T(k−1) + T(k−3). To figure out what function T is, we assume that
it has the form T(k) = x^k with an unknown constant base x. Our recurrence
becomes x^k = x^{k−1} + x^{k−3}, which simplifies to x^3 = x^2 + 1. This equation
is called the characteristic equation of the recurrence. Numerical evaluation
shows x ≈ 1.47, which is much better than 2. Researchers have invented
more tricky branching rules for Vertex Cover and further accelerated the
branching process. Meanwhile the best known base is below 1.3.
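
As a small check (added for illustration), one can solve the characteristic equation
numerically and watch the recurrence approach this growth rate; the base cases
T(0) = T(1) = T(2) = 1 used below are my own illustrative choices.

# Solve x^3 = x^2 + 1 by bisection and compare with the recurrence
# T(k) = T(k-1) + T(k-3).  The base cases T(0) = T(1) = T(2) = 1 are
# arbitrary illustrative choices; they do not affect the growth rate.
lo, hi = 1.0, 2.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mid ** 3 < mid ** 2 + 1:
        lo = mid
    else:
        hi = mid
print("root of x^3 = x^2 + 1:", round(lo, 4))        # about 1.4656

T = [1, 1, 1]
for k in range(3, 31):
    T.append(T[k - 1] + T[k - 3])
print("T(30)/T(29) =", round(T[30] / T[29], 4))      # converges to the same root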
