
Applied Combinatorics – Math 6409

S. E. Payne

Student Version - Fall 2003


Contents

0.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
0.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1 Basic Counting Techniques 11


1.1 Sets and Functions: The Twelvefold Way . . . . . . . . . . . . 11
1.2 Composition of Positive Integers . . . . . . . . . . . . . . . . . 17
1.3 Multisets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Multinomial Coefficients . . . . . . . . . . . . . . . . . . . . . 20
1.5 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.6 Partitions of Integers . . . . . . . . . . . . . . . . . . . . . . . 24
1.7 Set Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.8 Table Entries in the Twelvefold Way . . . . . . . . . . . . . . 27
1.9 Recapitulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.10 Cayley’s Theorem: The Number of Labeled Trees . . . . . . . 30
1.11 The Matrix-Tree Theorem . . . . . . . . . . . . . . . . . . . . 35
1.12 Number Theoretic Functions . . . . . . . . . . . . . . . . . . . 38
1.13 Inclusion – Exclusion . . . . . . . . . . . . . . . . . . . . . . . 42
1.14 Rook Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.15 Permutations with Forbidden Positions . . . . . . . . . . . 50
1.16 Recurrence Relations: Ménage Numbers Again . . . . . . . . . 55
1.17 Solutions and/or Hints to Selected Exercises . . . . . . . . . . 58

2 Systems of Representatives and Matroids 67


2.1 The Theorem of Philip Hall . . . . . . . . . . . . . . . . . . . 67
2.2 An Algorithm for SDR’s . . . . . . . . . . . . . . . . . . . . . 72
2.3 Theorems of König and G. Birkhoff . . . . . . . . . . . . . . 73
2.4 The Theorem of Marshall Hall, Jr. . . . . . . . . . . . . . . . 76
2.5 Matroids and the Greedy Algorithm . . . . . . . . . . . . . . . 79
2.6 Solutions and/or Hints to Selected Exercises . . . . . . . . . . 85


3 Polya Theory 89
3.1 Group Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.3 The Cycle Index: Polya’s Theorem . . . . . . . . . . . . . . . 96
3.4 Sylow Theory Via Group Actions . . . . . . . . . . . . . . . . 98
3.5 Patterns and Weights . . . . . . . . . . . . . . . . . . . . . . . 100
3.6 The Symmetric Group . . . . . . . . . . . . . . . . . . . . . . 106
3.7 Counting Graphs . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.8 Solutions and/or Hints to Selected Exercises . . . . . . . . . . 111

4 Formal Power Series as Generating Functions 113


4.1 Using Power Series to Count Objects . . . . . . . . . . . . . . 113
4.2 A famous example: Stirling numbers of the 2nd kind . . . . . 116
4.3 Ordinary Generating Functions . . . . . . . . . . . . . . . . . 118
4.4 Formal Power Series . . . . . . . . . . . . . . . . . . . . . . . 121
4.5 Composition of Power Series . . . . . . . . . . . . . . . . . . . 125
4.6 The Formal Derivative and Integral . . . . . . . . . . . . . . . 127
4.7 Log, Exp and Binomial Power Series . . . . . . . . . . . . . . 129
4.8 Exponential Generating Functions . . . . . . . . . . . . . . . . 132
4.9 Famous Example: Bernoulli Numbers . . . . . . . . . . . . . . 135
4.10 Famous Example: Fibonacci Numbers . . . . . . . . . . . . . 137
4.11 Roots of a Power Series . . . . . . . . . . . . . . . . . . . . . . 138
4.12 Laurent Series and Lagrange Inversion . . . . . . . . . . . . . 139
4.13 EGF: A Second Look . . . . . . . . . . . . . . . . . . . . . . . 149
4.14 Dirichlet Series - The Formal Theory . . . . . . . . . . . . . . 155
4.15 Rational Generating Functions . . . . . . . . . . . . . . . . . . 159
4.16 More Practice with Generating Functions . . . . . . . . . . . . 164
4.17 The Transfer Matrix Method . . . . . . . . . . . . . . . . . . . 167
4.18 A Famous NONLINEAR Recurrence . . . . . . . . . . . . . . 176
4.19 MacMahon’s Master Theorem . . . . . . . . . . . . . . . . . . 177
4.19.1 Preliminary Results on Determinants . . . . . . . . . . 177
4.19.3 Permutation Digraphs . . . . . . . . . . . . . . . . . . 178
4.19.4 A Class of General Digraphs . . . . . . . . . . . . . . . 179
4.19.5 MacMahon’s Master Theorem for Permutations . . . . 181
4.19.8 Dixon’s Identity as an Application of the Master The-
orem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
4.20 Solutions and/or Hints to Selected Exercises . . . . . . . . . . 186
4.21 Addendum on Exercise 4.19.9 . . . . . . . . . . . . . . . . . . 194

4.21.1 Symmetric Polynomials . . . . . . . . . . . . . . . . . . 194


4.21.7 A Special Determinant . . . . . . . . . . . . . . . . . . 197
4.21.9 Application of the Master Theorem to the Matrix B . . 199
4.21.10 Sums of Cubes of Binomial Coefficients . . . . . . . . . 201

5 Möbius Inversion on Posets 203


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
5.2 POSETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5.3 Vector Spaces and Algebras . . . . . . . . . . . . . . . . . . . 208
5.4 The Incidence Algebra I(P, K) . . . . . . . . . . . . . . . . . 210
5.5 Optional Section on ζ . . . . . . . . . . . . . . . . . . . . . . . 215
5.6 The Action of I(P, K) and Möbius Inversion . . . . . . . . . . 216
5.7 Evaluating µ: the Product Theorem . . . . . . . . . . . . . . . 218
5.8 More Applications of Möbius Inversion . . . . . . . . . . . . . 226
5.9 Lattices and Gaussian Coefficients . . . . . . . . . . . . . . . . 233
5.10 Posets with Finite Order Ideals . . . . . . . . . . . . . . . . . 243
5.11 Solutions and/or Hints to Selected Exercises . . . . . . . . . . 248

0.1 Notation
Throughout these notes the following notation will be used.

C = The set of complex numbers

N = The set of nonnegative integers

P = The set of positive integers

Q = The set of rational numbers

R = The set of real numbers

Z = The set of integers

N = {a_1, . . . , a_n} = a typical set with n elements (a different N from the nonnegative integers above; the printed notes distinguish the two typographically)

[n] = {1, 2, . . . , n}; [0] = ∅

[i, j] = [i, i + 1, . . . , j], if i ≤ j

bxc = The floor of x (i.e., the largest integer not larger than x)

dxe = The ceiling of x (i.e., smallest integer not smaller than x)

P([n]) = {A : A ⊆ [n]}

P(S) = {A : A ⊆ S} (for any set S)

|A| = The number of elements of A (also denoted #A)


$\binom{N}{k}$ = {A : A ⊆ N and |A| = k} = the set of k-subsets of N

$\binom{n}{k}$ = $\#\binom{N}{k}$ = the number of k-subsets of N (0 ≤ k ≤ n)
0.1. NOTATION 7
$\left(\!\binom{S}{k}\!\right)$ = the set of all k-multisets on S

$\left(\!\binom{n}{k}\!\right)$ = the number of k-multisets of an n-set

$\binom{n}{a_1,\ldots,a_m}$ = the number of ways of putting each element of an n-set into one of m categories C_1, . . . , C_m, with a_i objects in C_i and $\sum a_i = n$

$(j)_q = 1 + q + q^2 + \cdots + q^{j-1}$

$(n)!_q = (1)_q (2)_q \cdots (n)_q$  (“n-qtorial”)

$\left[{n \atop k}\right]_q = \frac{(n)!_q}{(k)!_q\,(n-k)!_q}$ = Gaussian q-binomial coefficient

S_n = The symmetric group on [n]

$\left[{n \atop k}\right]$ = #{π ∈ S_n : π has k cycles} = c(n, k) = signless Stirling number of the first kind

$s(n, k) = (-1)^{n-k}\left[{n \atop k}\right]$ = Stirling number of the first kind

$\left\{{n \atop k}\right\}$ = S(n, k) = number of partitions of an n-set into k nonempty subsets (blocks) = Stirling number of the second kind

B(n) = Total number of partitions of an n-set = Bell number

$n^{\underline{k}} = n(n-1)\cdots(n-k+1)$  (“n to the k falling”)

$n^{\overline{k}} = n(n+1)\cdots(n+k-1)$  (“n to the k rising”)



0.2 Introduction
The course at CU-Denver for which these notes were assembled, Math 6409
(Applied Combinatorics), deals more or less entirely with enumerative com-
binatorics. Other courses deal with combinatorial structures such as Latin
squares, designs of many types, finite geometries, etc. This course is a
one semester course, but as it has been taught different ways in different
semesters, the notes have grown to contain more than we are now able to
cover in one semester. On the other hand, these notes contain considerably
less material than the standard textbooks listed below. It is always difficult
to decide what to leave out, and the choices clearly are a reflection of the
likes and dislikes of the author. We have tried to include some truly tradi-
tional material and some truly nontrivial material, albeit with a treatment
that makes it accessible to the student.
Since the greater part of this course is, ultimately, devoted to developing
ever more sophisticated methods of counting, we begin with a brief discussion
of what it means to count something. As a first example, for n ∈ N, put f(n) = |P([n])|. Then no one will argue that the formula f(n) = 2^n is anything but nice. As a second example, let d(n) be the number of derangements of (1, . . . , n). Then (as we show at least twice later on)

$d(n) = n! \sum_{i=0}^{n} \frac{(-1)^i}{i!}$.

This is not so nice an answer as the first one, but there are very clear proofs. Also, d(n) is the nearest integer to n!/e. This is a convenient answer, but it
lacks combinatorial significance. Finally, let f(n) be the number of n × n matrices of 0's and 1's such that each row and each column has exactly three 1's. It has been shown that

$f(n) = 6^{-n} \sum \frac{(-1)^\beta (n!)^2 (\beta + 3\gamma)!\, 2^\alpha 3^\beta}{\alpha!\,\beta!\,(\gamma!)^2\, 6^\gamma}$,

where the sum is over all α, β, γ ∈ N for which α + β + γ = n. As far as we know, this formula is not good for much of anything, but it is a very specific answer that can be evaluated by computer for relatively small n.
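Both descriptions of d(n) are easy to check by machine. The following sketch (ours, not part of the notes) computes the alternating sum exactly in integer arithmetic and compares it with the nearest integer to n!/e:

```python
from math import e, factorial

def derangements(n):
    # d(n) = n! * sum_{i=0}^{n} (-1)^i / i!, computed exactly:
    # each term n!/i! is an integer since i <= n.
    return sum((-1) ** i * (factorial(n) // factorial(i)) for i in range(n + 1))

# For n >= 1, d(n) is also the integer nearest to n!/e.
def derangements_via_e(n):
    return round(factorial(n) / e)
```

(The nearest-integer description fails only at n = 0, where d(0) = 1 but round(0!/e) = 0.)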
As a different kind of example, suppose we want the Fibonacci numbers
F0 , F1 , F2 , . . . , and what we know about them is that they satisfy the recur-
rence relation
Fn+1 = Fn + Fn−1 (n ≥ 1; F0 = F1 = 1).
The sequence begins with 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . . . There are exact, not very complicated formulas for Fn, as we shall see later. But just to

introduce the idea of a generating function, here is how a “generatingfunctionologist” might answer the question: The nth Fibonacci number Fn is the
coefficient of xn in the expansion of the function 1/(1 − x − x2 ) as a power
series about the origin. (See the book generatingfunctionology by H. S. Wilf.)
Later we shall investigate this problem a great deal more.
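As a small taste of that point of view, here is a sketch (ours, not from the notes) that expands 1/(1 − x − x²) as a formal power series by inverting the polynomial 1 − x − x² coefficient by coefficient; what comes out is exactly the Fibonacci sequence in the convention F₀ = F₁ = 1 used above:

```python
def series_inverse(a, n_terms):
    # Coefficients b_0, b_1, ... of 1/A(x) for a polynomial A(x) with
    # constant term 1: from A(x)B(x) = 1 we get b_0 = 1 and, for n >= 1,
    # b_n = -sum_{i=1}^{n} a_i * b_{n-i}.
    assert a[0] == 1
    b = [1] + [0] * (n_terms - 1)
    for n in range(1, n_terms):
        b[n] = -sum(a[i] * b[n - i] for i in range(1, n + 1) if i < len(a))
    return b

# 1/(1 - x - x^2): the coefficient of x^n is the nth Fibonacci number.
fibs = series_inverse([1, -1, -1], 11)
```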
We shall derive a variety of techniques for counting, some purely combina-
torial, some involving algebra in a moderately sophisticated way. But many
of the famous problems of combinatorial theory were “solved” before the
sophisticated theory was developed. And often the simplest ways to count
some type of object offer the greatest insight into the solution. So before we
develop the elaborate structure theory that mechanizes some of the counting
problems, we give some specific examples that involve only rather elemen-
tary ideas. The first several sections are short, with attention to specific
problems rather than to erecting large theories. Then later we develop more
elaborate theories that begin to show some of the sophistication of modern
combinatorics.
Many common topics from combinatorics have received very little treat-
ment in these notes. Other topics have received no mention at all! We
propose that the reader consult at least the following well known text books
for additional material.
P. J. Cameron, Combinatorics, Cambridge University Press, 1994.
I. P. Goulden and D. M. Jackson, Combinatorial Enumeration, Wiley-
Interscience, 1983.
R. Graham, D. Knuth and O. Patashnik, Concrete Mathematics,
Addison-Wesley Pub. Co., 1991.
R. P. Stanley, Enumerative Combinatorics, Vol. I, Wadsworth and
Brooks/Cole, 1986.
J. H. van Lint and R. M. Wilson, A Course In Combinatorics, Cambridge
University Press, 1992.
H. S. Wilf, generatingfunctionology, Academic Press, 1990.
In addition there is the monumental Handbook of Combinatorics
edited by R.L. Graham, M. Grötschel and L. Lovász, published in 1995 by
the MIT Press in the USA and by North-Holland outside the USA. This is a
two volume set with over 2000 pages of articles contributed by many experts.
This is a very sophisticated compendium of combinatorial mathematics that

is difficult reading but gives many wonderful insights into the more advanced
aspects of the subject.
Chapter 1

Basic Counting Techniques

1.1 Sets and Functions: The Twelvefold Way

Let N, X be finite sets with #N = n, #X = x. Put X^N = {f : N → X}. We want to compute #(X^N) subject to three types of restrictions on f and four types of restrictions on when two functions are considered the same.

Restrictions on f: (i) f is arbitrary; (ii) f is injective; (iii) f is surjective.

Consider N to be a set of balls, X to be a set of boxes, and f : N → X a way to put balls into boxes. The balls and the boxes may be labeled or unlabeled. We illustrate the various possibilities with the following examples. Let N = {1, 2, 3}, X = {a, b, c, d}, and define

f : 1 ↦ a, 2 ↦ a, 3 ↦ b;   g : 1 ↦ a, 2 ↦ b, 3 ↦ a;
h : 1 ↦ b, 2 ↦ b, 3 ↦ d;   i : 1 ↦ c, 2 ↦ b, 3 ↦ b.

Case 1. Both the balls and the boxes are labeled (or distinguishable). Here f, g, h and i are four distinct functions.

Case 2. Balls unlabeled; boxes labeled. Only the number of balls in each box matters, so f ∼ g (two balls in box a, one in box b), while h (two balls in b, one in d) and i (two balls in b, one in c) are distinct from these and from each other.

Case 3. Balls labeled; boxes unlabeled. Only the grouping of the balls matters, so f ∼ h (both group the balls as {1, 2} and {3}), while g (grouping {1, 3}, {2}) and i (grouping {2, 3}, {1}) are distinct.

Case 4. Both balls and boxes unlabeled. All that matters is that two balls share one box and one ball occupies another, so f ∼ g ∼ h ∼ i.

For the four different possibilities arising according as N and X are each
labeled or unlabeled there are different definitions describing when two func-
tions from N to X are equivalent.
Definition: Two functions f , g : N → X are equivalent
1. with N unlabeled provided there is a bijection π : N → N such that
f (π(a)) = g(a) for all a ∈ N . (In words: provided some relabeling of the
elements of N turns f into g.)
2. with X unlabeled provided there is a bijection σ : X → X such that
σ(f (a)) = g(a) for all a ∈ N . (In words: provided some relabeling of the
elements of X turns f into g.)

3. with both N and X unlabeled provided there are bijections π : N → N and σ : X → X with σ(f(π(a))) = g(a) for all a ∈ N, i.e., σ ∘ f ∘ π = g, so that the following square commutes:

             f
        N ------> X
        ^         |
      π |         | σ
        |    g    v
        N ------> X

Obs. 1. These three notions of equivalence determine equivalence relations on X^N. So the number of “different” functions with respect to one of these equivalences is the number of different equivalence classes.

Obs. 2. If f and g are equivalent in any of the above ways, then f is injective (resp., surjective) iff g is injective (resp., surjective). So we say the notions of injectivity and surjectivity are compatible with the equivalence relation. By the “number of inequivalent injective functions f : N → X” we mean the number of equivalence classes all of whose elements are injective. Similarly for surjectivity.

As we develop notations, methods and results, the reader should fill in the
blanks in the following table with the number of functions of the indicated
type. Of course, sometimes the formula will be elegant and simple. Other
times it may be rather implicit, e.g., the coefficient of some term of a given
power series.

The 12–Fold Way

Value of |{f : N → X}| if |N| = n and |X| = x

    N          X          f unrestricted   f injective   f surjective
    Labeled    Labeled          1               2              3
    Unlabeled  Labeled          4               5              6
    Labeled    Unlabeled        7               8              9
    Unlabeled  Unlabeled       10              11             12

Let N = {a_1, . . . , a_n}, X = {0, 1}. Let P(N) be the set of all subsets of N. For A ⊆ N, define f_A : N → X by

f_A(a_i) = 1 if a_i ∈ A, and f_A(a_i) = 0 if a_i ∉ A.

Then F : P(N) → X^N : A ↦ f_A is a bijection, so

|P(N)| = #(X^N) = 2^n.     (1.1)
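The bijection A ↦ f_A is easy to make concrete. The sketch below (ours, not from the notes) lists every function [n] → {0, 1} as a 0/1-tuple and reads off the subset it indicates, recovering |P([n])| = 2^n:

```python
from itertools import product

def subsets_of(n):
    # Each f: [n] -> {0,1} is the indicator function f_A of exactly one
    # subset A = {i : f(i) = 1}, so listing all 0/1-tuples lists P([n]).
    return [frozenset(i for i in range(1, n + 1) if bits[i - 1])
            for bits in product((0, 1), repeat=n)]
```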

Exercise: 1.1.1 Generalize the result of Eq. 1.1 to provide an answer for
the first blank in the twelvefold way.
We define $\binom{N}{k}$ to be the set of all k-subsets of N, and put $\binom{n}{k} = \#\binom{N}{k}$. Let N(n, k) be the number of ways to choose a k-subset T of N and then linearly order the elements of T. Clearly N(n, k) = $\binom{n}{k} k!$. On the other hand, we could choose any element of N to be the first element

of T in n ways, then choose the second element in n − 1 ways, . . . , and finally choose the kth element in n − k + 1 ways. So N(n, k) = $\binom{n}{k} k!$ = n(n − 1) · · · (n − k + 1) := $n^{\underline{k}}$, where this last expression is read as “n to the k falling.” (Note: Similarly, $n^{\overline{k}}$ = n(n + 1) · · · (n + k − 1) is read as “n to the k rising.”) This proves the following.

$\binom{n}{k} = \frac{n^{\underline{k}}}{k!} = \frac{n(n-1)\cdots(n-k+1)}{k!}$     (1.2)

$\left( = \frac{(n)_k}{k!} \text{ according to some authors} \right)$.

     
Exercise: 1.1.2 Prove: $\binom{n}{k-1} + \binom{n}{k} = \binom{n+1}{k}$.

Exercise: 1.1.3 Binomial Expansion If xy = yx and n ∈ N, show that

$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n-i}$.     (1.3)

Note: $\binom{n}{k} := \frac{n^{\underline{k}}}{k!}$ makes sense for k ∈ N and n ∈ C. What if n ∈ Z, k ∈ Z and k < 0 or k > n? The best thing to do is to define the value of the binomial coefficient to be zero in these cases. Then we may write the following

$(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k = \sum_{k} \binom{n}{k} x^{n-k}$,     (1.4)

where in the second summation the index may be allowed to run over all integers since the coefficient on x^{n−k} is nonzero for at most a finite number of values of k.

Put x = −1 in Eq. 1.4 to obtain

$\sum_{k} (-1)^k \binom{n}{k} = 0$.     (1.5)
k

Put x = 1 in Eq. 1.4 to obtain

$\sum_{k} \binom{n}{k} = 2^n$.     (1.6)

Differentiate Eq. 1.4 with respect to x ($n(1+x)^{n-1} = \sum_{k} k \binom{n}{k} x^{k-1}$) and put x = 1 to obtain

$\sum_{k} k \binom{n}{k} = n \cdot 2^{n-1}$.     (1.7)
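Identities (1.5)–(1.7) are pleasant to confirm numerically before proving them; a quick check (ours, not part of the notes):

```python
from math import comb

def check_identities(n):
    # Eq. (1.5), (1.6), (1.7) for a given n >= 1.
    alternating = sum((-1) ** k * comb(n, k) for k in range(n + 1))
    total = sum(comb(n, k) for k in range(n + 1))
    weighted = sum(k * comb(n, k) for k in range(n + 1))
    return (alternating, total, weighted) == (0, 2 ** n, n * 2 ** (n - 1))
```

(Note that (1.5) requires n ≥ 1: for n = 0 the alternating sum is 1.)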
k

   
Exercise: 1.1.4 Prove that $\sum_{j=m}^{r} \binom{j}{m} = \binom{r+1}{m+1}$.

(Hint: A straightforward induction argument works easily. For a more amusing approach try the following: $\sum_{i=0}^{n-1} (1+y)^i = \frac{(1+y)^n - 1}{(1+y) - 1} = \sum_{j=1}^{n} \binom{n}{j} y^{j-1}$. Now compute the coefficient of y^m.)

Exercise: 1.1.5 Prove that for all a, b, n ∈ N the following holds:

$\sum_{i} \binom{a}{i} \binom{b}{n-i} = \binom{a+b}{n}$.
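Exercise 1.1.5 is the Vandermonde convolution; one can spot-check it before proving it. The sketch below (ours) relies on Python's `math.comb` returning 0 when the lower index exceeds the upper:

```python
from math import comb

def vandermonde_holds(a, b, n):
    # sum_i C(a, i) C(b, n - i) = C(a + b, n); comb(a, i) = 0 for i > a.
    return sum(comb(a, i) * comb(b, n - i) for i in range(n + 1)) == comb(a + b, n)
```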

Exercise: 1.1.6 Let r, s, k be any nonnegative integers. Then the following identity holds:

$\sum_{j=0}^{k} \binom{r+j}{r} \binom{s+k-j}{s} = \binom{r+s+k+1}{r+s+1}$.

Exercise: 1.1.7 Evaluate the following two sums:

a. $\sum_{i=1}^{n} \binom{n}{i} i\, 3^i$

b. $\sum_{i=2}^{n} \binom{n}{i} i(i-1) m^i$

Exercise: 1.1.8 Show that if 0 ≤ m < n, then

$\sum_{k=m+1}^{n} (-1)^k \binom{n}{k} \binom{k-1}{m} = (-1)^{m+1}$.

(Hint: Fix m and induct on n.)

Exercise: 1.1.9 Show that if 0 ≤ m < n, then

$\sum_{k=0}^{m} (-1)^k \binom{n}{k} = (-1)^m \binom{n-1}{m}$.

(Hint: Fix n > 0 and use “finite” induction on m.)

Exercise: 1.1.10 Show that:

(a) $\sum_{k=0}^{\infty} \binom{k+n}{k} \frac{1}{2^k} = 2^{n+1}$.

(b) $\sum_{k=0}^{n} \binom{k+n}{k} \frac{1}{2^k} = 2^n$.

Exercise: 1.1.11 If n is a positive integer, show that

$\sum_{k=1}^{n} \frac{(-1)^{k+1}}{k} \binom{n}{k} = \sum_{k=1}^{n} \frac{1}{k}$.

1.2 Composition of Positive Integers

A composition of n ∈ P is an ordered sequence σ = (a_1, . . . , a_k) of positive integers for which n = a_1 + · · · + a_k. In this case σ has k parts, i.e., σ is a k-composition of n. Given a k-composition σ = (a_1, . . . , a_k), define a (k − 1)-subset θ(σ) of [n − 1] by

θ(σ) = {a_1, a_1 + a_2, . . . , a_1 + a_2 + · · · + a_{k−1}}.

Then θ is a bijection between the set of k-compositions of n and the (k − 1)-subsets of [n − 1]. This proves the following:

There are exactly $\binom{n-1}{k-1}$ k-compositions of n.     (1.8)

Moreover, the total number of compositions of n is

$\sum_{k=1}^{n} \binom{n-1}{k-1} = \sum_{k=0}^{n-1} \binom{n-1}{k} = 2^{n-1}$.     (1.9)

The bijection θ is often represented schematically by drawing n dots in a row and drawing k − 1 vertical bars in the n − 1 spaces separating the dots, at most one bar to a space. For example,

· | · · | · · · | · | · ·  ↔  1 + 2 + 3 + 1 + 2 = 9.

There is a closely related problem. Let N(n, k) be the number of solutions (also called weak compositions) (x_1, . . . , x_k) in nonnegative integers such that x_1 + x_2 + · · · + x_k = n. Put y_i = x_i + 1 to see that N(n, k) is the number of solutions in positive integers y_1, . . . , y_k to y_1 + y_2 + · · · + y_k = n + k, i.e., the number of k-compositions of n + k. Hence

$N(n, k) = \binom{n+k-1}{k-1} = \binom{n+k-1}{n}$.     (1.10)
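Both counts are easy to confirm by brute force. In the sketch below (ours, not from the notes), compositions are generated through the dots-and-bars bijection and weak compositions by direct enumeration:

```python
from itertools import combinations, product
from math import comb

def compositions(n):
    # Dots and bars: a composition of n corresponds to a set of bars placed
    # in the n - 1 gaps between n dots (at most one bar per gap).
    out = []
    for k in range(n):  # k = number of bars
        for bars in combinations(range(1, n), k):
            cuts = (0,) + bars + (n,)
            out.append(tuple(cuts[i + 1] - cuts[i] for i in range(len(cuts) - 1)))
    return out

def weak_compositions(n, k):
    # All (x_1, ..., x_k) in N^k with x_1 + ... + x_k = n.
    return [x for x in product(range(n + 1), repeat=k) if sum(x) == n]
```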

Exercise: 1.2.1 Find the number of solutions (x_1, . . . , x_k) in nonnegative integers to $\sum x_i \le n$. Here k can vary over the numbers in the range 0 ≤ k ≤ r. (Hint: $\sum_{j=0}^{n} N(j, k) = \binom{n+k}{k}$, and $\sum_{j=m}^{r} \binom{j}{m} = \binom{r+1}{m+1}$.)

Exercise: 1.2.2 Find the number of solutions (x_1, . . . , x_k) in integers to $\sum x_i = n$ with x_i ≥ a_i for preassigned integers a_i, 1 ≤ i ≤ k. (Hint: Try y_i = x_i + 1 − a_i.) (Ans: If k = 4 and m = a_1 + a_2 + a_3 + a_4, the answer is (n + 3 − m)(n + 2 − m)(n + 1 − m)/6.)

   
Exercise: 1.2.3 a) Show that $\sum_{k=1}^{n} \binom{m+k-1}{k} = \sum_{k=1}^{m} \binom{n+k-1}{k}$. b) Derive a closed form formula for the number of weak compositions of n into at most m parts.

Exercise: 1.2.4 Let S be a set of n elements. Count the ordered pairs (A, B) of subsets of S such that ∅ ⊆ A ⊆ B ⊆ S. Let c(j, k) denote the number of such ordered pairs for which |A| = j and |B| = k. Show that:

$(1 + y + xy)^n = \sum_{0 \le j \le k \le n} c(j, k)\, x^j y^k$.

What does this give if x = y = 1?

Exercise: 1.2.5 Show that

$f_k(x) = \binom{x}{k} = \frac{x^{\underline{k}}}{k!}$

is a polynomial in x with rational coefficients (not all of which are integers) and such that for each integer m (positive, negative or zero) f_k(m) is also an integer.

1.3 Multisets
A finite multiset M on a set S is a function ν : S → N such that $\sum_{x \in S} \nu(x) < \infty$. If $\sum_{x \in S} \nu(x) = k$, M is called a k-multiset. Sometimes we write k = #M. If S = {x_1, . . . , x_n} and ν(x_i) = a_i, write M = {x_1^{a_1}, . . . , x_n^{a_n}}. Then let $\left(\!\binom{S}{k}\!\right)$ denote the set of all k-multisets on S and put

$\left(\!\binom{n}{k}\!\right) = \#\left(\!\binom{S}{k}\!\right)$.

If M′ = ν′ : S → N is a second multiset on S, we say M′ is a submultiset of M provided ν′(x) ≤ ν(x) for all x ∈ S.

Note: The number of submultisets of M is $\prod_{x \in S} (\nu(x) + 1)$. And each element of $\left(\!\binom{S}{k}\!\right)$ corresponds to a weak n-composition of k: a_1 + a_2 + · · · + a_n = k. This proves the following:

$\left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{n-1} = \binom{n+k-1}{k}$.     (1.11)
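Equation (1.11) can be confirmed by listing multisets directly. The sketch below (ours, not from the notes) uses `itertools.combinations_with_replacement`, whose outputs are exactly the k-multisets on an n-set:

```python
from itertools import combinations_with_replacement
from math import comb

def count_k_multisets(n, k):
    # Enumerate the k-multisets on {0, ..., n-1} and count them.
    return sum(1 for _ in combinations_with_replacement(range(n), k))
```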

In the present context we give a glimpse of the generating functions that will be studied in greater detail later.

$(1 + x_1 + x_1^2 + \cdots)(1 + x_2 + x_2^2 + \cdots) \cdots (1 + x_n + x_n^2 + \cdots) = \sum_{\nu : S \to N} \left( \prod_{x_i \in S} x_i^{\nu(x_i)} \right)$.

Put all x_i's equal to x:

$(1 + x + x^2 + \cdots)^n = \sum_{\nu : S \to N} x^{\sum \nu(x_i)} = \sum_{M \text{ on } S} x^{\#M} = \sum_{k \ge 0} \left(\!\binom{n}{k}\!\right) x^k$.

Hence we have proved that

$(1 - x)^{-n} = \sum_{k \ge 0} \left(\!\binom{n}{k}\!\right) x^k$.     (1.12)

From this it follows that $(-1)^k \binom{-n}{k} = \left(\!\binom{n}{k}\!\right)$, a fact which is also easy to check directly. Replacing n with n + 1 gives the following version which is worth memorizing:

$(1 - x)^{-(n+1)} = \sum_{k \ge 0} \binom{n+k}{n} x^k$.     (1.13)

Exercise: 1.3.1 $\binom{-n}{k} = (-1)^k \binom{n+k-1}{k}$.

1.4 Multinomial Coefficients


Let $\binom{n}{a_1,\ldots,a_m}$ be the number of ways of putting each element of an n-set into one of m labeled categories C_1, . . . , C_m, so that C_i gets a_i elements. This is also the number of ways of distributing n labeled balls into m labeled boxes so that box B_i gets a_i balls.

Consider n linearly ordered blanks _1, _2, . . . , _n, each to be assigned one of m letters B_1, . . . , B_m so that B_i is used a_i times. It is easy to see that the number of “words” of length n from an alphabet B_1, . . . , B_m with m letters
where the ith letter B_i is used a_i times is $\binom{n}{a_1,\ldots,a_m}$. Using this fact it is easy to see that

$\binom{n}{a_1,\ldots,a_m} = \binom{n}{a_1} \binom{n-a_1}{a_2} \binom{n-a_1-a_2}{a_3} \cdots \binom{n-a_1-a_2-\cdots-a_{m-1}}{a_m} = \frac{n!}{a_1!\, a_2! \cdots a_m!}$.

Theorem 1.4.1 The coefficient of $x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m}$ in $(x_1 + \cdots + x_m)^n$ is $\binom{n}{a_1,\ldots,a_m}$.

The multinomial coefficient is defined to be zero whenever it is not the case that a_1, . . . , a_m are nonnegative integers whose sum is n.

Exercise: 1.4.2 Prove that

$\binom{n}{a_1,\ldots,a_m} = \binom{n-1}{a_1-1, a_2,\ldots,a_m} + \binom{n-1}{a_1, a_2-1,\ldots,a_m} + \cdots + \binom{n-1}{a_1, a_2,\ldots,a_m-1}$.
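A sketch (ours, not from the notes) that computes the multinomial coefficient from the factorial formula and checks it against a direct count of words with prescribed letter multiplicities:

```python
from itertools import permutations
from math import factorial

def multinomial(*a):
    # n! / (a_1! a_2! ... a_m!) with n = a_1 + ... + a_m.
    out = factorial(sum(a))
    for ai in a:
        out //= factorial(ai)
    return out

def count_words(a):
    # Distinct words of length n using letter i exactly a[i] times.
    letters = [i for i, ai in enumerate(a) for _ in range(ai)]
    return len(set(permutations(letters)))
```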

1.5 Permutations
There are several ways to approach the study of permutations. One of the
most basic is as a group of bijections. Let A be any nonempty set (finite or
infinite). Put S(A) = {π : A → A : π is a bijection}.
Notation: If π : a_1 ↦ a_2 we write π(a_1) = a_2 (unless noted otherwise). If π, σ ∈ S(A), define the composition π ∘ σ by (π ∘ σ)(a) = π(σ(a)).

Theorem 1.5.1 (S(A), ◦) is a group.



For simplicity in notation we take A = [n] = {1, . . . , n}, for n ∈ P, and we write S_n = S([n]). One way to represent π ∈ S_n is as a two-rowed array

$\pi = \begin{pmatrix} 1 & 2 & \ldots & n \\ \pi(1) & \pi(2) & \ldots & \pi(n) \end{pmatrix}$.

From this representation it is easy to write π either as a linearly ordered sequence π = π(1), π(2), . . . , π(n) or as a product of disjoint cycles:

$\pi = (\pi(1)\, \pi^2(1) \cdots \pi^i(1))(\pi(j)\, \pi^2(j) \cdots) \cdots (\cdots)$.

Example: $\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 6 & 1 & 9 & 8 & 3 & 7 & 5 & 4 \end{pmatrix}$. Then π = 261983754 as a linearly ordered sequence, and π = (1263)(49)(58)(7) as a product of disjoint cycles. Recall that disjoint cycles commute, and (135) = (351) = (513) ≠ (153), etc.
We now introduce the so-called standard representation of a permutation π ∈ S_n. Write π as a product of disjoint cycles in such a way that

(a) each cycle is written with its largest element first, and

(b) the cycles are ordered (left to right) in increasing order of largest elements.

Example: $\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 2 & 7 & 1 & 3 & 6 & 5 \end{pmatrix}$ = (14)(2)(375)(6) = (2)(41)(6)(753), where the last expression is the standard representation of π. Given a permutation π, π̂ is the word (or permutation written as a linearly ordered sequence) obtained by writing π in standard form and erasing the parentheses. So for the example above, π̂ = 2416753. We can recover π from π̂ by inserting a left parenthesis preceding each left-to-right maximum, i.e., before each a_i such that a_i > a_j for every j < i in π̂ = a_1 a_2 · · · a_n, and then putting right parentheses where they have to be. It follows that π ↦ π̂ is a bijection from S_n to itself.

Theorem 1.5.2 The map S_n → S_n : π ↦ π̂ is a bijection. And π has k cycles if and only if π̂ has k left-to-right maxima.
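The bijection and Theorem 1.5.2 can be tested exhaustively for small n. The sketch below (ours, not from the notes) writes each permutation in standard form, erases the parentheses, and recovers the permutation from the left-to-right maxima of the resulting word:

```python
from itertools import permutations

def cycles(perm):
    # perm is a list sending i to perm[i-1] on [n]; return its cycles.
    n, seen, out = len(perm), set(), []
    for start in range(1, n + 1):
        if start not in seen:
            cyc, i = [], start
            while i not in seen:
                seen.add(i)
                cyc.append(i)
                i = perm[i - 1]
            out.append(cyc)
    return out

def pi_hat(perm):
    # Standard representation: rotate each cycle so its largest element is
    # first, order cycles by largest element, then erase the parentheses.
    cs = []
    for cyc in cycles(perm):
        j = cyc.index(max(cyc))
        cs.append(cyc[j:] + cyc[:j])
    cs.sort(key=lambda c: c[0])
    return [v for c in cs for v in c]

def unhat(word):
    # Recover pi: a new cycle starts at each left-to-right maximum.
    perm = [0] * len(word)
    starts = [i for i, v in enumerate(word) if v == max(word[: i + 1])]
    for a, b in zip(starts, starts[1:] + [len(word)]):
        cyc = word[a:b]
        for u, v in zip(cyc, cyc[1:] + [cyc[0]]):
            perm[u - 1] = v
    return perm
```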
" #
n
Define to be the number of permutations in Sn that have exactly k
k
cycles. (Many authors use c(n, k) to denote this number.) So as a corollary
of Theorem 1.5.2 we have
1.5. PERMUTATIONS 23

Corollary 1.5.3" The# number of permutations in Sn with exactly k left-to-


n
right maxima is .
k
" #
n
Put s(n, k) := (−1)n−k c(n, k) = (−1)n−k . Then s(n, k) is called the
k
" #
n
Stirling number of the first kind and is the signless Stirling number
k
of the first kind.
" # " # " # " #
n n−1 n−1 0
Lemma 1.5.4 = (n−1) + , n, k > 0. And =
k k k−1 0
" #
n
1, but otherwise if n ≤ 0 or k ≤ 0 put = 0.
k

Proof: Let π ∈ S_{n−1} be written as a product of k disjoint cycles. We can insert the symbol n after any of the numbers 1, . . . , n − 1 (in its cycle). This can be done in n − 1 ways, yielding the disjoint cycle decomposition of a permutation π′ ∈ S_n with k cycles for which n appears in a cycle of length greater than or equal to 2. So there are $(n-1)\left[{n-1 \atop k}\right]$ permutations π′ ∈ S_n with k cycles for which π′(n) ≠ n. On the other hand, we can choose a permutation π ∈ S_{n−1} with k − 1 cycles and extend it to a permutation π′ ∈ S_n with k cycles satisfying π′(n) = n. This gives each π′ ∈ S_n exactly once, proving the desired result.
" #
Pn n
Theorem 1.5.5 For n ∈ N , k=0 xk = xn̄ = (x)n .
k

Proof: Put Fn (x) := xn̄ = x(x + 1) · · · (x + n − 1) = nk=0 b(n, k)xk .


P

If n = 0, Fn (x) is a “void product”, which by convention is 1. So we put


b(0, 0) = 1, and b(n, k) = 0 if n < 0 or k < 0. Then Fn (x) = (x+n−1)Fn−1 (x)
implies that
= x n−1
Pn k k Pn−1 k
k=0 b(n − 1, k)x + (n − 1) k=0 b(n − 1, k)x
P
k=0 b(n, k)x

Pn Pn−1
= k=1 b(n − 1, k − 1)xk + (n − 1) k=0 b(n − 1, k)xk
Pn
= k=0 [b(n − 1, k − 1) + (n − 1)b(n − 1, k)]xk .

This implies that b(n, k) = (n − 1)b(n − 1, k) + b(n − 1, k − 1). Hence the b(n, k) satisfy the same recurrence and initial conditions as the $\left[{n \atop k}\right]$, implying that they are the same, viz., b(n, k) = $\left[{n \atop k}\right]$.

Corollary 1.5.6 $x^{\underline{n}} = \sum_{k=0}^{n} (-1)^{n-k} \left[{n \atop k}\right] x^k$.

Proof: Put x = −y in Theorem 1.5.5 and simplify. (Use $x^{\underline{n}} = (-1)^n (-x)^{\overline{n}}$.)
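Lemma 1.5.4 and Theorem 1.5.5 can be checked together numerically. The sketch below (ours, not from the notes) builds the signless Stirling numbers from the recurrence and evaluates both sides of the theorem at integer points:

```python
from functools import lru_cache
from math import prod

@lru_cache(maxsize=None)
def c(n, k):
    # Signless Stirling numbers of the first kind (Lemma 1.5.4):
    # c(n, k) = (n - 1) c(n - 1, k) + c(n - 1, k - 1), c(0, 0) = 1.
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    return (n - 1) * c(n - 1, k) + c(n - 1, k - 1)

def rising(x, n):
    # x^(n rising) = x (x + 1) ... (x + n - 1); the empty product is 1.
    return prod(x + i for i in range(n))
```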

Cycle Type: If π ∈ S_n, then c_i = c_i(π) is the number of cycles of length i in π, 1 ≤ i ≤ n. Note: $n = \sum_{i=1}^{n} i\, c_i$. Then π has type (c_1, . . . , c_n), and the total number of cycles of π is $c(\pi) = \sum_{i=1}^{n} c_i(\pi)$.

Theorem 1.5.7 The number of π ∈ S_n with type (c_1, . . . , c_n) is

$\frac{n!}{1^{c_1} c_1!\, 2^{c_2} c_2! \cdots n^{c_n} c_n!}$.

Proof: Let π = a_1 · · · a_n be any word, i.e., permutation in S_n. Suppose (c_1, . . . , c_n) is an admissible cycle type, i.e., c_i ≥ 0 for all i and $n = \sum_i i\, c_i$. Insert parentheses in π so that the first c_1 cycles have length 1, the next c_2 cycles have length 2, . . . , etc. This defines a map Φ : S([n]) → S_c([n]), where S_c([n]) = {σ ∈ S([n]) : σ has type (c_1, . . . , c_n)}. Clearly Φ is onto S_c([n]). We claim that if σ ∈ S_c([n]), then the number of π mapped to σ is $1^{c_1} c_1!\, 2^{c_2} c_2! \cdots n^{c_n} c_n!$. This follows because in writing σ as a product of disjoint cycles, we can order the cycles of length i (among themselves) in c_i! ways, and then choose the first elements of all these cycles in $i^{c_i}$ ways. These choices for different i are all independent. So Φ : S([n]) → S_c([n]) is a many-to-one map onto S_c([n]) mapping the same number of π to each σ. Since #S([n]) = n!, we obtain the desired result.
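Theorem 1.5.7 can be verified exhaustively for small n. The sketch below (ours, not from the notes) tallies the permutations of [5] by cycle type and compares the tallies with the formula:

```python
from collections import Counter
from itertools import permutations
from math import factorial

def cycle_type(perm):
    # Return (c_1, ..., c_n): c_i = number of cycles of length i in perm.
    n, seen, lengths = len(perm), set(), []
    for s in range(1, n + 1):
        if s not in seen:
            length, i = 0, s
            while i not in seen:
                seen.add(i)
                i = perm[i - 1]
                length += 1
            lengths.append(length)
    counts = Counter(lengths)
    return tuple(counts.get(i, 0) for i in range(1, n + 1))

def type_count(c):
    # n! / (1^c1 c1! 2^c2 c2! ... n^cn cn!)  (Theorem 1.5.7)
    n = sum(i * ci for i, ci in enumerate(c, start=1))
    denom = 1
    for i, ci in enumerate(c, start=1):
        denom *= i ** ci * factorial(ci)
    return factorial(n) // denom
```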

1.6 Partitions of Integers


A partition of n ∈ N is a sequence λ = (λ_1, . . . , λ_k) ∈ N^k such that

a) $\sum \lambda_i = n$, and

b) λ_1 ≥ · · · ≥ λ_k ≥ 0.
Two partitions of n are identical if they differ only in the number of terminal 0's. For example, (3, 3, 2, 1) ≡ (3, 3, 2, 1, 0, 0). The nonzero λ_i are the parts of the partition λ. If λ = (λ_1, . . . , λ_k) with λ_1 ≥ · · · ≥ λ_k > 0, we say that λ has k parts. If λ has α_i parts equal to i, we may write λ = ⟨1^{α_1}, 2^{α_2}, . . .⟩, where terms with α_i = 0 may be omitted and the superscript α_i = 1 may be omitted.

Notation: “λ ⊢ n” means λ is a partition of n. As an example we have

(4, 4, 2, 2, 2, 1) = ⟨1^1, 2^3, 3^0, 4^2⟩ = ⟨1, 2^3, 4^2⟩ ⊢ 15.

Put p(n) equal to the total number of partitions of n, and p_k(n) equal to the number of partitions of n with k parts.

Convention: p(0) = p_0(0) = 1. Also p_n(n) = 1; p_{n−1}(n) = 1 if n > 1; p_1(n) = 1 for n ≥ 1; and p_2(n) = ⌊n/2⌋.

Exercise: 1.6.1 $p_k(n) = p_{k-1}(n-1) + p_k(n-k)$.

Exercise: 1.6.2 Show $p_k(n) = \sum_{s=1}^{k} p_s(n-k)$.
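The recurrence of Exercise 1.6.1 gives a quick way to tabulate p_k(n); a memoized sketch (ours, not from the notes):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_k(n, k):
    # Partitions of n with exactly k parts:
    # p_k(n) = p_{k-1}(n - 1) + p_k(n - k)  (Exercise 1.6.1).
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    return p_k(n - 1, k - 1) + p_k(n - k, k)

def p(n):
    # Total number of partitions of n.
    return sum(p_k(n, k) for k in range(n + 1))
```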

A great deal of time and effort has been spent studying the partitions of
n and much is known about them. However, most of the results concerning
the numbers pk (n) have been obtained via the use of generating functions.
Hence after we have studied formal power series it would be reasonable to
return to the topic of partitions. Unfortunately we probably will not have
time to do this, so this topic would be a great one for a term project.

1.7 Set Partitions


A partition of a finite set N is a collection π = {B_1, . . . , B_k} of subsets of N such that:

(a) B_i ≠ ∅ for all i;

(b) B_i ∩ B_j = ∅ if i ≠ j;

(c) B_1 ∪ B_2 ∪ · · · ∪ B_k = N.
We call B_i a block of π and say that π has k = |π| = #π blocks. Put $\left\{{n \atop k}\right\}$ = S(n, k) = the number of partitions of an n-set into k blocks. S(n, k) is called a Stirling number of the second kind.

We immediately have the following list of Stirling numbers:

$\left\{{0 \atop 0}\right\} = 1$;
$\left\{{n \atop k}\right\} = 0$ if k > n ≥ 1;
$\left\{{n \atop 0}\right\} = 0$ if n > 0;
$\left\{{n \atop 1}\right\} = 1$;
$\left\{{n \atop 2}\right\} = 2^{n-1} - 1$;
$\left\{{n \atop n}\right\} = 1$;
$\left\{{n \atop n-1}\right\} = \binom{n}{2}$.
( ) ( ) ( )
n n−1 n−1
Theorem 1.7.1 =k + .
k k k−1
Proof: To obtain a partition of [n] into k blocks, we can either
( (i) partition
) [n − 1] into k blocks and place n into any of these blocks in
n−1
k ways, or
k
( (ii) put) n into a block by itself and partition [n − 1] into k − 1 blocks in
n−1
ways.
k−1
Bell Number: Let B(n) be the total number of partitions of an n-set. Hence

$B(n) = \sum_{k=1}^{n} \left\{{n \atop k}\right\} = \sum_{k=0}^{n} \left\{{n \atop k}\right\}$,

for all n ≥ 1.

Theorem 1.7.2 ( )
n n
xk ,
X
x = n ∈ N.
k
k

Proof: Check n = 0 and n = 1. Then note that x · xk = xk+1 + kxk ,


because xk+1 = xk (x(− k) = x)· xk − k · xk . Now let our induction hypothesis
n−1
be that xn−1 = k xk for some n ≥ 2. Then xn = x · xn−1 =
P
k
( ) ( ) ( )
n−1 k n−1 k+1 n−1
kxk =
P P P
x k x = k x + k
k k k
( ) ( )
n−1 k n−1
kxk
P P
k x + k
k−1 k
( ) ( )! ( )
n−1 n−1 n
xk = k xk .
P P
= k k +
k k−1 k

Corollary 1.7.3 $x^n = \sum_k {n \brace k} (-1)^{n-k} x^{\overline{k}}$, where $x^{\overline{k}} = x(x+1)\cdots(x+k-1)$ denotes the rising factorial.

1.8 Table Entries in the Twelvefold Way


The reader should supply whatever is still needed for a complete proof for
each of the following.
Entry #1. $\#(X^N) = x^n$.
Entry #2. $\#\{f \in X^N : f \text{ is one-to-one}\} = x^{\underline{n}}$.
Entry #3. $\#\{f \in X^N : f \text{ is onto}\} = x! \, {n \brace x}$, as it is the number of ways of partitioning the balls (say $[n]$) into $x$ blocks and then linearly ordering the blocks. This uniquely determines an $f$ of the type being counted.
Entry #4. A function from unlabeled $N$ to labeled $X$ is a placing of unlabeled balls in labeled boxes: the only important thing is how many balls are to be in each box. Each choice corresponds to an $n$-multiset of an $x$-set, i.e., $\left(\!\binom{x}{n}\!\right) = \binom{n+x-1}{n} = \binom{n+x-1}{x-1}$.
Entry #5. Here $N$ is unlabeled and $X$ is labeled and $f$ is one-to-one. Each function corresponds to putting 0 or 1 ball in each box so that $n$ balls are used, so the desired number of functions is the binomial coefficient $\binom{x}{n}$.
Entry #6. Here $N$ is unlabeled, $X$ is labeled, and $f$ is onto. So each $f$ corresponds to an $n$-multiset on $X$ with each box chosen at least once. The number of such functions is $\left(\!\binom{x}{n-x}\!\right) = \binom{n-1}{x-1} = \binom{n-1}{n-x}$.
Entry #7. With $N$ labeled and $X$ unlabeled, a function $f : N \to X$ is determined by the sets $f^{-1}(b)$, $b \in X$. Hence $f$ corresponds to a partition of $N$ into at most $x$ parts. The number of such partitions is $\sum_{k=1}^{x} {n \brace k} = \sum_{k=1}^{x} S(n,k)$.

Entry #8. Here N is labeled, X is unlabeled and f is one-to-one. Such


an f amounts to putting just one of the n balls into each of n unlabeled
boxes. This is possible iff n ≤ x, and in that case there is just 1 way.
Entry #9. With $N$ labeled, $X$ unlabeled, and $f$ onto, such an $f$ corresponds to a partition of $N$ into $x$ parts. Hence the number of such functions is ${n \brace x}$.
Entry #10. With $N$ unlabeled, $X$ unlabeled and $f$ arbitrary, $f$ is determined by the number of elements in each block of $\ker(f)$, i.e., $f$ is essentially just a partition of the integer $n$ with at most $x$ parts. Hence the number of such $f$ is $p_1(n) + \cdots + p_x(n)$.
Entry #11. Here N and X are both unlabeled, and f is one-to-one. So
n unlabeled balls distributed into x unlabeled boxes is possible in just one
way if n ≤ x and not at all otherwise.
Entry #12. Here both N and X are unlabeled and f is onto. Clearly f
corresponds to a partition of n into x parts, so there are px (n) such functions.
Several of the entries of the Twelvefold Way are quite satisfactory, but
others need considerable further development before they are really useful.
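For the entries with both balls and boxes labeled (#1-#3), a brute-force enumeration over all functions from $[n]$ to $[x]$ makes a convenient check of the formulas. A sketch (all function names are our own choices):

```python
# Brute-force check of Entries #1-#3: count all functions, the injections,
# and the surjections from [n] to [x], and compare with the closed forms.
from itertools import product
from math import factorial

def S(n, k):
    """Stirling number of the second kind, via the recurrence of Theorem 1.7.1."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * S(n - 1, k) + S(n - 1, k - 1)

def falling(x, n):          # x(x-1)...(x-n+1); equals 0 when n > x
    out = 1
    for i in range(n):
        out *= x - i
    return out

def counts(n, x):
    """(all, one-to-one, onto) functions [n] -> [x], counted by enumeration."""
    total = inj = surj = 0
    for f in product(range(x), repeat=n):   # f represented as its tuple of images
        total += 1
        if len(set(f)) == n:
            inj += 1
        if len(set(f)) == x:
            surj += 1
    return total, inj, surj
```

The totals should match $x^n$, $x^{\underline{n}}$, and $x!\,S(n,x)$ respectively.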

1.9 Recapitulation
We have already established the following.

1. $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ \quad ("$n$ choose $k$")

2. ${n \brack k} = (n-1) {n-1 \brack k} + {n-1 \brack k-1}$ \quad ("$n$ cycle $k$")

3. ${n \brace k} = k {n-1 \brace k} + {n-1 \brace k-1}$ \quad ("$n$ subset $k$")

4. $x^{\overline{n}} = \sum_k {n \brack k} x^k$; \quad also $x^{\underline{n}} = (-1)^n (-x)^{\overline{n}}$

5. $x^{\underline{n}} = \sum_k (-1)^{n-k} {n \brack k} x^k$; \quad also $x^{\overline{n}} = (-1)^n (-x)^{\underline{n}}$

6. $x^n = \sum_k {n \brace k} (-1)^{n-k} x^{\overline{k}}$

7. $x^n = \sum_k {n \brace k} x^{\underline{k}}$
It appears that 4. and 6. (resp., 5. and 7.) are some kind of inverses
of each other. Later we shall make this a little more formal as we study the
incidence algebra of a finite POSET.
Also in this section we want to recap certain results on composition of
integers, etc.
1. $P(r; r_1, r_2, \ldots, r_n) = \binom{r}{r_1, r_2, \ldots, r_n} = \frac{r!}{r_1! \, r_2! \cdots r_n!}$
= the number of ways to split up $r$ people into $n$ labeled committees with $r_i$ people in committee $C_i$
= the number of words of length $r$ with $r_i$ letters of type $i$, $1 \le i \le n$, where $r_1 + r_2 + \cdots + r_n = r$

= the coefficient on $x_1^{r_1} x_2^{r_2} \cdots x_n^{r_n}$ in $(x_1 + x_2 + \cdots + x_n)^r$, where $r_1 + r_2 + \cdots + r_n = r$
= the number of ways of putting $r$ distinct balls into $n$ labeled boxes with $r_i$ balls in the $i$th box.
2. $C(n, r) = \binom{n}{r} = \frac{n!}{r!(n-r)!}$
= the number of ways of selecting a committee of size $r$ from a set of $n$ people
= the coefficient of $x^r y^{n-r}$ in $(x + y)^n$.
3. $C(r+n-1, r) = \binom{r+n-1}{r} = \frac{(r+n-1)!}{r!(n-1)!}$
= the number of ways of ordering $r$ hot dogs of $n$ different types (selection with repetition)
= the number of ways of putting $r$ identical balls into $n$ labeled boxes (distribution of identical objects)
= the number of ordered $n$-tuples $(x_1, \ldots, x_n)$ of nonnegative integers such that $x_1 + x_2 + \cdots + x_n = r$.
4. Suppose that $a_1, \ldots, a_n$ are given integers, not necessarily nonnegative. Then
$$\binom{r - \sum_{i=1}^{n} a_i + n - 1}{r - \sum_{i=1}^{n} a_i} = \left|\left\{(y_1, \ldots, y_n) : \sum y_i = r \text{ and } y_i \ge a_i \text{ for } 1 \le i \le n\right\}\right|.$$

For a proof, just put $x_i = y_i - a_i$ and note that $y_i \ge a_i$ iff $x_i \ge 0$.

NOTE: If $\sum a_i > r$, then the number of solutions is 0.
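Item 4 is easy to test by brute force; the sketch below (function names ours) compares a direct enumeration against the binomial formula, including a case with a negative lower bound:

```python
# Count n-tuples (y_1,...,y_n) with sum r and y_i >= a_i, and compare with
# C(r - sum(a) + n - 1, r - sum(a)).
from itertools import product
from math import comb

def count_bounded(r, a):
    """Direct enumeration.  Each feasible y_i lies in [min(a), r - sum(a) + max(a)]."""
    n = len(a)
    lo, hi = min(a), r - sum(a) + max(a) + 1
    return sum(1 for y in product(range(lo, hi), repeat=n)
               if sum(y) == r and all(y[i] >= a[i] for i in range(n)))

def formula(r, a):
    n, m = len(a), r - sum(a)
    return comb(m + n - 1, m) if m >= 0 else 0
```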

1.10 Cayley’s Theorem: The Number of Labeled Trees
One of the most famous results in combinatorics is Cayley’s Theorem, which says that the number of labeled trees on $n$ vertices is $n^{n-2}$. We give several

proofs that illustrate different types of combinatorial arguments. But first


we recall the basic facts about trees.
Let G be a finite graph G = (V, E) on n = |V | vertices and having b = |E|
edges. Then G is called a tree provided G is connected and has no cycles. It
follows that for any two vertices x, y ∈ V , there is a unique path in G from x
to y. Moreover, if x and y are two vertices at maximum distance in G, then
x and y each have degree 1. Hence any tree with at least two vertices has at
least two vertices of degree 1. Such a vertex will be called a hanging vertex.
An easy induction argument shows that if G is a tree on n vertices, then it
has n − 1 edges. As a kind of converse, if G is an acyclic graph on n vertices
with n − 1 edges, it must be a tree. Clearly we need only verify that G is
connected. Each connected component of G is a tree by definition. But if G
has $k$ connected components $T_1, \ldots, T_k$, where $T_i$ has $n_i$ vertices and $n_i - 1$ edges, and where $n_1 + \cdots + n_k = n$, then $G$ has $\sum_{i=1}^{k}(n_i - 1) = n - k$ edges. So $n - k = n - 1$ implies $G$ is connected. Similarly, if $G$ is connected with


b = n − 1, then G has a spanning tree T with n − 1 edges. Hence G = T
must be acyclic and hence a tree. It follows that if G is a graph on n vertices
and b edges, then G is a tree if and only if at least two (and hence all three)
of the following hold:
(a) G is connected;

(b) G is acyclic;

(c) b = n − 1.
A labeled tree on [n] is just a spanning tree of the complete graph Kn on
[n]. Hence we may state Cayley’s theorem as follows.

Theorem 1.10.1 The number of spanning trees of $K_n$ is $n^{n-2}$.

Proof #1. The first proof is due to H. Prüfer (1918). It uses an algorithm
that uniquely characterizes the tree.
Let T be a tree with V = [n], so the vertex set already has a natural
order. Let T1 := T . For i = 1, 2, . . . , n − 2, let bi denote the vertex of degree
1 with the smallest label in Ti , and let ai be the vertex adjacent to bi , and
let Ti+1 be the tree obtained by deleting the vertex bi and the edge {ai , bi }
from Ti . The “code” assigned to the tree T is [a1 , a2 , . . . , an−2 ].

[Figure: A Tree on 10 Points — the labeled tree $T = T_1$ on vertex set $[10]$ with edges $\{1,2\}$, $\{2,3\}$, $\{2,4\}$, $\{1,5\}$, $\{6,7\}$, $\{1,7\}$, $\{1,10\}$, $\{8,10\}$, $\{9,10\}$.]

As an example, consider the tree T = T1 on 10 points in the figure. The


vertex of degree 1 with smallest index is 3. It is joined to vertex 2. We define
a1 = 2, b1 = 3, then delete vertex 3 and edge {3, 2}, to obtain a tree T2 with
one edge and one vertex less. This procedure is repeated eight times yielding
the sequences
[a1 , a2 , . . . , a8 ] = [2, 2, 1, 1, 7, 1, 10, 10],
[b1 , b2 , . . . , b8 ] = [3, 4, 2, 5, 6, 7, 1, 8]
and terminating with the edge {9, 10}.
The code for the tree is the sequence $[a_1, a_2, \ldots, a_8] = [2, 2, 1, 1, 7, 1, 10, 10]$.
To reverse the procedure, start with any code $[a_1, a_2, \ldots, a_{n-2}]$. Write $a_{n-1} := n$. For $i = 1, 2, \ldots, n-1$, let $b_i$ be the vertex with smallest index which is not in

{ai , ai+1 , . . . , an−1 } ∪ {b1 , b2 , . . . , bi−1 }.

Then {{ai , bi } : i = 1, . . . , n − 1} will be the edge set of a spanning tree.

Exercise: 1.10.2 With the sequence bi defined from the code as indicated in
the proof above, show that {{bi , ai } : i = 1, . . . , n − 1} will be the edge set of
a tree on [n]. Fill in the details of why the mapping associating a code to a
tree, and the mapping associating a tree to a code, are inverses.
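One possible implementation of the encoding and decoding steps of Proof #1 (a sketch; the edge-list representation and the function names are our own choices):

```python
# Pruefer encoding/decoding.  Trees are given as edge lists on vertices 1..n.
def prufer_code(n, edges):
    """Repeatedly strip the smallest leaf, recording its neighbor."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    code = []
    for _ in range(n - 2):
        b = min(v for v in adj if len(adj[v]) == 1)   # smallest-labeled leaf
        a = adj[b].pop()                              # its unique neighbor
        adj[a].discard(b)
        del adj[b]
        code.append(a)
    return code

def prufer_tree(n, code):
    """Decode: b_i = smallest vertex not among the remaining a's or earlier b's."""
    seq = list(code) + [n]                            # a_{n-1} := n, as in the text
    used, edges = set(), []
    for i in range(n - 1):
        forbidden = set(seq[i:]) | used
        b = min(v for v in range(1, n + 1) if v not in forbidden)
        edges.append((seq[i], b))
        used.add(b)
    return edges
```

On the 10-point example above, encoding reproduces the code $[2, 2, 1, 1, 7, 1, 10, 10]$ and decoding recovers the tree.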

Proof #2. This proof starts by showing that the number $N(d_1, \ldots, d_n)$ of labeled trees on vertices $\{v_1, \ldots, v_n\}$ in which $v_i$ has degree $d_i + 1$, $1 \le i \le n$, is the multinomial coefficient $\binom{n-2}{d_1, \ldots, d_n}$. As an inductive hypothesis we assume that this result holds for trees with fewer than $n$ vertices and leave to the reader the task of checking that the result holds for $n = 3$. Since the degree of each vertex is at least 1, we know that the $d$'s are all nonnegative integers. The sum of the degrees of the vertices counts the $n - 1$ edges twice, so we have $2(n-1) = \sum_{i=1}^{n}(d_i + 1) = (\sum d_i) + n$, whence $\sum d_i = n - 2$. Hence at least $\binom{n-2}{d_1, \ldots, d_n}$ is in proper form. We also know that any tree has at least two vertices with degree 1. We need to show that if $(d_1, \ldots, d_n)$ is a sequence of nonnegative integers with $\sum d_i = n - 2$, then $(d_1 + 1, \ldots, d_n + 1)$ really is the degree sequence of $\binom{n-2}{d_1, \ldots, d_n}$ labeled trees. Clearly if $\sum d_i = n - 2$ then at least two of the $d_i$'s equal zero. The following argument would work with any particular $d_j = 0$, but for notational ease we suppose that $d_n = 0$. If there is a labeled tree with degree sequence $(d_1 + 1, \ldots, d_{n-1} + 1, 1)$, then the vertex $v_n$ is adjacent to a unique vertex $v_j$, which has degree at least 2. So the tree obtained by removing $v_n$ and the edge $\{v_j, v_n\}$ has degree sequence $(d_1 + 1, \ldots, d_j, \ldots, d_{n-1} + 1)$. It follows that $N(d_1, \ldots, d_{n-1}, 0) = N(d_1 - 1, d_2, \ldots, d_{n-1}) + N(d_1, d_2 - 1, \ldots, d_{n-1}) + \cdots + N(d_1, d_2, \ldots, d_{n-1} - 1)$. By the induction hypothesis this is the sum of the multinomial coefficients
$$\binom{n-3}{d_1 - 1, d_2, \ldots, d_{n-1}} + \binom{n-3}{d_1, d_2 - 1, \ldots, d_{n-1}} + \cdots + \binom{n-3}{d_1, d_2, \ldots, d_{n-1} - 1} = \binom{n-2}{d_1, d_2, \ldots, d_{n-1}, 0}.$$
Cayley's Theorem now follows. For the number $T(n)$ of labeled trees on $n$ vertices is the sum of all the terms $N(d_1, \ldots, d_n)$ with $d_i \ge 0$ and $\sum_{i=1}^{n} d_i = n - 2$, which is the sum of all terms $\binom{n-2}{d_1, d_2, \ldots, d_n}$ with $d_i \ge 0$ and $\sum_{i=1}^{n} d_i = n - 2$. Now in the multinomial expansion of $(a_1 + a_2 + \cdots + a_n)^{n-2}$ set $a_1 = \cdots = a_n = 1$ to obtain the desired result $T(n) = (1 + 1 + \cdots + 1)^{n-2} = n^{n-2}$.
Proof #3. This proof establishes a bijection between the set of labeled trees on $n$ vertices and the set of mappings from the set $\{2, 3, \ldots, n-1\}$ to the set $[n] = \{1, 2, \ldots, n\}$. Clearly the number of such mappings is $n^{n-2}$. Suppose $f$ is such a mapping. Construct a functional digraph $D$ on the vertices 1 through $n$ by defining $(i, f(i))$, $i = 2, \ldots, n-1$, to be the arcs. Clearly 1 and
n have zero outdegrees in D, but each of them could have positive indegree.
In either case, the (weakly) connected component containing 1 (respectively,

n) may be viewed as an “in-tree” rooted at 1 (respectively, n). Any other


component consists of an oriented circuit, to each point of which an in-tree is attached with that point as root. Some of these in-trees may consist only of the root. Suppose there are $k$ oriented circuits. The $i$th oriented circuit has smallest element $r_i$, and the circuits are to be ordered among themselves so that $r_1 < r_2 < \cdots < r_k$. In the $i$th circuit, let $l_i$ be the vertex to which the arc from $r_i$ points, i.e., $f(r_i) = l_i$. We may now construct a tree $T$ from $D$ by deleting the arcs $(r_i, l_i)$ (to create a forest of trees) and then adjoining the arcs $(1, l_1), (r_1, l_2), \ldots, (r_{k-1}, l_k), (r_k, n)$.
For the reverse process, suppose the labeled tree $T$ is given. Put $r_0 := 1$, and define $r_i$ to be the smallest vertex on the (unique) path from $r_{i-1}$ to $n$. Now delete the edges $\{r_{i-1}, l_i\}$, $i = 1, \ldots, k$, and $\{r_k, n\}$, to create $k + 2$ components. View the vertex 1 as the root of a directed in-tree.
Similarly, view each vertex along the path from li to ri as the root of an
in-tree. Now adjoin the directed arcs (ri , li ). We may now view this directed
tree as the functional digraph of a unique function from {2, 3, . . . , n − 1} to
[n]. Moreover, it should be clear that this correspondence between functions
from {2, 3, . . . , n − 1} to [n] and labeled trees on [n] is a bijection.
Proof #4. In this proof, due to Joyal (1981), we describe a many-to-one function $F$ from the set of $n^n$ functions from $[n]$ to $[n]$ to the set of labeled trees on $[n]$ such that the preimage of each labeled tree contains $n^2$ functions.
First, recall that a permutation $\pi$ of the elements of a set $S$ may be viewed simultaneously as a linear arrangement of the elements of $S$ and
as a product of disjoint oriented cycles of the objects of S. In the present
context we want S to be a set of disjoint rooted trees on [n] that use precisely
all the elements of [n]. But there are many such sets S, and the general result
we need is that the number of linear arrangements of disjoint rooted trees
on [n] that use precisely all the elements of [n] is the same as the number of
collections of disjoint oriented cycles of disjoint rooted trees on [n] that use
precisely all the elements of [n].
To each function f we may associate its functional digraph which has an
arc from i to f (i) for each i in [n]. Every (weakly) connected component
of a functional digraph (i.e., connected component of the underlying undi-
rected graph) can be represented by an oriented cycle of rooted trees, so
that the cycles corresponding to different components are disjoint and all the
components use the elements in $[n]$, each exactly once. Clearly there are $n^n$ functions from $[n]$ to $[n]$, each corresponding uniquely to a functional digraph which is represented by a collection of disjoint oriented cycles of rooted trees on $[n]$ that together use each element in $[n]$ exactly once. Each collection of disjoint oriented cycles of rooted trees (using each element of $[n]$ exactly once) corresponds uniquely to a linear arrangement of a collection of rooted trees (using each element of $[n]$ exactly once). Hence $n^n$ is the number of linear arrangements of rooted trees on $[n]$ (by which we always mean that the rooted trees in a given linear arrangement use each element of $[n]$ exactly once).
We claim now that $n^n = n^2 t_n$, where $t_n$ is the number of (labeled) trees on $[n]$.
It is clear that $n^2 t_n$ is the number of triples $(x, y, T)$, where $x, y \in [n]$ and $T$ is a tree on $[n]$. Given such a triple, we obtain a linear arrangement of rooted trees by removing all arcs on the unique path from $x$ to $y$ and taking the nodes on this path to be the roots of the trees that remain, and ordering these trees by the order of their roots in the original path from $x$ to $y$. In this way each labeled tree corresponds to $n^2$ linear arrangements of rooted trees on $[n]$.

1.11 The Matrix-Tree Theorem


The “matrix-tree” theorem expresses the number of spanning trees in a graph
as the determinant of an appropriate matrix, from which we obtain one more
proof of Cayley’s theorem counting labeled trees. The main ingredient in
the proof is the following theorem known as the Cauchy-Binet Theorem. It
is more commonly stated and applied with the diagonal matrix $\Delta$ below taken to be the identity matrix. However, the generality given here actually simplifies the proof.

Theorem 1.11.1 Let $A$ and $B$ be, respectively, $r \times m$ and $m \times r$ matrices, with $r \le m$. Let $\Delta$ be the $m \times m$ diagonal matrix with entry $e_i$ in the $(i,i)$-position. For an $r$-subset $S$ of $[m]$, let $A_S$ and $B^S$ denote, respectively, the $r \times r$ submatrices of $A$ and $B$ consisting of the columns of $A$, or the rows of $B$, indexed by the elements of $S$. Then
$$\det(A \Delta B) = \sum_S \det(A_S)\det(B^S) \prod_{i \in S} e_i,$$
where the sum is over all $r$-subsets $S$ of $[m]$.



Proof: We prove the theorem assuming that $e_1, \ldots, e_m$ are independent (commuting) indeterminates over $F$. Of course it will then hold for all values of $e_1, \ldots, e_m$ in $F$.
Recall that if $C = (c_{ij})$ is any $r \times r$ matrix over $F$, then
$$\det(C) = \sum_{\sigma \in S_r} \mathrm{sgn}(\sigma) \, c_{1\sigma(1)} c_{2\sigma(2)} \cdots c_{r\sigma(r)}.$$
Given that $A = (a_{ij})$ and $B = (b_{ij})$, the $(i,j)$-entry of $A \Delta B$ is $\sum_{k=1}^{m} a_{ik} e_k b_{kj}$, and this is a linear form in the indeterminates $e_1, \ldots, e_m$. Hence $\det(A \Delta B)$ is a homogeneous polynomial of degree $r$ in $e_1, \ldots, e_m$. Suppose that $\det(A \Delta B)$ has a monomial $e_1^{t_1} e_2^{t_2} \cdots$ where the number of indeterminates $e_i$ that have $t_i > 0$ is less than $r$. Substitute 0 for the indeterminates $e_i$ that do not appear in $e_1^{t_1} e_2^{t_2} \cdots$, i.e., that have $t_i = 0$. This will not affect the monomial $e_1^{t_1} e_2^{t_2} \cdots$ or its coefficient in $\det(A \Delta B)$. But after this substitution $\Delta$ has rank less than $r$, so $A \Delta B$ has rank less than $r$, implying that $\det(A \Delta B)$ must be the zero polynomial. Hence we see that the coefficient of a monomial in the polynomial $\det(A \Delta B)$ is zero unless that monomial is the product of $r$ distinct indeterminates $e_i$, i.e., unless it is of the form $\prod_{i \in S} e_i$ for some $r$-subset $S$ of $[m]$.
The coefficient of a monomial $\prod_{i \in S} e_i$ in $\det(A \Delta B)$ is found by setting $e_i = 1$ for $i \in S$, and $e_i = 0$ for $i \notin S$. When this substitution is made in $\Delta$, $A \Delta B$ evaluates to $A_S B^S$. So the coefficient of $\prod_{i \in S} e_i$ in $\det(A \Delta B)$ is $\det(A_S)\det(B^S)$.
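A quick numerical sanity check of the Cauchy-Binet identity, assuming numpy is available (the matrices below are arbitrary small examples of ours):

```python
# Verify det(A @ diag(e) @ B) == sum over 2-subsets S of det(A_S)det(B^S)prod(e_S).
import numpy as np
from itertools import combinations

A = np.array([[1, 2, 0, -1],
              [3, 1, 2, 0]])
B = np.array([[2, 1],
              [0, 1],
              [1, -1],
              [1, 2]])
e = np.array([1, 2, -1, 3])
r, m = A.shape

lhs = int(round(np.linalg.det(A @ np.diag(e) @ B)))
rhs = sum(int(round(np.linalg.det(A[:, list(S)]) * np.linalg.det(B[list(S), :])))
          * int(np.prod(e[list(S)]))
          for S in combinations(range(m), r))
```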

Exercise: 1.11.2 Let $M$ be an $n \times n$ matrix all of whose linesums are zero. Then one of the eigenvalues of $M$ is $\lambda_1 = 0$. Let $\lambda_2, \ldots, \lambda_n$ be the other eigenvalues of $M$. Show that all principal $n-1$ by $n-1$ submatrices have the same determinant and that this value is $\frac{1}{n} \lambda_2 \lambda_3 \cdots \lambda_n$.

An incidence matrix $N$ of a directed graph $H$ is a matrix whose rows are indexed by the vertices $V$ of $H$, whose columns are indexed by the edges $E$ of $H$, and whose entries are defined by:
$$N(x, e) = \begin{cases} 0 & \text{if } x \text{ is not incident with } e, \text{ or } e \text{ is a loop,} \\ 1 & \text{if } x \text{ is the head of } e, \\ -1 & \text{if } x \text{ is the tail of } e. \end{cases}$$

Lemma 1.11.3 If H has k components, then rank(N ) = |V | − k.



Proof: N has v = |V | rows. The rank of N is v − n, where n is the


dimension of the left null space of N , i.e., the dimension of the space of row
vectors g for which gN = 0. But if e is any edge, directed from x to y, then
gN = 0 if and only if g(x) − g(y) = 0. Hence gN = 0 iff g is constant on
each component of H, which says that n is the number k of components of
H.

Lemma 1.11.4 Let A be a square matrix that has at most two nonzero en-
tries in each column, at most one 1 in each column, at most one -1 in each
column, and whose entries are all either 0, 1 or -1. Then det(A) is 0, 1 or
-1.

Proof: This follows by induction on the number of rows. If every column


has both a 1 and a -1, then the sum of all the rows is zero, so the matrix is
singular and det(A) = 0. Otherwise, expand the determinant by a column
with one nonzero entry, to find that it is equal to ±1 times the determinant
of a smaller matrix with the same property.

Corollary 1.11.5 Every square submatrix of an incidence matrix of a di-


rected graph has determinant 0 or ±1. (Such a matrix is called totally
unimodular.)

Theorem 1.11.6 (The Matrix-Tree Theorem) The number of spanning trees


in a connected graph G on n vertices and without loops is the determinant of
any n − 1 by n − 1 principal submatrix of the matrix D − A, where A is the
adjacency matrix of G and D is the diagonal matrix whose diagonal contains
the degrees of the corresponding vertices of G.

Proof: First let H be a connected digraph with n vertices and with inci-
dence matrix N . H must have at least n − 1 edges, because it is connected
and must have a spanning tree, so we may let S be a set of n−1 edges. Using
the notation of the Cauchy-Binet Theorem, consider the n by n−1 submatrix
NS of N whose columns are indexed by elements of S. By Lemma 1.11.3, NS
has rank n−1 iff the spanning subgraph of H with S as edge set is connected,
i.e., iff $S$ is the edge set of a tree in $H$. Let $N'$ be obtained by dropping any single row of the incidence matrix $N$. Since the sum of all rows of $N$ (or of $N_S$) is zero, the rank of $N'_S$ is the same as the rank of $N_S$. Hence we have the following:

$$\det(N'_S) = \begin{cases} \pm 1 & \text{if } S \text{ is the edge set of a spanning tree in } H, \\ 0 & \text{otherwise.} \end{cases} \qquad (1.14)$$

Now let $G$ be a connected loopless graph on $n$ vertices. Let $H$ be any digraph obtained by orienting $G$, and let $N$ be an incidence matrix of $H$. Then we claim $N N^T = D - A$. For,
$$(N N^T)_{xy} = \sum_{e \in E(G)} N(x,e) N(y,e) = \begin{cases} \deg(x) & \text{if } x = y, \\ -t & \text{if } x \text{ and } y \text{ are joined by } t \text{ edges in } G. \end{cases}$$
An $n-1$ by $n-1$ principal submatrix of $D - A$ is of the form $N' N'^T$ where $N'$ is obtained from $N$ by dropping any one row. By Cauchy-Binet,
$$\det(N' N'^T) = \sum_S \det(N'_S) \det({N'_S}^T) = \sum_S \left( \det(N'_S) \right)^2,$$
where the sum is over all $(n-1)$-subsets $S$ of the edge set. By Eq. 1.14 this is the number of spanning trees of $G$.
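A sketch of the computation in the theorem, assuming numpy is available (names ours): build $D - A$, delete the first row and column, and take the determinant.

```python
# Spanning-tree count via the Matrix-Tree Theorem.
import numpy as np

def spanning_tree_count(n, edges):
    """det of a principal (n-1)x(n-1) submatrix of D - A, for a loopless graph."""
    L = np.zeros((n, n))                 # the matrix D - A (the graph Laplacian)
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return int(round(np.linalg.det(L[1:, 1:])))   # delete row 0 and column 0
```

For $K_4$ this gives $16 = 4^2$, in agreement with Cayley's theorem; for a path it gives 1, and for the 4-cycle it gives 4.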

Exercise: 1.11.7 (Cayley's Theorem Again) In the Matrix-Tree Theorem, take $G$ to be the complete graph $K_n$. Here the matrix $D - A$ is $nI - J$, where $I$ is the identity matrix of order $n$, and $J$ is the $n$ by $n$ matrix of all 1's. Now calculate the determinant of any $n-1$ by $n-1$ principal submatrix of this matrix to obtain another proof that $K_n$ has $n^{n-2}$ spanning trees.

Exercise: 1.11.8 In the statement of the Matrix-Tree Theorem it is not necessary to use principal subdeterminants. If the $n-1 \times n-1$ submatrix $M$ is obtained by deleting the $i$th row and $j$th column from $D - A$, then the number of spanning trees is $(-1)^{i+j}\det(M)$. This follows from the more general lemma: If $A$ is an $n-1 \times n$ matrix whose row sums are all equal to 0 and if $A_j$ is obtained by deleting the $j$th column of $A$, $1 \le j \le n$, then $\det(A_j) = -\det(A_{j+1})$.

1.12 Number Theoretic Functions


An arithmetic function (sometimes called a number theoretic function) is a function whose domain is the set $\mathbf{P}$ of positive integers and whose range is a subset of the complex numbers $C$. Hence $C^{\mathbf{P}}$ is just the set of all arithmetic

functions. If f is an arithmetic function not the zero function, f is said to


be multiplicative provided f (mn) = f (m)f (n) whenever (m, n) = 1, and to
be totally multiplicative provided f (mn) = f (m)f (n) for all m, n ∈ P. The
following examples will be of special interest to us here.

Example 1.12.1 I(1) = 1 and I(n) = 0 if n > 1.

Example 1.12.2 U (n) = 1 for all n ∈ P.

Example 1.12.3 E(n) = n for all n ∈ P.

Example 1.12.4 The omega function: ω(n) is the number of distinct primes
dividing n.

Example 1.12.5 The mu function: $\mu(n) = (-1)^{\omega(n)}$ if $n$ is square-free, and $\mu(n) = 0$ otherwise.

Example 1.12.6 Euler’s phi-function: φ(n) is the number of integers k, 1 ≤


k ≤ n, with (k, n) = 1.

The following additional examples often arise in practice.

Example 1.12.7 The Omega function: Ω(n) is the number of primes divid-
ing n counting multiplicity. So ω(n) = Ω(n) iff n is square-free.

Example 1.12.8 The tau function: τ (n) is the number of positive divisors
of n.

Example 1.12.9 The sigma function: σ(n) is the sum of the positive divi-
sors of n.

Example 1.12.10 A generalization of the sigma function: σk (n) is the sum


of the kth powers of the positive divisors of n.

Dirichlet (convolution) Product of Arithmetic Functions.


Def. If $f$ and $g$ are arithmetic functions, define the Dirichlet product $f * g$ by:
$$(f*g)(n) = \sum_{d|n} f(d) g(n/d) = \sum_{d_1 d_2 = n} f(d_1) g(d_2).$$

Obs. 1.12.11 f ∗ g = g ∗ f .
Obs. 1.12.12 If $f, g, h$ are arithmetic functions, $(f*g)*h = f*(g*h)$, and $[(f*g)*h](n) = \sum_{d_1 d_2 d_3 = n} f(d_1) g(d_2) h(d_3)$.

Obs. 1.12.13 I ∗ f = f ∗ I = f for all f . And I is the unique multiplicative


identity.
Obs. 1.12.14 An arithmetic function $f$ has a (necessarily unique) multiplicative inverse $f^{-1}$ iff $f(1) \ne 0$.
Proof: If $f * f^{-1} = I$, then $f(1) f^{-1}(1) = (f * f^{-1})(1) = I(1) = 1$, so $f(1) \ne 0$. Conversely, if $f(1) \ne 0$, then $f^{-1}(1) = (f(1))^{-1}$. Use induction on $n$. For $n > 1$, if $f^{-1}(1), f^{-1}(2), \ldots, f^{-1}(n-1)$ are known, $f^{-1}(n)$ may be obtained from $0 = I(n) = (f * f^{-1})(n) = \sum_{d|n} f(d) f^{-1}(n/d)$ for $n > 1$.

The following theorem has essentially been proved.


Theorem 1.12.15 The set of all arithmetic functions $f$ with $f(1) \ne 0$ forms a group under Dirichlet multiplication.
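The Dirichlet product and the inverse recursion from Obs. 1.12.14 can be sketched as follows (the dict-based representation and the names are ours; exact rational arithmetic avoids rounding):

```python
# Dirichlet convolution and Dirichlet inverse, for functions given as dicts
# on {1, ..., N}.
from fractions import Fraction

def conv(f, g, N):
    """(f*g)(n) = sum over d|n of f(d)g(n/d), for n = 1..N."""
    h = {}
    for n in range(1, N + 1):
        h[n] = sum(f[d] * g[n // d] for d in range(1, n + 1) if n % d == 0)
    return h

def inverse(f, N):
    """Dirichlet inverse by the recursion of Obs. 1.12.14 (requires f[1] != 0)."""
    g = {1: 1 / Fraction(f[1])}
    for n in range(2, N + 1):
        s = sum(f[d] * g[n // d] for d in range(2, n + 1) if n % d == 0)
        g[n] = -s / Fraction(f[1])
    return g
```

Inverting $U$ recovers the Möbius function $\mu$, anticipating Obs. 1.12.17 below.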
Theorem 1.12.16 The set of all multiplicative functions is a subgroup.
Proof: $f(1) \ne 0 \ne g(1)$ implies $(f*g)(1) \ne 0$. Associativity holds by Obs. 1.12.12. The identity $I$ is clearly multiplicative. So suppose $f, g$ are multiplicative. Let $(m, n) = 1$. Then
$$(f*g)(mn) = \sum_{d|mn} f(d) \, g\!\left(\frac{mn}{d}\right) = \sum_{d_1|m} \sum_{d_2|n} f(d_1 d_2) \, g\!\left(\frac{mn}{d_1 d_2}\right) = \sum_{d_1|m} \sum_{d_2|n} f(d_1) f(d_2) g(m/d_1) g(n/d_2)$$
$$= \sum_{d_1|m} f(d_1) g(m/d_1) \cdot \sum_{d_2|n} f(d_2) g(n/d_2) = (f*g)(m) \, (f*g)(n).$$
Finally, we need to show that if f is multiplicative, in which case f −1
exists, then also f −1 is multiplicative. Define g as follows. Put g(1) = 1,
and for every prime p and every j > 0 put g(pj ) = f −1 (pj ). Then extend
g multiplicatively for all n ∈ P. Since f and g are both multiplicative, so
is $f * g$. Then for any prime power $p^k$, $(f*g)(p^k) = \sum_{d_1 d_2 = p^k} f(d_1) g(d_2) = \sum_{d_1 d_2 = p^k} f(d_1) f^{-1}(d_2) = (f * f^{-1})(p^k) = I(p^k)$. So $f * g$ and $I$ coincide on prime powers and are multiplicative. Hence $f * g = I$, implying $g = f^{-1}$, i.e., $f^{-1}$ is multiplicative.
Clearly $\mu$ is multiplicative, and $\sum_{d|n} \mu(d) = 1$ if $n = 1$. For $n = p^e$, $\sum_{d|n} \mu(d) = \sum_{j=0}^{e} \mu(p^j) = 1 + (-1) + 0 + \cdots + 0 = 0$. Hence $\mu * U = I$ and we have proved the following:

Obs. 1.12.17 $\mu^{-1} = U$; $U^{-1} = \mu$.

Theorem 1.12.18 Möbius Inversion: F = U ∗ f iff f = µ ∗ F .

This follows from $\mu^{-1} = U$ and associativity. In its more usual form it appears as:
$$F(n) = \sum_{d|n} f(d) \ \ \forall n \in \mathbf{P} \quad \text{iff} \quad f(n) = \sum_{d|n} \mu(d) F(n/d) \ \ \forall n \in \mathbf{P}.$$

NOTE: Here we sometimes say F is the sum function of f . When F


and f are related this way it is interesting to note that F is multiplicative if
and only if f is multiplicative. For if f is multiplicative, then F = U ∗ f is
multiplicative. Conversely, if F = U ∗ f is multiplicative, then µ ∗ F = f is
also multiplicative.
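Möbius inversion is easy to spot-check with $f = \phi$ and $F = E$ (so $E = U * \phi$ and $\phi = \mu * E$); the helper functions below are our own simple implementations:

```python
# Check Moebius inversion: sum_{d|n} phi(d) = n  iff  phi(n) = sum_{d|n} mu(d)(n/d).
from math import gcd

def mu(n):
    """Moebius function by trial division."""
    out, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # square factor => mu = 0
            out = -out
        p += 1
    return -out if n > 1 else out

def phi(n):
    """Euler's phi by direct gcd-counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)
```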

Exercise: 1.12.19
1. $\tau = U * U$ is multiplicative, and $\tau(n) = \prod_{p^\alpha \| n} (\alpha + 1)$.

2. $\phi = \mu * E$ is multiplicative. (First show $E = \phi * U$.)

3. $\sigma = U * E$ is multiplicative and $\sigma(n) = \prod_{p^\alpha \| n} \left( \frac{p^{\alpha+1} - 1}{p - 1} \right)$.

4. φ ∗ τ = σ.

5. σ ∗ φ = E ∗ E.

6. $E^{-1}(n) = n \mu(n)$.

Sometimes it is useful to have even more structure on $C^{\mathbf{P}}$. For $f, g \in C^{\mathbf{P}}$, define the sum of $f$ and $g$ as follows: $(f + g)(n) = f(n) + g(n)$.
Then a large part of the following theorem has already been proved and
the remainder is left as an exercise.

Theorem 1.12.20 With the above definitions of addition and convolution product, $(C^{\mathbf{P}}, +, *)$ is a commutative ring with unity $I$, and $f \in C^{\mathbf{P}}$ is a unit iff $f(1) \ne 0$.

Exercise: 1.12.21 For $g \in C^{\mathbf{P}}$, define $\hat{g} \in C^{\mathbf{P}}$ by $\hat{g}(n) = n g(n)$. Show that $g \mapsto \hat{g}$ is a ring automorphism. In particular, $\widehat{g^{-1}} = (\hat{g})^{-1}$.

Exercise: 1.12.22 In how many ways can a “necklace” with n beads be


formed out of beads labeled L, R, 1, 2, . . . , m so there is at least one L,
and the L’s and R’s alternate (so that the number of L’s is the same as the
number of R’s)?

1.13 Inclusion – Exclusion


Let $E$ be a set of $N$ objects. Let $a_1, a_2, \ldots, a_m$ be a set of $m$ properties that these objects may or may not have. In general these properties are not mutually exclusive. Let $A_i$ be the set of objects in $E$ that have property $a_i$. In fact, it could even happen that $A_i$ and $A_j$ are the same set even when $i$ and $j$ are different. Let $N(a_i)$ be the number of objects that have the property $a_i$. Let $N(a_i')$ be the number of objects that do not have property $a_i$. Then $N(a_i a_j')$ denotes the number of objects that have property $a_i$ but do not have property $a_j$. It is easy to see how to generalize this notation and to establish
identities such as the following:

$$N = N(a_i) + N(a_i'); \qquad N = N(a_i a_j) + N(a_i a_j') + N(a_i' a_j) + N(a_i' a_j'). \qquad (1.15)$$

We now introduce some additional notation.

$$s_0 = N;$$
$$s_1 = N(a_1) + N(a_2) + \cdots + N(a_m) = \sum_i N(a_i);$$
$$s_2 = N(a_1 a_2) + N(a_1 a_3) + \cdots + N(a_{m-1} a_m) = \sum_{1 \le i < j \le m} N(a_i a_j);$$
$$s_3 = N(a_1 a_2 a_3) + \cdots + N(a_{m-2} a_{m-1} a_m) = \sum_{1 \le i < j < k \le m} N(a_i a_j a_k);$$
$$\vdots$$
$$s_m = N(a_1 a_2 \cdots a_m).$$

Also,

$$e_0 = N(a_1' a_2' \cdots a_m');$$
$$e_1 = N(a_1 a_2' a_3' \cdots a_m') + N(a_1' a_2 a_3' \cdots a_m') + \cdots + N(a_1' a_2' \cdots a_{m-1}' a_m);$$
$$\vdots$$
$$e_m = N(a_1 a_2 \cdots a_m).$$

In other words, ei is the number of objects that have exactly i properties.

Theorem 1.13.1 For $0 \le r \le m$, we have
$$e_r = \sum_{j=0}^{m-r} (-1)^j \binom{r+j}{j} s_{r+j}.$$

Proof: Clearly, if an object has fewer than $r$ of the properties, then it contributes zero to both sides of the equation. Suppose an object has exactly $r$ of the properties. Then it contributes exactly 1 to the left hand side. On the right hand side it contributes 1 to the term with $j = 0$ and it contributes 0 to all the other terms. Finally, suppose an object has exactly $r + k$ properties with $1 \le k \le m - r$, so it contributes 0 to the left hand side. On the right hand side it contributes exactly $\binom{r+k}{r+j}$ to $s_{r+j}$. So the total count it contributes to the right hand side is
$$\sum_{j=0}^{m-r} (-1)^j \binom{r+j}{j} \binom{r+k}{r+j} = \sum_{j=0}^{k} (-1)^j \binom{r+j}{j} \binom{r+k}{r+j},$$
since the terms with $j > k$ vanish. Notice that
$$\binom{r+j}{j} \binom{r+k}{r+j} = \frac{(r+j)!}{r! \, j!} \cdot \frac{(r+k)!}{(r+j)! \, (k-j)!} = \frac{(r+k)!}{r! \, k!} \cdot \frac{k!}{j! \, (k-j)!} = \binom{r+k}{r} \binom{k}{j}.$$
Thus the total count on the right hand side is
$$\binom{r+k}{r} \left( \sum_{j=0}^{k} (-1)^j \binom{k}{j} \right) = 0.$$
This concludes the proof.
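Theorem 1.13.1 is easy to verify by brute force on a small hand-made instance (the particular objects below are our own example, each object recorded as the set of properties it has):

```python
# Brute-force check of e_r = sum_j (-1)^j C(r+j, j) s_{r+j} for m = 3 properties.
from itertools import combinations
from math import comb

objects = [set(), {0}, {0, 1}, {0, 1}, {1, 2}, {0, 1, 2}, {2}, set()]
m = 3

def s(r):
    """s_r: over all r-subsets T of properties, count objects having all of T."""
    return sum(sum(1 for o in objects if set(T) <= o)
               for T in combinations(range(m), r))

def e(r):
    """e_r: number of objects with exactly r properties."""
    return sum(1 for o in objects if len(o) == r)
```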


Put $S(x) = \sum_{i=0}^{m} s_i x^i$ and $E(x) = \sum_{i=0}^{m} e_i x^i$.
Using Theorem 1.13.1 we see that
$$E(x) = \sum_{r=0}^{m} e_r x^r = \sum_{r=0}^{m} \left[ \sum_{j=0}^{m-r} (-1)^j \binom{r+j}{j} s_{r+j} \right] x^r = \sum_{\substack{0 \le r \le m \\ 0 \le j \le m-r}} (-1)^j \binom{r+j}{j} x^r s_{r+j}$$
$$= \sum_{k=0}^{m} s_k \sum_{r=0}^{k} (-1)^{k-r} \binom{k}{k-r} x^r = \sum_{k=0}^{m} s_k (x-1)^k = S(x-1),$$
where we have set $k = r + j$.
This proves the following:

Theorem 1.13.2 E(x) = S(x − 1).

Of course, it now follows that $E(x+1) = S(x)$, from which we easily deduce the following results:
$$s_j = \sum_{k=j}^{m} \binom{k}{j} e_k. \qquad (1.16)$$
$$E(0) = S(-1) = \sum_{i=0}^{m} (-1)^i s_i. \qquad (1.17)$$
$$\frac{1}{2}[E(1) + E(-1)] = \sum_i e_{2i} = \frac{1}{2}\left[ s_0 + \sum_{j=0}^{m} (-2)^j s_j \right]. \qquad (1.18)$$
$$\frac{1}{2}[E(1) - E(-1)] = \sum_i e_{2i+1} = \frac{1}{2}\left[ s_0 - \sum_{j=0}^{m} (-2)^j s_j \right]. \qquad (1.19)$$

Equation 1.17 is the traditional inclusion-exclusion principle. Equation 1.18


gives the number of objects having an even number of properties, and Equa-
tion 1.19 gives the number of objects having an odd number of properties.
We can also easily find a formula for the number of objects having at least t
of the properties.
Exercise: 1.13.3 For $k \ge 0$, $t \ge 0$, show that $\sum_{j=0}^{k} (-1)^j \binom{t+k}{j} = (-1)^k \binom{t+k-1}{k}$.
Theorem 1.13.4 The number of objects having at least $t$ of the $m$ properties is given by
$$\sum_{r \ge t} e_r = \sum_{j=0}^{m-t} (-1)^j \binom{t+j-1}{t-1} s_{t+j}.$$

Proof: The proof of this result amounts to collecting terms appropriately in the left hand sum and observing that the coefficient on $s_{t+j}$ is $\sum_{i=0}^{j} (-1)^i \binom{t+j}{i}$, which by Ex. 1.13.3 is equal to $(-1)^j \binom{t+j-1}{t-1}$. In detail, and using Theorem 1.13.1,
$$\sum_{r=t}^{m} e_r = \sum_{r=t}^{m} \sum_{j=0}^{m-r} (-1)^j \binom{r+j}{j} s_{r+j}.$$
Here $t$ and $m$ are fixed with $0 \le t \le m$, and $r$ and $j$ are dummy variables with $t \le r \le m$ and $0 \le j \le m-r$. We introduce new dummy variables $k$ and $i$ by the invertible substitution $r = t+i$, $j = k-i$. The constraints on $k$ and $i$ are $0 \le i \le k \le m-t$. So continuing, we have
$$\sum_{r=t}^{m} e_r = \sum_{0 \le i \le k \le m-t} (-1)^{k-i} \binom{t+k}{k-i} s_{t+k} = \sum_{k=0}^{m-t} \left[ \sum_{i=0}^{k} (-1)^{k-i} \binom{t+k}{k-i} \right] s_{t+k}$$
$$= \sum_{k=0}^{m-t} \left[ \sum_{j=0}^{k} (-1)^j \binom{t+k}{j} \right] s_{t+k} = \sum_{k=0}^{m-t} (-1)^k \binom{t+k-1}{k} s_{t+k},$$
and $\binom{t+k-1}{k} = \binom{t+k-1}{t-1}$, which is the formula of the theorem.

If we replace each property ai with its negation a0i , and use the notation
sˆi , Ŝ(x), eˆi , etc., for the corresponding analogues of si , ei , etc., we can see
how to write Ŝ(x) in terms of the si .

Theorem 1.13.5

$$\hat{S}(x) = \sum_{r=0}^{m} \left[\sum_{k} (-1)^k \binom{m-k}{m-r} s_k\right] x^r.$$
 
Proof: It is clear that $\hat{e}_i = e_{m-i}$, i.e., $\hat{E}(x)$ is the reverse $x^m E\left(\frac{1}{x}\right)$ of
$E(x)$. Recall that $S(x) = E(x+1)$. Then

$$\hat{S}(x) = \hat{E}(x+1) = (x+1)^m E\left(\frac{1}{x+1}\right) = (x+1)^m E\left(1 + \frac{-x}{x+1}\right)$$
$$= (x+1)^m S\left(\frac{-x}{x+1}\right) = \sum_{k=0}^{m} (-x)^k (x+1)^{m-k} s_k.$$

The coefficient of $x^r$ in this expression is

$$[x^r]\left\{\sum_k (-x)^k (x+1)^{m-k} s_k\right\} = \sum_k [x^{r-k}]\left\{(-1)^k (x+1)^{m-k} s_k\right\}
= \sum_k (-1)^k \binom{m-k}{r-k} s_k,$$

from which the theorem follows, since $\binom{m-k}{r-k} = \binom{m-k}{m-r}$.
Application to derangements: Let $D_n$ be the number of permutations
$\sigma = b_1 \cdots b_n$ of 1, 2, . . . , n for which $b_i \ne i$ for all i, i.e., $D_n$ is the number
of derangements of n things. Here the N objects are the n! permutations.
The property $a_i$ is defined by: the permutation $\sigma$ has property $a_i$ provided
$b_i = i$. Then $N(a_{i_1} \cdots a_{i_r}) = (n-r)!$, and $s_j = \binom{n}{j}(n-j)! = \frac{n!}{j!}$. So

$$e_0 = \sum_{p=0}^{n} (-1)^p \frac{n!}{p!} = n! \sum_{p=0}^{n} \frac{(-1)^p}{p!}.$$
Problème des Rencontres: Let $D_{n,r}$ be the number of permutations
$\sigma = b_1 \cdots b_n$ with exactly r fixed elements, i.e., $b_j = j$ for exactly r values of j.
Choose the r fixed symbols in $\binom{n}{r}$ ways, and multiply by the number $D_{n-r}$ of
derangements on the remaining n − r elements:

$$D_{n,r} = \frac{n!}{r!(n-r)!}\left[(n-r)!\left(\frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \cdots \pm \frac{1}{(n-r)!}\right)\right].$$

Or,

$$D_{n,r} = \sum_{p=0}^{n-r} (-1)^p \binom{r+p}{r} \frac{n!}{(r+p)!} = \frac{n!}{r!} \sum_{p=0}^{n-r} (-1)^p \frac{1}{p!}.$$
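These two formulas are easy to check numerically. The following Python sketch (illustrative only; the function names are mine, not the text's) evaluates the formulas for $D_n$ and $D_{n,r}$ in exact integer arithmetic and compares them against a brute-force count over all permutations:

```python
from itertools import permutations
from math import comb, factorial

def d_formula(n):
    # D_n = e_0 = n! * sum_{p=0}^{n} (-1)^p / p!, in exact integer arithmetic
    return sum((-1) ** p * (factorial(n) // factorial(p)) for p in range(n + 1))

def d_fixed(n, r):
    # D_{n,r} = C(n,r) * D_{n-r}: choose the r fixed symbols, derange the rest
    return comb(n, r) * d_formula(n - r)

def count_brute(n, r):
    # direct count of permutations of {0,...,n-1} with exactly r fixed points
    return sum(1 for s in permutations(range(n))
               if sum(s[i] == i for i in range(n)) == r)
```

For instance, d_formula(4) returns 9 and d_fixed(4, 1) returns 8, in agreement with the brute-force counts.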
Application to Euler's Phi function: Let $n = p_1^{a_1} \cdots p_r^{a_r}$ be the prime
power factorization of the positive integer n. Apply the Inclusion–Exclusion
Principle with E = [n] = {1, . . . , n}, and let $a_i$ be the property (of a positive
integer) that it is divisible by $p_i$, 1 ≤ i ≤ r. This yields

$$\phi(n) = n - \sum_{i=1}^{r} \frac{n}{p_i} + \sum_{1 \le i < j \le r} \frac{n}{p_i p_j} - \cdots = n \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right).$$
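As a quick sanity check, here is a small Python sketch (my own illustration, not part of the text) that evaluates the product formula for $\phi(n)$ in integer arithmetic and compares it with the definition of $\phi$ as a gcd count:

```python
from math import gcd

def phi(n):
    # n * prod_{p | n} (1 - 1/p) over distinct primes p, kept integral by
    # dividing by each prime before multiplying by (p - 1)
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result = result // p * (p - 1)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                      # one prime factor > sqrt(n) may remain
        result = result // m * (m - 1)
    return result

def phi_naive(n):
    # definition: count of 1 <= k <= n with gcd(k, n) = 1
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)
```

For example, phi(12) returns 4, matching the four units 1, 5, 7, 11 modulo 12.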

1.14 Rook Polynomials


Let C be an n × m matrix, 1 ≤ n ≤ m, each of whose entries is a 0 or a 1.
A line of C is a row or column of C. An independent k-set of C is a set of k
1's of C with no two on the same line. Given the matrix C, for 0 ≤ k ≤ n we
let $r_k(C) = r_k$ be the number of independent k-sets of C. A diagonal of C is
a set of n entries of C with no two in the same line, i.e., a set of n entries of
C with one in each row and no two in the same column. Let E be the set of
diagonals of C and let N = |E| = m(m − 1) · · · (m − n + 1) be the number
of diagonals of C. Let $a_j$ be the property (that a diagonal may or may not
have) that the entry in row j of the diagonal is a 1.
If we select an independent j-set of C (in $r_j$ ways) and then in the
remaining n − j rows select n − j entries in (m − j)(m − j − 1) · · · (m − n + 1)
ways, we see that using the notation of the inclusion-exclusion principle we
have

$$s_j = r_j \cdot (m-j)(m-j-1)\cdots(m-n+1) = \frac{(m-j)!}{(m-n)!} \cdot r_j. \qquad (1.20)$$

The number of diagonals of C with exactly j 1's is clearly the same as
the number of diagonals of C with exactly n − j 0's. If J is the n × m (0,1)-
matrix each entry of which is a 1, then $\hat{C} = J - C$ is the complement of C.
If $E(x) = \sum_{i=0}^{n} e_i x^i$, where $e_i$ is the number of diagonals of C having exactly
i 1's of C, then the reverse polynomial is given by

$$\hat{E}(x) = x^n E\left(\frac{1}{x}\right) = \sum_{i=0}^{n} \hat{e}_i x^i,$$

where $\hat{e}_i$ is the number of diagonals of C with exactly i 0's of C. Clearly
$\hat{e}_i = e_{n-i}$.

$$\sum_k \frac{(m-k)!}{(m-n)!} \hat{r}_k x^k = \sum_k \hat{s}_k x^k = \hat{S}(x)$$
$$= \hat{E}(x+1) = (x+1)^n E\left(\frac{1}{x+1}\right) = (x+1)^n E\left(1 + \frac{-x}{x+1}\right)$$
$$= (x+1)^n S\left(\frac{-x}{x+1}\right) = \frac{(x+1)^n}{(m-n)!} \sum_i r_i (m-i)! \left(\frac{-x}{x+1}\right)^i$$
$$= \sum_i r_i \cdot \frac{(m-i)!}{(m-n)!} (-x)^i (x+1)^{n-i}.$$

The coefficient of $x^k$ in the first term of this sequence of equal expressions is
clearly $\frac{(m-k)!}{(m-n)!}\hat{r}_k$. The coefficient of $x^k$ in the last term is

$$[x^k]\left\{\sum_i r_i \frac{(m-i)!}{(m-n)!} (-x)^i (x+1)^{n-i}\right\}
= \sum_i [x^{k-i}]\left\{(-1)^i r_i \frac{(m-i)!}{(m-n)!} (x+1)^{n-i}\right\}$$
$$= \sum_i (-1)^i \frac{(m-i)!}{(m-n)!} \binom{n-i}{k-i} r_i.$$

Hence we have established the following theorem.

Theorem 1.14.1

$$\hat{r}_k = \sum_i (-1)^i \frac{(m-i)!}{(m-k)!} \binom{n-i}{n-k} r_i.$$

There is a special case that is often used.

Corollary 1.14.2 Suppose m = n. Then

$$\hat{r}_n = \sum_i (-1)^i (n-i)! \cdot r_i.$$

For a given n × m (0,1)-matrix C we continue to let $r_k$ denote the number
of independent k-sets of C, and let

$$R_C(x) = R(x) = \sum_{k=0}^{n} r_k x^k$$

be the ordinary generating function of the sequence $(r_0, r_1, r_2, \ldots)$. Then
R(x) is the rook polynomial of the given matrix.

If C is the direct sum of two (0, 1)-matrices $C_1$ and $C_2$, i.e., no line of C
contains 1's in both $C_1$ and $C_2$, it is easy to see that the independent sets of
$C_1$ are completely independent of the independent sets of $C_2$. It follows that
$r_k(C) = \sum_{j=0}^{k} r_j(C_1) r_{k-j}(C_2)$, and hence

$$R_C(x) = R_{C_1}(x) R_{C_2}(x). \qquad (1.21)$$

It is also easy to see that if some one line of C contains all the 1's of C,
then $R_C(x) = 1 + ax$, where a is the number of 1's of C.

Suppose that in a given matrix C an entry $1_{ij}$ (in row i and column j) is
selected and marked as a special entry. Let C′ denote the matrix obtained by
deleting row i and column j of the matrix C, and let C″ denote the matrix
obtained by replacing the entry $1_{ij}$ of C with a 0. Then the independent
k-sets of C are naturally divided into two classes: those that have a 1 in row
i and column j and those that do not. The number of independent k-sets of
the first type is $r_{k-1}(C')$ and the number of the second type is $r_k(C'')$. Hence
we have the relation

$$r_k(C) = r_{k-1}(C') + r_k(C'').$$

It is now easy to see that we have

$$R_C(x) = x R_{C'}(x) + R_{C''}(x). \qquad (1.22)$$


Equation 1.22 is called the expansion formula. The rook polynomial of a
(0, 1)-matrix of arbitrary size and shape may be found by repeated applications
of the expansion formula. To facilitate giving an example of this, let

$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1' \end{pmatrix}$$

denote the rook polynomial of the displayed matrix, where the 1′ indicates the
entry about which the expansion formula is about to be applied. Then by
the expansion formula we have (since we write the matrix to mean its rook
polynomial)

$$R_C(x) = x \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1' \end{pmatrix}
+ \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1' & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

$$= x\left[ x \begin{pmatrix} 1 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1' & 0 \end{pmatrix} \right]
+ \left[ x \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
+ \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \right]$$

$$= x^2(1 + 2x) + x\left[ x \begin{pmatrix} 1 & 0 \end{pmatrix}
+ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \right] + x(1+x)^2 + (1+2x)^2$$

$$= x^2 + 2x^3 + x^2(1+x) + x(1+2x) + x + 2x^2 + x^3 + 1 + 4x + 4x^2$$

$$= 4x^3 + 10x^2 + 6x + 1.$$
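The expansion formula translates directly into a recursive procedure. The Python sketch below (an illustration of Equation 1.22; the names are of my choosing, not the text's) returns the coefficient list $[r_0, r_1, \ldots]$ and reproduces the example just computed:

```python
def rook_poly(M):
    """Rook polynomial coefficients [r_0, r_1, ...] of a 0-1 matrix,
    computed with the expansion formula R_C(x) = x*R_C'(x) + R_C''(x)."""
    for i, row in enumerate(M):
        for j, entry in enumerate(row):
            if entry == 1:
                # C': delete row i and column j
                Cp = [r[:j] + r[j+1:] for k, r in enumerate(M) if k != i]
                # C'': replace the chosen 1 by a 0
                Cpp = [r[:] for r in M]
                Cpp[i][j] = 0
                a, b = rook_poly(Cp), rook_poly(Cpp)
                out = [0] * (1 + max(len(a), len(b) - 1))
                for k, c in enumerate(a):
                    out[k + 1] += c        # x * R_C'(x)
                for k, c in enumerate(b):
                    out[k] += c            # + R_C''(x)
                while len(out) > 1 and out[-1] == 0:
                    out.pop()              # trim trailing zero coefficients
                return out
    return [1]  # no 1's at all: only the empty independent set
```

On the 3 × 4 example matrix above, rook_poly returns [1, 6, 10, 4], i.e., $1 + 6x + 10x^2 + 4x^3$.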

Exercise: Compute the rook polynomials of the following matrices:

a. $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.

b. $\begin{pmatrix} 0 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}$.

(Answer: $1 + 6x + 7x^2 + x^3$.)

c. $\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 \end{pmatrix}$.

1.15 Permutations with Forbidden Positions


Consider the distribution of four distinct objects, labeled a, b, c, and d, into
four distinct positions, labeled 1, 2, 3, and 4, with no two objects occupying
the same position. A distribution can be represented in the form of a matrix
as illustrated below, where the rows correspond to the objects and the
columns correspond to the positions. A 1 in a cell indicates that the object
in the row containing the cell occupies the position in the column containing
the cell. Thus, the distribution shown in the figure is as follows: a is placed
in the second position, b is placed in the fourth position, c is placed in the
first position, and d is placed in the third position.

$$\begin{matrix} a \\ b \\ c \\ d \end{matrix}\;
\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

Since an object cannot be placed in more than one position and a position
cannot hold more than one object, in the matrix representation of an acceptable
distribution there will never be more than one 1 in a row or column.
Hence an acceptable distribution is equivalent to an independent 4-set of 1's.

We can extend this notion to the case where there are forbidden positions
for each of the objects. For example, for the derangement of four objects,
the forbidden positions are just those along the main diagonal, i.e., the positions
of the 4 × 4 identity matrix I. Also, it is easy to see that $r_k(I) = \binom{4}{k}$, so that
$R_I(x) = (1+x)^4$. Hence the problem of enumerating the number of derangements
of four objects is equivalent to the problem of finding the value of $\hat{r}_4$ for
the complementary matrix

$$\begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}.$$

By Theorem 1.14.1,

$$\hat{r}_4 = \sum_{i=0}^{4} (-1)^i \frac{(4-i)!}{(4-4)!} \binom{4-i}{4-4} r_i
= \sum_{i=0}^{4} (-1)^i (4-i)! \binom{4}{i} = 4! \sum_{i=0}^{4} \frac{(-1)^i}{i!}.$$

Of course, this agrees with the usual formula.
Nontaking Rooks: A chess piece called a rook can capture any opponent's
piece in the same row or column of the given rook (provided there are
no intervening pieces). Instead of using a normal 8 × 8 chessboard, suppose
we "play chess" on the "board" consisting solely of those positions of an n × m
(0,1)-matrix where the 1's appear. Counting the number of ways to place k
mutually nontaking rooks on this board of entries equal to 1 is equivalent to
our earlier problem of counting the number of independent k-sets of 1's in
the matrix. Consider the example represented by the following matrix:

$$B = \begin{pmatrix}
1 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 \\
1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 \end{pmatrix}$$

Thus $r_k(B)$ counts the number of ways k nontaking rooks can be placed
in those entries of B equal to 1. The 5 × 5 matrix B could be considered to
have arisen from a job assignment problem. The rows correspond to workers,
the columns to jobs, and the (i, j) entry is a 1 provided worker i is suitable
for job j. We wish to determine the number of ways in which each worker can
be assigned to one job, no more than one worker per job, so that a worker
only gets a job to which he or she is suited. It is easy to see that this is
equivalent to the problem of computing $r_5(B)$. Since there are several more
1's than 0's, it might be easier to deal with the complementary matrix

$$B' = \begin{pmatrix}
0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
Easy Exercise: Show that if the matrix C′ is obtained from matrix C
by deleting rows or columns with no entries equal to 1, then $r_k(C') = r_k(C)$.

Let B″ be the matrix obtained by deleting column 1 and row 4 from B′.
So

$$B'' = \begin{pmatrix}
0 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 \end{pmatrix}$$

By Equation 1.21, we see that $R_{B''}(x) = R_{C_1}(x) \cdot R_{C_2}(x)$, where

$$C_1 = C_2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

We easily compute $R_{C_1}(x) = 1 + 3x + x^2$, so

$$R_{B'}(x) = R_{B''}(x) = (1 + 3x + x^2)^2 = 1 + 6x + 11x^2 + 6x^3 + x^4.$$

Now using (the dual of) Theorem 1.14.1 we find

$$r_5(B) = \sum_i (-1)^i \frac{(5-i)!}{(5-5)!} \binom{5-i}{5-5} \hat{r}_i
= \sum_i (-1)^i (5-i)! \, \hat{r}_i$$
$$= 5! \times 1 - 4! \times 6 + 3! \times 11 - 2! \times 6 + 1! \times 1 - 0! \times 0 = 31.$$
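Since the board here is only 5 × 5, the value $r_5(B) = 31$ can also be confirmed by exhaustion. A short Python sketch (illustrative only; not from the text):

```python
from itertools import permutations

# matrix B from the job-assignment example above
B = [[1, 1, 0, 0, 1],
     [1, 1, 1, 0, 1],
     [1, 0, 1, 1, 0],
     [1, 1, 1, 1, 1],
     [1, 1, 1, 1, 0]]

def count_full_placements(M):
    # independent n-sets of an n x n (0,1)-matrix: one 1 chosen in each row,
    # no two in the same column -- i.e., the permanent of M
    n = len(M)
    return sum(1 for p in permutations(range(n))
               if all(M[i][p[i]] == 1 for i in range(n)))
```

Here count_full_placements(B) returns 31, agreeing with the inclusion-exclusion computation.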

The next example is a 5 × 7 matrix B that arises from a problem of storing
computer programs. The (i, j) position is a 1 provided storage location j has
sufficient storage capacity for program i. We wish to assign each program to
a storage location with sufficient storage capacity, at most one program per
location. The number of ways this can be done is again given by $r_5(B)$.

$$B = \begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}$$

Exercise: Compute $r_5(B)$.
Problème des Ménages: A ménage is a permutation of 1, 2, . . . , n in which
i does not appear in position i or i + 1 (mod n). Let $P_n$ be the permutation
matrix with a 1 in position (i, i + 1) (mod n), 1 ≤ i ≤ n. So $P_n$ represents
the cycle (1, 2, 3, . . . , n). Then let $M_n = I_n + P_n$, and let $M_n(x)$ be the rook
polynomial of $M_n$. If $J_n$ is the n × n matrix of 1's, then $e_n(J_n - I_n - P_n)$ is
the number $U_n$ of ménages.

Let $M^*_n$ be obtained from $M_n$ by changing the 1 in position (n, 1) to a 0,
and let $M'_n$ be obtained from $M_n$ by changing both 1's of column 1 to 0's
(i.e., the 1's in positions (1, 1) and (n, 1) become 0). It should be clear after
a little thought (using the expansion formula and the fact that a matrix and
its transpose have the same rook polynomial) that

$$M_n(x) = M^*_n(x) + x M^*_{n-1}(x).$$
Since $M'_n$ has only zeros in its first column, $M'_n(x) = \bar{M}'_n(x)$, where

$$\bar{M}'_n = \begin{pmatrix}
1 & & & & \\
1 & 1 & & & \\
& 1 & 1 & & \\
& & \ddots & \ddots & \\
& & & 1 & 1 \\
& & & & 1 \end{pmatrix}$$

with 1's in positions (1, 1), (2, 1), (2, 2), (3, 2), (3, 3), . . . , (n − 1, n − 2), (n, n − 1).
Here $\bar{M}'_n$ is n × (n − 1). Now select the 1 in position (n, n − 1) to use the
expansion theorem for rook polynomials. So deleting the row and column
containing $1_{(n,n-1)}$, we get $\bar{M}'_{n-1}$. Also, replacing $1_{(n,n-1)}$ with a zero and
then removing the bottom row of 0's, we get $(M^*_{n-1})^T$. Since a matrix and
its transpose have the same rook polynomial, we have
$$M'_n(x) = x \cdot M'_{n-1}(x) + M^*_{n-1}(x). \qquad (1.23)$$

$M^*_n$ has one 1 in its bottom row, in position (n, n). Expand about this 1.
Deleting the row and column of this 1 gives $M^*_{n-1}$. Changing this 1 to a 0
and deleting the bottom row of zeros gives the transpose of $\bar{M}'_n$. Hence

$$M^*_n(x) = x \cdot M^*_{n-1}(x) + M'_n(x) = (x+1) M^*_{n-1}(x) + x \cdot M'_{n-1}(x). \qquad (1.24)$$

Rewrite the last two equations as a matrix equation:

$$\begin{pmatrix} M'_n(x) \\ M^*_n(x) \end{pmatrix}
= \begin{pmatrix} x & 1 \\ x & x+1 \end{pmatrix}
\begin{pmatrix} M'_{n-1}(x) \\ M^*_{n-1}(x) \end{pmatrix}. \qquad (1.25)$$

$$M^*_1 = (1) \longrightarrow M^*_1(x) = 1 + x.$$
$$M'_1 = (0) \longrightarrow M'_1(x) = 1.$$
$$M^*_2 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \longrightarrow M^*_2(x) = 1 + 3x + x^2.$$
$$M'_2 = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} \longrightarrow M'_2(x) = 1 + 2x.$$

Induction Hypothesis (for n ≥ 1):

$$M'_n(x) = \sum_{k=0}^{n-1} \binom{2(n-1)-k+1}{k} x^k; \qquad
M^*_n(x) = \sum_{k=0}^{n} \binom{2n-k}{k} x^k.$$

Then

$$\begin{pmatrix} x & 1 \\ x & x+1 \end{pmatrix}
\begin{pmatrix} M'_n(x) \\ M^*_n(x) \end{pmatrix}
= \begin{pmatrix}
\sum_{k=0}^{n-1} \binom{2(n-1)-k+1}{k} x^{k+1} + \sum_{k=0}^{n} \binom{2n-k}{k} x^k \\[4pt]
\sum_{k=0}^{n-1} \binom{2(n-1)-k+1}{k} x^{k+1} + \sum_{k=0}^{n} \binom{2n-k}{k} (x^{k+1} + x^k)
\end{pmatrix}$$

$$= \begin{pmatrix}
\sum_{k=1}^{n} \binom{2n-k}{k-1} x^k + \sum_{k=0}^{n} \binom{2n-k}{k} x^k \\[4pt]
\sum_{k=1}^{n} \binom{2n-k}{k-1} x^k + \sum_{k=1}^{n+1} \binom{2n-k+1}{k-1} x^k + \sum_{k=0}^{n} \binom{2n-k}{k} x^k
\end{pmatrix}.$$

By Pascal's identity the first coordinate is $\sum_{k=0}^{n} \binom{2n-k+1}{k} x^k = M'_{n+1}(x)$
and the second is $\sum_{k=0}^{n+1} \binom{2n-k+2}{k} x^k = M^*_{n+1}(x)$, completing the induction.
At this point we can easily compute that

$$M_n(x) = M^*_n(x) + x M^*_{n-1}(x) = \sum_{j=0}^{n} \frac{2n}{2n-j} \binom{2n-j}{j} x^j$$

$$\Longrightarrow r_j(I+P) = \frac{2n}{2n-j} \binom{2n-j}{j}
\Longrightarrow s_j(I+P) = \frac{2n}{2n-j} \binom{2n-j}{j} (n-j)!$$

$$\Longrightarrow e_n(J-I-P) = \hat{e}_n(I+P) = e_0(I+P)
= \sum_{j=0}^{n} (-1)^j \frac{2n}{2n-j} \binom{2n-j}{j} (n-j)! = U_n.$$
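The closed formula for $U_n$ is easy to test for small n. Below is a Python sketch (my own check, not part of the text) that evaluates the formula in exact integer arithmetic and compares it with a direct search over all permutations:

```python
from itertools import permutations
from math import comb, factorial

def menage_U(n):
    # U_n = sum_{j=0}^{n} (-1)^j * (2n/(2n-j)) * C(2n-j, j) * (n-j)!
    total = 0
    for j in range(n + 1):
        # the product below is always divisible by (2n - j), so // is exact
        term = 2 * n * comb(2 * n - j, j) * factorial(n - j) // (2 * n - j)
        total += -term if j % 2 else term
    return total

def menage_brute(n):
    # permutations s of {0,...,n-1} with s(i) != i and s(i) != i+1 (mod n)
    return sum(1 for s in permutations(range(n))
               if all(s[i] != i and s[i] != (i + 1) % n for i in range(n)))
```

For example, $U_3 = 1$, $U_4 = 2$, and $U_5 = 13$.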

1.16 Recurrence Relations: Ménage Numbers Again
The original Problème des Ménages was probably that formulated by
Lucas. This asks for the number of ways of seating n married couples at a
circular table with men and women in alternate positions and such that no
wife sits next to her husband. The wives may be seated first, and this may be
done in 2(n!) ways. Then each husband is excluded from the two seats beside
his wife, but the number of ways of seating the husbands is independent of
the seating arrangement of the wives. Thus if $M_n$ denotes the number of
seating arrangements for this version of the problème des ménages, it is clear
that

$$M_n = 2(n!) U_n.$$

Consequently we may concentrate our attention on the ménage numbers
$U_n$. The formula we derived using rook polynomials will now be obtained
using recursion techniques.
using recursion techniques.

Lemma 1.16.1 Let f(n, k) denote the number of ways of selecting k objects,
no two consecutive, from n objects arranged in a row. Then

$$f(n, k) = \binom{n-k+1}{k}. \qquad (1.26)$$

Proof: Clearly

$$f(n, 1) = \binom{n}{1} = n,$$

and for n > 1,

$$f(n, n) = \binom{1}{n} = 0.$$

Now let 1 < k < n. Split the selections into those that include the first
object and those that do not. Those that include the first object cannot
include the second object and are enumerated by f(n − 2, k − 1). Those that
do not include the first object are enumerated by f(n − 1, k). Hence we have
the recurrence

$$f(n, k) = f(n-1, k) + f(n-2, k-1). \qquad (1.27)$$

We may now prove Eq. 1.26 by strong induction on n. Our induction
hypothesis includes the assertions that

$$f(n-1, k) = \binom{n-k}{k}; \qquad f(n-2, k-1) = \binom{n-k}{k-1}.$$

These together with Eq. 1.27 clearly imply Eq. 1.26.

Lemma 1.16.2 Let g(n, k) denote the number of ways of selecting k objects,
no two consecutive, from n objects arranged in a circle. Then

$$g(n, k) = \frac{n}{n-k} \binom{n-k}{k} \qquad (n > k).$$

Proof: As before, split the selections into those that include the first
object and those that do not. The selections that include the first object
cannot include the second object or the last object and are enumerated by

$$f(n-3, k-1).$$

The selections that do not include the first object are enumerated by

$$f(n-1, k).$$

Hence

$$g(n, k) = f(n-1, k) + f(n-3, k-1),$$

and Lemma 1.16.2 is an easy consequence of Lemma 1.16.1.
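Both lemmas can be verified by exhaustive enumeration for small parameters; the Python sketch below (illustrative only, with names of my choosing) does exactly that:

```python
from itertools import combinations
from math import comb

def f(n, k):
    # Lemma 1.16.1: k non-consecutive objects chosen from n in a row
    return comb(n - k + 1, k)

def g(n, k):
    # Lemma 1.16.2: k non-consecutive objects from n in a circle (n > k);
    # n * C(n-k, k) is always divisible by (n - k), so // is exact
    return n * comb(n - k, k) // (n - k)

def brute_row(n, k):
    return sum(1 for c in combinations(range(n), k)
               if all(b - a > 1 for a, b in zip(c, c[1:])))

def brute_circle(n, k):
    # additionally forbid the pair {0, n-1}, consecutive on the circle
    return sum(1 for c in combinations(range(n), k)
               if all(b - a > 1 for a, b in zip(c, c[1:]))
               and not (0 in c and n - 1 in c))
```

For instance, f(6, 2) = 10 while g(6, 2) = 9, since the circle forbids one extra pair.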


Now return again to the consideration of the permutations of 1, 2, . . . , n.
Let $a_i$ be the property that a permutation has i in position i, 1 ≤ i ≤ n, and
let $b_i$ be the property that a permutation has i in position i + 1, 1 ≤ i ≤ n − 1,
with $b_n$ the property that the permutation has n in position 1. Now let the
2n properties be listed in a row:

$$a_1, b_1, a_2, b_2, \ldots, a_n, b_n.$$

Select k of these properties and ask for the number of permutations that
satisfy each of the k properties. The answer is 0 if the k properties are not
compatible. If they are compatible, then k images under the permutation
are fixed and there are (n − k)! ways to complete the permutation. Let $v_k$
denote the number of ways of selecting k compatible properties from the 2n
properties. Then by the classical inclusion-exclusion principle,

$$U_n = \sum_{i=0}^{n} (-1)^i v_i (n-i)!. \qquad (1.28)$$

It remains to evaluate $v_k$. But we see that if the 2n properties are arranged
in a circle, then only the consecutive ones are not compatible. Hence by
Lemma 1.16.2,

$$v_k = \frac{2n}{2n-k} \binom{2n-k}{k}, \qquad (1.29)$$

and this completes the proof.
Chapter 2

Systems of Representatives and


Matroids

2.1 The Theorem of Philip Hall


The material of this chapter does not belong to enumerative combinatorics,
but it is of such fundamental importance in the general field of combinatorics
that we feel impelled to include it.
Let S and I be arbitrary sets. For each i ∈ I let Ai ⊆ S. If ai ∈ Ai for all
i ∈ I, we say {ai : i ∈ I} is a system of representatives for A = (Ai : i ∈ I).
If in addition $a_i \ne a_j$ whenever $i \ne j$, even though $A_i$ may equal $A_j$, then
{ai : i ∈ I} is a system of distinct representatives (SDR) for A. Our first
problem is: Under what conditions does some family A of subsets of a set S
have an SDR?
For a finite collection of sets a reasonable answer was given by Philip Hall
in 1935. It is obvious that if A = (Ai : i ∈ I) has an SDR, then the union
of each k of the members of A = (Ai : i ∈ I) must have at least k elements.
Hall’s observation was that this obvious necessary condition is also sufficient.
We state the condition formally as follows:
Condition (H): Let I = [n] = {1, 2, . . . , n}, and let S be any (nonempty)
set. For each i ∈ I, let $S_i \subseteq S$. Then $A = (S_1, \ldots, S_n)$ satisfies Condition
(H) provided for each K ⊆ I, $|\cup_{k \in K} S_k| \ge |K|$.

Theorem 2.1.1 The family A = (S1 , . . . , Sn ) of finitely many (not neces-


sarily distinct) sets has an SDR if and only if it satisfies Condition (H).


Proof: As Condition (H) is clearly necessary, we now show that it is also
sufficient. $B_{r,s}$ denotes a block of r subsets $(S_{i_1}, \ldots, S_{i_r})$ belonging to A,
where $s = |\cup\{S_j : S_j \in B_{r,s}\}|$. So Condition (H) says: s ≥ r for each block
$B_{r,s}$. If s = r, $B_{r,s}$ is called a critical block. (By convention, the empty block
$B_{0,0}$ is critical.)

If $B_{r,s} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r)$ and
$B_{t,v} = (A_1, \ldots, A_u, D_{u+1}, \ldots, D_t)$, write $B_{r,s} \cap B_{t,v} = (A_1, \ldots, A_u)$ and
$B_{r,s} \cup B_{t,v} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r, D_{u+1}, \ldots, D_t)$. Here
the notation implies that $A_1, \ldots, A_u$ are precisely the subsets in both blocks.
Then write $B_{r,s} \cap B_{t,v} = B_{u,w}$, where $w = |\cup\{A_i : 1 \le i \le u\}|$, and
$B_{r,s} \cup B_{t,v} = B_{y,z}$, where $y = r + t - u$, $z = |\cup\{S_i : S_i \in B_{r,s} \cup B_{t,v}\}|$.

The proof will be by induction on the number n of sets in the family A,
but first we need two lemmas.

Lemma 2.1.2 If A satisfies Condition (H), then the union and intersection
of critical blocks are themselves critical blocks.

Proof of Lemma 2.1.2. Let Br,r and Bt,t be given critical blocks. Say Br,r ∩
Bt,t = Bu,v ; Br,r ∪ Bt,t = By,z . The z elements of the union will be the r + t
elements of Br,r and Bt,t reduced by the number of elements in both blocks,
and this latter number includes at least the v elements in the intersection:
z ≤ r + t − v. Also v ≥ u and z ≥ y by Condition (H). Note: y + u = r + t.
Hence r + t − v ≥ z ≥ y = r + t − u ≥ r + t − v, implying that equality holds
throughout. Hence u = v and y = z, as desired for the proof of Lemma 2.1.2.

Lemma 2.1.3 If $B_{k,k}$ is any critical block of A, the deletion of elements of
$B_{k,k}$ from all sets in A not belonging to $B_{k,k}$ produces a new family A′ in
which Condition (H) is still valid.

Proof of Lemma 2.1.3. Let $B_{r,s}$ be an arbitrary block, and $(B_{r,s})' = B'_{r,s'}$
the block after the deletion. We must show that s′ ≥ r. Let $B_{r,s} \cap B_{k,k} = B_{u,v}$
and $B_{r,s} \cup B_{k,k} = B_{y,z}$. Say

$$B_{r,s} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r),$$

$$B_{k,k} = (A_1, \ldots, A_u, D_{u+1}, \ldots, D_k).$$

So $B_{u,v} = (A_1, \ldots, A_u)$ and $B_{y,z} = (A_1, \ldots, A_u, C_{u+1}, \ldots, C_r, D_{u+1}, \ldots, D_k)$.
The deleted block $(B_{r,s})' = B'_{r,s'}$ is $(A_1, \ldots, A_u, C'_{u+1}, \ldots, C'_r)$. But $C'_{u+1}, \ldots, C'_r$,
as sets of the union $B_{y,z}$, contain at least z − k elements not in $B_{k,k}$. Thus
s′ ≥ v + (z − k) ≥ u + y − k = u + (r + k − u) − k = r. Hence s′ ≥ r, as
desired for the proof of Lemma 2.1.3.
As indicated above, for the proof of the main theorem we now use induction
on n. For n = 1 the theorem is obviously true.

Induction Hypothesis: Suppose the theorem holds (Condition (H) implies
that there is an SDR) for any family of m sets, 1 ≤ m < n.

We need to show the theorem holds for a system of n sets. So let 1 <
n, assume the induction hypothesis, and let $A = (S_1, \ldots, S_n)$ be a given
collection of subsets of S satisfying Condition (H).

First Case: There is some critical block $B_{k,k}$ with 1 ≤ k < n. Delete
the elements in the members of $B_{k,k}$ from the remaining subsets, to obtain
a new family $A' = B_{k,k} \cup B'_{n-k,v}$, where $B_{k,k}$ and $B'_{n-k,v}$ have no common
elements in their members. By Lemma 2.1.3, Condition (H) holds in A′, and
hence holds separately in $B_{k,k}$ and in $B'_{n-k,v}$ viewed as families of sets. By
the induction hypothesis, $B_{k,k}$ and $B'_{n-k,v}$ have (disjoint) SDR's whose union
is an SDR for A.

Remaining Case: There is no critical block for A except possibly the
entire system. Select any $S_j$ of A and then select any element of $S_j$ as its
representative. Delete this element from all remaining sets to obtain a family
A′. Hence a block $B_{r,s}$ with r < n becomes a block $B'_{r,s'}$ with s′ ∈ {s, s − 1}.
By hypothesis $B_{r,s}$ was not critical, so s ≥ r + 1 and s′ ≥ r. So Condition
(H) holds for the family A′ \ {$S_j$}, which by induction has an SDR. Add to
this SDR the element selected as a representative for $S_j$ to obtain an SDR
for A.
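Condition (H) and the resulting SDR can be checked mechanically for small families. The Python sketch below (illustrative only; an exponential brute force, not an efficient algorithm) tests Condition (H) directly and searches for an SDR by backtracking:

```python
from itertools import combinations

def hall_condition(sets):
    # Condition (H): every k of the sets together contain at least k elements
    n = len(sets)
    return all(len(set().union(*(sets[i] for i in c))) >= k
               for k in range(1, n + 1)
               for c in combinations(range(n), k))

def find_sdr(sets, chosen=()):
    # backtracking search for a system of distinct representatives
    if len(chosen) == len(sets):
        return list(chosen)
    for x in sets[len(chosen)]:
        if x not in chosen:
            res = find_sdr(sets, chosen + (x,))
            if res is not None:
                return res
    return None
```

By Theorem 2.1.1, find_sdr succeeds exactly when hall_condition holds.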
In the text by van Lint and Wilson, Theorem 5.3 gives a lower bound on
the number of SDR's for a family of sets that depends only on the sizes of
the sets. It is as follows.

Theorem 5.3 of van Lint and Wilson: Let $A = (S_0, S_1, \ldots, S_{n-1})$ be
a family of n sets that does have an SDR. Put $m_i = |S_i|$ and suppose that
$m_0 \le m_1 \le \cdots \le m_{n-1}$. Then the number of SDR's for A is greater than or
equal to

$$F_n(m_0, m_1, \ldots, m_{n-1}) := \prod_{i=0}^{n-1} (m_i - i)^*,$$

where $(a)^* := \max\{1, a\}$.

They leave as an exercise the problem of showing that this is the best
possible lower bound depending only on the sizes of the sets.
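The bound is easy to experiment with. The Python sketch below (my own illustration, not from either text) counts all SDR's of a small family by backtracking and compares the count with $F_n$:

```python
def count_sdrs(sets):
    # number of systems of distinct representatives, by backtracking
    def go(i, used):
        if i == len(sets):
            return 1
        return sum(go(i + 1, used | {x}) for x in sets[i] if x not in used)
    return go(0, frozenset())

def vlw_bound(sets):
    # F_n = prod (m_i - i)^*, with sizes sorted increasingly, (a)^* = max(1, a)
    out = 1
    for i, m in enumerate(sorted(len(s) for s in sets)):
        out *= max(1, m - i)
    return out
```

For the family ({1,2}, {2,3}, {3,4,5}) there are 7 SDR's, while the bound gives 2 · 1 · 1 = 2.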

Exercise: 2.1.4 Let A = (A1 , . . . , An ) be a family of subsets of {1, . . . , n}.


Suppose that the incidence matrix of the family is invertible. Show that the
family has an SDR.

Exercise: 2.1.5 Prove the following generalization of Hall’s Theorem:


Let A = (A1 , . . . , An ) be a family of subsets of X that satisfies the follow-
ing property: There is an integer r with 0 ≤ r < n for which the union of
each subfamily of k subsets of A, for all k with 0 ≤ k ≤ n, has at least k − r
elements. Then there is a subfamily of size n − r which has an SDR. (Hint:
Start by adding r “dummy” elements that belong to all the sets.)

Exercise: 2.1.6 Let G be a (finite, undirected, simple) graph with vertex set
V . Let C = {Cx : x ∈ V } be a family of sets indexed by the vertices of
G. For X ⊆ V , let CX = ∪x∈X Cx . A set X ⊆ V is C-colorable if one can
assign to each vertex x ∈ X a “color” $c_x \in C_x$ so that $c_x \ne c_y$ whenever x
and y are adjacent in G. Prove that if |CX | ≥ |X| whenever X induces a
connected subgraph of G, then V is C-colorable. (In the current literature of
graph theory, the sets assigned to the vertices are called lists, and the desired
proper coloring of G chosen from the lists is a list coloring of G. When G is
a complete graph, this exercise gives precisely Hall's Theorem on SDR's. A
current research topic in graph theory is the investigation of modifications of
this condition that suffice for the existence of list colorings.)

Exercise: 2.1.7 With the same notation of the previous exercise, prove that
if every proper subset of V is C-colorable and |CV | ≥ |V |, then V is C-
colorable.

We now interpret the SDR problem as one on matchings in bipartite


graphs. Let G = (X, Y, E) be a bipartite graph. For each S ⊆ X, let N (S)
denote the set of elements of Y connected to at least one element of S by
an edge, and put δ(S) = |S| − |N (S)|. Put δ(G) = max{δ(S) : S ⊆ X}.
Since δ(∅) = 0, clearly δ(G) ≥ 0. Then Hall’s theorem states that G has an
X-saturating matching if and only if δ(G) = 0.

Theorem 2.1.8 G has a matching of size t (or larger) if and only if t ≤


|X| − δ(S) for all S ⊆ X.

Proof: First note that Hall’s theorem says that G has a matching of size
t = |X| if and only if δ(S) ≤ 0 for all S ⊆ X iff |X| ≤ |X| − δ(S) for
all S ⊆ X. So our theorem is true in case t = |X|. Now suppose that
t < |X|. Form a new graph G0 = (X, Y ∪ Z, E 0 ) by adding new vertices
Z = {z1 , . . . , z|X|−t } to Y , and join each zi to each element of X by an edge
of G0 .
If G has a matching of size t, then G0 has a matching of size |X|, implying
that for all S ⊆ X,

|S| ≤ |N 0 (S)| = |N (S)| + |X| − t,

implying

|N (S)| ≥ |S| − |X| + t = t − (|X| − |S|) = t − |X \ S|.

This is also equivalent to t ≤ |X| − (|S| − |N (S)|) = |X| − δ(S).


Conversely, suppose |N (S)| ≥ t−|X \S| = t−(|X|−|S|). Then |N 0 (S)| =
|N (S)| + |X| − t ≥ (t − |X| + |S|) + |X| − t = |S|. By Hall’s theorem, G0 has
an X-saturating matching M . At most |X| − t edges of M join X to Z, so
at least t edges of M are from X to Y .
Note that t ≤ |X| − δ(S) for all S ⊆ X iff t ≤ minS⊆X (|X| − δ(S)) =
|X| − maxS⊆X δ(S) = |X| − δ(G).

Corollary 2.1.9 The largest matching of G has size |X| − δ(G) = m(G),
i.e., m(G) + δ(G) = |X|.
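For small bipartite graphs the identity m(G) + δ(G) = |X| can be verified by brute force; the Python sketch below (illustrative only, exponential in the input size) computes both sides independently:

```python
from itertools import combinations

def max_matching(E):
    # largest set of edges, no two sharing an endpoint (brute-force search)
    edges = list(E)
    for k in range(len(edges), 0, -1):
        for c in combinations(edges, k):
            if len({x for x, _ in c}) == k and len({y for _, y in c}) == k:
                return k
    return 0

def deficiency(X, E):
    # delta(G) = max over S subset of X of |S| - |N(S)|
    d = 0
    for k in range(len(X) + 1):
        for S in combinations(X, k):
            N = {y for (x, y) in E if x in S}
            d = max(d, len(S) - len(N))
    return d
```

With X = {1, 2, 3} and edges (1, a), (2, a), (3, b), the maximum matching has size 2 and the deficiency is 1, so the two quantities sum to |X| = 3 as Corollary 2.1.9 asserts.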

2.2 An Algorithm for SDR’s


Suppose sets $S_1, \ldots, S_n$ are given and we have picked an SDR $A_r = \{a_1, \ldots, a_r\}$
of $S_1, \ldots, S_r$ in any way. Here is how to find an SDR for $S_1, \ldots, S_r, S_{r+1}$, or
to determine that $S_1, \ldots, S_r, S_{r+1}$ does not satisfy Hall's Condition (H).

Construct ordered sets $T_1, T_2, \ldots$, etc., as follows. Put $T_1 = S_{r+1} =
\{b_1, \ldots, b_t\}$. If some $b_i$ is not yet used in $A_r$, let $a_{r+1} = b_i$. Otherwise,
assume that all the elements $b_1, \ldots, b_t$ are already in $A_r$. Form $T_2$ as follows.
First, let $S(b_1)$ denote the $S_j$ for which $b_1 = a_j$. Then

$$T_2 = \{\hat{b}_1, b_2, \ldots, b_t; b_{t+1}, \ldots, b_s\},$$

where $b_{t+1}, \ldots, b_s$ are the elements in $S(b_1)$ not already in $T_1$.

If some one of $b_{t+1}, \ldots, b_s$ is not in $A_r$, use it to represent $S(b_1)$ and use
$b_1$ to represent $S_{r+1}$. Leave the other $a_i$'s as before.

Each list $T_j$ looks like $T_j = \{\hat{b}_1, \hat{b}_2, \ldots, \hat{b}_k, b_{k+1}, \ldots, b_m\}$. If all members
of $T_j$ are in $A_r$, construct

$$T_{j+1} = \{\hat{b}_1, \ldots, \hat{b}_k, \hat{b}_{k+1}, b_{k+2}, \ldots, b_m, \text{(list here any members of } S(\hat{b}_{k+1})
\text{ not already listed)}\}.$$

If some $b_{m+s}$, s > 0, is not in $A_r$, let $b_{m+s}$ represent $S(\hat{b}_{k+1})$. Then if
$b_{k+1} \in S(b_j) \setminus S(b_{j-1})$, let $b_{k+1}$ represent $S(b_j)$. And if $b_j \in S(b_i) \setminus S(b_{i-1})$,
let $b_j$ represent $S(b_i)$. Then $b_i \in S(b_u) \setminus S(b_{u-1})$; let $b_i$ represent $S(b_u)$.
Eventually, working down subscripts, some $b_p \in S(b_j)$ with $b_j \in T_1$. Let $b_j$
represent $S_{r+1}$, and let $b_p$ represent $S(b_j)$. (Each $b_j$ is in some $S(b_i)$ with
i < j.)

Exercise: 2.2.1 You have seven employees P1 , . . . , P7 and seven jobs J1 , . . . , J7 .


You ask each employee to select two jobs. They select jobs as given below.
You start by assigning to P1 the job J2 and to P2 the job J6 . Illustrate our
algorithm for producing systems of distinct representatives (when they exist)
to complete this job assignment (if it is possible).
P1 selects jobs numbered 1 and 2.
P2 selects jobs numbered 5 and 6.
P3 selects jobs numbered 2 and 3.
P4 selects jobs numbered 6 and 7.

P5 selects jobs numbered 3 and 4.


P6 selects jobs numbered 7 and 6.
P7 selects jobs numbered 4 and 2.

2.3 Theorems of König and G. Birkhoff


Theorem 2.3.1 If the entries of a rectangular matrix are zeros and ones,
the minimum number of lines (i.e., rows and columns) that contain all the
ones is equal to the maximum number of ones that can be chosen with no two
on a line.

Proof: Let $A = (a_{ij})$ be an n × t matrix of 0's and 1's. Let m be
the minimum number of lines containing all the 1's, and M the maximum
number of 1's no two on a line. Then trivially m ≥ M, since no line can pass
through two of the 1's counted by M. We need to show M ≥ m.
Suppose a minimum covering by m lines consists of r rows and s columns,
where r + s = m. We may reorder rows and columns so these become the
first r rows and first s columns. Without loss of generality assume r ≥ 1.
For i = 1, . . . , r, put Si = {j : aij = 1 and j > s}. So Si indicates which
columns beyond the first s have a 1 in row i.
Claim: $A = (S_1, \ldots, S_r)$ satisfies Condition (H). For suppose some k of
these sets contain together at most k − 1 elements. Then these k rows could
be replaced by the appropriate k −1 (or fewer) columns, and all the 1’s would
still be covered by this choice of rows and columns. By the minimality of m
this is not possible! Hence A has an SDR corresponding to 1’s in the first
r rows, no two in the same line and none in the first s columns. By a dual
argument (if s > 1), we may choose s 1’s, no two on a line, none in the first
r rows and all in the first s columns. These r + s = m 1’s have no two on a
line, so m ≤ M . If s = 0, i.e., r = m, just use the r 1’s to see r = m ≤ M .
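König's equality is easy to confirm exhaustively on small matrices. In the Python sketch below (an illustrative brute force; the names are mine), the two sides of the theorem are computed independently:

```python
from itertools import combinations

def max_independent_ones(A):
    # maximum number of 1's of A, no two on a line (a row or column)
    n, t = len(A), len(A[0])
    ones = [(i, j) for i in range(n) for j in range(t) if A[i][j] == 1]
    for k in range(min(n, t), 0, -1):
        for c in combinations(ones, k):
            if len({i for i, _ in c}) == k and len({j for _, j in c}) == k:
                return k
    return 0

def min_line_cover(A):
    # minimum number of rows plus columns containing all the 1's of A
    n, t = len(A), len(A[0])
    for k in range(n + t + 1):
        for r in range(k + 1):
            for rows in combinations(range(n), r):
                for cols in combinations(range(t), k - r):
                    if all(A[i][j] == 0 or i in rows or j in cols
                           for i in range(n) for j in range(t)):
                        return k
    return 0
```

For example, with A = [[1,1,0],[1,0,0],[0,0,1]] both quantities equal 3.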

Theorem 2.3.2 (Systems of Common Representatives) If a set S is parti-


tioned into a finite number n of subsets in two ways S = A1 + · · · + An =
B1 + · · · + Bn and if no k of the A’s are contained in fewer than k of the
B’s, for each k = 1, . . . , n, then there will be elements x1 , . . . , xn that are si-
multaneously representatives of the A’s and B’s (maybe after reordering the
B’s).

Proof: For each $A_i$, put $S_i = \{j : A_i \cap B_j \ne \emptyset\}$. The hypothesis of the
theorem is just Condition (H) for the system $A = (S_1, \ldots, S_n)$. Let $j_1, \ldots, j_n$
be an SDR for A, and choose $x_i \in A_i \cap B_{j_i}$. Then $x_1, \ldots, x_n$ is simultaneously
an SDR for both the A's and the B's.

Corollary 2.3.3 If B is a finite group with (not necessarily distinct) sub-


groups H and K, with |H| = |K|, then there is a set of elements of B that
are simultaneously representatives for right cosets of H and left (or right!)
cosets of K.

Exercise: 2.3.4 Sixteen (male - female) couples and a caller attend a square
dance. At the door each dancer selects a name-tag of one of the colors red,
blue, green, white. There are four tags of each color for males, and the same
for females. As the tags are selected, each dancer fails to notice what color
her/his partner selects. The caller is then given the job of constructing four
squares with four (original!) couples each in such a way that in each square
no two dancers of the same sex have tags of the same color. Show that this
is possible no matter how the dancers select their name tags.

Corollary 2.3.5 (Theorem of G. Birkhoff) Let $A = (a_{ij})$ be an n × n matrix


where the aij are nonnegative real numbers such that each row and column
has the same sum. Then A is a sum of nonnegative multiples of permutation
matrices.

Proof: A permutation matrix P is a square matrix of 0's and 1's with
a single 1 in each row and column. We are to prove that if ∑_{i=1}^n aij = t =
∑_{j=1}^n aij , aij ≥ 0, then A = ∑ ui Pi , ui ≥ 0, each Pi a permutation matrix.

The proof is by induction on the number w of nonzero entries aij .


If A ≠ 0, then w ≥ n. If w = n, then clearly (?) A = tP for some
permutation matrix P . So suppose w > n, and that the theorem has been
established for all such matrices with fewer than w nonzero entries. For each
i = 1, . . . , n, let Si be the set of j’s for which aij > 0.
Claim: A = (S1 , . . . , Sn ) satisfies Condition (H). For suppose some k of
the sets Si1 , . . . , Sik contain together at most k − 1 indices j. Then rows
i1 , . . . , ik have positive entries in at most k − 1 columns. But adding these
entries by rows we get tk, and adding by columns we get at most (k − 1)t,

an impossibility. Hence A has an SDR j1 , . . . , jn . This means that each of


a1j1 , a2j2 , . . . , anjn is positive. Put P1 = (cij ), where cij = 1 if j = ji , and
cij = 0 otherwise.

Put u1 = min{aiji : 1 ≤ i ≤ n}. Then A1 = A − u1 P1 is a matrix of nonnegative
numbers in which each row and column sum is t − u1 . By the choice
of u1 , A1 has fewer nonzero entries than does A. Hence by the induction
hypothesis there are permutation matrices P2 , . . . , Ps and nonnegative numbers
u2 , . . . , us for which A1 = ∑_{j=2}^s uj Pj . So A = ∑_{i=1}^s ui Pi , as desired.

Exercise: 2.3.6 Let n be a positive integer, and let aij (1 ≤ i, j ≤ n − 1)
be real numbers in the interval [(n − 2)/(n − 1)^2 , 1/(n − 1)] which are
independent over the rationals and such that their rational span does not
contain 1. Let A be the matrix of order n whose (i, j) entry equals

    aij                                      if 1 ≤ i, j ≤ n − 1;
    1 − ∑_{k=1}^{n−1} aik                    if i ≠ n and j = n;
    1 − ∑_{k=1}^{n−1} akj                    if j ≠ n and i = n;
    2 − n + ∑_{k=1}^{n−1} ∑_{l=1}^{n−1} akl  if i = j = n.

Show that A is a doubly stochastic matrix of order n. Show that A cannot be


expressed as the nonnegative linear combination of n^2 − 2n + 1 permutation
matrices.

Theorem 2.3.7 Let A = (aij ) be a doubly stochastic matrix of order n with


f (A) fully indecomposable components and #(A) nonzero entries. Then A
is the nonnegative linear combination of #(A) − 2n + f (A) + 1 permutation
matrices.

Proof: The proof is by induction on #(A). Since A is doubly stochastic,


#(A) ≥ n. If #(A) = n, then A is a permutation matrix, and since #(A) −
2n + f (A) + 1 = n − 2n + n + 1 = 1, the theorem holds in this case.
Now assume that #(A) > n. Let k and l be integers such that akl is a
smallest positive entry of A. Since A is doubly stochastic there exists a
permutation matrix P = (pij ) such that if pij = 1 then aij > 0, and pkl = 1.
(By the proof of the claim in the proof of Cor 2.3.5.) Since #(A) > n,
 
akl ≠ 1. Let B = (1/(1 − akl ))(A − akl P ). Then B is a doubly stochastic matrix
with #(B) < #(A). By induction B is a nonnegative linear combination of
#(B) − 2n + f (B) + 1 permutation matrices. Hence A is a nonnegative linear
combination of #(B) − 2n + f (B) + 2 permutation matrices. If f (B) =
f (A), then since #(A) > #(B), A is a nonnegative linear combination of
#(A) − 2n + f (A) + 1 permutation matrices, and we are done. Now suppose
that f (B) > f (A). Let S be the set of (i, j) such that pij = 1 and aij = akl .
By permuting rows and columns we may assume that B is a direct sum
B1 ⊕ B2 ⊕ · · · ⊕ Bk of fully indecomposable doubly stochastic matrices. Let
Aij denote the submatrix of A with rows those of Bi and columns those of
Bj . If Aii is not a fully indecomposable component of A, then there exists
a j, j ≠ i, such that Aij ≠ 0. It follows that |S| ≥ f (B) − f (A). Since
#(B) + |S| = #(A) and f (A) ≤ f (B) − |S|, we have

#(A) − 2n + f (A) + 1 < #(B) − 2n + f (B) + 2.


Therefore A is a nonnegative linear combination of #(A) − 2n + f (A) + 1
permutation matrices, and the proof follows by induction.
In the proof of the following theorem we need the following elementary
inequality that can be proved by induction on r.
Exercise: 2.3.8 Let k1 , . . . , kr , r ≥ 1, be r positive integers. Then

    ∑_{i=1}^r ki^2 + r ≤ (∑_{i=1}^r ki )^2 + 1.

Theorem 2.3.9 Let A be a doubly stochastic matrix of order n. Then A is


the nonnegative linear combination of n^2 − 2n + 2 permutation matrices.
Proof: Let A have r = f (A) fully indecomposable components A1 , . . . , Ar
with Ai being ki by ki . Then #(A) + f (A) ≤ ∑_{i=1}^r ki^2 + r ≤ (∑_{i=1}^r ki )^2 +

1 = n^2 + 1. Then by the previous theorem, A is the nonnegative linear


combination of #(A) − 2n + 1 + f (A) ≤ (n^2 + 1) − 2n + 1 = n^2 − 2n + 2
permutation matrices.

2.4 The Theorem of Marshall Hall, Jr.


Many of the ideas of “finite” combinatorics have generalizations to situations
in which some of the sets involved are infinite. We just touch on this subject.

Given a family A of sets, if the number of sets in the family is infinite,


there are several ways the theorem of P. Hall can be generalized. One of the
first (and to our mind one of the most useful) was given by Marshall Hall, Jr.
(no relation to P. Hall), and is as follows.

Theorem 2.4.1 Suppose that for each i in some index set I there is a finite
subset Ai of a set S. The system A = (Ai )i∈I has an SDR if and only if
the following Condition (H’) holds: For each finite subset I 0 of I the system
A0 = (Ai )i∈I 0 satisfies Condition (H).

Proof: We establish a partial order on deletions, writing D1 ⊆ D2 for


deletions D1 and D2 iff each element deleted by D1 is also deleted by D2 .
Of course, we are interested only in deletions which preserve Condition (H’).
If all deletions in an ascending chain D1 ⊆ D2 ⊆ · · · ⊆ Di ⊆ · · · preserve
Condition (H), let D be the deletion which consists of deleting an element
b from a set A iff there is some i for which b is deleted from A by Di . We
assert that deletion D also preserves Condition (H).
In any block Br,s of A, (r, s < ∞), at most a finite number of deletions in
the chain can affect Br,s . If no deletion of the chain affects Br,s , then of course
D does not affect Br,s , and Condition (H) still holds for Br,s . Otherwise, let
Dn be the last deletion that affects Br,s . So under Dn (and hence also under
D) (Br,s )′ = B′r,s′ still satisfies Condition (H) by hypothesis, i.e., s′ ≥ r. But

Br,s is arbitrary, so D preserves Condition (H) on A. By Zorn’s Lemma,


there will be a maximal deletion D̄ preserving Condition (H). We show that
under such a maximal deletion D̄ preserving Condition (H), each deleted set
S′i has only a single element. Clearly these elements would form an SDR for
the original A.
Suppose there is an a1 not belonging to a critical block. Delete a1 from
every set Ai containing a1 . Under this deletion a block Br,s is replaced by a
block B′r,s′ with s′ ≥ s − 1 ≥ r, so Condition (H) is preserved. Hence after

a maximal deletion each element left is in some critical block. And if Bk,k is
a critical block, we may delete elements of Bk,k from all sets not in Bk,k and
still preserve Condition (H) by Lemma 2.1.3 (since it needs to apply only
to finitely many sets at a time). By Theorem 2.1.1 each critical block Bk,k
(being finite) possesses an SDR when Condition (H) holds. Hence we may
perform an additional deletion leaving Bk,k as a collection of singleton sets
and with Condition (H) still holding for the entire remaining sets. It is now
clear that after a maximal deletion D̄ preserving Condition (H), each element

is in a critical block, and each critical block consists of singleton sets. Hence
after a maximal deletion D̄ preserving Condition (H), each set consists of a
single element, and these elements form an SDR for A.
The following theorem, sometimes called the Cantor–Schroeder–Bernstein
Theorem, will be used with the theorem of M. Hall, Jr. to show that any
two bases of a vector space V over a field F must have the same cardinality.

Theorem 2.4.2 Let X, Y be sets, and let θ : X → Y and ψ : Y → X be


injective mappings. Then there exists a bijection φ : X → Y .

Proof: The elements of X will be referred to as males, those of Y as


females. For x ∈ X, if θ(x) = y, we say y is the daughter of x and x is the
father of y. Analogously, if ψ(y) = x, we say x is the son of y and y is the
mother of x. A male with no mother is said to be an “adam.” A female with
no father is said to be an “eve.” Ancestors and descendants are defined in
the natural way, except that each x or y is both an ancestor of itself and a
descendant of itself. If z ∈ X ∪ Y has an ancestor that is an adam (resp.,
eve) we say that z has an adam (resp., eve). Partition X and Y into the
following disjoint sets:

X1 = {x ∈ X : x has no eve};

X2 = {x ∈ X : x has an eve};

Y1 = {y ∈ Y : y has no eve};

Y2 = {y ∈ Y : y has an eve}.
Now a little thought shows that θ : X1 → Y1 is a bijection, and ψ −1 :
X2 → Y2 is a bijection. So

φ = θ|X1 ∪ ψ −1 |X2
is a bijection from X to Y .

Corollary 2.4.3 If V is a vector space over the field F and if B1 and B2


are two bases for V , then |B1 | = |B2 |.

Proof: Let B1 = {xi : i ∈ I} and B2 = {yj : j ∈ J}. For each i ∈ I, let


Γi = {j ∈ J : yj occurs with nonzero coefficient in the unique linear
expression for xi in terms of the yj 's}. Then the union of any k (≥ 1) Γi 's, say
Γi1 , . . . , Γik , each of which of course is finite, must contain at least k distinct
elements. For otherwise xi1 , . . . , xik would belong to a space of dimension
less than k, and hence be linearly dependent. Thus the family (Γi : i ∈ I) of
sets must have an SDR. This means there is a function θ : I → J which is
an injection. Similarly, there is an injection ψ : J → I. So by the preceding
theorem there is a bijection J ↔ I, i.e., |B1 | = |B2 |.

2.5 Matroids and the Greedy Algorithm


A matroid on a set X is a collection I of subsets of X (called independent
subsets of X) satisfying the following:

• Subset Rule: Each subset of an independent set (including the empty


set) is independent.

• Expansion Rule: If I, J ∈ I with |I| < |J|, then there is some x ∈ J \ I
for which I ∪ {x} ∈ I.

An independent set of maximal size in I is called a basis. An additive


cost function f is a function f : P(X) → R with f (∅) = 0 and such that
f (S) = ∑_{x∈S} f ({x}), for each S ⊆ X.

Theorem 2.5.1 Let I be a matroid of independent sets on X, and let f :


P(X) → R be an additive cost function. Then the greedy algorithm (given
below) selects a basis of minimum cost.

Proof: The Greedy Algorithm is as follows:

1. Let I = ∅.

2. From set X pick an element x with f (x) minimum.

3. If I ∪ {x} is independent, replace I with I ∪ {x}.

4. Delete x from X.

Repeat Steps 2 through 4 until X is empty.


The Expansion Rule implies that all maximal independent sets have the
same size, and the Greedy Algorithm will add to I until it is a basis. Suppose
the Greedy Algorithm selects a basis B = (b1 , b2 , . . . , bk ) and that A =
(a1 , . . . , ak ) is some other basis, both ordered so that if i < j then f (bi ) ≤
f (bj ) and f (ai ) ≤ f (aj ). By Step 2 of the Greedy Algorithm, f (b1 ) ≤ f (a1 ).
If f (bi ) ≤ f (ai ) for all i, then f (B) ≤ f (A). So suppose there is some
j such that f (bj ) > f (aj ), but f (bi ) ≤ f (ai ) for i = 1, 2, . . . , j − 1.
Then {a1 , . . . , aj } and {b1 , b2 , . . . , bj−1 } are both independent. By
the Expansion Property, for some ai with i ≤ j, {b1 , . . . , bj−1 , ai } is
independent. Since f (ai ) ≤ f (aj ) < f (bj ), ai would have been selected by
the Greedy Algorithm to be bj (almost true; at least bj would not have been
chosen!). Hence f (bj ) ≤ f (aj ) and B must be a minimum-cost basis.
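Steps 1 through 4 take only a few lines of Python once the matroid is supplied as an independence oracle (the oracle interface and function name are our illustrative choices):

```python
def greedy_min_basis(X, f, is_independent):
    """Greedy algorithm of Theorem 2.5.1: scan the elements of X in order
    of increasing cost f, keeping each element that preserves independence.
    Returns a minimum-cost basis of the matroid."""
    I = set()
    for x in sorted(X, key=f):        # Step 2: cheapest remaining element
        if is_independent(I | {x}):   # Step 3: keep x if still independent
            I.add(x)                  # Step 4 is implicit in the scan
    return I
```

For instance, on the uniform matroid of rank 2 (all sets of size at most 2 independent) over {a, b, c, d} with costs 3, 1, 2, 5, the algorithm returns {b, c}.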
We now consider three contexts in which the Greedy Algorithm has turned
out to be very useful.
Let A = (S1 , . . . , Sm ) be a family of (not necessarily distinct) subsets of a
set X. Suppose each element of X has a weight (or cost) assigned to it. For
x ∈ X let f (x) be the cost assigned to x. And for S ⊆ X, the cost of S is
to be f (S) = ∑_{x∈S} f (x). We want to construct an SDR for some subfamily
A0 = (Si1 , . . . , Sik ) of A which is as large as possible, and such that the
cost f (D) of the SDR D is the minimum possible among all maximum-sized
SDR’s.
We say that a subset I of X is an independent set (of representatives for
A) if the elements of I may be matched with members of A so that they
form an SDR for the sets with which they are matched. So we want to be
able to find a cheapest independent set of maximal size.
The same problem can be expressed in terms of bipartite graphs. Given
a bipartite graph G = (X, Y, E), we say that a subset S of X is independent
(for matchings of X into Y ) if there is a matching M of G which matches
all elements of S to elements of Y . If each element x of X is assigned a
(nonnegative) cost f (x), for each subset S ⊆ X put F (S) = ∑_{x∈S} f (x).

Then the problem is to find a cheapest (or sometimes a most expensive)


independent set in X, and it is usually desirable also to have a corresponding
matching.
Given A = (S1 , . . . , Sm ), Si ⊆ X, construct a bipartite graph G =
(X, A, E) with (x, Sj ) ∈ E iff x ∈ Sj . Then a subset I of X is indepen-

dent in the matching sense iff it is independent in the SDR sense.

Theorem 2.5.2 In both examples we have just given (matchings of bipartite


graphs and SDR’s) the independent sets form a matroid.

We do the case for matchings in bipartite graphs, and we note that if the
empty set (as a subset of X in G = (X, Y, E)) is defined to be independent,
then clearly the set I of independent subsets of X satisfies the Subset Rule.
So we consider the Expansion Rule. However, before proceeding into the
proof we need to be sure that our terminology is clear and we need to prove
a couple of preliminary lemmas.
A matching M of size m in a graph G is a set of m edges, no two of which
have a vertex in common. A vertex is said to be matched (to another vertex)
by M if it lies in an edge of M . We defined a bipartite graph G with parts
X and Y to be a graph whose vertex set is the union of the two disjoint sets
X and Y and whose edges all connect a vertex in X with a vertex in Y . A
complete matching of X into Y is a matching of G with |X| edges. Our first
lemma describes the interaction of two matchings.

Lemma 2.5.3 Let M1 and M2 be matchings of the graph G = (V, E). Let
G0 = (V, E 0 ) be the subgraph with E 0 = (M1 ∪ M2 ) \ (M1 ∩ M2 ) = (M1 \ M2 ) ∪
(M2 \ M1 ). Then each connected component of G0 is one of the following
three types:
(i) a single vertex
(ii) a cycle with an even number of edges and whose edges are
alternately in M1 and M2 .
(iii) a chain whose edges are alternately in M1 and M2 ,
and whose two end vertices are each matched by one of
M1 , M2 but not both.
Moreover, if |M1 | < |M2 |, there is a component of G0 of type (iii) with
first and last edges in M2 and whose endpoints are not M1 -matched.

Proof: If x is a vertex of G that is neither M1 -matched nor M2 -matched,


then x is an isolated vertex of G0 . Similarly, if some edge through x lies in
both M1 and M2 , then x is an isolated vertex of G0 . Now suppose G1 is a
connected component of G0 having n vertices and at least one edge. Since
G1 is connected, it has at least n − 1 edges (since it has a spanning tree).

Each vertex of G1 has degree 1 or 2 (on at most one edge of M1 and at most
one edge of M2 ). Hence
2(n − 1) ≤ 2(number of edges of G1 ) = ∑_{x∈V (G1 )} deg(x) ≤ 2n.

So G1 has either n − 1 or n edges. If G1 has n − 1 edges, it must be a tree


in which each vertex has degree at most 2, i.e., G1 is a chain whose edges are
alternately in M1 and M2 . If x is an endpoint of G1 , it is easy to see that x
is matched by only one of M1 , M2 . (If x ∈ e1 ∈ M1 and x ∈ e2 ∈ M2 , then
if e1 = e2 this edge is not in E 0 ; if e1 ≠ e2 , both edges e1 , e2 are in G0 and x
could not be an endpoint of G1 .)
If G1 has n edges, it must be a cycle whose edges alternate in M1 and
M2 , forcing it to have an even number of edges.
Finally, if |M1 | < |M2 |, there must be some connected component G1
with more M2 -edges than M1 -edges. So G1 is of type (iii) with first and last
edge in M2 and whose endpoints are not M1 -matched.
If M is a matching for G, a path v0 e1 v1 e2 · · · en vn is an alternating path
for M if whenever ei is in M , ei+1 is not and whenever ei is not in M , ei+1
is in M . We now show how to use the kind of alternating path that arises in
case (iii) of the previous Lemma to obtain a larger matching.

Lemma 2.5.4 Let M be a matching in a graph G and let P be an alternating


path with edge set E 0 beginning and ending at unmatched vertices. Let M 0 =
M ∩ E 0 . Then

(M \ M 0 ) ∪ (E 0 \ M 0 ) = (M \ E 0 ) ∪ (E 0 \ M )
is a matching with one more edge than M has.

Proof: Every other edge of P is in M . However, P begins and ends with


edges not in M , so there is a number k such that P has k edges in M and
k + 1 edges not in M . The first and last vertices of P are unmatched, and all
other vertices in P are matched by M 0 , so no edge in M \ M 0 contains any
vertex in P . Thus, the edges of M \ M 0 have no vertices in common with the
edges of E 0 \ M 0 . Further, since P is a path and E 0 \ M 0 consists of every
other edge of the path, the edges of E 0 \ M 0 have no vertices in common.
Thus:
(M \ M 0 ) ∪ (E 0 \ M 0 ) = (M \ E 0 ) ∪ (E 0 \ M )

is a matching and, by the sum principle, it has |M | − k + k + 1 = |M | + 1 edges.

We are now ready for the proof of the theorem.


Proof: As mentioned above, the set I of independent subsets of X (of
vertices of a bipartite graph G = (X, Y, E)) satisfies the Subset Rule. So we
now consider the Expansion Rule.
Suppose M1 is a matching of S into Y , M2 is a matching of T into Y ,
where S ⊆ X, T ⊆ X and |S| < |T |. Let G0 be the graph on X ∪ Y
with edgeset E 0 = (M1 ∪ M2 ) \ (M1 ∩ M2 ) = (M1 \ M2 ) ∪ (M2 \ M1 ). Here
|M1 | = |S| < |T | = |M2 |, and clearly |E 0 ∩ M1 | < |E 0 ∩ M2 |.
At least one of the connected components of G0 has one more edge in
M2 than it has in M1 . So by Lemma 2.5.3 the graph G0 has a connected
component that must be an M1 -alternating path P whose first and last edges
are in M2 . Each vertex of this path that is touched by an M1 edge is also
touched by an M2 edge. And the endpoints of this path are not M1 -matched.
Since the path has an odd number of edges, its two endpoints lie one in X,
one in Y . Say x is the endpoint lying in X. Let E 00 be the edge-set of the
path P . Then M 0 = (M1 \ E 00 ) ∪ (E 00 \ M1 ) is a matching with one edge
more than M1 , and M 0 is a matching of S ∪ {x} into Y . Hence S ∪ {x} is
independent, and we selected x from T .
There is a converse due to Berge that is quite interesting, but we leave
its proof as an exercise.

Theorem 2.5.5 Suppose G is a graph and M is a matching of G. Then M


is a matching of maximum size (among all matchings) if and only if there is
no alternating path connecting two unmatched vertices.

Exercise: 2.5.6 Prove Theorem 2.5.5.

There is a third standard example of a matroid.

Theorem 2.5.7 The edgesets of forests of a graph G = (V, E) form the


independent sets of a matroid on E.

Proof: If for some F ⊆ E it is true that (V, F ) has no cycles, then (V, F 0 )
has no cycles for any subset F 0 of F . This says that the Subset Rule is
satisfied.

Recall that a tree is a connected graph with k vertices and k − 1 edges.


Thus a forest on n vertices with c connected components will consist of c trees
and will thus have n − c edges. Suppose F 0 and F are forests (contained in
E) with r edges and s edges, respectively, with r < s. If no edge of F can
be added to F 0 to give an independent set, then adding any edge of F to F 0
gives a cycle. In particular, each edge of F must connect two points in the
same connected component of (V, F 0 ). Thus each connected component of
(V, F ) is a subset of a connected component of (V, F 0 ). Then (V, F ) has no
more edges than (V, F 0 ), so r ≥ s, a contradiction. Hence the forests of G
satisfy the expansion rule, implying that the collection of edgesets of forests
of G is a collection of independent sets of a matroid on E.

Corollary 2.5.8 The Greedy Algorithm applied to cost-weighted edges of a


connected graph produces a minimum-cost spanning tree. In fact, this is what
is usually called Kruskal’s algorithm.
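A Python sketch of Kruskal's algorithm, with a union-find structure playing the role of the independence test "adding this edge keeps a forest" (the edge format (weight, u, v) and the function names are our choices):

```python
def kruskal(vertices, weighted_edges):
    """Greedy algorithm on the forest matroid of Theorem 2.5.7: scan edges
    by increasing cost, keeping an edge iff its endpoints lie in different
    components.  Returns the edges of a minimum-cost spanning forest."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    forest = []
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:          # the edge joins two components: no cycle
            parent[ru] = rv
            forest.append((w, u, v))
    return forest
```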
Chapter 3

Polya Theory

3.1 Group Actions


Let X be a nonempty set and SX the symmetric group on X, i.e., SX is
the group of all permutations of the elements of X with the group operation
being the composition of functions. Let G be a group. An action of G on
X is a homomorphism µ : G → SX . In other words, µ is a function from G
to SX satisfying

µ(g1 ) ◦ µ(g2 ) = µ(g1 · g2 ) (3.1)


for all g1 , g2 ∈ G.
Often (µ(g))(x) is written as g(x) if only one action is being considered.
The only difference between thinking of G as acting on X and thinking of G
as a group of permutations of the elements of X is that for some g1 , g2 ∈ G,
g1 ≠ g2 , it might be that µ(g1 ) and µ(g2 ) are actually the same permutation,
i.e., g1 (x) = g2 (x) for all x ∈ X. Also, sometimes there are several different
actions of G on X which may be considered in the same context.

Theorem 3.1.1 Let µ be an action of G on X and let e be the identity of


G. Then the following hold:
(i) µ(e) is the identity permutation on X.

(ii) µ(g −1 ) = [µ(g)]−1 , for each g ∈ G.

(iii) More generally, for each n ∈ Z, µ(g n ) = (µ(g))n .


Proof: These results are special cases of results usually proved for homo-
morphisms in general. If you don’t remember them, you should work out the
proofs in this special case.
Let G act on X. For x, y ∈ X, define x ∼ y iff there is some g ∈ G for
which g(x) = y.

Theorem 3.1.2 The relation “∼” is an equivalence relation on X.

Proof: This is an easy exercise.


The “∼” equivalence classes are called G-orbits in X. The orbit con-
taining x is denoted xG or sometimes just [x] if there is no likelihood of
confusion.
For g ∈ G, put Xg = {x ∈ X : g(x) = x}, so Xg is the set of elements of
X fixed by g. For x ∈ X, put Gx = {g ∈ G : g(x) = x}. Gx is the stabilizer
of x in G.

Theorem 3.1.3 For x ∈ X, Gx is a subgroup of G (written Gx ≤ G). If G


is finite, then |G| = |[x]| · |Gx |.

Proof: It is an easy exercise to show that Gx is a subgroup of G. Having


done that, define a function f from the set of left cosets of Gx in G to [x] by:

f (gGx ) = g(x).
First we show that f is well-defined. If g1 Gx = g2 Gx , then g2−1 g1 ∈ Gx ,
so that (g2−1 · g1 )(x) = x, which implies g1 (x) = g2 (x). Hence f (g1 Gx ) =
f (g2 Gx ). So f is well-defined. Now we claim f is a bijection. Suppose
f (g1 Gx ) = f (g2 Gx ), so by definition g1 (x) = g2 (x) and
(g2−1 · g1 )(x) = x. Hence g2−1 · g1 ∈ Gx , implying g1 Gx = g2 Gx , so f is one-to-
one. If y ∈ [x], then there is a g ∈ G with g(x) = y. So f (gGx ) = g(x) = y,
implying f is onto [x].
Hence f is a bijection from the set of left cosets of Gx in G to [x], i.e.,
|G|/|Gx | = |[x]| as claimed.

Theorem 3.1.4 For some x, y ∈ X and g ∈ G, suppose that g(x) = y.


Then
(i) H ≤ Gx iff gHg −1 ≤ Gy ; in particular,

(ii) Gg(x) = gGx g −1 .



Proof: Easy exercise.

Theorem 3.1.5 The Orbit-Counting Lemma (Not Burnside’s Lemma) Let


k be the number of G-orbits in X. Then
 
k = (1/|G|) ∑_{g∈G} |Xg | .

Proof: Put S = {(x, g) ∈ X × G : g(x) = x}. We determine |S| in two
ways: |S| = ∑_{x∈X} |Gx | = ∑_{g∈G} |Xg |. Since x ∼ y iff [x] = [y], in which
case |[x]| = |[y]|, it must be that |G|/|Gx | = |G|/|Gy | whenever x ∼ y. So
∑_{y∈[x]} |Gy | = ∑_{y∈[x]} |Gx | = |[x]| · |Gx | = |G|. Hence ∑_{x∈X} |Gx | = k · |G| =
∑_{g∈G} |Xg |. And hence k = (∑_{g∈G} |Xg |)/|G|.
The following situation often arises. There is some given action ν of G


on some set X. F is the set of all functions from X into some set Y . Then
there is a natural action µ of G on F defined by: For each g ∈ G and each
f ∈ F = Y^X ,

µ(g)(f ) = f ◦ ν(g −1 ).

Theorem 3.1.6 µ : G → SF is an action of G on F.

Proof: First note that µ(g1 · g2 )(f ) = f ◦ ν((g1 · g2 )−1 ) = f ◦ ν(g2−1 · g1−1 ) =
f ◦ [ν(g2−1 ) ◦ ν(g1−1 )] = [f ◦ ν(g2−1 )] ◦ ν(g1−1 ) = µ(g1 )(f ◦ ν(g2−1 )) =
µ(g1 )(µ(g2 )(f )) = (µ(g1 ) ◦ µ(g2 ))(f ). So µ is an action of G on F provided
each µ(g) is actually a permutation of the elements of F.
So suppose µ(g)(f1 ) = µ(g)(f2 ), i.e., f1 ◦ ν(g −1 ) = f2 ◦ ν(g −1 ). But since
ν(g −1 ) is a permutation of the elements of X, it must be that f1 = f2 , so
µ(g) is one-to-one on F. For each g ∈ G and f : X → Y , f ◦ ν(g) ∈ Y^X and
µ(g)(f ◦ ν(g)) = (f ◦ ν(g)) ◦ ν(g −1 ) = f , implying that µ(g) is onto.
To use Not Burnside’s Lemma to count G-orbits in F, we need to compute
|Fµ(g) | for each g ∈ G.

Theorem 3.1.7 For g ∈ G, let c be the number of cycles of ν(g) as a permutation
on X. Then |Fµ(g) | = |Y |^c .

Proof: For f : X → Y , g ∈ G, we want to know when is f ◦ ν(g −1 ) = f ,


i.e., (f ◦ ν(g −1 ))(x) = f (x) for all x ∈ X. This holds iff f (ν(g −1 )(x)) = f (x). So f must have
the same value at x, ν(g −1 )(x), ν(g −2 )(x), . . ., etc. This just says that f is
constant on the orbits of ν(g) in X. So if c is the number of cycles of ν(g) as
a permutation on X, then |Y |c is the number of functions f : X → Y which
are constant on the orbits of ν(g).
Applying Not Burnside's Lemma to the action µ of G on F = Y^X , we
have:

Theorem 3.1.8 The number of G-orbits in F is

(1/|G|) ∑_{g∈G} |Fµ(g) | = (1/|G|) ∑_{g∈G} |Y |^{c(g)} ,

where c(g) is the number of cycles of ν(g) as a permutation on X.

3.2 Applications
Example 3.2.1 Let G be the group of symmetries of the square (corners
labeled 1, 2, 3, 4 clockwise, with 1 at the upper left) written as permutations
of [4] = {1, 2, 3, 4}. The convention here is that if a
symmetry σ of the square moves a corner labeled i to the corner previously
labeled j, then σ(i) = j. We want to paint the corners with W and R (white
and red) and then determine how many essentially different paintings there
are.

Here X = {1, 2, 3, 4}, Y = {W, R}. A painting f is just a function


f : X → Y . Two paintings f1 , f2 are the same if there is a g ∈ G with
f1 ◦ g −1 = f2 . So the number of distinct paintings is the number of G-orbits
in F = {f : X → Y }, which is

(1/|G|) ∑_{g∈G} |Y |^{c(g)} .

G = {e, (1234), (13)(24), (1432), (24), (12)(34), (13), (14)(23)}.



So the number of distinct paintings is

(1/8)(2^4 + 2^1 + 2^2 + 2^1 + 2^3 + 2^2 + 2^3 + 2^2 ) = 6.

The distinct paintings are listed as follows:


(shown as top row / bottom row of corner colors)

    WW/WW,  WR/WW,  WR/RW,  WW/RR,  WR/RR,  RR/RR
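The count of 6 can also be confirmed by listing orbits of colorings directly, without the lemma. A Python sketch (corners relabeled 0–3 in cyclic order; the eight tuples below are the dihedral group of the square):

```python
from itertools import product

def count_paintings():
    """Count 2-colorings of the square's corners up to its 8 symmetries,
    by enumerating the full orbit of each coloring not yet seen."""
    G = [(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2),  # rotations
         (1, 0, 3, 2), (3, 2, 1, 0), (0, 3, 2, 1), (2, 1, 0, 3)]  # flips
    seen, orbits = set(), 0
    for f in product('WR', repeat=4):
        if f not in seen:
            orbits += 1
            seen |= {tuple(f[g[i]] for i in range(4)) for g in G}
    return orbits
```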

Example 3.2.2 How many necklaces are there with n beads of m colors if
two are the same provided one can be rotated into the other?

Let σ be the basic rotation σ = (1, 2, 3, . . . , n). So G = {σ i : 1 ≤ i ≤ n}.


The length of each cycle in σ i is the order of σ i , which is n/ gcd(n, i).

So the number of cycles in σ i is gcd(n, i). For each d such that d|n, there
are φ(n/d) integers i with 1 ≤ i ≤ n and d = gcd(n, i). So the number of
σ i ∈ G with d cycles is φ(n/d), for each d with d|n. If Y is the set of m
colors to be used, the number of distinct necklace patterns under the action
of G is

(1/n) ∑_{g∈G} m^{c(g)} = (1/n) ∑_{d|n} φ(n/d) m^d = (1/n) ∑_{d|n} φ(d) m^{n/d} .

Note: If f (x) = (1/n) ∑_{d|n} φ(d)x^{n/d} , then f (x) is a polynomial of degree n
with rational coefficients which lie between 0 and 1, but f (m) is a positive
integer for each positive integer m.
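The closed formula evaluates mechanically. A Python sketch with a naive Euler φ (the function names are ours):

```python
from math import gcd

def necklaces(n, m):
    """Number of necklaces of n beads in m colors up to rotation:
    (1/n) * sum over d | n of phi(d) * m^(n/d)."""
    phi = lambda d: sum(1 for i in range(1, d + 1) if gcd(i, d) == 1)
    total = sum(phi(d) * m ** (n // d) for d in range(1, n + 1) if n % d == 0)
    assert total % n == 0   # per the note: f(m) is always an integer
    return total // n
```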

Exercise: 3.2.3 If G also has “flips,” so |G| = 2n, i.e., G is dihedral, how
many necklaces of n beads in m colors are there? (Hint: Do the cases n odd,
n even separately.)

Solution: If n is odd, each of the n additional permutations is a flip about
a uniquely defined vertex (i.e., bead), and it has 1 + (n − 1)/2 = (n + 1)/2
cycles. If n is even, each of the n additional permutations is a flip. n/2 of
them are about an axis through two opposite beads and each has
2 + (n − 2)/2 = (n + 2)/2 cycles. The other n/2 flips are about an axis that
misses each bead and each has n/2 cycles. So
the number of distinct necklace patterns under the action of G is
 
(1/2n) [ ∑_{d|n} φ(n/d) m^d + (n/2) m^{(n+2)/2} + (n/2) m^{n/2} ] .

Answer to Exercise
(i) For n odd, the number of necklaces is:
 
(1/2n) [ ∑_{d|n} φ(n/d) m^d + n m^{(n+1)/2} ] .

(ii) For n even, the number of necklaces is:


 
(1/2n) [ ∑_{d|n} φ(n/d) m^d + (n/2) m^{n/2} (m + 1) ] .

Example 3.2.4 A switching function f in n variables is a function


f : Z2^n → Z2 . Starting with a group G acting on the set [n] = {1, . . . , n} we
can define an action of G on the set of all switching functions in n variables
and then ask how many inequivalent switching functions in n variables there
are.

As a first case, let G = Sn be the group of all permutations of the elements


of [n], and define an action ν of G on

Z2^[n] = {x : [n] → Z2 = {0, 1}}

according to the following: For x ∈ Z2^[n] (write x = (x1 , . . . , xn ): xi = 0 or 1,
and where xi is the image of i under x), and g ∈ G, put

(ν(g))(x) = x ◦ g −1 = (xg−1 (1) , . . . , xg−1 (n) ).

Now let
Fn = {f : Z2^n → Z2 } = Z2^(Z2^n) .

So there is an action µ of G on Fn defined by: For f ∈ Fn , g ∈ G,

(µ(g))(f ) = f ◦ ν(g −1 ),

i.e.,

(µ(g)(f ))(x) = f (ν(g −1 )(x)) = f (x ◦ g) = f (xg(1) , . . . , xg(n) ).

This last equation says that g ∈ Sn acts on Fn by: (g(f ))(x) = f (x ◦ g),
i.e.,

g(f ) : (x1 , . . . , xn ) 7→ f (xg(1) , . . . , xg(n) ).

We say that f1 and f2 are equivalent if they are in the same G-orbit, and
we would like to determine how many inequivalent switching functions in n
variables there are. This is too difficult for us to do for general n, so we put
n = 3. But first we rename the elements of Z2^3 essentially by listing them in
a natural order so that we can simplify notation in what follows.
(000) ↔ 0; (001) ↔ 1; (010) ↔ 2; (011) ↔ 3;

(100) ↔ 4; (101) ↔ 5; (110) ↔ 6; (111) ↔ 7.


We note that

G = S3 = {e, (123), (132), (12), (13), (23)}

effects an action ν on Z2^3 by: ν(g)(x) = x ◦ g −1 :



e : (x1 , x2 , x3 ) 7→ (x1 , x2 , x3 ) ⇒ ν(e) = (0)(1)(2)(3)(4)(5)(6)(7)

(123) : (x1 , x2 , x3 ) 7→ (x3 , x1 , x2 ) ⇒ ν(123) = (0)(142)(356)(7)

(12) : (x1 , x2 , x3 ) 7→ (x2 , x1 , x3 ) ⇒ ν(12) = (0)(1)(24)(35)(6)(7)

(13) : (x1 , x2 , x3 ) 7→ (x3 , x2 , x1 ) ⇒ ν(13) = (0)(14)(2)(36)(5)(7)

(132) : (x1 , x2 , x3 ) 7→ (x2 , x3 , x1 ) ⇒ ν(132) = (0)(124)(365)(7)

(23) : (x1 , x2 , x3 ) 7→ (x1 , x3 , x2 ) ⇒ ν(23) = (0)(12)(3)(4)(56)(7)

F3 = {f : Z2^3 → Z2 } = {f : {0, . . . , 7} → Z2 }.

For g ∈ G, |(F3 )g | = 2^{c(g)} , where c(g) is the number of cycles in ν(g). So


the number of G-orbits in F3 is

(1/|G|) ∑_{g∈G} 2^{c(g)} = (1/6)[2^8 + 2^4 + 2^6 + 2^6 + 2^4 + 2^6 ] = 80

(whereas |F3 | = 2^(2^3) = 2^8 = 256).
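The value 80 can be confirmed by brute force, practical only for small n since there are 2^(2^n) switching functions (the function name is ours):

```python
from itertools import permutations, product

def count_switching_orbits(n):
    """Count S_n-orbits of switching functions f : Z_2^n -> Z_2, where a
    permutation g sends f to (x_1,...,x_n) |-> f(x_g(1),...,x_g(n))."""
    points = list(product((0, 1), repeat=n))
    index = {x: i for i, x in enumerate(points)}
    seen, orbits = set(), 0
    for f in product((0, 1), repeat=len(points)):   # f as a truth table
        if f not in seen:
            orbits += 1
            seen |= {tuple(f[index[tuple(x[g[i]] for i in range(n))]]
                           for x in points)
                     for g in permutations(range(n))}
    return orbits
```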

Exercise: 3.2.5 Repeat this problem with n = 2.

Exercise: 3.2.6 Repeat this problem with n = 3, but extend G to a group of


order 12 on Z2^3 by allowing complementation: x̄i = 1 + xi , with addition mod
2.

3.3 The Cycle Index: Polya’s Theorem


Let G act on a set X, with n = |X|. For each g ∈ G, let λt (g) be the number
of cycles of length t in the cycle decomposition of g as a permutation on X,
1 ≤ t ≤ n. Let x1 , . . . , xn be variables. Then the CYCLE INDEX of G
(relative to the given action of G) is the polynomial

PG (x1 , . . . , xn ) = (1/|G|) ∑_{g∈G} x1^{λ1 (g)} · · · xn^{λn (g)} .

If Y is a set with m = |Y |, then G induces an action on Y^X , as we have


seen above. And the “ordinary” version of Polya’s counting theorem is given
as follows:

Theorem 3.3.1 The number of G-orbits in Y^X is

1 X (Pn λt (g))
PG (m, . . . , m) = m t=1 .
|G| g∈G

Proof: Of course, this is exactly what Theorem 3.1.8 says.
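Theorem 3.3.1 is easy to exercise numerically. In the sketch below (our own helper names; the group is assumed given as a list of permutations in dictionary form) P_G(m, ..., m) is computed as the average of m raised to the number of cycles, recovering the 80 orbits found for the S_3 action of the previous section with m = 2:

```python
from fractions import Fraction
from itertools import permutations, product

def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a dict point -> image."""
    seen, lengths = set(), []
    for p in perm:
        if p not in seen:
            n, q = 0, p
            while q not in seen:
                seen.add(q)
                q, n = perm[q], n + 1
            lengths.append(n)
    return lengths

def orbit_count(perms, m):
    """P_G(m,...,m): the average of m^(number of cycles) over the group."""
    total = sum(Fraction(m) ** len(cycle_lengths(g)) for g in perms)
    return total / len(perms)

# The action of S_3 on X = Z_2^3 from the previous section.
points = list(product([0, 1], repeat=3))
perms = [{p: tuple(p[s[i]] for i in range(3)) for p in points}
         for s in permutations(range(3))]
print(orbit_count(perms, 2))   # -> 80
```

Exact rational arithmetic (Fraction) makes the division by |G| safe; the result is always an integer by Burnside's lemma.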

Example 3.3.2 THE GROUP OF RIGID MOTIONS OF THE CUBE

Consider a cube in 3-space. It has 8 vertices, 6 faces and 12 edges. Let


G be the group of rigid motions, (i.e., rotations) of the cube. G consists of
the following rotations:
(a) The identity.
(b) Three rotations of 180 degrees about axes connecting centers of op-
posite faces.
(c) Six rotations of 90 degrees around axes connecting centers of opposite
faces.
(d) Six rotations of 180 degrees around axes joining midpoints of opposite
edges.
(e) Eight rotations of 120 degrees about axes connecting opposite vertices.

Exercise: 3.3.3 Compute the Cycle Index of G considered as a group of


permutations on the vertices (resp., edges; resp., faces) of the cube.

CONVENTION: If A labels a vertex (edge, face, etc.) which is carried


by a motion π to the position originally held by the vertex (edge, face, etc.)
labeled B, we write π(A) = B.

3.4 Sylow Theory Via Group Actions


We begin with a preliminary result.

Theorem 3.4.1 Let n = p^α m where p is a prime number, and let p^r || m
(i.e., p^r divides m, but p^{r+1} does not divide m). Then

p^r || \binom{p^α m}{p^α}.

Proof: The question is: What power of p divides

\binom{p^α m}{p^α} = \frac{(p^α m)!}{(p^α)! (p^α m − p^α)!} = \frac{p^α m (p^α m − 1) \cdots (p^α m − i) \cdots (p^α m − p^α + 1)}{p^α (p^α − 1) \cdots (p^α − i) \cdots (p^α − p^α + 1)} ?

Looking at this expression written out, one can see that except for the
factor m in the numerator, the power of p dividing (p^α m − i) = (p^α − i) +
p^α(m − 1) is the same as that dividing p^α − i, since 0 ≤ i ≤ p^α − 1, so all
powers of p cancel out except the power which divides m.
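Theorem 3.4.1 can be spot-checked numerically. In this Python sketch (helper name ours), p_valuation(n, p) returns the exact power of p dividing n:

```python
from math import comb

def p_valuation(n, p):
    """The largest e such that p^e divides n."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

# p^r || m should force p^r || C(p^alpha * m, p^alpha).
for p in (2, 3, 5):
    for alpha in (1, 2, 3):
        for m in range(1, 40):
            r = p_valuation(m, p)
            assert p_valuation(comb(p**alpha * m, p**alpha), p) == r
print("Theorem 3.4.1 checked for p in {2,3,5}, alpha <= 3, m < 40")
```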

Theorem 3.4.2 (Sylow Theorem 1) Let G be a finite group with |G| = n =
p^α q, p^β || n. (So β ≥ α.) Then G has a subgroup of order p^α.

Proof: From the preceding result it follows that p^{β−α} || \binom{n}{p^α}. Put X =
{S ⊆ G : |S| = p^α}. µ : G → S_X is an action of G on X, where µ is defined
by

µ(g)S = gS = {gs : s ∈ S}.


Fix S = {g_1, \ldots, g_{p^α}} ∈ X. If g ∈ G_S, then S = gS ⇒ {gg_1, \ldots, gg_{p^α}} =
{g_1, \ldots, g_{p^α}}, which implies that gg_1 = g_k for some k, so that

g = g_k g_1^{−1} ∈ {g_1 g_1^{−1}, g_2 g_1^{−1}, \ldots, g_{p^α} g_1^{−1}}.



From this it is clear that |G_S| ≤ p^α.

Let O^1, \ldots, O^h be the distinct G-orbits in X. Then \binom{n}{p^α} = |X| = \sum_{t=1}^{h} |O^t|,
so that p^{β−α+1} does not divide |X|. Hence there is some t for which p^{β−α+1}
does not divide |O^t|. Let O be any one of the orbits O^t for which p^{β−α+1} does
not divide |O|. For any S ∈ O, O = {gS : g ∈ G}, so |G| = |G_S| · |O| = p^β d
where p does not divide d. Hence |O| = \frac{p^β d}{|G_S|}, forcing p^α to divide |G_S|.
Putting this together with the result of the previous paragraph, |G_S| = p^α.
This shows that if O is an orbit for which p^{β−α+1} does not divide |O| (and
there must be at least one such), then for each S ∈ O, |G_S| = p^α. So G has
a subgroup of order p^α.

Theorem 3.4.3 (Sylow Theorem 2) If p^α is an exact divisor of |G|, then all
subgroups of G with order p^α (i.e., all Sylow p-subgroups of G) are conjugate
in G.

Proof: In Theorem 3.4.2 put β = α. So there is an orbit O for which p
does not divide |O|, and |G_S| = p^α for each S ∈ O.
We claim that the converse holds: If H ≤ G with |H| = p^α, then there
must be some T ∈ O for which H = G_T, so that H and G_S are conjugate
for each S ∈ O.
Given H ≤ G with |H| = p^α, clearly H acts on O. Let Q^1, \ldots, Q^m be
all the H-orbits in O, so that |O| = \sum_{t=1}^{m} |Q^t|. Since p does not divide |O|,
there must be some t for which p does not divide |Q^t|. Choose any T ∈ Q^t,
so Q^t = {hT : h ∈ H}. Then |H_T| · |Q^t| = |H| = p^α and |Q^t| = \frac{p^α}{|H_T|}. Hence
p^α = |H_T|, since p does not divide |Q^t|. Since H_T ≤ H and |H_T| = |H|, we
have H = H_T. Also T ∈ O with p not dividing |O|, so |G_T| = p^α. Then
H_T ≤ G_T and |H_T| = |G_T| imply that H_T = G_T. Hence H = G_T.

Theorem 3.4.4 (Sylow Theorem 3) Let p^α be an exact divisor of |G|, with
|G| = p^α q. Let t_p be the number of Sylow p-subgroups in G. Then:

(a) t_p ≡ 1 (mod p);

(b) t_p | q.

Proof: Let H1 , . . . , Hr be the distinct Sylow p-subgroups of G. Recall


that there is an orbit O with p not dividing |O|, and each Hi is the stabilizer

of some element of O. In fact, each H_i is the stabilizer of the same number
s of elements of O, so |O| = rs.
Let P_1 = {S ∈ O : H_1 = G_S}. So |P_1| = s. If T ∈ O \ P_1, then the
H_1-orbit U containing T has more than one element, and |(H_1)_T| · |U| = |H_1|.
But |H_1| = p^α and |U| > 1 imply that p divides |U|. So rs = |O| = s + mp ≡
s ≢ 0 (mod p). So s ≢ 0 (mod p) and s(r − 1) ≡ 0 (mod p) imply that
r ≡ 1 (mod p).
Finally, |O| = |G|/|G_S| = q for any S ∈ O, so that rs = q. With r = t_p
the theorem is proved.

3.5 Patterns and Weights


Let X and Y be finite, nonempty sets (with |X| = m), and let ν : G → S_X
be an action of the group G on the set X. Then G induces an equivalence
relation on Y^X, with equivalence classes being called patterns: viz., for f, h ∈
Y^X, f ∼ h if and only if there is some g ∈ G for which f = h ◦ ν(g^{−1}).
Let Ω be a commutative ring containing the rational numbers Q as a
subring. Frequently, Ω is a polynomial ring over Q in finitely many variables.
Also, we suppose there is some weight function w : Y → Ω. Y is called the
store, and \sum_{y∈Y} w(y) is the store inventory. A weight function W : Y^X → Ω
is then defined by

W(f) = \prod_{x∈X} w(f(x)).

It is easy to see that if f and h belong to the same pattern, then W(f) =
W(h). For, with h = f ◦ ν(g^{−1}),

W(h) = \prod_{x∈X} w(h(x)) = \prod_{x∈X} w[f(ν(g^{−1})(x))] = \prod_{x∈X} w[f(x)] = W(f),

since ν(g^{−1})(x) varies over all elements of X as x varies over all elements of
X. So the weight of a pattern may be defined as the weight of any function in
that pattern. The inventory \sum_{f∈Y^X} W(f) of Y^X is equal to \left(\sum_{y∈Y} w(y)\right)^{|X|}.
This is a special case (with each |X_i| = 1) of the following result, whose proof
is given.

Theorem 3.5.1 If X is partitioned into disjoint, nonempty subsets X =
X_1 + \cdots + X_k, put

S = {f ∈ Y^X : f is constant on each X_i, i = 1, \ldots, k}.

Then the inventory of S is defined to be \sum_{f∈S} W(f) and is equal to \prod_{i=1}^{k} \left\{\sum_{y∈Y} (w(y))^{|X_i|}\right\}.

Proof: A term in the product is obtained by selecting one term in each
factor and multiplying them together. This is equivalent to selecting a
mapping φ of the set {1, \ldots, k} into Y, yielding the term \prod_{i=1}^{k} [w(φ(i))]^{|X_i|}. Let
ψ : X → {1, \ldots, k} be defined by ψ(x) = i if and only if x ∈ X_i. Put
f = φ ◦ ψ. Then

[w(φ(i))]^{|X_i|} = \prod_{x∈X_i} w(φ(i)) = \prod_{x∈X_i} w((φ ◦ ψ)(x)) = \prod_{x∈X_i} w(f(x)),

from which it follows that

\prod_{i=1}^{k} [w(φ(i))]^{|X_i|} = \prod_{x∈X} w(f(x)) = W(f).

Since each f ∈ S can be written uniquely in the form f = φ ◦ ψ for some
φ : {1, \ldots, k} → Y, the desired result is easily seen to hold; viz.:

\prod_{i=1}^{k} \left\{\sum_{y∈Y} (w(y))^{|X_i|}\right\} = \sum_{φ : \{1,\ldots,k\} → Y} \prod_{i=1}^{k} [w(φ(i))]^{|X_i|} = \sum_{f∈S} W(f).

If each |X_i| = 1, we have S = Y^X and \sum_{f∈Y^X} W(f) = \left(\sum_{y∈Y} w(y)\right)^{|X|}.

Theorem 3.5.2 (Polya-Redfield) The Pattern Inventory is given by:

\sum_{F} W(F) = P_G\left(\sum_{y∈Y} w(y), \sum_{y∈Y} [w(y)]^2, \ldots, \sum_{y∈Y} [w(y)]^m\right),

where the summation is over all patterns F, and P_G is the cycle index. In
particular, if all weights are chosen equal to 1, the number of patterns is
P_G(|Y|, |Y|, \ldots, |Y|). If f ∈ F where F is a given pattern, then W(F) =
W(f) = \prod_{x∈X} w(f(x)). If w(y_i) = x_i, where x_1, \ldots, x_m are independent
commuting variables, then W(f) = x_1^{b_1} x_2^{b_2} \cdots x_m^{b_m}, where b_i is the number of
times the color y_i appears in the coloring of any f in the pattern F. Hence the
coefficient of x_1^{b_1} x_2^{b_2} \cdots x_m^{b_m} in P_G(\sum_{y∈Y} w(y), \sum_{y∈Y} [w(y)]^2, \ldots, \sum_{y∈Y} [w(y)]^m)
is the number of patterns in which the color y_i appears b_i times.

Proof: Let w be one of the possible values that the weight of a function
may have. Put S = {f ∈ Y^X : W(f) = w}. If g ∈ G, then W(f ◦ ν(g^{−1})) =
w. Hence for each g ∈ G, µ(g) : f ↦ f ◦ ν(g^{−1}) maps S into S. (And it is
easy to see from earlier results that µ is an action of G on S.) Clearly, for
f_1, f_2 ∈ S, f_1 and f_2 belong to the same pattern (in the sense mentioned at
the beginning of this section) if and only if they are equivalent relative to
the action µ of G on S. Now Burnside's Lemma applied to µ : G → S_S says
that the number of patterns contained in S is equal to \frac{1}{|G|} \sum_{g∈G} ψ_w(g), where
ψ_w(g) denotes the number of functions f with W(f) = w and f = µ(g)(f) =
f ◦ ν(g^{−1}).

The patterns contained in S all have weight w. So if we multiply by w
and sum over all possible values of w, we obtain the pattern inventory

\sum_{F} W(F) = \frac{1}{|G|} \sum_{w} \sum_{g∈G} ψ_w(g) · w.

Also,

\sum_{w} ψ_w(g) · w = \sum_{f}^{(g)} W(f),

where the right hand side is summed over all f ∈ Y^X with f = f ◦ ν(g^{−1}).
It follows that

\sum_{F} W(F) = \frac{1}{|G|} \sum_{g∈G} \sum_{f}^{(g)} W(f).

Here ν(g) splits X into cycles. And f = f ◦ ν(g^{−1}) means

f(x) = f(ν(g^{−1})(x)) = \cdots = f(ν(g^{−i})(x)),

i.e., f is constant on each cycle of ν(g^{−1}), and hence on each cycle of ν(g).
Conversely, each f constant on each cycle of ν(g) automatically satisfies
f = f ◦ ν(g^{−1}), since ν(g^{−1})(x) always belongs to the same cycle as x itself.
Thus if the cycles are X_1, \ldots, X_k, then \sum_{f}^{(g)} W(f) is the inventory calculated
by Theorem 3.5.1 to be

\sum_{f}^{(g)} W(f) = \prod_{i=1}^{k} \left\{\sum_{y∈Y} [w(y)]^{|X_i|}\right\}.

Let (b_1, \ldots, b_m) be the cycle type of ν(g). This means that among the
numbers |X_1|, \ldots, |X_k|, the number 1 occurs b_1 times, 2 occurs b_2 times, \ldots,
etc. Hence

\sum_{f}^{(g)} W(f) = \left(\sum_{y∈Y} w(y)\right)^{b_1} \left(\sum_{y∈Y} (w(y))^2\right)^{b_2} \cdots \left(\sum_{y∈Y} (w(y))^m\right)^{b_m}.

Finally, \sum_{F} W(F) = \frac{1}{|G|} \sum_{g∈G} \sum_{f}^{(g)} W(f) is obtained by putting
x_i = \sum_{y∈Y} (w(y))^i in P_G(x_1, \ldots, x_m) = \frac{1}{|G|} \sum_{g∈G} x_1^{b_1} \cdots x_m^{b_m}.
We close this section with some examples that partly duplicate some of
those given earlier.

Example 3.5.3 Suppose we want to distribute m counters over three persons


P1 , P2 , P3 with the condition that P1 obtain the same number as P2 . In how
many ways is this possible?
We are not interested in the individual counters, but only in the number
each person gets. Hence we want functions f defined on X = {P_1, P_2, P_3}
with range Y = {0, 1, \ldots, m} and with the restrictions f(P_1) = f(P_2) and
\sum_{i=1}^{3} f(P_i) = m. Put X_1 = {P_1, P_2} and X_2 = {P_3}. Define w : Y → Ω
by w(i) = x^i. Thus the functions we are interested in have weight x^m, and
they are the only ones with weight x^m. By Theorem 3.5.1 the inventory
\sum_{f∈S} W(f) must be equal to

\prod_{i=1}^{2} \left\{\sum_{y∈Y} w(y)^{|X_i|}\right\} = (1 + x^2 + x^4 + \cdots + x^{2m})(1 + x + x^2 + \cdots + x^m).

But the coefficient of x^m in this product is the coefficient of x^m in

(1 − x^2)^{−1}(1 − x)^{−1} = \frac{1}{4}(1 + x)^{−1} + \frac{1}{2}(1 − x)^{−2} + \frac{1}{4}(1 − x)^{−1},

which is the coefficient of x^m in

\frac{1}{4}\left(1 − x + x^2 − x^3 + \cdots + (−1)^m x^m + \cdots\right) + \frac{1}{2} \sum_{i=0}^{∞} \binom{2 + i − 1}{i} x^i + \frac{1}{4}\left(1 + x + x^2 + \cdots + x^m + \cdots\right),

which is equal to

\frac{1}{2}(m + 1) + \frac{1}{4}\left((−1)^m + 1\right) = \begin{cases} \frac{m}{2} + 1, & m \text{ even}, \\ \frac{1}{2}(m + 1), & m \text{ odd}. \end{cases}
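A direct enumeration agrees with this coefficient extraction; the short check below (Python, names ours) counts the pairs (f(P_1), f(P_3)) with 2 f(P_1) + f(P_3) = m:

```python
def distributions(m):
    """Ways to give P1, P2, P3 a total of m counters with P1 getting as many as P2."""
    return sum(1 for a in range(m + 1) for b in range(m + 1) if 2 * a + b == m)

for m in range(20):
    # Both cases of the closed form above equal floor(m/2) + 1.
    assert distributions(m) == m // 2 + 1
print("coefficient formula confirmed for m = 0..19")
```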

For the next few examples let G be the group of rigid motions of a cube.
The elements of G were given earlier, but this time we want to include the
details giving the cycle indices. Recall the elements of G from Example 3.3.2.

Example 3.5.4 Let X be the set of vertices of the cube. The cycle types are
indicated as follows:

(a) x_1^8; (b) x_2^4; (c) x_4^2; (d) x_2^4; (e) x_1^2 x_3^2.

So P_G = \frac{1}{24}(x_1^8 + 9x_2^4 + 6x_4^2 + 8x_1^2 x_3^2).

Example 3.5.5 Let X be the set of edges of the cube. The cycle types are
indicated as follows:

(a) x_1^{12}; (b) x_2^6; (c) x_4^3; (d) x_1^2 x_2^5; (e) x_3^4.

So P_G = \frac{1}{24}(x_1^{12} + 3x_2^6 + 6x_4^3 + 6x_1^2 x_2^5 + 8x_3^4).

Example 3.5.6 Let X be the set of faces of the cube. Then

P_G = \frac{1}{24}(x_1^6 + 3x_1^2 x_2^2 + 6x_1^2 x_4 + 6x_2^3 + 8x_3^2).

Example 3.5.7 Determine the number of ways a cube can be painted so that
each face is red or blue. In other words, how many patterns are there?

Let X be the set of faces of the cube, and µ : G → S_X as in the preceding
example. Put Y = {red, blue}, with the weight of each element being 1. Then
the number of patterns is P_G(2, 2, \ldots) = \frac{1}{24}(2^6 + 3 · 2^4 + 6 · 2^3 + 6 · 2^3 + 8 · 2^2) = 10.
(Summary: (a) All faces red; (b) five red, one blue; (c) two opposite faces
blue, the others red; (d) two adjacent faces blue, the others red; (e) three
faces at one vertex red, the others blue; (f) two opposite faces plus one other
red, the remaining faces blue; (g), (h), (i) and (j) obtained from (d), (c), (b)
and (a) upon interchanging red and blue.)
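The arithmetic here is just the face cycle index of Example 3.5.6 evaluated at x_t = 2, which is easy to mechanize. A small Python sketch (the cycle-type table is transcribed from that example; the helper names are ours):

```python
from fractions import Fraction

# Face cycle types of the cube's rotation group, as exponent tuples (b1,...,b6).
face_types = {(6, 0, 0, 0, 0, 0): 1,   # identity: x1^6
              (2, 2, 0, 0, 0, 0): 3,   # face 180s: x1^2 x2^2
              (2, 0, 0, 1, 0, 0): 6,   # face 90s:  x1^2 x4
              (0, 3, 0, 0, 0, 0): 6,   # edge 180s: x2^3
              (0, 0, 2, 0, 0, 0): 8}   # vertex 120s: x3^2

def patterns(types, m):
    """P_G(m,...,m): substitute x_t = m; sum(b) is the total number of cycles."""
    order = sum(types.values())
    return Fraction(sum(mult * m ** sum(b) for b, mult in types.items()), order)

print(patterns(face_types, 2))   # -> 10
```

The same call with m = 3 counts three-color face patterns of the cube.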

Example 3.5.8 In the preceding example, how many color patterns show
four red faces, two blue?

Let w(red) = x, w(blue) = y. Then the pattern inventory is

\sum_{F} W(F) = \frac{1}{24}[(x + y)^6 + 3(x + y)^2(x^2 + y^2)^2 + 6(x + y)^2(x^4 + y^4) + 6(x^2 + y^2)^3 + 8(x^3 + y^3)^2].

The coefficient of x^4 y^2 is \frac{1}{24}(15 + 9 + 6 + 18 + 0) = 2.
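The five contributions 15, 9, 6, 18, 0 can be produced mechanically: a coloring fixed by a rotation must be constant on each face cycle, so the number of fixed colorings with exactly b blue faces is the coefficient of t^b in \prod (1 + t^{length}). A Python sketch (cycle data as in the cube examples above; helper names ours):

```python
from fractions import Fraction

# Cube rotations, listed by the cycle lengths they induce on the 6 faces.
face_cycles = [([1, 1, 1, 1, 1, 1], 1),   # identity
               ([1, 1, 2, 2], 3),         # face 180s
               ([1, 1, 4], 6),            # face 90s
               ([2, 2, 2], 6),            # edge 180s
               ([3, 3], 8)]               # vertex 120s

def fixed_with_b_blue(cycles, b):
    """Coefficient of t^b in prod(1 + t^len): fixed colorings with b blue faces."""
    poly = [1]
    for length in cycles:
        new = poly + [0] * length
        for i, c in enumerate(poly):
            new[i + length] += c
        poly = new
    return poly[b] if b < len(poly) else 0

count = sum(mult * fixed_with_b_blue(c, 2) for c, mult in face_cycles)
print(Fraction(count, 24))   # -> 2
```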

Example 3.5.9 In how many ways can the eight vertices be painted with n
colors?

Let X be the set of vertices, with µ : G → S_X as in Example 3.5.4. Let
Y = {c_1, \ldots, c_n}, with w(c_i) = x_i. Then the pattern inventory PI is given
by

PI = \frac{1}{24}[(x_1 + \cdots + x_n)^8 + 9(x_1^2 + \cdots + x_n^2)^4 + 6(x_1^4 + \cdots + x_n^4)^2 + 8(x_1 + \cdots + x_n)^2(x_1^3 + \cdots + x_n^3)^2].

If the total number of patterns is all that is sought, putting x_i = 1 shows
this number to be \frac{1}{24} n^2 (n^6 + 17n^2 + 6).

Example 3.5.10 Let G be a finite group of order m. For each a ∈ G put
λ_a(g) = ag. So λ_a ∈ S_G. Then G_λ = {λ_a : a ∈ G} is a subgroup of
S_G, and Λ : G → G_λ : a ↦ λ_a is an isomorphism called the left regular
representation of G. We now calculate the cycle index of G relative to its
left regular representation.

Let k(a) be the order of a for each a ∈ G. Then λ_a splits G into m/k(a)
cycles of length k(a). So

P_G = \frac{1}{m} \sum_{a∈G} x_{k(a)}^{m/k(a)} = \frac{1}{m} \sum_{d|m} ν(d) x_d^{m/d},

where ν(d) is the number of elements a in G of order k(a) = d. If G is cyclic
of order m, then ν(d) = φ(d) for each d such that d | m, so

P_G = \frac{1}{m} \sum_{d|m} φ(d) x_d^{m/d}.
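For a cyclic group this is the classical necklace-counting formula, and it is easy to check numerically (Python sketch, naive totient, helper names ours):

```python
from math import gcd

def totient(d):
    """Euler's phi, computed naively."""
    return sum(1 for j in range(1, d + 1) if gcd(j, d) == 1)

def necklaces(m, colors):
    """P_G(colors,...,colors) for the cyclic group of order m:
    (1/m) * sum over d | m of phi(d) * colors^(m/d)."""
    total = sum(totient(d) * colors ** (m // d)
                for d in range(1, m + 1) if m % d == 0)
    return total // m

print(necklaces(6, 2))   # -> 14 binary necklaces of length 6
```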

3.6 The Symmetric Group


Let X be a finite set with m elements. Let G = S_X ≅ S_m. Let b̄ =
(b_1, \ldots, b_m) be a permissible cycle type of some g ∈ G, i.e., b_i ≥ 0 and
b_1 + 2b_2 + 3b_3 + \cdots + mb_m = m. Then:

Theorem 3.6.1 The number #(b̄) of permutations in S_m having type b̄ is

#(b̄) = \frac{m!}{b_1! 1^{b_1} b_2! 2^{b_2} b_3! 3^{b_3} \cdots b_m! m^{b_m}}.
Proof: There are \binom{m}{b_1} ways to form the 1-cycles. Suppose we have
taken care of the 1-cycles, 2-cycles, \ldots, (k − 1)-cycles and are about to form
the k-cycles. We have η_{k−1} = m − b_1 − 2b_2 − \cdots − (k − 1)b_{k−1} elements at
our disposal (η_0 = m). The first k-cycle can be formed in \binom{η_{k−1}}{k}(k − 1)!
ways, the second k-cycle in \binom{η_{k−1} − k}{k}(k − 1)! ways, \ldots, the b_k-th k-cycle
in \binom{η_{k−1} − (b_k − 1)k}{k}(k − 1)! ways. Hence the k-cycles can be formed in

\frac{1}{b_k!}\left\{\binom{η_{k−1}}{k}(k − 1)! \binom{η_{k−1} − k}{k}(k − 1)! \binom{η_{k−1} − 2k}{k}(k − 1)! \cdots \binom{η_{k−1} − (b_k − 1)k}{k}(k − 1)!\right\}

= \frac{1}{b_k!}\left\{\frac{(k − 1)! η_{k−1}!}{k!(η_{k−1} − k)!} \cdot \frac{(k − 1)!(η_{k−1} − k)!}{k!(η_{k−1} − 2k)!} \cdot \frac{(k − 1)!(η_{k−1} − 2k)!}{k!(η_{k−1} − 3k)!} \cdots \frac{(k − 1)!(η_{k−1} − k(b_k − 1))!}{k!(η_{k−1} − b_k k)!}\right\} = \frac{η_{k−1}!}{b_k! k^{b_k} η_k!}

ways. (The initial factor \frac{1}{b_k!} arises from the fact that the k-cycles may be
written in any order.)
From Theorem 3.6.1 it follows readily that:

Theorem 3.6.2 The cycle index of S_m is given by

P_{S_m} = \sum \frac{x_1^{b_1} \cdots x_m^{b_m}}{b_1! 1^{b_1} b_2! 2^{b_2} \cdots b_m! m^{b_m}},

where the sum is over all b̄ = (b_1, \ldots, b_m) for which b_i ≥ 0, b_1 + 2b_2 + \cdots +
mb_m = m.
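Since every permutation has exactly one cycle type, the counts #(b̄) of Theorem 3.6.1 must sum to m!. The sketch below (our own partition-generating helpers) verifies this for m = 6:

```python
from math import factorial

def type_count(b):
    """#(b) = m! / prod(b_k! * k^b_k), as in Theorem 3.6.1; b = (b_1,...,b_m)."""
    m = sum(k * bk for k, bk in enumerate(b, start=1))
    denom = 1
    for k, bk in enumerate(b, start=1):
        denom *= factorial(bk) * k ** bk
    return factorial(m) // denom

def partitions(m, max_part=None):
    """All partitions of m with parts at most max_part."""
    if max_part is None:
        max_part = m
    if m == 0:
        yield ()
        return
    for k in range(min(m, max_part), 0, -1):
        for rest in partitions(m - k, k):
            yield rest + (k,)

def as_type(parts, m):
    b = [0] * m
    for k in parts:
        b[k - 1] += 1
    return tuple(b)

m = 6
total = sum(type_count(as_type(p, m)) for p in partitions(m))
print(total == factorial(m))   # -> True
```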

Now let M = {1, 2, \ldots, m}, and let X be the set of unordered pairs {i, j}
of distinct elements of M. The symmetric group S_m acting on M also has a
natural action on X. For each σ ∈ S_m, define σ^* ∈ S_X by

σ^* : {i, j} ↦ {σ(i), σ(j)}.

Then µ : σ ↦ σ^* is an action of S_m on X, and

µ : S_m → S_m^* = {σ^* : σ ∈ S_m}

is an isomorphism. We need to calculate the cycle index of S_m^*.
This is feasible because the cycle type of each σ determines the cycle type
of the corresponding σ^*. Specifically, each factor x_t^{b_t} in a term x_1^{b_1} \cdots x_m^{b_m} of
P_{S_m} corresponding to a σ of cycle type (b_1, \ldots, b_m) yields specific factors in
the term corresponding to σ^*.
Let σ ∈ S_m have cycle type (b_1, \ldots, b_m), \sum_{i=1}^{m} i b_i = m. Then for {i, j} ∈
X we ask what is the length of the cycle of σ^* to which {i, j} belongs?

First suppose i and j belong to the same cycle of σ of length t. If t is
odd, (σ^*)^k : {i, j} ↦ {σ^k(i), σ^k(j)} = {i, j} if and only if 1) σ^k(i) = i and
σ^k(j) = j, or 2) σ^k(i) = j and σ^k(j) = i. Since σ^k(i) = j and σ^k(j) = i
imply σ^{2k}(i) = i, it must be that k is a multiple of t. Hence σ^k(i) = i and
σ^k(j) = j. So the σ^*-cycle of {i, j} has length t also. Hence one cycle of σ
having odd length t produces \frac{1}{t}\binom{t}{2} = \frac{t−1}{2} cycles of length t.

Now suppose i and j belong to the same cycle of σ of even length t = 2k.
This cycle produces one cycle of σ^* of length k and

\frac{1}{2k}\left[\binom{2k}{2} − k\right] = \frac{1}{2k}\left[\frac{2k(2k − 1)}{2} − k\right] = \frac{2k − 2}{2} = \frac{t − 2}{2}

cycles of length t.

Since there are b_t cycles of length t, the pairs {i, j} belonging to common
cycles of σ yield cycles of σ^* in such a way that the terms of the cycle index
of S_m yield terms of the cycle index of S_m^* as follows:

x_t^{b_t} ∈ P_{S_m} ↦ \begin{cases} x_t^{b_t \frac{t−1}{2}}, & t \text{ odd}; \\ \left(x_{t/2}\, x_t^{\frac{t−2}{2}}\right)^{b_t}, & t \text{ even}. \end{cases}



Now we suppose that i and j come from distinct cycles c_r and c_t of σ
of lengths r and t, respectively. (Note: (r, t) and [r, t] denote the greatest
common divisor and the least common multiple, respectively, of r and t.
Also, rt = (r, t)[r, t].) The cycles c_r and c_t induce on the pairs of elements, one
each from c_r and c_t, exactly (r, t) cycles of length [r, t], since {σ^k(i), σ^k(j)} =
{i, j} if and only if k ≡ 0 (mod [r, t]). In particular, if r = t = k, there are k
cycles of length k. Hence we have the following:

If r ≠ t, x_r^{b_r} x_t^{b_t} ∈ P_{S_m} ↦ x_{[r,t]}^{(r,t) b_r b_t} ∈ P_{S_m^*}.

If r = t = k, x_k^{b_k} ∈ P_{S_m} ↦ x_k^{k \binom{b_k}{2}}.

Multiplying over appropriate cases, and summing over permissible cycle
types b̄, we finally obtain the cycle index of S_m^*:

P_{S_m^*}(x_1, \ldots, x_m) = \frac{1}{m!} \sum_{b̄} \frac{m!}{b_1! \cdots b_m!\, 1^{b_1} 2^{b_2} \cdots m^{b_m}} \cdot \prod_{k=0}^{[\frac{m−1}{2}]} x_{2k+1}^{k b_{2k+1}} \cdot \prod_{k=1}^{[\frac{m}{2}]} \left(x_k x_{2k}^{k−1}\right)^{b_{2k}} \cdot \prod_{k=1}^{[\frac{m}{2}]} x_k^{k \binom{b_k}{2}} \cdot \prod_{1 ≤ r < t ≤ m−1} x_{[r,t]}^{(r,t) b_r b_t}.

Example 3.6.3 For m = 4 we have

P_{S_4} = \frac{1}{4!}(x_1^4 + 6x_1^2 x_2 + 8x_1 x_3 + 3x_2^2 + 6x_4).

P_{S_4^*} = \frac{1}{4!}(x_1^6 + 6x_1^2 x_2^2 + 8x_3^2 + 3x_1^2 x_2^2 + 6x_2 x_4).

For future reference we calculate:

P_{S_4^*}(1 + x, 1 + x^2, 1 + x^3, 1 + x^4) = 1 + x + 2x^2 + 3x^3 + 2x^4 + x^5 + x^6.

3.7 Counting Graphs


We count the graphs Γ on m vertices with q edges. Let G denote the set of
graphs Γ on the vertices M = {1, 2, \ldots, m}. Such a Γ is a function from the
set X of unordered pairs {i, j} of distinct elements of M to the set Y = {0, 1},
where Γ({i, j}) is 1 or 0, according as {i, j} is an edge or a nonedge of the
graph Γ.

Two such graphs Γ_1 and Γ_2 are equivalent if and only if there is a
relabeling of the vertices of Γ_1 so that it has the same edges as Γ_2, i.e., iff there
is a permutation σ ∈ S_m so that as functions Γ_2 = Γ_1 ◦ σ^*, where for each
σ ∈ S_m, σ^* acts on X as usual: σ^* : {i, j} ↦ {σ(i), σ(j)}.

Let w : Y → {1, x} be defined by w(i) = x^i. Then the weight of a graph
Γ ∈ Y^X is W(Γ) = x^q, where q is the number of edges of Γ. The pattern
inventory of G = Y^X is P_{S_m^*}(1 + x, 1 + x^2, 1 + x^3, \ldots) by Polya's theorem. So
the number of graphs on m vertices with q edges is the coefficient of x^q in
P_{S_m^*}(1 + x, 1 + x^2, 1 + x^3, \ldots).

In particular, putting m = 4, we see from the preceding section that there
are 11 graphs on 4 vertices.
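For m = 4 the claim can be confirmed by brute force over all 2^6 labeled graphs, classifying each by a canonical relabeling under S_4 (Python sketch, helper names ours):

```python
from itertools import combinations, permutations

V = range(4)
pairs = list(combinations(V, 2))   # the 6 unordered pairs {i, j}

def canonical(edges):
    """Lexicographically least relabeling of an edge set under S_4."""
    return min(tuple(sorted(tuple(sorted((s[i], s[j]))) for i, j in edges))
               for s in permutations(V))

by_edges = [0] * 7                 # isomorphism classes, indexed by edge count q
seen = set()
for q in range(7):
    for edges in combinations(pairs, q):
        c = canonical(edges)
        if c not in seen:
            seen.add(c)
            by_edges[q] += 1

print(sum(by_edges), by_edges)
# 11 classes in all; by edge count they match 1 + x + 2x^2 + 3x^3 + 2x^4 + x^5 + x^6
```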
Chapter 4

Formal Power Series as


Generating Functions

4.1 Using Power Series to Count Objects


Suppose we are interested in all combinations of three objects O1 , O2 , O3 :

∅, {O1 }, {O2 }, {O3 }, {O1 , O2 }, {O1 , O3 }, {O2 , O3 }, {O1 , O2 , O3 }.


Consider the “generating function”

C3 (x) = (1 + O1 x)(1 + O2 x)(1 + O3 x)


= 1 + (O1 + O2 + O3 )x + (O1 O2 + O1 O3 + O2 O3 )x2 + (O1 O2 O3 )x3 .
This is readily generalized to

C_n(x) = (1 + O_1 x)(1 + O_2 x) \cdots (1 + O_n x) = 1 + a_1 x + \cdots + a_n x^n,

where a_k is the "kth elementary symmetric function" of the n variables O_1 to
O_n. C_n(x), after multiplication of its factors, contains the actual exhibition
of combinations. If only the number of combinations is of interest, the object
labels may be ignored and the generating function becomes an enumerating
generating function (sometimes just called an enumerator), i.e.,

C_n(x) = (1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^k.


As a simple (and familiar!) example of the way in which generating
functions are used, consider the following:

C_n(x) = (1 + x)^n = (1 + x)C_{n−1}(x) = (1 + x) \sum_{k=0}^{n−1} \binom{n−1}{k} x^k

= \binom{n−1}{0} + \left[\binom{n−1}{0} + \binom{n−1}{1}\right] x + \cdots + \left[\binom{n−1}{n−2} + \binom{n−1}{n−1}\right] x^{n−1} + \binom{n−1}{n−1} x^n,

where for 1 ≤ k ≤ n − 1, the coefficient on x^k is

\binom{n−1}{k−1} + \binom{n−1}{k}.

But

C_n(x) = \sum_{k=0}^{n} \binom{n}{k} x^k,

so

\binom{n}{k} = \binom{n−1}{k−1} + \binom{n−1}{k},

a basic relation for binomial coefficients.
a basic relation for binomial coefficients.
Now look again at the generating function for combinations of n distinct
objects. In its factored form, each object is represented by a binomial and
each binomial spells out the fact that its object has two possibilities in any
combination: either it is absent (the term 1) or it is present (the term O_i x for
the object O_i). So C_n(x) = (1 + O_1 x) \cdots (1 + O_n x) is the generating function
for combinations without repetition. For repetitions of a certain kind, we
use a special generating function. For example, if an object may appear in
a combination zero, one or two times, then the function is the polynomial
1 + Ox + O^2 x^2. If the number of repetitions is unlimited, it is the function
1 + Ox + O^2 x^2 + O^3 x^3 + \cdots = (1 − Ox)^{−1}. If the number of repetitions is even,
the generating function is 1 + O^2 x^2 + O^4 x^4 + \cdots + O^{2k} x^{2k} + \cdots. Moreover,
the specification of repetitions may be made arbitrarily for each object. The
generating function is a representation of this specification in its factored
form as well as a representation of the corresponding combinations in its
developed (multiplied out) form.
EXAMPLE: For combinations of n objects with no restrictions on the
number of repetitions for any object, the enumerating generating function is

(1 + x + x^2 + \cdots)^n = (1 − x)^{−n} = \sum_{k=0}^{∞} \binom{−n}{k} (−x)^k

= \sum_{k=0}^{∞} (−n)(−n − 1) \cdots (−n − k + 1)(−x)^k / (k!) = \sum_{k=0}^{∞} \binom{n + k − 1}{k} x^k.

This is worth stating as a theorem.

Theorem 4.1.1 The number of combinations, with arbitrary repetition, of
n objects k at a time, is the same as the number of combinations without
repetition of n + k − 1 objects, k at a time. The corresponding generating
function is given by:

(1 + x + x^2 + \cdots)^n = (1 − x)^{−n} = \sum_{k=0}^{∞} \binom{n + k − 1}{k} x^k. (4.1)

For the problem in the above example with the added specification that
each object must appear at least once, the enumerator is (x + x^2 + \cdots)^n =
x^n(1 − x)^{−n} = \sum_{k=0}^{∞} \binom{n + k − 1}{k} x^{k+n} = \sum_{k=n}^{∞} \binom{k − 1}{k − n} x^k. Hence the
number of combinations of n objects taken k at a time with the restriction
that each object appear at least once is the same as the number of combina-
tions without repetition of k − 1 objects taken k − n at a time.
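Both counts in Theorem 4.1.1 can be checked directly against enumeration (a Python sketch using only the standard library):

```python
from itertools import combinations_with_replacement
from math import comb

# Combinations of n objects, k at a time, with unlimited repetition
# should number C(n + k - 1, k).
for n in range(1, 6):
    for k in range(8):
        brute = sum(1 for _ in combinations_with_replacement(range(n), k))
        assert brute == comb(n + k - 1, k)
print("Theorem 4.1.1 confirmed for n < 6, k < 8")
```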

In this section we have seen power series manipulated without any con-
cern for whether or not they converge in an analytic sense. We want to
establish a firm foundation for the theory of formal power series so that we
can continue to work “magic” with series, so a significant part of this chap-
ter will be devoted to developing those properties of formal power series that
have proved to be most useful. Before starting this development we give one
more example to illustrate the power of these methods.

4.2 A famous example: Stirling numbers of the 2nd kind

We have let {n \brace k} be the number of partitions of [n] into k nonempty classes,
for integers n, k, with 0 < k ≤ n. Also, we put {n \brace k} = 0 if k > n or n < 0
or k < 0. Further, we take {n \brace 0} = 0 for n ≠ 0. Hence we have

{n \brace k} = {n−1 \brace k−1} + k {n−1 \brace k} (4.2)

for (n, k) ≠ (0, 0). And {0 \brace 0} = 1. (Recall: We proved this earlier for
0 < k ≤ n.)
For each integer k ≥ 0 put

B_k(x) = \sum_{n} {n \brace k} x^n. (4.3)

Multiply Eq. 4.2 by x^n and sum over n:

B_k(x) = x B_{k−1}(x) + k x B_k(x) (k ≥ 1; B_0(x) = 1.) (4.4)

This leads to:

B_k(x) = \frac{x}{1 − kx} B_{k−1}(x) = \frac{x^2}{(1 − kx)(1 − (k−1)x)} B_{k−2}(x).

So B_1(x) = \frac{x}{1−x} · 1; B_2(x) = \frac{x}{1−2x} B_1(x) = \frac{x^2}{(1−x)(1−2x)}; B_3(x) = \frac{x}{1−3x} B_2(x) =
\frac{x^3}{(1−x)(1−2x)(1−3x)}. And in general,

B_k(x) = \sum_{n} {n \brace k} x^n = \frac{x^k}{(1 − x)(1 − 2x) \cdots (1 − kx)}, \quad k ≥ 0. (4.5)

The partial fraction expansion of Eq. 4.5 has the form

\frac{1}{(1 − x)(1 − 2x) \cdots (1 − kx)} = \sum_{j=1}^{k} \frac{α_j}{1 − jx}.

To find the α's, fix r, 1 ≤ r ≤ k, and multiply both sides by 1 − rx:

\frac{1}{(1 − x) \cdots (1 − (r−1)x)(1 − (r+1)x) \cdots (1 − kx)} = \sum_{j=1; j≠r}^{k} \frac{α_j(1 − rx)}{1 − jx} + α_r.

Now putting x = 1/r gives:

α_r = \frac{1}{(1 − 1/r)(1 − 2/r) \cdots (1 − (r−1)/r)(1 − (r+1)/r) \cdots (1 − k/r)} = \frac{r^{k−1}}{(r − 1)(r − 2) \cdots (1)(−1) \cdots (−(k − r))} = \frac{r^{k−1}(−1)^{k−r}}{(r − 1)!(k − r)!},

which implies

α_r = \frac{(−1)^{k−r} r^{k−1}}{(r − 1)!(k − r)!}, \quad 1 ≤ r ≤ k.
Notation: If f(x) = \sum_{n=0}^{∞} a_n x^n, then [x^n]{f} = a_n. Clearly in general
we have [x^n]{f} = [x^{n+r}]{x^r f}.

We now have for k ≥ 1:

{n \brace k} = [x^n] \frac{x^k}{(1 − x) \cdots (1 − kx)} = [x^{n−k}] \frac{1}{(1 − x) \cdots (1 − kx)}

= [x^{n−k}] \sum_{r=1}^{k} \frac{α_r}{1 − rx} = \sum_{r=1}^{k} α_r [x^{n−k}] \frac{1}{1 − rx}

= \sum_{r=1}^{k} α_r r^{n−k} = \sum_{r=1}^{k} \frac{(−1)^{k−r} r^{n−1}}{(r − 1)!(k − r)!} = \frac{1}{(k − 1)!} \sum_{r=1}^{k} \binom{k−1}{r−1} (−1)^{k−r} r^{n−1}.

This gives a closed form formula for {n \brace k}:

{n \brace k} = \frac{1}{(k − 1)!} \sum_{r=0}^{k−1} \binom{k−1}{r} (−1)^{k−r−1} (r + 1)^{n−1}. (4.6)
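The recurrence (4.2) and the closed form (4.6) can be played off against each other as a check (Python sketch, helper names ours):

```python
from math import comb, factorial

def stirling2_rec(n, k):
    """{n brace k} computed from the recurrence (4.2)."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0 or k > n:
        return 0
    return stirling2_rec(n - 1, k - 1) + k * stirling2_rec(n - 1, k)

def stirling2_closed(n, k):
    """The closed form (4.6); the division by (k-1)! is exact."""
    s = sum(comb(k - 1, r) * (-1) ** (k - r - 1) * (r + 1) ** (n - 1)
            for r in range(k))
    return s // factorial(k - 1)

for n in range(1, 10):
    for k in range(1, n + 1):
        assert stirling2_rec(n, k) == stirling2_closed(n, k)
print(stirling2_rec(5, 2))   # -> 15
```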

4.3 Ordinary Generating Functions


Now that we have seen some utility for the use of power series as “generating
functions,” we want to introduce formal power series and lay a sound theoret-
ical foundation for their use. However, this material is rather technical, and
we believe it makes better pedagogical sense to use power series informally
as “ordinary” generating functions first. This will permit the reader to get
used to working with these objects before having to deal with the abstract
definitions and theorems.
Def. The symbol f \overset{ops}{↔} {a_n}_0^∞ means that f is the ordinary power series
generating function for the sequence {a_n}_0^∞, i.e., f(x) = \sum_{i=0}^{∞} a_i x^i.

Rule 1. If f \overset{ops}{↔} {a_n}_0^∞, then for a positive integer h,

\frac{f − a_0 − a_1 x − \cdots − a_{h−1} x^{h−1}}{x^h} ↔ {a_{n+h}}_{n=0}^∞.

This follows immediately, since the L.H.S. equals \frac{a_h x^h + a_{h+1} x^{h+1} + \cdots}{x^h} = a_h +
a_{h+1} x + \cdots + a_{h+n} x^n + \cdots.
Def. The derivative of f = \sum a_n x^n is f' = \sum n a_n x^{n−1}.
Proposition 1. f' = 0 iff f = a_0, i.e., a_i = 0 for i ≥ 1.
Proposition 2. If f' = f, then f = c e^x.
Proof: f = f' iff \sum_{n=0}^{∞} a_n x^n = \sum_{n=1}^{∞} n a_n x^{n−1} = \sum_{n=0}^{∞} (n + 1) a_{n+1} x^n iff
a_n = (n + 1) a_{n+1} for all n ≥ 0, i.e., a_{n+1} = \frac{a_n}{n+1} for n ≥ 0. By induction on n,
a_n = a_0/n! for all n ≥ 0. So f = a_0 \sum \frac{x^n}{n!} = a_0 e^x.
Starting with f = \sum_{n=0}^{∞} a_n x^n, we clearly have

f' = \sum_{n=0}^{∞} (n + 1) a_{n+1} x^n.

Then we see that

x f' = \sum_{n=0}^{∞} (n + 1) a_{n+1} x^{n+1} = \sum_{n=0}^{∞} n a_n x^n.

So if we let D be the operator on f that gives f', then

f \overset{ops}{↔} {a_n}_0^∞ ⇒ (xD)f \overset{ops}{↔} {n a_n}_0^∞.

Then clearly (xD)^2 f ↔ {n^2 a_n}_0^∞. And (2 − 3(xD) + 5(xD)^2) f ↔ {(2 −
3n + 5n^2) a_n}_0^∞. More generally, we obtain

Rule 2: If f \overset{ops}{↔} {a_n}_0^∞ and P is a polynomial, then

(P(xD)) f \overset{ops}{↔} {P(n) a_n}_{n≥0}.

The next Rule is just the product definition.

Rule 3: If f \overset{ops}{↔} {a_n}_0^∞ and g \overset{ops}{↔} {b_n}_0^∞, then

f g \overset{ops}{↔} \left\{\sum_{r=0}^{n} a_r b_{n−r}\right\}_0^∞.

An immediate generalization is:

Rule 4: If f \overset{ops}{↔} {a_n}_0^∞, then

f^k \overset{ops}{↔} \left\{\sum a_{n_1} a_{n_2} \cdots a_{n_k}\right\}_{n=0}^∞,

where the sum is over all (n_1, \ldots, n_k) for which n_1 + n_2 + \cdots + n_k = n and
n_i ≥ 0.
Example: Let f(n, k) be the number of weak k-compositions of n. Since
\frac{1}{1−x} \overset{ops}{↔} {1}, by Rule 4, \frac{1}{(1−x)^k} \overset{ops}{↔} {f(n, k)}_{n=0}^∞. And as we have already seen,
(1 − x)^{−k} = \sum_{n=0}^{∞} \binom{−k}{n} (−1)^n x^n implies that f(n, k) = (−1)^n \binom{−k}{n} =
\binom{n + k − 1}{n}.

We now ask what happens when we multiply by \frac{1}{1−x}:
f(x)/(1 − x) = (a_0 + a_1 x + a_2 x^2 + \cdots)(1 + x + x^2 + \cdots) = a_0 + (a_0 + a_1)x +
(a_0 + a_1 + a_2)x^2 + \cdots.
This leads to Rule 5:

Rule 5: If f \overset{ops}{↔} {a_n}_0^∞, then f/(1 − x) \overset{ops}{↔} \left\{\sum_{j=0}^{n} a_j\right\}_{n≥0}.
 
Exercise: 4.3.1 D^n\left(\frac{1}{(1−x)^{m+1}}\right) = \frac{(m+n)!}{m!(1−x)^{m+n+1}}; m, n ≥ 0.

Recall that there is a “formal” Taylor’s formula. If you did not prove it
when we met it earlier, do it now.
Exercise: 4.3.2 If f(x) = \sum_{n=0}^{∞} a_n x^n, then a_n = \frac{1}{n!} D^n(f(x))|_{x=0}.

We can put together the two previous exercises to obtain

[x^n] \frac{1}{(1 − x)^{m+1}} = \frac{(m + n)!}{m!\, n!} (4.7)

Example 4.3.3 Start with f = \frac{1}{1−x} \overset{ops}{↔} {1}_{n≥0}. By Rule 2 we have (xD)^2 \frac{1}{1−x} \overset{ops}{↔}
{n^2}_{n≥0}. Then by Rule 5, \frac{1}{1−x} (xD)^2 \frac{1}{1−x} \overset{ops}{↔} \left\{\sum_{j=0}^{n} j^2\right\}_{n≥0}. This implies
\sum_{j=0}^{n} j^2 = [x^n] \left(\frac{1}{1−x} (xD)^2 \frac{1}{1−x}\right) = [x^n] \frac{x(1+x)}{(1−x)^4}, after some calculation. From
Eq. 4.7 with m = 3, [x^n] \frac{1}{(1−x)^4} = \frac{(n+3)!}{3!\,n!} = \binom{n+3}{3}. Hence \sum_{j=0}^{n} j^2 =
[x^n] \left(\frac{x}{(1−x)^4} + \frac{x^2}{(1−x)^4}\right) = [x^{n−1}] \frac{1}{(1−x)^4} + [x^{n−2}] \frac{1}{(1−x)^4} = \binom{n+2}{3} + \binom{n+1}{3} =
\frac{n(n+1)(2n+1)}{6}.
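The coefficient extraction in Example 4.3.3 can be reproduced numerically, using [x^n] 1/(1−x)^4 = \binom{n+3}{3} from Eq. 4.7 (Python sketch, names ours):

```python
from math import comb

N = 12
g = [comb(n + 3, 3) for n in range(N)]   # coefficients of 1/(1-x)^4
# Coefficients of x(1+x)/(1-x)^4: shift g by one and by two places, then add.
coeffs = [(g[n - 1] if n >= 1 else 0) + (g[n - 2] if n >= 2 else 0)
          for n in range(N)]
for n in range(N):
    assert coeffs[n] == sum(j * j for j in range(n + 1))
print(coeffs[:6])   # -> [0, 1, 5, 14, 30, 55]
```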
Example 4.3.4 (Harmonic numbers) Put H_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}, n ≥ 1.
Start with f = \sum_{n≥1} \frac{x^n}{n} = −\log(1−x). (Check: f' = \sum_{n≥1} x^{n−1} = \sum_{n≥0} x^n =
\frac{1}{1−x}, and (−\log(1−x))' = (−1) · \frac{1}{1−x} · (−1) = \frac{1}{1−x}.) So f \overset{ops}{↔} \left\{\frac{1}{n}\right\}_{n≥1}. By Rule
5, \frac{1}{1−x} f \overset{ops}{↔} {H_n}_{n=1}^∞. So

\frac{1}{1−x} \log\left(\frac{1}{1−x}\right) \overset{ops}{↔} {H_n}_{n=1}^∞.
In the first three sections of this chapter we gave examples of how to use
"ordinary power series" as generating functions. There are other ways to
associate a possibly infinite sequence with a power series. In later sections
we explore a couple more types, the more important of which are the
exponential generating functions. But first we begin our exposition of formal
power series.
Exercise: 4.3.5 Given three types of objects O_1, O_2 and O_3, let a_n be the
number of combinations of these objects, n at a time, so that O_1 is selected at
most 2 times, O_2 is selected an odd number of times, and O_3 is selected at least
once. Determine the ordinary generating function of the sequence {a_n}_{n=0}^∞
and determine a recursion that is satisfied by the sequence. Compute a closed
form formula for a_n. As always, be sure to include enough details so that I
can see a proof that your answers are correct.

Exercise: 4.3.6 If O_1 can appear any number of times and O_2 can appear
any multiple of k times, use Rule 5 to show that the number of combinations
of n objects must be a_n = 1 + \lfloor n/k \rfloor.

Exercise: 4.3.7 Put a_n = |{(i, j) : i ≥ 0; j ≥ 0; i + 3j = n}|. Determine
the ordinary generating function for the sequence {a_n}_{n=0}^∞ and determine an
explicit value in closed form for a_n.

Exercise: 4.3.8 If O1 can appear an even number of times and O2 a multiple


of 4 times, determine the number of combinations of n of the objects.

Exercise: 4.3.9 If O1 can appear an odd number of times, O2 an even num-


ber of times, and O3 either 0 or 1 time, how many combinations of n of these
three objects are there? (Find the enumerating generating function.)

4.4 Formal Power Series


If we want to be fairly general we define formal power series in one indeter-
minate z over a commutative ring A with unity 1. Even though usually in
our applications A is the field of complex numbers or perhaps even just the
field of real numbers, it really does not take much additional effort to assume
merely that A is a commutative ring with unity. On the other hand, often
when power series are defined and their theory developed, series in n inde-
terminates are discussed. For simplicity, we will use only one indeterminate
in our “formal” development and in many applications. But the extension to
several indeterminates is conceptually natural and will be more or less taken
for granted after a certain stage.
A power series in one indeterminate $x$ over the commutative ring $A$ with unity 1 is a formal sum
$$\sum_{i=0}^{\infty} a_i x^i = a_0 + a_1 x + \cdots + a_n x^n + \cdots, \qquad a_i \in A.$$

To begin with, $\{a_n\}_{n=0}^{\infty}$ is an arbitrary sequence in $A$; no question of convergence is considered. Two power series $\sum a_i x^i$ and $\sum b_i x^i$ are equal iff $a_i = b_i$ for all $i = 0, 1, 2, \ldots$.

Define addition and multiplication of two power series as follows:

(i) $\sum a_i x^i + \sum b_i x^i = \sum (a_i + b_i) x^i$, and

(ii) $\left(\sum a_i x^i\right)\left(\sum b_i x^i\right) = \sum c_i x^i$, where $c_i = \sum_{j=0}^{i} a_j b_{i-j}$.
As an example of multiplication, consider the product $(1-x)\sum_{n=0}^{\infty} x^n = 1$. Be sure to check this out! It says that $(1-x)^{-1} = \sum_{n=0}^{\infty} x^n$.

The ring of all formal power series in x over the commutative ring A is
denoted by A[[x]].
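The definitions above are easy to experiment with on truncated coefficient lists. The following minimal sketch (plain Python; the function names are ours, not the text's) implements definitions (i) and (ii) and checks the example $(1-x)\sum x^n = 1$:

```python
# Represent a formal power series by its first N coefficients [a_0, ..., a_{N-1}].
N = 10

def ps_add(a, b):
    """Definition (i): coefficientwise sum."""
    return [x + y for x, y in zip(a, b)]

def ps_mul(a, b):
    """Definition (ii): Cauchy product, c_i = sum_{j=0}^i a_j * b_{i-j}."""
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(N)]

one_minus_x = [1, -1] + [0] * (N - 2)      # 1 - x
geometric = [1] * N                         # 1 + x + x^2 + ...

print(ps_mul(one_minus_x, geometric))       # [1, 0, 0, ..., 0], i.e. the series 1
```

Since only finitely many coefficients enter each $c_i$, truncation at $N$ terms computes the first $N$ coefficients of the true product exactly.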

Exercise: 4.4.1 Prove that with these definitions, if $A$ is an integral domain, then the set $A[[x]]$ of all formal power series in $x$ over $A$ is an integral domain with unity equal to $\sum_{i=0}^{\infty} a_i x^i$, where $a_0 = 1$ and $a_i = 0$ for $i > 0$.

Proof: All the details are fairly routine. We include here only the details for associativity of multiplication. So, to show
$$\left[\left(\sum a_i x^i\right)\left(\sum b_i x^i\right)\right]\left(\sum c_i x^i\right) = \left(\sum a_i x^i\right)\left[\left(\sum b_i x^i\right)\left(\sum c_i x^i\right)\right]$$
we must show that the coefficient of $x^j$ on the left hand side is equal to the coefficient of $x^j$ on the right hand side. The coefficient of $x^j$ on the L.H.S. is $\sum_{k=0}^{j} d_k c_{j-k}$, where $d_k = \sum_{i=0}^{k} a_i b_{k-i}$. This is $\sum_{k=0}^{j}\left(\sum_{i=0}^{k} a_i b_{k-i}\right) c_{j-k} = \sum a_i b_k c_l$, where this last sum is over all $i, k, l$ for which $i + k + l = j$, $0 \le i, k, l \le j$. And the last equality is obtained by observing that each term of the R.H.S. of this last equality appears exactly once on the left, and vice versa. Similarly, the coefficient of $x^j$ on the R.H.S. of the main equality to be established is also equal to $\sum a_i b_k c_l$, where this sum is over all $i, k, l$ for which $i + k + l = j$, $0 \le i, k, l \le j$.
We have established the following:
$$[x^j]\left\{\left(\sum a_i x^i\right)\left(\sum b_i x^i\right)\left(\sum c_i x^i\right)\right\} = \sum a_i b_k c_l \qquad (4.8)$$
where the sum is over all $i, k, l$ for which $i + k + l = j$, $0 \le i, k, l \le j$.


This suggests more general formulas, but we write down only one special case:
$$[x^n]\left\{\left(\sum_{i=0}^{\infty} a_i x^i\right)^{k}\right\} = \sum a_{n_1} a_{n_2} \cdots a_{n_k} \qquad (4.9)$$
where the sum is over all $(n_1, n_2, \ldots, n_k)$ for which $n_i \ge 0$ and $n_1 + n_2 + \cdots + n_k = n$.
Def. Let $f(x) = \sum a_i x^i$ be a nonzero element of $A[[x]]$. Then there must be a smallest integer $n$ for which $a_n \ne 0$. This $n$ is called the order of $f(x)$ and will be denoted by $o(f(x))$. If $n = o(f(x))$, then $a_n$ will be called the initial term of $f(x)$. We say that the order of the zero power series (i.e., the zero element of the ring $A[[x]]$) is $+\infty$.

Theorem 4.4.2 If $f, g \in A[[x]]$, then

1. $o(f + g) \ge \min\{o(f), o(g)\}$;

2. $o(fg) \ge o(f) + o(g)$.

Furthermore, if $A$ is an integral domain, then $A[[x]]$ is an integral domain and $o(fg) = o(f) + o(g)$.

Proof: Easy exercise.


Theorem 4.4.3 If $f(x) = \sum_{i=0}^{\infty} a_i x^i \in A[[x]]$, then $f(x)$ is a unit in $A[[x]]$ iff $a_0$ is a unit in $A$.

Proof: If $f(x)g(x) = 1$, then (with $g(x) = \sum b_i x^i$) $a_0 b_0 = 1$, implying $a_0$ is a unit in $A$. Conversely, suppose $a_0$ is a unit in $A$. Then there is a $b_0 \in A$ for which $a_0 b_0 = 1$. And there is a (unique!) $b_1 \in A$ for which $a_0 b_1 + a_1 b_0 = 0$, i.e., $b_1 = a_0^{-1}(-a_1 b_0) = -a_1 b_0^2$. Proceed inductively. Suppose $b_0, b_1, \ldots, b_n$ have been determined so that $\sum_{i=0}^{j} a_i b_{j-i} = 0$ for $j = 1, 2, \ldots, n$. Then put $b_{n+1} = -a_0^{-1}(a_1 b_n + \cdots + a_{n+1} b_0) = -b_0(a_1 b_n + \cdots + a_{n+1} b_0)$. By induction, $g(x) = \sum_{i=0}^{\infty} b_i x^i \in A[[x]]$ is constructed (uniquely!) so that $f(x)g(x) = 1$.
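The inductive construction in this proof can be sketched directly (plain Python, exact rational arithmetic; all names are ours):

```python
from fractions import Fraction

# Sketch of the proof's recursion: b_0 = a_0^{-1}, and for n >= 1
# b_n = -b_0 (a_1 b_{n-1} + a_2 b_{n-2} + ... + a_n b_0).
def ps_inverse(a, N):
    b = [Fraction(1) / a[0]]
    for n in range(1, N):
        b.append(-b[0] * sum(a[i] * b[n - i] for i in range(1, n + 1)))
    return b

# Example: the inverse of 1 - x is the geometric series 1 + x + x^2 + ...
print(ps_inverse([Fraction(1), Fraction(-1), 0, 0, 0, 0], 6))
```

Each $b_n$ is uniquely determined, mirroring the uniqueness claim of the theorem.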

Theorem 4.4.4 If K is a field, then the units of K[[x]] are the power series
of order 0, i.e., with nonzero constant term.

The next result is inserted here for the algebraists. It will not be needed
in this course.

Theorem 4.4.5 Let K be a field. Then the ring K[[x]] has a unique maximal
ideal (generated by x), and the nontrivial ideals are all powers of this one.

Theorem 4.4.6 Let K be a field, and let f (x), g(x) ∈ K[[x]] with f (x) 6= 0.
Then there is a unique power series q(x) ∈ K[[x]] and there is a unique poly-
nomial r(x) ∈ K[x] such that g(x) = q(x)f (x) + r(x) and either deg(r(x)) <
o(f (x)) or r(x) = 0.

Proof: Let $g(x) = \sum a_i x^i$, $f(x) = \sum b_i x^i$, where $h = o(f(x)) < \infty$. Put $r(x) = a_0 + a_1 x + \cdots + a_{h-1} x^{h-1}$. Then $g(x) - r(x) = a_n x^n + a_{n+1} x^{n+1} + \cdots$, where $a_n \ne 0$ and $n \ge h$. Now $f(x) = b_h x^h + b_{h+1} x^{h+1} + \cdots = x^h(b_h + b_{h+1} x + \cdots)$, where $b_h + b_{h+1} x + \cdots$ is invertible with a unique inverse $v(x) \in K[[x]]$. Then $g(x) - r(x) = x^n(a_n + a_{n+1} x + \cdots) = x^{n-h}(a_n + a_{n+1} x + \cdots) x^h = x^{n-h}(a_n + a_{n+1} x + \cdots) x^h (b_h + b_{h+1} x + \cdots) v(x) = x^{n-h}(a_n + a_{n+1} x + \cdots) v(x) f(x)$. So we put $q(x) = x^{n-h}(a_n + a_{n+1} x + \cdots) v(x)$. Then $g(x) = q(x) f(x) + r(x)$ where $r(x) = 0$ or $\deg(r(x)) < o(f(x))$. Now suppose there could be two power series $q_1(x), q_2(x)$ and two polynomials $r_1(x), r_2(x)$, with $g(x) = q_i(x) f(x) + r_i(x)$, where $\deg(r_i(x)) < o(f(x))$ or $r_i = 0$, $i = 1, 2$. Then $(q_1(x) - q_2(x)) f(x) = r_2(x) - r_1(x)$. Clearly $q_1(x) = q_2(x)$ iff $r_1(x) = r_2(x)$, since $K$ is a field and $f(x) \ne 0$. So suppose $q_2(x) - q_1(x) \ne 0 \ne r_1(x) - r_2(x)$. Then $\deg(r_2(x) - r_1(x)) < o(f(x)) \le o((q_1(x) - q_2(x)) f(x)) = o(r_2(x) - r_1(x))$. This is impossible, since no nonzero polynomial can have degree less than its order. Hence $q_1(x) = q_2(x)$ and $r_1(x) = r_2(x)$.
A very common technique used to obtain identities that at first glance
look impossible to prove is to calculate the coefficient on xn in two different
expressions for the same formal power series. Here are some exercises that
offer practice using this method.

Exercise: 4.4.7 Show that
$$\sum_{r=0}^{m+1} (-1)^r \binom{m+1}{r} \binom{m+n-r}{m} = \begin{cases} 1, & \text{if } n = 0; \\ 0, & \text{if } n > 0. \end{cases}$$
Hint: Recall that
$$1 = (1-z)^{m+1} \cdot \sum_{k=0}^{\infty} \binom{m+k}{k} z^k,$$
and compute the coefficient on $z^n$.

Exercise: 4.4.8 Compute the coefficient of $x^{2n}$ in both sides of
$$(1 - x^2)^{2n} = \sum_{r=0}^{2n} (-1)^r \binom{2n}{r} x^{2r}$$
to show that
$$\sum_{k=0}^{2n} (-1)^k \binom{2n}{k}^2 = (-1)^n \binom{2n}{n}.$$

Exercise: 4.4.9 Show that
$$\sum_{k=0}^{n+1} \left[\binom{n}{k} - \binom{n}{k-1}\right]^2 = \frac{2}{n+1}\binom{2n}{n}.$$
Hint: Compute the coefficient of $x^m$ on both sides of
$$[(1-x)(1+x)^n] \cdot \left[\left(1 - \frac{1}{x}\right)(1+x)^n\right] = \left(-\frac{1}{x} + 2 - x\right)(1+x)^{2n}.$$
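These three identities are easy to spot-check numerically before attempting the coefficient-extraction proofs. A small sketch (ours, with the convention $\binom{a}{b} = 0$ unless $0 \le b \le a$):

```python
from math import comb

def C(a, b):
    """Binomial coefficient with the convention C(a, b) = 0 unless 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0

# Exercise 4.4.7
for m in range(6):
    for n in range(6):
        s = sum((-1)**r * C(m + 1, r) * C(m + n - r, m) for r in range(m + 2))
        assert s == (1 if n == 0 else 0)

# Exercise 4.4.8
for n in range(1, 8):
    assert sum((-1)**k * C(2*n, k)**2 for k in range(2*n + 1)) == (-1)**n * C(2*n, n)

# Exercise 4.4.9
for n in range(1, 8):
    lhs = sum((C(n, k) - C(n, k - 1))**2 for k in range(n + 2))
    assert lhs * (n + 1) == 2 * C(2*n, n)

print("all three identities verified for small parameters")
```

Of course this is only evidence, not a proof; the proofs come from comparing coefficients as the hints suggest.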

4.5 Composition of Power Series


For each $n \ge 0$ let $f_n(x) = \sum_{i=0}^{\infty} c_{ni} x^i$ be a formal power series. Suppose that for fixed $i$, only finitely many $c_{ni}$ could be nonzero. Say $c_{ni} = 0$ for $n > n_i$. Then we can formally define
$$\sum_{n=0}^{\infty} f_n(x) = \sum_{i=0}^{\infty} \left(\sum_{n=0}^{n_i} c_{ni}\right) x^i.$$

By hypothesis this formal sum of infinitely many power series involves


the sum of only finitely many terms in computing the coefficient of any given
power of x. This definition allows the introduction of substitution of one power
series b(x) for the “variable” x of a second power series a(x), at least when
o(b(x)) ≥ 1.
If $a(x) = \sum_{n=0}^{\infty} a_n x^n$, and if $b(x) = \sum_{n=1}^{\infty} b_n x^n$, i.e., $b_0 = 0$, then the powers $b^n(x) := (b(x))^n$ satisfy the condition for formal addition, i.e.,
$$a(b(x)) := \sum_{n=0}^{\infty} a_n b^n(x).$$

As an example, let $a(x) := (1-x)^{-1}$, $b(x) = 2x - x^2$. Then formally
$$h(x) := a(b(x)) = 1 + (2x - x^2) + (2x - x^2)^2 + \cdots = 1 + 2x + 3x^2 + 4x^3 + \cdots = (1-x)^{-2}.$$
The middle equality is a bit mysterious. It follows from (legitimate) algebraic manipulation:
$$a(b(x)) = (1 - (2x - x^2))^{-1} = ((1-x)^2)^{-1} = (1-x)^{-2}.$$
If we try to verify it directly we find that the coefficient on $x^n$ in $1 + (2x - x^2) + (2x - x^2)^2 + \cdots$ is
$$\sum_{j=0}^{n} (-1)^{n-j} \binom{j}{2j-n} 2^{2j-n},$$
which we must then show is equal to $n + 1$. This is quite tricky to show directly, say by induction. From analysis we know that $1 + 2x + 3x^2 + 4x^3 + \cdots$ is the power series expansion of $(1-x)^{-2}$, so this must be true in $\mathbf{C}[[x]]$. Indeed we have
$$(1-x)h(x) = \sum_{n=0}^{\infty} x^n = (1-x)^{-1}.$$

If $b(x) = \sum_{n=0}^{\infty} b_n x^n$ is a power series with $o(b(x)) = 1$, i.e., $b_0 = 0$ and $b_1 \ne 0$, we can also find the (unique!) inverse function as a power series. We "solve" the equation $b(a(x)) = x$ by substitution, assuming that $a(x) = \sum_{n=1}^{\infty} a_n x^n$ with $a_0 = 0$. Then
$$x = b_1\left(\sum_{n=1}^{\infty} a_n x^n\right) + b_2\left(\sum_{n=1}^{\infty} a_n x^n\right)^2 + b_3\left(\sum_{n=1}^{\infty} a_n x^n\right)^3 + \cdots.$$
From this we find $b_1 a_1 = 1$, $b_1 a_2 + b_2 a_1^2 = 0$, $b_1 a_3 + b_2(a_1 a_2 + a_2 a_1) + b_3(a_1 a_1 a_1) = 0$, etc. In general the zero coefficient on $x^n$ (for $n > 1$) must equal an expression that starts with $b_1 a_n$ and for which the other terms involve the coefficients $b_1, \ldots, b_n$ and coefficients $a_k$ with $1 \le k < n$. Hence recursively we may solve for $a_n$ starting with $a_1$. This compositional inverse of $b(x)$ will be denoted $b^{[-1]}(x)$ to distinguish it from the multiplicative inverse $b^{-1}(x)$ (sometimes denoted $b(x)^{-1}$) of $b(x)$. Note that for $b(x) \in A[[x]]$, $b^{[-1]}(x)$ exists iff $o(b(x)) = 1$ and $b_1^{-1}$ exists in $A$, while $b(x)^{-1}$ exists iff $o(b(x)) = 0$ and $b(0)^{-1}$ exists in $A$. Also note that if $b(a(x)) = x$, and we put $y = a(x)$, then $a(b(y)) = a(b(a(x))) = a(x) = y$, i.e., $a(b(y)) = y$, so if $b$ is a 'left' inverse of $a$, it is also the unique 'right' inverse of $a$.
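The term-by-term recursion just described can be mechanized. A sketch (ours; exact rationals, truncated to $N$ coefficients): at each $n$, the coefficient of $x^n$ in $b_1 a + b_2 a^2 + \cdots$ must vanish, and only $a_1, \ldots, a_{n-1}$ enter the higher powers, so we can solve for $a_n$.

```python
from fractions import Fraction

def ps_mul(u, v, N):
    return [sum(u[j] * v[i - j] for j in range(i + 1)) for i in range(N)]

def reversion(b, N):
    """Solve b(a(x)) = x term by term, given b[0] = 0 and b[1] invertible."""
    a = [Fraction(0), Fraction(1) / b[1]]
    for n in range(2, N):
        a.append(Fraction(0))
        power = a[:]                          # a(x)^1
        c = Fraction(0)
        for k in range(2, n + 1):
            power = ps_mul(power, a, n + 1)   # a(x)^k; its [x^n] uses only a_1..a_{n-1}
            if k < len(b):
                c += b[k] * power[n]
        a[n] = -c / b[1]                      # force [x^n]{b_1 a + b_2 a^2 + ...} = 0
    return a

# Example: b(x) = x/(1-x) = x + x^2 + ...; its compositional inverse is x/(1+x).
b = [Fraction(0)] + [Fraction(1)] * 6
print(reversion(b, 6))    # [0, 1, -1, 1, -1, 1]
```

Here the answer $x - x^2 + x^3 - \cdots = x/(1+x)$ can be checked by hand: substituting $a = x/(1+x)$ into $b(a) = a/(1-a)$ gives back $x$.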
Note that certain substitutions that make perfect sense in analysis are forbidden within the present theory. For suppose $b(x) = 1 + x$, i.e., $o(b(x)) = 0$. If $a(x) = e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, then we are not allowed to substitute $1 + x$ in place of $x$ in the formula for $e^x$ to find the power series for $e^{1+x}$. If we try to do this anyway, we see that $e^{1+x}$ would appear as $\sum_{n=0}^{\infty} \frac{(1+x)^n}{n!} = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{j=0}^{n} \binom{n}{j} x^j$, which has infinitely many nonzero contributions to the coefficient on $x^i$ for each $i$. This is not defined.

4.6 The Formal Derivative and Integral


Let $f(x) = \sum_{i \ge 0} c_i x^i$ and $g(x) = \sum_{j \ge 0} d_j x^j$ be two power series. We define the formal derivative $f'(x)$ by:
$$f'(x) = \sum_{i \ge 0} (i+1) c_{i+1} x^i. \qquad (4.10)$$

It is now easy to show that the Sum Rule holds:
$$(f(x) + g(x))' = f'(x) + g'(x), \qquad (4.11)$$
and
$$(f(x)g(x))' = \sum_{i,j \ge 0} (i+j) c_i d_j x^{i+j-1} = \sum_{i,j \ge 0} i c_i d_j x^{i-1+j} + \sum_{i,j \ge 0} j c_i d_j x^{j-1+i},$$
which proves the Product Rule:
$$(f(x)g(x))' = g(x)f'(x) + f(x)g'(x). \qquad (4.12)$$

Suppose that $o(g(x)) \ge 1$, so that the composition $f(g(x))$ is defined. The Chain Rule
$$(f(g(x)))' = f'(g(x))\, g'(x) \qquad (4.13)$$
is established first for $f(x) = x^n$, $n \ge 1$, by using induction on $n$ along with the product rule, and then by linear extension for any power series $f(x)$.
If $f(0) = 1$, start with the equation $f^{-1}(x)f(x) = 1$. Take the formal derivative of both sides, applying the product rule. It follows that $(f^{-n}(x))' = -n f^{-n-1}(x) f'(x)$ for $n \ge 1$. But now given $f(x)$ and $g(x)$ as above with $g(0) = 1$, we can use this formula (with $f$ replaced by $g$, giving $(g^{-1}(x))' = -g'(x)/g^2(x)$) together with the product rule to establish the Quotient Rule:
$$(f(x)/g(x))' = \frac{g(x)f'(x) - f(x)g'(x)}{g^2(x)}. \qquad (4.14)$$

If $R$ has characteristic 0 (so we can divide by any positive integral multiple of 1), the usual Taylor's Formula in one variable (well, MacLaurin's formula) holds:
$$f(x) = \sum_{n \ge 0} \frac{f^{(n)}(0)\, x^n}{n!}. \qquad (4.15)$$
If $R$ has characteristic zero and $f, g \in R[[x]]$, then
$$f' = g' \text{ and } f(0) = g(0) \iff f(x) = g(x). \qquad (4.16)$$

In order to define the integral of $f(x)$ we need to assume that the characteristic of $A$ is zero. In that case define the formal integral $I_x f(x)$ by
$$I_x f(x) = \int_0^x f(x)\,dx = \sum_{i \ge 1} i^{-1} c_{i-1} x^i. \qquad (4.17)$$
It is easy to see that $\int_0^x (f(x) + g(x))\,dx = \int_0^x f(x)\,dx + \int_0^x g(x)\,dx$. The following also follow easily:
$$(I_x f(x))' = f(x); \qquad \text{if } F'(x) = f(x), \text{ then } \int_0^x f(x)\,dx = F(x) - F(0). \qquad (4.18)$$
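Both operators are one-liners on truncated coefficient lists (a sketch of ours, with exact rationals so that the division by $i$ in Eq. 4.17 stays exact):

```python
from fractions import Fraction

# Formal derivative (Eq. 4.10) and formal integral (Eq. 4.17) on coefficient lists.
def deriv(c):
    return [(i + 1) * c[i + 1] for i in range(len(c) - 1)]

def integ(c):
    return [Fraction(0)] + [c[i - 1] / i for i in range(1, len(c) + 1)]

f = [Fraction(n) for n in (1, 2, 3, 4, 5)]    # 1 + 2x + 3x^2 + 4x^3 + 5x^4
assert deriv(integ(f)) == f                    # (I_x f)' = f, the first half of Eq. 4.18
print(deriv(f))                                # [2, 6, 12, 20], i.e. f'(x)
```

Note that `deriv` needs no division, which is why the derivative exists over any ring while the integral needs characteristic zero.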

4.7 Log, Exp and Binomial Power Series


In this section we assume that $R$ is an integral domain with characteristic 0. The exponential series is
$$e^x = \exp(x) = \sum_{j \ge 0} \frac{x^j}{j!}. \qquad (4.19)$$
The logarithmic series is
$$\log((1-x)^{-1}) = \sum_{j \ge 1} \frac{x^j}{j}. \qquad (4.20)$$
If $y$ is also an indeterminate, then the binomial series is
$$(1+x)^y = 1 + \sum_{j \ge 1} y(y-1)\cdots(y-j+1)\frac{x^j}{j!} = \sum_{j \ge 0} \binom{y}{j} x^j \in (R[y])[[x]]. \qquad (4.21)$$

If $o(f(x)) \ge 1$, the compositions of these functions with $f$ are defined. So $\exp(f)$, $\log(1+f)$ and $(1+f)^y$ are defined. Also, any element of $R[[x]]$ may be substituted for $y$ in $(1+x)^y$. Many of the usual properties of these 'analytic' functions over $\mathbf{C}$ hold in the formal setting. For example:
$$(e^x)' = e^x$$
$$(\log((1-x)^{-1}))' = (1-x)^{-1}$$
$$((1+x)^y)' = y(1+x)^{y-1}$$
The application of the chain rule to these is immediate except possibly in the case of the logarithm. We ask: when is it permissible to compute $\log(f(x))$ for $f(x) \in R((x))$? To determine this, note that $\log(f(x)) = \log\left(\frac{1}{1-(1-f^{-1})}\right) = \sum_{j \ge 1} \frac{(1-f^{-1})^j}{j}$, where this latter expression is well-defined provided $o(1 - f^{-1}) \ge 1$. So in particular we need $o(f(x)) = 0$ and $f(0) = 1$. In this case $D_x \log f(x) = f(x) \cdot D_x(1 - f^{-1}(x))$, and
$$(\log(f(x)))' = f'(x)/f(x). \qquad (4.22)$$
By the chain rule, $D_x \log(e^x) = (e^x)^{-1} e^x = 1 = D_x x$, and $\log(e^x)|_{x=0} = 0 = x|_{x=0}$, which implies that
$$\log(e^x) = x. \qquad (4.23)$$

Similarly, using both the product and chain rules,
$$D_x\{(1-x)\exp(\log((1-x)^{-1}))\} = -\exp(\log((1-x)^{-1})) + (1-x)(1-x)^{-1}\exp(\log((1-x)^{-1})) = 0,$$
so that
$$(1-x)\exp(\log((1-x)^{-1})) = 1,$$
and
$$\exp(\log((1-x)^{-1})) = (1-x)^{-1}. \qquad (4.24)$$
Again, this is because both $(1-x)\exp(\log((1-x)^{-1}))$ and 1 have derivative 0 and constant term 1.
Now consider properties of the binomial series. We have already seen that for positive integers $n$:
$$(1+x)^n = \sum_{j \ge 0} \binom{n}{j} x^j. \qquad (4.25)$$
This is the binomial expansion for positive integers. Thus for positive integers $m$ and $n$, $[x^k]$ can be applied to the binomial series expansion of the identity $(1+x)^m (1+x)^n = (1+x)^{m+n}$, giving the Vandermonde Convolution
$$\sum_{i=0}^{k} \binom{n}{i}\binom{m}{k-i} = \binom{m+n}{k}. \qquad (4.26)$$
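The convolution (4.26) is easy to spot-check (a sketch of ours; `math.comb` conveniently returns 0 when the lower index exceeds the upper):

```python
from math import comb

# Spot-check the Vandermonde convolution, Eq. 4.26, for small parameters.
for m in range(8):
    for n in range(8):
        for k in range(m + n + 1):
            assert sum(comb(n, i) * comb(m, k - i) for i in range(k + 1)) == comb(m + n, k)
print("Vandermonde convolution verified for m, n < 8")
```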

If $f(x)$ is a polynomial in $x$ of degree $k$, and the equation $f(x) = 0$ has more than $k$ roots, then $f(x) = 0$ identically. Thus the polynomial
$$\binom{y+z}{k} - \sum_{i=0}^{k} \binom{y}{i}\binom{z}{k-i}$$
in indeterminates $y$ and $z$ must be identically 0, since it has an infinite number of roots, namely all positive integers. Accordingly we have the binomial series identity
$$(1+x)^y (1+x)^z = (1+x)^{y+z}. \qquad (4.27)$$

Substitution of $-y$ for $z$ yields $(1+x)^y (1+x)^{-y} = (1+x)^0 = 1$, so
$$((1+x)^y)^{-1} = (1+x)^{-y}. \qquad (4.28)$$
This allows us to prove that
$$\log((1+x)^y) = y\log(1+x) \qquad (4.29)$$
by the following differential argument:
$$D_x(\log((1+x)^y)) = (1+x)^{-y} \cdot y(1+x)^{y-1} = y(1+x)^{-1} = D_x(y\log(1+x)),$$
and
$$\log((1+0)^y) = 0 = y\log(1+0).$$
Combining these results gives
$$\{(1+x)^y\}^z = \exp(\log\{(1+x)^y\}^z) = \exp(z\log((1+x)^y)) = \exp(zy\log(1+x)) = \exp(\log((1+x)^{yz})),$$
so
$$\{(1+x)^y\}^z = (1+x)^{yz}. \qquad (4.30)$$

By the binomial theorem,
$$\exp(x+y) = \sum_{n \ge 0} \frac{(x+y)^n}{n!} = \sum_{n \ge 0} \sum_{i=0}^{n} \frac{x^i y^{n-i}}{i!\,(n-i)!},$$
so
$$\exp(x+y) = (\exp x)(\exp y). \qquad (4.31)$$
The substitution of $-x$ for $y$ yields $\exp(0) = (\exp(x))(\exp(-x))$, and we have
$$(\exp(x))^{-1} = \exp(-x). \qquad (4.32)$$

By making the substitution $x = f$, for $f \in R[[x]]$ with $o(f(x)) \ge 1$, and $y = g$ for any $g \in R[[x]]$, in the preceding results, we obtain many of the results that are familiar to us in terms of the corresponding analytic functions. The only results that do not hold for the formal power series are those that correspond to making inadmissible substitutions. For example, it is not the case that $\exp(\log(x)) = x$, since $\log(x)$ does not exist as a formal power series.

Here are two more often used power series:
$$\sin(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k+1}}{(2k+1)!}$$
$$\cos(x) = \sum_{k=0}^{\infty} (-1)^k \frac{x^{2k}}{(2k)!}$$

It is a good exercise to check out the usual properties of these formal


power series.
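As one such check, here is a small computation (ours, on truncated coefficient lists with exact rationals) verifying the Pythagorean identity $\sin^2(x) + \cos^2(x) = 1$ through order $x^7$:

```python
from fractions import Fraction
from math import factorial

N = 8

def ps_mul(u, v):
    return [sum(u[j] * v[i - j] for j in range(i + 1)) for i in range(N)]

# Coefficients of sin(x) and cos(x) as formal power series, through x^7.
sin = [Fraction((-1)**(k // 2), factorial(k)) if k % 2 == 1 else Fraction(0) for k in range(N)]
cos = [Fraction((-1)**(k // 2), factorial(k)) if k % 2 == 0 else Fraction(0) for k in range(N)]

identity = [s + c for s, c in zip(ps_mul(sin, sin), ps_mul(cos, cos))]
print(identity)    # [1, 0, 0, 0, 0, 0, 0, 0]
```

Truncation is harmless here: the coefficient of $x^n$ in a product only involves coefficients of degree at most $n$ in the factors.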

4.8 Exponential Generating Functions


Recall that $P(n,k)$ is the number of $k$-permutations of an $n$-set, and $P(n,k) = n!/(n-k)! = n(n-1)\cdots(n-k+1)$. The ordinary generating function of the sequence $P(n,0), P(n,1), \ldots$ is
$$G(x) = P(n,0)x^0 + P(n,1)x^1 + \cdots + P(n,n)x^n.$$
Also recall the similar binomial expansion
$$C(x) = C(n,0)x^0 + \cdots + C(n,n)x^n = (1+x)^n.$$


But we can't find a nice closed form for $G(x)$. On the other hand, $P(n,r) = C(n,r) \cdot r!$, so the equation for $C(x)$ can be written
$$P(n,0)x^0/0! + P(n,1)x^1/1! + P(n,2)x^2/2! + \cdots + P(n,n)x^n/n! = (1+x)^n,$$
i.e.,
$$\sum_{k=0}^{n} P(n,k)\,x^k/k! = (1+x)^n.$$
So $P(n,k)$ is the coefficient of $x^k/k!$ in $(1+x)^n$. This suggests another kind of generating function, to be called the exponential generating function, as follows: if $\{a_n\}$ is a sequence, the exponential generating function for this sequence is
$$H(x) = \sum_{n=0}^{\infty} a_n x^n/n!.$$

EXAMPLE 1. $\{a_k\} = \{1, 1, 1, \ldots\}$ has $H(x) = \sum x^k/k! = e^x$ as its exponential generating function.

EXAMPLE 2. From above, we already see that $(1+x)^n$ is the exponential generating function of the sequence $P(n,0), P(n,1), \ldots$. The exponential generating function of $\{1, \alpha, \alpha^2, \ldots\}$ is
$$H(x) = \sum_{k=0}^{\infty} \alpha^k x^k/k! = \sum_{k=0}^{\infty} (\alpha x)^k/k! = e^{\alpha x}.$$

Now suppose we have $k$ kinds of letters in an alphabet. We want to form a word with $n$ letters using $i_1$ of the 1st kind, $i_2$ of the second kind, ..., $i_k$ of the $k$th kind. The number of such words is
$$p(n; i_1, \ldots, i_k) = \binom{n}{i_1, \ldots, i_k} = n!/(i_1! \cdots i_k!).$$
Consider the product:
$$(1 + O_1 x^1/1! + O_1^2 x^2/2! + O_1^3 x^3/3! + \cdots) \cdots (1 + O_k x^1/1! + O_k^2 x^2/2! + \cdots).$$
The term involving $O_1^{i_1} O_2^{i_2} \cdots O_k^{i_k}$ is (if we put $n = i_1 + i_2 + \cdots + i_k$)
$$(O_1^{i_1} x^{i_1}/i_1!)(O_2^{i_2} x^{i_2}/i_2!) \cdots (O_k^{i_k} x^{i_k}/i_k!) = O_1^{i_1} \cdots O_k^{i_k} (x^{i_1 + \cdots + i_k})/(i_1! \cdots i_k!) = O_1^{i_1} \cdots O_k^{i_k} (x^n/n!)(n!/(i_1! \cdots i_k!)) = O_1^{i_1} \cdots O_k^{i_k}\, p(n; i_1, \ldots, i_k)\, x^n/n!.$$
The complete coefficient on $x^n/n!$ is $\sum_{i_1 + \cdots + i_k = n} O_1^{i_1} \cdots O_k^{i_k} \binom{n}{i_1, \ldots, i_k}$, provided that there is no restriction on the number of repetitions of any given object, except that $i_1 + \cdots + i_k = n$. And the various $O_j$ really do not illustrate the permutations, so we place each $O_j$ equal to 1. Also, for the object $O_j$, if there are restrictions on the number of times $O_j$ can appear, then for its generating function we include only those terms of the form $x^m/m!$ for which $O_j$ can appear $m$ times. Specifically, let $O$ be an object (i.e., a letter) to be used. For the exponential generating function of $O$, use $\sum_{k=0}^{\infty} a_k x^k/k!$, where $a_k$ is 1 or 0 according as $O$ can appear exactly $k$ times or not.
EXAMPLE 3: Suppose $O_1$ can appear 0, 2 or 3 times, $O_2$ can appear 4 or 5 times, and $O_3$ can be used without restriction. Then the product of the exponential generating functions for $O_1, O_2, O_3$ is:
$$(1 + x^2/2! + x^3/3!)(x^4/4! + x^5/5!)(1 + x + x^2/2! + \cdots).$$

Theorem 4.8.1 Suppose we have k kinds of objects O1 , · · · , Ok . Let fj (x)


be the exponential generating function of the object Oj determined as above
by whatever restrictions there are on the number of occurrences allowed for $O_j$.
Then the number of distinct permutations using n of the objects (i.e., words of
length n), subject to the restrictions used in determining f1 (x), f2 (x), . . . , fk (x),
is the coefficient of xn /n! in f1 (x) · · · fk (x).
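A quick machine check of Theorem 4.8.1 on Example 3 (all names below are our own sketch): multiply the three egf's as truncated coefficient lists, scale the $x^n$ coefficient by $n!$, and compare with a direct count of the words.

```python
from fractions import Fraction
from math import factorial

N = 12

def ps_mul(u, v):
    return [sum(u[j] * v[i - j] for j in range(i + 1)) for i in range(N)]

def egf(allowed):
    """Truncated egf of one object; allowed=None means no restriction."""
    return [Fraction(1, factorial(m)) if (allowed is None or m in allowed) else Fraction(0)
            for m in range(N)]

# Example 3: O1 appears 0, 2 or 3 times; O2 appears 4 or 5 times; O3 unrestricted.
f = ps_mul(ps_mul(egf({0, 2, 3}), egf({4, 5})), egf(None))
counts = [int(f[n] * factorial(n)) for n in range(N)]

# Direct count of the words of length n from the allowed (i1, i2, i3).
for n in range(N):
    direct = sum(factorial(n) // (factorial(i1) * factorial(i2) * factorial(i3))
                 for i1 in (0, 2, 3) for i2 in (4, 5) for i3 in range(n + 1)
                 if i1 + i2 + i3 == n)
    assert counts[n] == direct
print(counts[:7])    # [0, 0, 0, 0, 1, 6, 36]: no valid word is shorter than 4 letters
```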

EXAMPLE 4. Suppose there are $k$ objects with no restrictions on repetitions. So each individual exponential generating function is $\sum_{n=0}^{\infty} x^n/n! = e^x$. The complete exponential generating function is then
$$(e^x)^k = e^{kx} = \sum_{n=0}^{\infty} (kx)^n/n! = \sum_{n=0}^{\infty} k^n (x^n/n!).$$

But we already know that $k^n$ is the number of words of length $n$ with $k$ types of objects and all possible repetitions allowed.

EXAMPLE 5. Again suppose there are $k$ types of object, but that each object must appear at least once. So the exponential generating function is $(e^x - 1)^k$. The coefficient of $x^n/n!$ in
$$(e^x - 1)^k = \sum_{j=0}^{k} \binom{k}{j} e^{jx}(-1)^{k-j} = \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} \sum_{n=0}^{\infty} (jx)^n/n! = \sum_{n=0}^{\infty}\left(\sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} j^n\right) x^n/n!$$
is
$$\sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} j^n.$$
This proves the strange result: the number of permutations of $n$ objects of $k$ types, each type appearing at least once, is
$$\sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} j^n.$$
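The "strange result" can be confirmed by brute force for small parameters (a sketch of ours; such words correspond to surjections onto the set of letter types):

```python
from math import comb
from itertools import product

# Brute-force check of Example 5: words of length n over k letters using each letter.
def by_formula(n, k):
    return sum(comb(k, j) * (-1)**(k - j) * j**n for j in range(k + 1))

def by_brute_force(n, k):
    return sum(1 for w in product(range(k), repeat=n) if set(w) == set(range(k)))

for n in range(1, 8):
    for k in range(1, 5):
        assert by_formula(n, k) == by_brute_force(n, k)
print(by_formula(4, 2))    # 14 = 2^4 - 2 words of length 4 over two letters using both
```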
Def. The symbol $f \stackrel{egf}{\leftrightarrow} \{a_n\}_0^{\infty}$ means that the power series $f$ is the exponential generating function of the sequence $\{a_n\}_0^{\infty}$, i.e., that $f = \sum_{n \ge 0} a_n \frac{x^n}{n!}$.

So suppose $f \stackrel{egf}{\leftrightarrow} \{a_n\}_0^{\infty}$. Then $f' = \sum_{n=1}^{\infty} a_n \frac{x^{n-1}}{(n-1)!} = \sum_{n=0}^{\infty} a_{n+1} \frac{x^n}{n!}$, i.e., $f' \stackrel{egf}{\leftrightarrow} \{a_{n+1}\}_0^{\infty}$. By induction we have an analogue to Rule 1:

Rule 1': If $f \stackrel{egf}{\leftrightarrow} \{a_n\}_0^{\infty}$, then for $h \in \mathbf{N}$, $D^h f \stackrel{egf}{\leftrightarrow} \{a_{n+h}\}_{n=0}^{\infty}$.

4.9 Famous Example: Bernoulli Numbers


Define $B_n$, $n \ge 0$, by
$$\frac{x}{e^x - 1} = \sum_{n=0}^{\infty} B_n \frac{x^n}{n!}.$$
The defining equation for $B_n$ is equivalent to
$$1 = \left(\sum_{k=0}^{\infty} \frac{x^k}{(k+1)!}\right)\left(\sum_{k=0}^{\infty} B_k \frac{x^k}{k!}\right).$$
Recursively we can solve for the $B_k$ using this equation. But first notice the following: replace $x$ by $-x$ in the egf for $B_n$:
$$\sum_{k=0}^{\infty} B_k \frac{(-x)^k}{k!} = \frac{-x}{e^{-x} - 1} = \frac{x e^x}{e^x - 1}.$$

So
$$\frac{x}{e^x - 1} - \frac{x e^x}{e^x - 1} = -x = \sum_{k=0}^{\infty} B_k \frac{1 - (-1)^k}{k!} x^k.$$
This implies that
$$-x = B_0 \cdot 0 + B_1 \cdot \frac{2}{1} x + B_2 \cdot 0 \cdot x^2 + B_3 \cdot \frac{2}{3!} x^3 + B_4 \cdot 0 \cdot x^4 + \cdots,$$
which implies that
$$B_1 = -\frac{1}{2} \quad \text{and} \quad B_{2k+1} = 0 \text{ for } k \ge 1.$$
Then recursively from above we find $B_0 = 1$; $B_1 = -\frac{1}{2}$; $B_2 = \frac{1}{6}$; $B_4 = -\frac{1}{30}$; $B_6 = \frac{1}{42}, \ldots$.
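The recursion hidden in the defining equation is short to program (our sketch, exact rationals): the coefficient of $x^n$ in the product is $\sum_{k \le n} B_k/(k!\,(n-k+1)!)$, which must be 1 for $n = 0$ and 0 for $n > 0$.

```python
from fractions import Fraction
from math import factorial

# Solve 1 = (sum_k x^k/(k+1)!)(sum_k B_k x^k/k!) for the B_n recursively.
def bernoulli(N):
    B = []
    for n in range(N):
        s = sum(B[k] / Fraction(factorial(k) * factorial(n - k + 1)) for k in range(n))
        B.append(factorial(n) * ((1 if n == 0 else 0) - s))
    return B

# B_0..B_7 come out as 1, -1/2, 1/6, 0, -1/30, 0, 1/42, 0, matching the text.
print(bernoulli(8))
```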
A famous result of Euler is the following:
$$\zeta(2k) = \sum_{n=1}^{\infty} \frac{1}{n^{2k}} = \frac{(-1)^k \pi^{2k} \cdot 2^{2k-1}}{(2k-1)!}\left(\frac{-B_{2k}}{2k}\right), \qquad k = 1, 2, \ldots.$$
In particular,
$$\zeta(2) = \frac{\pi^2}{6}.$$
Bernoulli originally introduced the $B_n$ to give a closed form formula for
$$S_n(m) = 1^n + 2^n + 3^n + \cdots + m^n.$$
On the one hand

∞ ∞
! 
x(emx − 1) x xk  X mj xj 
 
mx
X
= (e − 1 = Bk =
ex − 1 ex − 1 k=0 k! j=1 j!

∞ X ∞
"m n
mi n X n
# ! !
X Bn−i X n i x
= · x = n=0 Bn−i m .
n=0 i=1 (n − i)! i! i=1 i n!
0
(The coefficient on x0! is 0.)
On the other hand:

$$\frac{x(e^{mx}-1)}{e^x - 1} = \frac{x}{e^x - 1}(e^{mx} - 1) = \left(\sum_{k=0}^{\infty} B_k \frac{x^k}{k!}\right)\left(\sum_{j=1}^{\infty} \frac{m^j x^j}{j!}\right) = \sum_{n=0}^{\infty}\left[\sum_{i=1}^{n} \frac{B_{n-i}}{(n-i)!} \cdot \frac{m^i}{i!}\right] x^n = \sum_{n=0}^{\infty}\left(\sum_{i=1}^{n} \binom{n}{i} B_{n-i} m^i\right) \frac{x^n}{n!}.$$
(The coefficient on $\frac{x^0}{0!}$ is 0.)
On the other hand:
$$\frac{x(e^{mx}-1)}{e^x - 1} = x\left(\frac{e^{mx}-1}{e^x - 1}\right) = x(e^{(m-1)x} + e^{(m-2)x} + \cdots + e^x + 1) = x\sum_{j=0}^{m-1}\left(\sum_{r=0}^{\infty} \frac{j^r x^r}{r!}\right) = \sum_{r=0}^{\infty} \frac{x^{r+1}}{r!} \sum_{j=0}^{m-1} j^r = \sum_{r=0}^{\infty} S_r(m-1) \frac{x^{r+1}}{r!} = \sum_{n=1}^{\infty} S_{n-1}(m-1) \frac{n\, x^n}{n!}.$$
Equating the coefficients of $\frac{x^n}{n!}$ we get:
$$\sum_{i=1}^{n} \binom{n}{i} B_{n-i} m^i = n\, S_{n-1}(m-1), \qquad n \ge 1,$$
or
$$\sum_{i=1}^{n+1} \binom{n+1}{i} B_{n+1-i} (m+1)^i = (n+1)\, S_n(m), \qquad n \ge 0.$$
So Bernoulli's formula is:
$$S_n(m) = 1^n + 2^n + \cdots + m^n = \sum_{i=1}^{n+1} \binom{n+1}{i} B_{n+1-i} \frac{(m+1)^i}{n+1}.$$

4.10 Famous Example: Fibonacci Numbers


Let $F_{n+1} = F_n + F_{n-1}$ for $n \ge 0$, and put $F_{-1} = 0$, $F_0 = 1$. Put $f \stackrel{egf}{\leftrightarrow} \{F_n\}_0^{\infty}$, i.e., $f = \sum_{n=0}^{\infty} F_n \frac{x^n}{n!}$. By Rule 1' we have $f' \stackrel{egf}{\leftrightarrow} \{F_{n+1}\}_{n=0}^{\infty}$ and $f'' \stackrel{egf}{\leftrightarrow} \{F_{n+2}\}_{n=0}^{\infty}$. Use the recursion given in the form $F_{n+2} = F_{n+1} + F_n$, $n \ge 0$. So by Rule 1' we have $f'' = f' + f$. From the theory of differential equations we see that $f(x) = c_1 e^{r_+ x} + c_2 e^{r_- x}$, where $r_\pm = \frac{1 \pm \sqrt 5}{2}$, and where $c_1$ and $c_2$ are to be determined by the initial conditions: $f(0) = F_0 = 1$, and $f'(0) = F_1 = 1$. Then $f(0) = c_1 + c_2 = 1$ and $f'(0) = r_+ c_1 + r_- c_2 = 1$. So
$$\begin{pmatrix} 1 & 1 \\ r_+ & r_- \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \Rightarrow c_1 = \frac{\begin{vmatrix} 1 & 1 \\ 1 & r_- \end{vmatrix}}{\begin{vmatrix} 1 & 1 \\ r_+ & r_- \end{vmatrix}} = \frac{r_- - 1}{r_- - r_+} = \frac{\frac{-1-\sqrt 5}{2}}{-\sqrt 5} = \frac{1+\sqrt 5}{2\sqrt 5} = \frac{r_+}{\sqrt 5}.$$
Similarly,
$$c_2 = \frac{\begin{vmatrix} 1 & 1 \\ r_+ & 1 \end{vmatrix}}{r_- - r_+} = \frac{1 - r_+}{r_- - r_+} = \frac{-r_-}{\sqrt 5}.$$
So $f = \frac{1}{\sqrt 5}\left(r_+ e^{r_+ x} - r_- e^{r_- x}\right) = \frac{1}{\sqrt 5}\left(r_+ \sum_{n=0}^{\infty} r_+^n \frac{x^n}{n!} - r_- \sum_{n=0}^{\infty} r_-^n \frac{x^n}{n!}\right)$. Then
$$F_n = \left[\frac{x^n}{n!}\right] f = \frac{1}{\sqrt 5}\left(r_+^{\,n+1} - r_-^{\,n+1}\right). \qquad (4.33)$$
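Eq. 4.33 can be checked numerically against the recursion (our sketch; note this section's convention $F_{-1} = 0$, $F_0 = 1$, so the sequence starts $1, 1, 2, 3, 5, \ldots$):

```python
from math import sqrt

rp, rm = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2    # r_+ and r_-

def binet(n):
    """Eq. 4.33: F_n = (r_+^{n+1} - r_-^{n+1})/sqrt(5), rounded to the nearest integer."""
    return round((rp**(n + 1) - rm**(n + 1)) / sqrt(5))

F = [1, 1]                                        # F_0, F_1
for n in range(2, 30):
    F.append(F[-1] + F[-2])
assert all(binet(n) == F[n] for n in range(30))
print([binet(n) for n in range(10)])              # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Rounding is only needed because floating point approximates $\sqrt 5$; the formula itself is exact.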

Suppose $f \stackrel{egf}{\leftrightarrow} \{a_n\}_0^{\infty}$. Then $Df = f' \stackrel{egf}{\leftrightarrow} \{a_{n+1}\}_{n=0}^{\infty}$ and $(xD)f \stackrel{egf}{\leftrightarrow} \{n a_n\}_0^{\infty}$: indeed, $f = \sum a_n \frac{x^n}{n!} \Rightarrow f' = \sum a_n \frac{x^{n-1}}{(n-1)!} \Rightarrow xf' = (xD)f = \sum a_n \frac{x^n}{(n-1)!} = \sum n a_n \frac{x^n}{n!}$. So $xf' \stackrel{egf}{\leftrightarrow} \{n a_n\}_{n=0}^{\infty}$. This leads easily to

4.11 Roots of a Power Series


We continue to assume that $R$ is an integral domain with characteristic 0. Occasionally we need to solve a polynomial equation for a power series. As an example we consider the $n$th root. Let $g(x) \in R[[x]]$ satisfy $g(0) = \alpha^n$, $\alpha \in R$, $\alpha^{-1} \in R$. We want to determine $f \in R[[x]]$ such that

$$f^n(x) = g(x) \quad \text{with} \quad f(0) = \alpha. \qquad (4.34)$$
Then (write $\alpha^{-n} g(x) = ((\alpha^{-n} g(x) - 1) + 1)$) the unique such power series is
$$f(x) = \alpha(\alpha^{-n} g(x))^{1/n} = \alpha \sum_{i \ge 0} \binom{1/n}{i} (\alpha^{-n} g(x) - 1)^i, \qquad (4.35)$$
since $\alpha^{-n} g(x) - 1 \in R[[x]]$ with $o(\alpha^{-n} g(x) - 1) \ge 1$. This is a solution since
$$f^n(x) = \alpha^n\{(\alpha^{-n} g(x))^{1/n}\}^n = \alpha^n(\alpha^{-n} g(x))^1 = g(x),$$
from Eq. 4.30, and since $f(0) = \alpha$. To establish uniqueness, suppose that $f$ and $h$ are both solutions to Eq. 4.34, so that
$$0 = f^n - h^n = (f - h)(f^{n-1} + f^{n-2} h + \cdots + f h^{n-2} + h^{n-1}).$$
Since $R$, and therefore $R[[x]]$, has no zero divisors, either $f - h = 0$ or $f^{n-1} + f^{n-2} h + \cdots + f h^{n-2} + h^{n-1} = 0$. But
$$(f^{n-1} + f^{n-2} h + \cdots + f h^{n-2} + h^{n-1})|_{x=0} = n\alpha^{n-1} \ne 0$$
since $\alpha \ne 0$ and $R$ has characteristic 0 and no zero divisors. Thus $f = h$ and the $f$ of Eq. 4.35 is the unique solution to Eq. 4.34.

This result is used most frequently when $f(x)$ satisfies a quadratic equation with a given initial condition.
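As an illustration (our own choice of example, not the text's): apply Eq. 4.35 with $\alpha = 1$, $n = 2$, $g(x) = 1 - 4x$. Here $g - 1 = -4x$, so the $i$th term of the binomial series contributes only to the coefficient of $x^i$, and the square root comes out coefficient by coefficient.

```python
from fractions import Fraction
from math import factorial

N = 10

def ps_mul(u, v):
    return [sum(u[j] * v[i - j] for j in range(i + 1)) for i in range(N)]

def gen_binom(y, i):
    """Generalized binomial coefficient binom(y, i) = y(y-1)...(y-i+1)/i!."""
    p = Fraction(1)
    for j in range(i):
        p *= y - j
    return p / factorial(i)

# f = sum_i binom(1/2, i) (-4x)^i, the unique square root of 1 - 4x with f(0) = 1.
f = [gen_binom(Fraction(1, 2), i) * Fraction(-4)**i for i in range(N)]
assert ps_mul(f, f) == [1, -4] + [0] * (N - 2)       # f^2 = g, as Eq. 4.34 demands

# A classical payoff: (1 - f(x))/(2x) enumerates the Catalan numbers.
print([int(-f[n + 1] / 2) for n in range(N - 1)])    # [1, 1, 2, 5, 14, 42, ...]
```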

4.12 Laurent Series and Lagrange Inversion


Again in this section we assume that R is a field with characteristic 0, so
that the notation and results of the preceding sections will apply. Usually
this field is the field of quotients of an integral domain obtained by starting
with the complex numbers and adjoining polynomials and/or power series in
sets of commuting independent variables.

The quotient field of $R[[x]]$ may be identified with the set $R((x))$ of so-called Laurent series $f(x) = \sum_{n=k}^{\infty} a_n x^n$, where $k \in \mathbf{Z}$ is the order $o(f(x))$ provided $a_k \ne 0$. When $k \ge 0$ this agrees with the earlier definition of order. We give the coefficient of $x^{-1}$ a name familiar from complex analysis. If $a(x) = \sum_{n=k}^{\infty} a_n x^n$, we say that $a_{-1}$ is the residue of $a(x)$. This will be written as $\mathrm{Res}\; a(x) = [x^{-1}]\{a(x)\}$.

For a Laurent series $f$, the multiplicative inverse exists iff $o(f) < \infty$. If $o(f) = k$, then $f = x^k g$ where $g \in \mathbf{C}((x))$ has $o(g) = 0$. In this case we define
$$f^{-1} = x^{-k} g^{-1}.$$

Since the usual quotient formula for derivatives of power series holds, it is
straightforward to carry the theory of the derivative over to Laurent series.
The following facts are then easily proved (Exercises!):

Exercise: 4.12.1 If $w(x)$ is a Laurent series, then

(R1) $\mathrm{Res}(w'(x)) = 0$;

and

(R2) $[x^{-1}]\left\{\frac{w'(x)}{w(x)}\right\} = o(w(x))$.

We have already mentioned the idea of an “inverse function” of a power


series with order 1 and have shown how to compute its coefficients recursively.
The next theorem gives an expression for the coefficients.

Theorem 4.12.2 Let $W(x) = w_1 x + w_2 x^2 + \cdots$ be a power series with $w_1 \ne 0$. Let $Z(w) = c_1 w + c_2 w^2 + \cdots$ be a power series in $w$ such that $Z(W(x)) = x$. Then
$$c_n = \mathrm{Res}\left(\frac{1}{n\, W^n(x)}\right). \qquad (4.36)$$
Proof: From our computations above we see that $c_1 = w_1^{-1}$. Now apply formal derivation to $Z(W(x)) = x$. This yields:
$$1 = \sum_{k=1}^{\infty} k c_k W^{k-1}(x) W'(x). \qquad (4.37)$$

Consider the series obtained by dividing this equation by $n W^n(x)$:
$$[x^{-1}]\left\{\frac{1}{n W^n(x)}\right\} = [x^{-1}]\left\{c_n \frac{W'(x)}{W(x)}\right\} + [x^{-1}]\left\{\sum_{k \ge 1,\, k \ne n} \frac{k c_k}{n} W^{k-1-n}(x) W'(x)\right\}.$$
If $n \ne k$, then the term $W^{k-1-n}(x) W'(x)$ is a derivative (up to the nonzero constant factor $k-n$) by the chain rule and hence has residue 0 by (R1). Now apply (R2) to the term with $n = k$ to see that the residue of the R.H.S. is equal to $c_n \cdot o(W(x)) = c_n$, proving the theorem.

Practice using the previous theorem on the following exercises:

Exercise: 4.12.3 (i) If $w = W(z) = \sum_{n=1}^{\infty} z^n$, use the previous theorem to compute $z = Z(w)$. Check your result by expressing $z$ and $w$ as simple rational functions of each other.

(ii) Put $w = W(z) = \frac{z}{(1-z)^2}$. Use the previous theorem to compute $z = Z(w)$.

(iii) Put $w = W(z) = \frac{z}{(1-z)^2}$. Use the "quadratic formula" to solve for $z$ as a function of $w$. Then use the binomial expansion (for $(1+t)^{1/2}$ with an appropriate $t$) to solve for $z$ as a function of $w$.

(iv) Show that the two answers you get for parts (ii) and (iii) are in fact equal.

Before proving our next result we need to recall the so-called Rule of Leibniz:

Exercise: 4.12.4 Let $D$ denote the derivative operator, and for a function $f$ (in our case a formal power series or Laurent series) let $f^{(j)}$ denote the $j$th derivative of $f$, i.e., $D^j(f) = f^{(j)}$.

(i) Prove that
$$D^n(f \cdot g) = \sum_{i=0}^{n} \binom{n}{i} f^{(i)} g^{(n-i)}.$$

(ii) Derive as a corollary to part (i) the fact that
$$D^j(f^2) = \sum_{i_1 + i_2 = j} \binom{j}{i_1, i_2} f^{(i_1)} f^{(i_2)}.$$

(iii) Now use part (i) and induction on $n$ to prove that
$$D^j(f^n) = \sum_{i_1 + \cdots + i_n = j} \binom{j}{i_1, \ldots, i_n} f^{(i_1)} \cdots f^{(i_n)}.$$

Theorem 4.12.5 (Special Case of Lagrange Inversion Formula) Let $f(z) = \sum_{i=0}^{\infty} f_i z^i$ with $f_0 \ne 0$, so in particular $f^{-1}(z) \in \mathbf{C}[[z]]$. If $w = W(z) = \frac{z}{f(z)}$ (which is a power series with $o(W(z)) = 1$), then we can solve for $z = Z(w)$ as a power series in $w$ with order 1. Specifically,
$$z = Z(w) = \sum_{n=1}^{\infty} c_n w^n, \quad \text{with} \quad c_n = \mathrm{Res}\left(\frac{f^n(z)}{n z^n}\right) = \frac{1}{n!}\left(D^{n-1} f^n\right)(0).$$
Proof: Since $f(z) = \sum_{i=0}^{\infty} f_i z^i$ with $f_0 \ne 0$, in $\mathbf{C}[[z]]$, $W(z) = \frac{z}{\sum_{i=0}^{\infty} f_i z^i} = \frac{1}{\sum_{i=0}^{\infty} f_i z^{i-1}} = (f_0 z^{-1} + f_1 + f_2 z + f_3 z^2 + \cdots)^{-1} = \sum_{n=1}^{\infty} w_n z^n$. Here $f_0 \ne 0$ implies that $w_1 \ne 0$. By Theorem 4.12.2, $z = \sum_{n=1}^{\infty} c_n w^n$, where
$$c_n = \frac{1}{n}\,\mathrm{Res}\left(\frac{1}{\left(z^n/f^n(z)\right)}\right) = \mathrm{Res}\left(\frac{f^n(z)}{n z^n}\right) = \frac{1}{n}[z^{n-1}]f^n(z) = \frac{1}{n}\sum f_{i_1} \cdots f_{i_n}.$$
Here the sum is over all $(i_1, \ldots, i_n)$ for which $i_1 + \cdots + i_n = n-1$ and $i_j \ge 0$ for $1 \le j \le n$. But now we need to evaluate
$$\frac{1}{n!}\left(D^{n-1} f^n\right)(0) = \frac{1}{n!}\sum \binom{n-1}{i_1, \ldots, i_n}\left(f^{(i_1)} \cdots f^{(i_n)}\right)(0),$$
where the sum is over all $(i_1, \ldots, i_n)$ for which $i_1 + \cdots + i_n = n-1$ and $i_j \ge 0$ for $1 \le j \le n$. In this expression $f^{(i_j)}$ is the $(i_j)$th derivative of $f$ and when evaluated at 0 yields $f_{i_j} \cdot i_j!$ by Taylor's formula. Hence the sum in question equals
$$\frac{1}{n!}\sum \frac{(n-1)!\, i_1! f_{i_1} \cdots i_n! f_{i_n}}{i_1! \cdots i_n!} = \frac{1}{n}\sum f_{i_1} \cdots f_{i_n},$$
as desired.
Let
$$f(x) = \sum_{i=1}^{j} a_{-i} x^{-i} + a_0 + \sum_{i \ge 1} a_i x^i;$$
$$f'(x) = \sum_{i=1}^{j} -i a_{-i} x^{-i-1} + 0 + \sum_{i \ge 1} i a_i x^{i-1}.$$
Similarly, let
$$g(x) = \sum_{i=1}^{k} b_{-i} x^{-i} + b_0 + \sum_{i \ge 1} b_i x^i;$$
$$g'(x) = \sum_{i=1}^{k} -i b_{-i} x^{-i-1} + 0 + \sum_{i \ge 1} i b_i x^{i-1}.$$

Then it is easy to compute the following:
$$[x^{-1}]\{f(x)g'(x)\} = \sum_{i=1}^{j} i a_{-i} b_i + \sum_{i=1}^{k} -i a_i b_{-i};$$
$$[x^{-1}]\{f'(x)g(x)\} = \sum_{i=1}^{j} -i a_{-i} b_i + \sum_{i=1}^{k} i a_i b_{-i} = -[x^{-1}]\{f(x)g'(x)\}.$$
Note that neither $a_0$ nor $b_0$ affects this value. Hence we may write for $f, g \in R((x))$,
$$[x^{-1}]\{f g'\} = -[x^{-1}]\{f'(g(x) - g(0))\}. \qquad (4.38)$$

When we use this a little later, $g(w) = \log(\phi(w))$, so $g'(w) = \phi'(w)/\phi(w)$, and Eq. 4.38 then appears as
$$[w^{-1}]\left\{f(w) \cdot \phi'(w) \cdot \phi^{-1}(w)\right\} = -[w^{-1}]\left\{f'(w)\log\left(\frac{\phi(w)}{\phi(0)}\right)\right\}. \qquad (4.39)$$

The next result allows us to change variables when computing residues,


and in some ways is the main result of this section, since the full Lagrange
Inversion formula follows from it.

Theorem 4.12.6 (Residue Composition) Let $f(x), r(x) \in \mathbf{C}((x))$ and suppose that $\alpha = o(r(x)) > 0$. We want to make the substitution $x = r(z)$. Then
$$\alpha[x^{-1}]\{f(x)\} = [z^{-1}]\{f(r(z))\, r'(z)\}.$$
Proof: First consider $f(x) = x^n$, $-1 \ne n \in \mathbf{Z}$. Then $[z^{-1}]\{r^n(z) r'(z)\} = (n+1)^{-1}[z^{-1}]\left\{\frac{d}{dz} r^{n+1}(z)\right\} = 0$ by (R1), since $r^{n+1}(z) \in \mathbf{C}((z))$. Also, $\alpha[x^{-1}]\{x^n\} = 0$. On the other hand, if $n = -1$, then $[z^{-1}]\{r'(z) r^{-1}(z)\} = o(r(z)) = \alpha > 0$ by (R2). It follows that for all integers $n$, $[z^{-1}]\{r^n(z) r'(z)\} = \alpha\delta_{n,-1} = \alpha[x^{-1}]\{x^n\}$. Now let $f(x) = \sum_{n \ge k} a_n x^n$ (so $o(f(x)) = k < \infty$). Since $o(r(z)) > 0$, $f(r(z))$ exists, and we have
$$\alpha[x^{-1}]\{f(x)\} = [z^{-1}]\left\{\sum_{n \ge k} a_n r^n(z) r'(z)\right\} = [z^{-1}]\{f(r(z))\, r'(z)\}.$$

As an application of Residue Composition we present the following problem: find a closed form formula for the sum
\[
S = \sum_{k=0}^{n} \binom{2n+1}{2k+1}\binom{j+k}{2n}.
\]
We give the major steps as "little" exercises.

1. Put $f(x) = \frac{1}{2x}\{(1+x)^{2n+1} - (1-x)^{2n+1}\}$. Show that
\[
f(x) = \sum_{k=0}^{n} \binom{2n+1}{2k+1} x^{2k}.
\]

2. $f((1+y)^{1/2}) = \sum_{k=0}^{n} (1+y)^k\, [x^{2k}]\{f(x)\}$.

3. $[y^{2n}]\left\{(1+y)^j \sum_{k=0}^{n} (1+y)^k \binom{2n+1}{2k+1}\right\} = \sum_k \binom{2n+1}{2k+1}\binom{j+k}{2n} = S$. (Hint: At one stage you will have to use the fact that $\sum_m \binom{a}{m}\binom{b}{n-m} = \binom{a+b}{n}$ for appropriate choices of $a, b, m, n$. You might want to prove this as a separate step if you have not already done so.)

So at this stage we have
\[
S = [y^{2n}]\left\{(1+y)^j \sum_{k=0}^{n} (1+y)^k [x^{2k}]f(x)\right\} = [y^{-1}]\left\{y^{-(2n+1)}(1+y)^j f((1+y)^{1/2})\right\}.
\]
At this point we want to use Residue Composition with the substitution $y = y(z) = z^2(z^2 - 2)$, so $o(y(z)) = 2$, and $y'(z) = 4z(z^2 - 1)$. Also, $(1+y)^{1/2} = 1 - z^2$. Now use $f((1+y)^{1/2}) = f(1 - z^2) = \frac{1}{2(1-z^2)}\{(2-z^2)^{2n+1} - z^{4n+2}\}$ and Residue Composition to obtain
\[
S = [z^{-1}]\left\{z^{-(4n+2)}(z^2-2)^{-(2n+1)}(1-z^2)^{2j}\cdot\frac{(2-z^2)^{2n+1} - z^{4n+2}}{2\cdot 2(1-z^2)}\cdot 4z(z^2-1)\right\},
\]
which simplifies to
\[
[z^{-1}]\left\{(z^2-1)^{2j}\left(\frac{1}{(z^2-2)^{2n+1}} + \frac{1}{z^{4n+2}}\right)z\right\} = [z^{-1}]\left\{(z^2-1)^{2j} z^{-(4n+1)}\right\} + 0,
\]
since $\frac{1}{(z^2-2)^{2n+1}}$ is a power series, so when multiplied by $z(z^2-1)^{2j}$ it contributes nothing to $[z^{-1}]$. Hence
\[
S = [z^{4n}]\left\{(z^2-1)^{2j}\right\} = \binom{2j}{2n}.
\]
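The closed form is easy to test numerically. The following sketch is not part of the text's derivation; it simply checks the identity $S = \binom{2j}{2n}$ for small parameters using Python's `math.comb` (which returns 0 when the lower index exceeds the upper one):

```python
from math import comb

def S(n, j):
    # Left-hand side: sum_{k=0}^{n} C(2n+1, 2k+1) * C(j+k, 2n)
    return sum(comb(2*n + 1, 2*k + 1) * comb(j + k, 2*n) for k in range(n + 1))

# The residue computation predicts S = C(2j, 2n).
for n in range(0, 6):
    for j in range(0, 8):
        assert S(n, j) == comb(2*j, 2*n)
```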

Theorem 4.12.7 (Lagrange Inversion) Let $\phi(x) \in C[[x]]$ with $o(\phi(x)) = 0$. Hence $\phi^{-1}(x)$ exists, and $w \cdot \phi^{-1}(w)$ has order 1 in $w$. Put $t = w \cdot \phi^{-1}(w)$, i.e., $w = t\phi(w)$. We want to generalize the earlier special case of Lagrange Inversion that found $w$ as a function of $t$. Here we let $f$ be some Laurent series and find the coefficients on powers of $t$ in $f(W(t))$. Specifically, we have the following:

1. If $f(\lambda) \in C((\lambda))$, then
\[
[t^n]\{f(W(t))\} =
\begin{cases}
\frac{1}{n}[\lambda^{n-1}]\{f'(\lambda)\phi^n(\lambda)\}, & \text{for } 0 \neq n \geq o(f); \\[4pt]
[\lambda^0]\{f(\lambda)\} + [\lambda^{-1}]\left\{f'(\lambda)\log\frac{\phi(\lambda)}{\phi(0)}\right\}, & \text{for } n = 0.
\end{cases}
\]

2. If $F(\lambda) \in C[[\lambda]]$, then
\[
F(w)\left(1 - \frac{w\cdot\phi'(w)}{\phi(w)}\right)^{-1} = \sum_{n \geq 0} c_n t^n, \quad \text{where } c_n = [\lambda^n]\{F(\lambda)\phi^n(\lambda)\}.
\]

Proof: Let $\Phi(w) = \frac{w}{\phi(w)}$, so $t = \Phi(w)$ and $o(\Phi(w)) = 1$, which implies that $\Phi^{[-1]}(\lambda)$ exists. Here $w = \Phi^{[-1]}(t)$ is the unique solution $w$ of $w = t\phi(w)$. For any integer $n$: $[t^n]\{f(W(t))\} = [t^{-1}]\{t^{-(n+1)} f(\Phi^{[-1]}(t))\}$. Now use Residue Composition to substitute $t = \Phi(w)$ with $\alpha = 1 = o(\Phi)$, where the $f(x)$ of the Residue Composition theorem is now $t^{-(n+1)} f(\Phi^{[-1]}(t))$. Since $\Phi^{-(n+1)}(w)\Phi'(w) = -\frac{1}{n}(\Phi^{-n}(w))'$ for $n \neq 0$, we get
\[
[t^n]\{f(W(t))\} = -\frac{1}{n}[w^{-1}]\{f(w)(\Phi^{-n}(w))'\} = \frac{1}{n}[w^{-1}]\{f'(w)\Phi^{-n}(w)\} = \frac{1}{n}[w^{-1}]\left\{f'(w)\cdot\frac{\phi^n(w)}{w^n}\right\}.
\]
If $n = 0$, then
\[
[t^0]\{f(W(t))\} = [w^{-1}]\{\Phi^{-1}(w)f(w)\Phi'(w)\} = [w^{-1}]\left\{f(w)\,\frac{\phi(w)}{w}\cdot\frac{\phi(w) - w\phi'(w)}{\phi^2(w)}\right\}
\]
\[
= [w^{-1}]\left\{\frac{f(w)}{w} - f(w)\frac{\phi'(w)}{\phi(w)}\right\} = [w^0]\{f(w)\} + [w^{-1}]\left\{f'(w)\log\frac{\phi(w)}{\phi(0)}\right\},
\]
using Eq. 4.39 at the last step.

This completes the proof of 1. Now let $F(\lambda) \in C[[\lambda]]$. It follows that $F(\lambda)\phi^{-1}(\lambda) \in C[[\lambda]]$. Hence we may put $f(w) = \int_0^w F(\lambda)\phi^{-1}(\lambda)\,d\lambda$ and know that $f(w) \in C[[w]]$. Also, since $f'(\lambda) = F(\lambda)\phi^{-1}(\lambda)$, we see that $F(w) = f'(w)\phi(w)$. By 1., $f(w) = f(0) + \sum_{n \geq 1}\frac{1}{n}[\lambda^{n-1}]\{\phi^n(\lambda)f'(\lambda)\}\,t^n$. Differentiate this latter equality with respect to $t$:
\[
f'(w)\cdot\frac{dw}{dt} = \sum_{n \geq 1}[\lambda^{n-1}]\{\phi^n(\lambda)f'(\lambda)\}\,t^{n-1} = \sum_{n \geq 0}[\lambda^n]\{\phi^{n+1}(\lambda)f'(\lambda)\}\,t^n.
\]
But $w = t\cdot\phi(w)$ implies that $\frac{dw}{dt} = \phi(w) + t\cdot\phi'(w)\cdot\frac{dw}{dt}$, from which we see that
\[
\frac{dw}{dt} = \phi(w)[1 - t\phi'(w)]^{-1}.
\]
Putting this all together, we find
\[
f'(w)\phi(w)[1 - t\phi'(w)]^{-1} = F(w)[1 - t\phi'(w)]^{-1} = \sum_{n \geq 0}[\lambda^n]\{\phi^{n+1}(\lambda)f'(\lambda)\}\,t^n = \sum_{n \geq 0}[\lambda^n]\{\phi^n(\lambda)F(\lambda)\}\,t^n.
\]
We write this finally in the form:
\[
\frac{F(W(t))}{1 - t\phi'(W(t))} = \sum_{n \geq 0}[\lambda^n]\{\phi^n(\lambda)F(\lambda)\}\,t^n.
\]

The following example illustrates the above ideas and gives some idea of the power of the method.

Example of Inversion Formula. Suppose that for all $n \in Z$ we have the following relation:
\[
b_n = \sum_k \binom{k}{n-k} a_k. \tag{4.40}
\]
Then we want to show that
\[
n a_n = \sum_k (-1)^{n-k}\binom{2n-k-1}{n-k}\, k\, b_k. \tag{4.41}
\]
The latter formula says nothing for $n = 0$, but the former says that $a_0 = b_0$. Multiply Eq. 4.40 by $w^n$ and sum over $n$:
\[
B(w) = \sum_n b_n w^n = \sum_n\sum_k\binom{k}{n-k}a_k w^n = \sum_k a_k\sum_n\binom{k}{n-k}w^n = \sum_k a_k w^k\sum_n\binom{k}{n-k}w^{n-k}
\]
\[
= \sum_k a_k w^k (1+w)^k = A(w + w^2) = A(t),
\]
where we have put $t = w + w^2 = w(1+w)$, or $w = t\cdot\frac{1}{1+w}$. So in the notation of the Theorem of Lagrange, $\phi(w) = \frac{1}{1+w}$. So if $w = W(t)$ we want to find $A(t) = B(W(t))$, i.e., we want the coefficients of $A$ in terms of the coefficients of $B$: $\sum a_n t^n = \sum b_k (W(t))^k$. At this stage we can say:
\[
a_n = [t^n]\left\{\sum_k b_k (W(t))^k\right\} = \sum_k b_k\,[t^n]\{(W(t))^k\}.
\]
In the notation of the theorem of Lagrange, put $f(u) = u^k$, so that $f'(u) = ku^{k-1}$ and $f(W(t)) = (W(t))^k$. So for $n > 0$,
\[
[t^n]\{(W(t))^k\} = \frac{1}{n}[\lambda^{n-1}]\{f'(\lambda)\phi^n(\lambda)\} = \frac{1}{n}[\lambda^{n-1}]\left\{\frac{k\lambda^{k-1}}{(1+\lambda)^n}\right\} = \frac{k}{n}[\lambda^{n-k}]\left\{\frac{1}{(1+\lambda)^n}\right\}
\]
\[
= \frac{k}{n}[\lambda^{n-k}]\left\{\sum_{i=0}^{\infty}\binom{-n}{i}\lambda^i\right\} = \frac{k}{n}\binom{-n}{n-k} = \frac{k}{n}(-1)^{n-k}\binom{2n-k-1}{n-k}.
\]
This implies that $a_n = \sum_k \frac{k}{n}(-1)^{n-k}\binom{2n-k-1}{n-k}b_k$, as desired.
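The inversion pair (4.40)/(4.41) can be sanity-checked numerically: start from an arbitrary sequence $a_k$, build the $b_n$ by Eq. 4.40, and verify that Eq. 4.41 recovers $na_n$. The sketch below (not from the text; the sample values of `a` are arbitrary) does exactly that:

```python
from math import comb

# Take an arbitrary sequence a_0, a_1, ... and build b_n via Eq. 4.40.
a = [3, 1, 4, 1, 5, 9, 2, 6]
N = len(a)
b = [sum(comb(k, n - k) * a[k] for k in range(N) if 0 <= n - k <= k)
     for n in range(N)]

# Eq. 4.41 should recover n * a_n from the b_k.
for n in range(1, N):
    rhs = sum((-1)**(n - k) * comb(2*n - k - 1, n - k) * k * b[k]
              for k in range(1, n + 1))
    assert rhs == n * a[n]
```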
Central Trinomial Numbers. We shall use the second statement in the Theorem of Lagrange to find the generating function of the "central trinomial numbers" $c_n$ defined by $c_n = [\lambda^n]\{(1 + \lambda + \lambda^2)^n\}$. Clearly $c_n = [\lambda^n]\{F(\lambda)\phi^n(\lambda)\}$ where $F(\lambda) = 1$, $\phi(\lambda) = 1 + \lambda + \lambda^2$. So $\phi'(\lambda) = 1 + 2\lambda$. Part 2 of the Theorem of Lagrange says that
\[
\sum_{n \geq 0} c_n t^n = F(w)\{1 - t\phi'(w)\}^{-1},
\]
where $w = t\phi(w) = t(1 + w + w^2)$. Hence $tw^2 + (t-1)w + t = 0$, implying that $w = \frac{1 - t - \sqrt{1 - 2t - 3t^2}}{2t}$. It is easy to compute that $1 - t\phi'(w) = \sqrt{1 - 2t - 3t^2}$. Now it follows that
\[
\sum_{n \geq 0} c_n t^n = (1 - 2t - 3t^2)^{-1/2}.
\]
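The two descriptions of $c_n$ can be compared by machine. The sketch below (a verification, not part of the text) computes $c_n$ directly as a coefficient of $(1+\lambda+\lambda^2)^n$, and separately expands $(1-2t-3t^2)^{-1/2}$ with exact rational arithmetic via $f^2(1-2t-3t^2) = 1$:

```python
from fractions import Fraction

def trinomial_central(n):
    poly = [1]
    for _ in range(n):                       # multiply by (1 + x + x^2) n times
        poly = [sum(poly[i - d] for d in range(3) if 0 <= i - d < len(poly))
                for i in range(len(poly) + 2)]
    return poly[n]

N = 12
# g = f^2 = 1/(1-2t-3t^2) satisfies g_n = 2 g_{n-1} + 3 g_{n-2}, g_0 = 1
g = [Fraction(1)]
for n in range(1, N):
    g.append(2 * g[n - 1] + (3 * g[n - 2] if n >= 2 else 0))
# square root of g, term by term: 2 f_0 f_n = g_n - sum_{i=1}^{n-1} f_i f_{n-i}
f = [Fraction(1)]
for n in range(1, N):
    f.append((g[n] - sum(f[i] * f[n - i] for i in range(1, n))) / 2)

assert all(f[n] == trinomial_central(n) for n in range(N))
```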

Exercise: 4.12.8 Show that
\[
c_n = \sum_{\frac{n}{2} \leq i \leq n} \frac{1\cdot 3\cdot 5\cdots(2i-1)\,3^{n-i}}{(n-i)!\,(2i-n)!\,2^{n-i}} = \sum_{\frac{n}{2} \leq i \leq n}\binom{n}{i}\binom{i}{n-i}.
\]
(Hint: Remember that you now have $c_n$ described in two different ways as a coefficient of a certain term in a power series expansion of some ordinary generating function.)

4.13 EGF: A Second Look

Let M denote a "type" of combinatorial structure. Let $m_k$ be the number of ways of giving a labeled $k$-set such a structure. In each separate case we shall specify whether we take $m_0 = 0$ or $m_0 = 1$. Then define
\[
M(x) = \sum_{k=0}^{\infty} m_k\frac{x^k}{k!}.
\]
Consider a few examples. If T denotes the structure "labeled tree," then as we saw above, $T(x) = \sum_{k=0}^{\infty} k^{k-2}\frac{x^k}{k!}$. Similarly, if S denotes the structure "a set" (often called the "uniform structure"), then $s_k = 1$ for all $k \geq 0$, so $S(x) = \sum_{k=0}^{\infty}\frac{x^k}{k!} = e^x$. If C denotes "oriented circuit," then $c_k = (k-1)!$ for $k \geq 1$. Put $c_0 = 0$. Then $C(x) = \sum_{k=1}^{\infty}\frac{x^k}{k} = \log\left(\frac{1}{1-x}\right) = -\log(1-x)$. If $\Pi$ denotes the structure "permutation," then $\Pi(x) = \sum_{k=0}^{\infty} k!\frac{x^k}{k!} = \sum_{k=0}^{\infty} x^k = \frac{1}{1-x}$.
Suppose we wish to consider the number of ways a labeled $n$-set can be partitioned into two parts, one with a structure of type A and the other with a structure of type B. The number of ways to do this is clearly $\sum_{k=0}^{n}\binom{n}{k}a_k b_{n-k}$. It follows that if we call this a structure of type $A\cdot B$, then
\[
(A\cdot B)(x) = \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n}\binom{n}{k}a_k b_{n-k}\right)\frac{x^n}{n!} = A(x)\cdot B(x).
\]

Famous Example: Derangements again. Let D denote the structure "derangement." Any permutation consists of a set of fixed points (interpreted as a set) and a derangement on the remaining points. Hence we have $\Pi(x) = S(x)\cdot D(x)$, i.e., $(1-x)^{-1} = D(x)\cdot e^x$, implying
\[
D(x) = e^{-x}(1-x)^{-1} = \sum_{k=0}^{\infty}\frac{(-1)^k x^k}{k!}\cdot\sum_{j=0}^{\infty}x^j = \sum_{n=0}^{\infty}\left(\sum_{k=0}^{n}\frac{(-1)^k}{k!}\cdot 1\right)x^n = \sum_{n=0}^{\infty}\left(n!\sum_{k=0}^{n}\frac{(-1)^k}{k!}\right)\frac{x^n}{n!}.
\]
It follows that we get the usual formula:
\[
d_n = n!\sum_{k=0}^{n}\frac{(-1)^k}{k!}.
\]
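For small $n$ the formula can be checked against a brute-force count of fixed-point-free permutations. This sketch is only a verification aid, not part of the text:

```python
from itertools import permutations
from math import factorial

def d_formula(n):
    # d_n = n! * sum_{k=0}^{n} (-1)^k / k!, computed exactly with integers
    return sum((-1)**k * factorial(n) // factorial(k) for k in range(n + 1))

def d_brute(n):
    # count permutations of {0,...,n-1} with no fixed point
    return sum(all(p[i] != i for i in range(n))
               for p in permutations(range(n)))

assert [d_brute(n) for n in range(8)] == [d_formula(n) for n in range(8)]
```

(The values start 1, 0, 1, 2, 9, 44, 265, 1854.)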

EXAMPLE 6. In how many ways can a labeled $n$-set be split into a number of pairs and a number of singletons?

First, let $p_n$ be the number of ways to split an $n$-set into pairs. Clearly, if $n$ is odd, $p_n = 0$. By convention we say that $p_0 = 1$. Suppose $n = 2k \geq 2$. Pick a first element $a_1$ in $2k$ ways, and then the second element in $2k-1$ ways, the third in $2k-2$ ways, etc., so that $\{a_1,a_2\},\{a_3,a_4\},\cdots,\{a_{2k-1},a_{2k}\}$ is chosen in $(2k)!$ ways. But the same pairs could be chosen in $k!$ orders, and each pair in two ways, so that
\[
p_{2k} = \frac{(2k)!}{2^k k!} = \frac{(2k)(2k-1)(2k-2)(2k-3)\cdots 1}{2^k k!} = \frac{2^k k!\,(2k-1)!!}{2^k k!} = (2k-1)!!
\]
Here $(2k-1)!! = (2k-1)(2k-3)(2k-5)\cdots 1$, with $(2\cdot 0 - 1)!! = 1$ by convention. Then we find
\[
P(x) := \sum_{n=0}^{\infty} p_n\frac{x^n}{n!} = \sum_{k=0}^{\infty} p_{2k}\frac{x^{2k}}{(2k)!} = \sum_{k=0}^{\infty}(2k-1)!!\frac{x^{2k}}{(2k)!} = \sum_{k=0}^{\infty}\frac{x^{2k}}{2^k k!} = \sum_{k=0}^{\infty}\frac{\left(\frac{x^2}{2}\right)^k}{k!} = e^{\frac{x^2}{2}}.
\]
The number of ways to pick $n$ singletons from an $n$-set is 1, i.e., the corresponding egf is $S(x) = e^x$. Hence
\[
(P\cdot S)(x) = P(x)\cdot S(x) = \exp\left(\frac{1}{2}x^2\right)\cdot\exp(x) = \exp\left(x + \frac{1}{2}x^2\right).
\]

We can also obtain the same result by using a recursion relation. Denote the structure $P\cdot S$ by B. In the set $\{1,\ldots,n\}$ we can either let $n$ be a singleton or make a pair $\{x,n\}$ with $1 \leq x \leq n-1$. So $b_n = b_{n-1} + (n-1)b_{n-2}$, $n \geq 1$. As $b_1 = 1$ by definition, and $b_1 = b_0$ according to the recursion, it must be that $b_0 = 1$. Multiply the recursion by $\frac{x^{n-1}}{(n-1)!}$ for $n \geq 1$ and sum:
\[
\sum_{n=1}^{\infty} b_n\frac{x^{n-1}}{(n-1)!} = \sum_{n=1}^{\infty} b_{n-1}\frac{x^{n-1}}{(n-1)!} + \sum_{n=1}^{\infty}(n-1)b_{n-2}\frac{x^{n-1}}{(n-1)!}.
\]
Also $B(x) = \sum_{n=0}^{\infty} b_n\frac{x^n}{n!}$ implies
\[
B'(x) = \sum_{n=1}^{\infty} b_n\frac{x^{n-1}}{(n-1)!}.
\]
This implies that
\[
B'(x) = B(x) + xB(x) = (1+x)B(x).
\]
Since $B(0) = 1$, the theory of differential equations shows that $B(x) = \exp\left(x + \frac{1}{2}x^2\right)$.
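A set of pairs and singletons on $\{1,\ldots,n\}$ is the same thing as an involution (a permutation equal to its own inverse), so the recursion can be checked by brute force. This is a verification sketch, not part of the text:

```python
from itertools import permutations

# b_n from the recursion b_n = b_{n-1} + (n-1) b_{n-2}, b_0 = b_1 = 1
def b_rec(N):
    b = [1, 1]
    for n in range(2, N):
        b.append(b[n - 1] + (n - 1) * b[n - 2])
    return b[:N]

# Direct count: permutations p of {0,...,n-1} with p(p(i)) = i for all i
def b_brute(n):
    return sum(all(p[p[i]] == i for i in range(n))
               for p in permutations(range(n)))

assert b_rec(8) == [b_brute(n) for n in range(8)]
```

(The values start 1, 1, 2, 4, 10, 26, 76, 232.)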
Example 7. Recall (Theorem 1.7.1) that the Stirling number $S(n,k)$ of the second kind, the number of partitions of an $n$-set into $k$ nonempty blocks, satisfies the following recursion:
\[
S(n,k) = kS(n-1,k) + S(n-1,k-1),\ n \geq k; \qquad S(n,k) = 0 \text{ for } n < k. \tag{4.42}
\]
Multiply Eq. 4.42 by $\frac{x^{n-1}}{(n-1)!}$ and sum over $n \geq k$:
\[
\sum_{n \geq k} S(n,k)\frac{x^{n-1}}{(n-1)!} = k\sum_{n \geq k} S(n-1,k)\frac{x^{n-1}}{(n-1)!} + \sum_{n \geq k} S(n-1,k-1)\frac{x^{n-1}}{(n-1)!}.
\]
Put $F_k(x) = \sum_{n \geq k} S(n,k)\frac{x^n}{n!}$. Then $F_k'(x) = \sum_{n \geq k} S(n,k)\frac{x^{n-1}}{(n-1)!}$ and
\[
k\sum_{n \geq k} S(n-1,k)\frac{x^{n-1}}{(n-1)!} = k\sum_{n \geq k+1} S(n-1,k)\frac{x^{n-1}}{(n-1)!} = k\sum_{n \geq k} S(n,k)\frac{x^n}{n!} = kF_k(x).
\]
Also
\[
\sum_{n \geq k} S(n-1,k-1)\frac{x^{n-1}}{(n-1)!} = \sum_{n-1 \geq k-1} S(n-1,k-1)\frac{x^{n-1}}{(n-1)!} = F_{k-1}(x).
\]
The preceding says that
\[
F_k'(x) = kF_k(x) + F_{k-1}(x). \tag{4.43}
\]


We now use induction on $k$ in Eq. 4.43 to prove the following:

Theorem 4.13.1
\[
\sum_{n \geq k} S(n,k)\frac{x^n}{n!} = \frac{1}{k!}(e^x - 1)^k.
\]

Proof: For $n \geq 1$, $S(n,1) = 1$. And $\sum_{n \geq 1} 1\cdot\frac{x^n}{n!} = \frac{1}{1!}(e^x - 1)^1$. So the theorem is true for $k = 1$.

The induction hypothesis is that for $1 \leq t < k$, $F_t(x) = \frac{1}{t!}(e^x - 1)^t$. Then $F_k'(x) = kF_k(x) + F_{k-1}(x)$ implies $F_k'(x) = kF_k(x) + \frac{1}{(k-1)!}(e^x - 1)^{k-1}$. Also
\[
[x^k]\{F_k(x)\} = [x^k]\left\{\sum_{n \geq k} S(n,k)\frac{x^n}{n!}\right\} = \frac{S(k,k)}{k!} = \frac{1}{k!}.
\]
Put $G_k(x) = \frac{1}{k!}(e^x - 1)^k$. Then $[x^k]\{G_k(x)\} = \frac{1}{k!}$ and $G_k'(x) = \frac{1}{(k-1)!}(e^x - 1)^{k-1}\cdot e^x$. Also $kG_k(x) + G_{k-1}(x) = \frac{k}{k!}(e^x - 1)^k + \frac{1}{(k-1)!}(e^x - 1)^{k-1} = \frac{1}{(k-1)!}(e^x - 1)^{k-1}[e^x - 1 + 1] = G_k'(x)$. This is enough to guarantee that $F_k(x) = G_k(x)$.
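Theorem 4.13.1 can be tested with exact arithmetic: generate $S(n,k)$ from the recursion of Eq. 4.42, expand $(e^x-1)^k/k!$ as a truncated power series, and compare $n!\,[x^n]$ with $S(n,k)$. A verification sketch (not part of the text):

```python
from fractions import Fraction
from math import factorial

N = 10
# S(n, k) from the recursion of Eq. 4.42
S = [[0] * (N + 1) for _ in range(N + 1)]
S[0][0] = 1
for n in range(1, N + 1):
    for k in range(1, n + 1):
        S[n][k] = k * S[n - 1][k] + S[n - 1][k - 1]

# (e^x - 1) as a truncated series with exact coefficients
e_minus_1 = [Fraction(0)] + [Fraction(1, factorial(n)) for n in range(1, N + 1)]

def mul(a, b):
    # truncated product of two power series
    c = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

for k in range(1, N + 1):
    pw = [Fraction(1)] + [Fraction(0)] * N       # the series 1
    for _ in range(k):
        pw = mul(pw, e_minus_1)                  # now (e^x - 1)^k
    for n in range(N + 1):
        # n! [x^n] (e^x - 1)^k / k!  should equal S(n, k)
        assert pw[n] * factorial(n) / factorial(k) == S[n][k]
```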
Let $n$ be a positive integer. For each $k$-tuple $(b_1,\ldots,b_k)$ of nonnegative integers for which $b_1 + 2b_2 + \cdots + kb_k = n$, we count how many ways there are to partition an $n$-set into $b_1$ parts of size 1, $b_2$ parts of size 2, ..., $b_k$ parts of size $k$. Imagine the elements of the $n$-set are to be placed in $n$ positions. The positions are grouped from left to right in bunches. The first $b_1$ bunches have one position each; the second group of $b_2$ bunches have $b_2$ blocks of size 2, etc. There are $n!$ ways to order the integers in the positions. Within each grouping of $k$ positions there are $k!$ ways to permute the integers within those positions. So we divide by $(1!)^{b_1}(2!)^{b_2}\cdots(k!)^{b_k}$. But the groups of the same cardinality can be permuted without affecting the partition. So we divide by $b_1!b_2!\cdots b_k!$. Hence the number of partitions is:
\[
\frac{n!}{b_1!\cdots b_k!\,(1!)^{b_1}\cdots(k!)^{b_k}}.
\]
Now suppose that each $j$-set can have $n_j$ "structures" of type N on it. So each partition gives $(n_1)^{b_1}\cdots(n_k)^{b_k}$ configurations. Hence the total number of such configurations is
\[
\frac{n!}{b_1!\cdots b_k!}\cdot\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k}.
\]
It follows that the number of configurations on an $n$-set is
\[
a_n = \sum\frac{n!}{b_1!\cdots b_k!}\cdot\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k},
\]
where the sum is over $k$-tuples $(b_1,\ldots,b_k)$ with $b_1 + 2b_2 + \cdots + kb_k = n$, $b_i \geq 0$, $k \geq 0$.

Among the tuples $(b_1,\ldots,b_k)$ for which $b_1 + 2b_2 + \cdots + kb_k = n$, we lump together those for which $b_1 + \cdots + b_k$ is a constant, say $b_1 + \cdots + b_k = m$, $m = 0, 1, \ldots$. If we let $A(x) = \sum_{n=0}^{\infty} a_n\frac{x^n}{n!}$, we see that the coefficient on $x^n$ equals
\[
\sum\frac{1}{b_1!\cdots b_k!}\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k},
\]
where the sum is as above, but we think of it as coming in parts, "part m" being the sum of those terms with $b_1 + b_2 + \cdots + b_k = m$.

Put $N(x) = \sum_{i=1}^{\infty} n_i\frac{x^i}{i!}$. What does $\frac{N(x)^m}{m!}$ contribute to the coefficient of $x^n$ in $\sum_{m=0}^{\infty}\frac{N(x)^m}{m!}$? In expanding $\frac{1}{m!}N(x)N(x)\cdots N(x)$ ($m$ factors), choose terms of degree $i$, $b_i$ times, $1 \leq i \leq n$. There are $\binom{m}{b_1,\ldots,b_k}$ ways to choose terms of degree $i$, $b_i$ times, where $b_1 + \cdots + b_k = m$. This gives a term of degree $1b_1 + 2b_2 + \cdots + kb_k$. So the contribution to the term of degree $n$ is
\[
\sum\frac{1}{m!}\binom{m}{b_1,\ldots,b_k}\left(\frac{n_1 x}{1!}\right)^{b_1}\cdots\left(\frac{n_k x^k}{k!}\right)^{b_k} = \sum\frac{1}{b_1!\cdots b_k!}\left(\frac{n_1}{1!}\right)^{b_1}\cdots\left(\frac{n_k}{k!}\right)^{b_k}x^{b_1 + 2b_2 + \cdots + kb_k}.
\]
The sum is over all $k \geq 0$, and over all $(b_1,\ldots,b_k)$ with $b_i \geq 0$, $\sum ib_i = n$, $\sum b_i = m$. Now sum over all $m$. (Of course the contribution is zero unless $m \leq n$.) It is clear that $A(x) = \sum_{n=0}^{\infty} a_n\frac{x^n}{n!} = \exp(N(x))$, and we have proved the following theorem.

Theorem 4.13.2 If the compound structure S(N) is obtained by splitting a set into parts, each of which then gets a structure of type N, and if a $k$-set gets $n_k$ structures of type N, so $N(x) = \sum_{k=1}^{\infty} n_k\frac{x^k}{k!}$, then
\[
S(N)(x) = \exp(N(x)).
\]
(Keep in mind that $S(x) = \sum\frac{x^k}{k!}$, since there is only 1 way to impose the structure of set on a set.)

Example 8. If we substitute the structure "oriented cycle" into the uniform structure (set), then we are considering the compound structure consisting of a partition of an $n$-set into oriented cycles, i.e., the structure $\Pi$ with $\pi_0 = 1$. So we must have $\Pi(x) = \exp(C(x))$. Indeed, above we determined that $\Pi(x) = (1-x)^{-1}$ and $C(x) = -\log(1-x)$.

Exercise: 4.13.3 A directed tree with all edges pointing toward one vertex called the root is called an arborescence. Let $T(x) = \sum_{n=1}^{\infty} t_n\frac{x^n}{n!}$, where $t_n$ is the number of labeled trees on $n$ vertices. And let $A(x) = \sum_{n=1}^{\infty} a_n\frac{x^n}{n!}$, where $a_n$ is the number of arborescences on $n$ vertices. Since a labeled tree on $n$ vertices can be rooted in $n$ ways and turned into an arborescence, and the process is reversible, clearly $a_n = nt_n$, i.e., $A(x) = xT'(x)$. Consider a labeled tree on $n+1$ vertices as an arborescence with vertex $n+1$ as its root. Then delete the root and all incident edges. The result is a "rooted forest" on $n$ vertices, with the roots of the individual trees being exactly the vertices that were originally adjacent to the root $n+1$. If $F(x) = \sum_{n=0}^{\infty} f_n\frac{x^n}{n!}$, where $f_n$ is the number of rooted forests on $n$ vertices (and $f_0 = 1$ by convention), then by Theorem 4.13.2, $\exp(A(x)) = F(x)$. Hence we have
\[
\exp(A(x)) = \sum_{n=0}^{\infty} f_n\frac{x^n}{n!} = \sum_{n=0}^{\infty} t_{n+1}\frac{x^n}{n!} = T'(x) = x^{-1}A(x).
\]
Use the special case of Lagrange Inversion to find $c_n$ if $A(x) = \sum_{n=1}^{\infty} c_n x^n = \sum_{n=1}^{\infty} a_n\frac{x^n}{n!}$, and complete another proof of Cayley's Theorem.
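The chain of identities in the exercise can be checked numerically. Assuming Cayley's values $a_n = n^{n-1}$ and $t_n = n^{n-2}$, the coefficients of $\exp(A(x))$ should be $f_n = t_{n+1} = (n+1)^{n-1}$. A sketch with exact rational arithmetic (a verification aid, not a solution of the exercise):

```python
from fractions import Fraction
from math import factorial

N = 8

def mul(a, b):
    # truncated product of two power series of length N+1
    c = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= N:
                c[i + j] += ai * bj
    return c

# EGF of arborescences, using Cayley: a_n = n^{n-1}
A = [Fraction(0)] + [Fraction(n**(n - 1), factorial(n)) for n in range(1, N + 1)]

# exp(A(x)) = sum_m A(x)^m / m!, truncated at degree N
expA = [Fraction(0)] * (N + 1)
power = [Fraction(1)] + [Fraction(0)] * N
for m in range(N + 1):
    for n in range(N + 1):
        expA[n] += power[n] / Fraction(factorial(m))
    power = mul(power, A)

# exp(A(x)) = sum_n t_{n+1} x^n / n!  with t_{n+1} = (n+1)^{n-1}
assert expA[0] == 1
for n in range(1, N + 1):
    assert expA[n] * factorial(n) == (n + 1)**(n - 1)
```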

4.14 Dirichlet Series - The Formal Theory

In this brief section we just introduce the notion of Dirichlet series.

Def. Given a sequence $\{a_n\}_{n=1}^{\infty}$, the formal series
\[
f(s) = \sum_{n=1}^{\infty}\frac{a_n}{n^s}
\]
is the Dirichlet series generating function (Dsgf) of the given sequence: $f(s) \stackrel{Dsgf}{\leftrightarrow} \{a_n\}$.

Suppose $A(s) = \sum\frac{a_n}{n^s}$ and $B(s) = \sum\frac{b_n}{n^s}$. The Dirichlet convolution product is
\[
A(s)B(s) = \sum_{m,n=1}^{\infty}\frac{a_n b_m}{n^s m^s} = \sum_{n=1}^{\infty}\left(\sum_{d|n} a_d b_{n/d}\right)\frac{1}{n^s}.
\]
Rule 1''. $A(s)B(s) \stackrel{Dsgf}{\leftrightarrow} \left\{\sum_{d|n} a_d b_{n/d}\right\}_{n=1}^{\infty}$.

Rule 2''. $A(s)^k \stackrel{Dsgf}{\leftrightarrow} \left\{\sum_{(n_1,\ldots,n_k):\,n_1\cdots n_k = n} a_{n_1}a_{n_2}\cdots a_{n_k}\right\}_{n=1}^{\infty}$.
A most famous example is given by the Riemann zeta function
\[
\zeta(s) = \sum_{n=1}^{\infty}\frac{1}{n^s} \stackrel{Dsgf}{\leftrightarrow} \{1\}_{n=1}^{\infty}.
\]

Theorem 4.14.1 Let $f$ be a multiplicative arithmetic function. Then

(i) $L(f,s) = \sum_{n=1}^{\infty}\frac{f(n)}{n^s} = \prod_{p\ \mathrm{prime}}\left(\sum_{i=0}^{\infty}\frac{f(p^i)}{p^{is}}\right)$.

(ii) If $f$ is completely multiplicative, then $L(f,s) = \prod_{p\ \mathrm{prime}}\left(1 - \frac{f(p)}{p^s}\right)^{-1}$.

Proof: If the unique factorization of $n$ is $n = p_1^{e_1}\cdots p_r^{e_r}$, then there is a unique term in the product that looks like $\prod_{i=1}^{r}\frac{f(p_i^{e_i})}{(p_i^{e_i})^s} = \frac{f(n)}{n^s}$.

Since U defined by $U(n) = 1$ is completely multiplicative, we may write $\zeta(s) = \sum\frac{1}{n^s} = \prod_{p\ \mathrm{prime}}\left(1 - \frac{1}{p^s}\right)^{-1} = \left\{\prod_{p\ \mathrm{prime}}(1 - p^{-s})\right\}^{-1}$. Hence
\[
\zeta(s)^{-1} = \prod_{p\ \mathrm{prime}}\left(1 - p^{-s}\right). \tag{4.44}
\]
\[
L(\mu,s) = \sum_{n=1}^{\infty}\frac{\mu(n)}{n^s} = \prod_p\left(\sum_{i=0}^{\infty}\frac{\mu(p^i)}{p^{is}}\right) = \prod_p\left(1 - p^{-s}\right) = \zeta^{-1}(s). \tag{4.45}
\]
In other words,
\[
\frac{1}{\zeta(s)} \stackrel{Dsgf}{\leftrightarrow} \{\mu(n)\}_{n=1}^{\infty}. \tag{4.46}
\]

In the present context we give another proof of the usual Möbius Inversion Formula.

Theorem 4.14.2 Let $F$ and $f$ be arithmetic functions. Then
\[
F(n) = \sum_{d|n} f(d) \text{ for all } n \in N \quad\text{iff}\quad f(n) = \sum_{d|n} F(d)\mu(n/d) = \sum_{d|n}\mu(d)F(n/d) \text{ for all } n \in N.
\]

Proof: Suppose $F(s) \stackrel{Dsgf}{\leftrightarrow} \{F(n)\}$ and $f(s) \stackrel{Dsgf}{\leftrightarrow} \{f(n)\}$. Then
\[
F(s) = f(s)\cdot\zeta(s) \quad\text{iff}\quad F(s)(\zeta(s))^{-1} = f(s).
\]
So $F = f * U$ iff $f = F * \mu$.
Recall the following commonly used multiplicative arithmetic functions in this context.
\[
I(n) = \begin{cases} 1, & n = 1; \\ 0, & n > 1. \end{cases}
\]
So $L(I,s) = \sum\frac{I(n)}{n^s} = 1$ is the multiplicative identity.

$U(n) = 1$ for all $n \in N$. So $L(U,s) = \zeta(s)$.

$E(n) = n$ for all $n \in N$. So $L(E,s) = \sum\frac{n}{n^s} = \sum\frac{1}{n^{s-1}} = \zeta(s-1)$.

$\tau(n) = \sum_{d|n} 1$, so $\tau = U * U$ is multiplicative, and $\zeta^2(s) = \sum_n\left(\sum_{d|n} 1\cdot 1\right)\frac{1}{n^s} = \sum_n\frac{\tau(n)}{n^s}$.

$\sigma(n) = \sum_{d|n} d = \sum_{d|n} E(d)U(n/d) = (E * U)(n)$. Hence $\sigma = E * U$ is multiplicative.

Since $\mu = U^{-1}$, $E = \sigma * \mu$, which says $n = \sum_{d|n}\mu(d)\sigma(n/d)$. And
\[
\zeta(s)\cdot\zeta(s-1) = \left(\sum\frac{1}{n^s}\right)\cdot\left(\sum\frac{n}{n^s}\right) = \sum_n\left(\sum_{k|n} 1\cdot\frac{n}{k}\right)\frac{1}{n^s} = \sum\frac{\sigma(n)}{n^s}.
\]
Similarly,
\[
\zeta(s)\cdot\zeta(s-q) = \left(\sum\frac{1}{n^s}\right)\cdot\left(\sum\frac{n^q}{n^s}\right) = \sum_n\left(\sum_{d|n} 1\cdot\left(\frac{n}{d}\right)^q\right)\frac{1}{n^s} = \sum_n\frac{\sum_{d|n} d^q}{n^s}.
\]
We give some more examples.


Example 4.14.3
\[
\zeta'(s) = \left(\sum\frac{1}{n^s}\right)' = \sum_{n=1}^{\infty}\frac{-(n^s)'}{n^{2s}} = \sum\frac{-n^s\log(n)}{n^{2s}} = -\sum\frac{\log(n)}{n^s} \Rightarrow \zeta'(s) \stackrel{Dsgf}{\leftrightarrow} \{-\log(n)\}.
\]

Example 4.14.4 The familiar identity
\[
\sum_{d|n}\phi(d) = n
\]
says that $\phi * U = E$, from which we see $\phi = \mu * E$, i.e.,
\[
\phi(n) = \sum_{d|n}\mu(d)\cdot\frac{n}{d} = \sum_{d|n} d\cdot\mu\left(\frac{n}{d}\right),
\]
which is the same thing as:
\[
\frac{\zeta(s-1)}{\zeta(s)} = L(\phi,s).
\]

Example 4.14.5 Put $f(n) = |\mu(n)|$ for all $n \in N$. Clearly $f$ is multiplicative. So
\[
\sum\frac{f(n)}{n^s} = \prod_p\left(1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \cdots\right) = \prod_p\left(1 + \frac{1}{p^s}\right) = \prod_p\frac{1 - \frac{1}{p^{2s}}}{1 - \frac{1}{p^s}} = \prod_p\left(1 - \frac{1}{p^{2s}}\right)\cdot\prod_p\frac{1}{1 - \frac{1}{p^s}}.
\]
Also,
\[
(\zeta(2s))^{-1} = \prod_p\left(1 - \frac{1}{p^{2s}}\right).
\]
Hence,
\[
\sum\frac{|\mu(n)|}{n^s} = \frac{\zeta(s)}{\zeta(2s)}.
\]

Example 4.14.6
\[
1 = \zeta(s)\cdot\frac{1}{\zeta(s)} = \left(\sum\frac{1}{n^s}\right)\left(\sum\frac{\mu(n)}{n^s}\right) = \sum_n\left(\sum_{d|n} 1\cdot\mu\left(\frac{n}{d}\right)\right)\frac{1}{n^s} \Rightarrow \sum_{d|n}\mu(d) = \begin{cases} 1, & n = 1; \\ 0, & n > 1. \end{cases}
\]

Example 4.14.7
\[
\zeta(s) = \frac{1}{\zeta(s)}\cdot\zeta^2(s) \Rightarrow 1 = \sum_{d|n}\mu(d)\cdot\tau\left(\frac{n}{d}\right).
\]
This also follows from doing Möbius inversion on $\tau(n) = \sum_{d|n} 1$.
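The convolution identities in these examples are easy to verify by machine for a range of $n$. The sketch below (a verification aid, not part of the text) implements $\mu$, $\tau$, and $\sigma$ directly and checks $\mu * U = I$, $\mu * \tau = U$, and $\mu * \sigma = E$:

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mu(n):
    # Moebius function via trial-division factorization
    m, result = n, 1
    p = 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0          # p^2 divides n
            result = -result
        p += 1
    if m > 1:
        result = -result
    return result

tau = lambda n: len(divisors(n))
sigma = lambda n: sum(divisors(n))

for n in range(1, 200):
    # Example 4.14.6:  sum_{d|n} mu(d) = I(n)
    assert sum(mu(d) for d in divisors(n)) == (1 if n == 1 else 0)
    # Example 4.14.7:  sum_{d|n} mu(d) tau(n/d) = 1
    assert sum(mu(d) * tau(n // d) for d in divisors(n)) == 1
    # E = sigma * mu:  sum_{d|n} mu(d) sigma(n/d) = n
    assert sum(mu(d) * sigma(n // d) for d in divisors(n)) == n
```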

4.15 Rational Generating Functions

In this section we consider the simplest general class of generating functions, namely, the rational generating functions in one variable, and their connection with homogeneous linear recursions. These are generating functions of the form
\[
U(x) = \sum_{n \geq 0} u_n x^n
\]
for which there are $p(x), q(x) \in C[x]$ with
\[
U(x) = \frac{p(x)}{q(x)}.
\]
Here we assume $q(0) \neq 0$, so $q(x)^{-1}$ exists in $C[[x]]$. Before considering the connection between rational generating functions and homogeneous linear recursions, we recall the notion of reverse of a polynomial.

Let $f(x) = a_n + a_{n-1}x + a_{n-2}x^2 + \cdots + a_0 x^n \in C[x]$. The reverse $\hat{f}(x)$ of $f(x)$ is defined by
\[
\hat{f}(x) = x^n f\left(\frac{1}{x}\right) = a_0 + a_1 x + \cdots + a_n x^n.
\]
If $n_0$ is the multiplicity of 0 as a zero of $f(x)$, i.e., $a_n = a_{n-1} = \cdots = a_{n-n_0+1} = 0$, but $a_{n-n_0} \neq 0$, and if $w_1,\ldots,w_q$ are the nonzero roots of $f(x) = 0$, then $\frac{1}{w_1},\ldots,\frac{1}{w_q}$ are the roots of $\hat{f}(x) = 0$, and $\hat{f}(x) = a_0(1 - w_1 x)\cdots(1 - w_q x)$. So $\deg(\hat{f}(x)) = n - n_0$.
Alternatively, if $f(x) = (x - \alpha_1)^{m_1}\cdots(x - \alpha_s)^{m_s}$, where $m_1 + \cdots + m_s = n$ and $\alpha_1,\ldots,\alpha_s$ are distinct, then
\[
\hat{f}(x) = x^n f\left(\frac{1}{x}\right) = (1 - \alpha_1 x)^{m_1}\cdots(1 - \alpha_s x)^{m_s}.
\]
If $a_0\cdot a_n \neq 0$, so neither $f(x)$ nor $\hat{f}(x)$ has $x = 0$ as a zero, then $\hat{\hat{f}} = f$, and $f(\alpha) = 0$ if and only if $\hat{f}(\alpha^{-1}) = 0$.

Suppose that $U(x) = \sum_{n \geq 0} u_n x^n = \frac{p(x)}{q(x)}$, where $\deg(p(x)) < \deg(q(x))$, is a rational generating function. We assume $q(0) \neq 0$ in order that $q(x)^{-1}$ exist in $C[[x]]$, so we may assume without loss of generality that $q(0) = 1$. Hence $q(x) = 1 + a_1 x + a_2 x^2 + \cdots + a_k x^k$ and $p(x) = p_0 + p_1 x + \cdots + p_d x^d$, $d < k$. From this it follows that
\[
p_0 + \cdots + p_d x^d = (1 + a_1 x + \cdots + a_k x^k)(u_0 + u_1 x + \cdots + u_n x^n + \cdots).
\]
The right hand side of this equality expands to
\[
u_0 + (u_1 + a_1 u_0)x + (u_2 + a_1 u_1 + a_2 u_0)x^2 + \cdots + (u_{k-1} + a_1 u_{k-2} + \cdots + a_{k-1}u_0)x^{k-1} + \cdots.
\]
And for $n \geq k$,
\[
u_n + a_1 u_{n-1} + \cdots + a_k u_{n-k} = 0, \tag{4.47}
\]
which is the coefficient on $x^n$. If $u_0,\ldots,u_{k-1}$ are given, then $u_n$ is determined recursively for $n \geq k$.

Put $f(x) = \hat{q}(x)$. Then for the complex number $\alpha$, it is easily checked that $f(\alpha) = 0$ if and only if $u_n = \alpha^n$ is a solution of the recurrence of Eq. 4.47. The polynomial $f(x)$ is the auxiliary polynomial of the recurrence of Eq. 4.47.

Theorem 4.15.1 If $U(x) = \frac{p(x)}{q(x)}$, where $\deg(p(x)) < \deg(q(x))$, is a rational generating function, then the sequence $\{u_n\}_{n=0}^{\infty}$, where $U(x) = \sum_{n \geq 0} u_n x^n$, satisfies a homogeneous linear recurrence, and the denominator $q(x)$ is the reverse of the auxiliary polynomial of the corresponding recurrence.

Now take the converse point of view. Let $c_0, c_1, \ldots, c_{k-1}$ be given complex constants, and let $a_1,\ldots,a_k$ also be given. Let $U = \{u_n\}$, $n \geq 0$, be the unique sequence determined by the following initial conditions and homogeneous linear recursion:
\[
\text{[HLR]}\qquad u_0 = c_0,\ u_1 = c_1,\ \ldots,\ u_{k-1} = c_{k-1};\qquad u_{n+k} + a_1 u_{n+k-1} + a_2 u_{n+k-2} + \cdots + a_k u_n = 0,\ n \geq 0.
\]

Theorem 4.15.2 The ordinary generating function for the sequence $\{u_n\}$ defined by [HLR] is
\[
U(x) = \sum_{n=0}^{\infty} u_n x^n = R(x)/(1 + a_1 x + \cdots + a_k x^k),
\]
where $R(x)$ is a polynomial with degree less than $k$.

Proof: Consider the product:
\[
(1 + a_1 x + \cdots + a_k x^k)(u_0 + u_1 x + \cdots).
\]
The coefficient on $x^{n+k}$ is
\[
u_{n+k} + a_1 u_{n+k-1} + a_2 u_{n+k-2} + \cdots + a_k u_n.
\]
And this equals 0 for $n \geq 0$ by [HLR], so the only coefficients that are possibly nonzero in the product are those on $1, x, \ldots, x^{k-1}$.

Note that the coefficients of $R(x)$ may be obtained from multiplying out the two factors (just as we did above):
\[
R(x) = u_0 + (u_1 + a_1 u_0)x + (u_2 + a_1 u_1 + a_2 u_0)x^2 + \cdots + (u_{k-1} + a_1 u_{k-2} + \cdots + a_{k-1}u_0)x^{k-1}.
\]
As $u_0,\ldots,u_{k-1}$ are given by the initial conditions, $R(x)$ is determined.

Theorem 4.15.3 Suppose $(u_n)$ is given by [HLR] and the auxiliary polynomial has the form
\[
f(t) = (t - \alpha_1)^{m_1}\cdots(t - \alpha_s)^{m_s}.
\]
Then
\[
u_n = \sum_{i=1}^{s} P_i(n)\alpha_i^n,
\]
where $P_i$ is a polynomial with degree at most $m_i - 1$, $1 \leq i \leq s$.

Proof: By the theory of partial fractions, $U(x)$ can be written as the sum of $s$ expressions of the form
\[
(*)\qquad \gamma_1/(1 - \alpha x) + \gamma_2/(1 - \alpha x)^2 + \cdots + \gamma_m/(1 - \alpha x)^m,
\]
where in each such expression $\alpha = \alpha_i$, $m = m_i$, for some $i$ in the range $1 \leq i \leq s$. Recall:
\[
(1 - x)^{-n} = \sum_{k=0}^{\infty}\binom{n+k-1}{k}x^k.
\]
So the coefficient of $x^k$ in $(*)$ is
\[
\gamma_1\binom{1+k-1}{k}\alpha^k + \gamma_2\binom{2+k-1}{k}\alpha^k + \cdots + \gamma_m\binom{m+k-1}{k}\alpha^k
\]
\[
= \left[\gamma_1\binom{k}{0} + \gamma_2\binom{k+1}{1} + \cdots + \gamma_m\binom{m+k-1}{m-1}\right]\alpha^k = P(k)\alpha^k.
\]
The formula
\[
\binom{k+l}{l} = (k+l)(k+l-1)\cdots(k+1)/l(l-1)\cdots 1
\]
shows that $\binom{k+l}{l}$ is a polynomial in $k$ with degree $l$. Hence $P(k)$ is a polynomial in $k$ with degree at most $m-1$. The theorem follows.

In practice we assume the form of the result for $u_n$ and obtain the coefficients of the polynomials $P_i(n)$ by substituting in the initial values of $u_0, u_1, \ldots, u_{k-1}$ and solving $k$ equations in $k$ unknowns.

Example 4.15.4 The Fibonacci Sequence again. Put $F_0 = F_1 = 1$ and $F_{n+2} - F_{n+1} - F_n = 0$ for $n \geq 0$. So the auxiliary equation is $0 = f(t) = t^2 - t - 1 = (t - \alpha_1)(t - \alpha_2)$, where $\alpha_1 = \frac{1+\sqrt{5}}{2}$, $\alpha_2 = \frac{1-\sqrt{5}}{2}$. Put $F(x) = \sum_{n \geq 0} F_n x^n$, and compute $F(x)(1 - x - x^2) = 1$, so $F(x) = \frac{1}{1 - x - x^2} \stackrel{ops}{\leftrightarrow} \{F_n\}_{n=0}^{\infty}$. Then $F(x) = \frac{A}{1 - \alpha_1 x} + \frac{B}{1 - \alpha_2 x} = \frac{1}{1 - x - x^2}$ leads to
\[
F(x) = \frac{2}{5 - \sqrt{5}}\sum_i(\alpha_1)^i x^i + \frac{2}{5 + \sqrt{5}}\sum_i(\alpha_2)^i x^i.
\]
Hence
\[
F_n = [x^n]\{F(x)\} = \frac{2(1 + \sqrt{5})^n}{(5 - \sqrt{5})\,2^n} + \frac{2(1 - \sqrt{5})^n}{(5 + \sqrt{5})\,2^n}.
\]
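The closed form can be checked in floating point against the recurrence (rounding absorbs the small numerical error for moderate $n$). A quick verification sketch, not part of the text:

```python
from math import sqrt

def fib_closed(n):
    # the closed form of Example 4.15.4 (with F_0 = F_1 = 1)
    r5 = sqrt(5)
    return (2 * (1 + r5)**n / ((5 - r5) * 2**n)
            + 2 * (1 - r5)**n / ((5 + r5) * 2**n))

F = [1, 1]
for n in range(2, 20):
    F.append(F[n - 1] + F[n - 2])

for n in range(20):
    assert round(fib_closed(n)) == F[n]
```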

Exercise: 4.15.5 Let $\{u_n\}_{n=0}^{\infty}$ be the sequence satisfying the recurrence
\[
u_{n+4} = 2u_{n+3} - 2u_{n+1} + u_n,\ n \geq 0,
\]
and satisfying the initial conditions
\[
u_0 = -1,\ u_1 = +1,\ u_2 = 0,\ u_3 = 1.
\]
Find a formula for $u_n$. Also find the generating function for the sequence $\{u_n\}_{n=0}^{\infty}$.

4.16 More Practice with Generating Functions

Theorem 4.16.1 $[y^j]\left\{\frac{1}{1 - x - xy}\right\} = \sum_k\binom{k}{j}x^k = \frac{x^j}{(1-x)^{j+1}}$.

Proof: For $j \geq 0$, put $g_j(x) = \sum_k\binom{k}{j}x^k$. Note that $g_0(x) = \frac{1}{1-x}$. We claim $g_{j+1}(x) = \frac{x}{1-x}g_j(x)$ for $j \geq 0$. For $j \geq 1$,
\[
xg_{j-1}(x) + xg_j(x) = \sum_{k \geq j-1}\binom{k}{j-1}x^{k+1} + \sum_{k \geq j}\binom{k}{j}x^{k+1} = x^j + \sum_{k \geq j}\binom{k+1}{j}x^{k+1} = x^j + \sum_{k \geq j+1}\binom{k}{j}x^k = \sum_{k \geq j}\binom{k}{j}x^k = g_j(x).
\]
Hence for $j \geq 1$, $g_j(x) = \frac{x}{1-x}g_{j-1}(x)$. Now put $H(x,y) = \sum_{j=0}^{\infty} g_j(x)y^j$. Then $\sum_{j \geq 1} g_j(x)y^j = \frac{xy}{1-x}\sum_{j \geq 1} g_{j-1}(x)y^{j-1}$, implying
\[
H(x,y) - g_0(x) = \frac{xy}{1-x}\sum_{j \geq 0} g_j(x)y^j = \frac{xy}{1-x}H(x,y).
\]
Hence $H(x,y)\left(1 - \frac{xy}{1-x}\right) = g_0(x) = \frac{1}{1-x}$, and thus
\[
H(x,y) = \frac{1}{1 - x - xy}.
\]
This forces
\[
g_j(x) = \sum_k\binom{k}{j}x^k = [y^j]\{H(x,y)\} = [y^j]\left\{\frac{\frac{1}{1-x}}{1 - \left(\frac{x}{1-x}\right)y}\right\} = [y^j]\left\{\frac{1}{1-x}\sum_i\left(\frac{x}{1-x}\right)^i y^i\right\} = \frac{x^j}{(1-x)^{j+1}}.
\]

Theorem 4.16.2 $\sum_k\binom{k}{n-k}x^{n-k} = \sum_k\binom{n-k}{k}x^k = [y^n]\left\{\frac{1}{1 - y - xy^2}\right\}$.

Proof: For $n \geq 0$, put $f_n(x) = \sum_k\binom{n-k}{k}x^k$ ($0 \leq k \leq \frac{n}{2}$). We claim that $f_{n+2}(x) = xf_n(x) + f_{n+1}(x)$. For,
\[
x\sum_{0 \leq k \leq \frac{n}{2}}\binom{n-k}{k}x^k + \sum_{0 \leq k \leq \frac{n+1}{2}}\binom{n+1-k}{k}x^k = \sum_{0 \leq k \leq \frac{n}{2}}\binom{n-k}{k}x^{k+1} + \binom{n+1}{0} + \sum_{1 \leq k \leq \frac{n+1}{2}}\binom{n+1-k}{k}x^k
\]
\[
= \sum_{1 \leq t \leq \frac{n+2}{2}}\binom{n-t+1}{t-1}x^t + \binom{n+2}{0} + \sum_{1 \leq t \leq \frac{n+1}{2}}\binom{n+1-t}{t}x^t
\]
\[
= \binom{n+2}{0} + \sum_{1 \leq t \leq \frac{n+2}{2}}\binom{n+2-t}{t}x^t = \sum_{0 \leq t \leq \frac{n+2}{2}}\binom{n+2-t}{t}x^t = f_{n+2}(x).
\]
Note that $f_0(x) = 1$; $f_1(x) = 1$; $f_2(x) = 1 + x$.

Put $G(x,y) = \sum_{n=0}^{\infty} f_n(x)y^n$. Multiply the recursion just established for $f_n(x)$ by $y^{n+2}$, $n \geq 0$, and sum over $n$:
\[
\sum_{n=0}^{\infty} xf_n(x)y^{n+2} + \sum_{n=0}^{\infty} f_{n+1}(x)y^{n+2} = \sum_{n=0}^{\infty} f_{n+2}(x)y^{n+2}
\]
\[
\Rightarrow xy^2 G(x,y) + y\left(G(x,y) - f_0(x)\right) = G(x,y) - f_0(x) - f_1(x)y
\]
\[
\Rightarrow G(x,y)[xy^2 + y - 1] = y(1-1) - 1 = -1
\]
\[
\Rightarrow G(x,y) = \frac{1}{1 - y - xy^2}.
\]
Note that $f_n(1) = \sum_k\binom{n-k}{k} = [y^n]\left\{\frac{1}{1 - y - y^2}\right\} = F_n$, the $n$th Fibonacci number.
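The closing identity $\sum_k\binom{n-k}{k} = F_n$ (with this book's convention $F_0 = F_1 = 1$) is easy to check. A verification sketch, not part of the text:

```python
from math import comb

def f_n_at_1(n):
    # f_n(1) = sum over 0 <= k <= n/2 of C(n-k, k)
    return sum(comb(n - k, k) for k in range(n // 2 + 1))

F = [1, 1]
for n in range(2, 25):
    F.append(F[n - 1] + F[n - 2])

assert [f_n_at_1(n) for n in range(25)] == F
```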

Exercise: 4.16.3 (Ex. 10F, p.77 of van Lint & Wilson) Show that
\[
\sum_{k=0}^{n}(-1)^k\binom{2n-k}{k}2^{2n-2k} = 2n + 1.
\]

Exercise: 4.16.4 Evaluate the sum $\sum_k\binom{n-k}{k}(-1)^k$.

Exercise: 4.16.5 Evaluate $\sum_k\binom{n-k}{k}$.

Theorem 4.16.6 Sylvia's Problem. Establish the identity
\[
\sum_{j=2n}^{k}(-2)^j\binom{k}{j}\binom{j-n-1}{n-1} = \begin{cases} 4^n\binom{\lfloor k/2\rfloor}{n}, & n \geq 1; \\ 0, & n = 0. \end{cases} \tag{4.48}
\]

Proof: It is clear that the L.H.S. in the desired equality equals 0 when $n = 0$. So assume $n \geq 1$ and note that
\[
\sum_{j=2n}^{k}(-2)^j\binom{k}{j}\binom{j-n-1}{n-1} = 4^n\sum_{j=2n}^{k}\binom{k}{j}\binom{j-n-1}{n-1}(-2)^{j-2n}.
\]
So, since $\binom{j-n-1}{n-1} = \binom{j-n-1}{j-2n}$, we may restate the desired result as:
\[
\sum_{j=2n}^{k}\binom{k}{j}\binom{j-n-1}{j-2n}(-2)^{j-2n} = \begin{cases} \binom{\lfloor k/2\rfloor}{n}, & n \geq 1; \\ 0, & n = 0. \end{cases} \tag{4.49}
\]
Put
\[
T^*(x,y) = \sum_{k,n}\binom{\lfloor k/2\rfloor}{n}x^k y^n = \sum_k\left(\sum_n\binom{\lfloor k/2\rfloor}{n}y^n\right)x^k = \sum_k(1+y)^{\lfloor k/2\rfloor}x^k
\]
\[
= (1+x) + (1+y)(x^2+x^3) + (1+y)^2(x^4+x^5) + \cdots = (1+x)\sum_{i=0}^{\infty}(1+y)^i x^{2i} = \frac{1+x}{1 - (1+y)x^2}.
\]
Note that $[y^0]T^*(x,y) = \frac{1}{1-x}$. Now put
\[
T(x,y) = T^*(x,y) - \frac{1}{1-x} = \frac{1+x}{1-(1+y)x^2} - \frac{1}{1-x} = \frac{1 - x^2 - 1 + x^2 + yx^2}{(1-x)(1-(1+y)x^2)} = \frac{x^2 y}{(1-x)(1-(1+y)x^2)}.
\]
So
\[
[x^k y^n]\left\{\frac{x^2 y}{(1-x)(1-(1+y)x^2)}\right\} = \begin{cases}\binom{\lfloor k/2\rfloor}{n}, & n \geq 1; \\ 0, & n = 0.\end{cases}
\]
Hence $T(x,y) = \frac{x^2 y}{(1-x)(1-(1+y)x^2)}$ is the generating function for the doubly-infinite sequence of terms on the R.H.S. of Eq. 4.49.

Put
\[
S(x,y) = \sum_{k,n,j}\binom{k}{j}\binom{j-n-1}{j-2n}(-2)^{j-2n}x^k y^n.
\]
Then $[x^k y^n]\{S(x,y)\}$ is the desired sum (on the L.H.S. of Eq. 4.49). Hence our task is equivalent to showing that $S(x,y) = T(x,y)$.

Make the invertible substitution (change of variables)
\[
\begin{pmatrix} t \\ s \end{pmatrix} = \begin{pmatrix} 1 & -2 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} j \\ n \end{pmatrix}, \quad\text{i.e.,}\quad t = j - 2n,\ s = j - n, \quad\text{with inverse}\quad j = 2s - t,\ n = s - t.
\]
Hence we have
\[
S(x,y) = \sum_{k,s,t}\binom{k}{2s-t}\binom{s-1}{t}(-2)^t x^k y^{s-t} = \sum_s\sum_t\left(\sum_k\binom{k}{2s-t}x^k\right)y^{s-t}(-2)^t
\]
(now use Theorem 4.16.1)
\[
= \sum_s\sum_t\frac{x^{2s-t}}{(1-x)^{2s-t+1}}\binom{s-1}{t}y^{s-t}(-2)^t = \sum_s\sum_t\binom{s-1}{t}\left(\frac{(1-x)(-2)}{xy}\right)^t\frac{x^{2s}y^s}{(1-x)^{2s+1}}
\]
\[
= \sum_{s \geq 1}\left(1 - \frac{2(1-x)}{xy}\right)^{s-1}\left(\frac{x^2 y}{(1-x)^2}\right)^s\frac{1}{1-x} = \sum_{j \geq 0}\left(\frac{xy - 2(1-x)}{xy}\right)^j\left(\frac{x^2 y}{(1-x)^2}\right)^{j+1}\frac{1}{1-x}
\]
\[
= \frac{x^2 y}{(1-x)^3}\sum_{j \geq 0}\left(\frac{(xy - 2(1-x))x}{(1-x)^2}\right)^j = \frac{x^2 y}{(1-x)^3}\cdot\frac{1}{1 - \frac{(xy - 2(1-x))x}{(1-x)^2}}
\]
\[
= \frac{x^2 y}{(1-x)(1 - 2x + x^2 - x^2 y + 2x - 2x^2)} = \frac{x^2 y}{(1-x)(1 - x^2 - x^2 y)} = T(x,y).
\]
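Identity 4.48 is also easy to test numerically for $n \geq 1$ (the $n = 0$ case needs a convention for $\binom{j-1}{-1}$, so the check below skips it). A verification sketch, not part of the text:

```python
from math import comb

def lhs(k, n):
    # left-hand side of Eq. 4.48 (assumes n >= 1 so both binomial
    # arguments are nonnegative)
    return sum((-2)**j * comb(k, j) * comb(j - n - 1, n - 1)
               for j in range(2 * n, k + 1))

def rhs(k, n):
    return 4**n * comb(k // 2, n)

for k in range(2, 14):
    for n in range(1, k // 2 + 2):
        assert lhs(k, n) == rhs(k, n)
```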

4.17 The Transfer Matrix Method

The Transfer Matrix Method, when applicable, is often used to show that a given sequence has a rational generating function. Sometimes that knowledge helps one to compute the generating function using other information.

Let $A$ be a $p \times p$ matrix over the complex numbers $C$. Let $f(\lambda) = \det(\lambda I - A) = a_{p-n_0}\lambda^{n_0} + \cdots + a_1\lambda^{p-1} + \lambda^p$ be the characteristic polynomial of $A$ with $a_{p-n_0} \neq 0$. So the reverse polynomial $\hat{f}$ (cf. Section 4.15) is given by $\hat{f}(\lambda) = 1 + a_1\lambda + \cdots + a_{p-n_0}\lambda^{p-n_0}$. Hence $\det(I - \lambda A) = \lambda^p\det\left(\frac{1}{\lambda}I - A\right) = \hat{f}(\lambda)$. We have essentially proved the following:

Lemma 4.17.1 If $f(\lambda) = \det(\lambda I - A)$, then $\hat{f}(\lambda) = \det(I - \lambda A)$. Moreover, if $A$ is invertible, so $n_0 = 0$, then $\hat{\hat{f}} = f$, and $f(\lambda) = \det(\lambda I - A)$ iff $\hat{f}(\lambda) = \det(I - \lambda A)$.

For $1 \leq i, j \leq p$, define the generating function
\[
F_{ij}(A,\lambda) = \sum_{n \geq 0}(A^n)_{ij}\lambda^n. \tag{4.50}
\]
Here $A^0 = I$ even if $A$ is not invertible.

Theorem 4.17.2 $F_{ij}(A,\lambda) = \frac{(-1)^{i+j}\det[(I - \lambda A) : j, i]}{\det(I - \lambda A)}$.

Proof: Here $(B : i, j)$ denotes the matrix obtained from $B$ by deleting the $i$th row and the $j$th column. Recall that $(B^{-1})_{ij} = \frac{(-1)^{i+j}\det(B : j, i)}{\det(B)}$. Suppose that $B = I - \lambda A$, so $B^{-1} = (I - \lambda A)^{-1} = \sum_{n=0}^{\infty}A^n\lambda^n$, and $\frac{(-1)^{i+j}\det(B : j, i)}{\det(B)} = (B^{-1})_{ij} = \sum_{n=0}^{\infty}(A^n)_{ij}\lambda^n = F_{ij}(A,\lambda)$, proving the theorem.

Corollary 4.17.3 $F_{ij}$ is a rational function of $\lambda$ whose degree is strictly less than the multiplicity $n_0$ of 0 as an eigenvalue of $A$.

Proof: Let $f(\lambda) = \det(\lambda I - A)$ as in the paragraph preceding the statement of Lemma 4.17.1, so $\hat{f}(\lambda) = \det(I - \lambda A)$ has degree $p - n_0$, and $\deg(\det((I - \lambda A) : j, i)) \leq p - 1$. Hence $\deg(F_{ij}(A,\lambda)) \leq (p-1) - (p - n_0) = n_0 - 1 < n_0$.
Now write q(λ) = det(I −λA) = fˆ(λ). If w1 , . . . , wq are the  nonzero
 eigen-
 
1 1 1 1
values of A, then w1 , . . . , wq are the zeros of q(λ), so q(λ) = a λ − w1 · · · λ − wq
for some nonzero a. From the definition of q(λ) we see that q(0) = det(I) = 1,
so
!
1 1
 
q
q(λ) = (−1) w1 · · · wq λ− ··· λ − . (4.51)
w1 wq
Then after computing the derivative q 0 (λ) we see easily that
 
−λq 0 (λ)  1 1 
= −λ + · · · + (4.52)
q(λ) λ − 1 λ − w1q 
w1

w1 λ w2 λ wq λ
= + +
1 − w1 λ 1 − w2 λ 1 − wq λ
4.17. THE TRANSFER MATRIX METHOD 169

q X
∞ ∞ q ! ∞
win λn win λn = tr(An )λn .
X X X X
= =
i=1 n=1 n=1 i=1 n=1
We have proved the following corollary:
P∞ −λq 0 (λ)
Corollary 4.17.4 If q(λ) = det(I − λA), then n=1 tr(An )λn = q(λ)
.
Let D = (V, E, φ) be a finite digraph, where V = {v1, . . . , vp} is the set of vertices, E is a set of (directed) edges or arcs, and φ : E → V × V determines the edges. If φ(e) = (u, v), then e is an edge from u to v, with initial vertex int(e) = u and final vertex fin(e) = v. If u = v, then e is a loop. A walk Γ in D of length n from u to v is a sequence e1 e2 · · · en of n edges such that int(e1) = u, fin(en) = v, and fin(ei) = int(ei+1) for 1 ≤ i < n. If also u = v, then Γ is called a closed walk based at u. (Note: If Γ is a closed walk, then ei ei+1 · · · en e1 · · · ei−1 is in general a different closed walk.)
Now let w : E → R be a weight function on E (R is some commutative
ring; usually R = C or R = C[x].) If Γ = e1 e2 · · · en is a walk, then the weight
of Γ is defined by w(Γ) = w(e1 )w(e2 ) · · · w(en ). Fix i and j, 1 ≤ i, j ≤ p.
Put $A_{ij}(n) = \sum_{\Gamma} w(\Gamma)$, where the sum is over all walks Γ in D of length n from vi to vj. In particular, Aij(0) = δij. The fundamental problem treated by the transfer matrix method (TMM) is the evaluation of Aij(n), or at least the determination of some generating function for the Aij(n).
Define a p × p matrix A = (Aij) by
$$A_{ij} = \sum_{e} w(e),$$
where the sum is over all edges e with int(e) = vi and fin(e) = vj. So Aij = Aij(1). A is the adjacency matrix of D with respect to the weight function w.
Theorem 4.17.5 Let n ∈ N. Then the (i, j)-entry of A^n is equal to Aij(n). (By convention, A^0 = I even if A is not invertible.)

Proof: $(A^n)_{ij} = \sum A_{i i_1} A_{i_1 i_2} \cdots A_{i_{n-1} j}$, where the sum is over all sequences (i1, . . . , in−1) ∈ [p]^{n−1}. (Here i = i0 and j = in.) The summand is zero unless there is a walk e1 · · · en from vi to vj with int(ek) = v_{i_{k−1}} (1 ≤ k ≤ n) and fin(ek) = v_{i_k} (1 ≤ k ≤ n). If such a walk exists, then the summand is equal to the sum of the weights of all such walks.
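Theorem 4.17.5 is easy to check numerically. The following sketch (in Python; the small weight matrix is an arbitrary choice for illustration, not from the text) compares the (i, j) entry of A^n with the weight sum over all length-n walks, enumerated directly.

```python
from itertools import product

def walk_weight_sum(A, i, j, n):
    """Sum of w(Gamma) over all walks Gamma of length n from vertex i
    to vertex j, enumerated as sequences of intermediate vertices."""
    p = len(A)
    if n == 0:
        return 1 if i == j else 0          # A^0 = I by convention
    total = 0
    for mids in product(range(p), repeat=n - 1):
        verts = (i,) + mids + (j,)
        w = 1
        for a, b in zip(verts, verts[1:]):
            w *= A[a][b]                   # weight of the arc (a, b)
        total += w
    return total

def mat_pow_entry(A, i, j, n):
    """(A^n)_{ij} by repeated matrix multiplication."""
    p = len(A)
    M = [[int(r == c) for c in range(p)] for r in range(p)]  # identity
    for _ in range(n):
        M = [[sum(M[r][k] * A[k][c] for k in range(p)) for c in range(p)]
             for r in range(p)]
    return M[i][j]
```

Any commutative ring of weights would do; integer weights suffice to exercise the bookkeeping.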
We give a special case that occasionally works out in a very satisfying
P
way. Let CD (n) = Γ w(Γ), where the sum is over all closed walks Γ in D of
length n. In this case we have the following.

Corollary 4.17.6 $\sum_{n \ge 1} C_D(n)\lambda^n = \frac{-\lambda q'(\lambda)}{q(\lambda)}$, where q(λ) = det(I − λA).

Proof: Clearly CD(1) = tr(A), and by Theorem 4.17.5 we have CD(n) = tr(A^n). Hence by Cor. 4.17.4 we have $\sum_{n \ge 1} C_D(n)\lambda^n = \frac{-\lambda q'(\lambda)}{q(\lambda)}$.
Often an enumeration problem can be represented as counting the number
of sequences a1 a2 · · · an ∈ [p]n of integers 1, . . . , p subject to certain restric-
tions on the subsequences ai ai+1 that may appear. In this case we form a
digraph D with vertices vi = i, 1 ≤ i ≤ p, and put an arc e = (i, j) from
i to j provided the subsequence ij is permitted. So a permitted sequence
ai1 ai2 · · · ain corresponds to a walk Γ = (i1 , i2 )(i2 , i3 ) · · · (in−1 , in ) in D of
length n − 1 from i1 to in . If w(e) = 1 for all edges in D and if A is the
adjacency matrix of D with respect to this particular weight function, then
clearly $f(n) := \sum_{i,j=1}^{p} A_{ij}(n-1)$ is the number of sequences a1 a2 · · · an ∈ [p]^n subject to the restrictions used in defining D. Put q(λ) = det(I − λA) and qij(λ) = det((I − λA) : j, i). Then by Theorem 4.17.2
$$F(\lambda) := \sum_{n \ge 0} f(n+1)\lambda^n = \sum_{n \ge 0}\left(\sum_{i,j=1}^{p} A_{ij}(n)\right)\lambda^n \tag{4.53}$$
$$= \sum_{i,j=1}^{p}\sum_{n \ge 0} A_{ij}(n)\lambda^n = \sum_{i,j=1}^{p} F_{ij}(A,\lambda) = \sum_{i,j=1}^{p} \frac{(-1)^{i+j} q_{ij}(\lambda)}{q(\lambda)}.$$
We state this as a corollary.

Corollary 4.17.7 If w(e) = 1 for all edges in D and f(n) is the number of sequences a1 a2 · · · an ∈ [p]^n subject to the restrictions used in defining D, then
$$\sum_{n \ge 0} f(n+1)\lambda^n = \sum_{i,j=1}^{p} \frac{(-1)^{i+j} q_{ij}(\lambda)}{q(\lambda)}. \tag{4.54}$$

We give an easy example that can be checked by other more elementary means.
Example 1. Let f(n) be the number of sequences a1 a2 · · · an ∈ [3]^n with the property that a1 = an and ai ≠ ai+1 for 1 ≤ i ≤ n − 1. Then the adjacency matrix A for this example is
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$

We apply Cor. 4.17.6. It is easy to check that q(λ) = det(I − λA) = (1 + λ)^2(1 − 2λ), and q′(λ) = −6λ(1 + λ). Using partial fractions, etc., we find that
$$\frac{-\lambda q'(\lambda)}{q(\lambda)} = -3 + \frac{2}{1+\lambda} + \frac{1}{1-2\lambda}$$
$$= -3 + \sum_{n=0}^{\infty} 2(-\lambda)^n + \sum_{n=0}^{\infty} 2^n\lambda^n$$
$$= -3 + \sum_{n=0}^{\infty} \left(2^n + (-1)^n \cdot 2\right)\lambda^n.$$
Here n = 3 gives 8 − 2 = 6. The six sequences are of the form aba with a and b arbitrary but distinct elements of {1, 2, 3}.
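The count C_D(n) = 2^n + 2(−1)^n can be confirmed by brute force; a short sketch (the direct enumeration below is an illustration, not part of the text):

```python
from itertools import product

def closed_walks(n, p=3):
    """Closed walks of length n in the complete loopless digraph on p
    vertices: vertex sequences v_0 ... v_n with v_0 = v_n and
    v_i != v_{i+1} at every step."""
    count = 0
    for seq in product(range(p), repeat=n):   # choose v_0 ... v_{n-1}
        verts = seq + (seq[0],)               # close the walk
        if all(a != b for a, b in zip(verts, verts[1:])):
            count += 1
    return count

# e.g. closed_walks(3) == 6, matching 2^3 + 2*(-1)^3 = 6
```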
Example 2. Let D be the complete (weighted) digraph on two vertices, i.e., p = 2, V = {v0, v1}, and the weight w(eij) of the edge eij from vi to vj is the indeterminate xij, 0 ≤ i, j ≤ 1. A sequence a0 a1 a2 · · · an of n + 1 0's and 1's corresponds to a walk of length n along edges a0a1, a1a2, . . . , an−1an, and has weight $x_{a_0 a_1} x_{a_1 a_2} \cdots x_{a_{n-1} a_n}$. The adjacency matrix is
$$A = \begin{pmatrix} x_{00} & x_{01} \\ x_{10} & x_{11} \end{pmatrix}.$$
Then $(A^n)_{ij} = \sum_{\Gamma} w(\Gamma)$, where the summation is over all walks Γ of length n from i to j, 0 ≤ i, j ≤ 1. At this level of generality we are in a position to consider several different problems.
Problem 2.1 Let f(n) be the number of sequences of n 0's and 1's with 11 never appearing as a subsequence ai ai+1, i.e., x11 = 0. Then as in Cor. 4.17.7 we put x00 = x01 = x10 = 1 and we have
$$\sum_{n \ge 0} f(n+1)\lambda^n = \sum_{i,j=1}^{2} \frac{(-1)^{i+j} q_{ij}(\lambda)}{q(\lambda)},$$
where $q(\lambda) = \det\left(I - \lambda\begin{pmatrix}1 & 1\\ 1 & 0\end{pmatrix}\right) = 1 - \lambda - \lambda^2$. A quick computation shows that q11 = 1, q12 = −λ, q21 = −λ, q22 = 1 − λ. Hence $\sum_{n \ge 0} f(n+1)\lambda^n = \frac{2+\lambda}{1-\lambda-\lambda^2}$. We recognize that this denominator gives a Fibonacci type sequence. If we solve
$$\frac{2+\lambda}{1-\lambda-\lambda^2} = \frac{b}{1-\alpha\lambda} + \frac{c}{1-\beta\lambda} \quad \text{with} \quad \alpha = \frac{1-\sqrt{5}}{2} \quad \text{and} \quad \beta = \frac{1+\sqrt{5}}{2}$$
for b and c, we eventually find that $\sum_{n \ge 0} f(n+1)\lambda^n = \frac{2+\lambda}{1-\lambda-\lambda^2}$ if and only if
$$f(n+1) = \left(\frac{\sqrt{5}-2}{\sqrt{5}}\right)\left(\frac{1-\sqrt{5}}{2}\right)^n + \left(\frac{\sqrt{5}+2}{\sqrt{5}}\right)\left(\frac{1+\sqrt{5}}{2}\right)^n.$$
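The closed form for f(n + 1) can be checked against a direct count of binary strings with no adjacent 1's. A numerical sketch (floating-point values are rounded before comparing; the helper names are ours):

```python
from itertools import product
from math import sqrt

def count_no_11(n):
    """Number of 0/1 strings of length n with no two adjacent 1's."""
    return sum(1 for s in product((0, 1), repeat=n)
               if all(not (a == 1 and b == 1) for a, b in zip(s, s[1:])))

def f_closed(n):
    """f(n+1) from the partial-fraction expansion above."""
    r5 = sqrt(5.0)
    alpha = (1 - r5) / 2
    beta = (1 + r5) / 2
    return ((r5 - 2) / r5) * alpha**n + ((r5 + 2) / r5) * beta**n
```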

Problem 2.2. Find the number of sequences of n + 1 0’s and 1’s with 11
never appearing as a subsequence ai ai+1 , i.e., x11 = 0 as above, but this time
consider only those sequences starting with a fixed i ∈ {0, 1} and ending
with a fixed j ∈ {0, 1}.
For this situation we need to find the ij entry of the nth power of $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$. Here we diagonalize the matrix A to find its powers: with eigenvalues $\beta = \frac{1+\sqrt5}{2}$ and $\alpha = \frac{1-\sqrt5}{2}$, writing $A = P\,\mathrm{diag}(\beta, \alpha)\,P^{-1}$ and taking nth powers gives
$$A^n = \frac{1}{4\sqrt5}\begin{pmatrix} \dfrac{(1+\sqrt5)^{n+1} - (1-\sqrt5)^{n+1}}{2^{n-1}} & \dfrac{(1+\sqrt5)^{n} - (1-\sqrt5)^{n}}{2^{n-2}} \\[8pt] \dfrac{(\sqrt5-1)(1+\sqrt5)^{n+1} + (\sqrt5+1)(1-\sqrt5)^{n+1}}{2^{n}} & \dfrac{(\sqrt5-1)(1+\sqrt5)^{n} + (\sqrt5+1)(1-\sqrt5)^{n}}{2^{n-1}} \end{pmatrix}.$$
For example, the 12 entry of this matrix is the number of sequences of n + 1 0's and 1's with 11 never appearing as a subsequence ai ai+1 and starting with 0 and ending with 1. A little routine computation gives
$$\mathrm{tr}(A^n) = \frac{(1+\sqrt5)^n}{2^n} + \frac{(1-\sqrt5)^n}{2^n}.$$
We could also have used Cor. 4.17.6 and calculated
$$\frac{-\lambda q'(\lambda)}{q(\lambda)} = -2 + \frac{2-\lambda}{1-\lambda-\lambda^2} = -2 + \frac{1}{1-\alpha\lambda} + \frac{1}{1-\beta\lambda}.$$
This agrees with the above for n ≥ 1, but in the proof of Cor. 4.17.6 the term CD(0) is not accounted for.
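The claim that each entry of A^n counts strings with prescribed first and last symbols can be verified by brute force. A sketch (0-based indices here correspond directly to the symbols 0 and 1):

```python
from itertools import product

A = [[1, 1], [1, 0]]   # x00 = x01 = x10 = 1 and x11 = 0

def mat_pow(M, n):
    """M^n for a 2x2 integer matrix."""
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = [[sum(R[i][k] * M[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return R

def count_strings(n, i, j):
    """0/1 strings a_0 ... a_n with a_0 = i, a_n = j, no adjacent 11."""
    return sum(1 for s in product((0, 1), repeat=n + 1)
               if s[0] == i and s[-1] == j
               and all(not (a == 1 and b == 1) for a, b in zip(s, s[1:])))
```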
Problem 2.3 Suppose we still require that two 1's never appear together, but now we want to count sequences with prescribed numbers of 0's and 1's. Return to the situation where $A = \begin{pmatrix} x_{00} & x_{01} \\ x_{10} & x_{11} \end{pmatrix}$. Then
$$\sum_{n=0}^{\infty} A^n = (I-A)^{-1} = \begin{pmatrix} 1-x_{00} & -x_{01} \\ -x_{10} & 1-x_{11} \end{pmatrix}^{-1} = \frac{1}{(1-x_{00})(1-x_{11}) - x_{01}x_{10}}\begin{pmatrix} 1-x_{11} & x_{01} \\ x_{10} & 1-x_{00} \end{pmatrix}$$
$$= \begin{pmatrix} 1-x_{11} & x_{01} \\ x_{10} & 1-x_{00} \end{pmatrix} \cdot \frac{1}{(1-x_{00})(1-x_{11})} \cdot \frac{1}{1 - \frac{x_{01}x_{10}}{(1-x_{00})(1-x_{11})}}$$
$$= \begin{pmatrix} 1-x_{11} & x_{01} \\ x_{10} & 1-x_{00} \end{pmatrix} \sum_{i=0}^{\infty} \frac{x_{01}^i x_{10}^i}{(1-x_{00})^{i+1}(1-x_{11})^{i+1}}$$
$$= \begin{pmatrix} 1-x_{11} & x_{01} \\ x_{10} & 1-x_{00} \end{pmatrix} \sum_{i=0}^{\infty} x_{01}^i x_{10}^i \sum_{j=0}^{\infty}\binom{i+j}{j}x_{00}^j \sum_{k=0}^{\infty}\binom{i+k}{k}x_{11}^k.$$
If we suppose that the pair 11 never appears, so x11 = 0, then $x_{11}^k = \delta_{k,0}$. And
$$\sum_{n=0}^{\infty} A^n = \begin{pmatrix} 1 & x_{01} \\ x_{10} & 1-x_{00} \end{pmatrix} \sum_{i,j=0}^{\infty} \binom{i+j}{j} x_{01}^i x_{10}^i x_{00}^j.$$

We now consider what this equation implies for the (i, j) position, 1 ≤ i, j ≤ 2.

Case 1. (1,1) position: $\sum_{n=0}^{\infty} (A^n)_{11} = \sum_{i,j=0}^{\infty} \binom{i+j}{j} x_{01}^i x_{10}^i x_{00}^j$. So there must be $\binom{i+j}{j}$ ways of forming walks of length 2i + j using the edges x01 and x10 each i times and the edge x00 j times. This corresponds to a sequence of length 2i + j + 1 with exactly i 1's (and i + j + 1 0's), starting and ending with a 0, and never having two 1's next to each other. Another way to view this is as needing to fill i + 1 boxes with 0's (the boxes before and after each 1) so that each box has at least one 0. This is easily seen to be the same as an (i + 1)-composition of i + j + 1, of which there are $\binom{i+j}{i} = \binom{i+j}{j}$. (See pages 15–16 of our class notes.)
 
Case 2. (1,2) position: $\sum_{n=0}^{\infty}(A^n)_{12} = \sum_{i,j=0}^{\infty}\binom{i+j}{j} x_{01}^{i+1} x_{10}^i x_{00}^j$. Here there must be $\binom{i+j}{j}$ walks of length 2i + j + 1 using the edge x01 i + 1 times, the edge x10 i times, and the edge x00 j times. This corresponds to a sequence of length 2i + j + 2 with exactly i + 1 1's (and i + j + 1 0's), starting with a 0 and ending with a 1, and never having two 1's next to each other. It is clear that this kind of sequence is just one from Case 1 with a 1 appended at the end.
Case 3. (2,1) position: This is the same as Case 2., with the roles of x01
and x10 interchanged, and the 1 appended at the beginning of the sequence.
 
Case 4. (2,2) position:
$$\sum_{n=0}^{\infty}(A^n)_{22} = \sum_{i,j=0}^{\infty} \binom{i+j}{j} x_{01}^i x_{10}^i x_{00}^j - \sum_{i,j=0}^{\infty} \binom{i+j}{j} x_{01}^i x_{10}^i x_{00}^{j+1}$$
$$= \sum_{i=0}^{\infty} x_{01}^i x_{10}^i + \sum_{i \ge 0,\ j \ge 1}\left[\binom{i+j}{j} - \binom{i+j-1}{j-1}\right] x_{01}^i x_{10}^i x_{00}^j$$
$$= 1 + \sum_{(i,j) \ne (0,0)} \binom{i+j-1}{j} x_{01}^i x_{10}^i x_{00}^j,$$
after some computation. A term $x_{01}^i x_{10}^i x_{00}^j$ corresponds to a sequence of length 2i + j + 1 = n that starts and ends with a 1, and n must be at least 3 before anything interesting shows up. Here i + j − 1 = n − 3 − (i − 1) and j = n − 2i − 1, so (n − 3 − (i − 1)) − (n − 2i − 1) = i − 1. Hence, the number of sequences of 0's and 1's of length n ≥ 3 that start and end with a 1 and with 11 never appearing is
$$\sum_{1 \le i \le \frac{n-1}{2}} \binom{n-3-(i-1)}{i-1} = \sum_{0 \le k \le \frac{n-3}{2}} \binom{n-3-k}{k}.$$
We recognize this as $F_{n-3}$, the (n − 3)rd Fibonacci number.
(See Section 4.11.)
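Case 4's conclusion, that there are Σ_k C(n−3−k, k) strings of length n that start and end with a 1 and avoid 11, is quickly verified by enumeration (an illustrative sketch):

```python
from itertools import product
from math import comb

def count_case4(n):
    """0/1 strings of length n, first and last symbol 1, no adjacent 11."""
    return sum(1 for s in product((0, 1), repeat=n)
               if s[0] == 1 and s[-1] == 1
               and all(not (a == 1 and b == 1) for a, b in zip(s, s[1:])))

def binom_sum(n):
    """Sum of C(n-3-k, k) over 0 <= k <= (n-3)/2."""
    return sum(comb(n - 3 - k, k) for k in range((n - 3) // 2 + 1))
```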

Example 3. Let f (n) be the number of sequences a1 . . . an ∈ [3]n such


that neither 11 nor 23 appear as two consecutive terms ai ai+1 . Determine
f (n) or at least a generating function for f (n).
Solution: Let D be the digraph on V = [3] with an edge (i, j) if and only if j is allowed to follow i in the sequence. Also let w(e) = 1 for each edge e of D. The corresponding adjacency matrix is
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
So $f(n) = \sum_{i,j=1}^{3} A_{ij}(n-1)$. Put q(λ) = det(I − λA), and qij(λ) = det((I − λA) : j, i). By Theorem 4.17.2,
$$F(\lambda) := \sum_{n \ge 0} f(n+1)\lambda^n = \sum_{n \ge 0}\left(\sum_{i,j=1}^{3} A_{ij}(n)\right)\lambda^n = \sum_{i,j=1}^{3}\frac{(-1)^{i+j} q_{ij}(\lambda)}{q(\lambda)}$$
. It is easy to work out q(λ) = 1 − 2λ − λ^2 + λ^3. Then det[(I − λA)^{−1}] = [det(I − λA)]^{−1}. By Cor. 4.17.3, Fij(A, λ), and hence F(λ), is a rational function of λ of degree less than the multiplicity n0 of 0 as an eigenvalue of A. But q(λ) = fˆ(λ) has degree 3 = p − n0 with p = 3, forcing n0 = 0 (that is, A is invertible). Since the denominator of F(λ) is q(λ), which has degree 3, the numerator of F(λ) has degree at most 2, so is determined by its values at three points. Note: we need
$$A^2 = \begin{pmatrix} 2 & 2 & 1 \\ 1 & 2 & 1 \\ 2 & 3 & 2 \end{pmatrix}.$$
Then
$$f(1) = \sum_{i,j=1}^{3} A_{ij}(0) = \mathrm{tr}(I) = 3, \qquad f(2) = \sum_{i,j=1}^{3} A_{ij}(1) = 7, \qquad f(3) = \sum_{i,j=1}^{3} A_{ij}(2) = 16.$$
Then for some a0, a1, a2 ∈ Q,
$$F(\lambda) = \frac{a_0 + a_1\lambda + a_2\lambda^2}{1 - 2\lambda - \lambda^2 + \lambda^3} = \sum_{n \ge 0} f(n+1)\lambda^n = f(1) + f(2)\lambda + f(3)\lambda^2 + \cdots,$$
which implies that
$$a_0 + a_1\lambda + a_2\lambda^2 = (1 - 2\lambda - \lambda^2 + \lambda^3)(3 + 7\lambda + 16\lambda^2 + \cdots) = 3 + \lambda - \lambda^2.$$
Hence
$$F(\lambda) = \sum_{n \ge 0} f(n+1)\lambda^n = \frac{3 + \lambda - \lambda^2}{1 - 2\lambda - \lambda^2 + \lambda^3},$$
from which it follows that
$$\sum_{n \ge 0} f(n+1)\lambda^{n+1} = \frac{3\lambda + \lambda^2 - \lambda^3}{1 - 2\lambda - \lambda^2 + \lambda^3}.$$
Now add f(0) = 1 to both sides to get
$$\sum_{n=0}^{\infty} f(n)\lambda^n = \frac{1 + \lambda}{1 - 2\lambda - \lambda^2 + \lambda^3}.$$
The above generating function for f(n) implies that
$$f(n+3) = 2f(n+2) + f(n+1) - f(n). \tag{4.55}$$
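Both the initial values f(1) = 3, f(2) = 7, f(3) = 16 and the recurrence (4.55) can be confirmed by brute force (an illustrative sketch):

```python
from itertools import product

FORBIDDEN = {(1, 1), (2, 3)}

def f(n):
    """Sequences in {1,2,3}^n in which neither 11 nor 23 occurs as a
    pair of consecutive terms."""
    return sum(1 for s in product((1, 2, 3), repeat=n)
               if all(pair not in FORBIDDEN for pair in zip(s, s[1:])))
```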

For a variation on the problem, let g(n) be the number of sequences a1 · · · an such that neither 11 nor 23 appears as two consecutive terms ai ai+1 or as an a1. So g(n) = CD(n), the number of closed walks in D of length n. So we just need to compute
$$\frac{-\lambda q'(\lambda)}{q(\lambda)} = \frac{\lambda(2 + 2\lambda - 3\lambda^2)}{1 - 2\lambda - \lambda^2 + \lambda^3}.$$
The preceding example is Example 4.7.4 from R. P. Stanley, Enumerative
Combinatorics, Vol. 1., Wadsworth & Brooks/Cole, 1986. See that reference
for further examples of applications of the transfer matrix method.

Exercise: 4.17.8 Let f(n) be the number of sequences a1 a2 · · · an ∈ [3]^n with [3] = {0, 1, 2} and with the property that a1 = an and 0 and 2 are never next to each other. Use the transfer matrix method to find a generating function for the sequence {f(n)}_{n=1}^∞, and then find a formula for f(n).

4.18 A Famous NONLINEAR Recurrence


For n ≥ 3 let un be the number of ways to associate (i.e., fully parenthesize) the product of a finite sequence x1, . . . , xn. As a first example,

u3 = |{x1 (x2 x3 ), (x1 x2 )x3 }| = 2.


Similarly, u4 = 5 =

|{x1 (x2 (x3 x4 )), x1 ((x2 x3 )x4 ), (x1 x2 )(x3 x4 ), (x1 (x2 x3 ))x4 , ((x1 x2 )x3 )x4 }|.

By convention, u1 = u2 = 1.
A given associated product always looks like (x1 . . . xr )(xr+1 . . . xn ), where
1 ≤ r ≤ n − 1. So un = u1 un−1 + u2 un−2 + · · · + un−1 u1 , n ≥ 2. Hence
n−1
X
un = ui un−i .
i=1
Put $f(x) = \sum_{n=1}^{\infty} u_n x^n$. Then
$$(f(x))^2 = \sum_{n=2}^{\infty}\left(\sum_{i=1}^{n-1} u_i u_{n-i}\right)x^n = \sum_{n=2}^{\infty} u_n x^n = f(x) - x.$$
It follows that $[f(x)]^2 - f(x) + x = 0$, so $f(x) = \frac{1}{2}\left[1 \pm (1-4x)^{1/2}\right]$. We must use the minus sign, since the constant term of f(x) is f(0) = 0. This leads to
$$f(x) = \frac{1}{2} - \frac{1}{2}(1-4x)^{1/2} = \frac{1}{2} - \frac{1}{2}\sum_{n=0}^{\infty}\binom{1/2}{n}(-4x)^n.$$
Then a little computation shows that
$$u_n = -\frac{1}{2}\binom{1/2}{n}(-4)^n = -\frac{1}{2}\cdot\frac{(-1)^{n-1}\,(1\cdot 3\cdot 5\cdots(2n-3))}{2^n\, n!}\,(-1)^n 4^n = \frac{(2n-2)!}{n!\,(n-1)!} = \frac{1}{n}\binom{2(n-1)}{n-1} = C_{n-1}.$$
 
These numbers $C_n = \frac{1}{n+1}\binom{2n}{n}$ are the famous Catalan numbers. See
Chapter 14 of Wilson and van Lint for a great deal more about them.
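The recurrence for u_n and the Catalan closed form are easy to cross-check with a short sketch (memoized recursion; the helper names are ours):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def u(n):
    """Number of ways to parenthesize x_1 ... x_n; u(1) = u(2) = 1."""
    if n <= 2:
        return 1
    # u_n = sum over the position of the outermost split
    return sum(u(r) * u(n - r) for r in range(1, n))

def catalan(n):
    """C_n = (1/(n+1)) * C(2n, n); the division is exact."""
    return comb(2 * n, n) // (n + 1)

# u(3) == 2 and u(4) == 5, matching the explicit lists above
```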

4.19 MacMahon’s Master Theorem


4.19.1 Preliminary Results on Determinants
Theorem 4.19.2 Let R be a commutative ring with 1, and let A be an n × n matrix over R. The characteristic polynomial of A is given by
$$f(x) = \det(xI - A) = \sum_{i=0}^{n} c_i x^{n-i}, \tag{4.56}$$
where c0 = 1, and for 1 ≤ i ≤ n, $c_i = \sum \det(B)$, where B ranges over all the i × i principal submatrices of −A.

Proof: Clearly det(xI − A) is a monic polynomial of degree n, i.e., c0 = 1, with constant term det(−A) = (−1)^n det(A). Suppose 1 ≤ i ≤ n − 1 and consider the coefficient ci of x^{n−i} in the polynomial det(xI − A). Recall that in general, if D = (dij) is an n × n matrix over a commutative ring with 1, then
$$\det(D) = \sum_{\pi \in S_n} \mathrm{sgn}(\pi)\, d_{1,\pi(1)} d_{2,\pi(2)} \cdots d_{n,\pi(n)}.$$
So to get a term of degree n − i in $\det(xI - A) = \sum_{\pi \in S_n} \mathrm{sgn}(\pi)\,(xI-A)_{1,\pi(1)} \cdots (xI-A)_{n,\pi(n)}$ we first select n − i indices j1, . . . , jn−i, with complementary indices k1, . . . , ki. Then in expanding the product (xI − A)_{1,π(1)} · · · (xI − A)_{n,π(n)} when π fixes j1, . . . , jn−i, we select the term x from the factors $(xI-A)_{j_1,j_1}, \ldots, (xI-A)_{j_{n-i},j_{n-i}}$, and the terms $(-A)_{k_1,\pi(k_1)}, \ldots, (-A)_{k_i,\pi(k_i)}$ otherwise. So if A(k1, . . . , ki) is the principal submatrix of A indexed by rows and columns k1, . . . , ki, then det(−A(k1, . . . , ki)) is the associated contribution to the coefficient of x^{n−i}. It follows that $c_i = \sum \det(B)$ where B ranges over all the principal i × i submatrices of −A.


Suppose the permutation π ∈ Sn consists of k permutation cycles of sizes l1, . . . , lk, respectively, where $\sum l_i = n$. Then sgn(π) can be computed by
$$\mathrm{sgn}(\pi) = (-1)^{l_1-1+l_2-1+\cdots+l_k-1} = (-1)^{n-k} = (-1)^n(-1)^k.$$
We record this formally as:
$$\mathrm{sgn}(\pi) = (-1)^n(-1)^k \quad \text{if } \pi \in S_n \text{ is the product of } k \text{ disjoint cycles.} \tag{4.57}$$



4.19.3 Permutation Digraphs


Let A = (aij ) be an n × n matrix over the commutative ring R with 1.
Let Dn be the complete digraph of order n with vertices 1, 2, . . . , n, and
for which each ordered pair (i, j) is an arc of Dn . Assign to each arc (i, j)
the weight aij to obtain a weighted digraph. The weight of a directed cycle
γ : i1 → i2 → · · · → ik → i1 is defined to be
$$\mathrm{wt}(\gamma) = -a_{i_1 i_2}\, a_{i_2 i_3} \cdots a_{i_{k-1} i_k}\, a_{i_k i_1}, \tag{4.58}$$
which is the negative of the product of the weights of its arcs.
Let π ∈ Sn . The permutation digraph D(π) has vertices 1, . . . , n and arcs
(i, π(i)), 1 ≤ i ≤ n. So D(π) is a spanning subgraph of Dn . The directed
cycles of the graph D(π) are in 1-1 correspondence with the permutation
cycles of π. Also, the arc sets of the directed cycles of D(π) partition the set
of arcs of D(π).
The weight wt(D(π)) of the permutation digraph D(π) is defined to be the product of the weights of its directed cycles. Hence if π has k permutation cycles,
$$\mathrm{wt}(D(\pi)) = (-1)^k a_{1,\pi(1)} a_{2,\pi(2)} \cdots a_{n,\pi(n)}. \tag{4.59}$$
Then using Equations 4.56 and 4.57 we obtain
$$\det(-A) = \sum \mathrm{wt}(D(\pi)), \tag{4.60}$$
where D(π) ranges over all permutation digraphs of order n.
Fix X ⊆ [n] = {1, 2, . . . , n} and let σ ∈ SX. The permutation digraph D(σ) has vertex set X and is a (not necessarily spanning) subgraph of Dn with weight equal to the product of the weights of its cycles. (If X = ∅, the corresponding weight is defined to be 1.) If B is the principal submatrix of −A whose rows and columns are the (intersections of the) rows and columns of −A indexed by X ⊆ [n], then $\det(B) = \sum_{\sigma \in S_X} \mathrm{wt}(D(\sigma))$. If we put x = 1 in Eq. 4.56 we obtain
$$\det(I_n - A) = \sum_{X \subseteq [n]}\sum_{\sigma \in S_X} \mathrm{wt}(D(\sigma)). \tag{4.61}$$

Let y1, . . . , yn be independent commuting variables over R, and put R* = R[y1, . . . , yn]. Replace A in the preceding discussion with AY, where Y is the diagonal matrix with diagonal entries y1, . . . , yn. So (AY)ij = aij yj. So if π ∈ Sn has k permutation cycles, D(π) has k directed cycles. And
$$\mathrm{wt}(D(\pi)) = (-1)^k a_{1,\pi(1)} y_{\pi(1)} \cdots a_{n,\pi(n)} y_{\pi(n)}. \tag{4.62}$$


From a different point of view, let H be the set of all digraphs H of order n for which each vertex has the same indegree and outdegree, and this common value is either 0 or 1. Then H consists of a number of pairwise disjoint directed cycles, and hence is a permutation digraph on a subset of [n]. The weight wt(H) of a digraph H ∈ H is defined to be wt(H) = (−1)^{c(H)} × (the product of the weights of its arcs), where c(H) is the number of directed cycles of H and the weight of an arc (i, j) of H is wt(i, j) = aij yj. So if H ∈ H satisfies H = D(π), π ∈ SX, X ⊆ [n], then wt(H) is given by Eq. 4.62. Moreover, if $\mathrm{wt}(\mathcal{H}) = \sum_{H \in \mathcal{H}} \mathrm{wt}(H)$, by Eq. 4.61 we have
$$\mathrm{wt}(\mathcal{H}) = \sum_{H \in \mathcal{H}} \mathrm{wt}(H) = \det(I_n - AY). \tag{4.63}$$

4.19.4 A Class of General Digraphs


We now consider the set D of general digraphs D on vertices in [n], for which
the arcs having i as initial vertex are linearly ordered, and such that for
each i, 1 ≤ i ≤ n, there is a nonnegative integer mi such that mi equals
both the indegree and the outdegree of the vertex i. Recall that a loop on i
contributes 1 to both the indegree and the outdegree of i. We still have the
n × n matrix A = (aij) and the independent indeterminates y1, . . . , yn. If D is a general digraph, and if (i, j) is the tth arc with i as initial vertex, let $a^t_{ij} y_j$ be the weight of the arc (i, j) ($a^t_{ij}$ is the (i, j) entry of A with a superscript t adjoined). The weight wt(D) of D is the product of the weights of its arcs. Each D is uniquely identified by wt(D).
Moreover, suppose that the variables y1, . . . , yn commute with all the entries of A, but do not commute with each other. We show that each D is identified uniquely by the word in y1, . . . , yn associated with wt(D). As an example, suppose that
$$\mathrm{wt}(D) = a^1_{11}y_1\, a^2_{13}y_3\, a^3_{13}y_3\, a^1_{22}y_2\, a^1_{31}y_1\, a^2_{31}y_1\, a^3_{33}y_3 \tag{4.64}$$
$$= a^1_{11} a^2_{13} a^3_{13} a^1_{22} a^1_{31} a^2_{31} a^3_{33}\; y_1 y_3 y_3 y_2 y_1 y_1 y_3. \tag{4.65}$$
Here in D vertex 1 has outdegree 3 and indegree 3, i.e., m1 = 3. Similarly, m2 = 1 and m3 = 3. Notice that the word $y_1 y_3^2 y_2 y_1^2 y_3$ is sufficient to

recreate the digraph D along with the linear order on its arcs. To see this,
start with y1 y3 y3 y2 y1 y1 y3 and work from the left. For each j, 1 ≤ j ≤ 3, the
number of yj appearing in the word is the indegree mj of j. Since m1 = 3,
m2 = 1, and m3 = 3, the first 3 arcs have initial vertex 1, the 4th arc has
initial vertex 2, the last 3 arcs have initial vertex 3.
As another example, consider the word $y_2 y_1 y_2 y_3^2 y_1^2$, and let D be the associated digraph. Here m1 = 3, m2 = 2, m3 = 2. It follows that
$$\mathrm{wt}(D) = a^1_{12}\, a^2_{11}\, a^3_{12}\, a^1_{23}\, a^2_{23}\, a^1_{31}\, a^2_{31}\; y_2 y_1 y_2 y_3^2 y_1^2.$$
Two digraphs D1 and D2 in D are considered the same if and only if for
each i, 1 ≤ i ≤ n, and for each t, 1 ≤ t ≤ mi , the tth arc of D1 having initial
vertex i, and the tth arc of D2 having initial vertex i, both have the same
terminal vertex.
Consider the product
$$\prod_{i=1}^{n} (a_{i1}y_1 + \cdots + a_{in}y_n)^{m_i}. \tag{4.66}$$
Label the factors in each power, say,
$$(a_{i1}y_1 + \cdots + a_{in}y_n)^{m_i} = (a_{i1}y_1 + \cdots + a_{in}y_n)_1 (a_{i1}y_1 + \cdots + a_{in}y_n)_2 \cdots (a_{i1}y_1 + \cdots + a_{in}y_n)_{m_i},$$
and then write $a^t_{ij}$ in place of aij in the tth factor. Then the product appears as
$$(a^1_{i1}y_1 + \cdots + a^1_{in}y_n)(a^2_{i1}y_1 + \cdots + a^2_{in}y_n) \cdots (a^{m_i}_{i1}y_1 + \cdots + a^{m_i}_{in}y_n). \tag{4.67}$$

Consider the product as i goes from 1 to n of the product in Eq. 4.67. Each summand of the expanded product that involves a word in the y's using mj of the yj's, 1 ≤ j ≤ n, corresponds to (i.e., is the weight of) a unique general digraph in which vertex i has both indegree and outdegree equal to mi. If we remove the superscript t on the element $a^t_{ij}$ and now assume that the y's commute, we see that if B(m1, . . . , mn) is the coefficient of $y_1^{m_1} y_2^{m_2} \cdots y_n^{m_n}$ in the product as i goes from 1 to n of the product in Eq. 4.67, then
$$\mathrm{wt}(\mathcal{D}) = \sum_{D \in \mathcal{D}} \mathrm{wt}(D) = \sum_{(m_1,\ldots,m_n) \ge (0,\ldots,0)} B(m_1,\ldots,m_n)\, y_1^{m_1} \cdots y_n^{m_n}. \tag{4.68}$$

To see this, let
$$\mathcal{D}_{(m_1,\ldots,m_n)} = \{D \in \mathcal{D} : m_i = \mathrm{outdeg}(i) = \mathrm{indeg}(i) \text{ in } D\}.$$
Clearly,
$$\mathrm{wt}(\mathcal{D}_{(m_1,\ldots,m_n)}) = \sum_{D \in \mathcal{D}_{(m_1,\ldots,m_n)}} \mathrm{wt}(D) = B(m_1,\ldots,m_n)\, y_1^{m_1} \cdots y_n^{m_n}.$$

4.19.5 MacMahon’s Master Theorem for Permutations


Continue with the same use of notation for A and Y .

Theorem 4.19.6 Let A(m1, . . . , mn) be the coefficient of $y_1^{m_1} y_2^{m_2} \cdots y_n^{m_n}$ in the formal inverse det(In − AY)^{−1} of the polynomial det(In − AY). Let B(m1, . . . , mn) be the coefficient of $y_1^{m_1} y_2^{m_2} \cdots y_n^{m_n}$ in the product
$$\prod_{i=1}^{n} (a_{i1}y_1 + a_{i2}y_2 + \cdots + a_{in}y_n)^{m_i}.$$
Then A(m1, . . . , mn) = B(m1, . . . , mn).

Proof: Put G = D × H = {(D, H) : D ∈ D, H ∈ H}, and define the weight of the pair (D, H) by wt(D, H) = wt(D) · wt(H). Then
$$\mathrm{wt}(\mathcal{G}) := \sum_{(D,H) \in \mathcal{G}} \mathrm{wt}(D,H) = \mathrm{wt}(\mathcal{D}) \cdot \mathrm{wt}(\mathcal{H}).$$
This implies (by Eqs. 4.63 and 4.68) that
$$\mathrm{wt}(\mathcal{G}) = \left(\sum_{(m_1,\ldots,m_n) \ge (0,\ldots,0)} B(m_1,\ldots,m_n)\, y_1^{m_1} \cdots y_n^{m_n}\right) \cdot \det(I_n - AY). \tag{4.69}$$

If we can show that wt(G) = 1, we will have proved MacMahon’s Master


Theorem.
Let ∅ denote the digraph on vertices 1, . . . , n, with an empty set of arcs.
Then wt(∅, ∅) = 1. We want to define an involution on the set G \ (∅, ∅)
which is sign-reversing on weights.
Given a pair (D, H) ∈ (G \ (∅, ∅)), we determine the first vertex u whose
outdegree in either D or H is positive. Beginning at that vertex u we walk

along the arcs of D, always choosing the topmost arc (the arc $a^t_{ij}$ from i with t the largest available), until one of the following occurs:

(i) We encounter a previously visited vertex (and have thus located a directed cycle γ of D).

(ii) We encounter a vertex which has a positive outdegree in H (and thus is a vertex on a directed cycle δ of H).

We note that if u is a vertex with positive outdegree in H then we are immediately in case (ii). We also note that cases (i) and (ii) cannot occur simultaneously. If case (i) occurs, we form a new element of G by removing γ from D and putting it in H. If case (ii) occurs, we remove δ from H and put it in D in such a way that each arc of δ is put in front of (in the linear order) those with the same initial vertex. Let (D′, H′) be the pair obtained in this way. Then D′ is in D and H′ is in H, and hence (D′, H′) is in G. Moreover, since the number of directed cycles in H′ differs from the number in H by one, it follows that wt(D′, H′) = −wt(D, H). Define σ(D, H) = (D′, H′) and note that σ(D′, H′) = (D, H). Thus σ is an involution on G \ (∅, ∅) which is sign-reversing on weights. It follows that wt(G) = wt(∅, ∅) = 1. Hence the proof is complete.
We give two examples to help the reader be sure that the above proof is understood. Let D be the general digraph with arcs
$$D:\ a^1_{15}\; a^1_{23}\; a^1_{32}\; a^2_{35}\; a^3_{31}\; a^1_{53}\; a^2_{53}.$$
Let X = {2, 3, 4, 6} ⊆ [6]. Let π = (2, 4, 6)(3) ∈ SX, and let H = D(π): a24 a33 a46 a62. Since the first vertex 1 has positive outdegree in D, we start walking along arcs in D: first is $a^1_{15}$. As 5 does not have positive outdegree in H, the next arc is $a^2_{53}$. Now 3 has positive outdegree in H and belongs to the directed cycle (which is a loop) δ = a33, so we are in case (ii). We put this loop into D as arc $a^4_{33}$, and remove it from H to obtain H′ = a24 a46 a62. So σ(D, H) = (D′, H′).
We now check that σ(D0 , H 0 ) = (D, H).
So let D be the same as D′ above, and suppose X = {2, 4, 6} and π = (2, 4, 6). So
$$(D, H) = (a^1_{15}\; a^1_{23}\; a^1_{32}\; a^2_{35}\; a^3_{31}\; a^4_{33}\; a^1_{53}\; a^2_{53},\ \ a_{24} a_{46} a_{62}).$$
We start our walk with $a^1_{15}$, moving to $a^2_{53}$, then to $a^4_{33}$. Since 3 is a repeated vertex, the loop γ : 3 → 3 represented by $a^4_{33}$ is removed from D and adjoined to H as the loop a33. We have now obtained the original element of G.

When we specialize to n = 2 we obtain the following: if $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, then
$$\det(I - AY)^{-1} = \left(1 - a_{11}y_1 - a_{22}y_2 + (a_{11}a_{22} - a_{12}a_{21})y_1y_2\right)^{-1}$$
$$= \sum_{(m_1,m_2) \ge (0,0)} \left(\sum_i \binom{m_1}{i}\binom{m_2}{m_1-i}\, a_{11}^i a_{12}^{m_1-i} a_{21}^{m_1-i} a_{22}^{m_2-m_1+i}\right) y_1^{m_1} y_2^{m_2}. \tag{4.70}$$

Note: If some aij = 0, then to get a nonzero contribution the power on aij must be zero.

Computing det(I − AY)^{−1} directly, we get
$$\sum_{k=0}^{\infty} \left[a_{11}y_1 + a_{22}y_2 - (a_{11}a_{22} - a_{12}a_{21})y_1y_2\right]^k.$$
Then computing the coefficient of $y_1^{m_1}y_2^{m_2}$ in this sum (and writing Δ in place of a11a22 − a12a21) we get
$$\sum_{k=0}^{\infty} \binom{k}{k-m_2,\; k-m_1,\; m_1+m_2-k}\, a_{11}^{k-m_2} a_{22}^{k-m_1}\, \Delta^{m_1+m_2-k}\, (-1)^{m_1+m_2-k}. \tag{4.71}$$
This gives a variety of equalities. In particular, suppose each aij = 1. Hence Δ = 0, so k = m1 + m2 for a nonzero contribution. Then the Master Theorem yields the familiar equality:
$$\sum_i \binom{m_1}{i}\binom{m_2}{m_1-i} = \binom{m_1+m_2}{m_1,\; m_2,\; 0} = \binom{m_1+m_2}{m_1}. \tag{4.72}$$
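Identity (4.72) is the Vandermonde convolution; a quick numerical sketch (helper names are ours):

```python
from math import comb, factorial

def vandermonde_lhs(m1, m2):
    """Sum of C(m1, i) * C(m2, m1 - i); math.comb returns 0 when m1-i > m2."""
    return sum(comb(m1, i) * comb(m2, m1 - i) for i in range(m1 + 1))

def multinomial(n, parts):
    """n! / (p1! p2! ... pk!) for nonnegative parts summing to n."""
    out = factorial(n)
    for p in parts:
        out //= factorial(p)
    return out
```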

Exercise: 4.19.7 Prove that $\sum_k \binom{k}{k-m,\; k-n,\; m+n-k}(-1)^{m+n-k} = 1$.

(Hint: Compute the coefficient of $a_{11}^{m_1} a_{22}^{m_2}$ in the two equations Eq. 4.70 and Eq. 4.71, which must be equal by the Master Theorem.)

4.19.8 Dixon’s Identity as an Application of the Mas-


ter Theorem
Problem: Evaluate the sum $S = \sum_{k=0}^{n} (-1)^k \binom{n}{k}^3$.

Since each summand is the product of three binomial coefficients with


upper index n, we are led to consider the expression:
$$\left(1 - \frac{x}{y}\right)^n \left(1 - \frac{y}{z}\right)^n \left(1 - \frac{z}{x}\right)^n = \sum_{0 \le i,j,k \le n} \binom{n}{i}\binom{n}{j}\binom{n}{k} (-1)^{i+j+k}\, x^{i-k} y^{j-i} z^{k-j}.$$
To force the lower indices in the binomial coefficients to be equal, we apply the operator [x^0 y^0 z^0]. From the above we see that
$$S = [x^0 y^0 z^0]\left\{\left(1 - \frac{x}{y}\right)^n \left(1 - \frac{y}{z}\right)^n \left(1 - \frac{z}{x}\right)^n\right\} = \sum_{0 \le i \le n} \binom{n}{i}^3 (-1)^{3i}.$$
We can see directly that this is equal to
$$[x^n y^n z^n]\left\{(y-x)^n (z-y)^n (x-z)^n\right\},$$
but the point of this exercise is to get it from the Master Theorem. Now let
$$A = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} x & 0 & 0 \\ 0 & y & 0 \\ 0 & 0 & z \end{pmatrix}.$$
A simple calculation shows that
$$I - AY = \begin{pmatrix} 1 & -y & z \\ x & 1 & -z \\ -x & y & 1 \end{pmatrix},$$
and
$$\det(I - AY)^{-1} = (1 + xy + yz + zx)^{-1} = \sum_{i,j,k \ge 0} \binom{i+j+k}{i,\,j,\,k} (-1)^{i+j+k} (xy)^i (yz)^j (zx)^k. \tag{4.73}$$

MacMahon's Master Theorem with m1 = m2 = m3 = n applied to I − AY says that
$$[x^n y^n z^n]\left\{\det(I - AY)^{-1}\right\} = [x^n y^n z^n]\left\{(y-z)^n (z-x)^n (x-y)^n\right\}, \tag{4.74}$$
from which we obtain
$$S = [x^n y^n z^n]\left\{(y-z)^n (z-x)^n (x-y)^n\right\}$$
$$= [x^n y^n z^n] \sum_{i,j,k \ge 0} (-1)^{i+j+k}\binom{i+j+k}{i,\,j,\,k}(xy)^i (yz)^j (zx)^k = \sum (-1)^{i+j+k}\binom{i+j+k}{i,\,j,\,k},$$
where the sum is over all (i, j, k) for which i + j = j + k = k + i = n. Hence i = j = k = n/2, which forces n to be even. From this it follows that
$$S = \begin{cases} (-1)^m (3m)!\,(m!)^{-3} & \text{if } n = 2m, \\ 0 & \text{otherwise.} \end{cases}$$
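Dixon's evaluation of S is easy to check for small n (a numerical sketch; helper names are ours):

```python
from math import comb, factorial

def dixon_sum(n):
    """S = sum of (-1)^k * C(n, k)^3 over 0 <= k <= n."""
    return sum((-1)**k * comb(n, k)**3 for k in range(n + 1))

def dixon_closed(n):
    """(-1)^m (3m)! / (m!)^3 for n = 2m, and 0 for odd n."""
    if n % 2 == 1:
        return 0
    m = n // 2
    return (-1)**m * factorial(3 * m) // factorial(m)**3

# e.g. dixon_sum(2) == -6, matching (-1)^1 * 3!/(1!)^3
```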
 
Exercise: 4.19.9 Apply the Master Theorem to the matrix $B = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$. Show that
$$\sum_i \binom{m}{i}^3 = \sum_n \binom{m+n}{m-2n,\; n,\; n,\; n} \cdot 2^{m-2n}.$$
Show that this is the number of permutations of the letters in the sequence $x_1^m x_2^m x_3^m$ such that no letter is placed in a position originally occupied by itself.
Chapter 5

Möbius Inversion on Posets

This chapter deals with locally finite partially ordered sets (posets), their
incidence algebras, and Möbius inversion on these algebras.

5.1 Introduction
Recall first that we have proved the following (see Theorem 1.5.5):

$$x^{(n)} = x(x+1)\cdots(x+n-1) = \sum_{k=0}^{n} c(n,k)\, x^k, \tag{5.1}$$
where $c(n,k) = \begin{bmatrix} n \\ k \end{bmatrix}$ is the number of σ ∈ Sn with k cycles. Replacing x with −x and observing that $(-x)(-x+1)\cdots(-x+n-1) = (-1)^n (x)_n$, where $(x)_n = x(x-1)\cdots(x-n+1)$ is the falling factorial, we obtained
$$(x)_n = \sum_{k=0}^{n} s(n,k)\, x^k, \tag{5.2}$$
where s(n, k) = (−1)^{n−k} c(n, k) is a Stirling number of the first kind.


Let Πn be the set of all partitions of the set [n], and S(n, k) the number of partitions of [n] with exactly k parts. For each function f : [n] → [m], let πf denote the partition of [n] determined by f (its parts are the nonempty preimages under f). For σ ∈ Πn, let
$$\chi_\sigma(m) = |\{f : [n] \to [m] : \sigma = \pi_f\}| = |\{f : [\nu(\sigma)] \to [m] : f \text{ is one-to-one}\}| = (m)_{\nu(\sigma)},$$
where ν(σ) denotes the number of parts of σ. Given any f : [n] → [m], there is a unique σ ∈ Πn for which f is one of the maps counted by χσ(m), i.e., σ = πf. And m^n = |{f : [n] → [m]}|. So
$$m^n = \sum_{\sigma \in \Pi_n} \chi_\sigma(m) = \sum_{\sigma \in \Pi_n} (m)_{\nu(\sigma)} = \sum_{k=0}^{n} S(n,k)\,(m)_k \quad \text{for all } n \ge 0.$$
Since this holds for every positive integer m, it holds as an identity of polynomials:

$$x^n = \sum_{k=0}^{n} S(n,k)\,(x)_k, \qquad n \ge 0, \tag{5.3}$$
where S(n, k) is a Stirling number of the second kind. If we use the same trick of replacing x with −x again, we get
$$x^n = \sum_{k=0}^{n} (-1)^{n-k} S(n,k)\, x^{(k)}. \tag{5.4}$$

Here we can see that Eq. 5.1 and Eq. 5.4 are “inverses” of each other, and
Eq. 5.2 and Eq. 5.3 are “inverses” of each other. We proceed to make this a
little more formal.
Let Pn be the set of all polynomials of degree k, 0 ≤ k ≤ n (along with the zero polynomial), with coefficients in C. Then Pn is an (n + 1)-dimensional vector space.

B1 = {1, x, x^2, . . . , x^n},
B2 = {x^{(0)} = 1, x^{(1)}, . . . , x^{(n)}} (the rising factorials),
and
B3 = {(x)_0 = 1, (x)_1, . . . , (x)_n} (the falling factorials)
are three ordered bases of Pn. Recall that if B = {v1, . . . , vm} and B′ = {w1, . . . , wm} are two bases of the same vector space over C (or over any field K), then there are unique scalars aij, 1 ≤ i, j ≤ m, for which $w_j = \sum_{i=1}^{m} a_{ij} v_i$, and unique scalars a′ij, 1 ≤ i, j ≤ m, for which $v_j = \sum_{i=1}^{m} a'_{ij} w_i$. And the matrices A = (aij) and A′ = (a′ij) are inverses of each other.
So put:

A = (aij), 0 ≤ i, j ≤ n, aij = c(j, i);
B = (bij), 0 ≤ i, j ≤ n, bij = s(j, i);
C = (cij), 0 ≤ i, j ≤ n, cij = S(j, i);
D = (dij), 0 ≤ i, j ≤ n, dij = (−1)^{j−i} S(j, i).



Then A and D are inverses of each other, and B and C are inverses. So
$$\sum_{k=0}^{n} S(j,k)\, s(k,i) = \sum_{k=0}^{n} b_{ik} c_{kj} = (BC)_{ij} = \delta_{ij}, \tag{5.5}$$
$$\sum_{k=0}^{n} (-1)^{j-k} S(j,k)\, c(k,i) = \sum_{k=0}^{n} a_{ik} d_{kj} = (AD)_{ij} = \delta_{ij}. \tag{5.6}$$
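Equation (5.5) says the two triangular matrices of Stirling numbers are inverse to each other. A sketch computing both kinds from their standard recurrences and multiplying the matrices:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def c(n, k):
    """Signless Stirling numbers of the first kind."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return c(n - 1, k - 1) + (n - 1) * c(n - 1, k)

@lru_cache(maxsize=None)
def S(n, k):
    """Stirling numbers of the second kind."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return S(n - 1, k - 1) + k * S(n - 1, k)

def s(n, k):
    """Signed Stirling numbers of the first kind."""
    return (-1) ** (n - k) * c(n, k)

N = 6
B = [[s(j, i) for j in range(N + 1)] for i in range(N + 1)]  # b_ij = s(j, i)
C = [[S(j, i) for j in range(N + 1)] for i in range(N + 1)]  # c_ij = S(j, i)
BC = [[sum(B[i][k] * C[k][j] for k in range(N + 1)) for j in range(N + 1)]
      for i in range(N + 1)]
```

By (5.5) the product BC is the identity matrix.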
k=0 k=0

We want to see Eq. 5.5 expressed in the context of "Möbius inversion over a finite partially ordered set." Also, when two matrices, such as A and D above, are recognized as being inverses of each other, b̄ = āA iff ā = b̄D.

Consider a second example. Let A, B, C be three subsets of a universal set E. Then |E \ (A ∪ B ∪ C)| = |E| − (|A| + |B| + |C|) + (|A ∩ B| + |A ∩ C| + |B ∩ C|) − |A ∩ B ∩ C|. This is a very special case of the general principle of inclusion-exclusion that we met much earlier and which we now want to view as Möbius inversion over a certain finite partially ordered set.

As a third example, recall "Möbius inversion" as we studied it earlier: $f(n) = \sum_{d|n} g(d)$ for all n ∈ N iff $g(n) = \sum_{d|n} \mu(d) f(n/d)$ for all n ∈ N,

where µ is the classical Möbius function of elementary number theory. The


general goal is to introduce the abstract theory of Möbius inversion over
finite posets and look at special applications that yield the above results and
more as special examples of this general theory. As usual, we just scratch the
surface of this broad subject. An interesting observation, however, is that
although special examples have been appearing at least since the 1930’s, the
general theory has been developed primarily by G.-C. Rota and his students,
starting with Rota’s 1964 paper, On the foundations of combinatorial theory
I. Theory of Möbius functions, Z. Wahrsch. Verw. Gebiete 2(1964), 340 –
368.

5.2 Posets
A partially ordered set P (i.e., a poset P ) is a set P together with a
relation “ ≤ ” on P for which (P, ≤) satisfies the following:
206 CHAPTER 5. MÖBIUS INVERSION ON POSETS

PO1. ≤ is reflexive (x ≤ x for all x ∈ P );

PO2. ≤ is transitive (x ≤ y and y ≤ z ⇒ x ≤ z ∀x, y, z ∈ P );

PO3. ≤ is antisymmetric (x ≤ y and y ≤ x ⇒ x = y ∀x, y ∈ P ).


A poset (P, ≤) is a chain (or is linearly ordered) provided

PO4. For all x, y ∈ P, either x ≤ y or y ≤ x.
Given a poset (P, ≤), an interval of P is a set of the form

[x, y] = {z ∈ P : x ≤ z ≤ y},
where x ≤ y. So [x, x] = {x}, but ∅ is NOT an interval. P is called locally
finite provided |[x, y]| < ∞ whenever x, y ∈ P , x ≤ y. An element of P is
called a zero (resp., one) of P and denoted 0̂ (resp., 1̂) provided 0̂ ≤ x for
all x ∈ P (resp., x ≤ 1̂ for all x ∈ P ). Finally, we write x < y provided x ≤ y
but x 6= y.
EXAMPLES OF LOCALLY FINITE POSETS

Example 5.2.1 P = {1, 2, . . .} with the usual linear order. Here P is a
chain with 0̂ = 1. For each n ∈ P, let [n] = {1, 2, . . . , n} with the usual
linear order.

Example 5.2.2 For each n ∈ N , Bn consists of the subsets of [n] ordered
by inclusion (recall that [0] = ∅). So we usually write $B_n = 2^{[n]}$, with S ≤ T
in Bn iff ∅ ⊆ S ⊆ T ⊆ [n].

Example 5.2.3 In general any collection of sets can be ordered by inclu-


sion to form a poset. For example, let Ln (q) consist of all subspaces of an
n-dimensional vector space Vn (q) over the field F = GF (q), ordered by in-
clusion.

Example 5.2.4 Put D = P with ≤ defined by: For i, j ∈ D, i ≤ j iff i|j.


For each n ∈ P, let Dn be the interval [1, n] = {d : 1 ≤ d ≤ n and d|n}. For
i, j ∈ Dn , i ≤ j iff i|j.

Example 5.2.5 Let n ∈ P. The set Πn of all partitions of [n] is made into
a poset by defining π ≤ σ (for π, σ ∈ Πn ) iff each block of π is contained in
some block of σ. In that case we say π is a refinement of σ.

Example 5.2.6 A linear partition λ of [n] is a partition of [n] with a


linear order on each block of λ. The blocks themselves are unordered, and
ν(λ) denotes the number of blocks of λ. Ln is the set of linear partitions of
[n] with partial order “ ≤ ” defined by: η ≤ λ, for η, λ ∈ Ln , iff each block of
λ can be obtained by the juxtaposition of blocks of η.

Example 5.2.7 For n ∈ P, let Sn denote the set of permutations of the


elements of [n] with the following partial order: given σ, τ ∈ Sn , we say
σ ≤ τ iff each cycle of σ (written with smallest element first) is composed of
a string of consecutive integers from some cycle of τ (also written with the
smallest element first).

For example, (12)(3) ≤ (123), (1)(23) ≤ (123), but (13)(2) 6≤ (123). The
0̂ of Sn is 0̂ = (1)(2) · · · (n). As an example, for σ = (12435) ∈ S5 , we give
the Hasse diagram of the interval [0̂, σ]. Note, for example, that (12)(435) is
not in the interval since it would appear as (12)(354).
[Figure: Hasse diagram of the interval [0̂, σ] in S5 , where σ = (12435).]

5.3 Vector Spaces and Algebras


Let K be any field (but K = C is the usual choice for us). Let P be any
(nonempty) set. The standard way to make K P = {f : P → K} into a
vector space over K is to define vector addition by: (f + g)(p) = f (p) + g(p)
for all p ∈ P and any f, g ∈ K P . And then scalar multiplication is defined
by (af )(p) = a · f (p), for all a ∈ K, f ∈ K P and p ∈ P . The usual axioms
for a vector space are then easily verified.
5.3. VECTOR SPACES AND ALGEBRAS 209

If V is any vector space over K, V is an algebra over K if there is


also a vector product which is bilinear over K. This means that for each
pair (x, y) of elements of V , there is a product vector xy ∈ V for which the
following bilinearity conditions hold:

(1) (x + y)z = xz + yz and x(y + z) = xy + xz, ∀x, y, z ∈ V ;

(2) a(xy) = (ax)y = x(ay) for all a ∈ K; x, y ∈ V.

In these notes we shall be interested only in finite dimensional (linear)


algebras, i.e., algebras in which the vector space is finite dimensional over
K. So suppose V has a basis e1 , . . . , en as a vector space over K. Then
ei ej is to be an element of V , so $e_i e_j = \sum_{k=1}^{n} c_{ijk} e_k$ for unique scalars
$c_{ijk} \in K$. The $n^3$ elements $c_{ijk}$ are called the multiplication constants of the
algebra relative to the chosen basis. They give the value of each product
ei ej , 1 ≤ i, j ≤ n. Moreover, these products determine every product in V .
For suppose $x = \sum_{i=1}^{n} a_i e_i$ and $y = \sum_{j=1}^{n} b_j e_j$ are any two elements of V .
Then $xy = (\sum_i a_i e_i)(\sum_j b_j e_j) = \sum_{i,j} (a_i e_i)(b_j e_j) = \sum_{i,j} a_i b_j (e_i e_j)$,
and hence xy is completely determined by all the products
ei ej . In fact, if we define ei ej (any way we please!) to be some vector of V ,
1 ≤ i, j ≤ n, and then define $xy = \sum_{i,j=1}^{n} a_i b_j (e_i e_j)$ for x and y as above,
then it is an easy exercise to show that conditions (1) and (2) hold so that
V with this product is an algebra.
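The recipe “choose the products ei ej , extend by bilinearity” is easy to mechanize. Here is a small illustrative sketch (the helper names and the sample algebra are ours, not the text's): the structure constants of K[t]/(t²) determine every product of coordinate vectors.

```python
# Illustrative sketch: given structure constants c[i][j][k] with
# e_i e_j = sum_k c[i][j][k] e_k, bilinearity determines every product.

def algebra_product(x, y, c):
    n = len(x)
    z = [0] * n
    for i in range(n):
        for j in range(n):
            for k in range(n):
                z[k] += x[i] * y[j] * c[i][j][k]
    return z

# Example: the 2-dimensional algebra K[t]/(t^2) with basis e0 = 1, e1 = t,
# so e1 * e1 = 0.
c = [[[1, 0], [0, 1]],   # e0*e0 = e0,  e0*e1 = e1
     [[0, 1], [0, 0]]]   # e1*e0 = e1,  e1*e1 = 0
# (2 + 3t)(4 + 5t) = 8 + 22t  (since t^2 = 0)
assert algebra_product([2, 3], [4, 5], c) == [8, 22]
```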
An algebra V over K is said to be associative provided its multiplication
satisfies the associative law (xy)z = x(yz) for all x, y, z ∈ V .

Theorem 5.3.1 An algebra V over K with finite basis e1 , . . . , en as a vector


space over K is associative iff (ei ej )ek = ei (ej ek ) for 1 ≤ i, j, k ≤ n.

Proof: If V is associative, clearly (ei ej )ek = ei (ej ek ) for all i, j, k =
1, . . . , n. Conversely, suppose this holds. Let $x = \sum a_i e_i$, $y = \sum b_j e_j$,
$z = \sum c_k e_k$ be any three elements of V . Then $(xy)z = \sum a_i b_j c_k (e_i e_j)e_k$ and
$x(yz) = \sum a_i b_j c_k\, e_i(e_j e_k)$. Hence (xy)z = x(yz) and V is associative.

The algebras we study here are finite dimensional linear associative alge-
bras.

5.4 The Incidence Algebra I(P, K)


Let (P, ≤) be a locally finite poset, and let Int(P ) denote the set of intervals
of P . Let K be any field. If $f \in K^{Int(P)}$, i.e., f : Int(P ) → K, write f (x, y)
for f ([x, y]).
Here is an example we will find of interest. P = Dn = {d : 1 ≤ d ≤
n and d|n}. An interval of P is a set of the form [i, j] = {k : i|k and k|j},
where 1 ≤ i ≤ j ≤ n and i|j, i|n, j|n. Define µn : Int(P ) → C by µn (i, j) =
µ(j/i), where µ is the classical Möbius function we studied earlier and [i, j] is
any interval of P .
Definition. The incidence algebra I(P, K) of P over K is the K-algebra of all
functions f : Int(P ) → K, with the usual structure of a vector space over
K, and where the algebra multiplication (called convolution) is defined by

$$(f * g)(x, y) = \sum_{z:\, x \le z \le y} f(x, z)\,g(z, y), \eqno(5.7)$$

for all intervals [x, y] of P and all f, g ∈ I(P, K). The sum in Eq. 5.7 is finite
(so f ∗ g really is defined), since P is locally finite.
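Eq. 5.7 is directly computable for a small poset. Here is a minimal sketch (the helper names are ours), using the divisor poset D12: convolving ζ with itself counts the elements of each interval, and δ acts as the identity.

```python
# A minimal sketch of the convolution (5.7) for a finite poset, with the
# poset given by its order relation `leq`.
from itertools import product

def convolve(f, g, elements, leq):
    # (f*g)(x, y) = sum over z with x <= z <= y of f(x,z) g(z,y)
    h = {}
    for x, y in product(elements, repeat=2):
        if leq(x, y):
            h[(x, y)] = sum(f[(x, z)] * g[(z, y)]
                            for z in elements if leq(x, z) and leq(z, y))
    return h

# The divisor poset D_12, with i <= j iff i | j.
elements = [d for d in range(1, 13) if 12 % d == 0]
leq = lambda i, j: j % i == 0
zeta = {(x, y): 1 for x, y in product(elements, repeat=2) if leq(x, y)}
delta = {(x, y): int(x == y) for x, y in product(elements, repeat=2) if leq(x, y)}

# zeta^2(x, y) counts the elements of [x, y]: the interval [1, 12] has 6 elements.
assert convolve(zeta, zeta, elements, leq)[(1, 12)] == 6
# delta is a two-sided identity for convolution.
assert convolve(delta, zeta, elements, leq) == zeta
```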

Theorem 5.4.1 I(P, K) is an associative K-algebra with two-sided identity
denoted δ (or sometimes denoted 1) and defined on intervals [x, y] by

$$\delta(x, y) = \begin{cases} 1, & \text{if } x = y; \\ 0, & \text{if } x \neq y. \end{cases}$$

Proof:
This is a straightforward and worthwhile (but tedious!) exercise. It is
probably easier to establish associativity by showing (f ∗ g) ∗ h = f ∗ (g ∗ h)
in general than it is to establish associativity for some specific basis and
then to use Theorem 5.3.1. And to establish that δ ∗ f = f ∗ δ = f for
all f ∈ I(P, K) is almost trivial. The bilinearity conditions are also easily
established.
Note: It is quite helpful to think of I(P, K) as the set of all formal
expressions
$$f = \sum_{[x,y] \in Int(P)} f(x, y)\,[x, y] \qquad \text{(allowing infinite linear combinations).}$$
5.4. THE INCIDENCE ALGEBRA I(P, K) 211

Then convolution is defined by requiring that

$$[x, y] * [z, w] = \begin{cases} [x, w], & \text{if } y = z; \\ 0, & \text{if } y \neq z, \end{cases}$$

for all [x, y], [z, w] ∈ Int(P ), and then extending to all of I(P, K) by bilin-
earity. This shows that when Int(P ) is finite, one basis of I(P, K) may be
obtained by setting 1[x,y] equal to the function from Int(P ) to K defined by

$$1_{[x,y]}(z, w) = \begin{cases} 1, & \text{if } [z, w] = [x, y]; \\ 0, & \text{if } [z, w] \neq [x, y] \ \ (\text{but } [x, y], [z, w] \in Int(P)). \end{cases}$$

Then the set {1[x,y] : [x, y] ∈ Int(P )} is a basis for I(P, K) and the
multiplication constants are given by
$$1_{[x,y]} * 1_{[z,w]} = \delta_{y,z}\, 1_{[x,w]}, \quad \text{where } \delta_{y,z} = \begin{cases} 1, & y = z; \\ 0, & \text{otherwise.} \end{cases} \eqno(5.8)$$

Exercise: 5.4.2 Show that if P is any finite poset, its elements can be labeled
as x1 , x2 , . . . , xn so that xi < xj in P implies that i < j.

Suppose that P is a finite poset, say P = {x1 , . . . , xn }. Let
$f = \sum_{x_i \le x_j} f(x_i, x_j)[x_i, x_j] \in K^{Int(P)}$. Then define the n × n matrix Mf by

$$(M_f)_{i,j} = \begin{cases} f(x_i, x_j), & \text{if } x_i \le x_j; \\ 0, & \text{otherwise.} \end{cases}$$
We claim that the map f 7→ Mf is an algebra isomorphism from I(P, K)
to the algebra of all n × n matrices over K with (i,j) entries equal to zero
if xi 6≤ xj . It is almost obvious that f 7→ Mf is an isomorphism of vector
spaces, but we have to work a little to see that multiplication is preserved.
So suppose $f, g \in K^{Int(P)}$. Using $[x, y] * [z, w] = \delta_{y,z}[x, w]$ we have

$$f * g = \Big(\sum_{x_i \le x_j} f(x_i, x_j)[x_i, x_j]\Big) * \Big(\sum_{x_k \le x_l} g(x_k, x_l)[x_k, x_l]\Big)$$
$$= \sum_{x_i \le x_j,\ x_k \le x_l} f(x_i, x_j)\,g(x_k, x_l)\,[x_i, x_j] * [x_k, x_l]$$
$$= \sum_{x_i \le x_l} \Big(\sum_{x_j:\, x_i \le x_j \le x_l} f(x_i, x_j)\,g(x_j, x_l)\Big) [x_i, x_l].$$

So

$$(f * g)(x_i, x_l) = \sum_{x_j:\, x_i \le x_j \le x_l} f(x_i, x_j)\,g(x_j, x_l).$$

Now with matrices Mf and Mg defined as above:

$$(M_f \cdot M_g)_{i,l} = \sum_{j=1}^{n} (M_f)_{i,j}(M_g)_{j,l} = \sum_{x_j:\, x_i \le x_j \le x_l} (M_f)_{i,j}(M_g)_{j,l}
= \begin{cases} \sum_{x_j:\, x_i \le x_j \le x_l} f(x_i, x_j)g(x_j, x_l), & \text{if } x_i \le x_l; \\ 0, & \text{otherwise.} \end{cases}$$

Note that if P is ordered so that xi < xj implies i < j, then the matrices are
exactly all upper triangular matrices M = (mij ) over K, 1 ≤ i, j ≤ n, with
mij = 0 if $x_i \not\le x_j$.
For example, if P has the (Hasse) diagram

x5
h
@
@
x3 h @hx4
@
@
@
@
x1 h @hx2

then I(P, K) is isomorphic to the algebra of all matrices of the form:

$$\begin{pmatrix}
* & 0 & * & 0 & * \\
0 & * & * & * & * \\
0 & 0 & * & 0 & * \\
0 & 0 & 0 & * & * \\
0 & 0 & 0 & 0 & *
\end{pmatrix}.$$
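As a quick numerical check of the claimed isomorphism (a throwaway sketch with our own names), one can fill two matrices supported on this five-element poset's order relation with random entries and confirm that poset convolution matches matrix multiplication and that products respect the displayed zero pattern:

```python
import random

n = 5
# leq[i][j] is True iff x_{i+1} <= x_{j+1}, matching the zero pattern above:
# covers x1 < x3, x2 < x3, x2 < x4, x3 < x5, x4 < x5, plus transitivity.
leq = [[i == j for j in range(n)] for i in range(n)]
for a, b in [(0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (0, 4), (1, 4)]:
    leq[a][b] = True

random.seed(0)
f = [[random.randint(1, 9) if leq[i][j] else 0 for j in range(n)] for i in range(n)]
g = [[random.randint(1, 9) if leq[i][j] else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Convolution computed poset-style: sum over the interval [x_i, x_j].
conv = [[sum(f[i][k] * g[k][j] for k in range(n) if leq[i][k] and leq[k][j])
         for j in range(n)] for i in range(n)]
assert matmul(f, g) == conv
# The product also lands inside the algebra (same zero pattern).
assert all(matmul(f, g)[i][j] == 0
           for i in range(n) for j in range(n) if not leq[i][j])
```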

Theorem 5.4.3 Let f ∈ I(P, K). Then the following are equivalent:

(i) f has a left inverse;

(ii) f has a right inverse;

(iii) f has a two-sided inverse f^{-1} (which is necessarily the unique left and right inverse of f );

(iv) f (x, x) ≠ 0 ∀x ∈ P.

Moreover, if f^{-1} exists, then f^{-1}(x, y) depends only on the poset [x, y].

Proof: f ∗ g = δ iff f (x, x)g(x, x) = 1 for all x ∈ P and
$0 = \sum_{z:\, x \le z \le y} f(x, z)g(z, y)$ whenever x < y, x, y ∈ P . The last equation is equivalent to

$$g(x, y) = -f(x, x)^{-1} \sum_{z:\, x < z \le y} f(x, z)\,g(z, y)$$

whenever x < y, x, y ∈ P , and also to

$$f(x, y) = -g(y, y)^{-1} \sum_{z:\, x \le z < y} f(x, z)\,g(z, y).$$

It follows that if f has a right inverse g then f (x, x) 6= 0 for all x ∈


P , and in this case g(x, y) = f −1 (x, y) depends only on [x, y]. For the
converse, suppose this condition holds. (Note: If xi < xj implies i < j, so
that Mf is upper triangular, then Mf is invertible iff it has no zero on the
main diagonal, which is iff f (x, x) 6= 0 for all x ∈ P . And of course f is
invertible iff Mf is invertible. But we give the general proof.) First define
g(y, y) = f (y, y)−1 for all y ∈ P . Then if [x, y] = {x, y} with x 6= y, put
g(x, y) = −(f (x, x))−1 f (x, y)g(y, y). If the maximum length of any chain in
[x, y] is 3, say [x, y] = {x, z1 , . . . , zk , y} with x < zi < y and zi 6≤ zj for all
i 6= j, put

$$g(x, y) = -(f(x, x))^{-1}\left\{\left[\sum_{i=1}^{k} f(x, z_i)\,g(z_i, y)\right] + f(x, y)\,g(y, y)\right\}.$$
214 CHAPTER 5. MÖBIUS INVERSION ON POSETS

Now suppose that the maximum length of any chain in [x, y] is 4, say
x < w < z < y is a maximal chain in [x, y]. For each u ∈ [x, y] \ {x}, either
[u, y] = {y}, or [u, y] = {u, y}, or [u, y] = {u, w, y} for some w ∈ [x, y]. In
any case, g(u, y) is already defined for all u ∈ [x, y] \ {x}. So we may define

$$g(x, y) = -(f(x, x))^{-1} \sum_{u \in [x,y] \setminus \{x\}} f(x, u)\,g(u, y).$$

Proceed “downward” by induction on the maximum length of chain contained


in [x, y]. Since P is finite, this process will terminate in a finite number of
steps. Clearly if f^{-1} exists, then f^{-1}(x, y) depends only on the poset
[x, y].
Similarly, g has a left inverse f iff g(y, y) 6= 0 for all y ∈ P , and in this case
f (x, y) = g −1 (x, y) depends only on [x, y]. But here we define f (x, y) by an
upward induction. Applying this argument to f (instead of g), we see f has a
left inverse h iff f (x, x) 6= 0 for all x ∈ P iff f has a right inverse g. But since
∗ is associative, from f ∗g = δ = h∗f , we have g = (h∗f )∗g = h∗(f ∗g) = h.

The zeta function ζ of P is defined by

ζ(x, y) = 1 for all [x, y] ∈ Int(P ). (5.9)

The zeta function is of interest in its own right (we include an optional
page dealing with ζ), but for us the main interest in ζ is that by Theorem 5.4.3
it has an inverse µ called the Möbius function of the poset P .
One can define µ inductively without reference to the incidence algebra
I(P, K). Namely, µ ∗ ζ = δ is equivalent to

µ(x, x) = 1 for all x ∈ P,

and

$$\mu(x, y) = -\sum_{z:\, x \le z < y} \mu(x, z) \qquad \text{whenever } x < y. \eqno(5.10)$$

Similarly, ζ ∗ µ = δ is equivalent to

µ(x, x) = 1 for all x ∈ P,

and

$$\mu(x, y) = -\sum_{z:\, x < z \le y} \mu(z, y) \qquad \text{whenever } x < y. \eqno(5.11)$$
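The recursion 5.10 makes µ computable for any finite poset. Here is a sketch (the helper names are ours), run on divisor posets, where the values agree with the classical Möbius function of number theory:

```python
# Computing mu from Eq. 5.10 by recursion over intervals of a finite poset.
from functools import lru_cache

def make_mu(elements, leq):
    @lru_cache(maxsize=None)
    def mu(x, y):
        if x == y:
            return 1
        # Eq. 5.10: mu(x, y) = - sum over z with x <= z < y of mu(x, z)
        return -sum(mu(x, z)
                    for z in elements if leq(x, z) and leq(z, y) and z != y)
    return mu

# Divisor poset of 30: mu(1, m) matches the classical mu(m).
divisors30 = tuple(d for d in range(1, 31) if 30 % d == 0)
mu = make_mu(divisors30, lambda i, j: j % i == 0)
assert mu(1, 2) == -1      # one prime
assert mu(1, 6) == 1       # two distinct primes
assert mu(1, 30) == -1     # three distinct primes

# Divisor poset of 12: mu(1, 12) = 0 since 12 has a square factor.
divisors12 = tuple(d for d in range(1, 13) if 12 % d == 0)
assert make_mu(divisors12, lambda i, j: j % i == 0)(1, 12) == 0
```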

5.5 Optional Section on ζ


Start with ζ(x, y) = 1 for all [x, y] ∈ Int(P ). Then
$\zeta^2(x, y) = \sum_{z:\, x \le z \le y} 1 = |[x, y]|$, if x ≤ y. More generally, for k ∈ P,

$$\zeta^k(x, y) = \sum_{(x_0, \ldots, x_k):\; x = x_0 \le x_1 \le \cdots \le x_k = y} 1$$

is the number of multichains of length k from x to y.


Theorem 5.5.1 $(\zeta - \delta)(x, y) = \begin{cases} 1, & x < y; \\ 0, & x = y. \end{cases}$

Proof: Clear.
Hence for k ∈ P, (ζ − δ)k (x, y) is the number of chains x = x0 < x1 <
· · · < xk = y of length k from x to y.
Theorem 5.5.2 $(2\delta - \zeta)(x, y) = \begin{cases} 1, & \text{if } x = y; \\ -1, & \text{if } x < y. \end{cases}$ So $(2\delta - \zeta)^{-1}$ exists and
$(2\delta - \zeta)^{-1}(x, y)$ is equal to the total number of chains x = x0 < x1 < · · · <
xk = y from x to y.

Proof: Let l be the length of the longest chain in the interval [x, y].
Then (ζ − δ)l+1 (u, v) = 0 whenever x ≤ u ≤ v ≤ y. Thus for x ≤ u ≤
v ≤ y, (2δ − ζ)[1 + (ζ − 1) + (ζ − 1)2 + · · · + (ζ − 1)l ](u, v) = (1 − (ζ −
1))[1 + (ζ − 1) + (ζ − 1)2 + · · · + (ζ − 1)l ](u, v) = (1 − (ζ − 1)l+1 )(u, v) =
δ(u, v). Hence (2δ − ζ)−1 = 1 + (ζ − 1) + · · · + (ζ − 1)l , when restricted to the
interval [x, y]. But by the definition of l and Theorem 5.5.1 it is clear that
(1 + (ζ − 1) + · · · + (ζ − 1)l )(x, y) is the total number of chains from x to y.

Theorem 5.5.3 Define η : Int(P ) → K by

$$\eta(x, y) = \begin{cases} 1, & \text{if } y \text{ covers } x; \\ 0, & \text{otherwise.} \end{cases}$$
Then (1 − η)−1 (x, y) is equal to the total number of maximal chains in [x, y].

Proof: $\eta^k(x, y) = \sum 1$, where the sum is over all (z0 , . . . , zk ) with x =
z0 , y = zk , and zi+1 covers zi , i = 0, . . . , k − 1. So $\sum_{k=0}^{\infty} \eta^k(x, y)$ is the total
number of maximal chains in [x, y]. If l is the length of the longest maximal
chain in [x, y], then $(1 - \eta)^{-1} = \sum_{k=0}^{\infty} \eta^k$, which equals $\sum_{k=0}^{l} \eta^k$ on [x, y].

5.6 The Action of I(P, K) and Möbius Inversion

The Möbius function plays a central role in Möbius inversion, as does the
The Möbius function plays a central role in Möbius inversion, as does the
incidence algebra I(P, K). But before we can make this precise we need to
see how I(P, K) acts on the vector space $K^P = \{f : P \to K\}$. Clearly $K^P$
is a vector space over K in the usual way.
For each ξ ∈ I(P, K), ξ acts in two ways as a linear transformation on
$K^P$. On the right ξ acts by

$$(f \cdot \xi)(x) = \sum_{y \le x} f(y)\,\xi(y, x) \qquad \text{for all } x \in P \text{ and } f \in K^P. \eqno(5.12)$$

In particular $(f \cdot \delta)(x) = \sum_{y \le x} f(y)\,\delta(y, x) = f(x)$, implying that f · δ = f .

On the left ξ acts by

$$(\xi \cdot f)(x) = \sum_{y \ge x} \xi(x, y)\,f(y) \qquad \text{for all } x \in P,\ f \in K^P. \eqno(5.13)$$

Similarly, δ · f = f .
For these to be actions in the usual sense, it must be true that the fol-
lowing hold:

(f · ξ1 ) · ξ2 = f · (ξ1 ∗ ξ2 ), for all f ∈ K P ; ξ1 , ξ2 ∈ I(P, K), (5.14)



and

ξ1 · (ξ2 · f ) = (ξ1 ∗ ξ2 ) · f for all f ∈ K P ; ξ1 , ξ2 ∈ I(P, K). (5.15)

We verify Eq. 5.14 and leave Eq. 5.15 as a similar exercise. So for each
x ∈ P,

$$((f \cdot \xi_1) \cdot \xi_2)(x) = \sum_{y \le x} (f \cdot \xi_1)(y)\,\xi_2(y, x) = \sum_{w, y:\, w \le y \le x} f(w)\,\xi_1(w, y)\,\xi_2(y, x)$$
$$= \sum_{w:\, w \le x} f(w)\Big(\sum_{y:\, w \le y \le x} \xi_1(w, y)\,\xi_2(y, x)\Big) = \sum_{w:\, w \le x} f(w)\,(\xi_1 * \xi_2)(w, x) = (f \cdot (\xi_1 * \xi_2))(x).$$

Theorem 5.6.1 (Möbius Inversion Formula) Let P be a poset in which
every principal order ideal Λx = {y ∈ P : y ≤ x} is finite. Let f, g : P → K.
Then $g(x) = \sum_{y \le x} f(y)$ for all x ∈ P iff $f(x) = \sum_{y \le x} g(y)\,\mu(y, x)$ for all
x ∈ P . Dually, if each principal dual order ideal Vx = {y ∈ P : y ≥ x}
is finite and f, g : P → K, then $g(x) = \sum_{y \ge x} f(y)$ for all x ∈ P iff
$f(x) = \sum_{y \ge x} \mu(x, y)\,g(y)$ for all x ∈ P .

Proof: The first version of Möbius inversion is just that f · ζ = g iff
f = g·µ. The second is that ζ ·f = g iff f = µ·g. These follow easily from the
above. For example, if f ·ζ = g we have g·µ = (f ·ζ)·µ = f ·(ζ ∗µ) = f ·δ = f .
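Theorem 5.6.1 is easy to sanity-check numerically. The sketch below (names are ours) builds µ for the subset lattice B3 from the recursion 5.10, sums an arbitrary f up the order, and recovers f by inversion:

```python
# Sanity check of the Moebius Inversion Formula on the subset lattice B_3.
from itertools import combinations
from functools import lru_cache

# All subsets of {0, 1, 2}, ordered by inclusion.
elements = [frozenset(c) for r in range(4) for c in combinations(range(3), r)]

@lru_cache(maxsize=None)
def mu(x, y):
    # Recursion (5.10): mu(x, x) = 1, mu(x, y) = - sum_{x <= z < y} mu(x, z).
    if x == y:
        return 1
    return -sum(mu(x, z) for z in elements if x <= z < y)

f = {x: 1 + 7 * len(x) for x in elements}          # arbitrary test data
g = {x: sum(f[y] for y in elements if y <= x) for x in elements}
recovered = {x: sum(g[y] * mu(y, x) for y in elements if y <= x)
             for x in elements}
assert recovered == f
```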

Example 5.6.2 Consider the chain P = N with the usual linear ordering,
so 0̂ = 0. Here µ(x, x) = 1 for all x ∈ P , and for x < y,
$\mu(x, y) = -\sum_{z:\, x \le z < y} \mu(x, z)$. So if y covers x, µ(x, y) = −1. If y covers z and z
covers x, then µ(x, y) = −(µ(x, x) + µ(x, z)) = −(1 − 1) = 0. If y covers
z, z covers w, and w covers x, then µ(x, y) = −(µ(x, x) + µ(x, w) + µ(x, z)) =
−(1 + (−1) + 0) = 0. By induction,

$$\mu(i, j) = \begin{cases} 1, & \text{if } i = j, \\ -1, & \text{if } j = i + 1, \\ 0, & \text{otherwise.} \end{cases}$$

Then Möbius inversion takes the following form: For f, g : P → K,
$g(n) = \sum_{i=0}^{n} f(i)$ for all n ≥ 0 iff $f(n) = \sum_{i=0}^{n} g(i)\,\mu(i, n) = g(n) - g(n-1)$. So
the sum operator $\Sigma$ (with $(\Sigma f)(n) = \sum_{i=0}^{n} f(i)$) and the difference operator $\Delta$
(with $(\Delta f)(n) = f(n) - f(n-1)$) are inverses of each other. This may be viewed
as a finite difference analogue of the fundamental theorem of calculus.
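In code the chain example reads as follows (a small illustration of ours): the cumulative-sum operator and the backward-difference operator invert each other.

```python
# The chain example: cumulative sum and backward difference are inverse
# operators on finite sequences (with g(-1) taken to be 0).

def sigma(f):
    # (Sigma f)(n) = sum_{i <= n} f(i)
    out, total = [], 0
    for v in f:
        total += v
        out.append(total)
    return out

def delta(g):
    # (Delta g)(n) = g(n) - g(n-1)
    return [g[n] - (g[n - 1] if n > 0 else 0) for n in range(len(g))]

f = [3, 1, 4, 1, 5, 9, 2, 6]
assert delta(sigma(f)) == f
assert sigma(delta(f)) == f
```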

5.7 Evaluating µ: the Product Theorem


One obstacle to applying Möbius inversion is that even when the poset (P, ≤)
is fairly well understood, evaluating the Möbius function µ can be quite
difficult. In this section we prove the Product Theorem and apply it to
three well known posets. One corollary is the famous Principle of Inclusion-
Exclusion.
Let P and Q be posets. Then the direct (or Cartesian) product of P
and Q is the poset P × Q = {(x, y) : x ∈ P and y ∈ Q}, with (x, y) ≤ (x0 , y 0 )
in P × Q iff x ≤ x0 in P and y ≤ y 0 in Q. The direct product P × P × · · · × P
with n factors is then defined in the natural way and is denoted by P n . There
are three examples of interest at this point.

Example 5.7.1 For integers m, n ∈ P, let Bm , Bn be the posets of all subsets
of [m], [n], respectively, ordered by inclusion. Then Bm × Bn ≅ Bm+n .
If we identify Bm with $[2]^{[m]} = \{f : [m] \to [2]\}$ in the usual way, then we
have $[2]^{[m]} \times [2]^{[n]} \cong [2]^{[m+n]}$. On the other hand, if 2 = {1, 2} with partial
order defined by 1 ≤ 1 ≤ 2 ≤ 2 (i.e., 2 is the set [2] with the usual linear
order), then B1 ≅ 2, so Bn ≅ B1 × · · · × B1 ≅ 2 × · · · × 2 ≅ 2^n . Hence
$2^m \times 2^n \cong 2^{m+n}$.

Example 5.7.2 Recall that for a positive integer k, k denotes [k] with the
usual linear order. Then let n1 , . . . , nk ∈ N and put P = (n1 + 1) × · · · × (nk + 1).
We may identify the elements of P with the set of k-tuples $(a_1, \ldots, a_k) \in N^k$
with 0 ≤ ai ≤ ni , ordered componentwise, i.e., (a1 , . . . , ak ) ≤ (b1 , . . . , bk )
iff ai ≤ bi for i = 1, 2, . . . , k. Then if this relation holds, the interval
[(a1 , . . . , ak ), (b1 , . . . , bk )] is isomorphic to (b1 − a1 + 1) × (b2 − a2 + 1) × · · · ×
(bk − ak + 1).

A little thought reveals that Example 5.7.2 is a straightforward general-


ization of Example 5.7.1.

Example 5.7.3 Recall that Πn is the poset of partitions of [n], where two
partitions σ, π ∈ Πn satisfy σ ≤ π iff each block of σ is contained in a single
block of π. Now suppose that π = {A1 , . . . , Ak } and that Ai is partitioned
into λi blocks in σ. Then in Πn , the interval [σ, π] is isomorphic to Πλ1 ×
Πλ2 × · · · × Πλk .

Note that Π2 ≅ 2 and (Π2 )^k ≅ (2)^k . Hence if σ and π are partitions of
[n] for which π = {A1 , . . . , Ak } has k parts, each of which breaks into two
parts in σ, then the interval [σ, π] in Πn is isomorphic to Bk .

Theorem 5.7.4 (The Product Theorem) Let P and Q be locally finite


posets with Möbius functions µP and µQ , respectively, and let P × Q be their
direct product with Möbius function µP ×Q . Then whenever (x, y) ≤ (x0 , y 0 )
in P × Q, µP ×Q ((x, y), (x0 , y 0 )) = µP (x, x0 ) · µQ (y, y 0 ).

Proof:

$$\sum_{(u,v):\, (x,y) \le (u,v) \le (x',y')} \mu_P(x, u)\,\mu_Q(y, v)
= \Big(\sum_{u:\, x \le u \le x'} \mu_P(x, u)\Big)\Big(\sum_{v:\, y \le v \le y'} \mu_Q(y, v)\Big)
= \delta_{x,x'} \cdot \delta_{y,y'} = \delta_{(x,y),(x',y')}.$$

Also

$$\sum_{(u,v):\, (x,y) \le (u,v) \le (x',y')} \mu_{P \times Q}((x, y), (u, v)) = \delta_{(x,y),(x',y')}.$$

But since $\sum_{z:\, x \le z \le y} \mu(x, z) = \delta_{x,y}$ in a poset with Möbius function µ
determines µ uniquely, it must be that $\mu_{P\times Q} = \mu_P \cdot \mu_Q$.
Using the product theorem we can say a great deal about the Möbius
functions of the three examples above.

Theorem 5.7.5 If µ is the Möbius function of Bn , then $\mu(S, T) = (-1)^{|T-S|}$
if S ≤ T , for all S, T ∈ Bn .

Proof: A subset S (of [n], say) corresponds to an n-tuple (s1 , . . . , sn ) with
each si equal to 0 or 1. Similarly, T ↔ (t1 , . . . , tn ). Then S ≤ T iff si ≤ ti
for all i = 1, . . . , n. Then (with a natural abuse of notation) µi (si , ti ) = 1
or −1 according as si = ti or si ≠ ti . And $\mu(S, T) = \prod_{i=1}^{n} \mu_i(s_i, t_i) =
\prod_{i=1}^{n} (-1)^{t_i - s_i} = (-1)^{|T-S|}$.

It follows that the two versions of Möbius inversion for Bn become:
It follows that the two versions of Möbius inversion for Bn become:

Theorem 5.7.6 Let f, g : Bn → K. Then

(i) $g(S) = \sum_{T:\, T \subseteq S} f(T)$ ∀S ∈ Bn iff $f(S) = \sum_{T:\, T \subseteq S} (-1)^{|S-T|}\, g(T)$ ∀S ∈ Bn ;

and

(ii) $g(S) = \sum_{T:\, T \supseteq S} f(T)$ ∀S ∈ Bn iff $f(S) = \sum_{T:\, T \supseteq S} (-1)^{|T-S|}\, g(T)$ ∀S ∈ Bn .

Either of these two statements is called the Principle of Inclusion -


Exclusion.
The following is a standard combinatorial situation involving inclusion
– exclusion. We are given a set E of objects and a set P = {P1 , . . . , Pn }
of properties. Each object in E either does or does not have each of the
properties. (We may actually think of Pi as just being a subset of E but
without the assumption that Pi 6= Pj when i 6= j.) For each collection
T = {Pi1 , . . . , Pik } of the properties (i.e. for T ⊆ P ), let f (T ) = |{x ∈ E :
x ∈ Pi iff Pi ∈ T }|. And let g(T ) = |{x ∈ E : x ∈ Pi whenever Pi ∈ T }|.
Then $g(T) = \sum_{S:\, S \supseteq T} f(S)$. (This just says that an object x of E has all the
properties in T iff it has precisely the properties in S for some S ⊇ T .) So
by the second version of Möbius inversion, we have

$$f(T) = \sum_{S:\, S \supseteq T} (-1)^{|S-T|}\, g(S). \eqno(5.16)$$

In particular, the number of objects x of E having none of the properties


in P is given by

$$f(\emptyset) = \sum_{Y \subseteq P} (-1)^{|Y|}\, g(Y) \qquad \text{(by convention } g(\emptyset) = |E|\text{)}. \eqno(5.17)$$

We look at three classical applications.


Application 1. Let A1 , . . . , An be subsets of some (finite) set E. Then
by Eq. 5.17 the number of objects of E in none of the Ai is

$$|E \setminus (A_1 \cup \cdots \cup A_n)| = |E| - \sum_{i=1}^{n} |A_i| + \sum_{1 \le i < j \le n} |A_i \cap A_j|
- \sum_{1 \le i < j < k \le n} |A_i \cap A_j \cap A_k| + \cdots + (-1)^n |A_1 \cap A_2 \cap \cdots \cap A_n|. \eqno(5.18)$$

Application 2 (Derangements). Let E = Sn , the set of all permutations
of the elements of [n]. Let Ai = {π ∈ Sn : π(i) = i}, i = 1, . . . , n. If T is a set
of j of the Ai ’s, then g(T ) = |{π ∈ Sn : π(i) = i for all Ai ∈ T }| = (n − j)!.
It follows that the number D(n) of derangements in Sn (i.e., π ∈ Sn with
π(i) ≠ i for all i, so π ∈ Sn \ (A1 ∪ · · · ∪ An )) is given by

$$D(n) = \sum_{T:\, T \subseteq \{A_1, \ldots, A_n\}} (-1)^{|T|}\, g(T) = \sum_{i=0}^{n} (-1)^i \binom{n}{i} (n-i)! = n! \sum_{i=0}^{n} \frac{(-1)^i}{i!}. \eqno(5.19)$$
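Eq. 5.19 invites a brute-force comparison (our own quick check): the inclusion-exclusion formula for D(n) against direct enumeration of permutations.

```python
# Derangement numbers: Eq. 5.19 versus direct enumeration.
from itertools import permutations
from math import factorial

def D_formula(n):
    # D(n) = n! * sum_{i=0}^{n} (-1)^i / i!, kept in exact integers
    return sum((-1) ** i * factorial(n) // factorial(i) for i in range(n + 1))

def D_count(n):
    # Count permutations of [n] with no fixed point.
    return sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))

for n in range(1, 8):
    assert D_formula(n) == D_count(n)
assert D_formula(4) == 9
```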

This is a special case of a general situation: suppose f (T ) = f (T ′) whenever
|T | = |T ′| (as above, f (T ) is the number of objects of E having a property
Pi if and only if Pi ∈ T ). Then $g(T) = \sum_{S:\, S \supseteq T} f(S)$ also depends only on
|T |. So for each i, 0 ≤ i ≤ n, if |T | = i, let a(n − i) = f (T ) and b(n − i) = g(T ).
Then $g(T) = \sum_{S:\, S \supseteq T} f(S)$ becomes $b(n-i) = \sum_{j=i}^{n} \binom{n-i}{j-i} a(n-j)$, or,
writing m = n − i and k = j − i, we have

$$b(m) = \sum_{k=0}^{m} \binom{m}{k} a(m-k) = \sum_{i=0}^{m} \binom{m}{i} a(i). \eqno(5.20)$$

And $f(T) = \sum_{S:\, S \supseteq T} (-1)^{|S-T|}\, g(S)$ becomes
$a(n-i) = \sum_{j=i}^{n} (-1)^{j-i} \binom{n-i}{j-i} b(n-j)$, which we rewrite as

$$a(m) = \sum_{k=0}^{m} (-1)^k \binom{m}{k} b(m-k) = \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} b(k). \eqno(5.21)$$

Hence the Möbius inversion formula says:

$$b(m) = \sum_{i=0}^{m} \binom{m}{i} a(i) \ \text{ for } 0 \le m \le n \quad \text{iff} \quad a(m) = \sum_{i=0}^{m} (-1)^{m-i} \binom{m}{i} b(i), \ 0 \le m \le n. \eqno(5.22)$$

Exercise: 5.7.7 If A is the matrix whose (i, j) entry is $\binom{j}{i}$, 0 ≤ i, j ≤ n,
then A^{-1} is the matrix whose (i, j) entry is $(-1)^{j-i}\binom{j}{i}$. (Hint: Try putting
b(m) = (x + 1)^m and a(m) = x^m for 0 ≤ m ≤ n.)

We give an explicit example of the above to illustrate the simple nature
of the statement of the result when viewed in matrix form.

$$\begin{pmatrix}
1 & 1 & 1 & 1 & 1 \\
0 & 1 & 2 & 3 & 4 \\
0 & 0 & 1 & 3 & 6 \\
0 & 0 & 0 & 1 & 4 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}^{-1} =
\begin{pmatrix}
1 & -1 & 1 & -1 & 1 \\
0 & 1 & -2 & 3 & -4 \\
0 & 0 & 1 & -3 & 6 \\
0 & 0 & 0 & 1 & -4 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}.$$
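The displayed matrix identity (and Exercise 5.7.7 in general) can be verified numerically; a throwaway check with our own names, for n = 4:

```python
# Check that the binomial matrix and its signed version are inverses.
from math import comb

n = 4
A = [[comb(j, i) for j in range(n + 1)] for i in range(n + 1)]
B = [[(-1) ** (j - i) * comb(j, i) if j >= i else 0
      for j in range(n + 1)] for i in range(n + 1)]
prod = [[sum(A[i][k] * B[k][j] for k in range(n + 1)) for j in range(n + 1)]
        for i in range(n + 1)]
# A * B should be the identity matrix.
assert prod == [[int(i == j) for j in range(n + 1)] for i in range(n + 1)]
```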
Application 3 (Euler’s phi-function φ again). Let n ∈ P and suppose
p1 , . . . , pk are the distinct prime divisors of n. Put E = [n] and Ai = {x ∈ E : pi |x},
i = 1, . . . , k. First note that $|A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_j}| = n/(p_{i_1} \cdots p_{i_j})$. Then the
principle of inclusion-exclusion gives

$$\varphi(n) = n - \sum_{1 \le i \le k} \frac{n}{p_i} + \sum_{1 \le i < j \le k} \frac{n}{p_i p_j} - \cdots + (-1)^k \frac{n}{p_1 \cdots p_k} = n \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right).$$

It is easy to show that this agrees with our formula developed quite some
time ago. Note:

$$1 - \sum \frac{1}{p_i} + \sum \frac{1}{p_i p_j} - \cdots + (-1)^k \frac{1}{p_1 \cdots p_k} = \sum_{d|n} \frac{\mu(d)}{d},$$

where µ is the classical Möbius function. So $\varphi(n) = \sum_{d|n} \mu(d)\,\frac{n}{d}$. Now using
classical Möbius inversion (in reverse), $n = \sum_{d|n} \varphi(d)$, a familiar equality.
We now propose to illustrate the connection between classical Möbius
inversion and our new version over posets. Recall Example 5.7.2 from the
beginning of this section: P = (n1 + 1) × · · · × (nk + 1), as well as the Möbius
function for chains as worked out in Section 5.6. By the product theorem, if
[(a1 , . . . , ak ), (b1 , . . . , bk )] is an interval in P ,

$$\mu((a_1, \ldots, a_k), (b_1, \ldots, b_k)) = \begin{cases} (-1)^{\sum (b_i - a_i)}, & \text{if each } b_i - a_i = 0 \text{ or } 1; \\ 0, & \text{otherwise.} \end{cases} \eqno(5.23)$$

Now suppose n is a positive integer of the form $n = p_1^{n_1} \cdots p_k^{n_k}$, where
p1 , . . . , pk are distinct primes. Let Dn be the poset of positive integral divisors
of n, ordered by division (i.e., i ≤ j in Dn iff i|j). But we identify P above
with Dn according to the following scheme: (a1 , . . . , ak ) ∈ P corresponds
to $p_1^{a_1} \cdots p_k^{a_k}$ in Dn . (Here it is convenient to let the elements of nk + 1
be {0, 1, . . . , nk }.) Then Eq. 5.23, when interpreted for Dn , becomes: For
r, s ∈ Dn ,

$$\mu(r, s) = \begin{cases} (-1)^t, & \text{if } s/r \text{ is a product of } t \text{ distinct primes}; \\ 0, & \text{otherwise.} \end{cases} \eqno(5.24)$$

In other words, µ(r, s) is just the classical Möbius function µ(s/r). Then
our new Möbius inversion formula in Dn looks like:

$$g(m) = \sum_{d|m} f(d) \ \ \forall m|n \quad \text{iff} \quad f(m) = \sum_{d|m} g(d)\,\mu(d, m) \ \ \forall m|n.$$

Writing µ(m/d) in place of µ(d, m) gives:

$$g(m) = \sum_{d|m} f(d) \ \ \forall m|n \quad \text{iff} \quad f(m) = \sum_{d|m} \mu\Big(\frac{m}{d}\Big)\, g(d) \ \ \forall m|n.$$

As n is arbitrary, this is just the classical Möbius inversion formula.
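The classical formula just recovered can be exercised directly in code (the helper names are ours): build g from f by summing over divisors, then recover f with the classical µ.

```python
# Classical Moebius inversion: g(m) = sum_{d|m} f(d) iff
# f(m) = sum_{d|m} mu(m/d) g(d).

def mobius(n):
    # Classical mu: 0 if a square divides n, else (-1)^(number of primes).
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0       # square factor
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(m):
    return [d for d in range(1, m + 1) if m % d == 0]

f = {m: m * m + 1 for m in range(1, 61)}           # arbitrary test data
g = {m: sum(f[d] for d in divisors(m)) for m in f}
for m in f:
    assert sum(mobius(m // d) * g[d] for d in divisors(m)) == f[m]
```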


At this point we have seen that the classical Principle of Inclusion-Exclusion
is just Möbius inversion over Bn and the classical Möbius inversion formula
is just Möbius inversion over Dn .

Exercise: 5.7.8 The Möbius Function of the Poset Πn , n ≥ 1. Recall that
to make Πn into a poset, for σ, π ∈ Πn , we defined σ ≤ π iff each part of σ
is contained in some part of π. For σ ∈ Πn , define ν(σ) to be the number of
parts of σ. Example: If σ = {{1, 3}, {5, 7, 8}, {2, 4, 6}}, then ν(σ) = 3. The
goal of this exercise is to compute the Möbius function of Πn . The underlying
field of coefficients is denoted by K. The exercise is broken into ten small
steps.

Step 1. Πn has a 0̂ and a 1̂, with ν(σ) = n iff σ = 0̂ and ν(σ) = 1 iff
σ = 1̂.
Step 2. Let σ ≤ π = {B1 , . . . , Bk } ∈ Πn . Suppose that Bi is partitioned
into λi blocks in σ. The interval [σ, π] in Πn is isomorphic to the direct
product Πλ1 × Πλ2 × · · · × Πλk . Illustrate this with π = {{1, 2, 3, 4, 5}, {6, 7, 8, 9}},
σ = {{1, 2}, {3, 4, 5}, {6}, {7}, {8, 9}}. As a special case, for σ ∈ Πn ,
$[\sigma, \hat 1] \cong \Pi_{\nu(\sigma)}$.

Step 3. For each positive integer n, let µn = µ(0̂, 1̂), where µ is the
Möbius function of Πn . Then using the notation of Step 2, i.e., [σ, π] ≅
Πλ1 × Πλ2 × · · · × Πλk , we have µ(σ, π) = µλ1 µλ2 · · · µλk .
Step 4. Recall (from where?) that $x^n = \sum_{k=0}^{n} S(n, k)(x)_k$. Then for each
positive integer m,

$$m^n = \sum_{\sigma \in \Pi_n} (m)_{\nu(\sigma)}.$$

Define $f : \Pi_n \to K : \pi \mapsto (m)_{\nu(\pi)}$ and $g : \Pi_n \to K : \pi \mapsto m^{\nu(\pi)}$. Then
$g(\hat 0) = \sum_{\sigma \ge \hat 0} f(\sigma)$.
Step 5. For each σ ∈ Πn (briefly justify each step):

$$g(\sigma) = m^{\nu(\sigma)} = \sum_{\pi \in \Pi_{\nu(\sigma)}} (m)_{\nu(\pi)} = \sum_{\pi \in \Pi_n:\, \pi \ge \sigma} (m)_{\nu(\pi)} = \sum_{\pi \in \Pi_n:\, \pi \ge \sigma} f(\pi).$$

For each σ ∈ Πn , the poset $P_\sigma = \{\pi \in \Pi_n : \pi \ge \sigma\} = [\sigma, \hat 1]$ is isomorphic
to $\Pi_{\nu(\sigma)}$. So σ in Πn is 0̂ in Pσ . And $g(\hat 0) = \sum_{\sigma \ge \hat 0} f(\sigma)$ stated for Pσ says that
for all σ ∈ Πn , $g(\sigma) = \sum_{\pi \ge \sigma} f(\pi)$. To this we may apply Möbius inversion
to obtain

$$f(\sigma) = \sum_{\pi \ge \sigma} \mu(\sigma, \pi)\, g(\pi).$$

Step 6. For each σ ∈ Πn ,

$$f(\sigma) = \sum_{\pi \in \Pi_n:\, \pi \ge \sigma} \mu(\sigma, \pi)\, m^{\nu(\pi)}.$$

In this put σ = 0̂ to obtain

$$(m)_n = f(\hat 0) = \sum_{\pi \in \Pi_n} \mu(\hat 0, \pi)\, m^{\nu(\pi)} = \sum_{k} \Big(\sum_{\pi \in \Pi_n:\, \nu(\pi) = k} \mu(\hat 0, \pi)\Big) m^k.$$

As this holds for infinitely many m, we have a polynomial identity:

$$(x)_n = \sum_{k=0}^{n} \Big(\sum_{\pi \in \Pi_n:\, \nu(\pi) = k} \mu(\hat 0, \pi)\Big) x^k.$$

Recall (from where?) that $(x)_n = \sum_{k=0}^{n} s(n, k) x^k$, where $s(n, k) = (-1)^{n-k} c(n, k)$
is a Stirling number of the first kind. So comparing the coefficients on $x^k$ we
find once again that $s(n, k) = \sum_{\pi \in \Pi_n:\, \nu(\pi) = k} \mu(\hat 0, \pi) = w_{n-k}$ (which is called
the (n − k)th Whitney number of Πn of the first kind).
Step 7. For each positive integer m,

$$(m)_n = \sum_{k=1}^{n} \Big(\sum_{\pi \in \Pi_n:\, \nu(\pi) = k} \mu(\hat 0, \pi)\Big) m^k.$$

Step 8. As polynomials in x we have

$$(x)_n = \sum_{k=1}^{n} \Big(\sum_{\pi \in \Pi_n:\, \nu(\pi) = k} \mu(\hat 0, \pi)\Big) x^k,$$

and

$$\sum_{\pi \in \Pi_n:\, \nu(\pi) = k} \mu(\hat 0, \pi) = \begin{bmatrix} n \\ k \end{bmatrix} (-1)^{n-k}, \quad 1 \le k \le n.$$

Step 9. $\mu(\hat 0, \hat 1) = (-1)^{n-1}(n-1)!$.

Step 10. If π = {B1 , . . . , Bk } and Bi breaks into λi parts in σ, then

$$\mu(\sigma, \pi) = \prod_{i=1}^{k} (-1)^{\lambda_i - 1}(\lambda_i - 1)!.$$

5.8 More Applications of Möbius Inversion


Consider the following three familiar sequences of polynomials.
(1) The power sequence: xn , n = 0, 1, . . ..

(2) The falling factorial sequence: $(x)_n = x(x-1)\cdots(x-n+1)$, n = 0, 1, . . ..

(3) The rising factorial sequence: $(x)^n = x(x+1)\cdots(x+n-1)$, n = 0, 1, . . ..

For n = 0 we have the following conventions: $x^0 = (x)_0 = (x)^0 = 1$.

Theorem 5.8.1 For m, n ∈ P we have the following:

(i) $m^n = |\{f : [n] \to [m]\}| = |[m]^{[n]}|$.

(ii) $(m)_n = |\{f : [n] \to [m] : f \text{ is one-to-one}\}|$.

(iii) $(m)^n = |\{f : [n] \to [m] : f \text{ is a disposition, i.e., for each } d \in [m],\ f^{-1}(d) \text{ is assigned a linear order}\}|$.

Proof: The first two identities need no further explanation, but the third
one probably does. A disposition may be visualized as a placing of n distin-
guishable flags on m distinguishable flagpoles. The poles are not ordered, but
the flags on each pole are ordered. For the first flag there are m choices of
flagpole. For the second flag there are m − 1 choices of pole other than the
one flag 1 is on. On that pole there are two choices, giving a total of m + 1
choices for flag 2. Similarly, it is easy to see that there is one more choice for
flag k + 1 than there was for flag k. Hence the number of ways to assign all
n flags is m(m + 1) · · · (m + n − 1) = (m)n .
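The flagpole count is easy to confirm by brute force for small m and n: a disposition is just an ordinary function f : [n] → [m] together with a linear order on each fiber, so the total is \sum_f \prod_{d \in [m]} |f^{-1}(d)|!. The following sketch is our own illustration, not from the text; the helper names are ours.

```python
# Illustrative check (ours, not from the text)
from itertools import product
from math import factorial

def rising(m, n):
    """The rising factorial m(m+1)...(m+n-1)."""
    out = 1
    for i in range(n):
        out *= m + i
    return out

def count_dispositions(n, m):
    """Count dispositions [n] -> [m]: a function together with a linear
    order on each fiber, i.e. sum over all functions of prod |fiber|!."""
    total = 0
    for f in product(range(m), repeat=n):    # all functions [n] -> [m]
        ways = 1
        for d in range(m):
            ways *= factorial(f.count(d))    # linear orders on the fiber of d
        total += ways
    return total

for n in range(5):
    for m in range(1, 5):
        assert count_dispositions(n, m) == rising(m, n)
```

The exhaustive loop grows as m^n, so this only serves as a sanity check for small parameters.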

Theorem 5.8.2 For each n ∈ N we have the following:

(i) x^n = \sum_{k=0}^{n} S(n,k)\,(x)_k. This is Theorem 1.7.2.

(i)′ (x)_n = \sum_{k=0}^{n} s(n,k)\,x^k. This is Corollary 1.5.6.

(ii) (x)^{(n)} = \sum_{k=1}^{n} \frac{n!}{k!} \binom{n-1}{k-1} (x)_k.

(ii)′ (x)_n = \sum_{k=1}^{n} (-1)^{n-k} \frac{n!}{k!} \binom{n-1}{k-1} (x)^{(k)}.

(iii) (x)^{(n)} = \sum_{k} c(n,k)\,x^k. This is Theorem 1.5.5.

(iii)′ x^n = \sum_{k} (-1)^{n-k} S(n,k)\,(x)^{(k)}. This is Corollary 1.7.3.

Proof: The only two parts that need proving are (ii) and (ii)′, and we now establish (ii).

A linear partition λ is a partition of [n] together with a total order on the numbers in each part of λ. The parts themselves are unordered. Let L_n denote the collection of all linear partitions of [n], and let ν(λ) denote the number of blocks of λ. Each disposition from [n] to [m] may be thought of as a pair consisting of a linear partition λ of [n] into k blocks and a one-to-one function g mapping [k] to [m]. Since (m)^{(n)} counts the total number of dispositions of [n] and (m)_k counts the one-to-one functions from [k] to [m], we have

    (m)^{(n)} = \sum_{\lambda \in L_n} (m)_{\nu(\lambda)}.        (5.25)

To obtain the number of linear partitions of [n] into k blocks, note that there are n!\binom{n-1}{k-1} linear partitions of [n] with k ordered blocks. Visualize this as a placing of k − 1 slashes into the n − 1 interior spaces of a permutation (ordered array) of [n], at most one slash per space. Then divide by k! to get unordered blocks:

    \frac{n!}{k!} \binom{n-1}{k-1} = \#\{\text{linear partitions of } [n] \text{ with } k \text{ blocks}\}.        (5.26)

These numbers \frac{n!}{k!}\binom{n-1}{k-1} are called Lah numbers. Then (ii) follows immediately from Eqs. 5.25 and 5.26. Now replacing x with −x interchanges (ii) and (ii)′.
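Equation 5.26 can also be confirmed by exhaustion: a linear partition of [n] with k blocks is a set partition with k blocks together with a linear order inside each block, so the count is \sum \prod_B |B|! over set partitions. The sketch below is our own (helper names are ours).

```python
# Illustrative check (ours, not from the text)
from math import comb, factorial

def set_partitions(elements):
    """Yield all partitions of a list, as lists of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):               # first joins an existing block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part                   # or starts a new block

def lah(n, k):
    """The Lah number (n!/k!) C(n-1, k-1)."""
    return factorial(n) // factorial(k) * comb(n - 1, k - 1)

n = 6
counts = {}
for part in set_partitions(list(range(1, n + 1))):
    orders = 1
    for block in part:
        orders *= factorial(len(block))          # linear orders within blocks
    k = len(part)
    counts[k] = counts.get(k, 0) + orders

for k in range(1, n + 1):
    assert counts[k] == lah(n, k)
```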
We now do Möbius inversion on each of three carefully chosen posets to explore the relationship between (a) and (a)′, for a = i, ii, and iii.

Let Π_n be the set of all partitions of [n] made into a poset by: for σ, π ∈ Π_n, σ ≤ π iff each part of σ is contained in some part of π. In the proof of Eq. 5.3 (which is equivalent to (i)) we obtained the following:

    m^n = \sum_{\sigma \in \Pi_n} (m)_{\nu(\sigma)}.        (5.27)

Define f : Π_n → K by f(π) = (m)_{ν(π)} and define g : Π_n → K by g(π) = m^{ν(π)}. Since 0̂ in Π_n is 0̂ = {{1}, {2}, {3}, …, {n}}, and ν(σ) = n iff σ = 0̂, we have

    g(\hat0) = m^n = \sum_{\sigma \in \Pi_n} (m)_{\nu(\sigma)} = \sum_{\sigma \ge \hat0} f(\sigma).        (5.28)

For each σ ∈ Π_n, the poset P_σ = {π ∈ Π_n : π ≥ σ} = [σ, 1̂] is isomorphic to Π_{ν(σ)}. So σ ∈ Π_n is the 0̂ of P_σ, and Eq. 5.28 applied to P_σ says:

    g(\sigma) = \sum_{\pi \ge \sigma} f(\pi), \quad \text{for all } \sigma \in \Pi_n.        (5.29)

Apply Möbius inversion to Eq. 5.29 to obtain

    f(\sigma) = \sum_{\pi \ge \sigma} \mu(\sigma,\pi)\, g(\pi) = \sum_{\pi \ge \sigma} \mu(\sigma,\pi)\, m^{\nu(\pi)}.        (5.30)

Putting σ = 0̂ yields

    (m)_n = f(\hat0) = \sum_{\pi \in \Pi_n} \mu(\hat0,\pi)\, m^{\nu(\pi)} = \sum_{k=1}^{n} \Big( \sum_{\pi \in \Pi_n : \nu(\pi)=k} \mu(\hat0,\pi) \Big) m^k.        (5.31)
As this holds for infinitely many m, we have a polynomial identity:

    (x)_n = \sum_{k=1}^{n} \Big( \sum_{\pi \in \Pi_n : \nu(\pi)=k} \mu(\hat0,\pi) \Big) x^k.        (5.32)

Comparing Eq. 5.32 with (i)′ we see that

    s(n,k) = \sum_{\pi \in \Pi_n : \nu(\pi)=k} \mu(\hat0,\pi) = w_{n-k},        (5.33)

the (n − k)th Whitney number of Π_n of the first kind. This shows that (i) and (i)′ are related by Möbius inversion on Π_n.

Putting k = 1 in Eq. 5.32 (ν(π) = 1 iff π = {{1, 2, …, n}} = 1̂) yields:

    μ(0̂, 1̂) is the coefficient of x in (x)_n, which is (−1)^{n−1}(n − 1)!.        (5.34)

If π has type (a_1, …, a_k), i.e., π has a_i parts of size i, then [0̂, π] ≅ (Π_1)^{a_1} × (Π_2)^{a_2} × ⋯ × (Π_k)^{a_k}. Hence \mu(\hat0,\pi) = \prod_{i=1}^{k} [(-1)^{i-1}(i-1)!]^{a_i}. Putting this in Eq. 5.33 yields a rather strange formula for s(n, k).
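Strange or not, the formula is easy to test numerically: summing \prod_B (-1)^{|B|-1}(|B|-1)! over all set partitions of [n] with k blocks should reproduce s(n,k) = (-1)^{n-k} c(n,k). A sketch of our own, with c(n,k) computed from the standard recurrence c(n,k) = c(n-1,k-1) + (n-1)c(n-1,k):

```python
# Illustrative check (ours, not from the text)
from math import factorial

def set_partitions(elements):
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def c(n, k):
    """Signless Stirling numbers of the first kind (recurrence)."""
    if k < 0 or k > n:
        return 0
    if n == 0:
        return 1
    return c(n - 1, k - 1) + (n - 1) * c(n - 1, k)

n = 6
mu_sum = {}
for part in set_partitions(list(range(1, n + 1))):
    mu = 1
    for block in part:
        # product formula for mu(0-hat, pi) over the blocks of pi
        mu *= (-1) ** (len(block) - 1) * factorial(len(block) - 1)
    k = len(part)
    mu_sum[k] = mu_sum.get(k, 0) + mu

for k in range(1, n + 1):
    assert mu_sum[k] == (-1) ** (n - k) * c(n, k)    # = s(n, k)
```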


For our second example turn to the set L_n of all linear partitions of [n]. For η, λ ∈ L_n, say η ≤ λ iff each block of λ can be obtained by juxtaposition of blocks of η. Then L_n is a finite poset.

Fix m ∈ P. Define f : L_n → K by

    f(\lambda) = (m)_{\nu(\lambda)},        (5.35)

and define g : L_n → K by

    g(\lambda) = (m)^{(\nu(\lambda))}.        (5.36)

Note that for λ ∈ L_n, λ = 0̂ iff ν(λ) = n. Then Eq. 5.25 implies that

    (m)^{(n)} = g(\hat0) = \sum_{\lambda \ge \hat0} (m)_{\nu(\lambda)} = \sum_{\lambda \ge \hat0} f(\lambda).        (5.37)

Exercise: 5.8.3 Show that P_η = {λ ∈ L_n : λ ≥ η} is isomorphic to L_{ν(η)}.

So Eq. 5.37 generalizes to

    g(\eta) = \sum_{\lambda \ge \eta} f(\lambda) \quad \text{for all } \eta \in L_n.        (5.38)

Then Möbius inversion gives

    f(\eta) = \sum_{\lambda \ge \eta} \mu(\eta,\lambda)\, g(\lambda) \quad \text{for all } \eta \in L_n.        (5.39)

Putting η = 0̂ in Eq. 5.39 yields

    (m)_n = \sum_{\lambda \in L_n} \mu(\hat0,\lambda)\,(m)^{(\nu(\lambda))}.        (5.40)

For each λ ∈ L_n, the interval B_λ = [0̂, λ] is Boolean. For example, if λ = {{1, 2}, {3, 4, 5, 6}, {7}}, there are 1 + 3 + 0 = 4 places to put slashes between members of one (ordered) part to obtain “lower” linear partitions. So the set whose subsets form the Boolean poset is the set of positions between members of a same part of λ. And μ(0̂, λ) must then be (−1)^k, where k is the total number of positions between members of a same part of λ. If λ has ν(λ) parts, then k = n − ν(λ). Hence from Eq. 5.40 we have

    (m)_n = \sum_{\lambda \in L_n} \mu(\hat0,\lambda)\,(m)^{(\nu(\lambda))}
          = \sum_{\lambda \in L_n} (-1)^{n-\nu(\lambda)} (m)^{(\nu(\lambda))}
          = \sum_{k=1}^{n} (-1)^{n-k} \frac{n!}{k!} \binom{n-1}{k-1} (m)^{(k)}.        (5.41)

This holds for all m ∈ P, so yields a polynomial identity

    (x)_n = \sum_{k=1}^{n} (-1)^{n-k} \frac{n!}{k!} \binom{n-1}{k-1} (x)^{(k)}.        (5.42)

So Eq. 5.42, which is (ii)′, is related to (ii) by Möbius inversion on L_n.
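Since Eq. 5.42 is a polynomial identity, it can be spot-checked exactly at integer arguments. A minimal check of our own, over a range of n and x:

```python
# Illustrative check (ours, not from the text)
from math import comb, factorial

def falling(x, n):
    """Falling factorial x(x-1)...(x-n+1)."""
    out = 1
    for i in range(n):
        out *= x - i
    return out

def rising(x, n):
    """Rising factorial x(x+1)...(x+n-1)."""
    out = 1
    for i in range(n):
        out *= x + i
    return out

for n in range(1, 8):
    for x in range(-5, 6):
        rhs = sum((-1) ** (n - k) * (factorial(n) // factorial(k))
                  * comb(n - 1, k - 1) * rising(x, k)
                  for k in range(1, n + 1))
        assert rhs == falling(x, n)
```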


For the third example, make Sn into a poset as follows. Always write a
permutation σ ∈ Sn as a product of disjoint cycles so that in each cycle the
smallest element always comes first (furthest to the left in the cycle). Then
given σ, τ ∈ Sn , say σ ≤ τ iff each cycle of σ is composed of a string of
consecutive integers from some cycle of τ . For example, (12)(3) ≤ (123),
(1)(23) ≤ (123), but (13)(2) ≰ (123). See Example 5.2.7 where we gave the
Hasse diagram of the interval [0̂, σ].
Equation (iii) of Theorem 5.8.2 can be written as

    (m)^{(n)} = \sum_{\sigma \ge \hat0} m^{c(\sigma)}, \quad \text{where } c(\sigma) \text{ is the number of cycles of } \sigma.        (5.43)

Note that c(σ) = n iff σ = 0̂.



Fix τ = c_1 c_2 ⋯ c_k ∈ S_n (where, of course, each cycle c_j is written with its smallest element first, and if i < j, the smallest element of c_i is less than the smallest element of c_j). Then σ ≥ τ iff each cycle of σ is made up of the juxtaposition of some cycles of τ. It follows that P_τ = {σ ∈ S_n : σ ≥ τ} is isomorphic to S_k. So Eq. 5.43 generalizes to

    (m)^{(c(\tau))} = \sum_{\sigma \ge \tau} m^{c(\sigma)}.        (5.44)

Define f : S_n → K and g : S_n → K by

    f(\sigma) = (m)^{(c(\sigma))} \quad \text{and} \quad g(\sigma) = m^{c(\sigma)}, \quad \text{for all } \sigma \in S_n.        (5.45)

Then Eq. 5.44 says

    f(\tau) = \sum_{\sigma \ge \tau} g(\sigma), \quad \text{for all } \tau \in S_n.        (5.46)

So by Möbius inversion we have

    g(\tau) = \sum_{\sigma \ge \tau} \mu(\tau,\sigma)\, f(\sigma), \quad \text{for all } \tau \in S_n.        (5.47)

Putting τ = 0̂ in Eq. 5.47 gives

    m^n = \sum_{\sigma \in S_n} \mu(\hat0,\sigma)\,(m)^{(c(\sigma))}.        (5.48)

We now wish to evaluate µ(0̂, σ). Say σ ∈ Sn is increasing if each of its


cycles increases. So if (i1 , . . . , is ) is a cycle of σ, then i1 < i2 < · · · < is .

Lemma 5.8.4 The Möbius function μ for S_n satisfies the following: for each σ ∈ S_n,

    \mu(\hat0,\sigma) = \begin{cases} (-1)^{n-c(\sigma)}, & \text{if } \sigma \text{ is increasing;} \\ 0, & \text{otherwise.} \end{cases}        (5.49)

Proof: Given σ ∈ S_n, consider the interval [0̂, σ]. The atoms of [0̂, σ] correspond to transpositions (i_r, i_{r+1}) where i_r, i_{r+1} is a substring of a cycle of σ and i_r < i_{r+1}. Thus if σ is increasing, the atoms of I_σ = [0̂, σ] correspond to all of the possible n − c(σ) transpositions. In that case I_σ is Boolean, and μ(0̂, σ) = (−1)^{n−c(σ)}. So suppose σ is not increasing. Then some cycle of σ has a consecutive pair (⋯ i_r, i_{r+1} ⋯) with i_r > i_{r+1}. Form a new permutation σ* from σ by inserting a pair )( of parentheses between i_r and i_{r+1} for every consecutive pair (⋯ i_r, i_{r+1} ⋯) of all cycles where i_r > i_{r+1}. Then σ* ≥ τ for every atom τ of [0̂, σ], and σ* < σ. It follows that in the upper Möbius algebra A_V([0̂, σ], K), if X is the set of atoms of [0̂, σ], then

    \prod_{x \in X} x = \sum_{t\,:\,t \ge x\ \forall x \in X} \sigma_t \quad \text{has } \sigma_{\sigma^*} \text{ as a summand.}        (5.50)

Hence no (nonempty) product of atoms ever equals σ. Then by the dual of Theorem 5.10.4 (with σ playing the role of the 1̂ of [0̂, σ]), μ(0̂, σ) = 0.
Now Eqs. 5.48 and 5.49 give

    m^n = \sum_{\sigma \in S_n,\ \sigma \text{ increasing}} (-1)^{n-c(\sigma)} (m)^{(c(\sigma))}
        = \sum_{k} (-1)^{n-k} (m)^{(k)} \Big( \sum_{\sigma \text{ increasing and } c(\sigma)=k} 1 \Big).        (5.51)

Since the number of increasing permutations with k cycles is easily seen to be equal to S(n, k) (the number of partitions of [n] with k blocks), we have derived (iii)′ from (iii) by Möbius inversion on S_n.
Recapitulation: (i) and (i)′ are related by Möbius inversion on Π_n; (ii) and (ii)′ are related by Möbius inversion on L_n; and (iii) and (iii)′ are related by Möbius inversion on S_n.
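The count of increasing permutations with k cycles can be confirmed by brute force over S_6: write each permutation in canonical cycle form (smallest element of each cycle first) and test whether every cycle increases. A sketch of ours, with S(n,k) computed from the usual recurrence:

```python
# Illustrative check (ours, not from the text)
from itertools import permutations

def stirling2(n, k):
    """Stirling numbers of the second kind: S(n,k) = k S(n-1,k) + S(n-1,k-1)."""
    if k < 0 or k > n:
        return 0
    if n == 0:
        return 1
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def cycles(p):
    """Cycles of the permutation i -> p[i-1] of [n], each starting at its min."""
    n = len(p)
    seen, out = set(), []
    for start in range(1, n + 1):       # smallest unvisited element leads
        if start in seen:
            continue
        cyc, i = [], start
        while i not in seen:
            seen.add(i)
            cyc.append(i)
            i = p[i - 1]
        out.append(cyc)
    return out

n = 6
tally = {}
for p in permutations(range(1, n + 1)):
    cs = cycles(p)
    if all(c == sorted(c) for c in cs):  # every cycle increases
        k = len(cs)
        tally[k] = tally.get(k, 0) + 1

for k in range(1, n + 1):
    assert tally.get(k, 0) == stirling2(n, k)
```

This matches the bijection used in the text: an increasing permutation is determined by the set partition formed by its cycles.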

5.9 Lattices and Gaussian Coefficients


A lattice L is a poset with the property that any finite subset S ⊆ L has a meet (or greatest lower bound), that is, an element b ∈ L for which

1. b ≤ a for all a ∈ S, and

2. if c ≤ a for all a ∈ S, then c ≤ b.


And dually, there is a join (or least upper bound), i.e., an element
b ∈ L for which
1′. a ≤ b for all a ∈ S, and

2′. if a ≤ c for all a ∈ S, then b ≤ c.
The meet and join of a two element set S = {x, y} are denoted, respec-
tively, by x ∧ y and x ∨ y. It is easily seen that ∧ and ∨ are commutative,
associative, idempotent binary operations. Moreover, if all 2-element subsets
have meets and joins, then any finite subset has a meet and a join.
The lattices we will consider have the property that there are no infinite
chains. Such a lattice has a (unique) least element (denoted 0̂ or 0L ), because
the condition that no infinite chains exist allows us to find a minimal element
m, and any minimal element m is a minimum, since if m 6≤ a, then m ∧ a
would be less than m. Similarly, there is a unique largest element 1L (or 1̂).
For elements a and b of a poset, we say a covers b, and write a ⋗ b, provided a > b but there are no elements c with a > c > b. For example, when U and W are linear subspaces of a vector space, then U ⋗ W iff U ⊇ W and dim(U) = dim(W) + 1. A point of a lattice with 0̂ is an element that covers 0̂. A copoint of a lattice with 1̂ is an element covered by 1̂.

Theorem 5.9.1 (L. Weisner, 1935) Let μ be the Möbius function of a finite lattice L, and let a ∈ L with a > 0̂. Then

    \sum_{x\,:\,x \vee a = \hat1} \mu(\hat0, x) = 0.

Proof: Fix a. Put S := \sum_{x,y \in L} \mu(\hat0,x)\,\zeta(x,y)\,\zeta(a,y)\,\mu(y,\hat1). Now compute S in two different ways. First,

    S = \sum_{x \in L} \mu(\hat0,x) \sum_{y \ge x,\ y \ge a} \mu(y,\hat1)
      = \sum_{x} \mu(\hat0,x) \sum_{y\,:\,x \vee a \le y \le \hat1} \mu(y,\hat1)
      = \sum_{x} \mu(\hat0,x) \cdot \begin{cases} 1, & \text{if } x \vee a = \hat1; \\ 0, & \text{otherwise} \end{cases}
      = \sum_{x\,:\,x \vee a = \hat1} \mu(\hat0,x),

which is the sum in the theorem. Also,

    S = \sum_{y \ge a} \mu(y,\hat1) \sum_{x\,:\,\hat0 \le x \le y} \mu(\hat0,x) = \sum_{y \ge a} \mu(y,\hat1) \cdot 0 = 0,

since y ≥ a > 0̂.
Let V_n(q) denote an n-dimensional vector space over F_q = GF(q). The term k-subspace will denote a k-dimensional subspace. It is fairly easy to see that the poset L_n(q) of all subspaces of V_n(q) is a lattice with 0̂ = {0} and 1̂ = V_n(q). We begin with some counting.

Exercise: 5.9.2 The number of ordered bases for a k-subspace of V_n(q) is (q^k − 1)(q^k − q)(q^k − q^2)⋯(q^k − q^{k−1}). How many ordered, linearly independent subsets of size k are there in V_n(q)?

To obtain a maximal chain (i.e., a chain of size n + 1 containing one subspace of each possible dimension) in the poset L_n(q) of all subspaces of V_n(q), we start with the 0-subspace. After we have chosen an i-subspace U_i, 0 ≤ i < n, we can choose an (i + 1)-subspace U_{i+1} that contains U_i in (q^n − q^i)/(q^{i+1} − q^i) ways, since we can take the span of U_i and any of the q^n − q^i vectors not in U_i, but an (i + 1)-subspace will arise exactly q^{i+1} − q^i times in this manner. Hence the number of maximal chains of subspaces in V_n(q) is:

    M(n,q) = \frac{q^n - q^0}{q^1 - q^0} \cdot \frac{q^n - q^1}{q^2 - q^1} \cdots \frac{q^n - q^{n-1}}{q^n - q^{n-1}}
           = \frac{(q^n - 1)(q^{n-1} - 1) \cdots (q - 1)}{(q - 1)^n}.

This implies that

    M(n,q) = (q^{n-1} + q^{n-2} + \cdots + q + 1)(q^{n-2} + \cdots + q + 1) \cdots (q + 1).



We may consider M(n, q) as a polynomial in q for each integer n. When the indeterminate q is replaced by a prime power, we have the number of maximal chains in the poset PG(n, q).

Note: When q is replaced by 1, we have M(n, 1) = n!, which is the number of maximal chains in the poset of subsets of an n-set.
" #
n
The Gaussian number (or Gaussian coefficient) can be de-
k q
fined
" #as the number of k-subspaces of Vn (q). This holds for 0 ≤ k ≤ n, where
n
= 1.
0 q
" #
n
To evaluate , count the number N of pairs (U, C) where U is a
k q
k-subspace and C is a maximal chain that contains U . Since every maximal
chain contains one subspace of dimension k, clearly N = M (n, q). On the
other hand, we get each maximal chain uniquely by appending to a maximal
chain in the poset of subspaces of U – of which there are M (k, q) – a maximal
chain in the poset of all subspaces of Vn (q) that contain U . There are M (n −
k, q) of these, since the poset {W : U ⊆ W ⊆ V } is isomorphic to the poset
of subspaces of V /U , and dim(V /U ) = n − k. Hence

" #
n
M (n, q) = · M (k, q) · M (n − k, q),
k q

which implies that

" # " #
n M (n, q) n
= =
k q
M (k, q)M (n − k, q) n−k q

(q n−1 + q n−2 + · · · + q + 1)(q n−2 + q n−3 + · · · + q + 1) · · · (q + 1)


=
(q k−1 + · · · + 1) · · · (q + 1)(q n−k−1 + · · · + q + 1) · · · (q + 1)
(q n−1 + · · · + 1)(q n−2 + · · · + 1) · · · (q n−k + · · · + 1)
=
(q k−1 + · · · + 1) · · · (q + 1)
(q n − 1)(q n−1 − 1) · · · (q n−k+1 − 1)
= .
(q k − 1)(q k−1 − 1) · · · (q − 1)

In fact there is a satisfactory way to generalize the notion and the notation of Gaussian coefficient to the multinomial case. (See the book by R. Stanley for this.) However, for our present purposes it suffices to consider just the binomial case. Define (0)_q = 1, and for a positive integer j put (j)_q = 1 + q + q^2 + \cdots + q^{j-1}. Then put (0)!_q = 1 and for a positive integer k, put (k)!_q = (1)_q (2)_q \cdots (k)_q. So (n)!_q = M(n, q). With this notation we have

    {n \brack k}_q = \frac{(n)!_q}{(k)!_q\,(n-k)!_q}.        (5.52)

For some purposes it is better to think of {n \brack k}_q as a polynomial in an indeterminate q rather than as a function of a prime power q. That {n \brack k}_q is a polynomial in q is an easy corollary of the following exercise.

Exercise: 5.9.3 Prove the following recurrence:

    {n \brack k}_q = {n-1 \brack k}_q + q^{n-k} {n-1 \brack k-1}_q.

Exercise: 5.9.4 Prove the following recurrence:

    {n+1 \brack k}_q = {n \brack k-1}_q + q^{k} {n \brack k}_q.

(Hint: There is a completely elementary proof just using the formulas for the symbols.)

Note that the relation of the previous exercise reduces to the binomial recurrence when q = 1. However, unlike the binomial recurrence, it is not ‘symmetric’.
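Both the recurrence and the quotient formula lend themselves to exact integer cross-checking. The sketch below (ours, not from the text) implements the recurrence of Exercise 5.9.4 shifted by one, and compares it with the product formula and, at q = 1, with the binomial coefficients:

```python
# Illustrative check (ours, not from the text)
from math import comb

def gauss_rec(n, k, q):
    """[n,k]_q via the recurrence of Exercise 5.9.4, shifted by one:
       [n,k] = [n-1,k-1] + q^k [n-1,k]."""
    if k < 0 or k > n:
        return 0
    if n == 0:
        return 1
    return gauss_rec(n - 1, k - 1, q) + q ** k * gauss_rec(n - 1, k, q)

def gauss_formula(n, k, q):
    """[n,k]_q via (q^n - 1)...(q^{n-k+1} - 1) / ((q^k - 1)...(q - 1))."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    assert num % den == 0        # the quotient is always an integer
    return num // den

for n in range(8):
    for k in range(n + 1):
        assert gauss_rec(n, k, 1) == comb(n, k)      # q = 1 gives binomials
        for q in (2, 3, 5):
            assert gauss_rec(n, k, q) == gauss_formula(n, k, q)
```

(The formula version is skipped at q = 1, where numerator and denominator both vanish and only the limit makes sense.)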
" #
n
= l≥0 αl q l , where αl is the number of
P
Exercise: 5.9.5 Show that
k q
partitions of l into at most k parts, each of which is at most n − k.
" #
n
If we regard a Gaussian coefficient as a function of the real variable
k q
q (where n and k are fixed integers), then we find that the limit as q goes to
1 of a Gaussian coefficient is a binomial coefficient.

Exercise: 5.9.6 " # !


n n
limq→1 = .
k q
k

Exercise: 5.9.7 (The q-binomial Theorem) Prove that:

    (1+x)(1+qx)\cdots(1+q^{n-1}x) = \sum_{i=0}^{n} q^{\binom{i}{2}} {n \brack i}_q x^i, \quad \text{for } n \ge 1.

Letting q → 1, we obtain the usual Binomial Theorem.
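The q-binomial theorem can be checked by expanding the left side with exact polynomial arithmetic at a numeric value of q and comparing coefficients of x^i. A sketch in our own notation:

```python
# Illustrative check (ours, not from the text)
def gauss(n, k, q):
    """[n,k]_q by the product formula, in exact integer arithmetic."""
    if k < 0 or k > n:
        return 0
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists, constant term first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

for q in (2, 3):
    for n in range(1, 8):
        lhs = [1]
        for j in range(n):
            lhs = poly_mul(lhs, [1, q ** j])     # multiply by (1 + q^j x)
        rhs = [q ** (i * (i - 1) // 2) * gauss(n, i, q)
               for i in range(n + 1)]
        assert lhs == rhs
```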

Exercise: 5.9.8 Prove that:

    {n+m \brack k}_q = \sum_{i=0}^{k} {n \brack i}_q {m \brack k-i}_q q^{(n-i)(k-i)}.

Define the Gaussian polynomials g_n(x) ∈ R[x] as follows: g_0(x) = 1; g_n(x) = (x − 1)(x − q)\cdots(x − q^{n-1}) for n > 0. Clearly the Gaussian polynomials form a basis for R[x] as a vector space over R.

Theorem 5.9.9 The Gaussian coefficients connect the usual monomials to the Gaussian polynomials, viz.:

(i) x^n = \sum_{k=0}^{n} \binom{n}{k} (x-1)^k;

(ii) x^n = \sum_{k=0}^{n} {n \brack k}_q g_k(x).

Proof: (i) is a special case of the binomial theorem. And (ii) becomes (i) if q = 1. To prove (ii), suppose V, W are vector spaces over F = GF(q) with dim(V) = n and |W| = r. Here r = q^t is any power of q with t ≥ n. Then |Hom_F(V, W)| = r^n.

Now classify f ∈ Hom_F(V, W) according to the kernel subspace f^{-1}(0) ⊆ V. Given some subspace U ⊆ V, let {u_1, …, u_k} be an ordered basis of U and extend it to an ordered basis {u_1, …, u_k, u_{k+1}, …, u_n} of V. Then f^{-1}(0) = U iff f(u_i) = 0 for 1 ≤ i ≤ k, and f(u_{k+1}), …, f(u_n) are linearly independent vectors in W. Now

    r^n = \sum_{U \subseteq V} (r-1)(r-q)\cdots(r-q^{\,n-\dim(U)-1})
        = \sum_{k=0}^{n} {n \brack k}_q (r-1)(r-q)\cdots(r-q^{\,n-k-1})
        = \sum_{k=0}^{n} {n \brack k}_q (r-1)(r-q)\cdots(r-q^{\,k-1})
          \quad \Big(\text{using the fact that } {n \brack k}_q = {n \brack n-k}_q\Big)
        = \sum_{k=0}^{n} {n \brack k}_q\, g_k(r).

As r can be any power of q with r ≥ q^n, the polynomials x^n and \sum_{k=0}^{n} {n \brack k}_q g_k(x) agree on infinitely many values of x and hence must be identical.
The inverse connection can be obtained from the q-binomial theorem (c.f.
Ex. 5.9.7).

Exercise: 5.9.10 Prove that

    g_n(x) = \sum_{i=0}^{n} {n \brack i}_q q^{\binom{n-i}{2}} (-1)^{n-i} x^i.

(Hint: In Ex. 5.9.7 first replace x with −x and then replace q with q^{-1} and simplify.)

If {a_n}_{n=0}^{∞} is a given sequence of numbers, we have considered its ordinary generating function \sum_{n \ge 0} a_n x^n and its exponential generating function \sum_{n \ge 0} a_n \frac{x^n}{n!}. (Also considered in Chapter 4 was the Dirichlet generating series function.) There is a vast theory of Eulerian generating series functions defined by \sum_{n \ge 0} a_n \frac{x^n}{(n)!_q}. (See the book by R. Stanley for an introduction to this subject with several references.) The next exercise shows that two specific Eulerian generating functions are inverses of each other.

Exercise: 5.9.11 Show that

    \Big( \sum_{k \ge 0} \frac{(-t)^k q^{\binom{k}{2}}}{(k)!_q} \Big) \Big( \sum_{k \ge 0} \frac{t^k}{(k)!_q} \Big) = 1.

(Hint: Compute the coefficient on t^n separately for n = 0 and n ≥ 1. Then use the q-binomial theorem with x = −1.)
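Following the hint, the coefficient of t^n in the product, multiplied by (n)!_q, is \sum_{k=0}^{n} (-1)^k q^{\binom{k}{2}} {n \brack k}_q, which must vanish for every n ≥ 1. A quick check of our own:

```python
# Illustrative check (ours, not from the text)
def qint(j, q):
    """(j)_q = 1 + q + ... + q^(j-1)."""
    return sum(q ** i for i in range(j))

def qfact(n, q):
    """(n)!_q = (1)_q (2)_q ... (n)_q."""
    out = 1
    for j in range(1, n + 1):
        out *= qint(j, q)
    return out

def gauss(n, k, q):
    return qfact(n, q) // (qfact(k, q) * qfact(n - k, q))

for q in (2, 3):
    for n in range(1, 10):
        coeff = sum((-1) ** k * q ** (k * (k - 1) // 2) * gauss(n, k, q)
                    for k in range(n + 1))
        assert coeff == 0    # so the product of the two series is 1
```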

Exercise: 5.9.12 (Gauss inversion) Let {u_i}_{i=0}^{∞} and {v_i}_{i=0}^{∞} be two sequences of real numbers. Then

    v_n = \sum_{i=0}^{n} {n \brack i}_q u_i \ (n \ge 0) \iff u_n = \sum_{i=0}^{n} (-1)^{n-i} q^{\binom{n-i}{2}} {n \brack i}_q v_i \ (n \ge 0).

(Hint: Use Exercise 5.9.11.)


See the book by R. P. Stanley and that by Goulden and Jackson for a
great deal more on the subject of q-binomial (and q-multinomial) coefficients.
We are now going to compute the Möbius function of the lattice Ln (q).

Theorem 5.9.13 The Möbius function of the lattice L_n(q) of subspaces of a vector space of dimension n over the Galois field F = GF(q) is given by

    \mu(U, W) = \begin{cases} (-1)^k q^{\binom{k}{2}}, & \text{if } U \subseteq W \text{ and } k = \dim(W) - \dim(U); \\ 0, & \text{if } U \not\subseteq W. \end{cases}

Proof: The idea is to use Weisner’s theorem on the interval [U, W] viewed as isomorphic to the lattice of subspaces of the quotient space W/U. This means that we need only compute μ(0̂, 1̂), where V = 1̂ is a space of dimension n and 0̂ = {0}. If V has dimension 1, then L_1(q) is a chain with two elements and μ(0̂, 1̂) = −1 = (−1)^1 q^{\binom{1}{2}}. Now suppose n = 2. Let a be a point. By Weisner’s theorem,

    \mu(\hat0,\hat1) = -\sum_{p\,:\,p \vee a = \hat1,\ p \ne \hat1} \mu(\hat0, p) = |\{p : p \vee a = \hat1 \text{ and } p \ne \hat1\}| = q = (-1)^2 q^{\binom{2}{2}},

since each such p is a point, with μ(0̂, p) = −1. Now suppose that our induction hypothesis is that

    \mu(\hat0, V) = (-1)^k q^{\binom{k}{2}} \quad \text{if } k = \dim(V) < n.

Let p be a point (so p covers 0̂). By Weisner’s Theorem,

    \mu(\hat0, V) = -\sum_{U\,:\,U \vee p = V,\ U \ne V} \mu(\hat0, U).

The subspaces U such that U ∨ p = V and U ≠ V are those of dimension n − 1 (i.e., hyperplanes) that do not contain p. The number of hyperplanes on p is the number of points on a hyperplane, which is {n-1 \brack 1}_q, so that the number of hyperplanes not containing p is {n \brack 1}_q − {n-1 \brack 1}_q = q^{n-1}. So if dim(V) = n, then μ(0̂, V) = (−q^{n-1}) · (−1)^{n-1} q^{\binom{n-1}{2}} = (−1)^n q^{\binom{n}{2}}, after a little simplification.
As an application we count the number of linear transformations from
an n-dimensional vector space Y onto an m-dimensional vector space V over
F = GF (q). Clearly we must have n ≥ m if this number is to be nonzero.
However, we do not make this assumption.

Theorem 5.9.14 If Y and V are vector spaces over F with dim(Y) = n and dim(V) = m, then

    |\{T \in Hom(Y, V) : T(Y) = V\}| = \sum_{k=0}^{m} (-1)^{m-k} {m \brack k}_q q^{\,nk + \binom{m-k}{2}}.

Proof: For each subspace U of V, let f(U) = |{T ∈ Hom(Y, V) : T(Y) = U}|, and let g(U) = |{T ∈ Hom(Y, V) : T(Y) ⊆ U}|. Then g(U) = q^{nr} if dim(U) = r, and clearly g(U) = \sum_{W : W \subseteq U} f(W). By Möbius inversion we have f(U) = \sum_{W : W \subseteq U} \mu(W, U)\, q^{\,n \cdot \dim(W)}. If U = V, by our formula for the Möbius function on L_n(q) we have

    f(V) = \sum_{W} \mu(W, V)\, q^{\,n \cdot \dim(W)} = \sum_{k=0}^{m} (-1)^{m-k} q^{\binom{m-k}{2}} {m \brack k}_q q^{\,nk},

which finishes the proof.

Corollary 5.9.15 The number of n × m matrices over F = GF(q) with rank r is

    {m \brack r}_q \sum_{k=0}^{r} (-1)^{r-k} {r \brack k}_q q^{\,nk + \binom{r-k}{2}}.

Corollary 5.9.16 The number of invertible n × n matrices over GF(q) is

    \sum_{k=0}^{n} (-1)^{n-k} {n \brack k}_q q^{\,nk + \binom{n-k}{2}}.

Remark: There are g_n(q^m) = (q^m − 1)(q^m − q)\cdots(q^m − q^{n-1}) injective linear transformations from V_n to V_m. If m = n, then “injective” is equivalent to “onto.” Hence

    g_n(q^n) = (q^n - 1)(q^n - q)\cdots(q^n - q^{n-1}) = \sum_{k=0}^{n} (-1)^{n-k} {n \brack k}_q q^{\,nk + \binom{n-k}{2}}.
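Corollary 5.9.16 can be confirmed against a brute-force count over GF(2) for small n. The sketch below is ours (it uses bitmask row reduction mod 2) and also checks the direct product count from the Remark:

```python
# Illustrative check (ours, not from the text)
from itertools import product

def gauss_coeff(n, k, q):
    """Gaussian coefficient [n,k]_q via the product formula."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def formula(n, q):
    """Corollary 5.9.16: the number of invertible n x n matrices over GF(q)."""
    total = 0
    for k in range(n + 1):
        j = n - k
        total += (-1) ** j * gauss_coeff(n, k, q) * q ** (n * k + j * (j - 1) // 2)
    return total

def invertible_mod2(rows):
    """Invertibility over GF(2); each row is an n-bit integer."""
    rows, rank = list(rows), 0
    for col in range(len(rows)):
        bit = 1 << col
        piv = next((i for i in range(rank, len(rows)) if rows[i] & bit), None)
        if piv is None:
            return False
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i] & bit:
                rows[i] ^= rows[rank]
        rank += 1
    return True

for n in (1, 2, 3):
    brute = sum(1 for rows in product(range(2 ** n), repeat=n)
                if invertible_mod2(rows))
    direct = 1
    for i in range(n):
        direct *= 2 ** n - 2 ** i      # choose the rows one at a time
    assert brute == direct == formula(n, 2)
```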

Exercise: 5.9.17 It is possible to define the Gaussian coefficients for any


positive integer q, not just the prime powers. Read the following article:
John Konvalina, A Unified Interpretation of the Binomial Coefficients, the
Stirling Numbers, and the Gaussian Coefficients, Amer. Math. Monthly, 107
(2000), 901 – 910.

5.10 Posets with Finite Order Ideals


Let P be a poset for which each order ideal Λ_x = {y ∈ P : y ≤ x} is finite, and let μ be the Möbius function of P. If K is any field, the (lower) Möbius algebra A_Λ(P, K) is the algebra obtained by starting with the vector space K^P and defining a (bilinear) multiplication on basis elements x, y ∈ P by

    x \cdot y = \sum_{s\,:\,s \le x \text{ and } s \le y} \Big( \sum_{t \in [s,x] \cap [s,y]} \mu(s,t) \Big) s
              = \sum_{(s,t)\,:\,s \le t \le x \text{ and } s \le t \le y} \mu(s,t)\, s
              = \sum_{t \in \Lambda_x \cap \Lambda_y} \Big( \sum_{s\,:\,s \le t} \mu(s,t)\, s \Big).        (5.53)

So if we put \delta_t = \sum_{s\,:\,s \le t} \mu(s,t)\, s, we have

    x \cdot y = \sum_{t \in \Lambda_x \cap \Lambda_y} \delta_t.        (5.54)

If we fix z ∈ P, then \sum_{t\,:\,t \le z} \delta_t = \sum_{(s,t)\,:\,s \le t \le z} \mu(s,t)\, s = \sum_{s\,:\,s \le z} \big( \sum_{t\,:\,s \le t \le z} \mu(s,t) \big) s = \sum_{s\,:\,s \le z} \delta_{s,z}\, s = z. Hence:

    z = \sum_{t \in \Lambda_z} \delta_t, \quad \text{and} \quad x \cdot x = \sum_{t \in \Lambda_x} \delta_t = x.        (5.55)

Moreover,

    x \cdot y = y \quad \text{iff} \quad y \le x.        (5.56)
Note: In the above we are thinking of f ∈ K^P as a formal (possibly infinite) linear combination of the elements of P: f = \sum_{x \in P} f(x)\, x. So the element x of P is identified with the element 1_x = \sum_{y \in P} \delta_{x,y}\, y = x. Then the above discussion shows that {δ_t : t ∈ P} (as well as {x : x ∈ P}) is a basis for A_Λ(P, K).
Let A′_Λ(P, K) be the abstract algebra \prod_{x \in P} K_x with each K_x isomorphic to K. So A′_Λ(P, K) is K^{|P|} with direct product operations. Let δ′_x be the identity of K_x, so δ′_x · δ′_y = δ_{x,y} · δ′_x. Then define a linear transformation θ : A_Λ(P, K) → A′_Λ(P, K) by θ(δ_x) = δ′_x, and extend by linearity.

Theorem 5.10.1 θ is an algebra isomorphism.

Proof: For each x ∈ P, put x′ = \sum_{y \le x} \delta'_y ∈ A′_Λ(P, K). As θ is clearly a vector space isomorphism with θ(x) = θ(\sum_{y \le x} \delta_y) = \sum_{y \le x} \delta'_y = x′, it suffices to show that θ(x · y) = θ(x)θ(y) for all x, y ∈ P. So,

    \theta(x \cdot y) = \sum_{t\,:\,t \le x \text{ and } t \le y} \delta'_t = \Big( \sum_{t \le x} \delta'_t \Big) \Big( \sum_{t \le y} \delta'_t \Big) = x' \cdot y' = \theta(x)\theta(y),

since δ′_t · δ′_s = δ_{t,s} · δ′_t.

As a simple corollary we have the following otherwise not so obvious result.

Theorem 5.10.2 If P is finite, {δ_t : t ∈ P} is a complete set of orthogonal idempotents, and \sum_{t \in P} \delta_t is the identity of A_Λ(P, K).

(Note: In the notation for lattices, x · y = z iff Λ_x ∩ Λ_y = Λ_z iff z = x ∧ y.)
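This note is easy to verify computationally on a small lattice, say the divisors of 36 ordered by divisibility, where x ∧ y = gcd(x, y): compute μ recursively, form the idempotents δ_t as coefficient vectors, and multiply. A sketch; all helper names are ours.

```python
# Illustrative check (ours, not from the text)
from functools import lru_cache
from math import gcd

DIVS = [d for d in range(1, 37) if 36 % d == 0]   # the divisor lattice of 36

def leq(a, b):
    return b % a == 0                             # a <= b  iff  a divides b

@lru_cache(maxsize=None)
def mu(s, t):
    """Mobius function of the divisor lattice, computed recursively."""
    if s == t:
        return 1
    return -sum(mu(s, z) for z in DIVS if leq(s, z) and leq(z, t) and z != t)

def delta(t):
    """delta_t = sum over s <= t of mu(s, t) s, as a coefficient vector."""
    return {s: mu(s, t) for s in DIVS if leq(s, t)}

def times(x, y):
    """x . y = sum of delta_t over t in Lambda_x intersect Lambda_y."""
    out = {}
    for t in DIVS:
        if leq(t, x) and leq(t, y):
            for s, coef in delta(t).items():
                out[s] = out.get(s, 0) + coef
    return {s: c for s, c in out.items() if c != 0}

for x in DIVS:
    for y in DIVS:
        assert times(x, y) == {gcd(x, y): 1}      # x . y  =  x meet y
```

The telescoping \sum_{t \le z} \delta_t = z from Eq. 5.55 is exactly why each product collapses to the single basis element gcd(x, y).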

Theorem 5.10.3 Let P be finite with |P| ≥ 2. Let a, x ∈ P, a ≠ x. On the one hand,

    a \cdot \delta_x = a \cdot \sum_{t \le x} \mu(t,x)\, t = \sum_{t \le x} \mu(t,x)\,(a \cdot t) = \sum_{d \in P} \Big( \sum_{t\,:\,t \le x \text{ and } a \cdot t = d} \mu(t,x) \Big) d.

On the other hand,

    a \cdot \delta_x = \sum_{t\,:\,t \le a} \delta_t \cdot \delta_x = \sum_{t\,:\,t \le a} \delta_{t,x}\, \delta_x = \begin{cases} \delta_x, & \text{if } x \le a; \\ 0, & \text{if } x \not\le a. \end{cases}

This has the following consequences:

(i) If x ≰ a and d ∈ P, then \sum_{t\,:\,t \le x \text{ and } a \cdot t = d} \mu(t,x) = 0. For example, if a ≠ 1̂ ∈ P, then \sum_{t\,:\,a \cdot t = d} \mu(t,\hat1) = 0. As a special case, \sum_{t\,:\,a \cdot t = \hat0} \mu(t,\hat1) = 0, if P has 0̂ and 1̂, with a ≠ 1̂.

(ii) If x ≤ a, then \sum_{d} \big( \sum_{t\,:\,t \le x \text{ and } a \cdot t = d} \mu(t,x) \big) d = \delta_x = \sum_{d\,:\,d \le x} \mu(d,x)\, d. So

    (a) If d ≤ x ≤ a, then \sum_{t\,:\,t \le x \text{ and } a \cdot t = d} \mu(t,x) = \mu(d,x), and

    (b) If d ≰ x ≤ a, then \sum_{t\,:\,t \le x \text{ and } a \cdot t = d} \mu(t,x) = 0.

Theorem 5.10.4 Let P be a finite poset with 0̂ and 1̂, 0̂ ≠ 1̂, and let X be the set of coatoms of P (i.e., elements covered by 1̂). Then μ(0̂, 1̂) = \sum_{k=1}^{|X|} (-1)^k N_k, where N_k is the number of subsets of X of size k whose product is 0̂.

Proof: For any x ∈ X, 1 − x = \big( \sum_{t \in P} \delta_t \big) − \sum_{t\,:\,t \le x} \delta_t = \sum_{t\,:\,t \not\le x} \delta_t. Hence \prod_{x \in X} (1 − x) = \sum_{t\,:\,t \not\le x\ \forall x \in X} \delta_t = \delta_{\hat1}, since δ_t · δ_s = 0 if t ≠ s and δ_{1̂} is the only idempotent appearing in all terms of the product. The coefficient of 0̂ in \delta_{\hat1} = \sum_{s\,:\,s \le \hat1} \mu(s,\hat1)\, s is μ(0̂, 1̂). The coefficient of 0̂ in \prod_{x \in X} (1 − x) is exactly \sum_{k} (-1)^k N_k.
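The theorem can be tested in divisor lattices, where 0̂ = 1, 1̂ = N, the coatoms are the divisors N/p for primes p dividing N, and products (meets) are gcd's; μ(0̂, 1̂) is then the number-theoretic μ(N). A sketch of our own:

```python
# Illustrative check (ours, not from the text)
from functools import lru_cache
from itertools import combinations
from math import gcd

def gcd_all(nums):
    g = 0
    for x in nums:
        g = gcd(g, x)
    return g

def crosscut_check(N):
    """Check Theorem 5.10.4 in the divisor lattice of N."""
    divs = [d for d in range(1, N + 1) if N % d == 0]

    @lru_cache(maxsize=None)
    def mu(s, t):
        if s == t:
            return 1
        return -sum(mu(s, z) for z in divs
                    if z % s == 0 and t % z == 0 and z != t)

    # coatoms: divisors d != N whose only multiples dividing N are d and N
    coatoms = [d for d in divs if d != N and
               all(e in (d, N) for e in divs if e % d == 0 and N % e == 0)]

    total = 0
    for k in range(1, len(coatoms) + 1):
        Nk = sum(1 for S in combinations(coatoms, k) if gcd_all(S) == 1)
        total += (-1) ** k * Nk
    assert total == mu(1, N)

for N in (6, 12, 30, 36, 60, 210):
    crosscut_check(N)
```

For N = 60, no subset of the coatoms {30, 20, 12} has gcd 1, so every N_k is 0, matching μ(60) = 0.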

If P is a poset for which each dual order ideal V_x = {y ∈ P : y ≥ x} is finite, we can dualize the construction of the (lower) Möbius algebra and define the upper Möbius algebra A_V(P, K), which has primitive idempotents of the form \sigma_x = \sum_{y \ge x} \mu(x,y)\, y.
P
of the form σx = y≥x µ(x, y)y.

Theorem 5.10.5 Let P and Q be finite posets. If φ : P → Q is any map, then φ extends to an algebra homomorphism φ : A_V(P, K) → A_V(Q, K) iff the following hold:

(i) φ is order preserving, and

(ii) for any q ∈ Q, the set {p ∈ P : φ(p) ≤ q} either has a maximum or is empty. (That is, if I is a principal order ideal of Q, then φ^{-1}(I) is principal or empty.)

Proof: First suppose that φ extends to a homomorphism. Since x ≤ y iff x · y = y (in A_V(P, K)), it must be that x ≤ y iff x · y = y iff φ(x) · φ(y) = φ(y) iff φ(x) ≤ φ(y), so φ is order preserving. Now for a fixed q ∈ Q suppose that {p ∈ P : φ(p) ≤ q} ≠ ∅ and choose a p ∈ P for which φ(p) ≤ q. Then

    \sum_{y \ge \varphi(p)} \sigma_y = \varphi(p) = \varphi\Big( \sum_{x \ge p} \sigma_x \Big) = \sum_{x \ge p} \varphi(\sigma_x).        (5.57)

Since {σ_x : x ∈ P} and {σ_q : q ∈ Q} are bases for A_V(P, K) and A_V(Q, K), respectively, we see from Eq. 5.57 that σ_q ∈ A_V(Q, K) appears as a summand in φ(σ_x) for some x ≥ p. Moreover, this x is unique, because if σ_q is a summand in both φ(σ_x) and φ(σ_y) with x ≠ y, then φ(σ_x) · φ(σ_y) ≠ 0. But φ(σ_x) · φ(σ_y) = φ(σ_x · σ_y) = φ(0) = 0. We claim that the unique x, x ≥ p, for which σ_q is a summand of φ(σ_x) is x = max{p ∈ P : φ(p) ≤ q}. The above argument at least shows that x ≥ p for each p such that φ(p) ≤ q. But as σ_q is a summand of φ(σ_x), it is also a summand of φ(x) = φ\big( \sum_{t \ge x} \sigma_t \big) = \sum_{t \ge x} φ(σ_t). But φ(x) = \sum_{t \ge \varphi(x)} \sigma_t having σ_q as a summand implies that q ≥ φ(x). We now have: x ≥ p for each p with φ(p) ≤ q, and φ(x) ≤ q. So x = max{p ∈ P : φ(p) ≤ q}. Hence {p ∈ P : φ(p) ≤ q} = φ^{-1}(Λ_q) = Λ_x.

This completes the proof that when φ extends to an algebra homomorphism both (i) and (ii) hold.
Conversely, suppose both (i) and (ii) hold. Let Q_0 = {q ∈ Q : q ≥ φ(p) for some p ∈ P} = {q ∈ Q : φ^{-1}(Λ_q) ≠ ∅}. If p ∈ P, then q = φ(p) is automatically in Q_0. And if q ∈ Q_0, put ψ(q) = max{p ∈ P : φ(p) ≤ q}, which exists by (ii). So Λ_{ψ(q)} = {p ∈ P : φ(p) ≤ q}.

If q = φ(x), then q = φ(ψ(q)) = φ(ψ(φ(x))) = φ(x). On the other hand, if q is not in the image of φ, then φ(ψ(q)) = φ(max{x : φ(x) ≤ q}) ≤ q.

For p ∈ P, put

    \varphi^*(\sigma_p) = \sum_{q \in Q\,:\,\varphi(\psi(q)) = \varphi(p)} \sigma_q = {\sum}{}' \sigma_q,

where the sum Σ′ is over all σ_q for which q satisfies the following: q ≥ φ(p) and the largest x with φ(x) ≤ q has φ(x) = φ(p). In this set of q’s, φ(p) is the only one in the image of φ. The other q’s are only “slightly” larger than φ(p). (This set of q’s is the set {q : [φ(p), q] ∩ φ(P) = {φ(p)}}.)

Fix p ∈ P. Then for each q ∈ Q with q ≥ φ(p) there is a unique x in P for which x is the largest element of P with φ(x) ≤ q. Necessarily x ≥ p. Similarly, if we fix x, x ≥ p, there is a well-defined set of q’s for which φ(x) is the largest element of φ(P) which is less than or equal to q. Hence for p ∈ P,

    \varphi^*(p) = \varphi^*\Big( \sum_{x\,:\,x \ge p} \sigma_x \Big)
                 = \sum_{x\,:\,x \ge p \text{ and } x = \max\{y\,:\,\varphi(x) = \varphi(y)\}} \varphi^*(\sigma_x)
                 = \sum_{x\,:\,x \ge p \text{ and } x = \max\{y\,:\,\varphi(x) = \varphi(y)\}}\ \sum_{q\,:\,q \ge \varphi(x) \text{ and } x = \max\{t \in P\,:\,\varphi(t) \le q\}} \sigma_q
                 = \sum_{q\,:\,q \ge \varphi(p)} \sigma_q = \varphi(p).

Hence φ* is the desired extension of φ to A_V(P, K).

Corollary 5.10.6 Let (P, ≤) be a finite poset, and let P_0 ⊆ P. Then the injection φ : P_0 → P : p ↦ p “extends to” an algebra homomorphism of A_V(P_0, K) into A_V(P, K) iff the restriction of each principal order ideal of P to P_0 is either empty or principal.
