Murali K. Srinivasan
Jugal K. Verma
Chapter 1

Matrices, Linear Equations and Determinants

1.1 Matrix Operations
Convention 1.1.1. We shall write F to mean either the real numbers R or the complex numbers
C. Elements of F will be called scalars.
The entry in row i and column j is aij . We also write A = (aij ) to denote the entries. When all
the entries are in R we say that A is a real matrix. Similarly, we define complex matrices. For
example,
$$\begin{bmatrix} 1 & -1 & 3/2\\ 5/2 & 6 & 11.2 \end{bmatrix}$$
is a 2 × 3 real matrix.
A 1 × n matrix [a₁ a₂ ⋯ aₙ] is called a row vector and an m × 1 matrix
$$\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}$$
is called a column vector.
Matrix multiplication
First we define the product of a row vector a = [a₁ a₂ ⋯ aₙ] and a column vector b = (b₁, b₂, …, bₙ)ᵗ, both with n components. Define ab to be the scalar $\sum_{i=1}^{n} a_i b_i$.
The product of two matrices A = (aij ) and B = (bij ), denoted AB, is defined only when the
number of columns of A is equal to the number of rows of B. So let A be a m × n matrix and let
B be a n × p matrix. Let the row vectors of A be A1 , A2 , . . . , Am and let the column vectors of B
be B1 , B2 , . . . , Bp . We write
$$A = \begin{bmatrix} A_1\\ A_2\\ \vdots\\ A_m \end{bmatrix}, \qquad B = [B_1\ B_2\ \cdots\ B_p].$$
Let x = (x₁, x₂, …, x_p)ᵗ be a column vector with p components. Then
Bx = x1 B1 + x2 B2 + · · · + xp Bp .
So Bx can be thought of as a linear combination of the columns of B, with column l
having coefficient xl . This way of thinking about Bx is very important.
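The following small check (a hypothetical matrix B and vector x, not from the text, using NumPy) illustrates that Bx and the column combination x₁B₁ + x₂B₂ agree.

```python
import numpy as np

# Hypothetical data: B has columns B1, B2; x = (x1, x2)^t.
B = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
x = np.array([3.0, -1.0])

Bx = B @ x                               # matrix-vector product
combo = x[0] * B[:, 0] + x[1] * B[:, 1]  # x1*B1 + x2*B2, the column combination

print(Bx)                                # [ 6.  2. -1.]
print(np.allclose(Bx, combo))            # True
```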
Example 1.1.3. Let e1 , e2 , . . . , ep denote the standard column vectors with p components, i.e., ei
denotes the p × 1 column vector with 1 in component i and all other components 0. Then Bei = Bi ,
column i of B.
So, the jth column of AB is a linear combination of the columns of A, the coefficients coming from
the jth column Bj of B. For example,
$$\begin{bmatrix} 1 & 3 & 1\\ 2 & 4 & 2 \end{bmatrix}\begin{bmatrix} 2 & 0\\ 1 & 1\\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 4\\ 8 & 6 \end{bmatrix}.$$
Similarly, the ith row AᵢB of AB is a linear combination of the rows of B, the coefficients coming from the ith row Aᵢ of A.
Properties of Matrix Operations
Theorem 1.1.4. The following identities hold for matrix sum and product, whenever the sizes of
the matrices involved are compatible (for the stated operations).
(i) A(B + C) = AB + AC.
(ii) (P + Q)R = P R + QR.
(iii) A(BC) = (AB)C.
(iv) c(AB) = (cA)B = A(cB).
Proof. We prove item (iii) (leaving the others as exercises). Let A = (aij ) have p columns,
B = (bkl ) have p rows and q columns, and C = (crs ) have q rows. Then the entry in row i and
column s of A(BC) is
$$\begin{aligned}
&\sum_{m=1}^{p} a(i,m)\,\{\text{entry in row } m,\ \text{column } s \text{ of } BC\}\\
&\quad= \sum_{m=1}^{p} a(i,m)\left\{\sum_{n=1}^{q} b(m,n)\,c(n,s)\right\}\\
&\quad= \sum_{n=1}^{q}\left\{\sum_{m=1}^{p} a(i,m)\,b(m,n)\right\} c(n,s),
\end{aligned}$$
which is precisely the entry in row i and column s of (AB)C.
Note that matrix multiplication is not commutative in general. For example,
$$\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}\begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix} = \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}, \quad\text{but}\quad \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix} = \begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix}.$$
In particular, the product of two nonzero matrices can be the zero matrix.
Definition 1.1.5. A matrix all of whose entries are zero is called the zero matrix. The entries
aii of a square matrix A = (aij ) are called the diagonal entries. If the only nonzero entries of a
square matrix A are the diagonal entries then A is called a diagonal matrix. An n × n diagonal
matrix whose diagonal entries are 1 is called the n × n identity matrix. It is denoted by In . A
square matrix A = (aij ) is called upper triangular if all the entries below the diagonal are zero,
i.e., aij = 0 for i > j. Similarly we define lower triangular matrices.
A square matrix A is called nilpotent if Ar = 0 for some r ≥ 1.
Example 1.1.6. Let A = (aij ) be an upper triangular n × n matrix with diagonal entries zero.
Then A is nilpotent. In fact An = 0.
Since column j of An is An ej , it is enough to show that An ej = 0 for j = 1, . . . , n. Denote
column j of A by Aj .
Similarly
A³e₃ = A²(Ae₃) = A²A₃ = A²(a₁₃e₁ + a₂₃e₂) = 0.
Continuing in this fashion we see that all columns of An are zero.
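As a quick numerical illustration of Example 1.1.6 (a hypothetical 3 × 3 strictly upper triangular matrix, checked with NumPy):

```python
import numpy as np

# Upper triangular with zero diagonal, as in Example 1.1.6.
A = np.array([[0.0, 2.0, 5.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0]])

print(np.linalg.matrix_power(A, 2))   # nonzero
print(np.linalg.matrix_power(A, 3))   # the zero matrix: A is nilpotent with A^3 = 0
```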
Inverse of a Matrix
Definition 1.1.7. Let A be an n×n matrix. If there is an n×n matrix B such that AB = In = BA
then we say A is invertible and B is the inverse of A. The inverse of A is denoted by A−1 .
Remark 1.1.8. (1) The inverse of a matrix is uniquely determined. Indeed, if B and C are inverses of
A then
B = BI = B(AC) = (BA)C = IC = C.
(2) If A and B are invertible n × n matrices, then AB is also invertible. Indeed, (AB)(B⁻¹A⁻¹) = A(BB⁻¹)A⁻¹ = AA⁻¹ = I and similarly (B⁻¹A⁻¹)(AB) = I, so (AB)⁻¹ = B⁻¹A⁻¹.
Transpose of a Matrix
The transpose of an m × n matrix A = (aᵢⱼ), denoted Aᵗ, is the n × m matrix whose (i, j)th entry is aⱼᵢ. We record two basic properties: (i) (AB)ᵗ = BᵗAᵗ; (ii) if A is invertible then Aᵗ is invertible and (Aᵗ)⁻¹ = (A⁻¹)ᵗ.
Proof. For any matrix C, let Cᵢⱼ denote its (i, j)th entry.
(i) Let A = (aᵢⱼ), B = (bᵢⱼ). Then, for all i, j,
$$((AB)^t)_{ij} = (AB)_{ji} = \sum_k a_{jk}b_{ki} = \sum_k (B^t)_{ik}(A^t)_{kj} = (B^tA^t)_{ij}.$$
(ii) Since AA−1 = I = A−1 A, we have (AA−1 )t = I = (A−1 A)t . By (i), (A−1 )t At = I =
At (A−1 )t . Thus (At )−1 = (A−1 )t .
Lemma 1.1.12. (i) If A is a symmetric matrix then so is A−1 . (ii) Every square matrix A is a
sum of a symmetric and a skew symmetric matrix in a unique way.
For (ii), write A = P + Q, where P = (A + Aᵗ)/2 is symmetric and Q = (A − Aᵗ)/2 is skew-symmetric.
1.2 Gauss Elimination

We discuss a widely used method, called the Gauss elimination method, to solve a system of m linear equations in n unknowns x₁, …, xₙ:
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\ \ \vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}$$
where the aij ’s and the bi ’s are known scalars in F. If each bi = 0 then the system above is called
a homogeneous system. Otherwise, we say it is inhomogeneous.
Set A = (aij ), b = (b1 , . . . , bm )t , and x = (x1 , . . . , xn )t . We can write the system above in the
matrix form
Ax = b.
The matrix A is called the coefficient matrix. By a solution, we mean any choice of the unknowns
x1 , . . . , xn which satisfies all the equations.
Definition 1.2.2. An m × n matrix M is said to be in row echelon form (ref ) if it satisfies the
following conditions:
(a) By a zero row of M we mean a row with all entries zero. Suppose M has k nonzero rows
and m − k zero rows. Then the last m − k rows of M are the zero rows.
(b) The first nonzero entry in a nonzero row is called a pivot. For i = 1, 2, . . . , k, suppose that
the pivot in row i occurs in column ji . Then we have j1 < j2 < · · · < jk . The columns {j1 , . . . , jk }
are called the set of pivotal columns of M . Columns {1, . . . , n} \ {j1 , . . . , jk } are the nonpivotal
or free columns.
Note that a matrix in row canonical form (rcf), i.e., a matrix in ref in which every pivot equals 1 and each pivot is the only nonzero entry in its column, is automatically in row echelon form. Also note that, in both of the definitions above, the number of pivots k satisfies k ≤ m and k ≤ n.
where the aij ’s are arbitrary scalars. It may be checked that U is in rcf with pivotal columns 2, 5, 7
Example 1.2.6. Let U be the matrix from the example above. Let c = (c₁, c₂, c₃, c₄)ᵗ. We want to write down all solutions to the system Ux = c.
(i) If c₄ ≠ 0 then clearly there is no solution.
(ii) Now assume that c₄ = 0. Call the variables x₂, x₅, x₇ pivotal and the variables x₁, x₃, x₄, x₆, x₈ nonpivotal or free.
Give arbitrary values x₁ = s, x₃ = t, x₄ = u, x₆ = v, x₈ = w to the free variables. These values can be extended to values of the pivotal variables in one and only one way to get a solution to the system Ux = c:
The process above is called back substitution. Given arbitrary values for the free variables,
we first solve for the value of the largest pivotal variable, then using this value (and the values of
the free variables) we get the value of the second largest pivotal variable, and so on.
We extract the following Lemma from the examples above and its proof is left as an exercise.
Lemma 1.2.7. Let U be a m × n matrix in ref. Then the only solution to the homogeneous system
U x = 0 which is zero in all free variables is the zero solution.
Note that a matrix in rcf is also in ref and the lemma above also applies to such matrices.
Theorem 1.2.8. Let Ax = b, with A an m × n matrix. Let c be a solution of Ax = b and S the
set of all solutions of the associated homogeneous system Ax = 0. Then the set of all solutions to
Ax = b is
c + S = {c + v : v ∈ S}.
(iv) Let p be the unique solution of Ux = c having all free variables zero. Then every solution of Ux = c is of the form
$$p + \sum_{i\in F} a_i s_i,$$
for some scalars aᵢ, i ∈ F.
Example 1.2.10. In our previous two examples P = {2, 5, 7} and F = {1, 3, 4, 6, 8}. To make sure
the notation of the theorem is understood write down p and si , i = 1, 3, 4, 6, 8.
We now discuss the first step in Gauss elimination, namely, how to reduce a matrix to ref or
rcf. We define a set of elementary row operations to be performed on the equations of a system.
These operations transform a system of equations into another system with the same solution set.
Performing an elementary row operation on Ax = b is equivalent to replacing this system by the
system EAx = Eb, where E is an invertible elementary matrix.
Let eij denote the m × n matrix with 1 in the ith row and jth column and zero elsewhere. Any
matrix A = (aij ) of size m × n can be written as
$$A = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}\, e_{ij}.$$
For this reason eij ’s are called the matrix units. Let us see the effect of multiplying e13 with a
matrix A written in terms of row vectors :
$$e_{13}A = \begin{bmatrix}
0 & 0 & 1 & \cdots & 0\\
0 & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
0 & 0 & 0 & \cdots & 0
\end{bmatrix}_{m\times m}
\begin{bmatrix} R_1\\ R_2\\ R_3\\ \vdots\\ R_m \end{bmatrix}_{m\times n}
= \begin{bmatrix} R_3\\ 0\\ 0\\ \vdots\\ 0 \end{bmatrix}.$$
We now define three kinds of elementary row operations and elementary matrices. Consider the
system Ax = b, where A is m × n, b is m × 1, and x is a n × 1 unknown vector.
(i) Elementary row operation of type I: For i ≠ j and a scalar a, add a times equation j to equation i in the system Ax = b.
What effect does this operation have on A and b? Consider the matrix
$$E = I + a\,e_{ij}, \qquad i \neq j.$$
This matrix has 1’s on the diagonal and a scalar a as an off-diagonal entry. By the above observation
$$(I + ae_{ij})\begin{bmatrix} R_1\\ R_2\\ \vdots\\ R_m \end{bmatrix}
= \begin{bmatrix} R_1\\ R_2\\ \vdots\\ R_m \end{bmatrix}
+ a\begin{bmatrix} 0\\ \vdots\\ R_j\\ \vdots\\ 0 \end{bmatrix}
= \begin{bmatrix} R_1\\ \vdots\\ R_i + aR_j\\ \vdots\\ R_m \end{bmatrix},$$
where Rⱼ (in the middle term) and Rᵢ + aRⱼ (in the last term) occur in the ith row.
It is now clear that performing an elementary row operation of type I on the system Ax = b yields the new system EAx = Eb.
Suppose we perform an elementary row operation of type I as above. Then perform the same
elementary row operation of type I but with the scalar a replaced by the scalar −a. It is clear that
we get back the original system Ax = b. It follows (why?) that E −1 = I − aeij .
(ii) Elementary row operation of type II: For i ≠ j, interchange equations i and j in the system Ax = b.
What effect does this operation have on A and b? Consider the matrix
$$F = I + e_{ij} + e_{ji} - e_{ii} - e_{jj},$$
the identity matrix with its ith and jth rows interchanged.
Premultiplication by this matrix has the effect of interchanging the ith and jth rows. Performing
this operation twice in succession gives back the original system. Thus F 2 = I.
(iii) Elementary row operation of type III: Multiply equation i in the system Ax = b by a
nonzero scalar c.
What effect does this operation have on A and b? Consider the matrix
$$G = I + (c-1)e_{ii}, \qquad c \neq 0,$$
the identity matrix with the (i, i) entry replaced by c.
Premultiplication by G has the effect of multiplying the ith row by c. Doing this operation twice in succession, first with the scalar c and then with the scalar 1/c, yields the original system back. It follows that G⁻¹ = I + (c⁻¹ − 1)e_ii.
The matrices E, F, G above are called elementary matrices of type I,II,III respectively. We
summarize the above discussion in the following result.
Theorem 1.2.11. Performing an elementary row operation (of a certain type) on the system
Ax = b is equivalent to premultiplying A and b by an elementary matrix E (of the same type),
yielding the system EAx = Eb.
Elementary matrices are invertible and the inverse of an elementary matrix is an elementary
matrix of the same type.
Since elementary matrices are invertible it follows that performing elementary row operations
does not change the solution set of the system. We now show how to reduce a matrix to row
reduced echelon form using a sequence of elementary row operations.
Theorem 1.2.12. Every matrix can be reduced to a matrix in rcf by a sequence of elementary row
operations.
Proof. We apply induction on the number of rows. If the matrix A is a row vector, the conclusion is obvious. Now suppose that A is m × n, where m ≥ 2. If A = 0 then we are done. If A is not the zero matrix then there is a nonzero column in A. Find the first nonzero column from the left, say column j₁. Interchange rows to move a nonzero entry of column j₁ to the top row. Now multiply the first row by a nonzero scalar to make this entry (in row 1 and column j₁) equal to 1. Now add suitable multiples of the first row to the remaining rows so that all entries in column j₁, except the entry in row 1, become zero.
in row 1, become zero. The resulting matrix looks like
$$A_1 = \begin{bmatrix}
0 & \cdots & 0 & 1 & * & \cdots & *\\
0 & \cdots & 0 & 0 & * & \cdots & *\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & \cdots & 0 & 0 & * & \cdots & *
\end{bmatrix}.$$
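The proof is constructive, and the construction can be sketched in code. The helper below (our own names; a minimal NumPy sketch that ignores floating point round-off issues) follows exactly the steps above: locate the first nonzero column, swap a nonzero entry to the top, scale the pivot to 1, clear the rest of the column, and repeat on the remaining rows.

```python
import numpy as np

def rcf(A, tol=1e-12):
    """Reduce a copy of A to row canonical form by elementary row operations."""
    A = A.astype(float).copy()
    m, n = A.shape
    row = 0
    for col in range(n):
        # find a row at or below `row` with a nonzero entry in this column
        pivot = next((r for r in range(row, m) if abs(A[r, col]) > tol), None)
        if pivot is None:
            continue                        # nothing to do in this column
        A[[row, pivot]] = A[[pivot, row]]   # type II: interchange rows
        A[row] /= A[row, col]               # type III: make the pivot equal to 1
        for r in range(m):                  # type I: clear the rest of the column
            if r != row:
                A[r] -= A[r, col] * A[row]
        row += 1
        if row == m:
            break
    return A

print(rcf(np.array([[2, 1, 1], [4, -6, 0], [-2, 7, 2]])))   # the 3x3 identity matrix
```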
Proof. (i) Reduce A to rcf U by Gauss elimination. Since m < n there is at least one free variable. It follows that there is a nontrivial solution.
(ii) Reduce Ax = b to EAx = Eb using Gauss elimination, where U = EA is in rcf. Put c = Eb = (c₁, …, c_m)ᵗ. Suppose U has k nonzero rows. Three cases arise:
(a) at least one of c_{k+1}, …, c_m is nonzero: in this case there is no solution.
(b) c_{k+1} = ⋯ = c_m = 0 and k = n: there is a unique solution (why?).
(c) c_{k+1} = ⋯ = c_m = 0 and k < n: there are infinitely many solutions (why?).
No other cases are possible (why?). That completes the proof.
In the following examples an elementary row operation of type I is indicated by Ri + aRj , of
type II is indicated by Ri ↔ Rj , and of type III is indicated by aRi .
Example 1.2.14. Consider the system
$$Ax = \begin{bmatrix} 2 & 1 & 1\\ 4 & -6 & 0\\ -2 & 7 & 2 \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 5\\ -2\\ 9 \end{bmatrix} = b.$$
The general solution of such a system is obtained by back substitution: assign arbitrary values s, t, r to the free variables; the solution is then a particular solution plus the corresponding combination of the special solutions for the free variables.
(d) ⇒ (a) First observe that a square matrix in rcf is either the identity matrix or its bottom row is zero. If A cannot be reduced to I by elementary row operations then U, the rcf of A, has a zero row at the bottom. Hence Ux = 0 has at most n − 1 nontrivial equations, which have a nontrivial solution. This contradicts (d).
This proposition provides us with an algorithm to calculate the inverse of a matrix if it exists. If A
is invertible then there exist invertible matrices E1 , E2 , . . . , Ek such that Ek · · · E1 A = I. Multiply
by A−1 on both sides to get Ek · · · E1 I = A−1 .
Lemma 1.2.18. (Gauss-Jordan Algorithm) Let A be an invertible matrix. To compute A−1 , apply
elementary row operations to A to reduce it to an identity matrix. The same operations when
applied to I, produce A−1 .
Example 1.2.19. We find the inverse of the matrix
$$A = \begin{bmatrix} 1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1 \end{bmatrix}.$$
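The Gauss–Jordan computation for this matrix can be checked numerically (NumPy carries out the elimination internally in `numpy.linalg.inv`):

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])

A_inv = np.linalg.inv(A)
print(A_inv)                              # [[ 1.  0.  0.] [-1.  1.  0.] [ 0. -1.  1.]]
print(np.allclose(A @ A_inv, np.eye(3)))  # True
```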
1.3 Determinants
In this section we study determinants of matrices. Recall the formula for determinants of k × k
matrices, for k = 1, 2, 3.
$$\det[a] = a, \qquad \det\begin{bmatrix} a & b\\ c & d \end{bmatrix} = ad - bc,$$
and
$$\det\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & i \end{bmatrix} = aei - ahf - bdi + bgf + cdh - ceg.$$
Our approach to determinants of n × n matrices is via their properties (rather than via an explicit formula
as above). It makes their study more elegant. Later, we will give a geometric interpretation of
determinant in terms of volume.
Let d be a function that associates a scalar d(A) ∈ F with every n × n matrix A over F. We use
the following notation. If the columns of A are A1 , A2 , . . . , An , we write d(A) = d(A1 , A2 , . . . , An ).
Definition 1.3.1. (i) d is called multilinear if for each k = 1, 2, …, n, all scalars α, β, and all n × 1 column vectors A₁, …, A_{k−1}, A_{k+1}, …, Aₙ, B, C we have
$$d(A_1, \ldots, A_{k-1}, \alpha B + \beta C, A_{k+1}, \ldots, A_n) = \alpha\, d(A_1, \ldots, B, \ldots, A_n) + \beta\, d(A_1, \ldots, C, \ldots, A_n).$$
(ii) d is called alternating if d(A₁, …, Aₙ) = 0 whenever two adjacent columns are equal.
(iii) d is called normalized if d(e₁, e₂, …, eₙ) = 1.
A multilinear, alternating, and normalized d is called a determinant function of order n.
Our immediate objective is to show that there is only one determinant function of order n. This
fact is very useful in proving that certain formulas yield the determinant. We simply show that
the formula defines an alternating, multilinear and normalized function on the columns of n × n
matrices.
$$\begin{aligned}
0 &= d(A_1, A_2, \ldots, B + C, B + C, \ldots, A_n)\\
&= d(A_1, A_2, \ldots, B, B + C, \ldots, A_n) + d(A_1, A_2, \ldots, C, B + C, \ldots, A_n)\\
&= d(A_1, A_2, \ldots, B, C, \ldots, A_n) + d(A_1, A_2, \ldots, C, B, \ldots, A_n).
\end{aligned}$$
Remark 1.3.3. Note that the properties (a), (b), (c) have been derived by properties of determinant
functions without having any formula at our disposal yet.
Computation of determinants
Example 1.3.4. We now derive the familiar formula for the determinant of 2×2 matrices. Suppose
d(A1 , A2 ) is an alternating multilinear normalized function on 2 × 2 matrices A = (A1 , A2 ). Then
$$d\begin{bmatrix} x & y\\ z & u \end{bmatrix} = xu - yz.$$
To derive this formula, write the first column as A₁ = xe₁ + ze₂ and the second column as A₂ = ye₁ + ue₂, and expand d(A₁, A₂) using multilinearity.
Similarly, the formula for 3 × 3 determinants can also be derived as above. We leave this as an
exercise.
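For completeness, here is the 2 × 2 expansion behind Example 1.3.4 written out in full, using only multilinearity, the alternating property, and normalization (d(e₁, e₂) = 1); the 3 × 3 case proceeds in the same way:
$$\begin{aligned}
d(xe_1 + ze_2,\ ye_1 + ue_2)
&= xy\, d(e_1, e_1) + xu\, d(e_1, e_2) + zy\, d(e_2, e_1) + zu\, d(e_2, e_2)\\
&= xu\, d(e_1, e_2) + zy\, d(e_2, e_1) \qquad (d \text{ vanishes on repeated columns})\\
&= (xu - yz)\, d(e_1, e_2) \qquad (d(e_2, e_1) = -d(e_1, e_2))\\
&= xu - yz.
\end{aligned}$$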
Lemma 1.3.5. Suppose f is a multilinear alternating function on n×n matrices and f (e1 , e2 , . . . , en ) =
0. Then f is identically zero.
Proof. Expanding f(A₁, …, Aₙ) by multilinearity and then using part (c) of the lemma above (so that only terms with distinct column indices survive), we can write
$$f(A_1, \ldots, A_n) = \sum_h \pm\, a_{h(1)1} a_{h(2)2}\cdots a_{h(n)n}\, f(e_1, e_2, \ldots, e_n),$$
where the sum is over all 1-1 onto functions h : {1, 2, …, n} → {1, 2, …, n}. Since f(e₁, e₂, …, eₙ) = 0, every term vanishes. Thus f(A) = 0.
Existence and uniqueness of determinant function
g(e1 , e2 , . . . , en ) = 0
Since Aj = Aj+1 , a1j = a1j+1 and A1j = A1j+1 . Thus f (A) = 0. Therefore f (A1 , A2 , . . . , An ) is
alternating.
If A = (e1 , e2 , . . . , en ) then by induction
Theorem 1.3.10. (i) Let U be an upper triangular or a lower triangular matrix. Then detU =
product of diagonal entries of U .
(ii) Let E be an elementary matrix of the type I + ae_{ij}, for some i ≠ j. Then detE = 1.
(iii) Let E be an elementary matrix of the type I + e_{ij} + e_{ji} − e_{ii} − e_{jj}, for some i ≠ j. Then detE = −1.
(iv) Let E be an elementary matrix of the type I + (a − 1)e_{ii}, a ≠ 0. Then detE = a.
Proof. (i) Let U = (uᵢⱼ) be upper triangular. Arguing as in Lemma 1.3.5 we see that
$$\det U = \sum_h \pm\, u_{h(1)1} u_{h(2)2}\cdots u_{h(n)n},$$
where the sum is over all 1-1 onto functions h : {1, 2, …, n} → {1, 2, …, n}. Since U is upper triangular the only choice of h yielding a nonzero term is the identity function (and this gives a plus sign).
The proof for a lower triangular matrix is similar.
(ii) Follows from part (i).
(iii) E is obtained from the identity matrix by exchanging columns i and j. The result follows
since determinant is an alternating function.
(iv) Follows from part (i).
Determinant and Invertibility
Theorem 1.3.11. Let A, B be two n × n matrices. Then
det(AB) = detAdetB.
(Here one uses that column i of AB is (AB)ᵢ = ABᵢ.)

We also have detA = detAᵗ.
Proof. Let B be the rcf of A. Then EA = B, where E is a product of elementary matrices. Since
inverses of elementary matrices are elementary matrices (of the same type) we can write
A = E1 · · · Ek B,
At = B t Ekt · · · E1t ,
Example 1.3.15. (Computation by Gauss–Elimination Method). This is one of the most efficient
ways to calculate determinant functions. Let A be an n × n matrix. Suppose
E = the n × n elementary matrix for the row operation Aᵢ + cAⱼ,
F = the n × n elementary matrix for the row operation Aᵢ ↔ Aⱼ,
G = the n × n elementary matrix for the row operation cAᵢ.
Suppose that U is the rcf of A. If c₁, c₂, …, c_p are the multipliers used for the row operations of the third kind and r row exchanges have been used to get U from A, then for any alternating multilinear function d, d(A) = (−1)^r c₁c₂⋯c_p d(U). To see this we simply note how d changes under each of the three row operations. If u₁₁, u₂₂, …, uₙₙ are the diagonal entries of U then, since U is triangular, this gives detA = (−1)^r c₁c₂⋯c_p u₁₁u₂₂⋯uₙₙ.
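A minimal numerical sketch of this method (our own helper, using NumPy; it uses only row operations of types I and II, so the determinant is (−1)^r times the product of the diagonal entries of the resulting triangular matrix):

```python
import numpy as np

def det_by_elimination(A, tol=1e-12):
    """Compute det(A) by Gaussian elimination with row exchanges."""
    A = A.astype(float).copy()
    n = A.shape[0]
    sign = 1.0
    for k in range(n):
        p = k + int(np.argmax(np.abs(A[k:, k])))    # choose a pivot row
        if abs(A[p, k]) < tol:
            return 0.0                              # the column is zero: det = 0
        if p != k:
            A[[k, p]] = A[[p, k]]                   # type II operation: sign changes
            sign = -sign
        for r in range(k + 1, n):                   # type I operations: det unchanged
            A[r] -= (A[r, k] / A[k, k]) * A[k]
    return sign * np.prod(np.diag(A))               # determinant of a triangular matrix

A = np.array([[2, 1, 1], [4, -6, 0], [-2, 7, 2]])
print(det_by_elimination(A), np.linalg.det(A))      # both equal -16 (up to rounding)
```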
Definition 1.3.16. Let A = (aij ) be an n × n matrix. The cofactor of aij , denoted by cofaij is
defined as
cofaij = (−1)i+j detAij .
cofA = (cofaij ).
If i = j, it is easy to see that the sum in question is detA. When i ≠ j, consider the matrix B obtained by replacing the ith column of A by the jth column of A. So B has a repeated column. The expansion by minors formula for detB shows that detB = 0. The other equation A(cofA)ᵗ = (detA)I is proved similarly.
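The identity A(cofA)ᵗ = (detA)I is easy to verify numerically; the helper below (our own, using NumPy) computes the cofactors directly from the definition.

```python
import numpy as np

def cofactor_matrix(A):
    """cof A = ((-1)^(i+j) det A_ij), where A_ij deletes row i and column j."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

A = np.array([[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]])
adj = cofactor_matrix(A).T                                 # the adjugate (cof A)^t
print(np.allclose(A @ adj, np.linalg.det(A) * np.eye(3)))  # True
```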
Cramer's rule: if A is invertible and Cⱼ denotes the matrix obtained from A by replacing its jth column by b, then the solution of Ax = b is given by
$$x_j = \frac{\det C_j}{\det A}.$$
Chapter 2

Vector Spaces and Linear Transformations

2.1 Vector Spaces

Definition 2.1.1. A nonempty set V of objects (called elements or vectors) is called a vector
space over the scalars F if the following axioms are satisfied.
I. Closure axioms:
1. (closure under vector addition) For every pair of elements x, y ∈ V there is a unique element
x + y ∈ V called the sum of x and y.
2. (closure under scalar multiplication of vectors by elements of F) For every x ∈ V and every
scalar α ∈ F there is a unique element αx ∈ V called the product of α and x.
II. Axioms for vector addition:
3. (commutative law) x + y = y + x for all x, y ∈ V.
4. (associative law) x + (y + z) = (x + y) + z for all x, y, z ∈ V.
5. (existence of zero element) There exists a unique element 0 in V such that x + 0 = 0 + x = x
for all x ∈ V.
6. (existence of inverse or negatives) For x ∈ V there exists a unique element written as −x such
that x + (−x) = 0.
III. Axioms for scalar multiplication
7. (associativity) For all α, β ∈ F, x ∈ V,
α(βx) = (αβ)x.
8. (distributive law for addition in V ) For all x, y ∈ V and α ∈ F,
α(x + y) = αx + αy.
9. (distributive law for addition in F) For all α, β ∈ F and x ∈ V,
(α + β)x = αx + βx
10. (identity for scalar multiplication) 1x = x for all x ∈ V.
Remark 2.1.2. When F = R we say that V is a real vector space. If we replace real numbers in
the above definition by complex numbers then we get the definition of a complex vector space.
6. Let a < b be real numbers and set V = {f : [a, b] → R}, F = R. If f, g ∈ V then we set (f + g)(x) = f(x) + g(x) for all x ∈ [a, b]. If c ∈ R and f ∈ V then (cf)(x) = cf(x) for all x ∈ [a, b]. This gives a real vector space. Here V is also denoted by R^[a,b].
8. C[a, b] = {f : [a, b] −→ R|f is continuous on [a, b]} is a real vector space under addition and
scalar multiplication defined in item 6 above.
9. V = {f : [a, b] −→ R|f is differentiable at x ∈ [a, b], x fixed } is a real vector space under the
operations described in item 6 above.
10. The set of all solutions to the differential equation y′′ + ay′ + by = 0, where a, b ∈ R, forms a real vector space. More generally, in this example we can take a = a(x), b = b(x) to be suitable functions of x.
11. Let V = Mm×n (R) denote the set of all m × n matrices with real entries. Then V is a real
vector space under usual matrix addition and multiplication of a matrix by a real number.
The above examples indicate that the notion of a vector space is quite general. A result proved
for vector spaces will simultaneously apply to all the above different examples.
Subspace of a Vector Space
Definition 2.1.4. Let V be a vector space over F. A nonempty subset W of V is called a subspace
of V if
(i) 0 ∈ W .
(ii) u, v ∈ W implies u + v ∈ W .
(iii) u ∈ W, α ∈ F implies αu ∈ W .
Definition 2.1.5. Let S be a subset of a vector space V over F. The linear span of S is the
subset of all vectors in V expressible as linear combinations of finite subsets of S, i.e.,
$$L(S) = \Big\{\sum_{i=1}^{n} c_i x_i \ \Big|\ n \geq 0,\ x_1, x_2, \ldots, x_n \in S \text{ and } c_1, c_2, \ldots, c_n \in F\Big\}.$$
The empty sum of vectors is the zero vector. Thus L(∅) = {0}. We say that L(S) is spanned by
S.
Proof. Note that L(S) is a subspace (why?). Now, if S ⊂ W ⊂ V and W is a subspace of V then L(S) ⊂ W (why?). The result follows.
N (A) = {x ∈ Fn : Ax = 0}.
2. Different sets may span the same subspace. For example L({e1 , e2 }) = L({e1 , e2 , e1 + e2 }) =
R2 . The vector space Pn (R) is spanned by {1, t, t2 , . . . , tn } and also by {1, (1 + t), . . . , (1 + t)n }
(why?).
A set S is called linearly independent (L.I.) if it is not linearly dependent, i.e., for all n ≥ 1
and for all distinct v1 , v2 , . . . , vn ∈ S and scalars α1 , α2 , . . . , αn
α1 v1 + α2 v2 + . . . + αn vn = 0 implies αi = 0, for all i.
Elements of a linearly independent set are called linearly independent. Note that the empty
set is linearly independent.
Remark 2.1.9. (i) Any subset of V containing a linearly dependent set is linearly dependent.
(ii) Any subset of a linearly independent set in V is linearly independent.
Example 2.1.10. (i) If a set S contains the zero vector 0 then S is dependent since 1.0 = 0.
(ii) Consider the vector space Rn and let S = {e1 , e2 , . . . , en }. Then S is linearly independent.
Indeed, if α1 e1 + α2 e2 + . . . + αn en = 0 for some scalars α1 , α2 , . . . , αn then (α1 , α2 , . . . , αn ) = 0.
Thus each αi = 0. Hence S is linearly independent.
(iii) Let V be the vector space of all continuous functions from R to R. Let S = {1, cos2 t, sin2 t}.
Then the relation cos2 t + sin2 t − 1 = 0 shows that S is linearly dependent.
(iv) Let α₁ < α₂ < ⋯ < αₙ be real numbers. Let V = {f : R → R | f is continuous}. Consider the set S = {e^{α₁x}, e^{α₂x}, …, e^{αₙx}}. We show that S is linearly independent by induction on n. Let n = 1 and βe^{α₁x} = 0. Since e^{α₁x} ≠ 0 for any x, we get β = 0. Now assume that the assertion is true for n − 1 and
β₁e^{α₁x} + ⋯ + βₙe^{αₙx} = 0.
Dividing by e^{αₙx} we get
β₁e^{(α₁−αₙ)x} + ⋯ + β_{n−1}e^{(α_{n−1}−αₙ)x} + βₙ = 0.
Let x → ∞ to get βₙ = 0. Now apply the induction hypothesis to get β₁ = ⋯ = β_{n−1} = 0.
(v) Let P denote the vector space of all polynomials p(t) with real coefficients. Then the set S = {1, t, t², …} is linearly independent. Suppose that 0 ≤ n₁ < n₂ < ⋯ < n_r and
α₁t^{n₁} + α₂t^{n₂} + ⋯ + α_r t^{n_r} = 0
for certain real numbers α₁, α₂, …, α_r. Differentiate n₁ times and set t = 0 to get α₁ = 0. Continuing this way we see that all of α₁, α₂, …, α_r are zero.
We show that all bases of a finite dimensional vector space have the same cardinality (i.e., they contain the same number of elements). For this we prove the following result.
Lemma 2.1.13. Let S = {v1 , v2 , . . . , vk } be a subset of a vector space V. Then any k + 1 elements
in L(S) are linearly dependent.
for some scalars c_s, s ∈ Sⱼ. Since w₁, …, w_{j+1} are linearly independent there exists t ∈ Sⱼ − {w₁, …, wⱼ} with c_t ≠ 0 (why?). It follows that
$$t = \frac{1}{c_t}\Big(w_{j+1} - \sum_{s\in S_j - \{t\}} c_s s\Big)$$
and hence the set (Sj − {t}) ∪ {wj+1 } satisfies conditions (i), (ii), and (iii) above for i = j + 1. That
completes the proof.
(second proof ) Let T = {u1 , . . . , uk+1 } ⊆ L(S). Write
$$u_i = \sum_{j=1}^{k} a_{ij} v_j, \qquad i = 1, \ldots, k+1.$$
Since the homogeneous system $\sum_{i=1}^{k+1} a_{ij}c_i = 0$, j = 1, …, k, has more unknowns than equations, it has a nontrivial solution c₁, …, c_{k+1}. We now have
$$\sum_{i=1}^{k+1} c_i u_i = \sum_{i=1}^{k+1} c_i\Big(\sum_{j=1}^{k} a_{ij} v_j\Big) = \sum_{j=1}^{k}\Big(\sum_{i=1}^{k+1} c_i a_{ij}\Big) v_j = 0,$$
so T is linearly dependent.
Theorem 2.1.14. Any two bases of a finite dimensional vector space have the same number of elements.
Proof. Suppose S and T are bases of a finite dimensional vector space V. Suppose |S| < |T |. Since
T ⊂ L(S) = V, T is linearly dependent. This is a contradiction.
has dimension 2. A basis is {e^{−x}, e^{3x}}. Every solution is a linear combination of the solutions e^{−x} and e^{3x}.
Exercise 2.1.17. What is the dimension of Mn×n (C) as a real vector space?
Lemma 2.1.18. Suppose V is a finite dimensional vector space. Let S be a linearly independent
subset of V . Then S can be enlarged to a basis of V .
Proof. Suppose that dim V = n and S has less than n elements. Let v ∈ V \ L(S). Then S ∪ {v}
is a linearly independent subset of V (why?). Continuing this way we can enlarge S to a basis of
V.
Gauss elimination, row space, and column space
Lemma 2.1.19. Let A be a m × n matrix over F and E a nonsingular m × m matrix over F. Then
(a) R(A) = R(EA). Hence dim R(A) = dim R(EA).
(b) Let 1 ≤ i1 < i2 < · · · < ik ≤ n. Columns {i1 , . . . , ik } of A are linearly independent if and
only if columns {i1 , . . . , ik } of EA are linearly independent. Hence dim C(A) = dim C(EA).
Proof. (a) R(EA) ⊆ R(A) since every row of EA is a linear combination of the rows of A.
Similarly,
R(A) = R(E −1 (EA)) ⊆ R(EA).
Columns 1,4,6 of A form a basis of C(A) and the first 3 rows of U form a basis of R(A).
Definition 2.1.22. The rank of an m × n matrix A, denoted by r(A) or rank (A) is dim R(A) =
dim C(A). The nullity of A is the dimension of the nullspace N (A) of A.
We now prove the rank-nullity theorem: for an m × n matrix A,
rank A + nullity A = n.
Proof. Let k = r(A). Reduce A to rcf (or even ref) U using elementary row operations. Then U
has k nonzero rows and k pivotal columns. We need to show that dim N (A) = dim N (U ) = n − k.
Let j1 , . . . , jk be the indices of the pivotal columns of U . Set P = {j1 , . . . , jk } and F =
{1, 2, …, n} \ P, so |F| = n − k. Recall from Chapter 1 the following:
(i) Given arbitrary scalars xᵢ for i ∈ F, there are unique scalars xᵢ for i ∈ P such that x = (x₁, …, xₙ)ᵗ satisfies Ux = 0.
(ii) Given i ∈ F , there is a unique si = (x1 , . . . , xn ) satisfying U si = 0, xi = 1, and xj = 0, for
all j ∈ F − {i}.
Then si , i ∈ F forms a basis of N (A) (why?).
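A quick numerical illustration of the rank-nullity theorem (a hypothetical matrix, using NumPy; the nullity is read off from the singular values):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],     # a multiple of the first row
              [0.0, 1.0, 1.0, 0.0]])
m, n = A.shape

rank = np.linalg.matrix_rank(A)
s = np.linalg.svd(A, compute_uv=False)        # singular values
nullity = n - np.count_nonzero(s > 1e-12)

print(rank, nullity, rank + nullity == n)     # 2 2 True
```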
Fundamental Theorem for systems of linear equations
Theorem 2.1.24. Consider the following system of m linear equations in n unknowns x1 , x2 , . . . , xn :
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}, \quad\text{or}\quad Ax = b.$$
(1) Ax = b has a solution if and only if r(A) = r([A | b]).
(2) If r(A) = r([A | b]) = n then Ax = b has exactly one solution.
(3) If r(A) = r([A | b]) = r < n then Ax = b has infinitely many solutions.
Proof. (1) Suppose r(A) = r([A | b]). Then b lies in the column space of A, say b = d₁A₁ + d₂A₂ + ⋯ + dₙAₙ, where A₁, …, Aₙ are the columns of A. Hence x₁ = d₁, …, xₙ = dₙ is a solution.
(2) Let r(A) = r([A | b]) = n. Then by the rank-nullity theorem, nullity (A) = 0. Hence Ax = 0
has a unique solution, namely x1 = · · · = xn = 0. If Ax = b = Ay then A(x − y) = 0. Hence
x − y = 0. Thus x = y.
(3) Suppose r(A) = r([A | b]) = r < n. Then n − r = dim N (A) > 0. Thus Ax = 0 has infinitely
many solutions. Let c ∈ Fn and Ac = b. Then we have seen before that all the solutions of Ax = b
are in the set c + N (A) = {c + x | Ax = 0}. Hence Ax = b has infinitely many solutions.
Rank in terms of determinants
We characterize rank in terms of minors of A. Recall that a minor of order r of A is a
submatrix of A consisting of r columns and r rows of A.
Theorem 2.1.25. An m × n matrix A has rank r ≥ 1 iff detM ≠ 0 for some order r minor M of A and detN = 0 for all order r + 1 minors N of A.
Proof. Let the rank of A be r ≥ 1. Then some r columns of A are linearly independent. Let B
be the m × r matrix consisting of these r columns of A. Then rank(B) = r and thus some r rows
of B will be linearly independent. Let C be the r × r matrix consisting of these r rows of B. Then
det(C) ≠ 0 (why?).
Let N be an (r + 1) × (r + 1) minor of A. Without loss of generality we may take N to consist of the first r + 1 rows and columns of A. Suppose det(N) ≠ 0. Then the r + 1 rows of N, and hence the first r + 1 rows of A, are linearly independent, a contradiction.
The converse is left as an exercise.
2.2 Linear Transformations

Let A be an m × n matrix with real entries. Then A “acts” on the n-dimensional space Rⁿ by left
multiplication : If v ∈ Rn then Av ∈ Rm .
In other words, A defines a function
TA : Rn −→ Rm , TA (v) = Av.
The map T_A satisfies T_A(v + w) = T_A(v) + T_A(w) and T_A(cv) = cT_A(v), where c ∈ R and v, w ∈ Rⁿ. We say that T_A respects the two operations in the vector space Rⁿ. In this section we study such maps between vector spaces.
Example 2.2.3.
(1) Let c ∈ R, V = W = R2 . Define T : R2 −→ R2 by
$$T\begin{bmatrix} x\\ y \end{bmatrix} = \begin{bmatrix} c & 0\\ 0 & c \end{bmatrix}\begin{bmatrix} x\\ y \end{bmatrix} = \begin{bmatrix} cx\\ cy \end{bmatrix}.$$
T stretches each vector v in R2 to cv. Hence
T (v + w) = c(v + w) = cv + cw = T (v) + T (w)
T (dv) = c(dv) = d(cv) = dT (v).
Hence T is a linear transformation.
(2) Rotation
Fix θ and define T : R2 −→ R2 by
$$T\begin{bmatrix} x\\ y \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x\\ y \end{bmatrix} = \begin{bmatrix} x\cos\theta - y\sin\theta\\ x\sin\theta + y\cos\theta \end{bmatrix}.$$
Then T (e1 ) = (cos θ, sin θ)t and T (e2 ) = (− sin θ, cos θ)t . Thus T rotates the whole space by θ.
(Draw a picture to convince yourself of this. Another way is to identify the vector (x, y)t with the
complex number z = x + iy. Then we can write T (z) = zeiθ ).
(3) Let D be the vector space of differentiable functions f : R −→ R such that f (n) exists for all n.
Define D : D −→ D by
$$D(f) = f'.$$
Then D(af + bg) = af′ + bg′ = aD(f) + bD(g). Hence D is a linear transformation.
(4) Define I : D → D by
$$I(f)(x) = \int_0^x f(t)\,dt.$$
By properties of integration, I is a linear transformation.
(7) Let V = Mn×n (F) be the vector space of all n × n matrices over F. Fix A ∈ V . The map
T : V → V given by T (N ) = AN is linear (why?).
Then
T (β1 w1 + · · · + βn−l wn−l ) = 0.
α1 v1 + α2 v2 + · · · + αl vl = β1 w1 + β2 w2 + · · · + βn−l wn−l .
In a later exercise in this section we ask you to derive the rank-nullity theorem for matrices
from the result above.
Coordinate vectors
Let V be a finite dimensional vector space (fdvs) over F. By an ordered basis of V we mean a sequence of distinct vectors of V whose underlying set is a basis. Let B = (v₁, v₂, …, vₙ) be an ordered basis of V and let u ∈ V.
Write uniquely (why?)
u = a1 v1 + a2 v2 + · · · + an vn , ai ∈ F.
Define the coordinate vector of u with respect to (wrt) the ordered basis B by
$$[u]_B = \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{bmatrix}.$$
Suppose C = (u₁, …, uₙ) is another ordered basis of V. Given u ∈ V, what is the relation between [u]_B and [u]_C?
Define $M_B^C$, the transition matrix from C to B, to be the n × n matrix whose jth column is [uⱼ]_B. Then, with M = $M_B^C$, we have
$$[u]_B = M[u]_C.$$
Proof. Let
$$[u]_C = \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{bmatrix}.$$
Then u = a₁u₁ + a₂u₂ + ⋯ + aₙuₙ and we have
$$[u]_B = [a_1u_1 + \cdots + a_nu_n]_B = a_1[u_1]_B + \cdots + a_n[u_n]_B
= \big[\,[u_1]_B\ [u_2]_B\ \cdots\ [u_n]_B\,\big]\begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{bmatrix} = M[u]_C.$$
Check that
$$\begin{bmatrix} 2\\ 3\\ 4 \end{bmatrix} = 2\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} + 1\begin{bmatrix} 0\\ 1\\ 1 \end{bmatrix} + 1\begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}.$$
Lemma 2.2.9. Let V be a fdvs and let B and C be two ordered bases. Then the transition matrices $M_B^C$ and $M_C^B$ are inverses of each other.
Indeed, writing M = $M_B^C$ and N = $M_C^B$, for every u ∈ V we have [u]_B = M[u]_C = MN[u]_B and [u]_C = N[u]_B = NM[u]_C. Thus (why?) MN = NM = I.
Example 2.2.10. Let M be the (n + 1) × (n + 1) matrix, with rows and columns indexed by {0, 1, …, n}, and with entry in row i and column j, 0 ≤ i, j ≤ n, given by $\binom{j}{i}$. We show that M is invertible and find the inverse explicitly.
Consider the vector space Pn (R) of real polynomials of degree ≤ n. Then B = (1, x, x2 , . . . , xn )
and C = (1, x − 1, (x − 1)2 , . . . , (x − 1)n ) are both ordered bases (why?).
We claim that M = $M_C^B$. To see this note the following computation. For 0 ≤ j ≤ n we have
$$x^j = (1 + (x-1))^j = \sum_{i=0}^{j}\binom{j}{i}(x-1)^i = \sum_{i=0}^{n}\binom{j}{i}(x-1)^i,$$
where in the last step we have used the fact that $\binom{j}{i} = 0$ for i > j.
Thus M⁻¹ = $M_B^C$ and its entries are given by the following computation. For 0 ≤ j ≤ n we have
$$(x-1)^j = \sum_{i=0}^{j}(-1)^{j-i}\binom{j}{i}x^i = \sum_{i=0}^{n}(-1)^{j-i}\binom{j}{i}x^i.$$
Thus, the entry in row i and column j of M⁻¹ is $(-1)^{j-i}\binom{j}{i}$.
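A numerical check of Example 2.2.10 (NumPy and `math.comb`; here with n = 4):

```python
import numpy as np
from math import comb

n = 4
# M has (i, j) entry C(j, i); its claimed inverse has (i, j) entry (-1)^(j-i) C(j, i).
M = np.array([[comb(j, i) for j in range(n + 1)] for i in range(n + 1)])
Minv = np.array([[(-1) ** (j - i) * comb(j, i) for j in range(n + 1)]
                 for i in range(n + 1)])

print(np.allclose(M @ Minv, np.eye(n + 1)))   # True
```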
Exercise 2.2.11. Let A be a m × n matrix over F and consider the linear map TA : Fn → Fm
given by TA (v) = Av, for v ∈ Fn (we are considering column vectors here).
Consider the ordered bases E = (e₁, …, eₙ) and F = (e₁, …, e_m) of Fⁿ and Fᵐ respectively.
Show that MFE (TA ) = A.
Let L(V, W ) denote the set of all linear transformations from V to W . Suppose S, T ∈ L(V, W )
and c is a scalar. Define S + T and cS as follows :
(S + T )(x) = S(x) + T (x)
(cS)(x) = cS(x)
for all x ∈ V. It is easy to show that L(V, W ) is a vector space under these operations.
Lemma 2.2.12. Fix ordered bases E and F of V and W respectively. For all S, T ∈ L(V, W) and scalars c we have $M_F^E(S + T) = M_F^E(S) + M_F^E(T)$ and $M_F^E(cS) = c\,M_F^E(S)$.
Proof. Exercise.
Lemma 2.2.13. Suppose V, W are vector spaces of dimensions n, m respectively. Suppose T :
V −→ W is a linear transformation. Suppose E = (e1 , . . . , en ), F = (f1 , . . . , fm ) are ordered bases
of V, W respectively. Then
[T (v)]F = MFE (T )[v]E , v ∈ V.
Proof. Let
$$[v]_E = \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_n \end{bmatrix}.$$
Then v = a₁e₁ + a₂e₂ + ⋯ + aₙeₙ and hence T(v) = a₁T(e₁) + a₂T(e₂) + ⋯ + aₙT(eₙ). We have
$$[T(v)]_F = [a_1T(e_1) + \cdots + a_nT(e_n)]_F = a_1[T(e_1)]_F + \cdots + a_n[T(e_n)]_F
= \big[\,[T(e_1)]_F\ \cdots\ [T(e_n)]_F\,\big]\begin{bmatrix} a_1\\ \vdots\\ a_n \end{bmatrix} = M_F^E(T)[v]_E.$$
An easy induction gives the following generalization of the lemma above. Its proof is left as an exercise:
$$M_{E_{m+1}}^{E_1}(T_m\circ T_{m-1}\circ\cdots\circ T_2\circ T_1) = M_{E_{m+1}}^{E_m}(T_m)\,\cdots\, M_{E_3}^{E_2}(T_2)\, M_{E_2}^{E_1}(T_1).$$
T : R2 → R2 , T (e1 ) = e1 , T (e2 ) = e1 + e2 .
Hence
$$M_B^B(T) = \begin{bmatrix} 1/2 & 1/2\\ 1/2 & -1/2 \end{bmatrix}\begin{bmatrix} 1 & 1\\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 3 & -1\\ 1 & 1 \end{bmatrix}.$$
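A numerical check of this computation (NumPy; we take the ordered basis B to consist of the columns of the matrix P below, which is what the displayed factors suggest):

```python
import numpy as np

A = np.array([[1.0, 1.0],     # matrix of T in the standard basis:
              [0.0, 1.0]])    # T(e1) = e1, T(e2) = e1 + e2
P = np.array([[1.0, 1.0],     # columns are the vectors of the ordered basis B
              [1.0, -1.0]])

M_BB = np.linalg.inv(P) @ A @ P
print(M_BB)                   # [[ 1.5 -0.5] [ 0.5  0.5]], i.e. (1/2)[[3, -1], [1, 1]]
```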
where I is the r × r identity matrix and 0 stands for a matrix of zeros of appropriate size.
V + W = L(V ∪ W ).
Proof. We shall give a sketch of a proof leaving the reader to fill in the details.
Consider the set V × W = {(v, w) : v ∈ V, w ∈ W }. This set is a vector space with component
wise addition and scalar multiplication. Check that the dimension of this space is dim V + dim W .
Define a linear map T : V × W → V + W by T ((v, w)) = v − w. Check that T is onto and that
the nullspace of T is {(v, v) : v ∈ V ∩ W }. The result now follows from the rank nullity theorem
for linear maps.
Example 2.2.21. Let V, W be finite dimensional vector spaces over F with dimensions n, m re-
spectively. Fix ordered bases E, F for V, W respectively.
Consider the map f : L(V, W ) → Mm×n (F) given by f (T ) = MFE (T ), for T ∈ L(V, W ). Lemma
2.2.12 shows that f is linear and 1-1 and Lemma 2.2.5 shows that f is onto. It follows (why?) that
dim L(V, W ) = mn.
Example 2.2.22. Often we see statements like “If every vector in a vector space V is uniquely determined by t parameters then the dimension of V is t”. In this example we show one possible way of making this precise.
Let V be a vector space over F. A linear functional is a linear map f : V → F. We shall refer
to a linear functional as a “parameter”. Suppose we have t parameters fi : V → F, i = 1, 2, . . . , t.
Suppose every vector in V is uniquely determined by these t parameters, i.e., given arbitrary scalars
a1 , a2 , . . . , at in F, there is a unique vector v ∈ V with fi (v) = ai , i = 1, . . . , t. Then dim V = t.
We show this as follows.
For i = 1, …, t, let vᵢ ∈ V be the (unique) vector with fᵢ(vᵢ) = 1 and fⱼ(vᵢ) = 0 for j ≠ i. We
claim that v1 , . . . , vt is a basis of V .
Let v ∈ V . Put ai = fi (v), i = 1, . . . , t. Consider the vector v − (a1 v1 + · · · + at vt ). Check that
fi (v − (a1 v1 + · · · + at vt )) = 0, for i = 1, . . . , t. Since each of the fi is linear and the parameters
f1 , . . . , ft uniquely determine the vectors in V it follows that the only vector with all parameters 0
is the 0 vector. Thus v − (a1 v1 + · · · + at vt ) = 0 and v1 , . . . , vt span V .
Now suppose a1 v1 + · · · + at vt = 0. Then, for all i, fi (a1 v1 + · · · + at vt ) = ai = 0 and thus linear
independence follows.
Example 2.2.23. Given an n × n matrix, by rᵢ we mean the sum of elements in row i. Similarly,
by cj we mean the sum of elements in column j.
A real magic square of order n is a real n × n matrix satisfying
r1 = r2 = · · · = rn = c1 = c2 = · · · = cn .
Let RM S(n) denote the set of all real magic squares of order n. It is easy to see that RM S(n) is
a subspace of Mn×n (R), the vector space of all n × n real matrices. The dimension of Mn×n (R) is
n2 . What is the dimension of RM S(n)?
We show that dim RM S(n) = (n − 1)2 + 1 using the previous example.
For 1 ≤ i, j ≤ n − 1, define a linear map fᵢⱼ : RMS(n) → R by fᵢⱼ(A) = aᵢⱼ, the entry of A in row i and column j, and let f : RMS(n) → R send a magic square to its common row and column sum.
Check that the (n − 1)² + 1 parameters f, fᵢⱼ satisfy the hypothesis of the previous example.
Chapter 3

Inner Product Spaces
The concept of a (real) vector space abstracts the operations of adding directed line segments and
multiplying a directed line segment by a real number. In plane geometry we also speak of other
geometric concepts such as length, angle, perpendicularity, projection of a point on a line etc.
Remarkably, we need to put only a single additional algebraic structure, that of an inner product,
on a vector space in order to have these geometric concepts available in the abstract setting.
We shall use the following notation. Recall that F = R or C. Given a ∈ F, ā will denote a itself if F = R and the complex conjugate of a if F = C. Given a matrix A over F we denote by A* the conjugate transpose of A, i.e., if A = (aᵢⱼ) then A* = (āⱼᵢ).
Definition 3.1.1. Let V be a vector space V over F. An inner product on V is a rule which
to any ordered pair of elements (u, v) of V associates a scalar, denoted by hu, vi satisfying the
following axioms: for all u, v, w in V and c any scalar we have
(1) $\langle u, v\rangle = \overline{\langle v, u\rangle}$ (Hermitian property or conjugate symmetry),
(2) $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$ (additivity),
(3) $\langle cu, v\rangle = \bar{c}\,\langle u, v\rangle$ (homogeneity),
(4) $\langle v, v\rangle \geq 0$, with $\langle v, v\rangle = 0$ iff v = 0 (positive definiteness).
Example 3.1.2. (1) Let v = (x₁, x₂, …, xₙ)ᵗ and w = (y₁, y₂, …, yₙ)ᵗ ∈ Rⁿ. Define ⟨v, w⟩ = $\sum_{i=1}^{n} x_i y_i$ = vᵗw. This is an inner product on the real vector space Rⁿ. This is often called the standard inner product.
(2) Let v = (x₁, x₂, …, xₙ)ᵗ and w = (y₁, y₂, …, yₙ)ᵗ ∈ Cⁿ. Define ⟨v, w⟩ = $\sum_{i=1}^{n} \bar{x}_i y_i$ = v*w. This is an inner product on the complex vector space Cⁿ. This is often called the standard inner product.
(3) Let V = the vector space of all real valued continuous functions on the unit interval [0, 1].
For f, g ∈ V, put
$$\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt.$$
Simple properties of the integral show that ⟨f, g⟩ is an inner product.
(4) Let B be a nonsingular n × n complex matrix. Set A = B*B. Given x, y ∈ Cⁿ define ⟨x, y⟩ = x*Ay. Denote the standard inner product on Cⁿ by the dot product (i.e., the inner product of u and v is u · v). We have
$$\langle x, y\rangle = x^*Ay = x^*B^*By = (Bx)^*By = Bx\cdot By.$$
Proof. We have
Note that the map p_w : V → V given by v ↦ p_w(v) is linear. This is the reason for using ⟨w, v⟩ instead of ⟨v, w⟩ in the definition of p_w(v).
The next lemma, whose geometric content is clear, explains the use of the term projection.
Lemma 3.1.7. Let v, w ∈ V with w ≠ 0. Then
(i) p_w(v) = p_{w/‖w‖}(v), i.e., the projection of v on w is the same as the projection of v on the unit vector in the direction of w.
(ii) p_w(v) and v − p_w(v) are orthogonal.
(iii) ‖p_w(v)‖ ≤ ‖v‖, with equality iff {v, w} are linearly dependent.
= 0.
(iii) We have (in the third step below we have used part (ii) and Pythagoras)
$$\begin{aligned}
\|v\|^2 &= \langle v, v\rangle\\
&= \langle p_w(v) + v - p_w(v),\ p_w(v) + v - p_w(v)\rangle\\
&= \|p_w(v)\|^2 + \|v - p_w(v)\|^2\\
&\geq \|p_w(v)\|^2.
\end{aligned}$$
Clearly, there is equality in the last step iff v = p_w(v), i.e., iff {v, w} are dependent.
Theorem 3.1.8 (Cauchy–Schwarz inequality). For v, w ∈ V,
$$|\langle w, v\rangle| \leq \|v\|\,\|w\|,$$
with equality iff {v, w} are linearly dependent.
Definition 3.1.10. Let V be a real inner product space. Given v, w ∈ V with v, w ≠ 0, by the C-S inequality
$$-1 \leq \frac{\langle v, w\rangle}{\|v\|\,\|w\|} \leq 1.$$
So there is a unique 0 ≤ θ ≤ π satisfying cos θ = ⟨v, w⟩/(‖v‖‖w‖). This θ is the angle between v and w.
The distance between u and v in V is defined as d(u, v) = ‖u − v‖.
Lemma 3.1.11. Let u, v, w ∈ V. Then
(i) d(u, v) ≥ 0, with equality iff u = v.
(ii) d(u, v) = d(v, u).
(iii) d(u, v) ≤ d(u, w) + d(w, v).
Proof. Exercise.
Definition 3.1.12. Let V be an n-dimensional inner product space. A basis {v₁, v₂, …, vₙ} of V is called orthogonal if its elements are mutually perpendicular, i.e., if ⟨vᵢ, vⱼ⟩ = 0 for i ≠ j. If, in addition, ‖vᵢ‖ = 1 for all i, we say that the basis is orthonormal.
Example 3.1.13. In Fn , with the standard inner product, the basis {e1 , . . . , en } is orthonormal.
Lemma 3.1.14. Let U = {u₁, u₂, …, uₙ} be a set of nonzero vectors in an inner product space V. If ⟨uᵢ, uⱼ⟩ = 0 for i ≠ j, 1 ≤ i, j ≤ n, then U is linearly independent.
Proof. Suppose that
c₁u₁ + c₂u₂ + ⋯ + cₙuₙ = 0.
Taking the inner product of both sides with uᵢ and using ⟨uᵢ, uⱼ⟩ = 0 for j ≠ i, we get
cᵢ⟨uᵢ, uᵢ⟩ = 0.
Since uᵢ ≠ 0 we have ⟨uᵢ, uᵢ⟩ > 0, so cᵢ = 0 for every i.
Theorem 3.1.16. Let V be a finite dimensional inner product space. Let W ⊆ V be a subspace and
let {w1 , . . . , wm } be an orthogonal basis of W . If W 6= V , then there exist elements {wm+1 , . . . , wn }
of V such that {w1 , . . . , wn } is an orthogonal basis of V .
Taking W = {0}, the zero subspace, we see that V has an orthogonal, and hence orthonormal,
basis.
Proof. The method of proof is as important as the theorem and is called the Gram-Schmidt
orthogonalization process.
Since W ≠ V, we can find a vector v_{m+1} such that {w₁, …, w_m, v_{m+1}} is linearly independent. The idea is to take v_{m+1} and subtract from it its projections along w₁, …, w_m. Define
$$w_{m+1} = v_{m+1} - \sum_{j=1}^{m} p_{w_j}(v_{m+1}).$$
(Recall that $p_w(v) = \dfrac{\langle w, v\rangle}{\langle w, w\rangle}\, w$.)
Clearly, w_{m+1} ≠ 0, as otherwise {w₁, …, w_m, v_{m+1}} would be linearly dependent. We now check
that {w1 , . . . , wm+1 } is orthogonal. For this, it is enough to check that wm+1 is orthogonal to each
of wi , 1 ≤ i ≤ m.
For i = 1, 2, …, m we have
$$\begin{aligned}
\langle w_i, w_{m+1}\rangle &= \Big\langle w_i,\ v_{m+1} - \sum_{j=1}^{m} p_{w_j}(v_{m+1})\Big\rangle\\
&= \langle w_i, v_{m+1}\rangle - \sum_{j=1}^{m}\langle w_i, p_{w_j}(v_{m+1})\rangle\\
&= \langle w_i, v_{m+1}\rangle - \langle w_i, p_{w_i}(v_{m+1})\rangle \quad (\text{since } \langle w_i, w_j\rangle = 0 \text{ for } i \neq j)\\
&= \langle w_i,\ v_{m+1} - p_{w_i}(v_{m+1})\rangle\\
&= 0 \quad (\text{by part (ii) of Lemma 3.1.7}).
\end{aligned}$$
Example 3.1.17. Find an orthonormal basis for the subspace of R⁴ (under the standard inner product) spanned by
$$\begin{bmatrix} 1\\ 1\\ 0\\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1\\ -2\\ 0\\ 0 \end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} 1\\ 0\\ -1\\ 2 \end{bmatrix}.$$
Denote these vectors by a, b, c respectively. Set
$$b' = b - \frac{b\cdot a}{a\cdot a}\,a = \frac{1}{3}\begin{bmatrix} 4\\ -5\\ 0\\ 1 \end{bmatrix}.$$
$$c' = c - \frac{c\cdot a}{a\cdot a}\,a - \frac{c\cdot b'}{b'\cdot b'}\,b' = \frac{1}{7}\begin{bmatrix} -4\\ -2\\ -7\\ 6 \end{bmatrix}.$$
Now a, b′, c′ are orthogonal and generate the same subspace as a, b, c. Dividing by the lengths we get the orthonormal basis a/‖a‖, b′/‖b′‖, c′/‖c′‖.
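A numerical check of Example 3.1.17 (NumPy; the helper below follows the Gram–Schmidt process described above and normalizes at each step):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (standard inner product)."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(u, v) * u for u in basis)   # subtract projections on earlier u's
        basis.append(w / np.linalg.norm(w))            # normalize
    return basis

a = np.array([1.0, 1.0, 0.0, 1.0])
b = np.array([1.0, -2.0, 0.0, 0.0])
c = np.array([1.0, 0.0, -1.0, 2.0])

q1, q2, q3 = gram_schmidt([a, b, c])
print(np.allclose([q1 @ q2, q1 @ q3, q2 @ q3], 0.0))                   # mutually orthogonal
print(np.allclose(q2, np.array([4.0, -5.0, 0.0, 1.0]) / np.sqrt(42)))  # q2 = b'/||b'||
```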
Example 3.1.18. Let V = P₃[−1, 1] denote the real vector space of polynomials of degree at most 3 defined on [−1, 1]. V is an inner product space under the inner product
$$\langle f, g\rangle = \int_{-1}^{1} f(t)g(t)\,dt.$$
To find an orthonormal basis, we begin with the basis {1, x, x², x³}. Set v₁ = 1. Then
$$v_2 = x - \frac{\langle x, 1\rangle}{\|1\|^2}\,1 = x - \frac{1}{2}\int_{-1}^{1} t\,dt = x,$$
$$v_3 = x^2 - \frac{\langle x^2, 1\rangle}{2} - \frac{\langle x^2, x\rangle}{(2/3)}\,x = x^2 - \frac{1}{2}\int_{-1}^{1} t^2\,dt - \frac{3}{2}\,x\int_{-1}^{1} t^3\,dt = x^2 - \frac{1}{3},$$
$$v_4 = x^3 - \frac{\langle x^3, 1\rangle}{2} - \frac{\langle x^3, x\rangle}{(2/3)}\,x - \frac{\langle x^3, x^2 - \tfrac13\rangle}{\|x^2 - \tfrac13\|^2}\Big(x^2 - \frac{1}{3}\Big) = x^3 - \frac{3}{5}\,x.$$
You will meet these polynomials later when you will learn about differential equations.
Let V be a finite dimensional inner product space. We have seen how to project a vector onto a
nonzero vector. We now discuss the (orthogonal) projection of a vector onto a subspace.
W ⊥ = {u ∈ V | u ⊥ w for all w ∈ W }.
v = x + y,
where x ∈ W and y ∈ W ⊥ .
$$\begin{aligned}
\langle y, v_i\rangle &= \langle v - x, v_i\rangle\\
&= \langle v, v_i\rangle - \langle x, v_i\rangle\\
&= \langle v, v_i\rangle - \Big\langle \sum_{j=1}^{k}\langle v_j, v\rangle v_j,\ v_i\Big\rangle\\
&= \langle v, v_i\rangle - \sum_{j=1}^{k}\langle v_j, v\rangle\langle v_j, v_i\rangle\\
&= \langle v, v_i\rangle - \langle v, v_i\rangle \quad (\text{by orthonormality})\\
&= 0.
\end{aligned}$$
Proof. Exercise.
Exercise 3.2.3. Consider Rn with standard inner product. Given a nonzero vector v ∈ Rn , by Hv
we mean the hyperplane (i.e., a subspace of dimension n − 1) orthogonal to v
Hv = {u ∈ Rn : u · v = 0}.
The next result shows that orthogonal projection gives the unique best approximation.
Theorem 3.2.6. Let v ∈ V and let W be a subspace of V . Let w ∈ W . Then the following are
equivalent:
(i) w is a best approximation to v by vectors in W .
(ii) w = pW (v).
(iii) v − w ∈ W ⊥ .
Proof. We have
$$\|v - w\|^2 = \|v - p_W(v) + p_W(v) - w\|^2 = \|v - p_W(v)\|^2 + \|p_W(v) - w\|^2,$$
where the second equality follows from Pythagoras' theorem on noting that p_W(v) − w ∈ W and v − p_W(v) ∈ W⊥. It follows that (i) and (ii) are equivalent. To see the equivalence of (ii) and (iii) write v = w + (v − w) and apply Theorem 3.2.1.
Consider Rn with the standard inner product (we think of Rn as column vectors). Let A be an
n × m (m ≤ n) matrix and let b ∈ Rn . We want to project b onto the column space of A.
The projection of b onto the column space of A will be a vector of the form p = Ax for some
x ∈ Rm . From Theorem 3.2.6, p is the projection iff b − Ax is orthogonal to every column of A. In
other words, x should satisfy the equations
At (b − Ax) = 0, or
At Ax = At b.
The above equations are called normal equations in the Gauss-Markov theory in statistics. Thus,
if x is any solution of the normal equations, then Ax is the required projection of b.
Lemma 3.2.7.
rank (A) = rank (At A).
Proof. We have rank (A) ≥ rank (At A) (why?). Let At Az = 0, for z ∈ Rm . Then At w = 0,
where w = Az, i.e., w is in the column space of A and is orthogonal to every column of A.
This implies (why?) that w = Az = 0. Thus nullity (A) ≥ nullity (At A). It follows that
rank (At A) = rank (A) .
If the columns of A are linearly independent, the (unique) solution to the normal equations is (AᵗA)⁻¹Aᵗb and the projection of b onto the column space of A is A(AᵗA)⁻¹Aᵗb. Note that the normal equations always have a solution (why?), although the solution will not be unique in case the columns of A are linearly dependent (since then rank(AᵗA) = rank(A) < m).
Example 3.2.8. Let
$$A = \begin{bmatrix} 1 & 1\\ 1 & 0\\ 0 & 1 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1\\ 0\\ -5 \end{bmatrix}.$$
Then
$$A^tA = \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix} \quad\text{and}\quad A^tb = \begin{bmatrix} 1\\ -4 \end{bmatrix}.$$
The unique solution to the normal equations is
$$x = \begin{bmatrix} 2\\ -3 \end{bmatrix} \quad\text{and}\quad b - Ax = \begin{bmatrix} 2\\ -2\\ -2 \end{bmatrix}$$
(note that this vector is orthogonal to the columns of A). The projection of b onto the column space of A is
$$p = Ax = \begin{bmatrix} -1\\ 2\\ -3 \end{bmatrix}.$$
Now let
$$B = \begin{bmatrix} 1 & 1 & 1\\ 1 & 0 & 1/2\\ 0 & 1 & 1/2 \end{bmatrix}.$$
We have
$$B^tB = \begin{bmatrix} 2 & 1 & 3/2\\ 1 & 2 & 3/2\\ 3/2 & 3/2 & 3/2 \end{bmatrix} \quad\text{and}\quad B^tb = \begin{bmatrix} 1\\ -4\\ -3/2 \end{bmatrix}.$$
Note that A and B have the same column spaces (the third column of B is the average of the first two columns). So the projection of b onto the column space of B will be the same as before. However, the normal equations do not have a unique solution in this case. Check that
$$x = \begin{bmatrix} 2\\ -3\\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 3\\ -2\\ -2 \end{bmatrix}$$
are both solutions of the normal equations BᵗBx = Bᵗb.
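A numerical check of Example 3.2.8 (NumPy): solving the normal equations for A reproduces the projection p, and a least squares solution for B (here via `numpy.linalg.lstsq`) gives the same projection.

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 1.0, 1.0], [1.0, 0.0, 0.5], [0.0, 1.0, 0.5]])
b = np.array([1.0, 0.0, -5.0])

x = np.linalg.solve(A.T @ A, A.T @ b)        # unique solution of the normal equations
print(x, A @ x)                              # [ 2. -3.]   [-1.  2. -3.]

y, *_ = np.linalg.lstsq(B, b, rcond=None)    # one least squares solution for B
print(np.allclose(B @ y, A @ x))             # True: the projection is the same
```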
Suppose we have a large number of data points (xi , yi ), i = 1, 2, . . . , n collected from some
experiment. Frequently there is reason to believe that these points should lie on a straight line. So
we want a linear function y(x) = s + tx such that y(xᵢ) = yᵢ, i = 1, …, n. Due to uncertainty in the data and experimental error, in practice the points will deviate somewhat from a straight line and so it is impossible to find a linear y(x) that passes through all of them. So we seek a line that fits
the data well, in the sense that the errors are made as small as possible. A natural question that
arises now is: how do we define the error?
Consider the following system of linear equations, in the variables s and t, and known coefficients
xi , yi , i = 1, . . . , n:
$$\begin{aligned}
s + x_1 t &= y_1\\
s + x_2 t &= y_2\\
&\ \vdots\\
s + x_n t &= y_n
\end{aligned}$$
Note that typically n would be much greater than 2. If we can find s and t to satisfy all these
equations, then we have solved our problem. However, for reasons mentioned above, this is not
always possible. For given values of s and t the error in the ith equation is |yi − s − xi t|. There
are several ways of combining the errors in the individual equations to get a measure of the total
error. The following are three examples:
$$\sqrt{\sum_{i=1}^{n}(y_i - s - x_i t)^2}, \qquad \sum_{i=1}^{n}|y_i - s - x_i t|, \qquad \max_{1\leq i\leq n}|y_i - s - x_i t|.$$
Both analytically and computationally, a nice theory exists for the first of these choices and this is
what we shall study. The problem of finding s, t so as to minimize
$$\sqrt{\sum_{i=1}^{n}(y_i - s - x_i t)^2}$$
is called the least squares problem for this system. In matrix notation, let A be the n × 2 matrix whose ith row is (1, xᵢ), let x = (s, t)ᵗ, and let b = (y₁, …, yₙ)ᵗ.
The least squares problem is finding an x such that ||b − Ax|| is minimized, i.e., find an x such
that Ax is the best approximation to b in the column space of A. This is precisely the problem of
finding x such that b − Ax is orthogonal to the column space of A.
A straight line can be considered as a polynomial of degree 1. We can also try to fit an mth
degree polynomial
y(x) = s0 + s1 x + s2 x2 + · · · + sm xm
to the data points (xi , yi ), i = 1, . . . , n, so as to minimize the error (in the least squares sense). In
this case s0 , s1 , . . . , sm are the variables and we have
$$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m\\ 1 & x_2 & x_2^2 & \cdots & x_2^m\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}, \qquad b = \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}, \qquad x = \begin{bmatrix} s_0\\ s_1\\ \vdots\\ s_m \end{bmatrix}.$$
Example 3.2.9. Find s, t such that the straight line y = s + tx best fits the following data in the
least squares sense:
y = 1 at x = −1, y = 1 at x = 1, y = 3 at x = 2.
We want to project b = (1, 1, 3)ᵗ onto the column space of
$$A = \begin{bmatrix} 1 & -1\\ 1 & 1\\ 1 & 2 \end{bmatrix}.$$
Now
$$A^tA = \begin{bmatrix} 3 & 2\\ 2 & 6 \end{bmatrix} \quad\text{and}\quad A^tb = \begin{bmatrix} 5\\ 6 \end{bmatrix}.$$
The normal equations are
$$\begin{bmatrix} 3 & 2\\ 2 & 6 \end{bmatrix}\begin{bmatrix} s\\ t \end{bmatrix} = \begin{bmatrix} 5\\ 6 \end{bmatrix}.$$
The solution is s = 9/7, t = 4/7 and the best line is y = 9/7 + (4/7)x.
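The same answer can be obtained with NumPy, either by solving the normal equations directly or with `numpy.linalg.lstsq`:

```python
import numpy as np

x_data = np.array([-1.0, 1.0, 2.0])
y_data = np.array([1.0, 1.0, 3.0])

A = np.column_stack([np.ones_like(x_data), x_data])   # rows (1, x_i)
s, t = np.linalg.solve(A.T @ A, A.T @ y_data)         # normal equations
print(s, t)                                           # 1.2857... and 0.5714..., i.e. 9/7 and 4/7

print(np.linalg.lstsq(A, y_data, rcond=None)[0])      # the same solution
```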