
Further linear algebra

M. Anthony, M. Harvey
MT2175, 2790175
2012

Undergraduate study in
Economics, Management,
Finance and the Social Sciences

This is an extract from a subject guide for an undergraduate course offered as part of the
University of London International Programmes in Economics, Management, Finance and
the Social Sciences. Materials for these programmes are developed by academics at the
London School of Economics and Political Science (LSE).
For more information, see: www.londoninternational.ac.uk
This guide was prepared for the University of London International Programmes by:
Martin Anthony, Professor of Mathematics, and Michele Harvey, Course Leader, Department of
Mathematics, London School of Economics and Political Science.
This is one of a series of subject guides published by the University. We regret that due to
pressure of work the authors are unable to enter into any correspondence relating to, or
arising from, the guide. If you have any comments on this subject guide, favourable or
unfavourable, please use the form at the back of this guide.

University of London International Programmes


Publications Office
Stewart House
32 Russell Square
London WC1B 5DN
United Kingdom
Website: www.londoninternational.ac.uk

Published by: University of London


© University of London 2012
The University of London asserts copyright over all material in this subject guide except where
otherwise indicated. All rights reserved. No part of this work may be reproduced in any form,
or by any means, without permission in writing from the publisher.
We make every effort to contact copyright holders. If you think we have inadvertently used
your copyright material, please let us know.
Contents

Preface 1

1 Introduction 3
1.1 This subject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Aims of the course . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Topics covered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Online study resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 The VLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Making use of the Online Library . . . . . . . . . . . . . . . . . . 6
1.4 Using the subject guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Examination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 The use of calculators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Diagonalisation, Jordan normal form and differential equations 9


Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1 Differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Linear systems of differential equations . . . . . . . . . . . . . . . . . . . 10
2.3 Solving by diagonalisation . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4 The Jordan normal form . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Solving systems of differential equations using Jordan normal form . . . . 19
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 23
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Comments on exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25


Feedback on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3 Inner products and orthogonality 29


Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1 The inner product of two real vectors . . . . . . . . . . . . . . . . . . . . 29
3.1.1 Geometric interpretation in R2 and R3 . . . . . . . . . . . . . . . 30
3.2 Inner products more generally . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.1 Norms in a vector space . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.2 The Cauchy-Schwarz inequality . . . . . . . . . . . . . . . . . . . 34
3.2.3 Generalised geometry . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2.4 Orthogonal vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.2.5 Orthogonality and linear independence . . . . . . . . . . . . . . . 35
3.3 Orthogonal matrices and orthonormal sets . . . . . . . . . . . . . . . . . 36
3.4 Gram-Schmidt orthonormalisation process . . . . . . . . . . . . . . . . . 37
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 39
Feedback on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4 Orthogonal diagonalisation and its applications 43


Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1 Orthogonal diagonalisation of symmetric matrices . . . . . . . . . . . . . 43
4.2 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2.1 Information on quadratic forms . . . . . . . . . . . . . . . . . . . 48
4.2.2 Quadratic forms in R2 – conic sections . . . . . . . . . . . . . . . 51
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 54
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Comments on exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Feedback on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5 Direct sums and projections 59


Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1 The direct sum of two subspaces . . . . . . . . . . . . . . . . . . . . . . . 59
5.1.1 The sum of two subspaces . . . . . . . . . . . . . . . . . . . . . . 59
5.1.2 Direct sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.2 Orthogonal complements . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2.1 The orthogonal complement of a subspace . . . . . . . . . . . . . 61
5.2.2 Orthogonal complements of null spaces and ranges . . . . . . . . . 62
5.3 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3.1 The definition of a projection . . . . . . . . . . . . . . . . . . . . 63
5.3.2 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3.3 Orthogonal projections . . . . . . . . . . . . . . . . . . . . . . . . 65
5.4 Characterising projections and orthogonal projections . . . . . . . . . . . 65
5.4.1 Projections are idempotents . . . . . . . . . . . . . . . . . . . . . 65
5.5 Orthogonal projection onto the range of a matrix . . . . . . . . . . . . . 66
5.6 Minimising the distance to a subspace . . . . . . . . . . . . . . . . . . . 67
5.7 Fitting functions to data: least squares approximation . . . . . . . . . . . 68
5.7.1 A linear algebra view . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.7.2 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 72
Feedback on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

6 Generalised inverses 73
Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.1 Left and right inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2 Weak generalised inverses . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.3 Strong generalised inverses . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.4 A method for calculating SGIs . . . . . . . . . . . . . . . . . . . . . . . . 81
6.5 Why are SGIs useful? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 90
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90


Comments on exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Feedback on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

7 Complex matrices and vector spaces 101


Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Aims of the chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.1 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.1.1 Complex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7.1.2 Algebra of complex numbers . . . . . . . . . . . . . . . . . . . . . 102
7.1.3 Roots of polynomials . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.1.4 The complex plane . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.1.5 Polar form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.1.6 Exponential form and Euler’s formula . . . . . . . . . . . . . . . . 106
7.2 Complex vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
7.3 Complex matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.4 Complex inner product spaces . . . . . . . . . . . . . . . . . . . . . . . . 111
7.4.1 The inner product on Cn . . . . . . . . . . . . . . . . . . . . . . . 111
7.4.2 Complex inner product in general . . . . . . . . . . . . . . . . . . 112
7.4.3 Orthogonal vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.5 Hermitian conjugates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
7.5.1 The Hermitian conjugate . . . . . . . . . . . . . . . . . . . . . . . 115
7.5.2 Hermitian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.5.3 Unitary matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.6 Unitary diagonalisation and normal matrices . . . . . . . . . . . . . . . . 118
7.7 Spectral decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 125
Feedback on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

A Sample examination paper 129

B Commentary on the Sample examination paper 133

Preface

This subject guide is not a course text. It sets out a logical sequence in which to study
the topics in this subject. Where coverage in the main texts is weak, it provides some
additional background material. Further reading is essential. We are grateful to
Dr James Ward and Dr Bob Simon for providing us with their materials on generalised
inverses and Jordan normal form.


Chapter 1
Introduction

In this very brief introduction, we aim to give you an idea of the nature of this subject
and to advise on how best to approach it. We give general information about the
contents and use of this subject guide, and on recommended reading and how to use the
textbooks.

1.1 This subject

1.1.1 Aims of the course


This subject is intended to:

enable students to acquire further skills in the techniques of linear algebra, as well
as understanding the principles underlying the subject

prepare students for further courses in mathematics and/or related disciplines (e.g.
economics, actuarial science).
As emphasised above, however, we do also want you to understand why certain
methods work: this is one of the ‘skills’ that you should aim to acquire. The examination
will test not simply your ability to perform routine calculations, but will probe your
knowledge and understanding of the fundamental principles underlying the area.

1.1.2 Learning outcomes


We now state the broad learning outcomes of this course, as a whole. More specific
learning outcomes can be found at the end of each chapter.
At the end of this course and having completed the reading and activities you should
have:

knowledge of the concepts, terminology, methods and conventions covered in the


course

the ability to solve unseen mathematical problems involving an understanding of


these concepts

the ability to demonstrate knowledge and understanding of the underlying


principles of the subject.
There are a couple of things we should stress at this point. First, note the intention that
you will be able to solve unseen problems. This means simply that you will be

expected to be able to use your knowledge and understanding of the material to solve
problems that are not completely standard. This is not something you should worry
unduly about: all mathematics topics expect this, and you will never be expected to do
anything that cannot be done using the material of the course. Second, we expect you
to be able to ‘demonstrate knowledge and understanding’ and you might well wonder
how you would demonstrate this in the examination. Well, it is precisely by being able
to grapple successfully with unseen, non-routine questions that you will indicate that
you have a proper understanding of the topic.

1.1.3 Topics covered


Descriptions of topics to be covered appear in the relevant chapters. However, it is
useful to give a brief overview at this stage.
The topics we study, broadly, are:

The application of diagonalisation to solving differential equations, the Jordan


normal form and its application to solving differential equations

Inner products and orthogonality

Orthogonal diagonalisation and its applications

Direct sums and projections

Generalised inverses

Complex matrices and complex vector spaces.

1.2 Reading
There are many books that would be useful for at least some parts of this course. We
recommend one in particular, and another for additional, further reading. Neither of
these books covers generalised inverses or Jordan normal form, and those topics have
therefore been discussed in this subject guide in such a way that they are self-contained.

1.2.1 Essential reading

You will need a copy of the following textbook.
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. (Cambridge
University Press, 2012) [ISBN 9780521279482].

1.2.2 Further reading

For additional reading, we suggest the following.
Anton, H. and C. Rorres. Elementary Linear Algebra with Supplemental Applications (International Student Version). (John Wiley & Sons (Asia) Plc Ltd, 2010) tenth edition. [ISBN 9780470561577]¹

¹ There are many editions and variants of this book. Any one is equally useful and you will not need more than one of them. You can find the relevant sections cited in this subject guide in any edition by using the index.
Please note that as long as you read the Essential reading you are then free to read
around the subject area in any text, paper or online resource. You will need to support
your learning by reading as widely as possible. To help you read extensively, you have
free access to the virtual learning environment (VLE) and University of London Online
Library (see below).
Textbooks will provide more in-depth explanations than you will find in this subject
guide, and they will also provide many more examples to study and exercises to work
through. The books listed are the ones we have referred to in this subject guide.

1.3 Online study resources


In addition to the subject guide and the Essential reading, it is crucial that you take
advantage of the study resources that are available online for this course, including the
VLE and the Online Library.
You can access the VLE, the Online Library and your University of London email
account via the Student Portal at:
http://my.londoninternational.ac.uk
You should receive your login details in your study pack. If you have not, or you have
forgotten your login details, please email uolia.support@london.ac.uk quoting your
student number.

1.3.1 The VLE


The VLE, which complements this subject guide, has been designed to enhance your
learning experience, providing additional support and a sense of community. It forms an
important part of your study experience with the University of London and you should
access it regularly.
The VLE provides a range of resources for EMFSS courses:

Self-testing activities: Doing these allows you to test your own understanding of
subject material.

Electronic study materials: The printed materials that you receive from the
University of London are available to download, including updated reading lists
and references.

Past examination papers and Examiners’ commentaries: These provide advice on
how each examination question might best be answered.

A student discussion forum: This is an open space for you to discuss interests and
experiences, seek support from your peers, work collaboratively to solve problems
and discuss subject material.

Videos: There are recorded academic introductions to the subject, interviews and
debates and, for some courses, audio-visual tutorials and conclusions.

Recorded lectures: For some courses, where appropriate, the sessions from previous
years’ Study Weekends have been recorded and made available.

Study skills: Expert advice on preparing for examinations and developing your
digital literacy skills.

Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.

1.3.2 Making use of the Online Library


The Online Library contains a huge array of journal articles and other resources to help
you read widely and extensively.
To access the majority of resources via the Online Library you will either need to use
your University of London Student Portal login details, or you will be required to
register and use an Athens login:
http://tinyurl.com/ollathens
The easiest way to locate relevant content and journal articles in the Online Library is
to use the Summon search engine.
If you are having trouble finding an article listed in a reading list, try removing any
punctuation from the title, such as single quotation marks, question marks and colons.
For further advice, please see the online help pages:
www.external.shl.lon.ac.uk/summon/about.php

1.4 Using the subject guide


We have already mentioned that this subject guide is not a textbook. It is important
that you read textbooks in conjunction with the subject guide. In particular, you will
need to work with the Anthony and Harvey book. Not only does it contain further
examples and explanations (including proofs), but it has many exercises for you to
attempt. At the end of each chapter of the subject guide is a section in which we urge
you to test your knowledge and understanding by attempting some exercises. These will
be principally from the main textbook (Anthony and Harvey), but we will occasionally
give some others. Full solutions to many (but not all) of the exercises in Anthony and
Harvey are in that book. The solutions to the other exercises from the book are

provided on the VLE area for this course. Solutions to any additional exercises given in
this subject guide are provided in the subject guide.
The exercises are a very useful resource. You should try them once you think you have
mastered a particular chapter. Really try them: do not just simply read the solutions
provided. Make a serious attempt before consulting the solutions. It is vital that you
develop and enhance your problem-solving skills and the only way to do this is to try
lots of examples.
Near the end of the chapters, we also provide some feedback on the activities contained
in the chapter. Again, you should consult these only after you have attempted the
activities.
Unless it is explicitly stated otherwise, you are expected to work through and
understand the proofs in this subject guide and also those proofs in the Anthony and
Harvey book to which we refer. Understanding proofs is a good way to see how the
theory works to achieve results.
We often use the symbol □ to denote the end of a proof, where we have finished explaining why a particular result is true. This is just to make it clear where the proof ends and the following text begins.

1.5 Examination

Important: the information and advice given here are based on the examination
structure used at the time this subject guide was written. Please note that subject
guides may be used for several years. Because of this we strongly advise you to always
check both the current Regulations for relevant information about the examination, and
the VLE where you should be advised of any forthcoming changes. You should also
carefully check the rubric/instructions on the paper you actually sit and follow those
instructions.
This course is assessed by a two-hour unseen written examination. A Sample
examination paper is given as the final chapter to this subject guide. In addition, a
commentary is provided on the sample paper.
Please do not think that the questions in a real examination will necessarily be very
similar to those in the Sample examination paper. An examination is designed (by
definition) to test you. You will get examination questions unlike questions in this
subject guide. The whole point of examining is to see whether you can apply knowledge
in familiar and unfamiliar settings. The Examiners (nice people though they are) have
an obligation to surprise you! For this reason, it is important that you try as many
examples as possible, from the subject guide and from the textbooks. This is not so
that you can cover any possible type of question the Examiners can think of! It is so
that you get used to confronting unfamiliar questions, grappling with them, and finally
coming up with the solution.
Do not panic if you cannot completely solve an examination question. There are many
marks to be awarded for using the correct approach or method.
Remember, it is important to check the VLE for:

up-to-date information on examination and assessment arrangements for this course

where available, past examination papers and Examiners’ commentaries for the
course which give advice on how each question might best be answered.

1.6 The use of calculators


You will not be permitted to use calculators of any type in the examination. This is not
something that you should panic about: the Examiners are interested in assessing that
you understand the key concepts, ideas, methods and techniques, and will set questions
which do not require the use of a calculator.

Chapter 2
Diagonalisation, Jordan normal form and differential equations

Reading
The Jordan normal form and its applications to differential equations are not discussed
in these textbooks, so we have tried to write this chapter of the subject guide in such a
way that the discussion of that topic is self-contained.

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 9.
(For revision of diagonalisation, read Chapter 8.)

Further reading
Anton, H. and C. Rorres. Elementary Linear Algebra. Section 5.4.

Aims of the chapter


We explore an application of diagonalisation that was not discussed in MT1173
Algebra, namely the solution of systems of differential equations. You will certainly
have studied differential equations if you have taken the course MT1174 Calculus.
But even if you have not met the topic before, we present here the basic facts which will
enable you to handle the material. Not all matrices are diagonalisable, as we know, so
we then present another way in which there exists a relatively ‘simple’ matrix similar to
a given matrix (the Jordan normal form). We then apply this to solve systems of
differential equations.
Before embarking on this chapter, you should ensure that you are familiar with
diagonalisation. If it has been a while since you took MT1173 or a similar course, then
it is advisable to re-read that material, and also read Chapter 8 of Anthony and Harvey.

2.1 Differential equations


A differential equation is, broadly speaking, an equation that involves a function and its
derivatives. We are interested here only in very simple types of differential equation and
it is quite easy to summarise what you need to know so that we do not need a lengthy


discussion of calculus.
For a function y = y(t), the derivative of y will be denoted by y′ = y′(t) or dy/dt. We will need the following result: if y(t) satisfies the differential equation

y′ = ay,

then the general solution is

y(t) = βe^{at}, for some β ∈ R.

If an initial condition y(0) is given, then since y(0) = βe⁰ = β, we have the particular (unique) solution y(t) = y(0)e^{at} to the differential equation.

Activity 2.1 Check that y = 3e^{2t} is a solution of the differential equation y′ = 2y which satisfies the initial condition y(0) = 3.
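If you have access to a computer algebra system, you can confirm such calculations symbolically. Here is a minimal sketch using Python's SymPy library (purely illustrative; the course itself expects these checks to be done by hand):

    # Check Activity 2.1: y = 3e^{2t} solves y' = 2y with y(0) = 3.
    import sympy as sp

    t = sp.symbols('t')
    y = 3*sp.exp(2*t)
    assert sp.simplify(sp.diff(y, t) - 2*y) == 0   # y' = 2y holds
    assert y.subs(t, 0) == 3                       # y(0) = 3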

2.2 Linear systems of differential equations


We will look at systems consisting of these types of differential equations. In MT1173
Algebra, we used a change of variable technique based on diagonalisation to solve
systems of difference equations. We can apply an analogous technique to solve systems
of linear differential equations.
In general, a (square) linear system of differential equations for the functions y₁(t), y₂(t), …, yₙ(t) is of the form

y₁′ = a₁₁y₁ + a₁₂y₂ + ⋯ + a₁ₙyₙ
y₂′ = a₂₁y₁ + a₂₂y₂ + ⋯ + a₂ₙyₙ
  ⋮
yₙ′ = aₙ₁y₁ + aₙ₂y₂ + ⋯ + aₙₙyₙ,

for constants aᵢⱼ ∈ R. So such a system takes the form

y′ = Ay,

where A = (aᵢⱼ) is an n × n matrix whose entries are constants (that is, fixed numbers), and y = (y₁, y₂, …, yₙ)ᵀ, y′ = (y₁′, y₂′, …, yₙ′)ᵀ are vectors of functions.

If A is diagonal, the system y′ = Ay is easy to solve. Suppose

A = diag(λ₁, λ₂, …, λₙ),

the diagonal matrix whose diagonal entries are (in order) λ₁, …, λₙ. Then the system is precisely

y₁′ = λ₁y₁,  y₂′ = λ₂y₂,  …,  yₙ′ = λₙyₙ,

and so

y₁ = y₁(0)e^{λ₁t},  y₂ = y₂(0)e^{λ₂t},  …,  yₙ = yₙ(0)e^{λₙt}.

Since a diagonal system is so easy to solve, it would be very helpful if we could reduce
our given system to a diagonal one, and this is precisely what the method will do.


2.3 Solving by diagonalisation


We will come back to the general discussion shortly, but for now we explore the method with a simple example.

Example 2.1 Suppose the functions y₁(t) and y₂(t) are related as follows:

y₁′ = 7y₁ − 15y₂
y₂′ = 2y₁ − 4y₂.

In matrix form this is y′ = Ay where A is the 2 × 2 matrix

A = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix}.

This matrix is diagonalisable.

Activity 2.2 Show that the matrix is diagonalisable. Find an invertible matrix P and a diagonal matrix D such that P⁻¹AP = D.

You should find that if

P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix},

then P is invertible and

P⁻¹AP = D = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.

We now use the matrix P to define new functions z₁(t), z₂(t) by setting y = Pz (or, equivalently, z = P⁻¹y); that is,

y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = Pz,

so that

y₁ = 5z₁ + 3z₂
y₂ = 2z₁ + z₂.

By differentiating these equations we can express y₁′ and y₂′ in terms of z₁′ and z₂′:

y₁′ = 5z₁′ + 3z₂′
y₂′ = 2z₁′ + z₂′,

so that y′ = (Pz)′ = Pz′. Then we have

Pz′ = y′ = Ay = A(Pz) = APz

and hence

z′ = P⁻¹APz = Dz.

In other words,

\begin{pmatrix} z_1' \\ z_2' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} z_1 \\ 2z_2 \end{pmatrix}.

So the system for the functions z₁, z₂ is diagonal and hence it is easily solved. Having found z₁, z₂ we can then find y₁ and y₂ through the explicit connection between the two sets of functions: namely, y = Pz.


Let us now return to the general technique. Suppose we have the system y′ = Ay, and that A can indeed be diagonalised. Then there is an invertible matrix P and a diagonal matrix D such that P⁻¹AP = D. Here,

P = (v₁ … vₙ),  D = diag(λ₁, λ₂, …, λₙ),

where the λᵢ are the eigenvalues and the vᵢ are the corresponding eigenvectors. Let z = P⁻¹y (or, equivalently, let y = Pz). Then

y′ = (Pz)′ = Pz′,

since P has constant entries.

Activity 2.3 Prove that (Pz)′ = Pz′.

Therefore

Pz′ = Ay = APz,

and

z′ = P⁻¹APz = Dz.

We may now easily solve for z, and hence y.
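If you want to experiment numerically, the whole change-of-variable method can be written in a few lines. The following NumPy sketch assumes A is diagonalisable with real eigenvalues; the initial vector y0 and the time t are arbitrary illustrative inputs, not part of the text:

    import numpy as np

    def solve_by_diagonalisation(A, y0, t):
        lam, P = np.linalg.eig(A)       # columns of P are eigenvectors
        z0 = np.linalg.solve(P, y0)     # z(0) = P^{-1} y(0)
        z = z0 * np.exp(lam * t)        # z_i(t) = z_i(0) e^{lambda_i t}
        return P @ z                    # y = Pz

    A = np.array([[7.0, -15.0], [2.0, -4.0]])   # the matrix of Example 2.1
    print(solve_by_diagonalisation(A, np.array([8.0, 3.0]), t=1.0))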
We illustrate with an example of a 3 × 3 system of differential equations, solved using this method. Note carefully how we use the initial values y₁(0), y₂(0) and y₃(0).

Example 2.2 We find functions y₁(t), y₂(t), y₃(t) such that y₁(0) = 2, y₂(0) = 1 and y₃(0) = 1, and such that they are related by the linear system of differential equations

dy₁/dt = 6y₁ + 13y₂ − 8y₃
dy₂/dt = 2y₁ + 5y₂ − 2y₃
dy₃/dt = 7y₁ + 17y₂ − 9y₃.

We can express this system in matrix form as y′ = Ay where

A = \begin{pmatrix} 6 & 13 & -8 \\ 2 & 5 & -2 \\ 7 & 17 & -9 \end{pmatrix}.

The matrix A is diagonalisable.

Activity 2.4 Prove that A is diagonalisable.

We have P⁻¹AP = D where

P = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix},  D = \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix}.

We set y = Pz, and substitute into the equation y′ = Ay to obtain (Pz)′ = A(Pz). That is, Pz′ = APz and so z′ = P⁻¹APz = Dz. In other words, if

z = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix},

then

\begin{pmatrix} z_1' \\ z_2' \\ z_3' \end{pmatrix} = \begin{pmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix}.

So

z₁′ = −2z₁,  z₂′ = z₂,  z₃′ = 3z₃.

Therefore

z₁ = z₁(0)e^{−2t},  z₂ = z₂(0)e^t,  z₃ = z₃(0)e^{3t}.

Then, using y = Pz, we have

\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} z_1(0)e^{-2t} \\ z_2(0)e^{t} \\ z_3(0)e^{3t} \end{pmatrix}.

It remains to find z₁(0), z₂(0), z₃(0). To do so, we use the given initial values y₁(0) = 2, y₂(0) = 1, y₃(0) = 1. Since y = Pz, we can see that y(0) = Pz(0). We could use row operations to solve this system to determine z(0). Alternatively, we could use z(0) = P⁻¹y(0). Let us take the second approach. You should calculate the inverse of P and find that

P⁻¹ = \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix}.

Activity 2.5 Do this!

Therefore,

z(0) = \begin{pmatrix} z_1(0) \\ z_2(0) \\ z_3(0) \end{pmatrix} = P⁻¹y(0) = \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -3 \\ -2 \\ 3 \end{pmatrix}.

Therefore, finally,

\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} -3e^{-2t} \\ -2e^{t} \\ 3e^{3t} \end{pmatrix} = \begin{pmatrix} -3e^{-2t} + 2e^{t} + 3e^{3t} \\ -2e^{t} + 3e^{3t} \\ -3e^{-2t} - 2e^{t} + 6e^{3t} \end{pmatrix}.


The functions are:

y₁(t) = −3e^{−2t} + 2e^t + 3e^{3t}
y₂(t) = −2e^t + 3e^{3t}
y₃(t) = −3e^{−2t} − 2e^t + 6e^{3t}.

How can we check our solution? First of all, it should satisfy the initial conditions. If we
substitute t = 0 into the equations we should obtain the given initial conditions.

Activity 2.6 Check this!

The real check is to look at the derivatives at t = 0. We can take the original system y′ = Ay and use it to find y′(0):

y′(0) = Ay(0) = \begin{pmatrix} 6 & 13 & -8 \\ 2 & 5 & -2 \\ 7 & 17 & -9 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 17 \\ 7 \\ 22 \end{pmatrix}.

And we can differentiate our solution to find y′, and then substitute t = 0:

\begin{pmatrix} y_1'(t) \\ y_2'(t) \\ y_3'(t) \end{pmatrix} = \begin{pmatrix} 6e^{-2t} + 2e^{t} + 9e^{3t} \\ -2e^{t} + 9e^{3t} \\ 6e^{-2t} - 2e^{t} + 18e^{3t} \end{pmatrix}.

Activity 2.7 Substitute t = 0 to obtain y′(0) and check that it gives the same answer.
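Such checks are also easy to automate. A short SymPy sketch verifying the full solution of Example 2.2 (illustrative only; not something the examination allows):

    import sympy as sp

    t = sp.symbols('t')
    A = sp.Matrix([[6, 13, -8], [2, 5, -2], [7, 17, -9]])
    y = sp.Matrix([-3*sp.exp(-2*t) + 2*sp.exp(t) + 3*sp.exp(3*t),
                   -2*sp.exp(t) + 3*sp.exp(3*t),
                   -3*sp.exp(-2*t) - 2*sp.exp(t) + 6*sp.exp(3*t)])
    assert sp.simplify(sp.diff(y, t) - A*y) == sp.zeros(3, 1)  # y' = Ay
    assert y.subs(t, 0) == sp.Matrix([2, 1, 1])                # initial values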

Often it is desirable to find a general solution to a system of differential equations, where no initial conditions are given. A general solution will have n arbitrary constants, essentially one for each function, so that given different initial conditions later, different particular solutions can be easily obtained. We will show how this works using the system in Example 2.2.

Example 2.3 Let y₁(t), y₂(t), y₃(t) be functions related by the system of differential equations

dy₁/dt = 6y₁ + 13y₂ − 8y₃
dy₂/dt = 2y₁ + 5y₂ − 2y₃
dy₃/dt = 7y₁ + 17y₂ − 9y₃.

Let the matrices A, P and D be exactly as before in Example 2.2, so that we still have P⁻¹AP = D. Setting y = Pz, to define new functions z₁(t), z₂(t), z₃(t), we have

y′ = Ay  ⇐⇒  Pz′ = APz  ⇐⇒  z′ = P⁻¹APz = Dz.

So we need to solve the equations

z₁′ = −2z₁,  z₂′ = z₂,  z₃′ = 3z₃

in the absence of specific initial conditions. The general solutions are

z₁ = αe^{−2t},  z₂ = βe^t,  z₃ = γe^{3t},

for arbitrary constants α, β, γ ∈ R.

Therefore the general solution of the original system is

y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = Pz = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} \alpha e^{-2t} \\ \beta e^{t} \\ \gamma e^{3t} \end{pmatrix};

that is,

y₁(t) = αe^{−2t} − βe^t + γe^{3t}
y₂(t) = βe^t + γe^{3t}
y₃(t) = αe^{−2t} + βe^t + 2γe^{3t},

for α, β, γ ∈ R.

Using the general solution, you can find particular solutions for any given initial conditions. For example, using the same initial conditions y₁(0) = 2, y₂(0) = 1 and y₃(0) = 1 as in Example 2.2, we can substitute t = 0 into the general solution to obtain

y₁(0) = 2 = α − β + γ
y₂(0) = 1 = β + γ
y₃(0) = 1 = α + β + 2γ,

and solve this linear system of equations for α, β, γ. Of course, this is precisely the same system y(0) = Pz(0) as before, with solution P⁻¹y(0):

\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -3 \\ -2 \\ 3 \end{pmatrix}.

Activity 2.8 Find the particular solution of the system of differential equations in Example 2.3 which satisfies the initial conditions y₁(0) = 1, y₂(0) = 1 and y₃(0) = 0.
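Numerically, finding the constants amounts to solving the linear system y(0) = Pz(0). A one-call NumPy sketch, using the initial values of Example 2.2 (illustrative only):

    import numpy as np

    P = np.array([[1.0, -1.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]])
    y0 = np.array([2.0, 1.0, 1.0])
    print(np.linalg.solve(P, y0))   # [-3. -2.  3.], i.e. alpha, beta, gamma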

2.4 The Jordan normal form


Of course, as we know, not every square matrix is diagonalisable. But it turns out to be
the case that for any square matrix, there will be a relatively ‘simple’ matrix similar to
it, known as a Jordan matrix. This is made precise in the following theorem, the proof
of which is outside the scope of this course. (Actually, this theorem is a special case of a
more general result for complex matrices. The theorem stated below applies to matrices
which have real eigenvalues only. So, as stated, it does not show that every square
matrix will be similar to a Jordan matrix. But be assured that there is a more general
version of the theorem that applies in all cases.)
Theorem 2.1 If A is an n × n matrix with characteristic polynomial

(x − λ₁)^{m₁} ⋯ (x − λₖ)^{mₖ},

then there will exist an invertible n × n matrix P such that

P⁻¹AP = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{pmatrix},

where each Aᵢ (an mᵢ × mᵢ matrix) looks like

A_i = \begin{pmatrix} \lambda_i & * & 0 & \cdots & 0 & 0 \\ 0 & \lambda_i & * & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_i & * \\ 0 & 0 & 0 & \cdots & 0 & \lambda_i \end{pmatrix},

where each * is either 0 or 1 and all other entries are zeros.

Note that the special case of this in which all ∗ entries are 0 is nothing more than a
diagonal matrix. But the point of the theorem is that even if such a matrix A cannot be
diagonalised, it will be similar to a Jordan matrix, which is an ‘almost-diagonal’ matrix.

Example 2.4 Let

A = \begin{pmatrix} 3 & -1 & -4 & 7 \\ 1 & 1 & -3 & 5 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 0 & 2 \end{pmatrix}.

If

P = \begin{pmatrix} 3 & -4 & 0 & 8 \\ 2 & -3 & 0 & 7 \\ 1 & -2 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix},

then P is invertible, with

P⁻¹ = \begin{pmatrix} 3 & -4 & 0 & 4 \\ 2 & -3 & 0 & 5 \\ 1 & -2 & 1 & 4 \\ 0 & 0 & 0 & 1 \end{pmatrix},

and (check this!)

P⁻¹AP = J = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix},

which is a Jordan matrix.
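If you wish to experiment, SymPy can compute a Jordan normal form directly. A minimal sketch of the check above (note that SymPy may order the Jordan blocks, and hence choose P, differently from the text):

    import sympy as sp

    A = sp.Matrix([[3, -1, -4, 7],
                   [1, 1, -3, 5],
                   [0, 1, -1, -1],
                   [0, 0, 0, 2]])
    P, J = A.jordan_form()      # P^{-1} A P = J
    assert P.inv() * A * P == J
    print(J)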


There is another way to describe Jordan matrices, which is sometimes more useful. A k × k matrix B is a Jordan block if, for some λ, either k = 1 and B = (λ), or k ≥ 2 and

B = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}.

Then a Jordan matrix is one of the form

J = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_r \end{pmatrix},

where each Bᵢ is a Jordan block, i = 1, …, r.

Example 2.5 Consider the Jordan matrix of the previous example,

J = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.

Then

J = \begin{pmatrix} B_1 & 0 \\ 0 & B_2 \end{pmatrix},

where B₁, B₂ are the following Jordan blocks:

B_1 = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix},  B_2 = (2).

If J₁, J₂ are two Jordan matrices similar to a given matrix A, then it turns out that the only way in which J₁, J₂ can differ is in the order of the Jordan blocks. (This is analogous to the fact that, for a diagonalisable matrix, the only diagonal matrices similar to it are those whose diagonal entries are the eigenvalues, in some order, and this order can be changed.) For this reason, we can speak of the Jordan normal form of A (or the Jordan canonical form of A), by which we mean a Jordan matrix similar to A. (Although there are a number of such Jordan matrices, they differ only in the ordering of the Jordan blocks; this is why we use the word 'the' in 'the Jordan normal form'. We could instead say 'a', but the point is that there is essentially only one such matrix.)


Example 2.6 Let A be the matrix

A = \begin{pmatrix} 2 & 0 & 1 & -3 \\ 0 & 2 & 10 & 4 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.

Then it turns out that A is not diagonalisable. The Jordan normal form of A is

J = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}.

Note that, given what we said just a moment ago, we could equally well say that the Jordan normal form is

J = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}.

It is the case that, if

P = \begin{pmatrix} 0 & 1 & 0 & -3 \\ 1 & 10 & 0 & 4 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},

then P is invertible and P⁻¹AP = J. (Although P is a 4 × 4 matrix, it is easy to see this because the last two rows of P are so simple.) Just to make sure you understand the Jordan block description of a Jordan matrix, note that we can write

J = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} B_1 & 0 & 0 \\ 0 & B_2 & 0 \\ 0 & 0 & B_3 \end{pmatrix},

where B₁, B₂, B₃ are the Jordan blocks

B_1 = (2),  B_2 = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix},  B_3 = (3).

We will be interested in using Jordan normal forms to solve systems of linear differential equations, but we will not, in this course, be concerned with how to determine the Jordan normal form. But let us make a few observations. Look at Example 2.6. What this tells us is that if

v_1 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix},  v_2 = \begin{pmatrix} 1 \\ 10 \\ 0 \\ 0 \end{pmatrix},  v_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix},  v_4 = \begin{pmatrix} -3 \\ 4 \\ 0 \\ 1 \end{pmatrix},


then these four vectors are linearly independent, and

Av₁ = 2v₁,  Av₂ = 2v₂,  Av₃ = v₂ + 2v₃,  Av₄ = 3v₄.

This is simply because J represents the transformation corresponding to A with respect to the basis {v₁, v₂, v₃, v₄}. To explain this further, recall from MT1173 Algebra (section 9.7) the following facts: (i) the matrix M representing the linear transformation T(x) = Ax with respect to a basis B = {v₁, …, vₙ} has as its i-th column the coordinate vector of T(vᵢ) with respect to the basis B; and (ii) if P is the matrix with i-th column equal to vᵢ, then P⁻¹AP is the matrix M. So, for example, the fact that Av₃ = v₂ + 2v₃ means that the third column of the matrix J = P⁻¹AP is the vector (0, 1, 2, 0)ᵀ, because this is the coordinate vector of Av₃ with respect to the basis B = {v₁, v₂, v₃, v₄}.

Note that v₁, v₂, v₄ are eigenvectors. Vector v₃ is not, however, an eigenvector: it is what is called a generalised eigenvector corresponding to eigenvalue 2. It does not satisfy (A − 2I)v = 0, as an eigenvector would, but it does satisfy (A − 2I)²v = 0.

Activity 2.9 Check that (A − 2I)²v₃ = 0 and that (A − 2I)v₃ ≠ 0.
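A quick numerical confirmation of this activity (a NumPy sketch, illustrative only):

    import numpy as np

    A = np.array([[2, 0, 1, -3], [0, 2, 10, 4], [0, 0, 2, 0], [0, 0, 0, 3]])
    v3 = np.array([0, 0, 1, 0])
    N = A - 2*np.eye(4)
    print(N @ v3)        # (1, 10, 0, 0): non-zero, so v3 is not an eigenvector
    print(N @ (N @ v3))  # the zero vector: v3 is a generalised eigenvector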

2.5 Solving systems of differential equations using Jordan normal form
Our interest in the Jordan normal form is to use it to solve linear systems of differential
equations in cases where the underlying matrix is not diagonalisable. We illustrate this
with an example.

Example 2.7 Suppose we want to find the general solution of the following system of differential equations:

dy₁/dt = y₁ + y₃
dy₂/dt = y₁ + y₂ − 3y₃
dy₃/dt = y₂ + 4y₃.

We can express this system in matrix form as y′ = Ay where

A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & -3 \\ 0 & 1 & 4 \end{pmatrix}.

Our approach outlined earlier would attempt to diagonalise A. But A turns out not to be diagonalisable.

Activity 2.10 Prove that A is not diagonalisable.


The next best thing is to work with the Jordan normal form. Suppose this is given: explicitly, suppose we know that P⁻¹AP = J where

P = \begin{pmatrix} 1 & 1 & 0 \\ -2 & -3 & 0 \\ 1 & 2 & 1 \end{pmatrix},  J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.

Activity 2.11 Let v₁, v₂, v₃ denote the three columns of P, in order. Verify that v₁ is an eigenvector for A corresponding to eigenvalue 2, and that Av₂ = v₁ + 2v₂ and Av₃ = v₂ + 2v₃. (Given that the three vectors are linearly independent, J then represents A with respect to the basis {v₁, v₂, v₃}.)

As in the situation in which the coefficient matrix is diagonalisable, we set y = Pz and substitute into the equation y′ = Ay to obtain (Pz)′ = A(Pz). That is, Pz′ = APz and so z′ = P⁻¹APz = Jz. In other words, if

z = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix},

then

\begin{pmatrix} z_1' \\ z_2' \\ z_3' \end{pmatrix} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix}.

So

z₁′ = 2z₁ + z₂
z₂′ = 2z₂ + z₃
z₃′ = 2z₃.

This system is not uncoupled, as it would be had we been able to diagonalise A. But it is easier to solve than the system we started with. We will come shortly to explain how to solve it, but for the moment let us just accept that the solutions for the zᵢ are as follows:

z₃ = c₃e^{2t},  z₂ = c₂e^{2t} + c₃te^{2t},  z₁ = c₁e^{2t} + c₂te^{2t} + c₃(t²/2)e^{2t}.

Then the general solution to the original system is given by

\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = Pz = \begin{pmatrix} z_1 + z_2 \\ -2z_1 - 3z_2 \\ z_1 + 2z_2 + z_3 \end{pmatrix},

so

y₁ = (c₁ + c₂)e^{2t} + (c₂ + c₃)te^{2t} + (c₃/2)t²e^{2t}
y₂ = (−2c₁ − 3c₂)e^{2t} + (−2c₂ − 3c₃)te^{2t} − c₃t²e^{2t}
y₃ = (c₁ + 2c₂ + c₃)e^{2t} + (c₂ + 2c₃)te^{2t} + (c₃/2)t²e^{2t}.

This is the general solution. If we had initial values, we could determine the particular solution satisfying those initial values, just as we did earlier.
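You can verify the stated solutions for the zᵢ symbolically; a short SymPy sketch (illustrative only):

    import sympy as sp

    t, c1, c2, c3 = sp.symbols('t c1 c2 c3')
    J = sp.Matrix([[2, 1, 0], [0, 2, 1], [0, 0, 2]])
    z = sp.Matrix([c1*sp.exp(2*t) + c2*t*sp.exp(2*t) + c3*t**2/2*sp.exp(2*t),
                   c2*sp.exp(2*t) + c3*t*sp.exp(2*t),
                   c3*sp.exp(2*t)])
    assert sp.simplify(sp.diff(z, t) - J*z) == sp.zeros(3, 1)   # z' = Jz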


Let us look at another example and see how we might generalise from it.

Example 2.8 We saw earlier, in Example 2.6, that if

A = \begin{pmatrix} 2 & 0 & 1 & -3 \\ 0 & 2 & 10 & 4 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}

and if

P = \begin{pmatrix} 0 & 1 & 0 & -3 \\ 1 & 10 & 0 & 4 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},

then P⁻¹AP is the Jordan matrix

J = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} B_1 & 0 & 0 \\ 0 & B_2 & 0 \\ 0 & 0 & B_3 \end{pmatrix}.

To solve y′ = Ay using the reduction to Jordan normal form, we would set y = Pz and solve z′ = Jz. Let us look at this system for z. Explicitly, it will be:

z₁′ = 2z₁
z₂′ = 2z₂ + z₃
z₃′ = 2z₃
z₄′ = 3z₄.

Clearly the first, third and final equations can be solved directly, just as in the diagonalisable case:

z₁ = c₁e^{2t},  z₃ = c₃e^{2t},  z₄ = c₄e^{3t}.

It turns out that the solution for z₂ is z₂ = c₂e^{2t} + c₃te^{2t}. We can then determine y by using y = Pz.

Now, looking at the explicit equations for the zᵢ in this example, we can see that really there are three separate 'sub-systems' to solve: namely z₁′ = 2z₁, which is directly solvable; then

z₂′ = 2z₂ + z₃
z₃′ = 2z₃,

which involves only the functions z₂ and z₃; and finally z₄′ = 3z₄, which is directly solvable. You can probably see in general that if we are attempting to solve

z′ = Jz = \begin{pmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & B_r \end{pmatrix} z,

where each Bᵢ is a Jordan block, then we can separate this into r sub-systems for the zᵢ which can be solved separately (and having solved these, we then determine the solution y = Pz). Let us focus on any particular sub-system. It will be of the form w′ = Bw, where w consists of some of the zᵢ and where

B = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix}

is a Jordan block corresponding to some λ. (We will assume B is a k × k matrix where k ≥ 2. We don't need to consider the case k = 1, for that is just the directly solvable case we encountered earlier in the situation where the coefficient matrix is diagonalisable.)
We have the following useful result, which we can invoke when confronted with a system of the form w′ = Bw where B is a Jordan block.

Theorem 2.2 The general solution to the system of differential equations

\begin{pmatrix} w_1' \\ w_2' \\ \vdots \\ w_k' \end{pmatrix} = \begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & 0 & \lambda \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_k \end{pmatrix}

is given by

wₖ = cₖe^{λt}
wₖ₋₁ = cₖ₋₁e^{λt} + cₖte^{λt}
wₖ₋₂ = cₖ₋₂e^{λt} + cₖ₋₁te^{λt} + cₖ(t²/2)e^{λt},

and, in general, for j = 1, …, k,

wⱼ = cⱼe^{λt} + cⱼ₊₁te^{λt} + cⱼ₊₂(t²/2)e^{λt} + ⋯ + cₖ(t^{k−j}/(k − j)!)e^{λt}.

Here, as always, r! denotes the product r(r − 1) ⋯ 1.

Proof
We have the equation wₖ′ = λwₖ, so certainly wₖ = cₖe^{λt} for some cₖ, and that concurs with the statement of the theorem. Consider a general value of j between 1 and k − 1 and suppose that we already know that wⱼ₊₁ is as stated in the theorem. Now, we have wⱼ′ = λwⱼ + wⱼ₊₁. Multiplying both sides of this equation by e^{−λt} and rearranging gives

e^{−λt}wⱼ′ − λe^{−λt}wⱼ = e^{−λt}wⱼ₊₁.

Now, the left-hand side is easily seen to be (e^{−λt}wⱼ)′, so we must have that e^{−λt}wⱼ is the (indefinite) integral of e^{−λt}wⱼ₊₁. Given that

e^{−λt}wⱼ₊₁ = cⱼ₊₁ + cⱼ₊₂t + cⱼ₊₃(t²/2) + ⋯ + cₖ(t^{k−j−1}/(k − j − 1)!),

it follows that

e^{−λt}wⱼ = ∫ e^{−λt}wⱼ₊₁ dt = cⱼ + cⱼ₊₁t + cⱼ₊₂(t²/2) + ⋯ + cₖ(t^{k−j}/(k − j)!),

where cⱼ is a constant of integration. The expression for wⱼ then follows by multiplying both sides by e^{λt}. This 'inductive' argument proves the theorem. □

For instance, here is the special case that applies for k = 2:

w₁ = c₁e^{λt} + c₂te^{λt}
w₂ = c₂e^{λt}.

And, for k = 3, we have:

w₁ = c₁e^{λt} + c₂te^{λt} + c₃(t²/2)e^{λt}
w₂ = c₂e^{λt} + c₃te^{λt}
w₃ = c₃e^{λt}.

Activity 2.12 Now go back to the examples given earlier where we merely
presented the solutions of the differential equations, and convince yourself that the
theorem just proved shows why these are indeed the solutions.
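If you would like a mechanical companion to this activity, the following SymPy sketch checks Theorem 2.2 for a single k × k Jordan block (k = 4 is an arbitrary illustrative choice):

    import sympy as sp

    k = 4
    t, lam = sp.symbols('t lam')
    c = sp.symbols(f'c1:{k+1}')        # the constants c_1, ..., c_k
    # k x k Jordan block: lam on the diagonal, 1 on the superdiagonal
    B = lam*sp.eye(k) + sp.Matrix(k, k, lambda i, j: 1 if j == i + 1 else 0)
    # w_j as in Theorem 2.2 (row j below holds w_{j+1} of the theorem)
    w = sp.Matrix([sp.exp(lam*t)*sum(c[i]*t**(i - j)/sp.factorial(i - j)
                                     for i in range(j, k))
                   for j in range(k)])
    assert sp.simplify(sp.diff(w, t) - B*w) == sp.zeros(k, 1)   # w' = Bw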

Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:

solve systems of differential equations in which the underlying matrix is diagonalisable, by using the change of variable method
know what is meant by a Jordan matrix, and the Jordan normal form of a matrix
use the Jordan normal form to solve systems of differential equations.

Test your knowledge and understanding

Exercises
You should now attempt the following Exercises from Anthony and Harvey: Exercises
9.7, 9.8, 9.9, 9.17, 9.18, 9.19 and 9.20.


Now, attempt the following exercises on the Jordan normal form and its applications to
solving systems of differential equations.
Exercise 2.1
Let

A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & -3 \\ 0 & 1 & 3 \end{pmatrix}.

Find the eigenvalues of A and show that A cannot be diagonalised. Let

v_1 = \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix},  v_2 = \begin{pmatrix} 1 \\ -3 \\ 2 \end{pmatrix},  v_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.

Verify that v₁ is an eigenvector of A. Verify also that

Av₂ = v₁ + v₂,  Av₃ = v₂ + v₃.

Hence write down a matrix P and a Jordan matrix J such that P⁻¹AP = J.

Exercise 2.2
Find the functions y₁(t), y₂(t), y₃(t) which are such that y₁(0) = 1, y₂(0) = 1 and y₃(0) = 1 and which satisfy

y₁′ = −y₁ + y₂
y₂′ = −y₂ + y₃
y₃′ = −y₃.

Exercise 2.3
Write down the general solution to the following system of differential equations:

\begin{pmatrix} y_1' \\ y_2' \\ y_3' \\ y_4' \\ y_5' \\ y_6' \\ y_7' \\ y_8' \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 3 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \end{pmatrix}.

Exercise 2.4
Let

A = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 3 & 0 \\ -2 & 1 & -1 \end{pmatrix}.

Verify that (0, 0, 1)ᵀ is an eigenvector of A. Verify also that u = (−1, 1, 1)ᵀ is an eigenvector of A and that if v = (0, 1, 0)ᵀ, then Av = 2v + u. Hence find an invertible matrix P and a Jordan matrix J such that P⁻¹AP = J. Hence find the general solution to the following system of differential equations:

\begin{pmatrix} y_1' \\ y_2' \\ y_3' \end{pmatrix} = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 3 & 0 \\ -2 & 1 & -1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}.

Comments on exercises
See Anthony and Harvey for solutions to Exercises 9.7, 9.8, 9.9. Solutions to Exercises
9.17, 9.18, 9.19 and 9.20 may be found on the VLE pages for this course. Here are the
solutions to the remaining exercises.
Solution to exercise 2.1
There is one eigenvalue, λ = 1, but you will find that there are not three linearly independent eigenvectors. (We omit the details here.) The required verifications are straightforward. The three vectors are easily seen to form a linearly independent set, and hence a basis of R³. With respect to the basis {v₁, v₂, v₃}, the transformation represented by A with respect to the standard basis will be represented by the matrix

J = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix},

by the theory you will already have studied in MT1173 Algebra. That means that if we take the matrix P with these vectors as its columns, then P⁻¹AP will equal J. This P is

P = \begin{pmatrix} 1 & 1 & 0 \\ -2 & -3 & 0 \\ 1 & 2 & 1 \end{pmatrix}.
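A mechanical check of this answer (a NumPy sketch, illustrative only):

    import numpy as np

    A = np.array([[0, 0, 1], [1, 0, -3], [0, 1, 3]])
    P = np.array([[1, 1, 0], [-2, -3, 0], [1, 2, 1]])
    J = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]])
    print(np.allclose(np.linalg.inv(P) @ A @ P, J))   # True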

Solution to exercise 2.2
The system is y′ = Jy, where J is the Jordan matrix

J = \begin{pmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{pmatrix}.

By Theorem 2.2, for some constants c₁, c₂, c₃,

y₁ = c₁e^{−t} + c₂te^{−t} + c₃(t²/2)e^{−t},  y₂ = c₂e^{−t} + c₃te^{−t},  y₃ = c₃e^{−t}.

The initial values then easily establish that c₁ = c₂ = c₃ = 1. So the answer is

y₁ = e^{−t} + te^{−t} + (t²/2)e^{−t},  y₂ = e^{−t} + te^{−t},  y₃ = e^{−t}.


Solution to exercise 2.3
The matrix is a Jordan matrix and there are really three Jordan block sub-systems to solve. The first is simply y₁′ = y₁, the second is

\begin{pmatrix} y_2' \\ y_3' \\ y_4' \\ y_5' \end{pmatrix} = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} y_2 \\ y_3 \\ y_4 \\ y_5 \end{pmatrix},

and the third is

\begin{pmatrix} y_6' \\ y_7' \\ y_8' \end{pmatrix} = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} y_6 \\ y_7 \\ y_8 \end{pmatrix}.

Daunting as this may look, it is easy if we use Theorem 2.2. We have:

y₁ = c₁e^t
y₂ = c₂e^{2t} + c₃te^{2t} + c₄(t²/2)e^{2t} + c₅(t³/6)e^{2t}
y₃ = c₃e^{2t} + c₄te^{2t} + c₅(t²/2)e^{2t}
y₄ = c₄e^{2t} + c₅te^{2t}
y₅ = c₅e^{2t}
y₆ = c₆e^{3t} + c₇te^{3t} + c₈(t²/2)e^{3t}
y₇ = c₇e^{3t} + c₈te^{3t}
y₈ = c₈e^{3t}.

Solution to exercise 2.4
The verifications are straightforward and are omitted here. The first vector is an eigenvector for eigenvalue −1 and u is an eigenvector for eigenvalue 2. Let w = (0, 0, 1)ᵀ. Then {u, v, w} is a basis of R³, since the vectors are easily seen to be linearly independent. The representation of A with respect to this basis must be

J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{pmatrix}.

So P⁻¹AP = J, where

P = \begin{pmatrix} -1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}.

To solve the system of differential equations, we make the substitution y = Pz. Then we will have z′ = Jz. That is,

\begin{pmatrix} z_1' \\ z_2' \\ z_3' \end{pmatrix} = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix}.

Solving in the usual way (applying Theorem 2.2) we have

z₁ = c₁e^{2t} + c₂te^{2t},  z₂ = c₂e^{2t},  z₃ = c₃e^{−t}.

Then the general solution for the yᵢ is

y = Pz = \begin{pmatrix} -1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} c_1e^{2t} + c_2te^{2t} \\ c_2e^{2t} \\ c_3e^{-t} \end{pmatrix} = \begin{pmatrix} -c_1e^{2t} - c_2te^{2t} \\ (c_1 + c_2)e^{2t} + c_2te^{2t} \\ c_1e^{2t} + c_2te^{2t} + c_3e^{-t} \end{pmatrix}.

Feedback on selected activities


Feedback to activity 2.1
It is clear that y(0) = 3e⁰ = 3. Furthermore, y′ = 6e^{2t} = 2(3e^{2t}) = 2y.

Feedback to activity 2.2
Just to review diagonalisation, we will present the solution in some detail. We have

A − λI = \begin{pmatrix} 7 & -15 \\ 2 & -4 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 7-\lambda & -15 \\ 2 & -4-\lambda \end{pmatrix},

and the characteristic polynomial is

|A − λI| = (7 − λ)(−4 − λ) + 30 = λ² − 3λ − 28 + 30 = λ² − 3λ + 2.

So the eigenvalues are the solutions of λ² − 3λ + 2 = 0. To solve this for λ, one could use either the formula for the solutions of a quadratic equation, or simply observe that the characteristic polynomial factorises. We have (λ − 1)(λ − 2) = 0, with solutions λ = 1 and λ = 2. Hence the eigenvalues of A are 1 and 2, and these are the only eigenvalues of A.

To find the eigenvectors for eigenvalue 1 we solve the system (A − I)x = 0. We do this by putting the coefficient matrix A − I into reduced echelon form:

A − I = \begin{pmatrix} 6 & -15 \\ 2 & -5 \end{pmatrix} → ⋯ → \begin{pmatrix} 1 & -5/2 \\ 0 & 0 \end{pmatrix}.

This system has solutions

v = t\begin{pmatrix} 5 \\ 2 \end{pmatrix},  for any t ∈ R.

There are infinitely many eigenvectors for eigenvalue 1: for each t ≠ 0, v is an eigenvector of A corresponding to λ = 1. But be careful not to think that you can choose t = 0, for then v becomes the zero vector, and this is never an eigenvector, simply by definition.

To find the eigenvectors for eigenvalue 2, we solve (A − 2I)x = 0 by reducing the coefficient matrix:

A − 2I = \begin{pmatrix} 5 & -15 \\ 2 & -6 \end{pmatrix} → ⋯ → \begin{pmatrix} 1 & -3 \\ 0 & 0 \end{pmatrix}.

Setting the non-leading variable equal to t, we obtain the solutions

v = t\begin{pmatrix} 3 \\ 1 \end{pmatrix},  t ∈ R.

Any non-zero scalar multiple of the vector (3, 1)ᵀ is an eigenvector of A for eigenvalue 2. It follows, then, that if

P = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix},

then P is invertible and

P⁻¹AP = D = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.

Feedback to activity 2.3
Each row of the n × 1 matrix Pz is a linear combination of the functions z₁(t), z₂(t), …, zₙ(t). For example, row i of Pz is

pᵢ₁z₁(t) + pᵢ₂z₂(t) + ⋯ + pᵢₙzₙ(t).

The rows of the matrix (Pz)′ are the derivatives of these linear combinations of functions, so the i-th row is

(pᵢ₁z₁(t) + pᵢ₂z₂(t) + ⋯ + pᵢₙzₙ(t))′ = pᵢ₁z₁′(t) + pᵢ₂z₂′(t) + ⋯ + pᵢₙzₙ′(t),

using the properties of differentiation, since the entries pᵢⱼ of P are constants. But

pᵢ₁z₁′(t) + pᵢ₂z₂′(t) + ⋯ + pᵢₙzₙ′(t)

is just the i-th row of the n × 1 matrix Pz′, so these matrices are equal.

Feedback to activity 2.8
For the initial conditions y₁(0) = 1, y₂(0) = 1, y₃(0) = 0, the constants α, β, γ are

\begin{pmatrix} \alpha \\ \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} -1 & -3 & 2 \\ -1 & -1 & 1 \\ 1 & 2 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -4 \\ -2 \\ 3 \end{pmatrix},

so the solution is

y₁(t) = −4e^{−2t} + 2e^t + 3e^{3t}
y₂(t) = −2e^t + 3e^{3t}
y₃(t) = −4e^{−2t} − 2e^t + 6e^{3t}.

Chapter 3
Inner products and orthogonality
Reading

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 10.

Further reading
Anton, H. and C. Rorres. Elementary Linear Algebra. Sections 6.1, 6.2 and 6.3.

Aims of the chapter


In this chapter we look at inner product spaces. We develop the concept of orthogonality of vectors, using our geometric intuition from R² to abstract these concepts to a general vector space.

3.1 The inner product of two real vectors


For x, y ∈ Rⁿ, the (standard) inner product (sometimes called the dot product or scalar product) is defined to be the number ⟨x, y⟩ given by

⟨x, y⟩ = xᵀy = x₁y₁ + x₂y₂ + ⋯ + xₙyₙ.

Here, x = (x₁, x₂, …, xₙ)ᵀ and y = (y₁, y₂, …, yₙ)ᵀ. This is often referred to as the standard or Euclidean inner product.

Example 3.1 If x = (1, 2, 3)ᵀ and y = (2, −1, 1)ᵀ then

⟨x, y⟩ = 1(2) + 2(−1) + 3(1) = 3.

It is important to realise that the inner product is just a number, not another vector or a matrix.
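In code, the standard inner product is just the dot product. A one-line NumPy illustration of Example 3.1 (for experimentation only):

    import numpy as np

    x = np.array([1, 2, 3])
    y = np.array([2, -1, 1])
    print(np.dot(x, y))   # 3, the inner product <x, y>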
The inner product on Rⁿ satisfies certain basic properties, as shown in the next theorem. In R² and in R³ the inner product is closely linked with the geometric concepts of length and angle. This provides the background for generalising these concepts to any vector space V, as we shall see in the next section.


Theorem 3.1 The inner product

⟨x, y⟩ = x1 y1 + x2 y2 + · · · + xn yn ,   x, y ∈ Rn

satisfies the following properties for all x, y, z ∈ Rn and for all α ∈ R:

(i) ⟨x, y⟩ = ⟨y, x⟩

(ii) α⟨x, y⟩ = ⟨αx, y⟩ = ⟨x, αy⟩

(iii) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩

(iv) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.

Proof
We have

⟨x, y⟩ = x1 y1 + x2 y2 + · · · + xn yn = y1 x1 + y2 x2 + · · · + yn xn = ⟨y, x⟩,

which proves (i). We leave the proofs of (ii) and (iii) as an exercise. For (iv), note that

⟨x, x⟩ = x1² + x2² + · · · + xn²

is a sum of squares, so ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if each term xi² is equal
to zero, that is, if and only if x is the zero vector, x = 0.

Activity 3.1 Prove properties (ii) and (iii). Show, also, that these two properties
are equivalent to the single property

⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩.

3.1.1 Geometric interpretation in R2 and R3


We begin by looking at vectors in R2 . A vector

a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}
can be represented as a directed line segment in the plane, starting at the origin and
going to the point (a1 , a2 ). As such, it is considered to be the position vector of the
point (a1 , a2 ).

[Figure: the position vector a, drawn as an arrow from the origin (0, 0) to the point (a1 , a2 ); together with the horizontal segment of length a1 and the vertical segment of length a2 it forms a right-angled triangle.]


Its length, denoted by ‖a‖, can be calculated using Pythagoras’ theorem applied to the
right-angled triangle shown above:

‖a‖ = √(a1² + a2²),

so that

‖a‖² = ⟨a, a⟩.
If a, b are two vectors in R2 , let θ denote the angle between them, 0 ≤ θ ≤ π. The
vectors a, b and c = b − a form a triangle, where c is the side opposite the angle θ. The
law of cosines applied to this triangle gives us the important relationship stated in the
following theorem.
Theorem 3.2 Let a, b ∈ R2 and let θ denote the angle between them. Then
⟨a, b⟩ = ‖a‖ ‖b‖ cos θ.

For a proof, see Anthony and Harvey.


This theorem has many geometrical consequences. Since −1 ≤ cos θ ≤ 1 for any real
number θ, the maximum value of the inner product is ⟨a, b⟩ = ‖a‖ ‖b‖. This occurs
precisely when cos θ = 1 (θ = 0), that is, when the vectors a and b are parallel and in
the same direction. If they point in opposite directions, then θ = π and we have
⟨a, b⟩ = −‖a‖ ‖b‖. The inner product will be positive if and only if the angle between
the vectors is acute, 0 ≤ θ < π/2. It will be negative if the angle is obtuse, π/2 < θ ≤ π.
The non-zero vectors a and b are orthogonal (or perpendicular) when the angle between
them is θ = π/2. Since cos(π/2) = 0, this is precisely when their inner product is zero. We
restate this important fact:
The vectors a and b are orthogonal if and only if ⟨a, b⟩ = 0.
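As an illustration (the two vectors here are arbitrary choices), the angle between vectors in R2 can be computed directly from this relationship:

    import numpy as np

    a = np.array([1.0, 0.0])
    b = np.array([1.0, 1.0])
    cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print(np.arccos(cos_theta))   # 0.7853...: the angle is pi/4, as geometry predicts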
Everything we have said so far about the inner product and its geometric interpretation
in R2 extends to R3 .
 
If a = (a1 , a2 , a3 )T then ‖a‖ = √(a1² + a2² + a3²).

Activity 3.2 Show this. Sketch a position vector a = (a1 , a2 , a3 )T in R3 . Drop a
perpendicular to the xy-plane and apply Pythagoras’ theorem twice to obtain the
result.

The vectors a, b and c = b − a in R3 lie in a plane and the law of cosines can still be
applied to establish the result that
⟨a, b⟩ = ‖a‖ ‖b‖ cos θ.

Planes and hyperplanes

In R3 suppose a vector a ≠ 0 and a vector p are given. If a = (a1 , a2 , a3 )T the equation
⟨a, x − p⟩ = 0 can be written as ⟨a, x⟩ = ⟨a, p⟩, which is

a1 x1 + a2 x2 + a3 x3 = k


where k = ⟨a, p⟩ is a constant. This is the equation of a plane in R3 . This plane consists
of all vectors x for which x − p is orthogonal to the given vector a. The vector a is then
called a normal to the plane. If k = 0 the plane contains the vector 0. It has equation
⟨a, x⟩ = 0 and defines a (two-dimensional) subspace of R3 .
As a set of points which we can graph in R3 , taking the endpoints X = (x1 , x2 , x3 ) of all
the vectors x, we can think of ⟨a, x − p⟩ = 0 as the plane passing through the point
P = (p1 , p2 , p3 ) and normal to the vector a, meaning that the line segments from X to
P are all perpendicular to a.
In general, if a and p are given vectors in Rn , we define the set of all vectors x ∈ Rn
which satisfy the equation ⟨a, x − p⟩ = 0 to be the hyperplane which contains the point
P and for which the normal vector is a. It has the equation

a1 x1 + a2 x2 + · · · + an xn = k,   where k = ⟨a, p⟩ is a constant.

In two-dimensional space, R2 , a hyperplane is just a line, with equation of the form
ax + by = k. In three-dimensional space, R3 , we speak of a plane, rather than a
hyperplane. In Rn , the hyperplane given by ⟨a, x⟩ = 0 is an (n − 1)-dimensional subspace
of Rn .
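A small sketch of this (the values of a and p below are hypothetical, chosen for illustration) checks whether points lie on the plane ⟨a, x − p⟩ = 0:

    import numpy as np

    a = np.array([1.0, 2.0, 2.0])    # hypothetical normal vector
    p = np.array([3.0, 0.0, 1.0])    # hypothetical point on the plane
    k = a @ p                        # the constant in a1 x1 + a2 x2 + a3 x3 = k

    def on_plane(x):
        # x lies on the plane exactly when <a, x - p> = 0
        return np.isclose(a @ (x - p), 0.0)

    print(k, on_plane(p), on_plane(p + np.array([2.0, -1.0, 0.0])))
    # 5.0 True True  (the displacement (2, -1, 0) is orthogonal to a)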

Activity 3.3 Show that, in Rn , the hyperplane given by ⟨a, x⟩ = 0 is an
(n − 1)-dimensional subspace of Rn .

3.2 Inner products more generally


There is a more general concept of inner product than the one just presented, and this
is very important. (It is ‘more general’ in two ways: first, this definition allows us to say
what we mean by an inner product on any vector space, and not just Rn , and, secondly,
it allows the possibility of inner products on Rn that are different from the standard
one.)
Definition 3.1 (Inner product) Let V be a vector space (over the real numbers). An
inner product on V is a mapping from (or operation on) pairs of vectors x, y to the real
numbers, the result of which is a real number denoted ⟨x, y⟩, which satisfies the
following properties:

(i) ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V

(ii) ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ for all x, y, z ∈ V and all α, β ∈ R.

(iii) ⟨x, x⟩ ≥ 0 for all x ∈ V , and ⟨x, x⟩ = 0 if and only if x = 0, the zero vector of the
vector space V .

Some other basic facts follow immediately from this definition: for example,

⟨z, αx + βy⟩ = α⟨z, x⟩ + β⟨z, y⟩.


Activity 3.4 Prove that ⟨z, αx + βy⟩ = α⟨z, x⟩ + β⟨z, y⟩.

It is a simple matter to check that the standard inner product on Rn defined in the
previous section is indeed an inner product according to this more general definition.
The abstract definition, though, applies to more than just the vector space Rn , and
there is some advantage in developing results in terms of the general notion of inner
product. If a vector space has an inner product defined on it, we refer to it as an inner
product space.

Example 3.2 (This is a deliberately strange example. It is not one you would
necessarily come up with, but its purpose is to illustrate how we can define inner
products in non-standard ways, which is why we have chosen it.) Suppose that V is
the vector space consisting of all real polynomial functions of degree at most n; that
is, V consists of all functions p : x ↦ p(x) of the form

p(x) = a0 + a1 x + a2 x² + · · · + an xⁿ ,   a0 , a1 , . . . , an ∈ R.

The addition and scalar multiplication are defined pointwise. (See Section 7.1.1 of
MT1173 Algebra.) Let x1 , x2 , . . . , xn+1 be n + 1 fixed, different, real numbers, and
define, for p, q ∈ V ,

⟨p, q⟩ = \sum_{i=1}^{n+1} p(x_i) q(x_i).

Then this is an inner product. To see this, we check the properties in the definition
of an inner product. Property (i) is clear. For (iii), we have

⟨p, p⟩ = \sum_{i=1}^{n+1} p(x_i)^2 ≥ 0.

Clearly, if p is the zero vector of the vector space (which is the identically-zero
function), then ⟨p, p⟩ = 0. To finish verifying (iii) we need to check that if ⟨p, p⟩ = 0
then p must be the zero function. Now, ⟨p, p⟩ = 0 must mean that p(xi ) = 0 for
i = 1, 2, . . . , n + 1. So p(x) has n + 1 different roots. But p(x) has degree no more
than n, so p must be the identically-zero function. (A non-zero polynomial of degree
at most n has no more than n distinct roots.) Part (ii) is left to you.
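A short NumPy sketch of this non-standard inner product (the choice n = 2 and the three sample points below are assumptions made purely for illustration):

    import numpy as np

    xs = np.array([0.0, 1.0, 2.0])    # n + 1 = 3 fixed, distinct real numbers

    def inner(p, q):
        # <p, q> = sum over i of p(x_i) q(x_i)
        return np.sum(np.polyval(p, xs) * np.polyval(q, xs))

    p = np.array([1.0, 0.0, -1.0])    # p(x) = x^2 - 1, coefficients highest-first
    q = np.array([0.0, 2.0, 3.0])     # q(x) = 2x + 3
    print(inner(p, q))                # (-1)(3) + (0)(5) + (3)(7) = 18.0
    print(inner(p, p) >= 0)           # True, as property (iii) requires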

Activity 3.5 Prove that, for any α, β ∈ R and any p, q, r ∈ V ,

⟨αp + βq, r⟩ = α⟨p, r⟩ + β⟨q, r⟩.

3.2.1 Norms in a vector space


For any x in an inner product space V , the inner product ⟨x, x⟩ is non-negative (by
definition). Now, because ⟨x, x⟩ ≥ 0, we may take its square root and this will be a real
number. We define the norm, ‖x‖, of a vector x to be this real number.


Definition 3.2 (Norm) Suppose that V is an inner product space and x is a vector in
V . Then the norm of x, denoted ‖x‖, is

‖x‖ = √⟨x, x⟩.

For example, with the standard inner product on Rn ,

⟨x, x⟩ = x1² + x2² + · · · + xn²

(which is clearly non-negative since it is a sum of squares), and the norm is the
standard Euclidean length of a vector:

‖x‖ = √(x1² + x2² + · · · + xn²).

We say that a vector v is a unit vector if it has norm 1. If v ≠ 0, then it is a simple
matter to create a unit vector in the same direction as v. This is the vector u = v/‖v‖.
The process of constructing u from v is known as normalising v.
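For example (an illustrative computation):

    import numpy as np

    v = np.array([3.0, 4.0])
    u = v / np.linalg.norm(v)       # normalise: divide v by its norm
    print(u, np.linalg.norm(u))     # [0.6 0.8] 1.0, so u is a unit vector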

3.2.2 The Cauchy-Schwarz inequality


This important inequality will enable us to apply the geometric intuition we have
developed about angles to a completely abstract setting.
Theorem 3.3 (Cauchy-Schwarz inequality) Suppose that V is an inner product
space. Then
|⟨x, y⟩| ≤ ‖x‖ ‖y‖
for all x, y ∈ V .

For a proof, see Anthony and Harvey.


(Note that |⟨x, y⟩| denotes the absolute value of the inner product.)
For example, if we take V to be Rn and consider the standard inner product on Rn ,
then for all x, y ∈ Rn , the Cauchy-Schwarz inequality tells us that

\left| \sum_{i=1}^{n} x_i y_i \right| \le \sqrt{\sum_{i=1}^{n} x_i^2} \; \sqrt{\sum_{i=1}^{n} y_i^2}.
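The inequality can be observed numerically on randomly chosen vectors (a sanity check, of course, not a proof):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))   # always True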

3.2.3 Generalised geometry


We now begin to imitate the geometry of the previous section. We have already used
Pythagoras’ theorem in R3 , which states that if c is the length of the longest side of a
right-angled triangle, and a and b are the lengths of the other two sides, then
c² = a² + b². In R2 , we can think of this triangle as having sides given by orthogonal
vectors a and b, with the hypotenuse given by the vector c = a + b. The generalised
Pythagoras’ theorem is:
Theorem 3.4 (Generalised Pythagoras’ Theorem) In an inner product space V , if
x, y ∈ V are orthogonal, then

‖x + y‖² = ‖x‖² + ‖y‖².


For a proof, see Anthony and Harvey.


We also have the triangle inequality for norms. This generalises the familiar fact in R2
that the length of one side of a triangle cannot exceed the sum of the lengths of the
other two sides.
Theorem 3.5 (Triangle inequality for norms) In an inner product space V , if
x, y ∈ V , then

‖x + y‖ ≤ ‖x‖ + ‖y‖.

For a proof, see Anthony and Harvey.
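Both theorems can be illustrated numerically with the orthogonal vectors of Example 3.3 below (a check, not a proof):

    import numpy as np

    x = np.array([1.0, -1.0, 2.0, 0.0])
    y = np.array([-1.0, 1.0, 1.0, 4.0])
    print(x @ y)   # 0.0: x and y are orthogonal
    print(np.linalg.norm(x + y)**2,
          np.linalg.norm(x)**2 + np.linalg.norm(y)**2)   # both 25, by Pythagoras
    print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))   # True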

3.2.4 Orthogonal vectors


We are now ready to extend the concept of angle to an abstract inner product space V .
To do this we begin with the result in R3 that ⟨x, y⟩ = ‖x‖ ‖y‖ cos θ and use this to
define the cosine of the angle between the vectors x and y. That is, by definition, we set

cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖).

This definition will only make sense if we can show that this number cos θ is between
−1 and 1. But this follows immediately from the Cauchy-Schwarz inequality, which can
be stated as

|⟨x, y⟩| / (‖x‖ ‖y‖) ≤ 1.
The usefulness of this definition lies in the concept of orthogonality.


Definition 3.3 (Orthogonal vectors) Suppose that V is an inner product space.
Then x, y ∈ V are said to be orthogonal if and only if ⟨x, y⟩ = 0. We write x ⊥ y to
mean that x, y are orthogonal.

Example 3.3 With the usual inner product on R4 , the vectors x = (1, −1, 2, 0)T
and y = (−1, 1, 1, 4)T are orthogonal.

Activity 3.6 Check this!

3.2.5 Orthogonality and linear independence


If a set of (non-zero) vectors is pairwise orthogonal (that is, any two of them are
orthogonal), then it turns out that the vectors are linearly independent:
Theorem 3.6 Suppose that V is an inner product space and that the vectors
v1 , v2 , . . . , vk ∈ V are pairwise orthogonal (vi ⊥ vj for i ≠ j), and none is the
zero vector. Then {v1 , v2 , . . . , vk } is a linearly independent set of vectors.

For a proof, see Anthony and Harvey.


3.3 Orthogonal matrices and orthonormal sets


Definition 3.4 (Orthogonal matrix) An n × n matrix P is said to be orthogonal if
P T P = P P T = I: that is, if P has inverse P T .

At first it appears that this definition has little to do with the geometric concept of
orthogonality. But, as we shall see, it is closely related. If P is an orthogonal matrix,
then P T P = I, the identity matrix. Suppose that the columns of P are x1 , x2 , . . . , xn .
Then the fact that P T P = I means that x_i^T x_j = 0 if i ≠ j and x_i^T x_i = 1.
To see this, consider the case n = 3. Then P = (x1 x2 x3 ) and, since I = P T P , we have

\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} x_1^T \\ x_2^T \\ x_3^T \end{pmatrix} \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} = \begin{pmatrix} x_1^T x_1 & x_1^T x_2 & x_1^T x_3 \\ x_2^T x_1 & x_2^T x_2 & x_2^T x_3 \\ x_3^T x_1 & x_3^T x_2 & x_3^T x_3 \end{pmatrix}.

But, if i ≠ j, x_i^T x_j = 0 means precisely that the columns xi , xj are orthogonal. The
second statement, x_i^T x_i = 1, says that ‖xi ‖² = 1, which means (since ‖xi ‖ ≥ 0) that
‖xi ‖ = 1; that is, xi is of norm 1. The converse is also true: if x_i^T x_j = 0 for i ≠ j and
x_i^T x_i = 1, then it follows that P T P = I. This indicates the following characterisation:
A matrix P is orthogonal if and only if, as vectors, its columns are pairwise orthogonal,
and each has length 1.
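For instance, a 2 × 2 rotation matrix (a standard example, used here purely as an illustration) has orthonormal columns:

    import numpy as np

    t = np.pi / 3
    P = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])    # columns are orthogonal unit vectors
    print(np.allclose(P.T @ P, np.eye(2)))     # True: P^T P = I
    print(np.allclose(np.linalg.inv(P), P.T))  # True: the inverse of P is P^T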
Definition 3.5 (Orthonormal) A set of vectors {x1 , x2 , . . . , xk } in an inner product
space V such that any two different vectors are orthogonal and each vector has length 1:

⟨xi , xj ⟩ = 0 for i ≠ j, and ‖xi ‖ = 1,

is called an orthonormal set (ONS) of vectors.

An important consequence of Theorem 3.6 is that an orthonormal set of n vectors in an


n-dimensional vector space is a basis. A basis consisting of an orthonormal set of
vectors is called an orthonormal basis. If {v1 , v2 , . . . , vn } is an orthonormal basis of a
vector space V , then the coordinates of any vector w ∈ V are easy to calculate as
shown in the following theorem.
Theorem 3.7 Let B = {v1 , v2 , . . . , vn } be an orthonormal basis of a vector space V
and let w ∈ V . Then the coordinates a1 , a2 , . . . , an of w in the basis B are given by

ai = ⟨w, vi ⟩.

For a proof, see Anthony and Harvey.
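A sketch of Theorem 3.7 in R2 , with an assumed orthonormal basis (the vectors below are an arbitrary illustrative choice):

    import numpy as np

    v1 = np.array([1.0, 1.0]) / np.sqrt(2)    # an orthonormal basis of R^2
    v2 = np.array([1.0, -1.0]) / np.sqrt(2)

    w = np.array([3.0, 1.0])
    a1, a2 = w @ v1, w @ v2                   # coordinates a_i = <w, v_i>
    print(a1 * v1 + a2 * v2)                  # [3. 1.]: reassembles w exactly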


If P is an orthogonal matrix, then its columns are an orthonormal set of n vectors in
Rn . These are linearly independent by Theorem 3.6, and hence form an orthonormal
basis of Rn . So we can restate our previous observation as follows.
Theorem 3.8 An n × n matrix P is orthogonal if and only if the columns of P form
an orthonormal basis of Rn .

If the matrix P is orthogonal, then since P = (P T )T , the matrix P T is orthogonal too.


Activity 3.7 Show that if P is orthogonal, so too is P T .

It therefore follows that the above theorem is also true if ‘column’ is replaced by ‘row’: a
matrix P is orthogonal if and only if the columns (or rows) of P form an orthonormal
basis of Rn .
3.4 Gram-Schmidt orthonormalisation process
Given a set of linearly independent vectors {v1 , v2 , . . . , vk }, the Gram-Schmidt
orthonormalisation process is a way of producing k vectors that span the same space as
{v1 , v2 , . . . , vk }, and that form an orthonormal set. That is, the process produces a set
{u1 , u2 , . . . , uk } such that:

Lin{u1 , u2 , . . . , uk } = Lin{v1 , v2 , . . . , vk }

{u1 , u2 , . . . , uk } is an orthonormal set.


It works as follows. First, we set

u1 = v1 / ‖v1 ‖,

so that u1 is a unit vector and Lin{u1 } = Lin{v1 }.
Then we define

w2 = v2 − ⟨v2 , u1 ⟩u1 ,

and set

u2 = w2 / ‖w2 ‖.

Then {u1 , u2 } is an orthonormal set and Lin{u1 , u2 } = Lin{v1 , v2 }.

Activity 3.8 Try to understand why this works. Show that w2 ⊥ u1 and conclude
that u2 ⊥ u1 . Why are the linear spans of {u1 , u2 } and {v1 , v2 } the same?

Next, we define

w3 = v3 − ⟨v3 , u1 ⟩u1 − ⟨v3 , u2 ⟩u2

and set

u3 = w3 / ‖w3 ‖.

Then {u1 , u2 , u3 } is an orthonormal set and Lin{u1 , u2 , u3 } is the same as
Lin{v1 , v2 , v3 }. Generally, when we have u1 , u2 , . . . , ui , we let

w_{i+1} = v_{i+1} - \sum_{j=1}^{i} \langle v_{i+1}, u_j \rangle u_j , \qquad u_{i+1} = \frac{w_{i+1}}{\|w_{i+1}\|}.

Then the resulting set {u1 , u2 , . . . , uk } has the required properties.
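The process translates directly into code. Below is a minimal NumPy sketch (assuming the input vectors are linearly independent, so that no w_{i+1} is the zero vector); applied to the vectors of Example 3.4 below, it reproduces u1, u2 and u3.

    import numpy as np

    def gram_schmidt(vectors):
        # orthonormalise a list of linearly independent vectors
        us = []
        for v in vectors:
            w = v - sum((v @ u) * u for u in us)  # remove components along earlier u_j
            us.append(w / np.linalg.norm(w))      # normalise
        return us

    v1 = np.array([1.0, 1.0, 1.0, 1.0])
    v2 = np.array([-1.0, 4.0, 4.0, -1.0])
    v3 = np.array([4.0, -2.0, 2.0, 0.0])
    for u in gram_schmidt([v1, v2, v3]):
        print(u)   # the u1, u2, u3 of Example 3.4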


Example 3.4 In R4 , let us find an orthonormal basis for the linear span of the
three vectors

v1 = (1, 1, 1, 1)T ,   v2 = (−1, 4, 4, −1)T ,   v3 = (4, −2, 2, 0)T .

First, we have

u1 = v1 / ‖v1 ‖ = v1 / √(1² + 1² + 1² + 1²) = (1/2) v1 = (1/2, 1/2, 1/2, 1/2)T .

Next, we have

w2 = v2 − ⟨v2 , u1 ⟩u1 = \begin{pmatrix} -1 \\ 4 \\ 4 \\ -1 \end{pmatrix} - 3 \begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{pmatrix} = \begin{pmatrix} -5/2 \\ 5/2 \\ 5/2 \\ -5/2 \end{pmatrix},

and we set

u2 = w2 / ‖w2 ‖ = (−1/2, 1/2, 1/2, −1/2)T .

(Note: to do this last step, we merely noted that a normalised vector in the same
direction as w2 is also a normalised vector in the same direction as (−1, 1, 1, −1)T ,
and this second vector is easier to work with.) Continuing, we have

w3 = v3 − ⟨v3 , u1 ⟩u1 − ⟨v3 , u2 ⟩u2 = \begin{pmatrix} 4 \\ -2 \\ 2 \\ 0 \end{pmatrix} - 2 \begin{pmatrix} 1/2 \\ 1/2 \\ 1/2 \\ 1/2 \end{pmatrix} - (-2) \begin{pmatrix} -1/2 \\ 1/2 \\ 1/2 \\ -1/2 \end{pmatrix} = \begin{pmatrix} 2 \\ -2 \\ 2 \\ -2 \end{pmatrix}.

Then

u3 = w3 / ‖w3 ‖ = (1/2, −1/2, 1/2, −1/2)T .

So

{u1 , u2 , u3 } = { (1/2, 1/2, 1/2, 1/2)T , (−1/2, 1/2, 1/2, −1/2)T , (1/2, −1/2, 1/2, −1/2)T }.
 

Activity 3.9 Verify that the set {u1 , u2 , u3 } of this example is an orthonormal set.
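One quick way to carry out this check numerically (illustrative only) is to stack the vectors as the columns of a matrix U and confirm that U^T U is the identity:

    import numpy as np

    U = np.column_stack([[0.5, 0.5, 0.5, 0.5],
                         [-0.5, 0.5, 0.5, -0.5],
                         [0.5, -0.5, 0.5, -0.5]])   # u1, u2, u3 as columns
    print(np.allclose(U.T @ U, np.eye(3)))          # True: an orthonormal set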

Learning outcomes
At the end of this chapter and the relevant reading, you should be able to:


explain what is meant by an inner product on a vector space

verify that a given inner product is indeed an inner product

compute norms in inner product spaces

state and apply the Cauchy-Schwarz inequality, the Generalised Pythagoras’
Theorem, and the triangle inequality for norms
prove that pairwise orthogonality of a set of non-zero vectors implies linear independence

state what is meant by an orthogonal matrix

explain what is meant by an orthonormal set of vectors

explain why an n × n matrix is orthogonal if and only if its columns are an


orthonormal basis of Rn

use the Gram-Schmidt orthonormalisation process.

Test your knowledge and understanding


You should now attempt the Exercises in Chapter 10 of Anthony and Harvey. Solutions
to those exercises without solutions in Anthony and Harvey may be found on the VLE.

Feedback on selected activities


Feedback to activity 3.1
To prove properties (ii) and (iii), apply the definition to the LHS (left-hand side) of the
equation and rearrange the terms to obtain the RHS. For example, for x, y ∈ Rn , using
the properties of real numbers,

α⟨x, y⟩ = α(x1 y1 + x2 y2 + · · · + xn yn )
        = αx1 y1 + αx2 y2 + · · · + αxn yn
        = (αx1 )y1 + (αx2 )y2 + · · · + (αxn )yn = ⟨αx, y⟩.

The single property ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ implies property (ii) by letting
β = 0 and then letting α = 0, and property (iii) by letting α = β = 1. On the other
hand, if properties (ii) and (iii) hold, then

⟨αx + βy, z⟩ = ⟨αx, z⟩ + ⟨βy, z⟩   by property (iii)
             = α⟨x, z⟩ + β⟨y, z⟩   by property (ii).

Feedback to activity 3.3


Let a ∈ Rn be a given non-zero vector and let V = {x ∈ Rn : ⟨a, x⟩ = 0}. The
components of x ∈ V satisfy the equation a1 x1 + a2 x2 + · · · + an xn = 0. For some i,
ai ≠ 0 (since a ≠ 0), so the equation can be solved for xi . We have

xi = β1 x1 + · · · + βi−1 xi−1 + βi+1 xi+1 + · · · + βn xn ,

where βj = −(aj /ai ). Then

x = \begin{pmatrix} x_1 \\ \vdots \\ \beta_1 x_1 + \cdots + \beta_{i-1} x_{i-1} + \beta_{i+1} x_{i+1} + \cdots + \beta_n x_n \\ \vdots \\ x_n \end{pmatrix},

so that

x = x1 v1 + · · · + xi−1 vi−1 + xi+1 vi+1 + · · · + xn vn ,

where vj is the vector with 1 in the j-th place, βj in the i-th place, and zeros elsewhere.
These n − 1 vectors are linearly independent (why?) and span V , hence they are a basis
of V and the dimension of V is n − 1.
Feedback to activity 3.4
By the properties of an inner product, we have

⟨z, αx + βy⟩ = ⟨αx + βy, z⟩
             = α⟨x, z⟩ + β⟨y, z⟩
             = α⟨z, x⟩ + β⟨z, y⟩.
Feedback to activity 3.5
Since αp + βq is the polynomial function x ↦ αp(x) + βq(x), we have

⟨αp + βq, r⟩ = \sum_{i=1}^{n+1} (αp(x_i) + βq(x_i)) r(x_i)
             = α \sum_{i=1}^{n+1} p(x_i) r(x_i) + β \sum_{i=1}^{n+1} q(x_i) r(x_i)
             = α⟨p, r⟩ + β⟨q, r⟩,

as required.
Feedback to activity 3.6
Just check that ⟨x, y⟩ = 1(−1) + (−1)(1) + 2(1) + 0(4) = 0.

Feedback to activity 3.7


The matrix P is orthogonal if and only if P P T = P T P = I. Since (P T )T = P this
statement can be written as (P T )T P T = P T (P T )T = I which says that P T is orthogonal.
Feedback to activity 3.8
We have

⟨w2 , u1 ⟩ = ⟨v2 − ⟨v2 , u1 ⟩u1 , u1 ⟩ = ⟨v2 , u1 ⟩ − ⟨v2 , u1 ⟩⟨u1 , u1 ⟩ = 0,

as ⟨u1 , u1 ⟩ = 1. The fact that w2 ⊥ u1 if and only if u2 ⊥ u1 follows from property (ii)
of the definition of inner product, since w2 = αu2 for some constant α. The linear spans
are the same because u1 , u2 are linear combinations of v1 , v2 and, conversely, v1 , v2
are linear combinations of u1 , u2 .


Feedback to activity 3.9


We only need to check that each ui satisfies ‖ui ‖ = 1, and that
⟨u1 , u2 ⟩ = ⟨u1 , u3 ⟩ = ⟨u2 , u3 ⟩ = 0. All of this is very easily checked. (It is much harder
to find the ui in the first place. But once you think you have found them, it is always
fairly easy to check whether they form an orthonormal set, as they should.)
