
Calculus of Several Variables (Math 22)

Shawn OHare
July 25, 2013
Contents
1 Preliminaries
  1.1 Basic notation
  1.2 Logic
  1.3 Sets
  1.4 Functions
  1.5 Exercises
2 Euclidean Space and Coordinates
  2.1 Euclidean space
  2.2 Subsets of Euclidean Space
  2.3 Coordinate systems
  2.4 Exercises
3 Vectors and the dot product
  3.1 Abstract vectors
  3.2 Vector addition and scalar multiplication
  3.3 The dot product
  3.4 Orthogonality
  3.5 Exercises
4 The cross product and determinants
  4.1 The cross product of two vectors
  4.2 Coordinate vectors and the Determinant
  4.3 Exercises
5 Planes, lines, and projections
  5.1 Lines
  5.2 Planes
  5.3 Angles between planes
  5.4 Vector projections
  5.5 Exercises
6 Vector functions
  6.1 Definitions
  6.2 Basic calculus of vector functions
  6.3 Exercises
7 Arc Length and Curvature
  7.1 Arc length
  7.2 Parameterization with respect to arc length
  7.3 Curvature
  7.4 Exercises
8 Multivariable functions
  8.1 Definitions
  8.2 Continuity
  8.3 Partial Derivatives
  8.4 The chain rule
  8.5 Implicit differentiation
  8.6 Exercises
9 Tangent planes and the linear approximation
  9.1 Tangent planes
  9.2 Linearizations
  9.3 The differential
  9.4 Exercises
10 The directional derivative and gradient
  10.1 Directional derivatives
  10.2 Geometry of the gradient
  10.3 Exercises
11 Maxima, minima, and saddle points
  11.1 Maxima and minima
  11.2 Finding extrema
  11.3 Exercises
12 Lagrange multipliers
  12.1 The method
  12.2 Some examples
  12.3 Exercises
13 Double integrals
  13.1 The integral
  13.2 Iterated integrals
  13.3 More general regions
  13.4 Changing the order of integration
  13.5 Exercises
14 Change of Coordinates
  14.1 Polar coordinates
  14.2 Transformations
  14.3 Applications to double integrals
  14.4 Exercises
15 Applications
  15.1 Density
  15.2 Center of mass
  15.3 Probability
  15.4 Exercises
16 Vector Fields
  16.1 Introduction to Vector Fields
  16.2 Path and Line Integrals
  16.3 Conservative Vector Fields
  16.4 Exercises
Index
1 Preliminaries
Mathematics is a precise subject. Careful and exact notation is therefore required. In this section we remind the reader of the basic mathematical terminology and ideas which we will use throughout the course.
1.1 Basic notation
In mathematics (and computer programming) we can distinguish certain types of equality. When we write x² + 4x + 4 = (x + 2)² we mean that the two polynomials are equal for all values of x. Both polynomials have already been defined, and so by the = symbol we are asserting a logical equivalence. Contrast this with a situation where we want to actually define an object. For instance, we might tire of writing x² + 4x + 4, and so we could call this polynomial something else, say P. Mathematicians write P := x² + 4x + 4 to mean that we are using the letter P to denote the polynomial x² + 4x + 4. Having defined P, we could now write the true statement P = (x + 2)². So = deals with truth and := with notation/definitions.
1.2 Logic
Understanding the rudiments of logic is also important in mathematics and in life generally. The most crucial logical construction is perhaps the implication, also known as the conditional. Given a general statement, we desire to know whether it is true, and the implication links the truth of one statement to another.
To properly understand the conditional we need to know what the basic atoms of logic are. For our purposes, the most reduced logical objects are simply statements of fact (or falsity), such as "A human is a mammal." and "All mammals die."
1.1 Definition (Implications). Let A and B be two logical statements. Then A is said to imply B if whenever A is true, B is also true. We write A ⟹ B, and read this as "A implies B." The converse of A ⟹ B is B ⟹ A. If A ⟹ B and also B ⟹ A, we write A ⟺ B. This reads as "A if and only if B." In this situation we say that A and B are logically equivalent. Sometimes A ⟺ B is called a biconditional.
Given a statement A, ¬A means "not A." For example, if A is the statement "A human is a mammal," then ¬A is the statement "A human is not a mammal." The contrapositive of A ⟹ B is ¬B ⟹ ¬A. These two statements are logically equivalent, so A ⟹ B is true whenever ¬B ⟹ ¬A is true, and vice versa.
1.2 Remark. The biconditional is ubiquitous in mathematics. Mathematical and scientific problem solving is about taking a difficult problem and finding some way to make it tractable, often by replacing it with a logically equivalent but easier problem. This technique is called reduction.
1.3 Example. Assuming that all humans are mammals and all mammals die, set C := "Socrates is a human." and D := "Socrates will die." Then C ⟹ D. Note also that if Socrates will never die, then he must not be a human, since "Socrates will not die implies Socrates is not a human" is the contrapositive of C ⟹ D.
1.3 Sets
1.4 Definition (Sets). A collection of distinct objects is called a set. We use { } to denote a set, and treat a set as a mathematical object which we can manipulate. The objects which make up the set are called the set's elements, or often its members. We denote set membership by the ∈ symbol. That is, given a set S, a ∈ S denotes that a is an element of the set S.
1.5 Example.
(a) {a, b, c} is a set of letters, which contains the elements a, b, c.
(b) {{a}, {b}} is a set with the elements {a}, {b}, i.e., a set of sets.
(c) R, Q, Z, N denote the sets of real, rational, integer, and natural numbers respectively.
(d) {←, →} is a set of arrows.
Often it is impossible to enumerate all the elements of a set, so we instead characterize a set by a rule which ascribes membership. We typically use the symbols : or | to indicate that a set's rule is about to be stated.
1.6 Example. {z ∈ Z : z = 2s, s ∈ Z} reads as "the set of integers z such that z is twice some other integer s." More concisely, it is the set of even integers.
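Set-builder notation maps almost verbatim onto Python's set comprehensions. As an illustrative sketch (restricting the infinite set Z to a finite window), the even integers can be written:

```python
# {z in Z : z = 2s, s in Z}, with s restricted to a finite range:
evens = {2 * s for s in range(-5, 6)}

assert -6 in evens and 4 in evens
assert 3 not in evens  # odd integers fail the membership rule
```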
Given two sets, we might be interested in relating elements from one set to another. We
make this precise as follows.
1.7 Definition (n-tuple). Given two sets A, B we can form another set denoted A × B := {(a, b) : a ∈ A, b ∈ B}. The set A × B is called the Cartesian product of the sets A and B. We call the elements of A × B ordered pairs.
More generally, suppose n is a natural number. Given n sets A_1, . . . , A_n, then A_1 × · · · × A_n := {(a_1, . . . , a_n) : a_i ∈ A_i, 1 ≤ i ≤ n}. The elements of A_1 × · · · × A_n are called n-tuples. In this course we are primarily concerned with the set R^n of all n-tuples of real numbers, which is the n-fold Cartesian product of the real numbers R with itself.
You are actually quite familiar with ordered pairs.
1.8 Example.
(a) The Cartesian coordinate plane is an example of a set of ordered pairs. Recall that when we graph functions, we often speak of an x- and a y-value. The x-values represent inputs into a function, the y-values the outputs. So given a real-valued function f, we can completely describe its graph via the set {(x, f(x)) : x ∈ R}. (See Figure 1.)
Figure 1: Cartesian coordinate system
(b) We could represent the set of all three-lettered words with letters from the English alphabet as follows. Let E be the set of letters which occur in the English alphabet. Then A = {(a_1, a_2, a_3) : a_1, a_2, a_3 ∈ E} is the set of all three-letter combinations (which mathematicians call words). So for instance, (c, a, t) ∈ A and (x, y, z) ∈ A.
The abbreviation i.e. is for the Latin id est, which translates to "that is."
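Example 1.8(b) can be checked directly: Python's `itertools.product` computes exactly the Cartesian product of Definition 1.7. A small sketch:

```python
from itertools import product

E = "abcdefghijklmnopqrstuvwxyz"   # the English alphabet
A = set(product(E, repeat=3))      # E x E x E, all three-letter "words"

assert ("c", "a", "t") in A
assert ("x", "y", "z") in A        # any combination counts, not just English words
assert len(A) == 26**3             # |E x E x E| = |E|^3
```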
We now turn toward defining some useful operations on sets.
1.9 Definition (Subset). Let A be a set. We say that a set B is a subset of A, denoted B ⊆ A or A ⊇ B, if every element of B is also an element of A. We say that A = B if the elements of A and B are the same. Clearly, B ⊆ A and A ⊆ B implies that A = B.
1.10 Definition (Set operations). Let A and B be two sets. The intersection of A and B, denoted by A ∩ B, is the set of elements which are in both A and B. That is, A ∩ B := {x : x ∈ A, x ∈ B}. The union of A and B, denoted A ∪ B, is the set of elements in A or B, that is, A ∪ B := {x : x ∈ A, or x ∈ B}.
Suppose that A ⊆ B. Then we define the complement of A in B, denoted A^c or B \ A, to be the set of elements which are in B but which are not in A.
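Python's built-in `set` type implements these operations directly, so the definitions above can be experimented with. A brief sketch:

```python
A = {"x", "y", "z"}
B = {"x", "y"}

assert B <= A               # B is a subset of A
assert A & B == {"x", "y"}  # intersection
assert A | B == A           # union (here B contributes nothing new to A)
assert A - B == {"z"}       # complement of B in A, since B is a subset of A
```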
1.4 Functions
We are all familiar with functions between the real numbers. A function is simply a rule that takes one number and yields another. We can generalize the notion of a function to arbitrary sets. To start, though, we need the notion of a relation on a set.
1.11 Definition (Binary relation). Given sets X and Y, a binary relation R from X to Y is a subset of X × Y. If (x, y) ∈ R we write this as xRy. When X = Y we often refer to R as a binary operator on X. The standard arithmetical operations of addition and multiplication are examples of binary operators on R.
1.12 Definition (Function: formal). Let X and Y be sets. By a function (or map) between X and Y, denoted f : X → Y, we mean a binary relation f ⊆ X × Y such that

(x, y) ∈ f and (x, z) ∈ f ⟹ y = z. (1.13)

Instead of writing xfy to signify that x and y are related by the function f, we typically write f(x) = y or f : x ↦ y.
1.14 Example. If f : R → R is a function (here X = Y = R), then condition (1.13) is the familiar notion of the vertical line test. We demand that a function cannot map a real number to two different values. For instance, we cannot simultaneously allow f(2) = 5 and f(2) = 6.
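Condition (1.13) is easy to test mechanically when a relation between finite sets is given as a set of ordered pairs. The helper below is a hypothetical illustration, not from the text:

```python
def is_function(relation):
    """Return True if no x is paired with two different y-values (condition 1.13)."""
    outputs = {}
    for x, y in relation:
        if x in outputs and outputs[x] != y:
            return False   # x relates to two different values: not a function
        outputs[x] = y
    return True

assert is_function({(1, 5), (2, 6), (3, 5)})   # passes the vertical line test
assert not is_function({(2, 5), (2, 6)})       # f(2) = 5 and f(2) = 6 both held
```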
1.5 Exercises
1.15 Exercise. If I wear boots whenever it is raining, and I am currently wearing boots, does
that imply that it is raining? If I am not wearing boots, can it be raining?
1.16 Exercise. What is the converse of the statement, "If A causes B, then A and B are correlated"? What is the contrapositive of the statement, "If it is sunny, then I am happy"?
1.17 Exercise. Let A := {x, y, z}, B := {1, 2, 3}, C := {1, x}, D := {{x}, {y}, {z}, {x, y, z}}.
1. Find A ∪ B.
2. Find A ∩ D.
3. Find A ∩ C.
4. Is C a subset of B?
5. Is A a subset of D?
6. Find all elements in B × C.
7. Is B × C the same set as C × B?
2 Euclidean Space and Coordinates
Many processes, be they natural (the gravitational interaction of two planets) or the product of human industriousness (economics, finance), can be modeled with high fidelity via various mathematical constructions. Real-world data often comes in the form of tuples of numbers, and this alone should provide sufficient impetus to study a mathematical abstraction of the geometric properties of our everyday world known as Euclidean n-space. One way to think of Euclidean n-space is as a canonical setting in which the usual n-dimensional Euclidean geometry can take place. For instance, Euclidean 2-space is a plane. Euclidean space is an example of a mathematical manifold; manifolds are generalizations of surfaces that carry enough structure so that some form of geometry can take place. Manifolds are used heavily in mathematical physics as well as in modern theories of statistical learning.
2.1 Euclidean space
2.1 Definition (Operations on R^n). Let n be a natural number.
• An element x of R^n is typically called a point.
• Given that x = (x_1, . . . , x_n), the real number x_i (where i ∈ {1, . . . , n}) is called the ith entry of x. When no confusion will result, we will assume implicitly that x_i refers to the ith entry of some point x that has already been defined. For instance, v_1 will usually correspond to the first entry of some point v in R^n.
• We write x = y provided x_i = y_i for each i ∈ {1, . . . , n}. The zero element 0 of R^n is the n-tuple consisting entirely of 0s. For example, 0 = (0, 0) in R^2.
• We wish to describe a way to add two points in R^n, which is a prelude to our discussion in the sequel regarding vectors. The binary operator + on R^n is defined by

(x_1, . . . , x_n) + (y_1, . . . , y_n) := (x_1 + y_1, . . . , x_n + y_n).

For example, (5, 3) + (1, −1) = (6, 2).
• Given a real number λ, we define the scaling of (x_1, . . . , x_n) by λ via

λ(x_1, . . . , x_n) := (λx_1, . . . , λx_n).

• By x − y we mean x + (−1)y. For example, 3(π, 0) − (1, 1) = (3π − 1, −1).
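Since points of R^n are just n-tuples, the operations above are one-liners in Python. A sketch (the helper names `add` and `scale` are ours):

```python
from math import pi

def add(x, y):
    """Entrywise sum of two points of R^n."""
    return tuple(a + b for a, b in zip(x, y))

def scale(c, x):
    """Scale every entry of x by the real number c."""
    return tuple(c * a for a in x)

assert add((5, 3), (1, -1)) == (6, 2)
# x - y means x + (-1)y, so 3(pi, 0) - (1, 1) = (3pi - 1, -1):
assert add(scale(3, (pi, 0)), scale(-1, (1, 1))) == (3 * pi - 1, -1)
```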
2.2 Definition (Euclidean space). Let n be a natural number. The standard inner product on R^n is a function ⟨·, ·⟩ from R^n × R^n to R defined by

⟨x, y⟩ := Σ_{i=1}^{n} x_i y_i,

where x := (x_1, . . . , x_n) and y := (y_1, . . . , y_n) are two elements of R^n. The standard inner product on R^n satisfies the axioms for a real-valued inner product. Namely, for all elements x, y, and z of R^n and any real number λ:
1. ⟨x, y⟩ = ⟨y, x⟩,
2. ⟨λx, y⟩ = λ⟨x, y⟩,
3. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩,
4. ⟨x, x⟩ ≥ 0, with equality only when x = 0.
In the sequel we shall see that this inner product corresponds to the dot product of vectors. The standard inner product on R^n induces a function ||x|| := √⟨x, x⟩ from R^n to R called the Euclidean norm on R^n. In the sequel we shall see that this norm gives the length of vectors. The Euclidean norm in turn induces a metric (or distance function) on R^n defined by

d(x, y) := ||x − y|| = √( Σ_{i=1}^{n} (x_i − y_i)² ).

This metric is called the Euclidean metric on R^n, and it allows us to define all the usual aspects of Euclidean geometry. For instance, the non-reflex angle θ between two non-zero points x and y is defined by

θ = arccos( ⟨x, y⟩ / (||x|| ||y||) ).

By convention, the angle between any point and 0 is 0. The set R^n equipped with the standard inner product is called Euclidean n-space.
2.3 Example. Let x := (1, 0, 0, 1) and y := (−1, 0, 0, 1). Then ⟨x, y⟩ = 1(−1) + 0 + 0 + (1)(1) = 0. Since ⟨x, x⟩ = 2, we see that ||x|| = √2.
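All of the structure in Definition 2.2 (inner product, norm, metric, and angle) follows from one sum, which makes it easy to verify Example 2.3 numerically. A sketch with our own helper names:

```python
from math import acos, sqrt

def inner(x, y):
    """Standard inner product: the sum of x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm ||x|| = sqrt(<x, x>)."""
    return sqrt(inner(x, x))

def dist(x, y):
    """Euclidean metric d(x, y) = ||x - y||."""
    return norm(tuple(a - b for a, b in zip(x, y)))

def angle(x, y):
    """Non-reflex angle between two non-zero points."""
    return acos(inner(x, y) / (norm(x) * norm(y)))

# Example 2.3:
x, y = (1, 0, 0, 1), (-1, 0, 0, 1)
assert inner(x, y) == 0
assert norm(x) == sqrt(2)
```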
2.2 Subsets of Euclidean Space
Presently we give the set descriptions of some common subsets of Euclidean space.
2.4 Example. Now we explicitly describe some subsets of R^3.
(a) {(x, y, 0) : x, y ∈ R} is the (x, y)-plane.
(b) {(x, 0, z) : x, z ∈ R} is the (x, z)-plane.
(c) {(0, y, z) : y, z ∈ R} is the (y, z)-plane.
(d) {(x, 1, 2) : x ∈ R} is the line containing (0, 1, 2) which is parallel to the x-axis.
2.5 Example. We can realize the same subsets in 2.4 as solution sets of certain equations in the variables x, y, and z.
(a) z = 0 has the solution set {(x, y, 0) : x, y ∈ R}.
(b) y = 0 has the solution set {(x, 0, z) : x, z ∈ R}.
(c) x = 0 has the solution set {(0, y, z) : y, z ∈ R}.
(d) The points of {(x, 1, 2) : x ∈ R} are the simultaneous solutions to y = 1, z = 2.
2.6 Example.
(a) Fix some natural number n. The set B(a, r) := {b ∈ R^n : d(a, b) < r} is the open ball of radius r centered at a.
(b) The set S^1(a, r) := {b ∈ R^2 : d(a, b) = r} is the circle of radius r centered at a.
(c) The set S^2(a, r) := {b ∈ R^3 : d(a, b) = r} is the sphere of radius r centered at a.
2.3 Coordinate systems
A coordinate system on Euclidean n-space is a way to associate a unique number or tuple of numbers to each point in R^n. There is already an obvious way to do this. Given x ∈ R^n, associate it to the tuple (x_1, . . . , x_n). However, it might not be intuitively clear what x or (x_1, . . . , x_n) actually represents. To remedy this, we turn to a thought experiment about how we might encode the location of points in a room in some cogent way. One method is to assign to each point in the room an element of R^3. To carry this out, first we choose an arbitrary point and assign it the value (0, 0, 0); that is, we choose an origin. Next we center our right hand at the chosen origin and form a pistol shape so that our thumb is parallel to a line L_1, index finger parallel to a line L_2, and middle finger parallel to a line L_3. This produces three pairwise perpendicular lines that intersect at the origin. The direction our fingers point is special, and we signify that direction as the positive direction. Making this choice, we have chosen an orientation for the lines L_1, L_2, L_3 and say that they are oriented according to the right hand rule (Figure 2). Suppose that P is some point in the room. Then to reach P we can start at the origin, walk some number of units P_1 along L_1, then walk some number of units P_2 along L_2, and finally some number of units P_3 along L_3 (assuming we can walk in the air). If, say, P_1 were a negative number, that means we would walk |P_1| units in the direction opposite of where our finger was pointing. The triple (P_1, P_2, P_3) now uniquely identifies the point P, and we have endowed the room with a coordinate system.
Figure 2: The right hand rule.
2.7 Definition (Rectangular coordinates). The rectangular coordinate system on R^n is the identification of a point x in R^n with its underlying tuple (x_1, . . . , x_n). If e_i is the point in R^n whose only non-zero entry is a 1 in the ith spot (e.g., e_2 = (0, 1, 0) in R^3), then we can write

x = x_1 e_1 + · · · + x_n e_n.

In the case where n = 3, we usually adopt a slightly different notational convention. The first entry of a point P in R^3 is called the x-coordinate, the second entry of P is called the y-coordinate, and the third entry is called the z-coordinate. You should note that the rectangular coordinate system on R^3 corresponds to that produced by the process above.
2.8 Definition (Polar coordinates). Now we describe a coordinate system for R^2 that associates to each point P in R^2 a different pair of numbers than its x-coordinate and y-coordinate. The polar coordinate system on R^2 is that which assigns to each point P in R^2 the pair (||P||, θ_P), where θ_P is the angle (as measured counterclockwise) between the line segment connecting 0 to P and the line segment connecting 0 to (1, 0). The pair (||P||, θ_P) uniquely identifies P, in that (||P||, θ_P) = (||Q||, θ_Q) if and only if P = Q, where Q is some point in R^2. The pair (||P||, θ_P) are the polar coordinates for P. We maintain the convention that 0 has polar coordinates (0, 0).
Figure 3: Rectangular coordinates.
Geometrically, to find a point in R^2 given that its polar coordinates are (r, θ), we can start at the origin, first walk r units along the x-axis, and then walk θ radians in the counterclockwise direction along the circle of radius r centered at the origin. More algebraically, if P has polar coordinates (r, θ), then it has rectangular coordinates (r cos θ, r sin θ), so x = r cos θ and y = r sin θ. If P has rectangular coordinates (x, y) with y ≥ 0, then it has polar coordinates (√(x² + y²), arccos(x/√(x² + y²))); when y < 0, the angle is instead 2π − arccos(x/√(x² + y²)).
2.9 Definition (Cylindrical coordinates). The cylindrical coordinate system on R^3 is that which assigns to each point P in R^3 the triple (r_P, θ_P, z_P), where (r_P, θ_P) are the polar coordinates of the projection of P onto the (x, y)-plane and z_P is the z-coordinate of P.
If P has cylindrical coordinates (r, θ, z), then it has rectangular coordinates (r cos θ, r sin θ, z).
2.10 Definition (Spherical coordinates). The spherical coordinate system on R^3 is that which assigns to each point P in R^3 the triple (ρ_P, θ_P, φ_P), where ρ_P := ||P||, θ_P is the polar angle of the projection of P onto the (x, y)-plane, and φ_P is the angle between P and (0, 0, 1). Note in particular that this forces 0 to have spherical coordinates (0, 0, 0).
If P has spherical coordinates (ρ, θ, φ), then it has rectangular coordinates (x, y, z) where x = ρ cos θ sin φ, y = ρ sin θ sin φ, and z = ρ cos φ. If P ≠ 0 has rectangular coordinates (x, y, z), then P has spherical coordinates (ρ, θ, φ) where ρ = √(x² + y² + z²), θ = arctan(y/x), and φ = arccos(z/√(x² + y² + z²)), provided x ≠ 0. If x = y = 0, then we take θ = 0. If x = 0 and y ≠ 0, we take θ = π/2 when y is positive and θ = −π/2 when y is negative.
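A quick numerical check of the spherical conversion formulas. This sketch (names ours) uses `atan2`, which bundles the x = 0 and sign-of-y case analysis from the text into one library call:

```python
from math import acos, atan2, cos, sin, sqrt, pi, isclose

def spherical_to_rect(rho, theta, phi):
    return (rho * cos(theta) * sin(phi),
            rho * sin(theta) * sin(phi),
            rho * cos(phi))

def rect_to_spherical(x, y, z):
    rho = sqrt(x**2 + y**2 + z**2)
    # atan2(y, x) handles x = 0 and negative y without separate cases.
    return (rho, atan2(y, x), acos(z / rho))

p = spherical_to_rect(2, pi / 4, pi / 3)
rho, theta, phi = rect_to_spherical(*p)
assert isclose(rho, 2) and isclose(theta, pi / 4) and isclose(phi, pi / 3)
```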
2.11 Example. 1. The annulus in R^2 centered at the origin with inner radius r_0 and outer radius r_1 is the set of points whose polar coordinates are in {(r, θ) : r_0 ≤ r ≤ r_1, 0 ≤ θ < 2π}. A polar rectangle is consequently some section of an annulus.
2. A sphere in R^3 centered at the origin with radius R is the set of all points whose spherical coordinates satisfy ρ = R.
3. A cylinder in R^3 centered at the origin with radius R is the set of all points whose cylindrical coordinates satisfy r = R.
2.4 Exercises
2.12 Exercise. From Rogawski: 12.7: 3, 5, 7, 17, 19, 21, 25, 31, 33, 47, 53.
3 Vectors and the dot product
In section 2 we endowed the set R^n with an inner product in order to capture the basic notions of Euclidean geometry. By endowing R^n with an additive structure (a way to add two points in R^n and obtain another point) and a way to scale points by real numbers, we actually gave R^n the structure of a vector space. This course does not concern itself with the formal study of vector spaces, but we do need to utilize the concept of a vector. Fortunately, vectors do not differ too much from points in R^n; they are really just two slightly different ways to represent essentially the same information. In fact, in this section we will first introduce some operations on vectors and later realize that these correspond exactly to the various operations on points in R^n.
3.1 Abstract vectors
As we have seen, an element of R^3 can be thought of as a point in space. While R^3 is quite handy for dealing with static objects, often what interests the scientist is dynamic. For instance, a physicist might be interested in the motion of an electron in an electromagnetic field. A marine biologist might want to know how nutrients flow in the ocean. An engineer tries to understand the stress a bridge puts on its supports. We now introduce a mathematical concept (the vector) which allows us to quantify these notions.
3.1 Definition (Abstract vector). An (abstract) vector v (often written with an arrow or in boldface) in Euclidean n-space consists of a point a ∈ R^n called the tail, a point b ∈ R^n called the head, and a directed line segment from the tail to the head. The magnitude ||v|| (or |v|) of v is the length of the line segment connecting a to b. It should be clear that |v| = ||b − a||, where the right hand side of the equality is the Euclidean norm of b − a introduced previously.
By the zero vector, denoted 0, we mean the vector with a tip and tail which coincide. In particular |0| = 0. If |v| = 1, we say that v is a unit vector.
A point x ∈ R^n can naturally be identified with a vector whose tail is 0 and whose head is x. Then what distinguishes a vector v with tail a and head b from the point x := b − a thought of as a vector? Not much, really. The directed line segments associated to v and x are parallel and point in the same direction. If we took x and translated it so that its tail was at a, then we would obtain the vector v. In fact, the properties of v are translation invariant, which is to say that for the purposes of calculation we can often just substitute x for v.
As a matter of convention, we will freely conflate the point x ∈ R^n with the associated vector whose tail is 0 and head is x. Similarly, if we define a vector v in terms of a single point x, then we mean that v has tail 0 and head x. Thus the vector v := (1, 1) corresponds to the directed line segment originating at the origin and terminating at (1, 1), while −v points in the opposite direction.
3.2 Notation (Standard basis vectors). The vectors i := (1, 0) and j := (0, 1) in R^2 are called the standard basis vectors for R^2. Similarly, the vectors i := (1, 0, 0), j := (0, 1, 0), and k := (0, 0, 1) are called the standard basis vectors in R^3. More generally, the set {e_1, . . . , e_n} as defined in 2.7 is the standard basis for R^n.
3.2 Vector addition and scalar multiplication
We will describe a way to add and scale abstract vectors. At rst we will keep track of heads
and tails, but quickly realize that the arithmetic of vectors is essentially translation invariant,
as alluded to in 3.1.
3.3 Definition (Vector addition). Let v and w be abstract n-vectors with tails at a point a. Let P be the parallelogram formed by v and w. By v + w we mean the vector whose tail is at a and whose head is the corner of the parallelogram opposite a. This description of vector addition is sometimes called the head to tail method of vector addition, since the head of v + w is the point obtained by translating w so that its tail is at the head of v. If v has head at point x and w has head at y, then v + w has head at the point (x − a) + (y − a) + a.
In the context of vectors, a real number is often called a scalar. Given a scalar λ and a vector v with tail a and head b, the vector λv has tail a and head λ(b − a) + a. If λ is positive, then λv points in the same direction as v, but its magnitude is λ|v|. If λ < 0, then λv points in the opposite direction of v and has magnitude |λ| |v|.
3.4 Example. Consider two 2-vectors a, b which have tails at the same point. Then Figure 4 describes what a + b looks like. It should be clear from the picture that a + b = b + a and that a + b is one of the diagonals of the parallelogram with sides a and b. Convince yourself that a − b is the opposite diagonal.
Figure 4: Abstract vector addition
Figure 5: Vector subtraction realized as one of the diagonals of the parallelogram with sides u and v
3.5 Example. Let v = (1, 2), w = (2, 3) be vectors in R^2. Then
(a) 5v + 3w = 5(1, 2) + 3(2, 3) = (5, 10) + (6, 9) = (11, 19).
(b) v + v = (1, 2) + (1, 2) = (2, 4) = 2v.
(c) v − w = v + (−w) = (1, 2) + (−2, −3) = (−1, −1).
3.6 Proposition (Properties of vector addition). Let v, w be vectors in R^n, and let λ, μ ∈ R be scalars. Then from the definitions it is clear that:
1. λ(v + w) = λv + λw.
2. (λ + μ)v = λv + μv.
3. (λ + μ)(v + w) = (λ + μ)v + (λ + μ)w = λv + μv + λw + μw.
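Both Example 3.5 and Proposition 3.6 can be verified mechanically once vectors are modeled as tuples. A sketch with our own helper names:

```python
def add(v, w):
    """Componentwise vector addition."""
    return tuple(a + b for a, b in zip(v, w))

def scale(c, v):
    """Scalar multiplication of v by c."""
    return tuple(c * a for a in v)

v, w = (1, 2), (2, 3)

# Example 3.5:
assert add(scale(5, v), scale(3, w)) == (11, 19)
assert add(v, v) == scale(2, v)
assert add(v, scale(-1, w)) == (-1, -1)

# Proposition 3.6 at the sample scalars c = 2, d = -3:
c, d = 2, -3
assert scale(c, add(v, w)) == add(scale(c, v), scale(c, w))
assert scale(c + d, v) == add(scale(c, v), scale(d, v))
```

Integer examples keep the checks exact; with floating-point entries one would compare with a tolerance instead.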
3.3 The dot product
Now we turn to a function which encodes information about the geometric relationship between two vectors, namely the angle between them. Of course, at present we do not have a notion of what the angle between two vectors might be, so we set about describing this now. We would be a bit remiss if we did not confess that in some sense we are being coy. All the concepts we are defining for vectors result from the structure imposed on R^n by the standard inner product. Previously we defined the angle between two points in R^n in terms of the inner product, and so our definition of the dot product is essentially a reformulation of this concept.
3.7 Definition (Span). Let v be a vector in R^n. Then define Rv := {λv : λ ∈ R}. We call Rv the span of v. Think of this set as all vectors which result from stretching or shrinking v. In particular, every vector in Rv points in the same direction as v or in the opposite direction. Alternatively, we could think of Rv as a line through the origin which is parallel to the vector v.
3.8 Definition (Angle between two vectors). Let v, w be non-zero vectors in R^n. Then the two lines Rv, Rw determine a plane. Define the angle θ between v and w to be the (non-reflex) angle between the lines Rv and Rw. In particular, θ ∈ [0, π].
Two vectors v and w are said to be parallel if the angle θ between them is either 0 or π. If θ = 0 then v and w point in the same direction. If θ = π then v and w point in opposite directions.
If v has tail a and head b, and if w has tail c and head d, then the angle between v and w is the angle between the points b − a and d − c as defined in 2.2.
3.9 Example. Let v = (1, 0) and w = (1, 1) be vectors in R^2. Then the angle between v and w is clearly π/4.
3.10 Definition (Dot product: coordinate independent form). Let V be the set of all vectors in R^n. Then the dot product is defined as the function

· : V × V → R, (v, w) ↦ |v| |w| cos θ,

where θ is the angle between v and w.
3.11 Proposition (Length of a vector as a dot product). Let v be a vector in Euclidean n-space. Then v · v = |v|².
Proof. By definition v · v = |v| |v| cos θ, where θ is the angle between v and v, i.e., θ = 0. Thus v · v = |v| |v| cos 0 = |v| |v| = |v|².
3.12 Theorem (Dot product in coordinates). Let v and w be vectors in R^n, and suppose that x and y are the points in R^n obtained by translating v and w (respectively) so that their tails are both at the origin. Then v · w = ⟨x, y⟩, where the righthand side is the standard inner product of points in R^n. In particular, if v = (v_1, . . . , v_n) and w = (w_1, . . . , w_n) are vectors in R^n, then
v · w = Σ_{i=1}^n v_i w_i.
Proof. We prove the case when n = 2 explicitly and appeal to the definition of angle introduced in 2.2 to prove the general case. Suppose that v has tail at a = (a_1, a_2) and head at b = (b_1, b_2). Likewise suppose that w has tail at c = (c_1, c_2) and head at d = (d_1, d_2). Define v_1 := b_1 − a_1, v_2 := b_2 − a_2, w_1 := d_1 − c_1, w_2 := d_2 − c_2. From the law of cosines,
|v|^2 + |w|^2 − 2|v||w| cos θ = |v − w|^2
= (v_1 − w_1)^2 + (v_2 − w_2)^2
= (v_1^2 − 2v_1w_1 + w_1^2) + (v_2^2 − 2v_2w_2 + w_2^2)
= (v_1^2 + v_2^2) + (w_1^2 + w_2^2) − 2(v_1w_1 + v_2w_2)
= |v|^2 + |w|^2 − 2(v_1w_1 + v_2w_2),
and hence |v||w| cos θ = v_1w_1 + v_2w_2.
For the general case, suppose v has tail a and head b, and w has tail c and head d. Then the angle θ between v and w is the angle between the points x := b − a and y := d − c. Thus
v · w = |v| |w| cos θ = ||x|| ||y|| cos(arccos(⟨x, y⟩ / (||x|| ||y||))) = ⟨x, y⟩.
We restate as a corollary what is essentially the definition of the angle between two points in R^n.
3.13 Corollary. Let v = (v_1, . . . , v_n) and w = (w_1, . . . , w_n) be two vectors in R^n. Suppose that θ is the angle between v and w. Then
θ = arccos((v_1w_1 + · · · + v_nw_n) / (|v||w|)) = arccos((v · w) / (|v||w|)).
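As a quick sanity check of Corollary 3.13, the angle formula can be computed numerically. The following is a minimal sketch in plain Python; the helper names dot, norm, and angle are our own, not from the text.

```python
import math

def dot(v, w):
    # v . w = sum of componentwise products (Theorem 3.12)
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    # |v| = sqrt(v . v) (Proposition 3.11)
    return math.sqrt(dot(v, v))

def angle(v, w):
    # theta = arccos(v . w / (|v| |w|)) (Corollary 3.13)
    return math.acos(dot(v, w) / (norm(v) * norm(w)))

# Example 3.9: v = (1, 0), w = (1, 1) should give an angle of pi/4
theta = angle((1, 0), (1, 1))
```

Running this on the vectors of Example 3.9 recovers the expected angle π/4.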
3.14 Corollary (Properties of the dot product). Let v, w, u ∈ R^n be vectors and λ ∈ R a scalar.
1. v · (w + u) = v · w + v · u.
2. 0 · v = 0 ∈ R.
3. v · w = w · v.
4. (λv) · w = λ(v · w).
3.4 Orthogonality
We explore an important geometric relationship between vectors.
3.15 Definition. Two non-zero vectors v, w in Euclidean n-space are said to be orthogonal, denoted v ⊥ w, if the angle between them is π/2.
The following is an equivalent criterion for when two non-zero vectors are orthogonal. You should know it.
3.16 Proposition (A condition for orthogonality). Let v, w be two non-zero vectors in Euclidean n-space. Then v ⊥ w ⟺ v · w = 0.
Proof. Since v, w are non-zero, |v| ≠ 0 ≠ |w|. Thus 0 = v · w = |v||w| cos θ ⟺ cos θ = 0 ⟺ θ = π/2 ⟺ v ⊥ w.
3.17 Example.
(a) In R^3, the vectors e_1 := (1, 0, 0), e_2 := (0, 1, 0), e_3 := (0, 0, 1) are pairwise orthogonal. That is, e_1 · e_2 = 0, e_1 · e_3 = 0 and e_2 · e_3 = 0.
(b) In R^2, the vectors v = (5, 3) and w = (−3, 5) are orthogonal, since v · w = 5 · (−3) + 3 · 5 = −15 + 15 = 0.
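The orthogonality criterion of Proposition 3.16 reduces to a one-line computation. A small Python sketch (the helper dot is our own name) checks Example 3.17(b):

```python
def dot(v, w):
    # componentwise dot product (Theorem 3.12)
    return sum(a * b for a, b in zip(v, w))

# Example 3.17(b): v = (5, 3) and w = (-3, 5) are orthogonal
v, w = (5, 3), (-3, 5)
is_orth = dot(v, w) == 0
```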
3.5 Exercises
3.18 Exercise. Find two vectors v, w in R^3 such that v ⊥ (1, 1, 0), w ⊥ (1, 1, 0) and v ⊥ w.
3.19 Exercise. Let v, w, u be three vectors in R^3. Are the following true or false?
(a) If v and w both have an angle of π/6 with u, so does v + w.
(b) If v and w both have an angle of π/2 with u, so does v + w.
(c) If v and w both have an angle of π/2 with u, then v ⊥ w.
3.20 Exercise. Rogawski: 12.1: 3, 8, 9, 12, 29, 41-44, 56-59.
12.2: 1, 29, 30, 33, 36, 37, 41, 43, 46, 50, 52
12.3: 12-14, 19, 32, 34, 46, 44, 47, 52, 51, 56, 57, 61, 62, 77, 78
4 The cross product and determinants
Throughout this section we assume that all vectors are in Euclidean 3-space. The vector operation called the cross product, which we are about to explore, will allow us to show how vectors can give rise to a coordinate system of Euclidean 3-space. Moreover, it will also be useful for constructing planes, and it allows us to easily find a vector which is orthogonal to each of two given vectors.
4.1 The cross product of two vectors
4.1 Definition (Cross product: coordinate independent form). Let V = {vectors in Euclidean 3-space}. The cross product is a function
× : V × V → V, (v, w) ↦ v × w
where v × w is a vector such that (a) |v × w| = |v||w| sin θ, where θ is the angle between v and w, and (b) v × w has direction orthogonal to both v and w which satisfies the right hand rule (see Figure 6).
Figure 6: Right hand rule for determining the direction of the cross product.
4.2 Remark. Let v and w be two vectors. It should be geometrically obvious that v × w exists. Indeed, the existence of v × w is equivalent to the fact that we can always equip Euclidean 3-space with coordinate axes.
Don't get too hung up on the fact that the cross product is a function. What you do need to realize is that v × w is a vector while v · w is a number. Also important to remember is that v × w ⊥ v and v × w ⊥ w. We will use this fact a good deal and you should remember it.
4.3 Proposition (Cross product as area of parallelogram). Let v and w be vectors. Then the area of the parallelogram with sides v and w is |v × w|.
Proof. The area of a parallelogram is its vertical height times the length of its base. Recall that the height of the parallelogram with sides v and w is |w| sin θ, where θ is the angle formed between the sides. In particular, θ is the angle between v and w. With base of length |v|, the area is given by |v||w| sin θ = |v × w|.
4.4 Example. Imagine we use a wrench to loosen a tight bolt. We can think of the length of the wrench as a vector r. Imagine applying a force to the end of the wrench at an angle θ. We can think of this force as a vector F. Then the torque τ is given by τ := r × F. From the definition of |r × F| it's clear that the torque is greatest if we apply the force F at an angle of π/2.
Figure 7: |v × w| realized as the area of the parallelogram formed by v and w.
Figure 8: Torque diagram
Figure 9: Another torque diagram.
Here are a few more interesting facts about the cross product.
4.5 Example. Let v and w be vectors.
(a) Note that |v · w|^2 + |v × w|^2 = |v|^2|w|^2 cos^2 θ + |v|^2|w|^2 sin^2 θ = |v|^2|w|^2.
(b) |v × v| = |v|^2 sin 0 = 0.
We now give the coordinate dependent definition of the cross product. This formulation of the cross product is very useful for calculations.
4.6 Theorem (Cross product in coordinates). Let v = (v_1, v_2, v_3) and w = (w_1, w_2, w_3) be vectors in R^3. Then
v × w = (v_2w_3 − v_3w_2, −(v_1w_3 − v_3w_1), v_1w_2 − v_2w_1). (4.7)
Proof. Set u := (v_2w_3 − v_3w_2, −(v_1w_3 − v_3w_1), v_1w_2 − v_2w_1). Now observe
|v × w|^2 = |v|^2|w|^2 sin^2 θ
= |v|^2|w|^2 (1 − cos^2 θ)
= |v|^2|w|^2 − |v|^2|w|^2 cos^2 θ
= (v · v)(w · w) − (v · w)^2 (Proposition 3.11)
= (v_1^2 + v_2^2 + v_3^2)(w_1^2 + w_2^2 + w_3^2) − (v_1w_1 + v_2w_2 + v_3w_3)^2 (Theorem 3.12)
= ((v_2w_3)^2 − 2(v_2w_3)(v_3w_2) + (v_3w_2)^2) + ((v_1w_3)^2 − 2(v_1w_3)(v_3w_1) + (v_3w_1)^2) + ((v_1w_2)^2 − 2(v_1w_2)(v_2w_1) + (v_2w_1)^2)
= (v_2w_3 − v_3w_2)^2 + (−(v_1w_3 − v_3w_1))^2 + (v_1w_2 − v_2w_1)^2
= |u|^2.
To finish we need only verify that v · u = 0, w · u = 0 (i.e., that v ⊥ u, w ⊥ u) and that v, w, u obey the right hand rule. These are easily checked and their verification is left to the reader.
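The coordinate formula (4.7) and the length identity used in the proof can both be checked numerically. Below is a minimal Python sketch (the helpers dot, norm, and cross are our own names):

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def cross(v, w):
    # Theorem 4.6 coordinate formula (4.7)
    return (v[1] * w[2] - v[2] * w[1],
            -(v[0] * w[2] - v[2] * w[0]),
            v[0] * w[1] - v[1] * w[0])

v, w = (1, 2, 3), (4, 5, 6)
u = cross(v, w)

# v x w is orthogonal to both v and w
orth = dot(u, v) == 0 and dot(u, w) == 0

# |v x w|^2 = (v . v)(w . w) - (v . w)^2, the identity from the proof
identity = dot(u, u) == dot(v, v) * dot(w, w) - dot(v, w) ** 2
```

For v = (1, 2, 3) and w = (4, 5, 6) this produces u = (−3, 6, −3), orthogonal to both inputs.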
The following corollary is immediate from Theorem 4.6.
4.8 Corollary (Properties of the cross product). Let v, w, u be vectors in R^3 and suppose λ ∈ R.
(a) v × w = −(w × v).
(b) (λv) × w = λ(v × w) = v × (λw).
(c) v × (w + u) = v × w + v × u.
(d) v · (w × u) = (v × w) · u. These expressions are called triple scalar products.
(e) v × (w × u) = (v · u)w − (v · w)u.
4.2 Coordinate vectors and the Determinant
4.9 Definition (Standard coordinate vectors). We define the vectors i := (1, 0, 0), j := (0, 1, 0), k := (0, 0, 1). These three vectors are called the standard coordinate vectors on R^3. It's easy to verify that
1. |i| = |j| = |k| = 1;
2. i ⊥ j, i ⊥ k, j ⊥ k;
3. i × j = k, k × i = j, j × k = i. That is, i, j, k satisfy the right hand rule.
4. Any vector v = (v_1, v_2, v_3) = v_1 i + v_2 j + v_3 k. We say that any vector v ∈ R^3 is a linear combination of {i, j, k}.
4.10 Remark. Since i, j, k satisfy the right hand rule, the lines Ri, Rj, Rk pointing in the direction of i, j, k respectively are the coordinate axes, by definition. In general, we can take any two non-parallel vectors v and w in Euclidean 3-space and then {v, w, v × w} will induce a coordinate system. If v ⊥ w, then {v, w, v × w} will induce a rectangular coordinate system.
Our main purpose with the coordinate vectors i, j, k is to explore a method of calculating v × w which is far easier to remember than Equation (4.7) in Theorem 4.6. First we recall a few facts about matrices.
4.11 Definition (Matrix). Recall that an n × n matrix with entries in R (or a real n × n matrix) is a square array of real numbers, i.e.,
A =
| a_11 · · · a_1n |
|  ⋮    ⋱    ⋮  |
| a_n1 · · · a_nn |
where a_ij ∈ R for all i, j ∈ {1, . . . , n}.
More generally, we can allow the entries of an n × n matrix to be elements of an arbitrary set S.
4.12 Definition (Determinant). Suppose that A is a 2 × 2 real matrix and that for a, b, c, d ∈ R
A =
| a b |
| c d |.
Then define the determinant of A, denoted det(A), as det(A) := ad − bc.
Likewise, suppose that B is a 3 × 3 real matrix such that
B =
| a_11 a_12 a_13 |
| a_21 a_22 a_23 |
| a_31 a_32 a_33 |.
Then we define the determinant of B as
det(B) := a_11 det | a_22 a_23 ; a_32 a_33 | − a_12 det | a_21 a_23 ; a_31 a_33 | + a_13 det | a_21 a_22 ; a_31 a_32 |.
In the above sum, the 2 × 2 matrix which appears after each a_1i is the matrix yielded by looking at B and ignoring the row and column which contain a_1i.
We can generalize the notion of determinant a bit to include matrices which have vectors in R^3 as the entries in the first row. The formulation is identical to the above, except that we replace each a_1i with a vector.
Now we relate the cross product to the determinant.
4.13 Theorem (Cross product as a determinant). Let i, j, k be the coordinate vectors. Suppose v = ⟨v_1, v_2, v_3⟩, w = ⟨w_1, w_2, w_3⟩ are two vectors in R^3. Then
v × w = det
| i   j   k   |
| v_1 v_2 v_3 |
| w_1 w_2 w_3 |.
Proof. Observe that
det
| i   j   k   |
| v_1 v_2 v_3 |
| w_1 w_2 w_3 |
= (v_2w_3 − v_3w_2)i − (v_1w_3 − v_3w_1)j + (v_1w_2 − v_2w_1)k.
In light of Theorem 4.6 we have shown the result.
4.14 Definition (Triple scalar product). Let v, w, u be three vectors in R^3. Then the expression v · (w × u) is called the triple scalar product. Note that from Corollary 4.8 we have v · (w × u) = (v × w) · u.
A corollary to Theorem 4.6 relates the determinant of a matrix to a triple scalar product.
4.15 Corollary (Triple scalar product: matrix form). Let v = (v_1, v_2, v_3), w = (w_1, w_2, w_3), u = (u_1, u_2, u_3) be three vectors. Then the triple scalar product v · (w × u) is given by
det
| v_1 v_2 v_3 |
| w_1 w_2 w_3 |
| u_1 u_2 u_3 |.
Proof. Observe that
det
| v_1 v_2 v_3 |
| w_1 w_2 w_3 |
| u_1 u_2 u_3 |
= (w_2u_3 − w_3u_2) v_1 − (w_1u_3 − w_3u_1) v_2 + (w_1u_2 − w_2u_1) v_3 = v · (w × u).
The last equality follows from Theorem 4.6.
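Corollary 4.15 can be checked on a concrete triple of vectors. A minimal Python sketch (dot, cross, and det3 are our own helper names):

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def cross(v, w):
    # Theorem 4.6 coordinate formula
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def det3(rows):
    # cofactor expansion along the first row (Definition 4.12)
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

v, w, u = (1, 0, 2), (0, 3, 1), (2, 1, 1)
lhs = dot(v, cross(w, u))   # triple scalar product v . (w x u)
rhs = det3([v, w, u])       # determinant with rows v, w, u
```

Both sides evaluate to the same number, as the corollary predicts.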
Here is a geometric interpretation of the triple scalar product.
Figure 10: The volume of a parallelepiped is given by a triple scalar product
4.16 Proposition (Triple scalar product as the volume of a parallelepiped). Consider the parallelepiped (or simply box) formed by the vectors v, u, w with base determined by u, v (see Figure 10). Then the volume of this box is given by |w · (u × v)|.
Proof. Consider the box as given in Figure 10. Recall that the volume of the box with sides v, u, w is given by (area of base) × (vertical height). By Proposition 4.3, the area of the base is just |u × v|. The vertical height is given by |w| cos θ. But note that θ is actually the angle between u × v and w. Thus
volume of the box = |(|w| cos θ)(|u × v|)| = ||w| |u × v| cos θ| = |w · (u × v)|.
The last equality follows from the coordinate independent definition of the dot product.
4.17 Remark. The value w · (u × v) is called the signed volume of the box. In general the dot product of two vectors need not be positive, so to talk about an actual volume we take the absolute value.
4.3 Exercises
4.18 Exercise. Let v, w, u be vectors in R^3. Are the following true or false?
(a) Depending on v and w, v × w can equal v · w.
(b) If v · w = 0 and v × w = 0 then either v = 0 or w = 0.
(c) If v × w = v × u, and v ≠ 0, then w = u.
4.19 Exercise. Give the coordinate description of three vectors which satisfy the right hand rule.
4.20 Exercise. Rogawski 12.4: 1, 5, 10, 12, 14, 17, 26, 28, 32, 33, 36, 50, 59, 61
5 Planes, lines, and projections
Now we are at the point where we can put some of the machinery of earlier sections to use. In this section we will introduce how to construct planes and lines using vectors.
5.1 Notation. Throughout this section we will make use of the fact that we can identify points with vectors. We will abuse language slightly and soon speak about adding a point p to a vector v. This is just shorthand for saying that we are adding the vector with tail at the origin and head at the point p to the vector v. We will drop the overline to avoid a proliferation of notation.
Additionally, we will often identify a vector with the point its head points to. This should not be a point of confusion. If the student feels uncomfortable with the duality between points and vectors we suggest they review the previous material.
5.1 Lines
5.2 Definition (Vector equation for a line). Let v be a vector in R^n, p a point in R^n and t ∈ R. Then the vector equation of the line r(t) = p + tv defines a line through the point p in the direction of v.
5.3 Remark. The equation r(t) above is called a vector valued equation because the output is a vector. When we say that r(t) defines a line we of course are identifying the output vectors of r(t) with points.
5.4 Definition (Parametric equations for a line). Say that v = (v_1, . . . , v_n) is a vector and p = (p_1, . . . , p_n) is a point. We can write r(t) = p + tv as r(t) = ⟨p_1 + tv_1, . . . , p_n + tv_n⟩. The equations
p_1 + tv_1 = x_1,
⋮
p_n + tv_n = x_n
are called the parametric equations for the line r(t). Note that here x_i for i = 1, . . . , n is playing the role of a variable. If an arbitrary point (x_1, . . . , x_n) ∈ R^n satisfies the above parametric equations, this simply means that it is a point on the line r(t). More explicitly, it means that there is an s ∈ R such that r(s) = (x_1, . . . , x_n). Compare the situation here to Examples 2.4 and 2.5.
Next we turn to a way of rewriting the parametric equations so that we can easily find the set description of the line.
5.5 Definition (Symmetric equations for a line). Let v = (v_1, . . . , v_n) be a vector and p = (p_1, . . . , p_n) a point. Let I ⊆ {1, . . . , n} be the index set such that for i ∈ I, v_i ≠ 0. Say I = {i_1, . . . , i_k}. If we solve each of the parametric equations of r(t) for t we get the series of equalities
(x_{i_1} − p_{i_1}) / v_{i_1} = · · · = (x_{i_k} − p_{i_k}) / v_{i_k}.
These are called the symmetric equations of the line.
Note that if v_j = 0, then x_j = p_j. That is, x_j is a constant.
5.6 Example. Let v = (1, 2) be a vector in R^2 and p = (0, 1) a point. Then the vector equation for the line through p in the direction of v is given by
r(t) = p + tv = (0, 1) + t(1, 2) = (0 + t, 1 + 2t).
The parametric equations for the line are given by
t = x,
1 + 2t = y.
The symmetric equations are given by
x = (y − 1)/2.
To interpret this last result, we have shown that {(x, y) ∈ R^2 : x = (y − 1)/2} is the set description of the line. As an example, the point (1, 3) is on the line. (We need only check that (3 − 1)/2 = 1.)
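Example 5.6 can be sketched in a few lines of Python; the function names r and on_line are our own:

```python
def r(t):
    # vector equation p + t v with p = (0, 1), v = (1, 2)
    return (0 + t, 1 + 2 * t)

def on_line(x, y):
    # symmetric equation of Example 5.6: x = (y - 1) / 2
    return x == (y - 1) / 2

pt = r(1.0)           # the point the example checks by hand
check = on_line(*pt)  # it satisfies the symmetric equation
```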
5.7 Definition (Parallel and skew lines). Suppose that two lines L_1, L_2 have vector equations r(t) = p + tv and q(t) = p′ + tw respectively. We say that L_1 and L_2 are parallel if v and w are parallel. We say that L_1 and L_2 are skew if L_1 and L_2 don't intersect and if v and w are not parallel.
5.2 Planes
Recall that in Euclidean 3-space, two intersecting lines determine a plane. There is also a way to realize a plane as the solution set to a vector equation which involves the dot product. In particular, if we know that a plane contains a given point p, and a vector n is perpendicular to the plane, we can determine every other point on the plane.
First it helps to recall that given vectors v, w, the vector v − w can be realized as the vector which has tail at the head of w and head at the head of v. (See Figure 11.)
Figure 11: Vector subtraction realized as one of the diagonals of the parallelogram formed by
two vectors.
5.8 Definition (Normal vector). Let P be a plane in Euclidean 3-space and n a vector which is perpendicular to the plane. Then we say that n is a normal vector of P, and that n is normal to P.
In particular, if a, p ∈ P are points in the plane, then the vector a − p is said to be in the plane. The condition that n is perpendicular to P implies that n ⊥ a − p.
5.9 Proposition (Plane: vector form). Let P be a plane in Euclidean 3-space which contains the point p, and suppose that n is a normal vector to P. Then
P = {a ∈ R^3 : n · (a − p) = 0}.
In other words the solution set to the equation
n · (a − p) = 0
is a plane. The equation n · (a − p) = 0 is called the vector equation of the plane.
Proof. By our remark about the geometric interpretation of vector subtraction, we see that the vector a − p is a vector with tail at the point p. Suppose that a is a point in the plane P. Then a − p is a vector in the plane. That n is perpendicular to the plane is equivalent to n ⊥ (a − p) ⟺ n · (a − p) = 0. This establishes that P ⊆ {a ∈ R^3 : n · (a − p) = 0}.
Now, say a is an arbitrary point. If n · (a − p) = 0, then n ⊥ (a − p) ⟺ (a − p) is a vector in the plane P ⟺ a ∈ P. This establishes that {a ∈ R^3 : n · (a − p) = 0} ⊆ P.
We have shown in fact that P = {a ∈ R^3 : n · (a − p) = 0}.
5.10 Remark. Proposition 5.9 just says that the equation n · (a − p) = 0 is the equation of a plane. Compare this to a statement such as "y = 2x + 5 is the equation of a line." What we mean by the latter statement is that {(x, y) ∈ R^2 : 2x + 5 = y} is a line.
We can rewrite the vector equation of a plane to get yet another equation which represents a plane.
5.11 Definition (Plane: linear form). Let n = (n_1, n_2, n_3) be a vector and p = (p_1, p_2, p_3) a point. If a = (x, y, z) is an arbitrary point, then
n · (a − p) = 0 ⟺
(n_1, n_2, n_3) · (x − p_1, y − p_2, z − p_3) = 0 ⟺
n_1x + n_2y + n_3z − (n_1p_1 + n_2p_2 + n_3p_3) = 0 ⟺
n_1x + n_2y + n_3z = n · p.
Define d := n · p. Then the equation
n_1x + n_2y + n_3z = d
is called the linear equation of a plane. Indeed, any equation of the form Ax + By + Cz = d with A, B, C, d ∈ R represents the equation of a plane with normal vector n = (A, B, C). The number d encodes information about where in space the plane is located.
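The passage from a point and a normal vector to the linear equation is mechanical, and a short Python sketch makes that concrete (dot and plane_linear_form are our own helper names):

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def plane_linear_form(n, p):
    # returns (A, B, C) and d for the equation A x + B y + C z = d,
    # where d = n . p (Definition 5.11)
    return n, dot(n, p)

n, d = plane_linear_form((1, 2, -1), (3, 0, 1))

# the point p itself must satisfy the linear equation
p_ok = dot(n, (3, 0, 1)) == d
```

Here n = (1, 2, −1) and p = (3, 0, 1) give the plane x + 2y − z = 2.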
Since we know that two intersecting lines determine a plane, it's not a far stretch to suspect that two vectors also determine a plane. We explore this presently.
5.12 Proposition (Plane: matrix form). Let v = (v_1, v_2, v_3) and w = (w_1, w_2, w_3) be two non-parallel vectors in R^3 and p = (p_1, p_2, p_3) ∈ R^3 a point. If (x, y, z) is an arbitrary point in R^3, then the solution set to the equation
det
| x − p_1  y − p_2  z − p_3 |
| v_1      v_2      v_3     |
| w_1      w_2      w_3     |
= 0 (5.13)
forms a plane. In other words, {(x, y, z) ∈ R^3 : (x, y, z) satisfies Equation (5.13)} is a plane. We call Equation (5.13) the matrix equation of the plane.
Proof. Recall that by Corollary 4.15,
det
| x − p_1  y − p_2  z − p_3 |
| v_1      v_2      v_3     |
| w_1      w_2      w_3     |
= (x − p_1, y − p_2, z − p_3) · (v × w) = (v × w) · (x − p_1, y − p_2, z − p_3).
Thus the determinant is 0 if and only if (v × w) · (x − p_1, y − p_2, z − p_3) = 0. By Proposition 5.9, the solution set to the equation on the right is a plane with normal vector v × w. This shows that the solution set to the equation on the left is also a plane with normal vector v × w.
Let's see Proposition 5.12 in action.
5.14 Example. Suppose we want to determine the matrix equation of a plane which contains the lines r(t) = (t, t, t) and q(t) = (t + 1, 2t + 1, t + 1). The first natural question to ask is whether these two lines are contained in a plane. It suffices to check that r(t) and q(t) intersect. Note that for t_0 = 0 and t_1 = 1, r(t_1) = (1, 1, 1) = (0 + 1, 0 + 1, 0 + 1) = q(t_0).
In order to invoke Proposition 5.12 we need two vectors. Note that r(t) = t(1, 1, 1) and q(t) = (1, 1, 1) + t(1, 2, 1). Thus r(t) is parallel to the vector v := (1, 1, 1) and q(t) is parallel to w := (1, 2, 1). In other words, v and w are vectors in the plane P. We have already determined that the plane containing r(t) and q(t) also contains the point p = (1, 1, 1). Thus the matrix equation of the plane is given by
det
| x − 1  y − 1  z − 1 |
| 1      1      1     |
| 1      2      1     |
= 0.
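We can verify Example 5.14 numerically by checking that every sampled point of both lines satisfies the matrix equation. A minimal Python sketch (det3 and in_plane are our own helper names):

```python
def det3(rows):
    # cofactor expansion along the first row (Definition 4.12)
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def in_plane(x, y, z):
    # the matrix equation of the plane from Example 5.14
    return det3([(x - 1, y - 1, z - 1), (1, 1, 1), (1, 2, 1)]) == 0

# sample points of r(t) = (t, t, t) and q(t) = (t+1, 2t+1, t+1)
r_ok = all(in_plane(t, t, t) for t in range(-3, 4))
q_ok = all(in_plane(t + 1, 2 * t + 1, t + 1) for t in range(-3, 4))
```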
5.3 Angles between planes
5.15 Definition (Angle between planes). Let P and Q be planes with normal vectors n and m respectively. If n and m are parallel we say that P and Q are parallel. If n and m are not parallel, then the angle between P and Q is defined to be the angle between n and m.
5.16 Remark. Let P and Q be two planes in Euclidean 3-space which are not parallel (and hence not the same). It should be geometrically obvious that P ∩ Q is a line. (Compare this with non-parallel lines in a plane.) Let n and m be the normal vectors of P and Q respectively. Then the line P ∩ Q is parallel to the vector n × m. This should make sense when we consider that P ∩ Q is a line in both P and Q, and thus any vector parallel to P ∩ Q is orthogonal to both n and m. Any 3-vector orthogonal to n and m is necessarily parallel to n × m for obvious geometrical reasons. (This last statement is not true in general. In higher dimensions there are many non-parallel vectors which are orthogonal to a given pair. Indeed, let e_1 = (1, 0, 0, 0) and e_2 = (0, 1, 0, 0) be given. Then e_3 = (0, 0, 1, 0) and e_4 = (0, 0, 0, 1) satisfy e_3 ⊥ e_1, e_3 ⊥ e_2 and e_4 ⊥ e_1, e_4 ⊥ e_2, but e_3 is not parallel to e_4.)
We have shown the following.
5.17 Proposition. Let P and Q be non-parallel planes with normal vectors n and m respectively. Then P ∩ Q is a line parallel to n × m.
5.4 Vector projections
We now discuss a construction that will eventually allow us to determine the shortest distance from a point to a plane. It's called vector projection.
5.18 Definition (Vector projection). Let v and w be two vectors in R^n with angle θ between them. The vector projection of v along w, denoted proj_w v, is the vector defined by
proj_w v := (|v| cos θ) (w / |w|).
In particular note that proj_w v is parallel to w and has length ||v| cos θ|. The quantity |v| cos θ (the signed length of proj_w v) is called the scalar projection of v onto w and is denoted by scal_w v. Observe that |scal_w v| = |proj_w v|.
We can also define proj_w v to be the vector parallel to w such that v = proj_w v + v′ where v′ ⊥ w. See Figure 12.
Figure 12: The vector projection of v onto w.
5.19 Proposition (Vector projection as a dot product). Let v, w be two vectors and θ the angle between them. Then
(a) scal_w v = (v · w) / |w|;
(b) proj_w v = ((v · w) / |w|) (w / |w|) = ((v · w) / (w · w)) w.
Proof. For (a) recall that scal_w v = |v| cos θ. Then notice
(v · w) / |w| = (|v||w| cos θ) / |w| = |v| cos θ = scal_w v.
Part (b) follows immediately from (a) and the definition of proj_w v.
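The formula in Proposition 5.19(b) is easy to implement and to check against the decomposition v = proj_w v + v′ with v′ ⊥ w. A minimal Python sketch (dot and proj are our own helper names):

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def proj(w, v):
    # proj_w v = (v . w / w . w) w  (Proposition 5.19(b))
    c = dot(v, w) / dot(w, w)
    return tuple(c * wi for wi in w)

v, w = (3.0, 4.0), (1.0, 0.0)
p = proj(w, v)
residual = tuple(a - b for a, b in zip(v, p))
# v - proj_w v is orthogonal to w
orth = dot(residual, w) == 0
```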
A natural question arises as to why we might be interested in vector projections. Let v, w be vectors. Suppose that v represents a force such as wind, and w is a direction vector. Then proj_w v represents the force of wind in the direction of w. Another example involves gravity.
5.20 Example. Suppose that a ball moves down a frictionless ramp which has angle θ as measured from the vertical. Let g be the gravity vector, and w a unit vector which points down the ramp. Then proj_w g represents the force which accelerates the ball.
Here is a particularly nice application of vector projections.
5.21 Proposition (Distance from a plane to a point). Let P be a plane which contains the point p = (p_1, p_2, p_3) and has normal vector n = (n_1, n_2, n_3). Let a = (x, y, z) be a point in space. Define v := a − p = (x − p_1, y − p_2, z − p_3), i.e., v is the vector connecting the point p to a. The shortest distance from P to a, denoted d(P, a), is given by
d(P, a) = |(a − p) · n| / |n| = |v · n| / |n| = |proj_n v|.
Proof. Geometrically it should be clear that the shortest distance from P to a is along a line which is parallel to the normal vector n. But proj_n v is a vector which has its tail at some point on P and head at a, and is parallel to the normal vector by construction. Thus d(P, a) = |proj_n v|. By Proposition 5.19, |proj_n v| = |v · n| / |n|.
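The distance formula of Proposition 5.21 translates into a few lines. Here is a minimal Python sketch (dot, norm, and distance are our own helper names):

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

def distance(p, n, a):
    # d(P, a) = |(a - p) . n| / |n|   (Proposition 5.21)
    v = tuple(ai - pi for ai, pi in zip(a, p))
    return abs(dot(v, n)) / norm(n)

# the plane z = 0 through the origin with normal (0, 0, 1); point (1, 2, 5)
d = distance((0, 0, 0), (0, 0, 1), (1, 2, 5))
```

For the xy-plane and the point (1, 2, 5) the distance is of course 5.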
5.5 Exercises
5.22 Exercise. Rogawski 12.5: 1, 5, 11, 13, 21, 23, 25, 29, 31, 35, 37, 43, 53, 59, 63, 65
6 Vector functions
In the previous section we encountered a way to realize a line as a vector-valued function. Here
we generalize this notion. As a point of caution, we will be very liberal about tacitly invoking
the duality of vectors and points.
6.1 Definitions
6.1 Definition (Vector function). A vector-valued function from R to R^n is a function
r : X ⊆ R → R^n
specified by r(t) := (x_1(t), x_2(t), . . . , x_n(t)). The functions x_i(t) for i = 1, . . . , n are called the component functions of r. We say that r(t) is continuous if each component function x_i(t) is continuous as a function of a real variable.
By the domain of r(t) we mean the intersection of the domains of each of the component functions.
6.2 Remark. Most of the vector equations that we will deal with are continuous. So when we speak of a vector equation we mean a continuous vector equation unless otherwise stated.
While the outputs of a vector function are vectors, we often identify them with points for practical purposes. To this end, when we speak about the image of a continuous vector function we understand this to be a curve in space. When we say that a vector function r(t) traces a curve C we mean that for each t in the domain of r, r(t) ∈ C. In words this means that for each t in the domain of r, the vector r(t) points to a point on C.
As the reader can already see, the symbol r(t) has two uses. Sometimes r(t) simply denotes the function r when we wish to emphasize that r is a function of t. If we talk about an arbitrary t in the domain of r, then r(t) means the vector output of the function r at the point t. Usually it's clear from context whether r(t) means the function r or an actual vector.
Some students have trouble internalizing the idea of component functions. For a simple case, say r(t) = (x(t), y(t)). Then x(t) is simply some function which depends on t that indicates the x-coordinate of the point (vector) r(t). Likewise for the component function y(t).
6.3 Example (Domain). The domain of r(t) := (√t, ln(t − 1)) is {t ∈ R : t > 1}. To see this, set x(t) := √t and y(t) := ln(t − 1). The domain of x(t) is all non-negative real numbers, while the domain of y(t) is all real numbers strictly greater than 1. Indeed, the domain of y(t) is a subset of the domain of x(t), so the intersection of the two domains is just the domain of y(t).
6.4 Example (Circle and helix).
(a) Let r(t) := (cos t, sin t). Then r(t) traces the unit circle centered at the origin. To see this, let t ∈ R. Then |r(t)| = √(cos²(t) + sin²(t)) = 1, so indeed r(t) is on the unit circle. Moreover, the angle that r(t) makes with the x-axis is given by θ = t. This shows that for every point p on the unit circle there is some t ∈ R such that r(t) = p.
(b) Let r(t) := (cos t, sin t, t). Then r(t) traces a helix. (See Figure 13.)
Sometimes we are given a curve C and we want to find a vector equation r(t) which traces C.
6.5 Definition (Parameterization). Suppose that a curve C ⊆ R^n is traced by the vector function r : [a, b] → R^n. Then we say that r(t) := (x_1(t), . . . , x_n(t)) parameterizes C, and that r(t) is a parameterization of C. The component functions x_i(t) for i = 1, . . . , n are the parametric equations of the curve C.
Figure 13: A helix.
The variable t is called the parameter. It's helpful to imagine that t stands for time, although by no means is this always the case. In particular, t need not be positive.
6.2 Basic calculus of vector functions
The calculus of vector functions essentially boils down to the calculus of component functions.
6.6 Definition (Limit of a vector function). Say that r(t) = (x_1(t), . . . , x_n(t)). Define
lim_{t→a} r(t) := (lim_{t→a} x_1(t), . . . , lim_{t→a} x_n(t)).
6.7 Definition (Derivative of a vector function). Say that r(t) = (x_1(t), . . . , x_n(t)) is a vector function and x_i(t) is differentiable for i = 1, . . . , n. Define
dr/dt := lim_{h→0} (r(t + h) − r(t)) / h.
We often denote dr/dt by r′(t).
By the definition of limit and derivative of a vector function,
dr/dt = lim_{h→0} (r(t + h) − r(t)) / h
= (lim_{h→0} (x_1(t + h) − x_1(t)) / h, . . . , lim_{h→0} (x_n(t + h) − x_n(t)) / h)
= (dx_1/dt (t), . . . , dx_n/dt (t)).
For t = a, r′(a) is called the tangent vector to r at r(a). In particular, if r(t) traces a curve C then r′(a) is the vector which points toward the direction C is headed at the point r(a). In other words, r′(a) is parallel to the tangent line of C at the point r(a). See Figure 14.
Figure 14: The tangent vector r′(t). Here r(t) denotes an actual vector rather than a function. In other words, we have fixed some t in the domain of r and we evaluate r at t to get the vector r(t).
6.8 Example. Let r(t) := (t, t²). Then r′(t) = (d/dt(t), d/dt(t²)) = (1, 2t).
6.9 Example. If r(t) represents the position of a particle in 3-space at time t, then r′(a) is the velocity vector of the particle at t = a. That is, r′(a) indicates in what direction the particle is moving and how fast. The speed, a scalar value, is given by |r′(a)|. Similarly, r″(a) is the acceleration vector of the particle at time t = a.
6.10 Proposition (Rules for differentiating vector equations). Say that r(t) and q(t) are vector equations, λ ∈ R is a scalar and f(t) is a real-valued function. Then
1. d/dt [r(t) + q(t)] = r′(t) + q′(t),
2. d/dt [λr(t)] = λr′(t),
3. d/dt [f(t)r(t)] = f′(t)r(t) + f(t)r′(t),
4. d/dt [r(t) · q(t)] = r′(t) · q(t) + r(t) · q′(t),
5. d/dt [r(t) × q(t)] = r′(t) × q(t) + r(t) × q′(t),
6. d/dt [r(f(t))] = f′(t) r′(f(t)).
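Rule 4 (the product rule for the dot product) can be checked numerically against a central difference quotient. A minimal Python sketch, using concrete component functions of our own choosing:

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

# two smooth vector functions and their exact derivatives
def r(t):  return (math.cos(t), math.sin(t))
def q(t):  return (t, t * t)
def rp(t): return (-math.sin(t), math.cos(t))
def qp(t): return (1.0, 2 * t)

t, h = 0.7, 1e-6
# numerical derivative of the scalar function t -> r(t) . q(t)
numeric = (dot(r(t + h), q(t + h)) - dot(r(t - h), q(t - h))) / (2 * h)
# product rule: r'(t) . q(t) + r(t) . q'(t)
exact = dot(rp(t), q(t)) + dot(r(t), qp(t))
close = math.isclose(numeric, exact, abs_tol=1e-6)
```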
6.11 Definition (Integrals of vector functions). Let P be an interval partition of [a, b] such that |P| = m. In words, this means that P is a partition of [a, b] which consists of m subintervals. Index P by {1, . . . , m}. Let c_i be an element of the ith subinterval in P and set Δx_i to be the length of the ith subinterval.
Set r(t) := (x_1(t), . . . , x_n(t)). We define
∫_a^b r(t) dt := lim_{m→∞} Σ_{i=1}^m r(c_i) Δx_i.
In particular, it's easy to see that
∫_a^b r(t) dt = (lim_{m→∞} Σ_{i=1}^m x_1(c_i) Δx_i, . . . , lim_{m→∞} Σ_{i=1}^m x_n(c_i) Δx_i)
= (∫_a^b x_1(t) dt, . . . , ∫_a^b x_n(t) dt).
The same is true for indefinite integrals. Define
∫ r(t) dt := (∫ x_1(t) dt, . . . , ∫ x_n(t) dt).
6.3 Exercises
6.12 Exercise. Let a = (a_1, . . . , a_n) and b = (b_1, . . . , b_n) be two points in R^n. Let t ∈ [0, 1]. Try to write down a vector equation r(t) such that r(t) is a parameterization of the line segment connecting a to b. (Hint: We require r(0) = a and r(1) = b. Also, your equation should involve a (1 − t) term.)
6.13 Exercise. Let C_r be the circle in R^2 of radius r centered at the origin. Let n ∈ N. Write down a vector equation r(t) which traverses C_r n times in a counterclockwise direction when t ranges between 0 and 2π. Also write down a vector equation q(t) which does the same but in a clockwise direction.
6.14 Exercise. Find a parameterization for the ellipse centered at the origin with a 5 unit long
major axis parallel to the y-axis and a 3 unit long minor axis which is parallel to the x-axis.
6.15 Exercise. Let f : X ⊆ R → R. Consider the graph G of the function f as a curve in the plane. Find a parameterization for G. (See Definition 6.5 and don't think too hard.)
6.16 Exercise. Let r : X ⊆ R → R^n be a vector equation. Suppose that for all t ∈ X, r(t) ≠ 0. Show that
d/dt |r(t)| = (r(t) · r′(t)) / |r(t)|.
(Hint: Use the fact that |r(t)|² = r(t) · r(t) and Proposition 6.10 part 4.)
6.17 Exercise. Let r : X ⊆ R → R^n be a vector equation. Suppose that for all t ∈ X, r(t) ≠ 0. Fix some λ ∈ R. Prove that |r(t)| = λ (for all t ∈ X) ⟺ r′(t) ⊥ r(t) (for all t ∈ X). In other words, the vector outputs of r all have the same length if and only if r(t) is orthogonal to its tangent vector r′(t). Yet another way to state this problem is that a vector equation traces a curve on a sphere centered at the origin if and only if r(t) is always orthogonal to its tangent vector r′(t). (Hint: Use Exercise 6.16.) This is a good question that incorporates a few different ideas. You should remember how to do it.
6.18 Exercise. Suppose that a particle moves with a constant angular speed ω around a circle with center at the origin and radius R. The particle is said to be in uniform circular motion. Assume that the motion is counterclockwise and that the particle is at the point (R, 0) when t = 0. That is, the position vector for the particle is given by r(t) = (R cos ωt, R sin ωt) for t ≥ 0. See Figure 15.
(a) Find the velocity vector function v(t) and show that v(t) · r(t) = 0.
(b) Show that the speed |v(t)| of the particle is the constant ωR. Define the period T of the particle as the amount of time required for one complete revolution. Conclude that

T = 2πR / |v(t)| = 2π/ω.
Figure 15: Centripetal acceleration.
(c) Find the acceleration vector function a(t). Show that a(t) is proportional to r(t) and that it points toward the origin. An acceleration with this property is called centripetal acceleration. Show that |a(t)| = Rω².
(d) Suppose that the particle has mass m. Show that the magnitude of the force F that is required to produce this motion, called a centripetal force, is given by

|F| = m|v|² / R.

Recall that F = m a.
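The claims of Exercise 6.18 can be sanity-checked numerically. The sketch below is my own illustration (the values R = 2 and ω = 3 are arbitrary sample choices): it evaluates the exact derivatives of r(t) = (R cos ωt, R sin ωt) and checks the stated identities at a sample time.

```python
import math

R, w = 2.0, 3.0  # sample radius and angular speed, chosen arbitrarily

def r(t):  # position
    return (R * math.cos(w * t), R * math.sin(w * t))

def v(t):  # velocity r'(t)
    return (-R * w * math.sin(w * t), R * w * math.cos(w * t))

def a(t):  # acceleration r''(t) = -w^2 * r(t): centripetal
    return (-R * w * w * math.cos(w * t), -R * w * w * math.sin(w * t))

t = 0.7  # any sample time
dot_vr = v(t)[0] * r(t)[0] + v(t)[1] * r(t)[1]  # should be 0
speed = math.hypot(*v(t))                       # should be w * R
accel_mag = math.hypot(*a(t))                   # should be R * w^2
period = 2 * math.pi * R / speed                # should be 2*pi / w
```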
6.19 Exercise. From Rogawski 13.1: 5, 9, 11, 15, 19, 28, 29
13.2: 1, 3, 9, 15, 17, 19, 23, 29, 35, 36, 37, 43, 57, 63
13.5: 3, 5, 19
7 Arc Length and Curvature
Here we will apply the calculus of vector equations to describe the curvature of a curve in space.
First we discuss the length of curves in space.
7.1 Arc length
Suppose that a curve C in R^n is parameterized by r(t) = (x_1(t), . . . , x_n(t)), where each x_i′(t) (i = 1, . . . , n) is continuous, and t ∈ [a, b]. How would we approach the problem of finding the length of the curve C?
Assume for the sake of simplicity that r(t) traces C exactly once as t ranges between a and b. Then let P be an interval partition of [a, b] such that |P| = m. Set c_i to be an element of the ith subinterval of the partition. In order to approximate the arc length of C we need only look at the sum of the lengths of the line segments which join r(c_i) to r(c_{i+1}). (See Figure 16.) That is,

Σ_{i=1}^m (length of line segment joining r(c_i) to r(c_{i+1})) ≈ arc length of C.
The length of the line segment joining r(c_i) to r(c_{i+1}) is given by

√( [x_1(c_{i+1}) − x_1(c_i)]² + ⋯ + [x_n(c_{i+1}) − x_n(c_i)]² ).

We can then take the limit as m → ∞ of such approximations to find the actual length of C. In taking the limit we convert the sum into an integral. Moreover, the expression x_j(c_{i+1}) − x_j(c_i) (for j = 1, . . . , n) becomes dx_j/dt.
Figure 16: Approximating the length of a curve in R³ with line segments.
7.1 Definition (Arc length of a vector function). Let r : X ⊆ R → R^n be a vector equation. Say r(t) = (x_1(t), . . . , x_n(t)) such that x_i′ is continuous for each i = 1, . . . , n. Suppose [a, b] ⊆ X. The arc length of r(t) from a to b, denoted L(a, b), is defined by

L(a, b) := ∫_a^b √( (dx_1/dt)² + ⋯ + (dx_n/dt)² ) dt.

We can write this more concisely as

L(a, b) = ∫_a^b |r′(t)| dt.
If r : [a, b] → R^n then L := L(a, b) is called simply the arc length of r(t).
7.2 Definition (Arc length function). Let r : [a, b] → R^n be a vector equation. Say t ∈ [a, b]. The function

s(t) := ∫_a^t |r′(u)| du

is called the arc length function. By the fundamental theorem of calculus we can readily see that

d/dt (s(t)) = |r′(t)|.
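As a quick numeric sanity check of the arc length formula (my own sketch, not from the text): the circle r(t) = (cos t, sin t), t ∈ [0, 2π], has |r′(t)| = 1, so its arc length should come out to 2π.

```python
import math

def arc_length(r_prime, a, b, m=10_000):
    """Approximate L(a, b) = integral of |r'(t)| dt with a midpoint Riemann sum."""
    dt = (b - a) / m
    total = 0.0
    for i in range(m):
        c = a + (i + 0.5) * dt
        total += math.hypot(*r_prime(c)) * dt
    return total

# r(t) = (cos t, sin t)  =>  r'(t) = (-sin t, cos t), so |r'(t)| = 1.
circle_length = arc_length(lambda t: (-math.sin(t), math.cos(t)), 0.0, 2 * math.pi)
```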
7.3 Remark. Suppose that r(t) : [0, b] → R^n is the position function for a particle. Then s(t) represents the total distance traveled by the particle after t seconds.
Figure 17: The arc length function s(t) in action.
7.2 Parameterization with respect to arc length
Let r : [a, b] → R^n be a parameterization of a curve C. In particular, it's helpful to imagine that C is the path traced by a particle which has position function r(t). For t ∈ [a, b], r(t) represents the position of the particle after t seconds. Suppose we are more interested in the particle's position after it travels a specified distance rather than for a specified amount of time.
Mathematically this means we want to use the arc length s as a parameter. Letting s be a parameter has the advantage that r(t(s)) is the particle's position after it travels s units of distance along C from the starting point r(a). To use s as a parameter we must realize t as a function of s, which we write t(s). In order to find t(s) we recall that s(t) = ∫_a^t |r′(u)| du. If we evaluate the integral on the right we have a function of t. Next we can solve for t to realize t as a function of s.
By finding r(t(s)) we say that we have reparameterized the curve C with respect to arc length and that r(t(s)) is a parameterization of C with respect to arc length.
7.4 Example. (a) Say r(t) := (cos 2t, sin 2t) where t ∈ [0, π]. Then

s(t) = ∫_0^t |r′(u)| du = ∫_0^t √( 2² cos² 2u + 2² sin² 2u ) du = ∫_0^t 2 du = 2t.

Solving for t yields t = s/2, i.e., t(s) = s/2. Say s = π. Then r(t(π)) = r(π/2) = (cos π, sin π) = (−1, 0) represents a point which is π units along the circle traced by r(t). This matches our intuition, since we know that an arc on the unit circle which has length π also subtends an angle of π. The point on the unit circle (centered at the origin) which makes an angle of π with the x-axis is indeed the point (−1, 0).
(b) Now say q(t) := (cos 4t, sin 4t) for t ∈ [0, π/2]. The difference between r(t) and q(t) is that q(t) traces the unit circle twice as fast as r(t). What happens if we reparameterize with respect to arc length? Again

s(t) = ∫_0^t |q′(u)| du = ∫_0^t √( 4² cos² 4u + 4² sin² 4u ) du = ∫_0^t 4 du = 4t.

Solving for t yields t = s/4, i.e., t(s) = s/4. Say s = π. Then q(t(π)) = q(π/4) = (cos π, sin π) = (−1, 0) represents a point which is π units along the circle traced by q(t).
Now let's define t_r(s) := s/2 and t_q(s) := s/4. Note the interesting fact that q(t_q(π)) = r(t_r(π)). A moment's thought reveals that indeed q(t_q(s)) = r(t_r(s)) for every s. What this shows is that parameterizing a curve with respect to arc length is independent of the initial parameterization with respect to t. In other words, given any two parameterizations of a curve C with respect to t, they both induce the same reparameterization with respect to arc length.
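The equality q(t_q(s)) = r(t_r(s)) is easy to check numerically; the following sketch (my own, not from the text) samples many values of s and measures the largest gap between the two reparameterized curves.

```python
import math

def r(t):
    return (math.cos(2 * t), math.sin(2 * t))

def q(t):
    return (math.cos(4 * t), math.sin(4 * t))

t_r = lambda s: s / 2  # inverse of s(t) = 2t
t_q = lambda s: s / 4  # inverse of s(t) = 4t

# Both reparameterizations should give the same point s units along the circle.
max_gap = max(
    math.dist(r(t_r(s)), q(t_q(s)))
    for s in [0.1 * k for k in range(63)]  # s from 0 up to roughly 2*pi
)
```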
7.3 Curvature
Curvature measures how fast a curve is turning. More mathematically, curvature is the change in direction divided by the change in position. First let's clarify what we mean by that. The object which quantifies the change in direction is called the unit tangent vector.
7.5 Definition (Unit tangent vector). Let r(t) be a vector function in R^n. Set T(t) := r′(t)/|r′(t)|. We call T(t) the unit tangent vector function. In particular, fix a point a ∈ R which is in the domain of r. Then T(a) is the unit tangent vector to r at the point r(a).
Given a curve C there are typically many different parameterizations. We would like curvature to be independent of any choice of parameterization involving t. So by change in position we mean the quantity ds rather than just dr.
7.6 Definition (Curvature). Let C be a curve parameterized by r(t). Define

κ := |dT/ds|

to be the curvature of C.
By definition it should be clear that curvature implicitly depends on t. We make this clear now.
7.7 Proposition (Curvature as a function of t). Let C be a curve parameterized by r(t). Then

κ(t) = |T′(t)| / |r′(t)|.
Proof. Observe that by the chain rule

dT/dt = (dT/ds)(ds/dt) ⟹ κ = |dT/ds| = |dT/dt| / |ds/dt| = |T′(t)| / |r′(t)|.
Here is a particularly useful way to compute curvature.
7.8 Proposition. Let C be a curve parameterized by r(t). Then

κ(t) = |r′(t) × r″(t)| / |r′(t)|³.
Proof. To ease notation we will omit the variables associated to functions, e.g., by T we mean T(t). By Proposition 7.7, it suffices to show that

|T′| = |r′ × r″| / |r′|².

To this end, recall that T = r′/|r′| and that |r′| = ds/dt. Thus

r′ = |r′| T = (ds/dt) T ⟹ r″ = (d²s/dt²) T + (ds/dt) T′.

Thus

r′ × r″ = ((ds/dt) T) × ((d²s/dt²) T + (ds/dt) T′)
= (ds/dt)(d²s/dt²)(T × T) + (ds/dt)(ds/dt)(T × T′)
= 0 + (ds/dt)² (T × T′)   (since T × T = 0)
= |r′|² (T × T′).

Now |T(t)| = 1 for all t ⟺ T ⊥ T′ (by Exercise 6.17). In particular the angle between T and T′ is π/2. Then |T × T′| = |T||T′| sin(π/2) = 1 · |T′| · 1 = |T′|. Thus

|r′ × r″| = |r′|² |T′| ⟹ |T′| = |r′ × r″| / |r′|².
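Proposition 7.8 is convenient because r′ and r″ are usually easy to write down. As an illustration of my own (not from the text): the helix r(t) = (cos t, sin t, t) has r′(t) = (−sin t, cos t, 1) and r″(t) = (−cos t, −sin t, 0), and the formula gives the constant curvature 1/2.

```python
import math

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def curvature(rp, rpp):
    """kappa = |r' x r''| / |r'|^3 at one parameter value (Proposition 7.8)."""
    return math.hypot(*cross(rp, rpp)) / math.hypot(*rp) ** 3

t = 1.3  # any sample parameter value
rp = (-math.sin(t), math.cos(t), 1.0)    # r'(t) for the helix
rpp = (-math.cos(t), -math.sin(t), 0.0)  # r''(t)
kappa = curvature(rp, rpp)               # should be 1/2, independent of t
```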
Let f(x) be a function of a real variable. Suppose we wish to calculate the curvature of its graph.
7.9 Corollary. Let f(x) be a twice-differentiable function. Then the curvature of its graph (thought of as embedded in the x, y-plane of R³) at the point (x, f(x), 0) is given by

κ(x) = |f″(x)| / (1 + (f′(x))²)^{3/2}.
Proof. If we embed the graph of f in the x, y-plane of R³ then the vector function r(x) = (x, f(x), 0) parameterizes the graph of f. Then

r′(x) = (1, f′(x), 0) and r″(x) = (0, f″(x), 0).

This in turn implies

r′(x) × r″(x) = (0, 0, f″(x)) ⟹ |r′(x) × r″(x)| = |f″(x)|.

Moreover, |r′(x)| = √(1 + (f′(x))²). By Proposition 7.8,

κ(x) = |r′(x) × r″(x)| / |r′(x)|³ = |f″(x)| / (1 + (f′(x))²)^{3/2}.
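For a concrete check of the corollary (my own example, not from the text): for f(x) = x² the formula gives κ(x) = 2/(1 + 4x²)^{3/2}, so κ(0) = 2 and the curvature decays as we move away from the vertex.

```python
def graph_curvature(fp, fpp):
    """kappa = |f''| / (1 + f'^2)^(3/2), the formula of Corollary 7.9."""
    return abs(fpp) / (1.0 + fp * fp) ** 1.5

# f(x) = x^2 has f'(x) = 2x and f''(x) = 2.
kappa_at_0 = graph_curvature(fp=0.0, fpp=2.0)  # at x = 0: expect 2
kappa_at_1 = graph_curvature(fp=2.0, fpp=2.0)  # at x = 1: expect 2 / 5^(3/2)
```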
7.4 Exercises
7.10 Exercise. Prove that the curvature of a circle with radius a is the constant 1/a. That is, show that if r(t) is a parameterization for the circle of radius a centered at the origin then κ(t) = 1/a for all t. (That the curvature of a circle is constant should be geometrically clear.)
7.11 Exercise. From Rogawski 13.3: 1, 3, 7, 9, 13, 14, 15, 21
13.4: 1, 3, 7, 9, 13, 22, 24, 33, 37, 59, 60
8 Multivariable functions
The reader is already intimately familiar with functions of a single variable. Single variable functions are insufficient for many phenomena we would like to study. For instance, it might seem hopelessly complicated to model the height of a mountain by a single variable function. If we have two variables to work with (latitude and longitude), the situation is not so dire.
In this section we explore the basics of multivariable functions.
8.1 Definitions
8.1 Definition (Multivariable function). A (real valued) multivariable function is simply a function f : X ⊆ R^n → R. You may think of f as a rule which assigns a number to each n-tuple (x_1, . . . , x_n) ∈ X ⊆ R^n.
We write f(x_1, . . . , x_n) to indicate that f depends on the n variables x_1, . . . , x_n. When n = 2 we typically write f(x, y). When n = 3 we typically write f(x, y, z).
By the domain of f(x_1, . . . , x_n), denoted D_f, we mean the set where f is defined. In particular, if f : X ⊆ R^n → R then D_f = X. The range of f is defined as the set {f(x_1, . . . , x_n) : (x_1, . . . , x_n) ∈ D_f}.
We define the graph of f(x_1, . . . , x_n) to be the set {(x_1, . . . , x_n, f(x_1, . . . , x_n)) ∈ R^{n+1} : (x_1, . . . , x_n) ∈ D_f}.
8.2 Example.
(a) The function f(x, y) := x²y + √y has {(x, y) ∈ R² : y ≥ 0} as domain. On this domain both x²y and √y are nonnegative, so the range of f(x, y) is [0, ∞). To see that every value in [0, ∞) occurs, fix x = 0 and let y range through [0, ∞).
(b) Let f(x, y, z) := 3. Indeed, f need not depend upon all the variables. Then D_f = R³ and the range of f is {3}.
(c) Let f(x, y) := x² + y². The domain of f is R².
Figure 18: One way to visualize a function of two variables.
8.3 Remark. Just like a function of a single variable is often visualized as a curve in the plane, we can visualize the graph of f(x_1, . . . , x_n) as an n-dimensional object in an (n+1)-dimensional space. In particular, the graph of f(x, y) can be realized as a surface which lies above the domain D_f ⊆ R² of f. We often write z = f(x, y) to refer to the graph of f(x, y). This is identical to the notation y = f(x) for a function of a single variable.
Likewise, if some surface in R³ happens to be the graph of a function f(x, y) we will typically refer to said surface as z = f(x, y).
Figure 19: The graph of a function of two variables.
8.4 Definition (Level sets and pre-images). Let f : R^n → R. Let k ∈ R be a fixed constant. The level set of f at k (denoted f(x_1, . . . , x_n) = k) is the set defined by

[f(x_1, . . . , x_n) = k] := {(x_1, . . . , x_n) ∈ R^n : f(x_1, . . . , x_n) = k}.

We sometimes denote f(x_1, . . . , x_n) = k by f^{−1}(k) and call this set the pre-image of {k}.
In words, the level set f(x_1, . . . , x_n) = k is the set of all n-tuples (x_1, . . . , x_n) in the domain of f such that f(x_1, . . . , x_n) is equal to some fixed constant k ∈ R.
Sometimes each level set f(x_1, . . . , x_n) = k for k ∈ R has a similar description. When we can easily categorize each level set we often speak about the level sets of f. We mean by this the collection {f(x_1, . . . , x_n) = k : k ∈ R}, i.e., the collection of all level sets. See the examples.
Figure 20: The level sets of f(x, y).
Figure 21: The level sets of f(x, y, z) = x² + y² + z².
8.5 Example. Let f(x, y) := √(x² + y²). Let k ≥ 0. Then k = f(x, y) ⟺ k² = x² + y². We recognize this as a circle of radius k centered at the origin. This shows that the level sets of f(x, y) are circles.
8.6 Example. The contours on a topo-map are examples of level sets. Each contour on a
topo-map designates a certain elevation, and so the contours represent level sets of some height
function.
8.7 Example. Let f(x, y, z) = x² + y² + z² and k ≥ 0. Then k = f(x, y, z) ⟺ (√k)² = x² + y² + z². We recognize this as a sphere of radius √k centered at the origin. Thus the level sets of f(x, y, z) are spheres centered at the origin. See Figure 21.
8.2 Continuity
Recall that in a single dimension, we say a function f(x) is continuous at a ∈ R if lim_{x→a⁺} f(x) = lim_{x→a⁻} f(x) = f(a). In two dimensions there are many possible paths by which we can approach a point, and so we need a more precise notion of continuity.
8.8 Definition (Convergence). Let {a_n} be a sequence in R^n and say a ∈ R^n. Then we say that a_n approaches a or a_n converges to a if for every number ε > 0 there is some N ∈ N such that n ≥ N ⟹ d(a_n, a) < ε. Intuitively, this just means that the points a_n are getting closer to a as n → ∞. In symbols we can write this a few ways. Both

a_n → a, and lim_{n→∞} a_n = a

mean that a_n converges to a.
8.9 Definition (Limit). Let a ∈ R^n. We say that the limit of f(x_1, . . . , x_n) approaches L as (x_1, . . . , x_n) approaches a, or the limit of f as (x_1, . . . , x_n) approaches a is L, written

lim_{(x_1,...,x_n)→a} f(x_1, . . . , x_n) = L,

if for every number ε > 0 there exists some number δ > 0 such that d((x_1, . . . , x_n), a) < δ ⟹ d(f(x_1, . . . , x_n), L) < ε.
Figure 22: The limit L of f(x, y) as (x, y) → (a, b). The picture describing what it means for a function f(x, y) to be continuous at (a, b) is essentially the same, with L replaced by f(a, b). This captures the fact that a function need not be defined at a point (a, b) in order to have a limit there, but a function needs to be defined at (a, b) in order to be continuous at (a, b).
8.10 Definition (Continuity). Let a ∈ R^n be a point. We say that f(x_1, . . . , x_n) is continuous at a if given ε > 0, there exists a δ > 0 such that d(x, a) < δ ⟹ d(f(x), f(a)) < ε.
A function f(x_1, . . . , x_n) is continuous on X if for all a ∈ X ⊆ R^n, f is continuous at a. If f is continuous at all a ∈ R^n we say simply that f is continuous.
8.11 Remark. This definition of continuity captures an important fact, namely that in a relatively small region around a ∈ R^n, a continuous function f attains values which are close to f(a).
Here is an alternative definition of continuity.
8.12 Definition (Continuity: limit point description). Let a ∈ R^n be a point. Then f(x_1, . . . , x_n) is continuous at a if (x_1, . . . , x_n) → a ⟹ f(x_1, . . . , x_n) → f(a). In other words, if

lim_{(x_1,...,x_n)→a} f(x_1, . . . , x_n) = f(a).

8.13 Remark. The limit point description of continuity captures the fact that a function is continuous if and only if f(x_1, . . . , x_n) → f(a) no matter how (x_1, . . . , x_n) approaches a.
8.14 Example. Let f(x, y) = x²y/(x⁴ + y²). We claim that f does not have a limit at (0, 0).
First, consider (x, y) → (0, 0) along the line y = mx. Then

f(x, y) = f(x, mx) = mx³/(x⁴ + m²x²).

Canceling a factor of x² (or using L'Hôpital's rule) we see that

lim_{x→0} mx³/(x⁴ + m²x²) = lim_{x→0} mx/(x² + m²) = 0

and so f(x, mx) → 0 as (x, mx) → (0, 0). This shows in fact that f(x, y) → 0 if we approach along any of the lines y = mx. However, what if we approach along the parabola y = x²? Then

f(x, y) = f(x, x²) = x⁴/(2x⁴)

and so as (x, y) → (0, 0) along (x, x²) we see that f(x, y) → 1/2 since

lim_{x→0} x⁴/(2x⁴) = 1/2.

Since f(x, y) does not approach 0 along every possible path to (0, 0), f does not have a limit at (0, 0).
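Sampling f along the two families of paths makes the failure visible numerically. This sketch is my own illustration (the slope m = 3 and sample points are arbitrary choices).

```python
def f(x, y):
    """The function of Example 8.14."""
    return x * x * y / (x ** 4 + y * y)

# Approach (0, 0) along the line y = 3x: the values shrink toward 0.
line_vals = [f(x, 3 * x) for x in (0.1, 0.01, 0.001)]

# Approach (0, 0) along the parabola y = x^2: the values stay at 1/2.
parab_vals = [f(x, x * x) for x in (0.1, 0.01, 0.001)]
```

Since the two paths produce different limiting values, no single limit at (0, 0) can exist.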
8.15 Remark. Most of the functions we deal with will be continuous, so this subsection is mostly of technical interest.
8.3 Partial Derivatives
The derivative of a function of a single variable is an important concept in calculus. How might we generalize the notion of a derivative to functions of multiple variables? Suppose that f(x_1, . . . , x_n) is a function of n variables. If we fix the variables x_2, . . . , x_n, say, then f depends only on one variable, namely x_1. In other words, f becomes a function of a single variable. We can now take its derivative!
8.16 Definition (Partial derivative at a point). Let f(x_1, . . . , x_n) be a function on R^n and (a_1, . . . , a_n) ∈ R^n. Then the partial derivative of f with respect to the variable x_i at (a_1, . . . , a_n) is defined to be

lim_{h→0} [f(a_1, . . . , a_i + h, a_{i+1}, . . . , a_n) − f(a_1, . . . , a_n)] / h

if this limit exists. We denote this quantity by f_{x_i}(a_1, . . . , a_n) or ∂f/∂x_i (a_1, . . . , a_n). Here the symbol ∂ is used instead of d to distinguish the fact that f depends on multiple variables.
8.17 Remark. Let's explore the case a bit more for a function of two variables. Say f(x, y) is a function on R². Let (a, b) ∈ R². Then define g(x) := f(x, b) and h(y) := f(a, y). It's not too difficult to see that f_x(a, b) = g′(a) and f_y(a, b) = h′(b).
This gives rise to a rather nice geometric description of the partial derivatives. The (continuous) functions g(x) and h(y) represent curves on the graph of f(x, y). Since f_x(a, b) and f_y(a, b) are the respective derivatives of g and h, we see that f_x(a, b) and f_y(a, b) represent the slopes of tangent lines to the curves g(x) and h(y) at the point (a, b, f(a, b)). See Figure 23.
8.18 Definition (Partial derivatives). Say that f(x_1, . . . , x_n) is a function on R^n. Then the partial derivatives of f are the functions on R^n defined by

∂f/∂x_1 (x_1, . . . , x_n) := lim_{h→0} [f(x_1 + h, . . . , x_n) − f(x_1, . . . , x_n)] / h
⋮
∂f/∂x_n (x_1, . . . , x_n) := lim_{h→0} [f(x_1, . . . , x_n + h) − f(x_1, . . . , x_n)] / h.

Since it's often clear that the partial derivatives are functions of the variables x_1, . . . , x_n we often simply denote the partial derivatives by

∂f/∂x_i or f_{x_i}, (i = 1, . . . , n).

We read ∂f/∂x_i as the partial of f with respect to x_i.
Figure 23: Geometric interpretation of the partial derivatives. Here C_1 is the curve g(x) and C_2 the curve h(y), where g and h are defined as in the remark after Definition 8.16. The slope of the line T_1 is just f_x(a, b) = g′(a) and the slope of T_2 is f_y(a, b) = h′(b).
When we want to emphasize that we are taking a derivative, we will write

(∂/∂x_i) f := ∂f/∂x_i.

8.19 Remark. This is really not that different from how we defined the derivative of a function of a single variable. First we defined the derivative of a function at a point, and then let that point vary in order to get the derivative function. That's all we have done here.
Be careful here to note that in the definition of the partial derivative f_{x_i} the variable x_i plays a dual role. On the one hand it specifies which variable we are letting range, or if you like, which variable we are taking the derivative with respect to. On the other hand it's the ith component of the element (x_1, . . . , x_n) ∈ R^n.
Let's consider the two variable case a bit more carefully. Say f(x, y) is a function of the variables x and y. If again we define g(x) := f(x, y) where y is fixed, then f_x(x, y) = g′(x). Likewise, define h(y) := f(x, y) where x is fixed. Then f_y(x, y) = h′(y).
8.20 Example. Unraveling the definitions reveals that the partial derivatives are actually very easy to compute. Given f(x_1, . . . , x_n), the function f_{x_i} is obtained by thinking of f as a function of the single variable x_i and treating all the other variables as fixed constants.
(a) Let f(x, y) := xy². Then f_x(x, y) = y² and f_y(x, y) = 2xy. In finding f_x we treated y² as some constant, so that f(x, y) was a linear equation in the variable x. Likewise, in finding f_y we treated x as a constant so that f(x, y) was a quadratic equation in the variable y.
(b) Let f(x, y) := sin(xy). Then f_x(x, y) = y cos(xy) and f_y(x, y) = x cos(xy).
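The limit definition can be approximated directly by difference quotients. The sketch below is my own check (the point (2, 3) and step size are arbitrary choices): it verifies part (a), where f_x = y² and f_y = 2xy.

```python
def partial(f, point, i, h=1e-6):
    """Central difference approximation of the ith partial derivative of f at point."""
    up = list(point); up[i] += h
    down = list(point); down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)

f = lambda x, y: x * y * y  # the function of Example 8.20(a)

fx = partial(f, (2.0, 3.0), 0)  # exact value: y^2 = 9
fy = partial(f, (2.0, 3.0), 1)  # exact value: 2xy = 12
```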
By definition the partial derivatives are functions themselves, so we can take their partial derivatives.
8.21 Definition (Second partials). Let f(x_1, . . . , x_n) be a function on R^n. Then the 2nd order partial derivatives of f (or mixed partial derivatives) are the functions

∂²f/∂x_i ∂x_j := (∂/∂x_j)(∂f/∂x_i),

i.e., the jth partial derivative of the ith partial derivative of f. We also frequently denote the second partial derivative as f_{x_i x_j} := (f_{x_i})_{x_j}.
We can iterate the process of taking 2nd order partial derivatives to take 3rd, 4th, etc. order partial derivatives. So for instance, we write

∂³f/∂x_i ∂x_j ∂x_k := (∂/∂x_k)(∂²f/∂x_i ∂x_j).
8.22 Example.
(a) Let f(x, y) = x²y². Then f_xx = 2y², f_xy = 4xy = f_yx, f_yy = 2x², f_xxx = 0, f_xxy = 4y.
(b) Let f(x, y, z) = xyz. Then f_xy = z, f_xz = y, f_yz = x. Also f_xx = f_yy = f_zz = 0.
8.23 Theorem (Young). Let f(x_1, . . . , x_n) be a function on R^n. If the 2nd order partial derivatives of f exist and are continuous, then

f_{x_i x_j} = f_{x_j x_i}.
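Young's theorem can be sanity-checked with nested difference quotients. Here is a small numeric sketch of mine (the function sin(xy), the point, and the step size are illustrative choices); its mixed partials are continuous, so the theorem applies.

```python
import math

h = 1e-4
f = lambda x, y: math.sin(x * y)

def fx(x, y):
    """Central-difference approximation of f_x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y):
    """Central-difference approximation of f_y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 0.7, 1.2  # arbitrary sample point

# Differentiate the approximate f_x in y, and the approximate f_y in x.
fxy = (fx(x0, y0 + h) - fx(x0, y0 - h)) / (2 * h)
fyx = (fy(x0 + h, y0) - fy(x0 - h, y0)) / (2 * h)

# Exact mixed partial of sin(xy): f_xy = cos(xy) - xy*sin(xy).
exact = math.cos(x0 * y0) - x0 * y0 * math.sin(x0 * y0)
```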
8.4 The chain rule
Suppose f(x_1, . . . , x_n) is a function of the variables x_1, . . . , x_n but each x_i (i = 1, . . . , n) is itself a function of the variables t_1, . . . , t_m. What if we are interested in ∂f/∂t_j (j ∈ {1, . . . , m})? The following proposition tells us what to do.
8.24 Proposition (Chain rule). Let f(x_1, . . . , x_n) be a function of the variables x_1, . . . , x_n and suppose for i = 1, . . . , n each x_i is itself a function of the variables t_1, . . . , t_m. Then for j = 1, . . . , m,

∂f/∂t_j = (∂f/∂x_1)(∂x_1/∂t_j) + ⋯ + (∂f/∂x_n)(∂x_n/∂t_j).
8.25 Example.
(a) Say f(x, y) = xy and x(t) = t², y(t) = t³. Then df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) = y · 2t + x · 3t². Here we use the notation df/dt because we can think of f as a function of the single variable t.
(b) Say f(x, y) = xy and x(s, t) = st, y(s, t) = st². Then ∂f/∂s = (∂f/∂x)(∂x/∂s) + (∂f/∂y)(∂y/∂s) = y · t + x · t².
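Part (a) can be verified numerically in one line each way; this is my own sketch. With f(x, y) = xy and x = t², y = t³ we have f(x(t), y(t)) = t⁵, so df/dt = 5t⁴, and the chain rule expression y · 2t + x · 3t² = 2t⁴ + 3t⁴ agrees.

```python
t = 1.7  # arbitrary sample value

x, y = t ** 2, t ** 3
dxdt, dydt = 2 * t, 3 * t ** 2

# Chain rule: df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) with f(x, y) = xy,
# so ∂f/∂x = y and ∂f/∂y = x.
chain_rule_value = y * dxdt + x * dydt

# Direct computation: f(x(t), y(t)) = t^5, so df/dt = 5 t^4.
direct_value = 5 * t ** 4
```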
8.5 Implicit differentiation
Let F(x, y) be some function of two variables. Say F(x, y) = 0 is an implicit description of y := f(x), i.e., F(x, f(x)) = 0. Then the variables x and y both depend on x. Indeed, x = x and y = f(x). We can use the chain rule to differentiate both sides with respect to x. Observe that

(∂F/∂x)(dx/dx) + (∂F/∂y)(dy/dx) = 0.

However, dx/dx = 1. Assuming that ∂F/∂y ≠ 0, we can solve for dy/dx to get

dy/dx = −(∂F/∂x)/(∂F/∂y) = −F_x/F_y.
This gives a good description of implicit differentiation in a single variable.
We would like to extend this idea to functions of more than one variable. To this end, suppose that f(x, y) is implicitly described by F(x, y, z) = 0, i.e., F(x, y, f(x, y)) = 0. Again we can think of x, y, z as depending on the variables x, y so that x = x, y = y and z = f(x, y). Applying the chain rule to both sides of F(x, y, f(x, y)) = 0 and thinking of F as a function of x, y and f yields

(∂F/∂x)(∂x/∂x) + (∂F/∂y)(∂y/∂x) + (∂F/∂f)(∂f/∂x) = 0.

Since ∂x/∂x = 1 and ∂y/∂x = 0, if we suppose that ∂F/∂z ≠ 0 we obtain

∂f/∂x = −(∂F/∂x)/(∂F/∂f) = −F_x/F_f.

Similarly,

∂f/∂y = −(∂F/∂y)/(∂F/∂f) = −F_y/F_f.
Note that throughout this discussion we have thought of F as a function of f, i.e., f is the z variable. Some authors replace f with z.
8.26 Example. Say that f(x, y) is a function of x and y and that 3xf = y². Then 3xf − y² = 0, so that F(x, y, z) := 3xz − y² satisfies F(x, y, f(x, y)) = 3xf − y² = 0. Differentiating with respect to x by the chain rule,

(∂F/∂x)(∂x/∂x) + (∂F/∂y)(∂y/∂x) + (∂F/∂f)(∂f/∂x) = 3f + 0 + 3x(∂f/∂x) = 0 ⟹ ∂f/∂x = −3f/3x = −f/x.
8.27 Remark. The key steps involved in taking the partial derivatives of implicit functions are to (1) realize the implicit description as a solution to some F(x, y, z) = 0, and (2) apply the chain rule to F(x, y, z) where z = f(x, y), x = x, and y = y depend on the variables x and y. These two steps are easy to remember, and thus there is not much need to memorize any formulae.
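In Example 8.26 we can also solve explicitly: 3xf = y² gives f(x, y) = y²/(3x) for x ≠ 0, and differentiating that directly gives ∂f/∂x = −y²/(3x²), which matches −f/x. The sketch below is my own confirmation at a sample point.

```python
x0, y0 = 2.0, 3.0  # sample point with x0 != 0

f = y0 ** 2 / (3 * x0)                   # explicit solution of 3 x f = y^2
fx_implicit = -f / x0                    # implicit answer: -F_x / F_f = -3f/3x
fx_explicit = -y0 ** 2 / (3 * x0 ** 2)   # differentiate y^2/(3x) directly
```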
8.6 Exercises
8.28 Exercise. Rogawski 14.1: 1, 3, 5, 9, 19, 27, 39, 47
14.2: 1, 5, 13, 16,
14.3: 9, 11, 13, 17, 19, 33, 47, 48, 49, 53, 55
14.6: 1, 3, 11, 17, 22, 25, 27
9 Tangent planes and the linear approximation
9.1 Tangent planes
Recall that the tangent line to the graph of f(x) is a good approximation to f(x) near the point of tangency. We can extend this idea to functions of two variables.
Throughout, let f(x, y) be a function on R² with continuous partial derivatives. Then we can visualize the graph of f(x, y) as a surface in 3-space which we denote by z = f(x, y). Given a point (a, b) in the domain of f, define g(x) := f(x, b) and h(y) := f(a, y). Then we can visualize the graphs of g(x) and h(y) as curves on the surface z = f(x, y). Call C_1 and C_2 the curves associated with g and h respectively. Call T_1 and T_2 the lines tangent to C_1 and C_2 respectively at (a, b, f(a, b)). In particular, T_1 has slope g′(a) = f_x(a, b) and T_2 has slope h′(b) = f_y(a, b).
9.1 Definition (Tangent plane). The tangent plane to the surface z = f(x, y) at the point (a, b, f(a, b)) is the plane formed by the lines T_1, T_2. Sometimes we do not distinguish between the function f and its graph z = f(x, y). In this case we will simply talk about the tangent plane to f.
Figure 24: The tangent plane to the surface z = f(x, y) at the point P = (a, b, f(a, b)).
9.2 Theorem (Equation for the tangent plane). Suppose that f(x, y) has continuous partial derivatives. Then the equation of the tangent plane to the surface z = f(x, y) at (a, b, f(a, b)) is given by

z − f(a, b) = f_x(a, b)(x − a) + f_y(a, b)(y − b).

Proof. Let T_1 and T_2 be the lines which determine the tangent plane to f(x, y) at the point (a, b). By definition T_1 and T_2 are parallel to the vectors v_1 := ⟨1, 0, f_x(a, b)⟩ and v_2 := ⟨0, 1, f_y(a, b)⟩ respectively. Then n := v_2 × v_1 = ⟨f_x(a, b), f_y(a, b), −1⟩ is normal to the tangent plane of z = f(x, y) at (a, b, f(a, b)). The tangent plane contains the point (a, b, f(a, b)) and so by Proposition 5.9 the equation of the tangent plane is given by

n · ⟨x − a, y − b, z − f(a, b)⟩ = 0 ⟺ ⟨f_x(a, b), f_y(a, b), −1⟩ · ⟨x − a, y − b, z − f(a, b)⟩ = 0
⟺ f_x(a, b)(x − a) + f_y(a, b)(y − b) − (z − f(a, b)) = 0
⟺ z − f(a, b) = f_x(a, b)(x − a) + f_y(a, b)(y − b),

as desired.
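To make the proof concrete, here is a sketch of my own (the function f(x, y) = x² + y² and the point (1, 2) are illustrative choices): the normal n = v₂ × v₁ comes out to ⟨f_x, f_y, −1⟩, and the tangent plane closely tracks the surface near the point of tangency.

```python
def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# f(x, y) = x^2 + y^2 at (a, b) = (1, 2): f_x = 2a = 2, f_y = 2b = 4.
a, b = 1.0, 2.0
fx, fy = 2 * a, 2 * b

v1 = (1.0, 0.0, fx)  # direction of T_1
v2 = (0.0, 1.0, fy)  # direction of T_2
n = cross(v2, v1)    # normal to the tangent plane: expect (fx, fy, -1)

def tangent_plane(x, y):
    """z on the tangent plane: f(a, b) + fx*(x - a) + fy*(y - b)."""
    return (a * a + b * b) + fx * (x - a) + fy * (y - b)
```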
9.2 Linearizations
Suppose we wish to approximate the function f(x, y) near the point (a, b). Then it is geometrically obvious that points on the tangent plane to z = f(x, y) at (a, b, f(a, b)) are close to points on the surface z = f(x, y) near (a, b, f(a, b)). (See Figure 24.) In the present subsection we will make this geometric intuition algebraic.
9.3 Definition (Linearization). Let f(x, y) be a function and (a, b) a point in the domain D_f of f. The function

L(x, y) := f(a, b) + f_x(a, b)(x − a) + f_y(a, b)(y − b)

is called the linearization of f(x, y) at (a, b). In particular, L(x, y) ≈ f(x, y) when (x, y) is near (a, b). This approximation is called the linear approximation of f(x, y) at (a, b).
Note that to write down L(x, y) all we did was solve the tangent plane equation z − f(a, b) = f_x(a, b)(x − a) + f_y(a, b)(y − b) for z.
9.4 Remark. The equation of the tangent plane to z = f(x, y) at (a, b, f(a, b)) and the linearization are the same equation, but they emphasize different things. The equation of the tangent plane is an algebraic description of a geometric object which locally resembles the graph of f(x, y). The linearization is a function which locally is similar to f(x, y). This duality is exactly the same as the duality between the equation of the tangent line and the linearization of a function of a single variable.
We can linearize functions of arbitrarily many variables, but it becomes quite hard to draw pictures. Nonetheless we have the following definition.
9.5 Definition (Linearization of f(x_1, . . . , x_n)). Let f(x_1, . . . , x_n) be a function and (a_1, . . . , a_n) be a point in D_f. Then the linearization L(x_1, . . . , x_n) at the point (a_1, . . . , a_n) is defined as

L(x_1, . . . , x_n) := f(a_1, . . . , a_n) + f_{x_1}(a_1, . . . , a_n)(x_1 − a_1) + ⋯ + f_{x_n}(a_1, . . . , a_n)(x_n − a_n).
9.3 The differential
The linearization of a function f(x, y) at a point (a, b) ∈ D_f approximates f(x, y) when d((x, y), (a, b)) is small. Suppose we were more interested in Δf := f(a + Δx, b + Δy) − f(a, b), where the quantities Δx and Δy are simply small numbers. (The letter Δ is the Greek capital letter delta, and Δf reads "delta f".) Geometrically, Δf represents the change in height of the graph of f(x, y) when we compare the two heights f(a + Δx, b + Δy) and f(a, b). The differential df is a function which approximates Δf.
First recall that in the single variable case the differential df is the function depending on the variable dx defined by

df := f′(x) dx.
Figure 25: The differential df of a single variable function f(x).
In this context, dx can be any real number. Suppose that f has a continuous derivative f′ defined for all a ∈ D_f. Say a ∈ D_f. Then df = f′(a) dx is simply the change in height of the tangent line to f at (a, f(a)) (with equation f(a) + f′(a)(x − a)) when a changes by some small amount. Here dx represents this small amount, and so we see how df is indeed a function of the variable dx. Take a few moments to study Figure 25. Your intuition for the single variable case will help guide you in the two variable case.
Now we define the differential for functions of two variables.
9.6 Definition (Differential). Let f(x, y) be a function. Then

df := f_x(x, y) dx + f_y(x, y) dy = (∂f/∂x) dx + (∂f/∂y) dy,

where dx and dy are real variables. That is, df is a function of the variables dx and dy.
In particular, if dx = Δx = x − a and dy = Δy = y − b then

df = f_x(a, b)(x − a) + f_y(a, b)(y − b),

so f(x, y) ≈ f(a, b) + df. In other words, L(x, y) = f(a, b) + df where L(x, y) is the linearization of f at the point (a, b). This relates the concepts of differential and linearization.
9.7 Remark. Note that while df is a function of dx, dy, the functions f_x(x, y) and f_y(x, y) appear in the definition. In this sense df depends also on x and y. In practice the variables x and y play the role that the point (a, b) does in the linear approximation L(x, y) at the point (a, b). Typically some point (a, b) is specified, and dx, dy represent numbers which are small deviations from a and b respectively. The example and exercises to follow will make this clear. Also key to remember is that df ≈ Δf = Δz. Take a few moments to study Figure 26.
We can generalize the differential to functions of arbitrarily many variables.
9.8 Definition (Differential of f(x_1, . . . , x_n)). Let f(x_1, . . . , x_n) be a function. Define the increment Δf := f(x_1 + Δx_1, . . . , x_n + Δx_n) − f(x_1, . . . , x_n), where each Δx_i (i = 1, . . . , n) is simply a small number. The differential df is defined as

df := (∂f/∂x_1) dx_1 + ⋯ + (∂f/∂x_n) dx_n,

where dx_i is a real variable for i = 1, . . . , n. As before, df ≈ Δf.

Figure 26: The differential df of f(x, y).
9.9 Example. Let's say we are interested in the area of a rectangular window pane. Suppose
we measure the length of the window to be 10 cm and the height to be 50 cm. If we know that our
measurements are accurate to within 1 cm, by approximately how much will our measurement
of the area differ from the actual area?

9.10 Solution. We are trying to approximate the change in the area of a rectangle. Recall that
the area of a rectangle with sides x and y is given by f(x, y) = xy. Then the problem is simply
to approximate f(10 + Δx, 50 + Δy) where |Δx| ≤ 1 and |Δy| ≤ 1.
In this example we could easily find exactly how far our estimate of the area differed from
the true area. Indeed, by assumption the maximum that the area could be is 51 · 11 square cm
and the minimum is 49 · 9. So we can already see through simple logic that our measurement
differs by at most 61 square cm.
Let's try differentials. The statement that our measurements are accurate to 1 cm translates
to dx = dy = 1. That we measured the window initially to have sides of length 10 and 50 means
that (x, y) = (10, 50). Thus

df = f_x(10, 50) dx + f_y(10, 50) dy = 50 · 1 + 10 · 1 = 60.

This shows that the maximum difference between the actual area and our measurement is
approximately 60 square cm. Not too shabby.
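The arithmetic above is easy to check numerically. Below is a short Python sketch (an illustration, not part of the original notes) comparing the differential df with the exact worst-case increment for the window-pane measurements.

```python
def f(x, y):
    # Area of a rectangle with sides x and y.
    return x * y

def df(x, y, dx, dy):
    # Differential of f(x, y) = xy: f_x = y and f_y = x.
    return y * dx + x * dy

# Measured sides 10 cm and 50 cm, each off by at most 1 cm.
approx = df(10, 50, 1, 1)          # 50*1 + 10*1 = 60
exact_max = f(11, 51) - f(10, 50)  # worst-case error: 561 - 500 = 61
print(approx, exact_max)
```

The differential's estimate of 60 sits just below the true worst case of 61, as the solution observes.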
9.4 Exercises
9.11 Exercise. From Rogawski 14.4: 1, 3, 5, 7, 9, 15, 17, 21, 23, 25, 29, 31, 35, 37
10 The directional derivative and gradient
Suppose f(x, y) is a function of two variables. Recall that we visualize f_x(a, b) as the slope of a
tangent line to a curve on the surface z = f(x, y). Namely, given (a, b) we defined g(x) := f(x, b).
If you remember, g(x) is a curve on the surface z = f(x, y) that passes through (a, b, f(a, b))
and g′(a) = f_x(a, b).
Now if we project the curve g(x) down onto the x, y-plane we obtain a line which is parallel
to the x-axis. A perfectly natural question to ask is what happens if we want to consider the
slopes of curves on the surface z = f(x, y) with projections that are not parallel to either the
x or y axes. In other words, what's the rate of change of f in the direction of some arbitrary
vector?
As an applied motivation, suppose that f(x, y) gives the temperature on the earth at the
point with coordinates (x, y) (i.e., x is longitude and y is latitude). Then f_x tells us how the
temperature is changing if we move due east. Likewise f_y represents how the temperature is
changing if we move due north. What if we want to know how the temperature is changing as
we move north-east?
To explore this question we need a tool called the directional derivative.
10.1 Directional derivatives
10.1 Definition (Directional derivative at a point). Suppose f(x_1, …, x_n) is a function and
(a_1, …, a_n) ∈ D_f. Let u = (u_1, …, u_n) be a unit vector. Then the directional derivative of
f at the point (a_1, …, a_n) in the direction u, denoted D_u f(a_1, …, a_n), is defined by

D_u f(a_1, …, a_n) := lim_{h→0} [f(a_1 + h·u_1, …, a_n + h·u_n) − f(a_1, …, a_n)] / h.

Since (a_1, …, a_n) was arbitrary, we can replace it with (x_1, …, x_n) to realize the directional
derivative as a function of the variables x_1, …, x_n.
There is a nice way to compute D_u f. First we need yet another new object, called the
gradient.
10.2 Definition (Gradient). Let f(x_1, …, x_n) be a function of the variables x_1, …, x_n. The
gradient of f is the vector function, denoted by ∇f, defined by

∇f(x_1, …, x_n) := ⟨f_{x_1}(x_1, …, x_n), …, f_{x_n}(x_1, …, x_n)⟩.

10.3 Example. Let f(x, y) = xy. Then ∇f(x, y) = ⟨y, x⟩.
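As a quick sanity check of Example 10.3, the sketch below (not from the original notes) approximates the gradient of f(x, y) = xy by central finite differences and compares it with ⟨y, x⟩.

```python
def grad_fd(f, x, y, h=1e-6):
    # Central-difference approximation of the gradient <f_x, f_y>.
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (fx, fy)

f = lambda x, y: x * y
gx, gy = grad_fd(f, 3.0, 2.0)
print(gx, gy)  # close to (2, 3), i.e. <y, x> evaluated at (3, 2)
```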
10.4 Proposition (Directional derivative as a dot product). Let f be a (differentiableᵃ)
function of the variables x_1, …, x_n. If u is a unit vector in ℝⁿ, then

D_u f(x_1, …, x_n) = ∇f(x_1, …, x_n) · u. (10.5)

ᵃ All functions we deal with will be differentiable, so this concept is of technical interest only.
Proof. We will prove the claim for the case when f(x, y) is a function of the two variables x and y.
Generalizing this proof is an easy matter. Let u = (u_1, u_2) be a unit vector. To begin, let's look
Figure 27: D_u f(x_0, y_0) when f(x, y) is a function of two variables. Here u = (a, b). The curve C
is the intersection of the plane {(ha, hb, z) ∈ ℝ³} with the surface z = f(x, y). P(x_0, y_0, z_0) :=
(x_0, y_0, f(x_0, y_0)) is a point on the curve C. T is the line tangent to C at the point P(x_0, y_0, z_0).
Observe that T has slope D_u f(x_0, y_0). To see this notice that a point Q(x, y, z) on the curve
C which is close to the point P(x_0, y_0, z_0) has coordinates (ha, hb, f(ha, hb)). Thus the slope
of the secant line through P(x_0, y_0, z_0) and Q(x, y, z) is given by Δz/h = [z − z_0]/h = [f(x_0 +
ha, y_0 + hb) − f(x_0, y_0)]/h. Taking the limit as h → 0 gives the slope of the tangent line T at
the point P(x_0, y_0, z_0). However, the limit of this expression is also D_u f(x_0, y_0). The notation
used in the Figure is Stewart's.
at the right hand side of Equation (10.5). First note that ∇f(x, y) · u = f_x(x, y)u_1 + f_y(x, y)u_2.
Let (a, b) ∈ ℝ² be a point and define g(h) := f(a + hu_1, b + hu_2). Then

g′(0) = lim_{h→0} [g(0 + h) − g(0)] / h = lim_{h→0} [f(a + hu_1, b + hu_2) − f(a, b)] / h = D_u f(a, b).

Next we want to realize x and y as functions of the variable h. To this end, set x = a + hu_1 and
y = b + hu_2. Then g(h) = f(x, y). By the chain rule we have

g′(h) = df/dh = (∂f/∂x)(dx/dh) + (∂f/∂y)(dy/dh) = f_x(x, y)u_1 + f_y(x, y)u_2.

If we take h = 0 then x = a + 0 · u_1 = a and y = b + 0 · u_2 = b, and so

g′(0) = f_x(a, b)u_1 + f_y(a, b)u_2.

But remember that g′(0) = D_u f(a, b). Thus we have shown that

D_u f(a, b) = g′(0) = f_x(a, b)u_1 + f_y(a, b)u_2 = ∇f(a, b) · u.

Since (a, b) ∈ ℝ² was arbitrary, we've in fact shown

D_u f(x, y) = f_x(x, y)u_1 + f_y(x, y)u_2 = ∇f(x, y) · u,

which is precisely the claim.
10.6 Example. Consider f(x, y) = 300 + 50(1 − x²) + 50(1 − y²) where (x, y) ∈ B(0, 1), the
open unit ball centered at the origin. This function might model the temperature in degrees
centigrade of sea water surrounding a hydrothermal vent located at the origin. (Note that as
(x, y) moves away from the origin, the quantities (1 − x²) and (1 − y²) become smaller.) Suppose
we want to know the rate of change of the temperature with respect to distance at the point
(.5, .5) in the direction u = (1/√5)(1, 2). In other words, we want to compute D_u f(.5, .5).
We can of course use Proposition 10.4. Observe that

∇f(x, y) = ⟨−100x, −100y⟩ ⟹ ∇f(.5, .5) = ⟨−50, −50⟩.

Thus

D_u f(.5, .5) = ∇f(.5, .5) · u = (1/√5)(−50 − 100) = −30√5.
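The dot-product formula can be checked against the limit definition directly. The following Python sketch (an illustration, not from the notes) approximates D_u f(.5, .5) with a small step h and compares it with ∇f · u = −30√5 ≈ −67.08.

```python
import math

def f(x, y):
    return 300 + 50 * (1 - x**2) + 50 * (1 - y**2)

# Unit vector u = (1, 2)/sqrt(5).
u1, u2 = 1 / math.sqrt(5), 2 / math.sqrt(5)
a, b = 0.5, 0.5

# Limit definition with a small step h.
h = 1e-6
limit_approx = (f(a + h * u1, b + h * u2) - f(a, b)) / h

# Gradient formula: grad f = <-100x, -100y>, so grad f(.5,.5) . u = -30*sqrt(5).
dot_formula = -50 * u1 + -50 * u2
print(limit_approx, dot_formula)  # both near -67.082
```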
That we can realize the directional derivative as a dot product of a unit vector with the
gradient makes the next corollary an obvious thing to write down. In the corollary we fix a
point in ℝⁿ and consider D_u f as a function of the vector u.
10.7 Corollary (Maximizing D_u f). Suppose that f is a (differentiable) function of the variables
x_1, …, x_n. Let a ∈ D_f. Consider D_u f(a) as a function of the unit vector u. Then the
maximum value of D_u f(a) occurs when u = λ∇f(a) where 0 < λ ∈ ℝ (i.e., u and ∇f(a) have the
same direction). In other words, the gradient ∇f is the direction in which f has the greatest
rate of change.

Proof. Let θ be the angle between u and ∇f(a). By Proposition 10.4, D_u f(a) = ∇f(a) · u =
|∇f(a)||u| cos θ = |∇f(a)| cos θ, which is at its maximum when cos θ = 1 ⟺ θ = 0, since
θ ∈ [0, π]. However when θ = 0 the vectors u and ∇f(a) are parallel and moreover point in the
same direction. In other words, u = λ∇f(a) where 0 < λ ∈ ℝ.
10.8 Remark. A moment's thought reveals that D_u f is minimized when u = −λ∇f for
0 < λ ∈ ℝ.
10.9 Example. Suppose that for x, y ≥ 0, f(x, y) = x²y³ models the density of poisonous
insects on the floor. Say you stand at the point (5, 4) and your front door is at the origin. In
what direction should you initially walk if you wish to encounter the fewest insects?

10.10 Solution. By Corollary 10.7 and the remark that follows we know that the desired
direction is −∇f(5, 4) = −(f_x(5, 4), f_y(5, 4)) = −(2 · 5 · 4³, 3 · 5² · 4²) = −(640, 1200). Typically
when we use a vector to indicate a direction we want a unit vector. Thus the desired direction
is given by

−∇f(5, 4)/|∇f(5, 4)| = −(1/√(640² + 1200²))(640, 1200).
10.11 Remark. We can actually do much better than merely determining which direction to
initially walk. The notion of a gradient field, studied later, will give us an entire path we could
follow so that we encounter the fewest insects.
10.2 Geometry of the gradient
In this subsection we will give a nice geometric description of the gradient vector. To begin
with, suppose that f(x, y) is a function of the variables x and y. Let k ∈ ℝ. Assume that the
level set f(x, y) = k is a curve in the plane ℝ² which passes through the point (a, b). Our present
goal is to show that the gradient vector ∇f(a, b) is perpendicular to f(x, y) = k.
To see why this might be, note that we can parameterize the curve f(x, y) = k by r(t) =
(x(t), y(t)) for t ∈ [c, d]. In particular say that (a, b) = (x(t_0), y(t_0)) for some t_0 ∈ [c, d]. Then
f(x(t), y(t)) = k.
Next we can use the chain rule to take the derivative of both sides with respect to t,
which yields

0 = df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) = ⟨f_x, f_y⟩ · ⟨x′(t), y′(t)⟩ = ∇f · r′(t).

So in particular ∇f(a, b) ⊥ r′(t_0), which just means that ∇f(a, b) is perpendicular to f(x, y) = k
at the point (a, b). Since (a, b) was an arbitrary point on f(x, y) = k, we have shown that
whenever (x, y) ∈ ℝ² lies on the curve f(x, y) = k, ∇f(x, y) is perpendicular to f(x, y) = k at
the point (x, y).
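To illustrate, the sketch below (not part of the notes) parameterizes the level curve x² + y² = 1 of f(x, y) = x² + y² by r(t) = (cos t, sin t) and checks that ∇f(r(t)) · r′(t) = 0 at several points.

```python
import math

def grad_f(x, y):
    # Gradient of f(x, y) = x**2 + y**2.
    return (2 * x, 2 * y)

def r(t):
    # Parameterization of the level curve f(x, y) = 1 (the unit circle).
    return (math.cos(t), math.sin(t))

def r_prime(t):
    return (-math.sin(t), math.cos(t))

for t in [0.0, 0.7, 1.9, 3.1]:
    gx, gy = grad_f(*r(t))
    vx, vy = r_prime(t)
    print(t, gx * vx + gy * vy)  # each dot product is 0 (up to rounding)
```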
10.12 Example. Suppose that f(x, y) represents the height of a mountain at the coordinate
(x, y). Then a path which follows gradient vectors will be the path of steepest ascent. A path
which follows the negative gradient vectors is the path of steepest descent.
Moreover, in this context it becomes intuitively clear why the gradient should be perpendicular
to level curves. Imagine that you were walking along a level curve f(x, y) = k. Physically this
means you are following a path which has constant elevation. Suppose u points in the direction of
your walk. A gain of 0 in elevation means that D_u f = 0 for your entire walk. However, since you
are walking along the level curve f(x, y) = k, u is just the unit tangent vector T(t) of the vector
equation r(t) which parameterizes f(x, y) = k. Then D_u f = 0 ⟺ ∇f · T = 0 ⟺ ∇f · r′ = 0.
Figure 28: The gradient vector evaluated at a point on a level curve is perpendicular to that
level curve.

Yet another intuitive argument for why ∇f is perpendicular to the direction of your walk is that
if it were not, then D_u f ≠ 0. That D_u f ≠ 0 means your walk involved some change in elevation
at some point, which contradicts our assumption that you were walking along a level curve.

Figure 29: A curve of steepest ascent follows the direction of the gradient vectors.
10.3 Exercises
10.13 Exercise. From Rogawski 14.5: 1, 3, 5, 9, 13, 17, 21, 25, 27, 31, 33, 35, 48, 51, 68
11 Maxima, minima, and saddle points
One of the key uses of the derivative is that it easily locates local maxima and minima. When
df/dx = 0 we are most likely locally at the top or bottom of the graph of f. If we are interested
at all in optimizing a process we need to find maxima and minima. As we will see shortly, the
graph of f(x, y) is level when f_x = 0 = f_y. This leads us to suspect that maxima and
minima occur when f_x = 0 = f_y. Indeed, this is the case for most maximum and minimum
points.
We are primarily interested in functions of two variables here. However, the definitions and
results extend rather easily to functions of more variables.
11.1 Maxima and minima
11.1 Definition (Local maxima and minima). Suppose that f : X ⊆ ℝ² → ℝ and (a, b) ∈ X.
The function f(x, y) has a local maximum at the point (a, b) ∈ ℝ² if f(x, y) ≤ f(a, b) whenever
d((x, y), (a, b)) is sufficiently small, i.e., when (x, y) is close to (a, b). The value f(a, b) is called a
local maximum value. If f(x, y) ≥ f(a, b) whenever d((x, y), (a, b)) is sufficiently small, then
(a, b) is said to be a local minimum. The number f(a, b) is then called a local minimum
value.
If f : X ⊆ ℝ² → ℝ and f(x, y) ≤ f(a, b) for all (x, y) ∈ X then (a, b) is simply called a
maximum, or sometimes an absolute maximum. Likewise if f(a, b) ≤ f(x, y) for all (x, y) ∈ X
then (a, b) is called a minimum or an absolute minimum.
When we don't care whether a point (a, b) is a maximum or a minimum we call that point
an extremum and call f(a, b) an extreme value. The collection of extreme points is called
the extrema. We call the collection of maximum points the maxima and the collection of
minimum points the minima.
11.2 Remark. You might recall that if f : [a, b] → ℝ then there is a certain subtlety involved
in finding maxima or minima. It is not sufficient to simply find all x ∈ [a, b] such that f′(x) = 0.
We need to consider all critical points. In particular, a and b are critical points because f′ is
not defined at a and b.
For example, consider the function f : [−1, 1] → ℝ defined by f(x) = x². Since f′(x) = 2x
we know that f′(x) = 0 ⟺ x = 0. Moreover it's not too difficult to see that 0 is a local
minimum. However, f(±1) = 1 are absolute maxima. It's easy to check that if −1 < x < 1 then
f(x) < f(±1) = 1. Note though that f′(±1) = ±2.
A similar phenomenon occurs with two variables. Suppose that f : [a, b] × [c, d] → ℝ. In this
case the domain of f is not an interval but a rectangle. The partials f_x and f_y are not defined
on the perimeter of [a, b] × [c, d], but f can have absolute maxima and minima there.
We want to develop a strategy for locating extrema. In order to do this we need some
definitions and a theorem. Most of the definitions that follow are to make a particular theorem
precise and can be skimmed on a first reading.

11.3 Definition (Boundary and interior). Let X ⊆ ℝ². We say that x ∈ ℝ² is a boundary
point of X if any ball centered at x contains points both in X and in the complement of X. In
other words, x ∈ ℝ² is a boundary point if for all ε > 0, B(x, ε) ∩ X ≠ ∅ and B(x, ε) ∩ (ℝ² − X) ≠ ∅.
The collection of all boundary points of X is called the boundary of X and is denoted by
∂X. Note that we do not require that x ∈ X in order for x to be a boundary point!
An interior point of X is a point x ∈ X such that x is not in the boundary of X. The
collection of all interior points of X is called the interior of X. In other words the interior of
X is the set {x ∈ X : x ∉ ∂X}. Unravelling the definitions reveals that x is an interior point of
X if and only if there exists some B(x, ε) such that B(x, ε) ⊆ X.
11.4 Definition (Closed and open sets). A set X ⊆ ℝ² is called closed if ∂X ⊆ X, i.e., if X
contains all its boundary points. A set X ⊆ ℝ² is called open if x ∈ X ⟺ x is in the interior
of X. Equivalently, X is open if Xᶜ is closed. Also equivalently, X is open if for all x ∈ X, there
is some ε_x ∈ ℝ such that B(x, ε_x) ⊆ X.
Note that if a set X is not closed this does not imply that X is open. See Figure 30 for
some examples of sets which are neither open nor closed. Conversely, a set can be both open
and closed. (The set ℝ² is both open and closed.)

11.5 Definition (Bounded set). Let X ⊆ ℝ². The set X is said to be bounded if X is
contained in some open ball. That is, X is bounded if there exists an a ∈ ℝ² and some r ∈ ℝ
such that X ⊆ B(a, r).
11.6 Example.
(a) Let X = ℝ². Then ∂X = ∅ since ℝ² − X = ∅. Also, X is open since every point is an
interior point. X is not bounded.
(b) Let a ∈ ℝ² and let X := B(a, r) = {x ∈ ℝ² : d(x, a) < r} (i.e., the ball centered at a
with radius r). Then ∂X = {x ∈ ℝ² : d(a, x) = r}, which is just a circle centered at a of radius
r. In particular, ∂X is the perimeter of X. By definition X is bounded, and one can check that
X is open.
(c) Let X = [a, b] × [c, d]. Again ∂X is simply the perimeter of X. To see that X is bounded
simply pick p to be the intersection of the two diagonals of X and set r = max{b − a, d − c}.
Then X ⊆ B(p, r) and so X is bounded. X is closed because by construction X contains its
perimeter, which we've already seen is its boundary.
Figure 30: Some closed sets and some which are not closed.
The previous definitions were for the most part to make the following theorem precise.

11.7 Theorem (Necessary conditions for local extrema which are interior points). Suppose
a function f(x, y) has a local extremum (a, b) and that (a, b) is an interior point of the domain
of f. If f_x and f_y exist at (a, b) then f_x(a, b) = 0 = f_y(a, b). In other words, ∇f(a, b) = 0⃗.
11.8 Theorem (Second derivative test). Suppose that the second partials of f are continuous
on a disk centered at (a, b). Additionally suppose that f_x(a, b) = 0 = f_y(a, b). Define

D := D(a, b) := det ( f_xx(a, b)  f_xy(a, b)
                      f_yx(a, b)  f_yy(a, b) ) = f_xx(a, b) f_yy(a, b) − (f_xy(a, b))².

Then
1. D > 0 and f_xx(a, b) > 0 ⟹ (a, b) is a local minimum.
2. D > 0 and f_xx(a, b) < 0 ⟹ (a, b) is a local maximum.
3. D < 0 ⟹ (a, b) is not a local extremum. In this case (a, b) is said to be a saddle
point.
11.9 Definition (Critical points). If f_x(a, b) = 0 = f_y(a, b), or if one of the partials does not
exist at (a, b), we say that (a, b) is a critical point.
Figure 31: The function f(x, y) = y² − x². Note that f_xx = −2, f_yy = 2 and f_xy(0, 0) = 0.
Thus D(0, 0) = −4. This shows that f has a saddle point at the origin.
11.10 Example. Let f(x, y) = xy. Find and classify all critical points.

11.11 Solution. It should be obvious that f_x and f_y exist for all (a, b). Thus to find the
critical points it suffices to find where f_x = 0 = f_y. Since f_x = y and f_y = x, we know that
f_x = 0 = f_y ⟺ x = 0 and y = 0. In other words, the only critical point is (0, 0).
To classify (0, 0) we need to look at D(0, 0). Note that f_xx = f_yy = 0 and f_xy = 1, so D = −1.
This shows that (0, 0) is a saddle point.
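The computation in the solution can be mirrored numerically. The Python sketch below (an illustration, not from the notes) estimates the second partials of f(x, y) = xy by finite differences and evaluates the discriminant D at the origin.

```python
def second_partials(f, x, y, h=1e-4):
    # Central finite-difference estimates of f_xx, f_yy, f_xy.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fyy, fxy

f = lambda x, y: x * y
fxx, fyy, fxy = second_partials(f, 0.0, 0.0)
D = fxx * fyy - fxy**2
print(fxx, fyy, fxy, D)  # approximately 0, 0, 1, -1: a saddle point
```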
11.12 Theorem (Extreme value theorem). Suppose that f : X ⊆ ℝ² → ℝ is a continuous
function on a closed, bounded set X. Then f attains its maximum and minimum values. In
other words, there exists a point (a, b) such that f(a, b) ≥ f(x, y) for all (x, y) ∈ X. Additionally,
there exists a point (c, d) such that f(c, d) ≤ f(x, y) for all (x, y) ∈ X.

With the extreme value theorem we have enough machinery to actually find the extrema of
a function.
11.2 Finding extrema
Suppose that f(x, y) has a domain X which is a closed, bounded set. Then we know from the
extreme value theorem that f(x, y) attains its maximum and minimum values. In particular we
can now talk about how we might find the extrema of f. We can break the search up into the
interior of X and the boundary of X. By Theorem 11.7 we know that any local extrema which
occur in the interior are necessarily critical points. Once we have found these critical points we
classify them according to the second derivative test if we can. If the second derivative test does
not apply because the second partials do not exist, we can simply evaluate f and make a note
of the value.
Second we need to look at the boundary of X. Since x ∈ ∂X implies that f_x(x) and f_y(x) are
not defined, x is a critical point. Thus searching for extrema on the boundary is a bit more of
an art. It usually reduces to adding a constraint to the function and then finding the extrema of
a single variable function. This will become clear in the examples and exercises. We summarize
this process in the following remark.

11.13 Remark. To find the extrema of a continuous function f on a closed, bounded set X
we must
1. Find all critical points in the interior of X and classify these critical points with the second
derivative test if it applies; else we simply evaluate f at the critical point.
2. Look at all points on ∂X and locate any extrema.
11.14 Example. Let X := [0, 1] × [0, 1] be the unit square. Suppose that f : X → ℝ is defined
by f(x, y) = x² − 3xy + y². Find and classify all extrema.

11.15 Solution. First we find the critical points of f in the interior of X. Since the second
partials of f exist and are continuous at all points in the interior, to find the critical points it
suffices to find when f_x = 0 = f_y. Note that f_x = 2x − 3y and f_y = 2y − 3x. Now

0 = f_x(x, y) = 2x − 3y ⟺ (2/3)x = y
0 = f_y(x, y) = 2y − 3x ⟺ (3/2)x = y

In order for both f_x and f_y to be 0 simultaneously we require that (2/3)x = (3/2)x. It's clear that
(2/3)x = (3/2)x ⟺ x = 0. This then implies that y = 0. However, (0, 0) is on the boundary of
X and so we have shown that there are no local extrema in the interior of X.
Since f is continuous on the closed, bounded set X, the extreme value theorem tells us that f
attains its extreme values. By our remarks above, the extrema must occur on the boundary of
X. We can break ∂X into four sides. Set L_1 = {(x, 0) : 0 ≤ x ≤ 1}, L_2 = {(1, y) : 0 ≤ y ≤ 1},
L_3 = {(x, 1) : 0 ≤ x ≤ 1} and L_4 = {(0, y) : 0 ≤ y ≤ 1}. Let's consider how f behaves when
restricted to each of these line segments.
On the segment L_1 we know that y = 0, and so f(x, y) restricted to L_1 is just the single
variable function g_1(x) := f(x, 0) = x². This function is strictly increasing and so clearly has a
minimum at x = 0 and maximum at x = 1. What this shows is that the points (0, 0) and (1, 0)
are possible extrema of f. In particular g_1(0) = f(0, 0) = 0 and g_1(1) = f(1, 0) = 1.
On the segment L_2 we know that x = 1, and so f(x, y) restricted to L_2 is just the single
variable function g_2(y) := f(1, y) = 1 − 3y + y². Then g_2′(y) = −3 + 2y, and so g_2 has a critical
point at y = 3/2 and g_2 is decreasing when y < 3/2. The point (1, 3/2) ∉ X, and so g_2(y)
achieves its extrema on the boundary of L_2. In particular, (1, 0) is the maximum of g_2 and (1, 1)
the minimum for g_2. Moreover, the points (1, 0) and (1, 1) are possible extrema for f.
On the segment L_3 we know that y = 1, and so f(x, y) restricted to L_3 is the single variable
function g_3(x) := f(x, 1) = 1 − 3x + x². This function is the same as g_2 except that we have
replaced y with x. Thus it should be clear that g_3 has a maximum at x = 0 and minimum
at x = 1. Again, this shows that (0, 1) and (1, 1) are possible extrema for f. In particular,
g_3(1) = f(1, 1) = −1 and g_3(0) = f(0, 1) = 1.
On the segment L_4 we know that x = 0, and so f(x, y) restricted to L_4 is the single variable
function g_4(y) := f(0, y) = y². Note that g_4 is simply g_1 but with x replaced by y. We
know g_4 is a strictly increasing function of y and so has a minimum at y = 0 and
maximum at y = 1. This shows that (0, 0) and (0, 1) are possible extrema of f. (We actually
already knew this about (0, 0).)
What we have shown is that the only possible extrema of f are the points {(0, 0), (1, 0), (1, 1), (0, 1)},
which happen to be the four corners of the domain. We also know that f(0, 0) = 0, f(1, 0) = 1,
f(1, 1) = −1 and f(0, 1) = 1. This shows that (1, 0) and (0, 1) are maxima and (1, 1) is a
minimum.
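A coarse grid search offers a quick numerical sanity check of this conclusion. The Python sketch below (not from the notes) evaluates f(x, y) = x² − 3xy + y² on a fine grid over the unit square and reports the extreme values found.

```python
def f(x, y):
    return x**2 - 3 * x * y + y**2

n = 200  # grid resolution
pts = [(i / n, j / n) for i in range(n + 1) for j in range(n + 1)]
vals = [f(x, y) for x, y in pts]
best_max = max(vals)
best_min = min(vals)
print(best_max, best_min)  # 1.0 (at (1,0) and (0,1)) and -1.0 (at (1,1))
```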
11.16 Example. Minimize f(x, y) = x² + y² + 2x + 4y when y ≥ 0.
11.17 Solution. Since y ≥ 0, the domain of f is the set X := {(x, y) : y ≥ 0}. The boundary
of X is the line {(x, 0) : x ∈ ℝ}. To minimize f(x, y) we first find the critical points of f in the
interior of X. Since the second partials of f exist and are continuous in the interior of X, to find
the critical points it suffices to find when f_x = 0 = f_y. Note that f_x = 2x + 2 and f_y = 2y + 4.
Now

0 = f_x = 2x + 2 ⟺ x = −1
0 = f_y = 2y + 4 ⟺ y = −2

and so f_x(x, y) = 0 = f_y(x, y) ⟺ (x, y) = (−1, −2). Since (−1, −2) ∉ X we know that the
only extrema of f occur on ∂X. In other words, the minimum of f is some point (x, 0) where
x ∈ ℝ.
We now need to check the behavior of f when it is restricted to the boundary ∂X. On ∂X,
y = 0 and so f(x, y) restricted to ∂X is the single variable function g(x) := f(x, 0) = x² + 2x.
Observe that since g(x) is an upward facing parabola it has a minimum at the vertex and no
maxima. To find the vertex we find when g′(x) = 0. Since g′(x) = 2x + 2, g′(x) = 0 ⟺ x =
−1. This shows that (−1, 0) is the minimum of f on the set X.
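As before, a quick numerical check (a Python sketch, not from the notes) compares f(−1, 0) against f on sampled points of the half-plane y ≥ 0.

```python
import itertools

def f(x, y):
    return x**2 + y**2 + 2 * x + 4 * y

candidate = f(-1, 0)  # claimed minimum value: -1
xs = [i / 10 for i in range(-50, 51)]
ys = [j / 10 for j in range(0, 51)]
others = min(f(x, y) for x, y in itertools.product(xs, ys))
print(candidate, others)  # f(-1, 0) = -1 is never beaten on the sample
```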
11.3 Exercises
11.18 Exercise. Minimize f(x, y) = x² + xy + y² − x − y restricted by (a) x ≤ 0, (b) y ≥ 1,
(c) x ≤ 0 and y ≥ 1.

11.19 Exercise. Find a c > 0 such that f(x, y) = x² + xy + cy² has a saddle point at (0, 0).
Note that f ≥ 0 on the lines x = 0, y = 0, y = x and y = −x. This shows why the second
derivative test is useful. In particular, you cannot say that a point is a minimum by just
observing the behavior of f along a certain number of lines or curves.

11.20 Exercise. From Rogawski 14.7: 1, 3, 5, 7, 13, 24, 28, 29, 33, 34, 35, 37, 45, 48, 50,
51
12 Lagrange multipliers
Suppose we wish to minimize or maximize a function f(x, y, z) which is subject to some constraint
g(x, y, z) = k. For example, say we wanted to minimize the sum of the squares of three
numbers which sum to 25. Then we would want to minimize f(x, y, z) = x² + y² + z² subject
to the constraint g(x, y, z) = x + y + z = 25.
In the previous section we would solve for z as a function of the variables x and y and
then try to maximize or minimize a function of two variables. This method can be clumsy at
times. In this section we present a method which will greatly facilitate maximizing or minimizing
functions. To find the extrema of a function is an endeavor called optimization. The method
of optimization we explore in this section is called the method of Lagrange multipliers.
12.1 The method
To say that we want to find the extrema of f(x, y, z) subject to the constraint g(x, y, z) = k is
to say that any extrema of f must lie on the level surface g(x, y, z) = k. Suppose that (a, b, c)
is an extremum of f(x, y, z) which lies on g(x, y, z) = k. Now let r(t) = (x(t), y(t), z(t)) be the
parameterization for some arbitrary curve on the surface g(x, y, z) = k which passes through
the point (a, b, c). In particular suppose that r(t_0) = (a, b, c). Consider the composite function
h(t) = f(r(t)) = f(x(t), y(t), z(t)). Since f has an extremum at (a, b, c) we know that h has an
extremum at t_0. If f is differentiable we can use the chain rule to get

0 = h′(t_0) = f_x(a, b, c)x′(t_0) + f_y(a, b, c)y′(t_0) + f_z(a, b, c)z′(t_0) = ∇f(a, b, c) · r′(t_0).

We conclude that ∇f(a, b, c) ⊥ r′(t_0). However, since (a, b, c) lies on g(x, y, z) = k we also know
that if ∇g(a, b, c) ≠ 0⃗ then ∇g(a, b, c) ⊥ r′(t_0). This last fact follows from our discussion in
Section 10.2.
Our initial assumption was that r(t) parameterizes some arbitrary curve C in g(x, y, z) = k
which passes through (a, b, c). Suppose that q(t) parameterizes a different curve K on g(x, y, z) =
k which passes through (a, b, c) = q(t_1). Further suppose that r′(t_0) ≠ q′(t_1). The vectors r′(t_0)
and q′(t_1) form a plane P. Now by repeating the argument above we know ∇f(a, b, c) ⊥ q′(t_1)
and ∇g(a, b, c) ⊥ q′(t_1). In other words, ∇f(a, b, c) and ∇g(a, b, c) are both normal to the plane
P. This in turn implies that ∇f(a, b, c) and ∇g(a, b, c) are parallel. This just means that if
∇g(a, b, c) ≠ 0⃗ then ∇f(a, b, c) = λ∇g(a, b, c) for some λ ∈ ℝ. We have proved the following
proposition.
12.1 Proposition (A necessary condition for the extrema of a function subject to a constraint).
Suppose that f(x, y, z) is subject to the constraint g(x, y, z) = k and that ∇g(x, y, z) ≠ 0⃗ for
all (x, y, z) on the level surface g(x, y, z) = k. If (a, b, c) is an extremum of f subject to g(x, y, z) = k then

∇f(a, b, c) = λ∇g(a, b, c) (12.2)

for some λ ∈ ℝ.

In particular any extremum (a, b, c) must satisfy Equation (12.2). This is what is meant
when we say that Equation (12.2) is a necessary condition for a point (a, b, c) to be an extremum.

12.3 Definition (Lagrange multiplier). The number λ from Proposition 12.1 is called a Lagrange
multiplier, named after the famous French-Italian mathematician Joseph-Louis Lagrange
(1736–1813).
12.4 Remark (Method of Lagrange multipliers). Suppose that f(x, y, z) is a function subject
to g(x, y, z) = k such that the conditions of Proposition 12.1 are satisfied. What Proposition
12.1 yields is a method for finding the extrema of a function f(x, y, z) subject to a constraint
g(x, y, z) = k.
1. First we find all (x, y, z) and λ ∈ ℝ such that

∇f(x, y, z) = λ∇g(x, y, z)
g(x, y, z) = k.

2. We then evaluate f at each of the points (x, y, z) obtained in Step 1 to find the maximum
and minimum values.

This method of finding extrema is called the method of Lagrange multipliers. This is a
very general method, and it turns out in practice that we must utilize a few different techniques
to actually carry out Step 1. In the examples and exercises we will try to illustrate the most
common techniques. Once these techniques are mastered the method of Lagrange multipliers
becomes significantly easier, though by no means is it always trivial. The key is to be able to
recognize through experience and intuition which technique is most amenable to a particular
Lagrange multiplier problem.
The first thing to realize is that the equations ∇f(x, y, z) = λ∇g(x, y, z) and g(x, y, z) = k
are actually a system of four equations, namely

f_x(x, y, z) = λ g_x(x, y, z)
f_y(x, y, z) = λ g_y(x, y, z)
f_z(x, y, z) = λ g_z(x, y, z)
g(x, y, z) = k.

Our task in Step 1 then is to find points (x, y, z) which satisfy the four equations above. The
difficult part is actually solving the system of equations. We don't have a general algorithm
which will solve any given system of equations.
12.5 Remark. It should be clear that the same arguments used above apply to functions of
two variables. Suppose f(x, y) is subject to g(x, y) = k. We can then think of f as a function of
x, y and z which does not depend on z, and we can embed the level curve g(x, y) = k into the x,
y plane. Then we can argue as above to see that any extrema of f(x, y) subject to g(x, y) = k
must also satisfy ∇f = λ∇g and g = k.
12.2 Some examples
Here we want to explore a few of the common techniques which are utilized to actually apply
the method of Lagrange multipliers. In particular we are going to illustrate a few techniques
for solving a system of equations. Often to solve a system of equations you need to show some
cleverness.

12.6 Example. What's the minimum sum of the squares of three numbers which sum to
25?

12.7 Solution. The function we wish to minimize is f(x, y, z) = x² + y² + z² subject to the
constraint g(x, y, z) = x + y + z = 25. Since ∇g = (1, 1, 1), it's clear that ∇g(x, y, z) ≠ 0⃗ for all
(x, y, z) on g(x, y, z) = 25. Thus by Proposition 12.1 we know that (x, y, z) is a minimum only if

∇f(x, y, z) = λ∇g(x, y, z)
g(x, y, z) = 25.

Rephrased, this means that (x, y, z) satisfies the four equations

2x = λ
2y = λ
2z = λ
x + y + z = 25.

In particular this shows that x = y = z, and so the equation x + y + z = 25 becomes x + x + x =
25 ⟺ x = 25/3. We have deduced that 3(25/3)² = 625/3 is the minimum sum of the squares of
three numbers which sum to 25. (There is no maximum: the sum of squares is unbounded on
the constraint plane.)
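The symmetric solution x = y = z = 25/3 is easy to stress-test numerically. The Python sketch below (not part of the notes) samples random points on the constraint plane x + y + z = 25 and confirms none beats the candidate value 625/3.

```python
import random

def f(x, y, z):
    return x**2 + y**2 + z**2

candidate = f(25 / 3, 25 / 3, 25 / 3)  # 625/3, about 208.33
random.seed(0)
beaten = False
for _ in range(10_000):
    # Pick x, y freely; z is forced by the constraint x + y + z = 25.
    x = random.uniform(-50, 50)
    y = random.uniform(-50, 50)
    z = 25 - x - y
    if f(x, y, z) < candidate - 1e-9:
        beaten = True
print(candidate, beaten)
```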
12.8 Example. Prove that the rectangle of maximum area which has some given perimeter p
is a square.

12.9 Solution. Let x be the base of the rectangle and y the height. Here it should be
clear that the function we are trying to maximize is f(x, y) = xy subject to the constraint
g(x, y) = 2x + 2y = p. Since ∇g(x, y) = (2, 2) ≠ 0⃗, we know that the maximum (x, y) must
satisfy

∇f(x, y) = λ∇g(x, y)
2x + 2y = p.

Rephrased, this means that (x, y) satisfies the three equations

y = 2λ
x = 2λ
2x + 2y = p.

This shows in particular that x = y, and thus the rectangle is a square.
12.10 Example. Find three positive numbers whose sum is 100 and whose product is maximal.

12.11 Solution. Let the three positive numbers be x, y and z. Then the function we are
trying to maximize is f(x, y, z) = xyz subject to the constraint g(x, y, z) = x + y + z = 100.
Since ∇g = (1, 1, 1) ≠ 0⃗, we know that any extrema satisfy

∇f(x, y, z) = λ∇g(x, y, z)
g(x, y, z) = 100.

In other words, the maximum must satisfy the four equations

yz = λ
xz = λ
xy = λ
x + y + z = 100.

It's clear then that yz = xz ⟹ y = x since z ≠ 0. Also from the second and third equations
we get that xz = xy ⟹ z = y since again x ≠ 0. This shows that x = y = z. Thus the
numbers are all 100/3.
12.12 Example. Maximize the area of a rectangle inscribed in a circle of (non-zero) radius r
centered at the origin.
64
12.13 Solution. Let 2x be the base of the rectangle and 2y the height. Then the assumption that the rectangle is inscribed in a circle of radius r is the same as saying its corners touch the circle. By construction the upper right corner of the rectangle is at the point (x, y). Since the point (x, y) touches the circle, we know that d((x, y), (0, 0)) = r ⟺ x² + y² = r².

With this information it becomes clear that the function we are trying to maximize is f(x, y) = 2x · 2y = 4xy subject to the constraint g(x, y) = x² + y² = r². We can assume that x ≠ 0 and y ≠ 0, since otherwise the rectangle is degenerate and has 0 area. Since ∇g(x, y) = (2x, 2y) ≠ 0 when x ≠ 0 ≠ y, we know that (x, y) satisfies

4y = 2λx
4x = 2λy
x² + y² = r².

Substituting the first equation into the second yields 4x = λ²x. Since x ≠ 0 we must have λ² = 4 ⟺ λ = ±2. Using this information with the first equation yields y = ±x, and since x and y are positive we take y = x. In particular, the rectangle is a square. Since r² = x² + y² = 2x², we know that x = r/√2 and so the square has sides of length 2x = √2 · r.
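Parameterizing the inscribed rectangle by the angle θ of its upper right corner gives a one-variable check of this solution: the corner is (r cos θ, r sin θ), so the area is 4r² cos θ sin θ = 2r² sin 2θ, which peaks at θ = π/4, i.e. at the square. A small numeric sketch (not part of the notes):

```python
import math

# Sketch: parameterize inscribed rectangles by the corner angle θ, where the
# upper-right corner is (r cos θ, r sin θ); the area 4xy = 2r² sin 2θ should
# peak at θ = π/4, the square, with maximum area 2r².

r = 1.0
areas = [(t, 4 * (r * math.cos(t)) * (r * math.sin(t)))
         for t in (k * (math.pi / 2) / 1000 for k in range(1, 1000))]
best_t, best_area = max(areas, key=lambda p: p[1])
print(best_t, best_area)
```

With r = 1 the maximizing angle lands on π/4 and the area on 2, matching the square of side √2 found above.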
12.14 Example. Maximize the product of the squares of three non-zero numbers such that
the sum of their squares is 1.
12.15 Solution. The function we wish to maximize is f(x, y, z) = x²y²z² subject to the constraint g(x, y, z) = x² + y² + z² = 1. Note that for (x, y, z) which satisfy g(x, y, z) = 1 we have ∇g(x, y, z) ≠ 0. Thus the maximum of f satisfies

2xy²z² = 2λx
2x²yz² = 2λy
2x²y²z = 2λz
x² + y² + z² = 1.

The following technique is useful to take note of. If we multiply the first equation by x, the second by y, and the third by z we obtain

2x²y²z² = 2λx²
2x²y²z² = 2λy²
2x²y²z² = 2λz²
x² + y² + z² = 1.

Thus λx² = λy² = λz². Now note that λ ≠ 0, since λ = 0 ⟹ either x = 0, y = 0, or z = 0, which contradicts our assumption that x, y and z are non-zero. Thus x² = y² = z².

Students at this stage are often tempted to write x² = y² = z² ⟹ x = y = z, but this is false. Instead, x² = y² = z² ⟺ x = ±y = ±z. The constraint g(x, y, z) = 1 ⟹ x² = 1/3 ⟺ x = ±1/√3. Thus the eight points (±1/√3, ±1/√3, ±1/√3) are all possible candidates for the maximum. Moreover they are all maximums, because f(±1/√3, ±1/√3, ±1/√3) = 1/27. To check that this is indeed a maximum you can pick nearby points which also satisfy g(x, y, z) = 1 (say, for instance, (1/√2, 1/2, 1/2)), plug them into f, and compare the resulting value to 1/27.
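A numeric sketch of the nearby-point check: sampling random points on the unit sphere, the product x²y²z² should never exceed 1/27, the value attained at the eight candidate points.

```python
import math, random

# Sketch (not from the notes): on the unit sphere x² + y² + z² = 1, the
# product x²y²z² should never exceed the Lagrange value 1/27, attained at
# the points (±1/√3, ±1/√3, ±1/√3).

def f(x, y, z):
    return (x * y * z) ** 2

random.seed(0)
samples = []
for _ in range(20000):
    x, y, z = (random.gauss(0, 1) for _ in range(3))
    n = math.sqrt(x * x + y * y + z * z)
    samples.append(f(x / n, y / n, z / n))

s = 1 / math.sqrt(3)
print(max(samples), f(s, s, s))
```

Normalizing Gaussian triples gives uniformly distributed points on the sphere, so the sampled maximum creeps up toward 1/27 without exceeding it.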
12.16 Example. Find the minimal and maximal distance from the unit sphere centered at the origin to the point (2, 1, −1).
12.17 Solution. We wish to find the extrema of the distance squared from a point (x, y, z) to (2, 1, −1). Thus we wish to optimize f(x, y, z) = (x − 2)² + (y − 1)² + (z + 1)² subject to the constraint that the point (x, y, z) lie on the unit sphere, i.e., g(x, y, z) = x² + y² + z² = 1. Since ∇g ≠ 0 for all points on the sphere, we know that the extrema of f satisfy

∇f(x, y, z) = λ∇g(x, y, z)
g(x, y, z) = 1

⟺

2(x − 2) = 2λx
2(y − 1) = 2λy
2(z + 1) = 2λz
x² + y² + z² = 1.

The trick to solving the system of four equations above is to solve for x, y, and z in terms of λ. In particular note that

2(x − 2) = 2λx ⟺ 2x − 4 − 2λx = 0
⟺ 2x(1 − λ) = 4
⟺ x = 2/(1 − λ).

Similarly,

y = 1/(1 − λ),  z = −1/(1 − λ).

Of course these equations only make sense if λ ≠ 1, and indeed λ = 1 is impossible since λ = 1 ⟹ 2x = 2x + 4 ⟹ 0 = 4, which is absurd.

We can now plug the above expressions for x, y, and z into the constraint g(x, y, z) = 1 and solve for λ, which will in turn yield the values for x, y and z. In particular,

(2/(1 − λ))² + (1/(1 − λ))² + (−1/(1 − λ))² = 1
⟺ 6/(1 − λ)² = 1
⟺ 1 − λ = ±√6
⟺ λ = 1 ∓ √6.

Now we plug these values for λ back into x, y and z. After we do this we can see that the extrema of f belong to the set

{ (√6/3, √6/6, −√6/6), (−√6/3, −√6/6, √6/6) }.

It's easy to verify that the first point gives the minimum and the second the maximum of f subject to g(x, y, z) = 1. Take note that while this shows that the first point is the closest point on the unit sphere to (2, 1, −1), the value f(√6/3, √6/6, −√6/6) is not the minimum distance, since f is the distance squared. The minimum distance is given by √(f(√6/3, √6/6, −√6/6)). Of course the same remarks hold for the maximum distance.
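Since both extreme points lie on the line through the origin and (2, 1, −1), and the distance from the origin to (2, 1, −1) is √6, the extreme distances work out to √6 − 1 and √6 + 1. A random-sampling sketch (not part of the notes) confirming this:

```python
import math, random

# Sketch: sample points on the unit sphere and compare the extreme distances
# to (2, 1, -1) with the Lagrange answers sqrt(6) - 1 and sqrt(6) + 1.

target = (2.0, 1.0, -1.0)

def dist_to_target(p):
    return math.dist(p, target)

random.seed(0)
pts = []
for _ in range(50000):
    v = [random.gauss(0, 1) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in v))
    pts.append(tuple(c / n for c in v))

d = [dist_to_target(p) for p in pts]
print(min(d), max(d), math.sqrt(6) - 1, math.sqrt(6) + 1)
```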
12.3 Exercises

12.18 Exercise. Find the extrema of the following functions subject to the given constraint.
(a) f(x, y) = x²y subject to g(x, y) = x² + y² = 1.
(b) f(x, y) = x + y subject to g(x, y) = (1/x) + (1/y) = 1.
(c) f(x, y) = 3x + y subject to g(x, y) = x² + 9y² = 1.
(d) f(x, y) = x² + y² subject to g(x, y) = x⁶ + y⁶ = 2.

12.19 Exercise. The total production P of a certain product depends on the amount of labor x used and the amount of capital investment y. Let 0 < α < 1 and b > 0. The function P = bx^α y^(1−α) is called the Cobb–Douglas production function and it follows from various economic assumptions. Suppose that a unit of labor costs m and a unit of capital costs n and that the company has only p dollars in its total budget. Maximize P subject to mx + ny = p. (Hint: you should get that x = (αp)/m and y = ((1 − α)p)/n.)

12.20 Exercise. Find the point on the circle x² + y² = 1 such that f(x, y) = 2x − 3y is a maximum. What does this mean geometrically?

12.21 Exercise.
(a) Find the volume V of the largest box with sides x, y and z which sum to k, that is, x + y + z = k.
(b) Verify that λ = dV_max/dk. That is, consider V_max as a function of the variable k and then take its derivative with respect to k.

12.22 Exercise. Find an equation of the plane which passes through the point (1, 2, 3) and cuts off the smallest volume in the first octant.

12.23 Exercise. Rogawski 14.8: 1, 3, 5, 15, 28, 33, 37, 45, 46, 48
13 Double integrals
Recall that our interest in the area under a curve led to the construction of the integral. The graph of a function of two variables is a surface. How would we go about finding the volume under this surface? This question motivates the desire to generalize the integral to handle functions of multiple variables.
Throughout this section we assume that f is a continuous function.
13.1 The integral
Let f(x, y) be a continuous function of two variables. Suppose that R is a piecewise smooth
region in the domain of f. What is the volume V of the solid S which lies above R but below
the surface z = f(x, y)?
To answer this question, the first observation to make is that we can divide the region R into m small rectangles of base Δx and height Δy. The area of each rectangle is then ΔA := Δx Δy. These rectangles might not cover R, but as Δx and Δy decrease and m increases, the rectangles will cover more of R. When we take a limit as Δx → 0 and Δy → 0, the rectangles cover the entire region R.
Index the rectangles by the numbers 1, …, m. Let (xᵢ, yᵢ) be a point in the i-th rectangle (where i = 1, …, m). Then we can form a column above this rectangle which has height f(xᵢ, yᵢ) and thus volume

f(xᵢ, yᵢ) ΔA.

The total volume of all the columns is given by

∑_{i=1}^{m} f(xᵢ, yᵢ) ΔA ≈ V

and this sum approximates the volume of S. Note that as Δx and Δy tend toward 0, m → ∞. Thus

lim_{Δx→0, Δy→0} ∑ f(xᵢ, yᵢ) ΔA = V.

This leads us to the definition of the double integral.
13.1 Definition (Double integral). Let f(x, y) be a continuous function and keep all the notation from above. Then the integral of f over the region R is defined to be

∬_R f(x, y) dA := lim_{Δx→0, Δy→0} ∑ f(xᵢ, yᵢ) ΔA.

Typically R is called the domain of integration or simply the region of integration.
13.2 Remark. It's possible to define the integral as a double sum. The advantage of this is that it is initially in line with the notation ∬_R. However, as we will soon see, this notation also suggests the method we use to actually compute double integrals. Overall, double sums tend to confuse students at first and just lead to messier notation.

Other authors only define the integral over a rectangular region and then do some gymnastics to define it over more general regions. While a rectangle is indeed a very nice region, it's not particularly hard to draw rectangles inside a general region, and we see no need to overcomplicate an already simple definition.
Figure 32: Approximating the volume of a solid above a square region and below the surface z = 16 − x² − 2y².

Figure 33: As Δx and Δy decrease, the approximations for the volume under z = 16 − x² − 2y² become better.
13.3 Example. Let R be the rectangular region R = [0, 1] × [0, 2]. This rectangle obviously has area A(R) = 2. How could we find this area using a double integral? Well, if we let f(x, y) = 1, then

∬_R f(x, y) dA = ∬_R dA = A(R).
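Definition 13.1 can be made concrete with a tiny computation (a sketch, not from the notes): summing f = 1 over a grid of small rectangles covering R = [0, 1] × [0, 2] recovers the area A(R) = 2.

```python
# Sketch of Definition 13.1 applied to Example 13.3: a Riemann sum of the
# constant function f = 1 over a grid covering R = [0,1] x [0,2] recovers
# the area A(R) = 2.

m = 200
dx, dy = 1.0 / m, 2.0 / m
dA = dx * dy
total = sum(1.0 * dA for _ in range(m * m))
print(total)
```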
13.4 Proposition (Properties of integration). Let R, D, C be piecewise smooth regions in ℝ² such that R = D ∪ C and D ∩ C = ∅. Let f(x, y), g(x, y) be functions. Assuming all the integrals below exist, then

1. (Linearity) ∬_R (f + g) dA = ∬_R f dA + ∬_R g dA;

2. (Commutes with scalars) ∬_R λf(x, y) dA = λ ∬_R f(x, y) dA where λ ∈ ℝ;

3. (Splitting regions) ∬_R f dA = ∬_D f dA + ∬_C f dA.
13.5 Definition (Average value). Let R ⊆ ℝ² be a piecewise smooth region with area A(R). If f(x, y) is a function of two variables, the average value of f over the region R is defined to be

f_ave := (1/A(R)) ∬_R f(x, y) dA.

Figure 34: The water level represents the average value of a function f(x, y) which models the height of the mountain in the picture.
13.2 Iterated integrals

Recall that when you compute solids of revolution you first find a function that gives you the area of a slice and then you integrate that function to find the volume of the solid of revolution. We can use the same idea to compute the double integral (and indeed higher-dimensional integrals). This technique is called the method of iterated integrals.

Suppose that R = [a, b] × [c, d] is a rectangle and f(x, y) a function. Say that S is the solid which lies above R and below the surface z = f(x, y). If we fix an x ∈ [a, b] and let y range over [c, d], then the function h(y) := f(x, y) traces a curve C on the surface z = f(x, y). We can integrate h(y) with respect to y to get the area under the curve C. This integral represents the area of a slice of the solid S taken parallel to the y-axis. (See Figure 35.) If A(x) is the area of the slice at x, we know from our experience with solids of revolution that the volume V of S is given by V = ∫_a^b A(x) dx. This motivates the following definitions and proposition.
13.6 Definition (Partial or inner integrals). Let R = [a, b] × [c, d] be a rectangle and f(x, y) a function. Fix an x ∈ [a, b]. Then the partial integral with respect to y is defined as the function

A(x) := ∫_c^d f(x, y) dy.

Similarly, suppose we fix a y ∈ [c, d]. Then the partial integral with respect to x is defined to be the function

a(y) := ∫_a^b f(x, y) dx.

Note that a(y) is just the area of a slice of S which is parallel to the x-axis and taken at some point y ∈ [c, d]. (See Figure 36.) Further note that A(x) and a(y) are functions which depend only on the variables x and y respectively.
13.7 Example. Suppose that f(x, y) = 4x³ − 4y³ + 2x and R = [0, 2] × [0, 1]. Then [a, b] = [0, 2] and [c, d] = [0, 1]. Thus

A(x) = ∫_0^1 (4x³ − 4y³ + 2x) dy
     = [4x³y − y⁴ + 2xy]_0^1
     = 4x³ + 2x − 1.

Also

a(y) = ∫_0^2 (4x³ − 4y³ + 2x) dx
     = [x⁴ − 4xy³ + x²]_0^2
     = 20 − 8y³.

Figure 35: A(x) is just the area of a slice taken parallel to the y-axis.
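A numeric sketch (not from the notes) checking the two partial integrals of f(x, y) = 4x³ − 4y³ + 2x: evaluating the antiderivatives gives the closed forms A(x) = 4x³ + 2x − 1 and a(y) = 20 − 8y³, which midpoint sums reproduce.

```python
# Sketch: midpoint-rule checks of the partial integrals A(x) = ∫₀¹ f dy and
# a(y) = ∫₀² f dx for f(x, y) = 4x³ - 4y³ + 2x, against the closed forms
# 4x³ + 2x - 1 and 20 - 8y³.

def f(x, y):
    return 4 * x**3 - 4 * y**3 + 2 * x

def midpoint(g, lo, hi, n=2000):
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) for k in range(n)) * h

x0, y0 = 1.3, 0.4
A = midpoint(lambda y: f(x0, y), 0.0, 1.0)
a = midpoint(lambda x: f(x, y0), 0.0, 2.0)
print(A, 4 * x0**3 + 2 * x0 - 1, a, 20 - 8 * y0**3)
```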
13.8 Definition (Iterated integrals). Let R = [a, b] × [c, d] be a rectangle and f(x, y) a function.

Figure 36: a(y) is the area of a slice taken parallel to the x-axis.
Set A(x) and a(y) as above. Then the integrals

∫_a^b ∫_c^d f(x, y) dy dx := ∫_a^b ( ∫_c^d f(x, y) dy ) dx = ∫_a^b A(x) dx

∫_c^d ∫_a^b f(x, y) dx dy := ∫_c^d ( ∫_a^b f(x, y) dx ) dy = ∫_c^d a(y) dy

are called the iterated integrals. Take note that the integrals on the far left are definitions. Most authors omit the brackets when writing an iterated integral, so it's important to keep in mind what ∫_a^b ∫_c^d f dy dx actually means.
13.9 Proposition (Fubini's theorem). Let R = [a, b] × [c, d] be a rectangle and f(x, y) a function which is continuous on R. Then if the integral exists,

∬_R f(x, y) dA = ∫_a^b ( ∫_c^d f(x, y) dy ) dx = ∫_c^d ( ∫_a^b f(x, y) dx ) dy.

Sketch of proof. By definition the volume of the solid S which lies above R and below z = f(x, y) is given by ∬_R f dA. As we've already seen, though, the volume of S is also given by ∫_a^b A(x) dx and by ∫_c^d a(y) dy. These two integrals are just the middle and right-hand sides of the above equation.
13.10 Example. Find the volume V of the solid S which lies above the rectangle R = [0, 1] × [0, 2] and below the plane x + y + z = 5.

13.11 Solution. The plane x + y + z = 5 is just the graph of the function z = −(x + y − 5).
By Fubini's theorem,

V = ∬_R −(x + y − 5) dA = −∫_0^1 ∫_0^2 (x + y − 5) dy dx
  = −∫_0^1 ( ∫_0^2 (x + y − 5) dy ) dx
  = −∫_0^1 [xy + y²/2 − 5y]_0^2 dx
  = −∫_0^1 (2x − 8) dx
  = −(x² − 8x)|_0^1
  = 7.
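The value 7 can be double-checked with a direct Riemann sum (a sketch, not from the notes); the midpoint rule is exact for the linear integrand 5 − x − y.

```python
# Sketch: midpoint-rule Riemann sum of the volume above R = [0,1] x [0,2]
# and below the plane z = 5 - x - y; Fubini's theorem gives exactly 7.

def f(x, y):
    return 5.0 - x - y

n = 400
dx, dy = 1.0 / n, 2.0 / n
total = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * dx
        y = (j + 0.5) * dy
        total += f(x, y) * dx * dy

print(total)
```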
13.12 Proposition (Integrating the product of special functions). Suppose that f(x) depends only on the x variable and g(y) depends only on the y variable. Let a, b, c, d ∈ ℝ. The definition of iterated integrals implies

∫_a^b ∫_c^d f(x) g(y) dy dx = ( ∫_a^b f(x) dx ) · ( ∫_c^d g(y) dy ).
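A quick numeric illustration of Proposition 13.12 with f(x) = x on [0, 1] and g(y) = y² on [0, 2] (my choice of test functions, not from the notes): both sides should equal (1/2)(8/3) = 4/3.

```python
# Sketch of Proposition 13.12: the iterated integral of x · y² over
# [0,1] x [0,2] should factor into (∫ x dx)(∫ y² dy) = (1/2)(8/3) = 4/3.

def midpoint(h_fn, lo, hi, n=2000):
    step = (hi - lo) / n
    return sum(h_fn(lo + (k + 0.5) * step) for k in range(n)) * step

double = midpoint(lambda x: midpoint(lambda y: x * (y * y), 0.0, 2.0, 400),
                  0.0, 1.0, 400)
product = midpoint(lambda x: x, 0.0, 1.0) * midpoint(lambda y: y * y, 0.0, 2.0)
print(double, product)
```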
13.3 More general regions
We are faced with a mild problem. On the one hand, the technique of iterated integrals is wonderful for computing double integrals. On the other, it's not often that we want to integrate over such a nice region as a rectangle. In this subsection we show how we can extend the power of iterated integrals to more general regions. Theoretically we can find integrals over quite perverse regions, but in practice most regions we integrate over turn out to be unions of two special types of regions.
13.13 Definition (Type I regions). A region D ⊆ ℝ² is said to be type I if it is bounded and lies between two continuous functions of x. In particular,

D = {(x, y) : a ≤ x ≤ b, g₁(x) ≤ y ≤ g₂(x)}.

13.14 Definition (Type II regions). A region D ⊆ ℝ² is said to be type II if it is bounded and lies between two continuous functions of y. In particular,

D = {(x, y) : c ≤ y ≤ d, h₁(y) ≤ x ≤ h₂(y)}.
How do we use iterated integrals to compute the integral over a type I (or type II) region? The answer is to turn the region D into a rectangle and then use Fubini's theorem.

Figure 37: A few examples of type I regions.

Figure 38: Type II regions.

Let D be a type I region bounded by g₁(x) and g₂(x). Let f(x, y) be a function. Since D is bounded it lies in some rectangle R = [a, b] × [c, d]. Define
F(x, y) := { f(x, y) if (x, y) ∈ D;  0 if (x, y) ∈ R − D.

It's not too difficult to see that

∬_R F(x, y) dA = ∬_D f(x, y) dA.

However, by Fubini's theorem,

∬_R F(x, y) dA = ∫_a^b ∫_c^d F(x, y) dy dx = ∫_a^b ∫_{g₁(x)}^{g₂(x)} f(x, y) dy dx.
A very similar argument shows how to compute the double integral over a type II region. This
leads to the proposition below.
Figure 39: To compute the double integral over a type I region D we first use the fact that D lies in some rectangle R = [a, b] × [c, d]. The blue shaded region is where F(x, y) = 0 and so does not contribute to the integral.
13.15 Proposition (Computing double integrals over type I and type II regions). Suppose that D = {(x, y) : a ≤ x ≤ b, g₁(x) ≤ y ≤ g₂(x)} is a type I region and f(x, y) a function defined on D. Then

∬_D f(x, y) dA = ∫_a^b ∫_{g₁(x)}^{g₂(x)} f(x, y) dy dx.

Likewise if D = {(x, y) : c ≤ y ≤ d, h₁(y) ≤ x ≤ h₂(y)} is a type II region and f(x, y) is a function defined on D, then

∬_D f(x, y) dA = ∫_c^d ∫_{h₁(y)}^{h₂(y)} f(x, y) dx dy.
13.16 Remark. It's important to take note of which integral is the inner integral. The outer integral must have numbers in its limits of integration in order for the integral to even make sense. The inner integral has functions (possibly constants) as its limits of integration.

Additionally, a type I region constrains y between two functions (of x). This means the inner integral is always with respect to y. Likewise, for a type II region the variable x is constrained by two functions (of y) and so the inner integral is taken with respect to x.
13.17 Example. Find the volume of the solid S which lies above the triangle T with vertices (0, 0), (1, 0) and (0, 1) and below the surface z = x²y².
13.18 Solution. The triangle T can be realized as either a type I or a type II region. First let's think of it as a type I region. Then T = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 − x}. To see this it helps to draw a picture. Thus by Proposition 13.15 the volume of S is given by

∬_T x²y² dA = ∫_0^1 ∫_0^{1−x} x²y² dy dx
  = ∫_0^1 x² [y³/3]_0^{1−x} dx
  = ∫_0^1 x² (1 − x)³/3 dx
  = (1/3) ∫_0^1 x² (1 − 3x + 3x² − x³) dx
  = (1/3) ∫_0^1 (x² − 3x³ + 3x⁴ − x⁵) dx
  = (1/3) [x³/3 − 3x⁴/4 + 3x⁵/5 − x⁶/6]_0^1
  = (1/3) (1/3 − 3/4 + 3/5 − 1/6)
  = 1/180.

Now let's consider T as a type II region. Then T = {(x, y) : 0 ≤ y ≤ 1, 0 ≤ x ≤ 1 − y}. It is no coincidence that the hypotenuse of T can be realized as a function which is symmetric in the x and y variables. Indeed, the line which passes through (1, 0) and (0, 1) is the graph of y = 1 − x; solving for x yields x = 1 − y. Thus ∬_T x²y² dA = ∫_0^1 ∫_0^{1−y} x²y² dx dy. It's left as an exercise to compute this integral and verify that the number obtained is the same as above.
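The suggested exercise can be sketched numerically (not part of the notes): approximating both iterated integrals with midpoint sums shows the type I and type II orders agree, with common value 1/180.

```python
# Sketch: approximate the type I and type II iterated integrals of x²y² over
# the triangle with vertices (0,0), (1,0), (0,1) and check both orders agree.

def inner_sum(f, lo, hi, n):
    """Midpoint-rule approximation of the single integral of f over [lo, hi]."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

n = 300
# Type I: x in [0, 1], y in [0, 1 - x].
type1 = inner_sum(lambda x: inner_sum(lambda y: x**2 * y**2, 0.0, 1.0 - x, n),
                  0.0, 1.0, n)
# Type II: y in [0, 1], x in [0, 1 - y].
type2 = inner_sum(lambda y: inner_sum(lambda x: x**2 * y**2, 0.0, 1.0 - y, n),
                  0.0, 1.0, n)

print(type1, type2, 1 / 180)
```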
13.4 Changing the order of integration

Sometimes a double integral is given to you in a form which is impossible to integrate directly. The trick to dealing with such problems is to see if you can rewrite the domain of integration so that the integral becomes tractable. To do this, first find the set description of the domain of integration D. From this description you can usually draw a picture of the region D. The picture is quite useful for finding alternative descriptions of D. If the region D is given to you as type I, for instance, you can try to use the picture to rewrite D as a type II region (and vice versa). Once D is rewritten, we can integrate. This process is called changing the order of integration and it is very useful.

13.19 Example. Integrate ∫_0^1 ∫_y^1 cos x² dx dy.
13.20 Solution. The function cos x² has no elementary antiderivative, and so we have to switch the order of integration. The region D we are integrating over is just D = {(x, y) : 0 ≤ y ≤ 1, y ≤ x ≤ 1}. This is a type II region since y ranges between two numbers and x is constrained by two functions of y. The fact that x ranges between y and 1 tells us that D is the triangle with vertices (0, 0), (1, 0) and (1, 1). (Draw a picture!) Let's rewrite D as a type I region. So we want x to range between two numbers. A moment's thought reveals that 0 ≤ x ≤ 1. Now if we fix x, where can y range? It should be clear that 0 ≤ y ≤ x. Thus

D = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ x}.

Therefore

∫_0^1 ∫_y^1 cos x² dx dy = ∬_D cos x² dA
  = ∫_0^1 ∫_0^x cos x² dy dx
  = ∫_0^1 [y cos x²]_0^x dx
  = ∫_0^1 x cos x² dx.

Letting u = x² yields

  = (1/2) ∫_0^1 cos u du
  = (1/2) sin(1).
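A numeric check of this computation (a sketch, not from the notes): after the change of order, the inner integral of cos x² over 0 ≤ y ≤ x is x cos x², and a midpoint sum of that reproduces sin(1)/2.

```python
import math

# Sketch: the double integral of cos(x²) over the triangle 0 ≤ y ≤ x ≤ 1
# should equal sin(1)/2, per the change-of-order computation.

n = 1000
h = 1.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    # The inner integral of cos(x²) dy over 0 ≤ y ≤ x is just x·cos(x²).
    total += x * math.cos(x * x) * h

print(total, math.sin(1) / 2)
```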
13.5 Exercises

13.21 Exercise. Rogawski 15.1: 1, 3, 5, 15, 17, 19, 21, 23, 33, 37, 39, 43, 45
15.2: 1, 3, 5, 11, 23, 33, 41, 43
15.3: Skim the chapter, 1, 3, 5, 16
14 Change of Coordinates

Suppose that we take the unit square R = [0, 1] × [0, 1]. Now let's say we pin down the corner (0, 0) and let that be a pivot point. Next we rotate R by an angle of θ. By a simple observation we know that the area of this rotated R is 1, but how could we determine that using a double integral? It turns out to be quite messy using only the machinery we have developed thus far. This motivates us to consider coordinate systems other than rectangular coordinates.

Our goal is to take an ugly region in the x, y plane and realize it as a particularly nice region (such as a rectangle) in some other coordinate plane. When we do this we are essentially performing the double-integral version of substitution.

First we examine a coordinate system called polar coordinates.
14.1 Polar coordinates

14.1 Definition (Polar coordinates). Let's suppose we pick some point O in an arbitrary plane. We will call this point the pole; it corresponds to the origin of rectangular coordinates. Now draw some ray which starts at the pole O. This ray is usually drawn horizontally and is called the polar axis.

Let P be some other point in the plane and suppose that d(O, P) = r. Further suppose that the line segment connecting O to P forms an angle θ with the polar axis. We then say that the point P has the polar coordinates (r, θ).

Figure 40: The point P is distance r from the pole O, and the line segment connecting P to O forms an angle θ with the polar axis; P has polar coordinates (r, θ).

We would like r and θ to be honest real variables so that we can talk about the r, θ plane. However, since r represents a distance, it can never be negative. Suppose that the point P has polar coordinates (r, θ). By convention, the point (−r, θ) represents the point which lies on the line through P and O but is r units in the opposite direction from P. In other words, (−r, θ) = (r, θ + π). (See Figure 41.)

Figure 41: This is how we turn r and θ into real variables.
14.2 Remark. How do polar coordinates relate to rectangular coordinates? If we suppose
that the polar axis is just the x-axis, then we can imagine that the point (r, θ) is a vertex of a right triangle which has hypotenuse r and one angle θ. The base and height of this triangle are then given by r cos θ and r sin θ respectively. (See Figure 42.) This shows that

x = r cos θ,  y = r sin θ,  and  r² = x² + y².

What this means is that the point (r, θ) has rectangular coordinates (r cos θ, r sin θ) under the assumptions above.

Figure 42: The relation between polar coordinates and rectangular coordinates.
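These conversion formulas are one-liners in code. A minimal sketch (the helper name is my own, just for illustration):

```python
import math

# Minimal helper (a sketch) for the conversion x = r·cosθ, y = r·sinθ,
# together with the identity r² = x² + y².

def polar_to_rect(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

x, y = polar_to_rect(2.0, math.pi / 3)
print(x, y, math.hypot(x, y))  # hypot recovers r = 2
```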
Recall that a rectangle R in rectangular coordinates is just some set R = [a, b] × [c, d] ⊆ ℝ². What do similar sets look like in polar coordinates?

14.3 Definition (Polar rectangle). Let α, β ∈ ℝ such that α < β. Moreover let a, b ∈ ℝ such that a < b. Then a polar rectangle R is any set of the form

R = {(r, θ) : a ≤ r ≤ b, α ≤ θ ≤ β}.
14.4 Example.
(a) Let R = {(r, θ) : 0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π}. Then R is just the unit disk.
(b) Let R = {(r, θ) : 1 ≤ r ≤ 2, 0 ≤ θ ≤ π}. Then R is a half-annulus with inner radius rᵢ = 1 and outer radius rₒ = 2.

14.5 Remark. There is a reason we call sets of the form R = {(r, θ) : a ≤ r ≤ b, α ≤ θ ≤ β} polar rectangles. If we draw a set of perpendicular axes labeled as the r-axis and the θ-axis, then R is an honest rectangle with respect to that coordinate system. In other words, a polar rectangle is an actual rectangle in the r, θ plane. While a polar rectangle is generally not a rectangle in Euclidean 2-space, for the purposes of doing calculus it behaves as one.
Figure 43: Two polar rectangles.

Figure 44: A general polar rectangle.

Figure 45: A polar rectangle in the r, θ plane is an actual rectangle.

14.2 Transformations

Transformations are functions which relate two coordinate systems. For our purposes it suffices to consider only transformations between ℝ² and ℝ². Additionally we make a slight distinction between the two copies of ℝ². One is assumed to represent rectangular coordinates while the other represents some other coordinate system. When ℝ² represents the rectangular coordinate system we typically call it the x, y plane. When ℝ² represents some other arbitrary coordinate system, we frequently refer to it as the u, v plane. Here, of course, u and v are just real variables.
14.6 Definition (Transformation). A transformation from ℝ² to ℝ² is simply a function

T : ℝ² → ℝ², (u, v) ↦ (T₁(u, v), T₂(u, v))

where T₁ and T₂ are real-valued functions of the two variables u and v.

To specify a particular transformation we typically write T(u, v) = (T₁(u, v), T₂(u, v)). We call the functions T₁ and T₂ the component functions of T. We say that T is a C¹ transformation if T₁ and T₂ have continuous first-order partials.

If we want to emphasize that T is a transformation from the u, v plane to the x, y plane we typically write T(u, v) = (x(u, v), y(u, v)). In this case we call the transformation T a change of coordinates from u, v to x, y, or simply a change of coordinates.
14.7 Definition (Image). Let T : ℝ² → ℝ² be a transformation. If X ⊆ ℝ², then the image of X under T, denoted T(X), is the set

T(X) := {T(u, v) : (u, v) ∈ X}.
14.8 Definition (Injective transformations). Let (u₁, v₁), (u₂, v₂) ∈ X ⊆ ℝ². A transformation T : X → ℝ² is said to be injective on X if T(u₁, v₁) = T(u₂, v₂) ⟹ (u₁, v₁) = (u₂, v₂). In other words, T is injective if and only if it maps any two distinct elements of X to two distinct elements of ℝ².

If T is injective on X then the inverse of T on X is the function

T⁻¹ : T(X) → X, T(u, v) ↦ (u, v).

Figure 46: The transformation T is injective on S. The set R is the image of S under T. That is, R = T(S). Here (xᵢ, yᵢ) = T(uᵢ, vᵢ).
14.9 Example. The transformation T(r, θ) := (r cos θ, r sin θ) from the r, θ plane to the x, y plane is the change of coordinates from polar coordinates to rectangular coordinates. T is injective on the set X := {(r, θ) : 0 < r < ∞, 0 ≤ θ < 2π}.
Figure 47: The change of coordinates transformation T(r, θ) = (r cos θ, r sin θ).

14.10 Example. Let R be the square in the x, y plane with vertices (0, 0), (1, 1), (2, 0), and (1, −1). Let T(u, v) = ((u + v)/2, (u − v)/2) be a transformation from the u, v plane to the x, y plane. Then T⁻¹(x, y) = (x + y, x − y). Show that T⁻¹(R) is the square S = [0, 2] × [0, 2] in the u, v plane. In other words, T(S) = R.
14.11 Solution. Break R into four sides.

Let R₁ be the line segment between (0, 0) and (1, 1). On this line segment y = x, and so T⁻¹ restricted to R₁, written T⁻¹|_{R₁}, is given by T⁻¹|_{R₁} = (x + x, x − x) = (2x, 0). Since 0 ≤ x ≤ 1, T⁻¹(R₁) is the line segment in the u, v plane connecting (0, 0) to (2, 0).

Let R₂ be the line segment connecting (1, 1) to (2, 0). On R₂, y = 2 − x, and so T⁻¹|_{R₂} = (x + 2 − x, x − (2 − x)) = (2, 2x − 2). Since 1 ≤ x ≤ 2, T⁻¹(R₂) is the line segment in the u, v plane connecting (2, 0) to (2, 2).

Let R₃ be the line segment connecting (2, 0) to (1, −1). On R₃, y = x − 2, and so T⁻¹|_{R₃} = (x + x − 2, x − (x − 2)) = (2x − 2, 2). Since 1 ≤ x ≤ 2, T⁻¹(R₃) is the line segment in the u, v plane connecting (2, 2) to (0, 2).

Let R₄ be the line segment connecting (1, −1) to (0, 0). On R₄, y = −x, and so T⁻¹|_{R₄} = (x − x, x + x) = (0, 2x). Since 0 ≤ x ≤ 1, T⁻¹(R₄) is the line segment in the u, v plane connecting (0, 2) to (0, 0).

This shows that T⁻¹(R) = S and also T(S) = R.
The example above is a special case of a very useful type of transformation.

14.12 Definition (Linear transformation). Let T(u, v) = (au + bv, cu + dv) where a, b, c, d ∈ ℝ. Then T is called a linear transformation. Note that if (u₁, v₁) and (u₂, v₂) are two points in the domain of T, then T(u₁ + u₂, v₁ + v₂) = T(u₁, v₁) + T(u₂, v₂).
14.3 Applications to double integrals

We promised that changing coordinates is the double-integral analog of substitution. Suppose that we wish to integrate ∫_1^2 x e^{x²} dx. Let f(x) = x e^{x²}. The obvious substitution is u = x². Then du = 2x dx and it's clear that

∫_1^2 x e^{x²} dx = ∫_1^4 (1/2) e^u du.

What if we reverse the roles of u and x? In particular, we can write x as a function of u. In this case, x = √u and dx/du = (1/2)u^{−1/2}. Thus

∫_{√1}^{√4} f(x) dx = ∫_{√1}^{√4} x e^{x²} dx
  = ∫_1^4 (1/2) e^u du
  = ∫_1^4 √u · e^{(√u)²} · (1/(2√u)) du
  = ∫_1^4 f(x(u)) (dx/du) du.

In general, if x is some function of u, denoted x(u), and if x(c) = a and x(d) = b, then

∫_a^b f(x) dx = ∫_c^d f(x(u)) (dx/du) du.

The equation above is just the rule of substitution written in a slightly different form than you are perhaps used to. We will soon see that the change of coordinates leads to a very similar looking equation.
The Jacobian determinant

Suppose we have a rectangle S in the u, v plane which has (u₀, v₀) as its lower left corner, (u₁, v₁) as its lower right corner and (u₂, v₂) as its upper left corner. Further suppose S has dimensions Δu, Δv. If T(u, v) = (x(u, v), y(u, v)) is a transformation, we would like to know what the image of S looks like. Indeed, we want to relate the double integral over S to a double integral over T(S). In order to do this we need to know the area of T(S).

We can think of the transformation T as a map which takes the u, v plane and stretches it in some fashion. To compute the area of T(S) we'd like to find some mathematical object which encodes how T stretches the u, v plane at particular points. The Jacobian determinant turns out to be this object. It arises quite naturally.

To this end suppose that T(u₀, v₀) = (x₀, y₀). The vector equation

r(u, v) := (x(u, v), y(u, v))

is the position vector which points to T(u, v). In other words, r(u, v) points to the image of the point (u, v) under the transformation T. In order to see what T(S) looks like we have to consider where T maps the boundary of S.
Let S₁ be the base of S, so that S₁ = {(u, v₀) : u₀ ≤ u ≤ u₁}. Alternatively, S₁ is the line segment described by v = v₀. Then r₁(u) := r(u, v₀) parameterizes the curve T(S₁). (Note that since v₀ is fixed, r(u, v₀) is a function of the single variable u.) The tangent vector to T(S₁) at the point (x₀, y₀) is given by

r_u := r₁′(u₀) = ( ∂x/∂u (u₀, v₀), ∂y/∂u (u₀, v₀) ).

Figure 48: The image of a rectangle S in the u, v plane under the transformation T.
Similarly, the left side of the rectangle is described by S₂ = {(u₀, v) : v₀ ≤ v ≤ v₂}. Now r₂(v) := r(u₀, v) parameterizes T(S₂). The tangent vector to T(S₂) at the point (x₀, y₀) is given by

r_v := r₂′(v₀) = ( ∂x/∂v (u₀, v₀), ∂y/∂v (u₀, v₀) ).

Figure 49: The secant vectors a and b. Here R := T(S).
We can use secant vectors to approximate the area of T(S). In particular, the parallelogram formed by the vectors a := r(u₀ + Δu, v₀) − r(u₀, v₀) and b := r(u₀, v₀ + Δv) − r(u₀, v₀) has area approximately equal to the area of T(S). (See Figure 49.) However, we can also approximate the secant vectors a and b. By definition

r_u = lim_{Δu→0} [r(u₀ + Δu, v₀) − r(u₀, v₀)]/Δu,
r_v = lim_{Δv→0} [r(u₀, v₀ + Δv) − r(u₀, v₀)]/Δv.

This then implies that

a = r(u₀ + Δu, v₀) − r(u₀, v₀) ≈ Δu r_u,
b = r(u₀, v₀ + Δv) − r(u₀, v₀) ≈ Δv r_v.

What this shows is that the area of the parallelogram formed by Δu r_u and Δv r_v is approximately equal to the area of T(S). (See Figure 50.) If we embed Δu r_u and Δv r_v into ℝ³, then the area of the parallelogram formed by Δu r_u and Δv r_v is given by

|Δu r_u × Δv r_v| = |r_u × r_v| Δu Δv,

where

(r_u × r_v) Δu Δv = det [ i  j  k ; ∂x/∂u(u₀, v₀)  ∂y/∂u(u₀, v₀)  0 ; ∂x/∂v(u₀, v₀)  ∂y/∂v(u₀, v₀)  0 ] Δu Δv
  = ( det [ ∂x/∂u(u₀, v₀)  ∂y/∂u(u₀, v₀) ; ∂x/∂v(u₀, v₀)  ∂y/∂v(u₀, v₀) ] Δu Δv ) k
  = ( det [ ∂x/∂u(u₀, v₀)  ∂x/∂v(u₀, v₀) ; ∂y/∂u(u₀, v₀)  ∂y/∂v(u₀, v₀) ] Δu Δv ) k.

If A is the area of T(S) then

A ≈ |r_u × r_v| Δu Δv = | det [ ∂x/∂u(u₀, v₀)  ∂x/∂v(u₀, v₀) ; ∂y/∂u(u₀, v₀)  ∂y/∂v(u₀, v₀) ] | Δu Δv.

The determinant above essentially measures how a rectangle gets stretched by the transformation T, which is precisely what we promised above.
Figure 50: Approximating the area of T(S) by the parallelogram formed by Δu r_u and Δv r_v.
14.13 Definition (Jacobian determinant). Let T(u, v) = (x(u, v), y(u, v)) be a transformation and (u, v) a point in the domain of T. The Jacobian determinant of T at (u, v), denoted by J(u, v), is defined by

J(u, v) := det [ ∂x/∂u(u, v)  ∂x/∂v(u, v) ; ∂y/∂u(u, v)  ∂y/∂v(u, v) ].
Sometimes J(u, v) is denoted by

∂(x, y)/∂(u, v)

to remind us that it represents a stretching factor for a transformation. Compare ∂(x, y)/∂(u, v) with the dx/du that appears in the modified substitution rule above.
14.14 Example. Let T(r, θ) := (r cos θ, r sin θ) be the transformation from polar coordinates to rectangular coordinates. Then

J(r, θ) = det [ cos θ  −r sin θ ; sin θ  r cos θ ] = r cos²θ + r sin²θ = r.
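The Jacobian of the polar transformation can be sanity-checked by finite differences; the sketch below (not part of the notes) approximates the four partials numerically and compares the resulting determinant with the exact value r.

```python
import math

# Sketch: approximate the Jacobian determinant of T(r, θ) = (r cosθ, r sinθ)
# by central finite differences and compare with the exact value J(r, θ) = r.

def T(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_det(r, theta, h=1e-6):
    x_r = (T(r + h, theta)[0] - T(r - h, theta)[0]) / (2 * h)
    y_r = (T(r + h, theta)[1] - T(r - h, theta)[1]) / (2 * h)
    x_t = (T(r, theta + h)[0] - T(r, theta - h)[0]) / (2 * h)
    y_t = (T(r, theta + h)[1] - T(r, theta - h)[1]) / (2 * h)
    return x_r * y_t - x_t * y_r

print(jacobian_det(3.0, 0.7))  # should be close to 3
```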
With the discussion and definition above we have proved the following.

14.15 Proposition (Approximating the area of T(S)). Let S be a rectangle in the u, v plane with dimensions Δu and Δv. Let (u₀, v₀) be the bottom left corner of S. Say T(u, v) = (x(u, v), y(u, v)) is a transformation and A is the area of T(S). Then

A ≈ |J(u₀, v₀)| Δu Δv

where J(u₀, v₀) is the Jacobian determinant of T at (u₀, v₀).
14.16 Corollary (Change of coordinates for double integrals). Suppose that T(u, v) = (x(u, v), y(u, v)) is a C¹ transformation such that J(u, v) ≠ 0. Let S be a region in the u, v plane such that T is defined on S and T is injective on S. If R := T(S) and f(x, y) is continuous on R, then

∬_R f(x, y) dA = ∬_S f(x(u, v), y(u, v)) |J(u, v)| du dv.

Sketch of proof. Divide S into m small rectangles with dimensions Δu and Δv. For i = 1, …, m let Sᵢ denote the i-th small rectangle and set Rᵢ := T(Sᵢ). Pick (uᵢ, vᵢ) to be the bottom left corner of the rectangle Sᵢ. Suppose that T(uᵢ, vᵢ) = (xᵢ, yᵢ). Finally, let ΔAᵢ be the area of Rᵢ. By Proposition 14.15,

∬_R f(x, y) dA ≈ ∑_{i=1}^{m} f(xᵢ, yᵢ) ΔAᵢ
  ≈ ∑ f(x(uᵢ, vᵢ), y(uᵢ, vᵢ)) |J(uᵢ, vᵢ)| Δu Δv
  ≈ ∬_S f(x(u, v), y(u, v)) |J(u, v)| du dv.
14.17 Example. Evaluate
\[
\iint_R \sqrt{x^2 + y^2}\,dA,
\]
where R is the half-annulus with inner radius r_i = 1 and outer radius r_o = 4.
14.18 Solution. We first recognize that R is just the polar rectangle S := {(r, θ) : r_i ≤ r ≤ r_o, 0 ≤ θ ≤ π}. Recall that T(r, θ) := (r cos θ, r sin θ) is the transformation which changes from polar to rectangular coordinates. By Corollary 14.16,
\[
\iint_R \sqrt{x^2 + y^2}\,dA
= \iint_S \sqrt{(r\cos\theta)^2 + (r\sin\theta)^2}\,|J(r, \theta)|\,dr\,d\theta
= \int_0^{\pi}\!\int_1^4 r \cdot r\,dr\,d\theta
= \int_0^{\pi} \frac{63}{3}\,d\theta
= \frac{63\pi}{3} = 21\pi.
\]

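The computation above is easy to sanity-check with a midpoint Riemann sum over the polar rectangle S. A rough sketch (the grid size `n` is an arbitrary choice of ours):

```python
import math

# Midpoint Riemann sum of ∫∫_S r · r dr dθ over S = [1, 4] x [0, π]: the
# integrand √(x² + y²) = r times the Jacobian r. This should approach 21π.
n = 400
dr, dt = 3 / n, math.pi / n
total = 0.0
for i in range(n):
    for j in range(n):
        r = 1 + (i + 0.5) * dr  # midpoint in r; integrand is independent of θ
        total += r * r * dr * dt
print(total, 21 * math.pi)
```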
14.19 Example (An example from probability). The function e^{−x²} is a classic bell-shaped curve and is related to the normal distribution e^{−x²/2}/√(2π). In order for e^{−x²/2}/√(2π) to be a distribution we require that
\[
\int_{-\infty}^{\infty} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx = 1.
\]
This will follow from the fact that
\[
A := \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.
\]
But how do we show that A = √π? You might recall that e^{−x²} has no elementary antiderivative.
The trick is to compute A² using a double integral. Note that
\[
A^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right)
= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-x^2} e^{-y^2}\,dy\,dx
= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(x^2 + y^2)}\,dy\,dx,
\]
which follows from the fact that ∫_a^b ∫_c^d f(x)g(y) dy dx = (∫_a^b f(x) dx)(∫_c^d g(y) dy). The trick now is to change to polar coordinates. Note that the integral above has R² as the domain of integration. In polar coordinates, R² = {(r, θ) : 0 ≤ r < ∞, 0 ≤ θ < 2π}. Thus
\[
A^2 = \int_0^{2\pi}\!\int_0^{\infty} e^{-((r\cos\theta)^2 + (r\sin\theta)^2)}\,r\,dr\,d\theta
= \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta.
\]
Now make the substitution u = r² to get
\[
A^2 = \int_0^{2\pi} \left(\frac{1}{2}\int_0^{\infty} e^{-u}\,du\right) d\theta
= \int_0^{2\pi} \frac{1}{2}\,d\theta = \pi.
\]
Since A² = π we have that A = √π.
Now if we let w = x/√2, then
\[
\int_{-\infty}^{\infty} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx
= \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-w^2}\,dw
= \frac{\sqrt{\pi}}{\sqrt{\pi}} = 1.
\]
This is precisely what we wanted to show and is pretty cool.
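Even though e^{−x²} has no elementary antiderivative, the value √π is easy to confirm numerically. A minimal sketch (the truncation at |x| = 6 and grid size are our own choices; the truncated tail is smaller than e^{−36}):

```python
import math

# Midpoint Riemann sum for A = ∫ e^{-x²} dx over [-6, 6], which should
# agree with √π to high accuracy.
n = 200000
a, b = -6.0, 6.0
h = (b - a) / n
A = sum(math.exp(-((a + (k + 0.5) * h) ** 2)) for k in range(n)) * h
print(A, math.sqrt(math.pi))  # both ≈ 1.7724538509...
```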
14.20 Example. Let R be the parallelogram in the x, y plane which has vertices (0, 0), (1, 2), (3, 3), and (2, 1). Compute
\[
\iint_R e^x\,dA.
\]
14.21 Solution. The parallelogram R is not a particularly nice region to integrate over. Our task is to find a transformation from the x, y plane into the u, v plane which maps R to a suitable region in the u, v plane. However, to use Corollary 14.16 we need a transformation T from the u, v plane to the x, y plane. So our search for a transformation from the x, y plane to the u, v plane is really the search for some inverse transformation T⁻¹.

What might T⁻¹ be? It would be nice if T⁻¹(R) were some rectangle S. Let R_1 be the line segment joining (0, 0) to (2, 1). On this line segment y = (1/2)x. We can map R_1 to the base of the u, v square S = [0, 3] × [0, 3] via the transformation T⁻¹(x, y) = (2x − y, 2y − x), since T⁻¹|_{R_1} = ((3/2)x, 0). In other words, T⁻¹ maps R_1 to the u, v line segment joining (0, 0) to (3, 0).

Let R_2 be the line segment joining (2, 1) to (3, 3). Then on R_2, y = 2x − 3, and so T⁻¹|_{R_2}(x, y) = (3, 3x − 6); thus T⁻¹ maps R_2 onto the u, v line segment connecting (3, 0) to (3, 3).

Let R_3 be the line segment connecting (3, 3) to (1, 2). Then on R_3, y = (1/2)x + 3/2 and thus T⁻¹|_{R_3}(x, y) = ((3/2)(x − 1), 3). Since 1 ≤ x ≤ 3 we know that T⁻¹ maps R_3 onto the u, v line segment connecting (3, 3) to (0, 3).

Finally, let R_4 be the line segment connecting (1, 2) to (0, 0). On R_4, y = 2x and so T⁻¹|_{R_4} = (0, 3x). It is an easy consequence that T⁻¹ maps R_4 onto the u, v line segment joining (0, 3) to (0, 0).
Since u = 2x − y and v = 2y − x, we can solve for x and y in terms of u and v to find T. In particular,
\[
\begin{aligned} u &= 2x - y \\ v &= 2y - x \end{aligned}
\;\implies\;
\begin{aligned} 2u &= 4x - 2y \\ 2v &= 4y - 2x \end{aligned}
\;\implies\;
\begin{aligned} 2u + v &= 3x \\ 2v + u &= 3y \end{aligned}
\;\implies\;
\begin{aligned} x &= \tfrac{2}{3}u + \tfrac{1}{3}v \\ y &= \tfrac{2}{3}v + \tfrac{1}{3}u. \end{aligned}
\]
Thus T(u, v) = ((2/3)u + (1/3)v, (2/3)v + (1/3)u). We have already shown that T(S) = R. Since we need to compute the Jacobian determinant of T to integrate, let's do that now. Note that
\[
|J(u, v)| = \left|\det\begin{pmatrix} 2/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix}\right|
= \left|\frac{4}{9} - \frac{1}{9}\right| = \frac{1}{3}.
\]
Now by Corollary 14.16,
\[
\iint_R e^x\,dA
= \int_0^3\!\int_0^3 e^{2u/3 + v/3}\cdot\frac{1}{3}\,du\,dv
= \frac{1}{3}\left(\int_0^3 e^{2u/3}\,du\right)\left(\int_0^3 e^{v/3}\,dv\right)
= \frac{1}{3}\left[\frac{3}{2}e^{2u/3}\right]_0^3\left[3e^{v/3}\right]_0^3
= \frac{3}{2}(e^2 - 1)(e - 1).
\]

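The transformed integral over the square S lends itself to a quick numerical check. A sketch (grid size `n` is our own choice):

```python
import math

# Midpoint Riemann sum of ∫₀³∫₀³ e^{(2u+v)/3} · (1/3) du dv over the square
# S = [0,3] x [0,3], which should approach (3/2)(e² - 1)(e - 1).
n = 500
h = 3 / n
total = 0.0
for i in range(n):
    for j in range(n):
        u, v = (i + 0.5) * h, (j + 0.5) * h
        total += math.exp((2 * u + v) / 3) / 3 * h * h
expected = 1.5 * (math.e ** 2 - 1) * (math.e - 1)
print(total, expected)
```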
14.4 Exercises
14.22 Exercise. Rogawski 15.4: 1, 3, 5, 7, 13, 15, 19, 23, 27, 29, 41, 43, 53
15 Applications
In this section we explore a few applications of double integrals.
15.1 Density
15.1 Proposition (Mass). Suppose that ρ(x, y) represents the density of an object which has shape D ⊆ R². Then the object's mass m is given by
\[
m = \iint_D \rho(x, y)\,dA.
\]
Proof. Divide the region D into n small rectangles with dimensions Δx and Δy and area ΔA. Then if (x_i, y_i) is a point in the ith rectangle, ρ(x_i, y_i) ΔA approximates the mass within the ith rectangle. Thus,
\[
m \approx \sum_{i=1}^{n} \rho(x_i, y_i)\,\Delta A.
\]
This implies that
\[
m = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \sum \rho(x_i, y_i)\,\Delta A = \iint_D \rho(x, y)\,dA.
\]
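A Riemann sum like the one in the proof is easy to carry out numerically. A sketch for a sample lamina (the region and density are our own illustrative choices; since the density is separable, the double sum factors into two single sums):

```python
# Mass of the lamina D = [0, 2] x [0, 1] with density ρ(x, y) = xy via a
# midpoint Riemann sum. Exactly, m = (∫₀² x dx)(∫₀¹ y dy) = 2 · (1/2) = 1.
n = 1000
dx, dy = 2 / n, 1 / n
m = sum((i + 0.5) * dx * dx for i in range(n)) * sum((j + 0.5) * dy * dy for j in range(n))
print(m)  # ≈ 1.0
```

The midpoint rule is exact for densities that are linear in each variable, so here the sum recovers the mass up to floating-point roundoff.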
15.2 Proposition (Total charge). Suppose that an electric charge is distributed over a region D ⊆ R² and the charge density at the point (x, y) ∈ D is given by σ(x, y). Then the total charge Q is given by
\[
Q = \iint_D \sigma(x, y)\,dA.
\]
Proof. The proof is identical to the one used in Proposition 15.1.
15.2 Center of mass
15.3 Definition (Moment of a particle). Suppose a particle has mass m and distance d from some axis. Then the moment of the particle about the axis is the product m·d.

15.4 Proposition (Moment of a region). Let D ⊆ R² have a density given by ρ(x, y) for (x, y) ∈ D. The moment of D about the x-axis is given by
\[
M_x := \iint_D y\,\rho(x, y)\,dA.
\]
Likewise, the moment of D about the y-axis is given by
\[
M_y := \iint_D x\,\rho(x, y)\,dA.
\]
Proof. The proofs for both cases are identical, so let's consider the moment about the x-axis. Divide D into n rectangles with dimensions Δx, Δy and area ΔA. Let (x_i, y_i) be a point in the ith rectangle. Then the mass of the ith rectangle is approximated by ρ(x_i, y_i) ΔA. Now consider the rectangle's mass as a point mass. In other words, we can think of the mass of the rectangle as being concentrated at the point (x_i, y_i). The moment of this point mass about the x-axis is given by (ρ(x_i, y_i) ΔA)·y_i.
This shows that
\[
M_x \approx \sum_{i=1}^{n} y_i\,\rho(x_i, y_i)\,\Delta A,
\]
and thus the moment about the x-axis is given by
\[
M_x = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \sum y_i\,\rho(x_i, y_i)\,\Delta A = \iint_D y\,\rho(x, y)\,dA.
\]
15.5 Definition (Center of mass). Let D ⊆ R² have total mass m and M_x and M_y as moments about the x and y axes respectively. The center of mass of D, denoted (x̄, ȳ), is defined by the equations x̄ = M_y/m and ȳ = M_x/m.
We can imagine that the mass of D is concentrated at the center of mass (x̄, ȳ). Another way to think of this is that if D were supported only at (x̄, ȳ) it would balance horizontally.
Figure 51: The center of mass (x̄, ȳ) of the region D.

15.6 Proposition. Let D ⊆ R² be a region with density function ρ(x, y) and center of mass (x̄, ȳ). If M_x and M_y are the moments of D about the x and y axes respectively, then
\[
\bar{x} = \frac{M_y}{m} = \frac{1}{m}\iint_D x\,\rho(x, y)\,dA, \qquad
\bar{y} = \frac{M_x}{m} = \frac{1}{m}\iint_D y\,\rho(x, y)\,dA.
\]
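These formulas can be exercised numerically with the same Riemann sums used in the proofs above. A sketch for a region and density of our own choosing, where the moments have a simple closed form:

```python
# Center of mass of D = [0, 1]² with density ρ(x, y) = x + y. By symmetry
# x̄ = ȳ; exactly, m = 1 and M_y = ∫∫ x(x + y) dA = 1/3 + 1/4 = 7/12.
n = 400
h = 1 / n
m = My = Mx = 0.0
for i in range(n):
    for j in range(n):
        x, y = (i + 0.5) * h, (j + 0.5) * h
        dm = (x + y) * h * h   # approximate mass of this small rectangle
        m += dm
        My += x * dm           # moment about the y-axis
        Mx += y * dm           # moment about the x-axis
print(My / m, Mx / m)  # both ≈ 7/12 ≈ 0.5833
```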
15.3 Probability
In the context of probability we call real-valued functions random variables. Typically a random variable X represents some event. For instance, X might represent the cholesterol level of an individual. We would then be interested in knowing the probability of a random individual having a cholesterol level between 100 and 250. We represent this probability by P(a ≤ X ≤ b).
Associated to every random variable X is a probability density function f. The probability density function satisfies
\[
P(a \le X \le b) = \int_a^b f(x)\,dx, \qquad\text{and}\qquad \int_{\mathbb{R}} f(x)\,dx = 1.
\]
We can extend this to two variables.

15.7 Definition (Joint density function). Suppose that f(x, y) is a function such that f(x, y) ≥ 0 and
\[
\iint_{\mathbb{R}^2} f(x, y)\,dA = 1.
\]
Let X and Y be two random variables such that the probability that (X, Y) lies in the region D is given by
\[
\iint_D f(x, y)\,dA.
\]
Then we say that f(x, y) is the joint density function of X and Y.

15.8 Definition (Expected value). Let X and Y be random variables with joint density function f(x, y). Then the expected values of X and Y are defined respectively by
\[
\mu_1 := \iint_{\mathbb{R}^2} x\,f(x, y)\,dA, \qquad
\mu_2 := \iint_{\mathbb{R}^2} y\,f(x, y)\,dA.
\]
Sometimes the expected values are called the X-mean and the Y-mean.
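Both conditions on a joint density and the two expected values can be checked by direct summation. A sketch for a density of our own choosing (not one of the exercises below), supported on the unit square:

```python
# For the joint density f(x, y) = 6xy² on [0,1]² (zero elsewhere): the total
# probability is 1, and exactly μ₁ = 6·(1/3)(1/3) = 2/3, μ₂ = 6·(1/2)(1/4) = 3/4.
n = 400
h = 1 / n
total = mu1 = mu2 = 0.0
for i in range(n):
    for j in range(n):
        x, y = (i + 0.5) * h, (j + 0.5) * h
        p = 6 * x * y ** 2 * h * h  # probability mass of this small rectangle
        total += p
        mu1 += x * p
        mu2 += y * p
print(total, mu1, mu2)
```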
15.9 Definition (Independent random variables). Suppose that X is a random variable with probability density function f_1(x) and Y is a random variable with probability density function f_2(y). Then X and Y are called independent random variables if their joint density function is f(x, y) = f_1(x)·f_2(y).
15.10 Definition (Normally distributed random variable). A random variable X is said to be normally distributed if its probability density function is of the form
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)},
\]
where μ is the mean and σ is the standard deviation.
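That this density integrates to 1 for any μ and σ follows from the Gaussian integral of Example 14.19 after a substitution; it can also be confirmed numerically. A sketch with hypothetical values of μ and σ (the truncation at μ ± 8σ loses a negligible tail):

```python
import math

# Numerically integrate the normal density with mean μ = 1.5 and standard
# deviation σ = 0.7 over [μ - 8σ, μ + 8σ]; the result should be ≈ 1.
mu, sigma = 1.5, 0.7

def f(x):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

n = 100000
a, b = mu - 8 * sigma, mu + 8 * sigma
h = (b - a) / n
total = sum(f(a + (k + 0.5) * h) for k in range(n)) * h
print(total)  # ≈ 1.0
```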
15.4 Exercises
15.11 Exercise. Find the mass and center of mass of the region D which has the given density function ρ.
(a) D = [0, 2] × [−1, 1]; ρ(x, y) = xy².
(b) D = [0, a] × [0, b]; ρ(x, y) = cxy, where a, b, c ∈ R.
(c) D is bounded by the parabolas y = x² and x = y²; ρ(x, y) = √x.

15.12 Exercise. Find the center of mass of an isosceles right triangle with equal sides of length a if its density at a point is proportional to the square of the distance from that point to the vertex opposite the hypotenuse.

15.13 Exercise. Find the mass and center of mass for the unit half-circle C if it has density function ρ(r, θ) = 1 − r.
15.14 Exercise. Suppose that an electric charge is distributed over the disk x² + y² ≤ 4 such that the charge density is given by σ(x, y) = x + y + x² + y². Find the total charge over the disk.

15.15 Exercise. Let X and Y be random variables with joint density function
\[
f(x, y) = \begin{cases} 4xy & \text{if } 0 \le x \le 1,\ 0 \le y \le 1, \\ 0 & \text{else.} \end{cases}
\]
(a) Verify that f is a joint density function.
(b) Find P(Y ≤ 1).
(c) Find P(X ≤ 2, Y ≤ 4).
(d) Find the expected values of X and Y.

15.16 Exercise. Let C ∈ R be a constant. Suppose that X and Y are random variables which have joint density function
\[
f(x, y) = \begin{cases} C(x + y) & \text{if } 0 \le x \le 3,\ 0 \le y \le 2, \\ 0 & \text{else.} \end{cases}
\]
(a) Find the value of the constant C.
(b) Find P(X ≤ 2, Y ≤ 1).
(c) Find P(X + Y ≤ 1).
15.17 Exercise. Rogawski 15.5: 1, 3, 5, 9, 23, 27, 49, 51, 54, 55
16 Vector Fields
The flow of a river and the gravitational forces acting on a space shuttle can both be modeled elegantly with vector fields. In a similar vein, the work required to move a charged particle through an electromagnetic field can be computed via a line integral.

16.1 Introduction to Vector Fields
16.1 Definition (Vector field). A vector field in Rⁿ is a vector-valued function F: Rⁿ → Rⁿ. We usually write F(x_1, ..., x_n) = (F_1(x_1, ..., x_n), ..., F_n(x_1, ..., x_n)). The real-valued functions F_i: Rⁿ → R (for i ∈ {1, ..., n}) are called the component functions of F. You can think of F as assigning to each point (x_1, ..., x_n) in Rⁿ some vector F(x_1, ..., x_n).
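In code, a vector field is just a function returning a tuple of component values. A sketch with an illustrative field of our own choosing:

```python
# The rotational vector field F(x, y) = (-y, x) in R². Each vector is
# perpendicular to the position vector, so the field is tangent to circles
# centered at the origin.
def F(x, y):
    return (-y, x)

v = F(3.0, 4.0)
# Dot product F(x, y) · (x, y) vanishes, confirming perpendicularity.
print(v, v[0] * 3.0 + v[1] * 4.0)  # (-4.0, 3.0) 0.0
```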
16.2 Path and Line Integrals
16.3 Conservative Vector Fields
16.4 Exercises
16.2 Exercise. Rogawski 16.1: 1, 5, 15, 21, 27, 31, 33
16.2: ?
16.3: ?
Index
arc length, 34
arc length
as a parameter, 35
function, 35
average value, 70
boundary, 57
boundary
point, 57
bounded set, 58
Cartesian coordinates, 5
closed, 57
component function, 29
conditional, 4
continuity, 42
convergence, 41
coordinate vectors, 19
coordinates
cylindrical, 11
rectangular, 10
spherical, 11
critical point, 59
cross product, 17
cross product
determinant, 20
in coordinates, 19
parallelogram, 17
properties, 19
curvature, 36
cylindrical coordinates, 11
derivative
directional, 52
differential, 49
directional derivative, 52
Directional derivative
dot product form, 52
dot product, 14
dot product
properties, 15
double integral, 68
elements, 4
Euclidean space, 8
expected values, 92
extrema, 57
extreme value theorem, 59
Fubini's theorem, 72
function, 6
function
continuous, 42
multivariable, 39
gradient, 52
graph, 39
image, 81
implication, 4
increment, 49
independent random variables, 92
injective, 81
inner product, 8
integral
double, 68
partial, 70
integrals
iterated, 71
integration
properties, 70
interior, 57
interior
point, 57
intersection, 6
∩, 6
joint density function, 92
Lagrange multiplier, 62
Lagrange multiplier
method of, 63
level set, 39
limit, 41
line
parallel, 24
parametric equations, 23
skew, 24
symmetric equations, 23
vector equation, 23
linear approximation, 48
linear transformation, 82
linearization, 48
mass, 90
matrix, 20
matrix
determinant, 20
maxima, 57
minima, 57
multivariable function, 39
normal vector, 24
open, 57
ordered pair, 5
orthogonal, 16
parallel vectors, 14
parallelepiped, 21
parameter, 29
parameterization, 29
parametric equation, 29
parametric equations, 23
partial derivative, 43
plane, 25
plane
angle between two, 26
dot product, 25
linear equation, 25
matrix equation, 25
normal vector, 24
parallel, 26
vector equation, 25
polar coordinates, 10
polar rectangle, 79
pre-image, 39
projection, 27
rectangular coordinates, 10
region
type I, 73
type II, 73
saddle point, 59
second derivative test, 59
second partial derivatives, 44
set, 4
set
binary relation, 6
product, 5
skew, 24
spherical coordinates, 11
standard basis vectors, 12
subset, 6
⊆, 6
symmetric equations, 23
tangent plane, 47
tangent vector
unit, 36
total charge, 90
trace, 29
transformation, 81
transformation
linear, 82
triple scalar product, 19, 21
tuple, 5
type I region, 73
type II region, 73
union, 6
∪, 6
unit tangent vector, 36
unit vector, 12
vector
(abstract) addition, 12
abstract, 12
coordinate vectors, 19
cross product, 17
dot product, 14
normal, 24
orthogonal, 16
parallel, 14
projection, 27
span, 14
tip to tail, 12
unit, 12
zero vector, 12
vector field, 94
vector function, 29
vector function
component function, 29
continuity, 29
derivative, 30
integral, 31
limit, 30
Young's theorem, 45