
Chapter 8

Functions of several variables


8.1

Limits and Continuity

Real-valued functions of several independent real variables are defined much as one would expect from the single-variable case. The domains are sets of ordered n-tuples of real numbers. Let
$$\mathbb{R}^n = \{(x_1, \dots, x_n) \mid x_i \in \mathbb{R},\ i = 1, \dots, n\}.$$
We will mostly work in the spaces with n = 2 or 3.
Definition 8.1.1 A real-valued function of n variables is a function f from a subset $D \subset \mathbb{R}^n$ into $\mathbb{R}$:
$$f : D \to \mathbb{R}, \qquad f(x_1, \dots, x_n) = w.$$
The set D is called the domain of f, w is the dependent variable of f, and $x_1, \dots, x_n$ are called the independent variables.
Let $D \subset \mathbb{R}^n$ be a subset of $\mathbb{R}^n$. A point $P = (p_1, \dots, p_n)$ is an interior point of D if there is a positive number $\varepsilon > 0$ such that the set of points
$$B_\varepsilon(P) = \{X = (x_1, \dots, x_n) \in \mathbb{R}^n \mid \sqrt{(p_1 - x_1)^2 + \cdots + (p_n - x_n)^2} < \varepsilon\},$$
which is called an open ball of radius $\varepsilon$ centered at P, is contained in D. The set of all interior points of D is denoted by Int(D).
A point $P = (p_1, \dots, p_n)$ is a boundary point of D if every ball centered at P contains points that lie in D as well as points that lie outside of D. The set of all boundary points of D is denoted by $\partial(D)$.

Then it is clear that $\mathrm{Int}(D) \cap \partial(D) = \emptyset$. A region D is open if $D = \mathrm{Int}(D)$, and closed if $D = \mathrm{Int}(D) \cup \partial(D)$.
A region D is bounded if it is contained in a ball of a fixed radius, and
unbounded if it is not bounded.
Let $f : D \to \mathbb{R}$ be a function on $D \subset \mathbb{R}^n$. The graph of f is the set
$$\{(x_1, \dots, x_n, w) \in \mathbb{R}^{n+1} \mid w = f(x_1, \dots, x_n),\ (x_1, \dots, x_n) \in D\}.$$
When n = 2, the graph is also called a surface: z = f(x, y). The set of all points in D at which f takes a constant value c,
$$\{(x_1, \dots, x_n) \in D \mid c = f(x_1, \dots, x_n)\},$$
is called a level set of f. When n = 2, it is called a level curve, and when n = 3, it is called a level surface. For example, if z = f(x, y) is the height of a mountain, the level curves c = f(x, y) are the contour curves in the domain D.
The following theory holds for arbitrary dimension n with a proper adjustment of the number of variables. However, for convenience, we state it only for n = 2 or 3.
Definition 8.1.2 Let z = f(x, y) be a function on a domain $D \subset \mathbb{R}^2$.
(1) A number L is called the limit of f at $X_0 = (x_0, y_0) \in \mathbb{R}^2$, which is not necessarily in D, denoted by
$$\lim_{X \to X_0} f(X) = L,$$
if, for every number $\varepsilon > 0$, there exists a corresponding number $\delta > 0$ such that
$$|f(X) - L| < \varepsilon \quad \text{for all points } X = (x, y) \in D \text{ with } 0 < |X - X_0| < \delta.$$
(2) z = f(x, y) is said to be continuous at $X_0$ if
$$\lim_{X \to X_0} f(X) = f(X_0):$$
that is, f is defined at $X_0 \in D$ and it has the limit $f(X_0)$ at $X_0$.
(3) z = f(x, y) is said to be continuous on D if it is continuous at every point in D.

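Example 8.1.1 Consider $f(x, y) = \dfrac{4xy^2}{x^2 + y^2}$ on $D = \mathbb{R}^2 \setminus \{(0, 0)\}$. For X = (0, y) or X = (x, 0) in D, f(X) = 0. Thus we take L = 0 as the candidate limit at $X_0 = (0, 0)$. To show $\lim_{X \to X_0} f(X) = 0$: for $\varepsilon > 0$ given, we want to find a $\delta > 0$ such that
$$\left| \frac{4xy^2}{x^2 + y^2} - 0 \right| = \frac{4|x|y^2}{x^2 + y^2} < \varepsilon, \quad \text{whenever } 0 < \sqrt{x^2 + y^2} < \delta.$$
However, since $y^2 \le x^2 + y^2$,
$$\frac{4|x|y^2}{x^2 + y^2} \le 4|x| = 4\sqrt{x^2} \le 4\sqrt{x^2 + y^2}.$$
Thus, if we choose $\delta = \varepsilon/4$, then, for any (x, y) with $0 < \sqrt{x^2 + y^2} < \delta$,
$$\left| \frac{4xy^2}{x^2 + y^2} \right| = \frac{4|x|y^2}{x^2 + y^2} \le 4\sqrt{x^2 + y^2} < 4\delta = 4 \cdot \frac{\varepsilon}{4} = \varepsilon.$$
Hence, if we define f(0, 0) = 0, then f becomes a continuous function on $\mathbb{R}^2$.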

Example 8.1.2 Consider $f(x, y) = \dfrac{2xy}{x^2 + y^2}$ on $D = \mathbb{R}^2 \setminus \{(0, 0)\}$. For $X_0 = (0, 0) \notin D$, if we choose X = (x, y) with y = mx, then
$$f(x, y) = \frac{2mx^2}{x^2(1 + m^2)} = \frac{2m}{1 + m^2},$$
which depends on the value m. That is, f has no limit at $X_0 = (0, 0)$, and so there is no way to define f(0, 0) to make it continuous at $X_0$.
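The path dependence in Example 8.1.2 can be observed numerically. The following is a minimal sketch; the helper name `value_along_line` is introduced here for illustration only. It evaluates f at points approaching the origin along y = mx and compares with 2m/(1 + m^2).

```python
def f(x, y):
    """f(x, y) = 2xy / (x^2 + y^2), defined away from the origin."""
    return 2 * x * y / (x * x + y * y)


def value_along_line(m, x):
    """Evaluate f at the point (x, m*x) on the line y = m*x (x != 0)."""
    return f(x, m * x)


if __name__ == "__main__":
    for m in (0.0, 1.0, 2.0, -1.0):
        # Approach the origin along y = m*x with x = 10^-1, ..., 10^-5.
        samples = [value_along_line(m, 10.0 ** (-k)) for k in range(1, 6)]
        # The samples stay at 2m / (1 + m^2), a value that changes with m.
        print(m, samples, 2 * m / (1 + m * m))
```

Since each line yields a different constant value, no single number L can serve as the limit at the origin.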

Theorem 8.1.1 The general rules of the arithmetic of continuous functions hold: that is, if f and g are continuous at $X_0 = (x_0, y_0)$ and $k \in \mathbb{R}$, then so are $kf$, $f \pm g$, $fg$, and $f/g$ provided $g(X_0) \ne 0$. Moreover, if h(z) = w is a continuous function at $z_0 = f(x_0, y_0)$, then so is $w = (h \circ f)(x, y)$ at $X_0$.

8.2

Partial Derivatives

Let z = f(x, y) be a function on $D \subset \mathbb{R}^2$, and $X_0 = (x_0, y_0) \in D$. For the fixed value $y = y_0$, $z = f(x, y_0)$ is a function of x only, whose graph passes through $x_0$ and is the intersection of the graph z = f(x, y) and the vertical plane $y = y_0$.


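The derivative of this function of the single variable x is called the partial derivative of f with respect to x at $x = x_0$:
$$\left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0)} = \left.\frac{\partial z}{\partial x}\right|_{(x_0, y_0)} = \left.\frac{d}{dx} f(x, y_0)\right|_{x = x_0} = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h} = f_x(x_0, y_0) = z_x.$$

[Figure: the surface z = f(x, y) with the curves z = f(x, y0) and z = f(x0, y) through the point above (x0, y0), and their tangent lines with slopes fx(X0) and fy(X0).]

Similarly for y:
$$\left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0)} = \left.\frac{\partial z}{\partial y}\right|_{(x_0, y_0)} = \left.\frac{d}{dy} f(x_0, y)\right|_{y = y_0} = \lim_{h \to 0} \frac{f(x_0, y_0 + h) - f(x_0, y_0)}{h} = f_y(x_0, y_0) = z_y.$$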

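Example 8.2.1 For $f(x, y) = \dfrac{2y}{y + \cos x}$, find $f_x$ and $f_y$.
$$f_x(x, y) = \frac{(y + \cos x)\frac{\partial}{\partial x}(2y) - 2y\frac{\partial}{\partial x}(y + \cos x)}{(y + \cos x)^2} = \frac{2y \sin x}{(y + \cos x)^2},$$
$$f_y(x, y) = \frac{(y + \cos x)\frac{\partial}{\partial y}(2y) - 2y\frac{\partial}{\partial y}(y + \cos x)}{(y + \cos x)^2} = \frac{2\cos x}{(y + \cos x)^2}.$$

Example 8.2.2 Let z = f(x, y) satisfy $yz - \ln z = x + y$. Find $z_x$ and $z_y$.
Differentiating both sides with respect to x,
$$\frac{\partial}{\partial x}(yz) - \frac{\partial}{\partial x}(\ln z) = \frac{\partial x}{\partial x} + \frac{\partial y}{\partial x}, \qquad y\frac{\partial z}{\partial x} - \frac{1}{z}\frac{\partial z}{\partial x} = 1 + 0, \qquad \frac{\partial z}{\partial x} = \frac{z}{yz - 1}.$$
Similarly for $\dfrac{\partial z}{\partial y}$: from $z + y z_y - \dfrac{1}{z} z_y = 1$ we get $\dfrac{\partial z}{\partial y} = \dfrac{z(1 - z)}{yz - 1}$.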


Example 8.2.3 Let
$$z = f(x, y) = \begin{cases} 0, & \text{if } xy \ne 0, \\ 1, & \text{if } xy = 0. \end{cases}$$
Then $f_x(0, 0)$ and $f_y(0, 0)$ exist, but f is not continuous at (0, 0).

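Higher order partial derivatives are defined as usual:
$$\frac{\partial^2 f}{\partial x^2} = f_{xx}, \qquad \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = (f_y)_x = f_{yx}, \qquad \text{etc.}$$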

Theorem 8.2.1 Suppose that z = f(x, y) has continuous second partial derivatives on an open domain $D \subset \mathbb{R}^2$. Then the mixed partial derivatives are equal:
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}.$$
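Proof: For a fixed point $X_0 = (x_0, y_0) \in D$, consider
$$F(\Delta x, \Delta y) = f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0) - f(x_0, y_0 + \Delta y) + f(x_0, y_0).$$
For fixed $y_0$ and $\Delta y$, define $g(x) = f(x, y_0 + \Delta y) - f(x, y_0)$. Then $F(\Delta x, \Delta y) = g(x_0 + \Delta x) - g(x_0)$. By the mean value theorem for functions of one variable, there is a number $c \in [x_0, x_0 + \Delta x]$ such that
$$F(\Delta x, \Delta y) = g(x_0 + \Delta x) - g(x_0) = g'(c)\Delta x = [f_x(c, y_0 + \Delta y) - f_x(c, y_0)]\Delta x.$$
By the mean value theorem again,
$$F(\Delta x, \Delta y) = \frac{\partial^2 f}{\partial y \partial x}(c, d)\,\Delta x \Delta y, \qquad d \in [y_0, y_0 + \Delta y].$$
Since $\dfrac{\partial^2 f}{\partial y \partial x}$ is continuous, we have
$$\frac{\partial^2 f}{\partial y \partial x}(x_0, y_0) = \lim_{(\Delta x, \Delta y) \to (0, 0)} \frac{1}{\Delta x \Delta y} F(\Delta x, \Delta y) = \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0),$$
where the second equality is obtained similarly from $h(y) = f(x_0 + \Delta x, y) - f(x_0, y)$ and $F(\Delta x, \Delta y) = h(y_0 + \Delta y) - h(y_0)$.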

Example 8.2.4 For $f(x, y) = xe^y + yx^2$, one can easily show that
$$\frac{\partial^2 f}{\partial y \partial x} = e^y + 2x = \frac{\partial^2 f}{\partial x \partial y}.$$
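As a numerical illustration of the proof of Theorem 8.2.1, the quotient $F(\Delta x, \Delta y)/(\Delta x \Delta y)$ can be evaluated for the function of Example 8.2.4 and compared with $e^y + 2x$. This is only a sketch: the function names, the sample point (1, 0.5), and the step sizes are choices made here for illustration.

```python
import math


def f(x, y):
    """The function of Example 8.2.4: f(x, y) = x e^y + y x^2."""
    return x * math.exp(y) + y * x * x


def second_difference_quotient(func, x0, y0, dx, dy):
    """F(dx, dy) / (dx * dy), with F the second difference used in the proof."""
    F = (func(x0 + dx, y0 + dy) - func(x0 + dx, y0)
         - func(x0, y0 + dy) + func(x0, y0))
    return F / (dx * dy)


def exact_mixed_partial(x, y):
    """Closed form of the mixed partial derivative: e^y + 2x."""
    return math.exp(y) + 2 * x


if __name__ == "__main__":
    for step in (1e-2, 1e-3, 1e-4):
        approx = second_difference_quotient(f, 1.0, 0.5, step, step)
        print(step, approx, exact_mixed_partial(1.0, 0.5))
```

The quotient is symmetric in the roles of x and y, which is the reason both mixed partials arise as its limit.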

8.3

Differentiability

Recall that, for a differentiable function y = f(x) of a single variable $x \in I = [a, b]$, the difference quotient $\dfrac{\Delta y}{\Delta x}$ as x changes from $x_0 \in I$ to $x_0 + \Delta x$ satisfies
$$\frac{\Delta y}{\Delta x} = f'(x_0) + \varepsilon \quad (= f'(c) \text{ for some } c \in [x_0, x_0 + \Delta x]),$$
where $\varepsilon \to 0$ as $\Delta x \to 0$, or
$$\Delta y = f'(x_0)\Delta x + \varepsilon \Delta x.$$
If f'(x) is continuous at $x_0$, then this equation becomes
$$dy = f'(x_0)\,dx, \quad \text{as } \Delta x \to 0,$$
which is called the total differential of f at $x_0$. For $\Delta x$ small, $\Delta y \approx f'(x_0)\Delta x$ is called the linear approximation of f: that is,
$$f(x) \approx f(x_0) + f'(x_0)(x - x_0) = L(x),$$
the right side of which is the equation of the tangent line through $(x_0, f(x_0))$ of the graph of f. The error term $\varepsilon \Delta x$ was computed from the Taylor polynomial of f.

[Figure: the graph of f(x) and its tangent line L(x) = f(x0) + f'(x0)(x - x0) between x0 and x = x0 + Δx, showing Δy, f'(x0)Δx, and the error εΔx.]

The above formula can be used to define the differentiability of functions of more than one variable, $f : \mathbb{R}^n \to \mathbb{R}$. However, it is enough to work with functions of two variables:
Definition 8.3.1 A function z = f(x, y) on D is said to be differentiable at $X_0 = (x_0, y_0) \in D$ if $f_x$ and $f_y$ exist at $X_0$, and
$$\Delta z = f(X) - f(X_0) = f_x(X_0)\Delta x + f_y(X_0)\Delta y + \varepsilon_1 \Delta x + \varepsilon_2 \Delta y,$$
where $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta x, \Delta y \to 0$. f is said to be differentiable on D if it is differentiable at every point of D.


In fact, for functions of more than one variable, we have the following theorem:
Theorem 8.3.1 Let z = f(x, y) be a function on an open domain $D \subset \mathbb{R}^2$. Suppose that $f_x$ and $f_y$ are defined on D and continuous at $X_0 = (x_0, y_0) \in D$. Then the increment $\Delta z = f(X) - f(X_0)$ of f from $X_0$ to $X = (x, y) = (x_0 + \Delta x, y_0 + \Delta y) \in D$ is given by
$$\Delta z = f(X) - f(X_0) = f_x(X_0)\Delta x + f_y(X_0)\Delta y + \varepsilon_1 \Delta x + \varepsilon_2 \Delta y,$$
where $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta x, \Delta y \to 0$.
Proof: We assume that $\Delta x$ and $\Delta y$ are small enough so that a rectangle T centered at $X_0$ and containing X is contained in D. Then $\Delta z = \Delta z_1 + \Delta z_2$, where
$$\Delta z_1 = f(x_0 + \Delta x, y_0) - f(x_0, y_0),$$
$$\Delta z_2 = f(x_0 + \Delta x, y_0 + \Delta y) - f(x_0 + \Delta x, y_0).$$

[Figure: the increment Δz from X0 to X = (x, y) split into Δz1 (along x) and Δz2 (along y), compared with the differential dz = fx(X0)Δx + fy(X0)Δy and the error terms ε1Δx and ε2Δy.]

From the case of single variable functions, we have
$$\Delta z_1 = f_x(x_0, y_0)\Delta x + \varepsilon_1 \Delta x,$$
$$\Delta z_2 = f_y(x_0 + \Delta x, y_0)\Delta y + \varepsilon_2' \Delta y = f_y(x_0, y_0)\Delta y + \varepsilon_2 \Delta y, \quad \text{by the continuity of } f_x \text{ and } f_y,$$
which gives the result.


Theorem 8.3.2 If the partial derivatives $f_x$ and $f_y$ are continuous on D, then z = f(x, y) is differentiable at every point of D.
Corollary 8.3.3 If z = f(x, y) is differentiable on D, then it is continuous.
Definition 8.3.2 If z = f(x, y) is differentiable on D, then the limit of $\Delta z$ as $\Delta x, \Delta y \to 0$ is denoted by
$$dz = f_x(X)\,dx + f_y(X)\,dy,$$
which is called the total differential of f.
Corollary 8.3.4 Let z = f(x, y) be a function with continuous $f_x$ and $f_y$ on an open domain $D \subset \mathbb{R}^2$. Then the linear approximation of f(X), $X = (x, y) = (x_0 + \Delta x, y_0 + \Delta y)$, at $X_0 = (x_0, y_0)$ is given by
$$f(X) \approx f(X_0) + f_x(X_0)\Delta x + f_y(X_0)\Delta y.$$
Note that the right side of the above equation is the equation of the tangent plane of the graph of f at $X_0$. In fact, the vector $(1, 0, f_x(X_0))$ is tangent to the curve $z = f(x, y_0)$ at $X_0$, and the vector $(0, 1, f_y(X_0))$ is tangent to the curve $z = f(x_0, y)$ at $X_0$. Thus, the normal vector $n(X_0)$ to the graph of f at $(x_0, y_0, f(X_0))$ is
$$n(X_0) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 0 & f_x(X_0) \\ 0 & 1 & f_y(X_0) \end{vmatrix} = (-f_x(X_0), -f_y(X_0), 1).$$
Thus, if (x, y, w) is a point on the tangent plane spanned by the two vectors, then $X - X_0 = (x - x_0, y - y_0, w - z_0) = (\Delta x, \Delta y, \Delta w)$, with $z_0 = f(X_0)$, satisfies
$$0 = (X - X_0) \cdot (-f_x(X_0), -f_y(X_0), 1) = (w - z_0) - f_x(X_0)\Delta x - f_y(X_0)\Delta y,$$
or $w = f(X_0) + f_x(X_0)(x - x_0) + f_y(X_0)(y - y_0)$.
The error terms in the linear approximation of z = f (x, y) by the value
w on the tangent plane will be given later in Section 8.6.
Example 8.3.1 Compute the functional value of $z = f(x, y) = x^2 - xy + \frac{1}{2}y^2 + 3$ at X = (3.01, 2.02).
By a direct computation,
$$f(3.01, 2.02) = (3.01)^2 - (3.01)(2.02) + \frac{1}{2}(2.02)^2 + 3 = 8.0201.$$
Instead, we may take the linear approximation: choose $X_0 = (3, 2)$. Then $f(X_0) = 8$, $(\Delta x, \Delta y) = (0.01, 0.02)$, and
$$f_x(3, 2) = (2x - y)|_{(3, 2)} = 4, \qquad f_y(3, 2) = (-x + y)|_{(3, 2)} = -1,$$
$$z = f(3.01, 2.02) \approx w = f(3, 2) + f_x(3, 2)\Delta x + f_y(3, 2)\Delta y = 8 + 4(0.01) + (-1)(0.02) = 8.02.$$
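The two computations of Example 8.3.1 can be reproduced in a few lines. This is a minimal sketch with the partial derivatives entered by hand; the name `linear_approx` is introduced here for illustration.

```python
def f(x, y):
    """f(x, y) = x^2 - xy + y^2/2 + 3 from Example 8.3.1."""
    return x ** 2 - x * y + 0.5 * y ** 2 + 3


def linear_approx(x, y, x0=3.0, y0=2.0):
    """Tangent-plane value w = f(X0) + fx(X0)(x - x0) + fy(X0)(y - y0)."""
    fx = 2 * x0 - y0   # f_x = 2x - y
    fy = -x0 + y0      # f_y = -x + y
    return f(x0, y0) + fx * (x - x0) + fy * (y - y0)


if __name__ == "__main__":
    exact = f(3.01, 2.02)
    approx = linear_approx(3.01, 2.02)
    # The difference is the quadratic remainder of the approximation.
    print(exact, approx, exact - approx)
```

The gap between the two values is of second order in $(\Delta x, \Delta y)$, as the error estimate of Section 8.6 makes precise.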

Analogous results hold for functions $z = f(x_1, \dots, x_n)$ of more than two variables: the total differential represents the linear approximation:
$$df = f_{x_1}dx_1 + \cdots + f_{x_n}dx_n, \quad \text{or} \quad \Delta z \approx f_{x_1}(X_0)\Delta x_1 + \cdots + f_{x_n}(X_0)\Delta x_n.$$

8.4

Directional Derivatives

Note that the partial derivatives $f_x(X_0)$ and $f_y(X_0)$, etc., of a function of more than one variable are the rates of change of f when X moves along the lines through $X_0$ parallel to the coordinate axes. What about the rate of change of f when X moves in other directions at $X_0$?
Let z = f(x, y) be a differentiable function on $R \subset \mathbb{R}^2$ and $X_0 = (x_0, y_0) \in R$. Let $\alpha(t) = (x(t), y(t))$, $t \in I = [a, b]$, be a differentiable curve in R through $X_0 = \alpha(t_0)$ with $\alpha'(t_0) = (x'(t_0), y'(t_0))$ a unit vector. Then $z = f \circ \alpha(t) = f(x(t), y(t))$ is a function of t, which is the restriction of f to the curve $\alpha(t)$.
Theorem 8.4.1 The composite $z = f \circ \alpha(t) = f(x(t), y(t))$ is differentiable at $t_0$, and
$$\frac{dz}{dt}(t_0) = f_x(X_0)\frac{dx(t_0)}{dt} + f_y(X_0)\frac{dy(t_0)}{dt}.$$
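Proof: By the differentiability of f, we have
$$\Delta z = f_x(X_0)\Delta x + f_y(X_0)\Delta y + \varepsilon_1 \Delta x + \varepsilon_2 \Delta y,$$
$$\frac{\Delta z}{\Delta t} = f_x(X_0)\frac{\Delta x}{\Delta t} + f_y(X_0)\frac{\Delta y}{\Delta t} + \varepsilon_1 \frac{\Delta x}{\Delta t} + \varepsilon_2 \frac{\Delta y}{\Delta t},$$
where $\varepsilon_1, \varepsilon_2 \to 0$ as $\Delta x, \Delta y \to 0$. Since $\alpha(t)$ is differentiable, $\Delta x, \Delta y \to 0$, and so $\varepsilon_1, \varepsilon_2 \to 0$, as $\Delta t \to 0$. Hence
$$\frac{dz}{dt}(t_0) = \lim_{\Delta t \to 0} \frac{\Delta z}{\Delta t} = \left.\frac{\partial f}{\partial x}\right|_{X_0}\frac{dx(t_0)}{dt} + \left.\frac{\partial f}{\partial y}\right|_{X_0}\frac{dy(t_0)}{dt}.$$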

Recall that $\alpha'(t_0) = (x'(t_0), y'(t_0)) \equiv (u_1, u_2) = u \in \mathbb{R}^2_{X_0}$ is a vector tangent to $\alpha(t)$ at $X_0 = \alpha(t_0) \in R$, and the derivative of $(f \circ \alpha)$ at $t_0$ depends only on $u = \alpha'(t_0)$, not on the curve $\alpha(t)$ itself: that is, for any curve $\beta(t)$ in R such that $\beta(t_0) = \alpha(t_0)$ and $\beta'(t_0) = u = \alpha'(t_0)$,
$$\frac{dz}{dt}(t_0) = \left.\frac{\partial f}{\partial x}\right|_{X_0} u_1 + \left.\frac{\partial f}{\partial y}\right|_{X_0} u_2.$$
This has several notations:
$$\frac{dz}{dt}(t_0) = f_x(X_0)u_1 + f_y(X_0)u_2$$
$$= (f_x(X_0), f_y(X_0)) \cdot (u_1, u_2) = \nabla f(X_0) \cdot u, \quad \text{in vector notation},$$
$$= [f_x(X_0)\ \ f_y(X_0)] \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = Df(X_0)\,\alpha'(t_0), \quad \text{in matrix notation},$$
which is called the directional derivative of f at $X_0$ in the direction u, also denoted by $\dfrac{dz}{dt}(t_0) = Df_u(X_0) = \nabla f(X_0) \cdot u$.
This can be done at every point $X \in R$, and the first equality is the total differential:
$$dz = f_x(X)x'(t)\,dt + f_y(X)y'(t)\,dt = f_x(X)\,dx + f_y(X)\,dy.$$
In the vector notation,
$$\nabla f(X) = (f_x(X), f_y(X))$$
is called the gradient vector of f at X in $\mathbb{R}^2_X$, and in the matrix notation the $1 \times 2$ matrix
$$Df(X) = [f_x(X)\ \ f_y(X)]$$
is called the derivative of f at X, and the above equation represents the chain rule for derivatives of functions of several variables.
By definition, the directional derivative of f is the rate of change of f when X moves in the direction u, and for any unit vector u it is given by
$$Df_u(X_0) = \nabla f(X_0) \cdot u = |\nabla f(X_0)| \cos\theta, \qquad \theta = \angle(\nabla f(X_0), u),$$
$$= \begin{cases} |\nabla f(X_0)|, & \text{when } \cos\theta = 1\ (\theta = 0), \text{ or } u \parallel \nabla f(X_0), \\ 0, & \text{when } \cos\theta = 0\ (\theta = \tfrac{\pi}{2}), \text{ or } u \perp \nabla f(X_0), \\ -|\nabla f(X_0)|, & \text{when } \cos\theta = -1\ (\theta = \pi), \text{ or } u \parallel -\nabla f(X_0). \end{cases}$$


To see a geometrical meaning of this computation, consider a level curve C given by an equation f(x, y) = c. If we parameterize this contour curve as $\alpha(t) = (x(t), y(t))$, $t \in I$, then $c = f \circ \alpha(t) = f(x(t), y(t))$, and so, by differentiating both sides, we get
$$0 = \frac{dz(t)}{dt} = Df_{\alpha'(t)}(\alpha(t)) = \nabla f(\alpha(t)) \cdot \alpha'(t).$$
Since $\alpha'(t)$ is tangent to the contour curve C, this means that $\nabla f(\alpha(t))$ is always perpendicular (or normal) to C. That is, the function f increases most rapidly in the direction $\nabla f(X)$ at $X \in R$ at the rate $|\nabla f(X)|$, stays constant (rate 0) in the direction of C (perpendicular to $\nabla f(X)$), and decreases most rapidly in the direction $-\nabla f(X)$ at $X \in R$ at the rate $-|\nabla f(X)|$.

[Figure: the level curve c = f(x, y) in the xy-plane through X = α(t), with the tangent vector α'(t), a unit direction u, and the gradient ∇f(X) normal to the curve.]

Example 8.4.1 Consider a function $z = f(x, y) = xe^y + \cos(xy)$. The directional derivative of f in the direction of $v = (3, -4)$ at $X_0 = (2, 0)$ is computed as follows: the direction of v is the unit vector $u = (\frac{3}{5}, -\frac{4}{5})$, and the partial derivatives are
$$f_x(2, 0) = (e^y - y\sin(xy))|_{(2, 0)} = 1, \qquad f_y(2, 0) = (xe^y - x\sin(xy))|_{(2, 0)} = 2.$$
Thus the gradient vector of f at $X_0$ is $\nabla f(2, 0) = (1, 2)$ and the directional derivative is
$$Df_u(2, 0) = \nabla f(2, 0) \cdot u = (1, 2) \cdot \left(\tfrac{3}{5}, -\tfrac{4}{5}\right) = -1.$$
f increases most rapidly in the direction (1, 2), and stays constant in the direction (2, -1). The tangent line to the level curve f(x, y) = 3 at $X_0$ is orthogonal to $\nabla f(2, 0) = (1, 2)$. Thus its equation is
$$0 = \nabla f(2, 0) \cdot (x - 2, y) = (1, 2) \cdot (x - 2, y) = x + 2y - 2.$$
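The directional derivative of Example 8.4.1 can be checked both from the gradient formula and from a difference quotient along the line through $X_0$. The sketch below takes the direction as v = (3, -4); the function names are introduced here for illustration.

```python
import math


def f(x, y):
    """f(x, y) = x e^y + cos(xy) from Example 8.4.1."""
    return x * math.exp(y) + math.cos(x * y)


def grad_f(x, y):
    """Gradient (f_x, f_y) computed by hand."""
    return (math.exp(y) - y * math.sin(x * y),
            x * math.exp(y) - x * math.sin(x * y))


def directional_derivative(point, v):
    """grad f(point) . u, where u is the unit vector in the direction of v."""
    norm = math.hypot(v[0], v[1])
    u = (v[0] / norm, v[1] / norm)
    gx, gy = grad_f(*point)
    return gx * u[0] + gy * u[1]


def difference_quotient(point, v, t=1e-6):
    """(f(X0 + t u) - f(X0)) / t as a rough numerical cross-check."""
    norm = math.hypot(v[0], v[1])
    u = (v[0] / norm, v[1] / norm)
    x0, y0 = point
    return (f(x0 + t * u[0], y0 + t * u[1]) - f(x0, y0)) / t


if __name__ == "__main__":
    print(directional_derivative((2.0, 0.0), (3.0, -4.0)))
    print(difference_quotient((2.0, 0.0), (3.0, -4.0)))
```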


The following theorem is an easy consequence of the definition.
Theorem 8.4.2 Let f and g be differentiable functions on R, and $k \in \mathbb{R}$, $X \in R$. Let $u, v \in \mathbb{R}^2_X$. As a notational convention, we write $D(f)_u(X) \equiv D(f)(u)$.
(1) $D(kf + g)(u) = \nabla(kf + g) \cdot u = (k\nabla f + \nabla g) \cdot u = (kD(f) + D(g))(u)$,
(2) $D(f)(ku + v) = kD(f)(u) + D(f)(v)$,
(3) $D(fg)(u) = \nabla(fg) \cdot u = (f\nabla g + g\nabla f) \cdot u = (fD(g) + gD(f))(u)$,
(4) $D\left(\dfrac{f}{g}\right)(u) = \nabla\left(\dfrac{f}{g}\right) \cdot u = \left(\dfrac{g\nabla f - f\nabla g}{g^2}\right) \cdot u = \left(\dfrac{gD(f) - fD(g)}{g^2}\right)(u)$.

For functions of more than two variables, $f : \mathbb{R}^n \to \mathbb{R}$, similar definitions hold, with more terms according to the number of variables: for example, if w = f(x, y, z) is a differentiable function of three variables, then
$$dw = f_x\,dx + f_y\,dy + f_z\,dz,$$
$$\nabla f = (f_x, f_y, f_z),$$
$$D(f)(u) = \nabla f \cdot u = f_x u_1 + f_y u_2 + f_z u_3.$$

8.5

Derivatives and Chain Rule

In this section, we consider vector-valued functions of more than one variable: that is, functions of the form $F : \mathbb{R}^n \to \mathbb{R}^m$ denoted by
$$F(x_1, \dots, x_n) = (f_1(x_1, \dots, x_n), \dots, f_m(x_1, \dots, x_n)), \qquad (x_1, \dots, x_n) \in R \subset \mathbb{R}^n,$$
where R is an open domain in $\mathbb{R}^n$. F is differentiable if $f_1, \dots, f_m$ are differentiable functions on R (which holds, in particular, when the $f_j$ have continuous partial derivatives on R), and its derivative is defined to be the $m \times n$ matrix of the partial derivatives:
$$DF(X) = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix}(X).$$
As usual, one can easily see that if F is differentiable on R, then it is continuous on R.
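For simplicity, we consider a function $F : \mathbb{R}^2 \to \mathbb{R}^3$, denoted by
$$F(u, v) = (x(u, v), y(u, v), z(u, v)) \in U \subset \mathbb{R}^3, \qquad (u, v) \in R \subset \mathbb{R}^2,$$
where R is an open domain in $\mathbb{R}^2$ and x, y, z are differentiable functions on R, and a differentiable curve $\alpha(t) = (u(t), v(t))$, $t \in I$, in $R \subset \mathbb{R}^2$. Then
$$\beta(t) = (F \circ \alpha)(t) = F(\alpha(t)) = F(u(t), v(t)) = (x(\alpha(t)), y(\alpha(t)), z(\alpha(t)))$$
$$= (x(u(t), v(t)), y(u(t), v(t)), z(u(t), v(t)))$$
is a curve in U. Then the derivatives of the component functions x, y, and z restricted to the curve $\alpha(t)$ are given as
$$\frac{d(x \circ \alpha)(t)}{dt} = \frac{\partial x}{\partial u}\frac{du(t)}{dt} + \frac{\partial x}{\partial v}\frac{dv(t)}{dt},$$
$$\frac{d(y \circ \alpha)(t)}{dt} = \frac{\partial y}{\partial u}\frac{du(t)}{dt} + \frac{\partial y}{\partial v}\frac{dv(t)}{dt},$$
$$\frac{d(z \circ \alpha)(t)}{dt} = \frac{\partial z}{\partial u}\frac{du(t)}{dt} + \frac{\partial z}{\partial v}\frac{dv(t)}{dt}.$$
These three equations together can be written as the chain rule
$$\beta'(t) = D(F \circ \alpha)_t = \begin{bmatrix} \dfrac{d(x \circ \alpha)}{dt} \\ \dfrac{d(y \circ \alpha)}{dt} \\ \dfrac{d(z \circ \alpha)}{dt} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \\ \dfrac{\partial z}{\partial u} & \dfrac{\partial z}{\partial v} \end{bmatrix}_{\alpha(t)} \begin{bmatrix} \dfrac{du}{dt} \\ \dfrac{dv}{dt} \end{bmatrix} = DF_{\alpha(t)}\,D\alpha_t.$$
This shows that $DF_{\alpha(t)}$ transforms the tangent vector $\alpha'(t)$ at $\alpha(t)$ to the vector $\beta'(t)$ tangent to $\beta(t)$.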
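Example 8.5.1 Consider the spherical coordinates $F(\theta, \phi) = (x(\theta, \phi), y(\theta, \phi), z(\theta, \phi))$, given by, for $(\theta, \phi) \in (0, 2\pi) \times (0, \pi)$,
$$x(\theta, \phi) = \sin\phi \cos\theta, \qquad y(\theta, \phi) = \sin\phi \sin\theta, \qquad z(\theta, \phi) = \cos\phi.$$
Let $\alpha(t) = (\theta(t), \phi(t)) = (t, \phi_0)$ be a curve, where $\phi_0$ is a constant. Then the tangent vector to $\beta(t) = (F \circ \alpha)(t)$ is
$$\beta'(t) = DF_{\alpha(t)}\,\alpha'(t) = \begin{bmatrix} \dfrac{\partial x}{\partial \theta} & \dfrac{\partial x}{\partial \phi} \\ \dfrac{\partial y}{\partial \theta} & \dfrac{\partial y}{\partial \phi} \\ \dfrac{\partial z}{\partial \theta} & \dfrac{\partial z}{\partial \phi} \end{bmatrix}_{\alpha(t)} \begin{bmatrix} \dfrac{d\theta}{dt} \\ \dfrac{d\phi}{dt} \end{bmatrix} = \begin{bmatrix} -\sin\phi_0 \sin t & \cos\phi_0 \cos t \\ \sin\phi_0 \cos t & \cos\phi_0 \sin t \\ 0 & -\sin\phi_0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -\sin\phi_0 \sin t \\ \sin\phi_0 \cos t \\ 0 \end{bmatrix}.$$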


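Let $f : U \to \mathbb{R}$ be a differentiable function on U. Then for each fixed $v_0$, $(f \circ F)(u, v_0)$ is a function of u only, and so the derivative of $(f \circ F)$ in u is just the partial derivative. By the chain rule, Theorem 8.4.1, we have
$$\left.\frac{\partial (f \circ F)}{\partial u}\right|_{(u_0, v_0)} = \left.\left(\frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial u}\right)\right|_{(u_0, v_0)}.$$
Similarly, for fixed $u_0$, the partial derivative of $(f \circ F)$ in v is:
$$\left.\frac{\partial (f \circ F)}{\partial v}\right|_{(u_0, v_0)} = \left.\left(\frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial v}\right)\right|_{(u_0, v_0)}.$$
These two equations together can be written as the chain rule
$$D(f \circ F)_X = \begin{bmatrix} \dfrac{\partial (f \circ F)}{\partial u} & \dfrac{\partial (f \circ F)}{\partial v} \end{bmatrix}_X = \begin{bmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} & \dfrac{\partial f}{\partial z} \end{bmatrix}_{F(X)} \begin{bmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \\ \dfrac{\partial z}{\partial u} & \dfrac{\partial z}{\partial v} \end{bmatrix}_X = Df_{F(X)}\,DF_X.$$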


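Example 8.5.2 Let $w = f(x, y, z) = x + 2y + z^2$ and $F(u, v) = (x(u, v), y(u, v), z(u, v)) = (\frac{u}{v}, u^2 + \ln v, 2u)$. Then
$$D(f \circ F)_{(u, v)} = \begin{bmatrix} \dfrac{\partial w}{\partial u} & \dfrac{\partial w}{\partial v} \end{bmatrix} = Df_{F(u, v)}\,DF_{(u, v)} = \begin{bmatrix} f_x & f_y & f_z \end{bmatrix} \begin{bmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \\ \dfrac{\partial z}{\partial u} & \dfrac{\partial z}{\partial v} \end{bmatrix}$$
$$= \begin{bmatrix} 1 & 2 & 4u \end{bmatrix} \begin{bmatrix} \dfrac{1}{v} & -\dfrac{u}{v^2} \\ 2u & \dfrac{1}{v} \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} \dfrac{1}{v} + 12u & -\dfrac{u}{v^2} + \dfrac{2}{v} \end{bmatrix}.$$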
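In general, if $F : \mathbb{R}^n \to \mathbb{R}^m$ and $G : \mathbb{R}^m \to \mathbb{R}^p$ are differentiable functions, then the chain rule holds as:
$$D(G \circ F)_X = DG_{F(X)}\,DF_X = \begin{bmatrix} \dfrac{\partial g_1}{\partial y_1} & \cdots & \dfrac{\partial g_1}{\partial y_m} \\ \vdots & & \vdots \\ \dfrac{\partial g_p}{\partial y_1} & \cdots & \dfrac{\partial g_p}{\partial y_m} \end{bmatrix} \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n} \end{bmatrix}.$$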
Suppose that z = F(x, y) is a differentiable function and the level curve C = F(x, y) defines a differentiable function y = g(x). Then C = F(x, y) = F(x, g(x)) is a function of x. By differentiating both sides,
$$0 = F_x\frac{dx}{dx} + F_y\frac{dy}{dx} = F_x + F_y\frac{dg(x)}{dx}.$$
Thus, if $F_y \ne 0$, we have
$$\frac{dy}{dx} = -\frac{F_x}{F_y},$$
which is called implicit differentiation.
Example 8.5.3 An equation $F(x, y) = x^2 + y^2 - r^2 = 0$ defines two functions $y = \sqrt{r^2 - x^2}$ and $y = -\sqrt{r^2 - x^2}$. In both cases,
$$2x + 2y\frac{dy}{dx} = 0 \implies \frac{dy}{dx} = -\frac{x}{y},$$
in which the sign is included in the sign of the y value.

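Let w = f(x, y, z) be a differentiable function, in which z = g(x, y) is also a differentiable function. Then w = f(x, y, g(x, y)) = F(x, y), which is the composite of f with $\varphi(x, y) = (x, y, g(x, y))$. Thus $DF = Df\,D\varphi$, or
$$\begin{bmatrix} F_x & F_y \end{bmatrix} = \begin{bmatrix} f_x & f_y & f_z \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ g_x & g_y \end{bmatrix} = \begin{bmatrix} f_x + f_z g_x & f_y + f_z g_y \end{bmatrix}.$$
Such a function z = g(x, y) is called a constraint of f.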


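Example 8.5.4 Let $w = f(x, y, z) = x^2 + y^2 + z^2$, and let z = g(x, y) satisfy $z^3 - xy + yz + y^3 = 1$. Find $\dfrac{\partial w}{\partial x}$ and $\dfrac{\partial w}{\partial y}$.
Set $\varphi(x, y) = (x, y, g(x, y))$. Then, by implicit differentiation,
$$3z^2\frac{\partial z}{\partial x} - y + y\frac{\partial z}{\partial x} = 0 \implies \frac{\partial z}{\partial x} = \frac{y}{3z^2 + y},$$
$$3z^2\frac{\partial z}{\partial y} - x + z + y\frac{\partial z}{\partial y} + 3y^2 = 0 \implies \frac{\partial z}{\partial y} = \frac{x - z - 3y^2}{3z^2 + y}.$$
Hence,
$$\begin{bmatrix} w_x & w_y \end{bmatrix} = \begin{bmatrix} f_x & f_y & f_z \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ g_x & g_y \end{bmatrix} = \begin{bmatrix} 2x & 2y & 2z \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \dfrac{y}{3z^2 + y} & \dfrac{x - z - 3y^2}{3z^2 + y} \end{bmatrix}$$
$$= \begin{bmatrix} 2x + \dfrac{2yz}{3z^2 + y} & 2y + \dfrac{2z(x - z - 3y^2)}{3z^2 + y} \end{bmatrix}.$$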

8.6

Taylor's Polynomial

Let z = f(x, y) have continuous partial derivatives, of the orders used below, on an open domain $R \subset \mathbb{R}^2$. Let $P = (a, b) \in D \subset R$ and $Q = (x, y) = (a + h, b + k) \in D$, where $h = \Delta x$ and $k = \Delta y$. Let $\alpha(t) = (a + th, b + tk)$, $0 \le t \le 1$, be the line segment joining P to Q with $\alpha'(t) = (h, k)$. Then the derivatives of the function $F(t) = f \circ \alpha(t) = f(a + th, b + tk)$ are given, by the chain rule, as
$$F'(0) = \left[f_x\frac{dx}{dt} + f_y\frac{dy}{dt}\right]_{(a, b)} = [hf_x + kf_y]_{(a, b)},$$
$$F''(0) = \left[f_{xx}\frac{dx}{dt} + f_{xy}\frac{dy}{dt}\right]_{(a, b)} h + \left[f_{yx}\frac{dx}{dt} + f_{yy}\frac{dy}{dt}\right]_{(a, b)} k = \left[f_{xx}h^2 + 2f_{xy}kh + f_{yy}k^2\right]_{(a, b)}.$$
By Taylor's formula for functions of one variable,
$$F(1) = F(0) + F'(0)(1 - 0) + \frac{F''(c)}{2!}(1 - 0)^2, \quad \text{for some } c \in [0, 1],$$
or
$$f(x, y) = f(a, b) + (f_x(a, b)\Delta x + f_y(a, b)\Delta y) + \frac{1}{2!}\left[f_{xx}\Delta x^2 + 2f_{xy}\Delta x\Delta y + f_{yy}\Delta y^2\right]_{(a + ch, b + ck)}.$$

The last term is the error term for the linear approximation of f(x, y) by the tangent plane, discussed in Corollary 8.3.4: if
$$M = \max\{|f_{xx}(x, y)|, |f_{xy}(x, y)|, |f_{yy}(x, y)| \mid (x, y) \in D\},$$
where D is a rectangle in R centered at (a, b), then
$$|\mathrm{Error}_2(Q, P)| \le \frac{M}{2}(|\Delta x| + |\Delta y|)^2.$$
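As an illustration of this bound, it can be applied to the linear approximation of Example 8.3.1, where $f_{xx} = 2$, $f_{xy} = -1$, $f_{yy} = 1$ are constant, so that M = 2. The sketch below (the names are introduced here for illustration) compares the actual error with $\frac{M}{2}(|\Delta x| + |\Delta y|)^2$.

```python
def f(x, y):
    """f(x, y) = x^2 - xy + y^2/2 + 3 from Example 8.3.1."""
    return x ** 2 - x * y + 0.5 * y ** 2 + 3


def tangent_plane(x, y, a=3.0, b=2.0):
    """Linear approximation of f at (a, b), using f_x = 2x - y, f_y = -x + y."""
    fx, fy = 2 * a - b, -a + b
    return f(a, b) + fx * (x - a) + fy * (y - b)


def error_bound(dx, dy, M=2.0):
    """The bound (M / 2) * (|dx| + |dy|)^2 with M = max(|fxx|, |fxy|, |fyy|)."""
    return 0.5 * M * (abs(dx) + abs(dy)) ** 2


actual_error = abs(f(3.01, 2.02) - tangent_plane(3.01, 2.02))
bound = error_bound(0.01, 0.02)

if __name__ == "__main__":
    # The actual error should not exceed the bound.
    print(actual_error, bound)
```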

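In general, for fixed $h = x - a = \Delta x$ and $k = y - b = \Delta y$, define a differential operator $D = h\dfrac{\partial}{\partial x} + k\dfrac{\partial}{\partial y}$ so that for a sufficiently differentiable function f,
$$D(f) = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)(f) = hf_x + kf_y,$$
$$D^2(f) = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^2(f) = h^2 f_{xx} + 2hkf_{xy} + k^2 f_{yy},$$
$$D^3(f) = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^3(f) = h^3 f_{xxx} + 3h^2 kf_{xxy} + 3hk^2 f_{xyy} + k^3 f_{yyy},$$
$$\vdots$$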


Now, if f is smooth enough on R,
$$F^{(n)}(0) = \frac{d^n}{dt^n}F(0) = D^n(f)(a, b) = \left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^n(f)(a, b).$$
The Taylor polynomial is given as:
$$F(1) = F(0) + F'(0)(1 - 0) + \frac{F''(0)}{2!}(1 - 0)^2 + \cdots + \frac{F^{(n)}(0)}{n!}(1 - 0)^n + \frac{F^{(n+1)}(c)}{(n + 1)!}(1 - 0)^{n+1}, \quad \text{for some } c \in [0, 1],$$
or
$$f(x, y) = f(a, b) + [f_x h + f_y k]_{(a, b)} + \frac{1}{2!}[h^2 f_{xx} + 2hkf_{xy} + k^2 f_{yy}]_{(a, b)} + \cdots + \frac{1}{n!}D^n(f)(a, b) + \frac{1}{(n + 1)!}D^{n+1}(f)(a + ch, b + ck).$$

8.7

Extreme Values

Most optimization problems in applications are concerned with the maximization or minimization of certain functions of several variables. When the function is smooth on the domain, local extrema usually occur at boundary points of the domain, or at points where the tangent plane is horizontal, that is, where the first partial derivatives are zero. However, not every such point is an extremum. A point where the tangent plane is horizontal, but which is not a local extremum, is called a saddle point.
Definition 8.7.1 Let z = f(x, y) be a function on a domain $R \subset \mathbb{R}^2$. For $(a, b) \in R$, f(a, b) is a local maximum (or a local minimum) if $f(a, b) \ge$ (or $\le$) $f(x, y)$ for all $(x, y) \in R$ in an open disk centered at (a, b).
Theorem 8.7.1 (First derivative test) If z = f(x, y) has a local extremum at an interior point $(a, b) \in R$, and if the first partial derivatives exist there, then $f_x(a, b) = 0 = f_y(a, b)$.
Proof: If z = f(x, y) has a local extremum at $(a, b) \in R$, then g(x) = f(x, b) also has a local extremum at x = a. Thus $g'(a) = f_x(a, b) = 0$. Similarly, $f_y(a, b) = 0$.


Definition 8.7.2 An interior point (a, b) of R is called a critical point of f(x, y) if either $f_x$ or $f_y$ does not exist there, or $f_x(a, b) = 0 = f_y(a, b)$. For a critical point (a, b) in R, the point (a, b, f(a, b)) on the graph of z = f(x, y) is called a saddle point if on every open disk centered at (a, b) there are points (x, y) where f(a, b) > f(x, y), and points where f(a, b) < f(x, y).
Let $(a, b) \in R$ be a critical point of f such that $f_x(a, b) = 0 = f_y(a, b)$, or $D(f) = [hf_x + kf_y]_{(a, b)} = 0$. Then, from Taylor's formula,
$$f(x, y) = f(a, b) + \frac{1}{2!}[h^2 f_{xx} + 2hkf_{xy} + k^2 f_{yy}]_{(a, b)} + \mathrm{Error}_3(Q, P),$$
where $|\mathrm{Error}_3(Q, P)| \approx 0$ for sufficiently small $(h, k) = (x - a, y - b)$. Hence,
$$f(x, y) - f(a, b) \approx \frac{1}{2!}[h^2 f_{xx} + 2hkf_{xy} + k^2 f_{yy}]_{(a, b)},$$
and so both sides have the same sign. Thus, for a small disk D centered at (a, b),
f(a, b) is a maximum if f(x, y) - f(a, b) < 0 for all $(x, y) \in D$,
f(a, b) is a minimum if f(x, y) - f(a, b) > 0 for all $(x, y) \in D$.
Multiplying by $f_{xx}$ and completing the square, the right side becomes
$$f_{xx}[f(x, y) - f(a, b)] \approx \frac{1}{2!}[h^2 f_{xx}^2 + 2hkf_{xx}f_{xy} + k^2 f_{xx}f_{yy}] = \frac{1}{2!}\left[(hf_{xx} + kf_{xy})^2 + k^2(f_{xx}f_{yy} - f_{xy}^2)\right].$$
Theorem 8.7.2 (Second derivative test)
(1) f(x, y) - f(a, b) > 0, or f(a, b) is a minimum, if $f_{xx}f_{yy} - f_{xy}^2 > 0$ and $f_{xx} > 0$ at (a, b).
(2) f(x, y) - f(a, b) < 0, or f(a, b) is a maximum, if $f_{xx}f_{yy} - f_{xy}^2 > 0$ and $f_{xx} < 0$ at (a, b).
(3) (a, b, f(a, b)) is a saddle point if $f_{xx}f_{yy} - f_{xy}^2 < 0$ at (a, b), since f(x, y) - f(a, b) takes both signs depending on the values of h and k.
(4) The test fails if $f_{xx}f_{yy} - f_{xy}^2 = 0$ at (a, b), since there is a possibility that f(x, y) - f(a, b) = 0.
The expression $[f_{xx}f_{yy} - f_{xy}^2]_{(a, b)}$ is called the Hessian of f, which can be written as the determinant:
$$Hf(a, b) = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{vmatrix}_{(a, b)}.$$


Example 8.7.1 Find the local extrema of $f(x, y) = xy - x^2 - y^2 - 2x - 2y + 4$.
$$f_x = y - 2x - 2 = 0, \qquad f_y = x - 2y - 2 = 0 \implies x = -2 = y.$$
Thus, (-2, -2) is the only critical point of f. Since
$$f_{xx} = -2 < 0, \qquad f_{yy} = -2, \qquad f_{xy} = 1,$$
the Hessian of f at (a, b) = (-2, -2) is 3. Thus, f(-2, -2) = 8 is the local maximum.
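The case analysis of the second derivative test is easy to encode. The following is a minimal sketch (the function name and the returned labels are choices made here), applied to the second partials of Example 8.7.1.

```python
def classify_critical_point(fxx, fyy, fxy):
    """Apply the second derivative test to the second partials at a critical point."""
    hessian = fxx * fyy - fxy ** 2
    if hessian > 0:
        # fxx and fyy necessarily share a sign when the Hessian is positive.
        return "local minimum" if fxx > 0 else "local maximum"
    if hessian < 0:
        return "saddle point"
    return "inconclusive"


def f(x, y):
    """f(x, y) = xy - x^2 - y^2 - 2x - 2y + 4 from Example 8.7.1."""
    return x * y - x ** 2 - y ** 2 - 2 * x - 2 * y + 4


if __name__ == "__main__":
    # At the critical point (-2, -2): fxx = -2, fyy = -2, fxy = 1.
    print(classify_critical_point(-2, -2, 1), f(-2, -2))
```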

Example 8.7.2 Find the absolute extrema of $f(x, y) = 2 + 2x + 2y - x^2 - y^2$ on the triangular region $R = \{(x, y) \mid x \ge 0,\ y \ge 0,\ y \le 9 - x\}$.
(1) On the interior: by solving $f_x = 2 - 2x = 0$ and $f_y = 2 - 2y = 0$, (a, b) = (1, 1) is the only critical point in R, with value f(1, 1) = 4.
(2) On the boundary:
(i) At the corners: $P_1 = (0, 0)$, f(0, 0) = 2; $P_2 = (0, 9)$, f(0, 9) = -61; and $P_3 = (9, 0)$, f(9, 0) = -61.
(ii) On y = 0, $f(x, 0) = 2 + 2x - x^2$ with $x \in [0, 9]$. Thus $f'(x, 0) = 2 - 2x = 0$ means x = 1, and f(1, 0) = 3.
(iii) On x = 0, $f(0, y) = 2 + 2y - y^2$ with $y \in [0, 9]$. Thus $f'(0, y) = 2 - 2y = 0$ means y = 1, and f(0, 1) = 3.
(iv) On y = 9 - x, $f(x, 9 - x) = 2 + 2x + 2(9 - x) - x^2 - (9 - x)^2 = -61 + 18x - 2x^2$ with $x \in [0, 9]$. Thus $f'(x, 9 - x) = 18 - 4x = 0$ means $x = \frac{9}{2}$, and so $y = \frac{9}{2}$. Thus $f(\frac{9}{2}, \frac{9}{2}) = -\frac{41}{2}$.
Thus, the maximum of f is f(1, 1) = 4, and the minimum occurs at the corners: f(0, 9) = -61 = f(9, 0).
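A coarse grid search over the triangle gives a rough sanity check of Example 8.7.2. This is only a sketch, not a substitute for the analysis above: the grid spacing 0.5 is chosen here so that the candidate points (1, 1), (0, 9), and (9, 0) lie exactly on the grid.

```python
def f(x, y):
    """f(x, y) = 2 + 2x + 2y - x^2 - y^2 from Example 8.7.2."""
    return 2 + 2 * x + 2 * y - x * x - y * y


def grid_extrema(n=18):
    """Search the grid points of the triangle x >= 0, y >= 0, x + y <= 9."""
    step = 9.0 / n
    # i + j <= n guarantees x + y <= 9 for x = i * step, y = j * step.
    points = [(i * step, j * step)
              for i in range(n + 1) for j in range(n + 1 - i)]
    best_max = max(points, key=lambda p: f(*p))
    best_min = min(points, key=lambda p: f(*p))
    return best_max, best_min


if __name__ == "__main__":
    max_point, min_point = grid_extrema()
    print(max_point, f(*max_point))
    print(min_point, f(*min_point))
```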

8.8

Lagrange Multipliers

The method of Lagrange multipliers is useful for finding the extreme values of w = f(x, y, z) with its domain restricted by some constraint. The following examples show how to find such extreme values.
Example 8.8.1 Find the point P = (x, y, z) on the plane 2x + y - z - 5 = 0 closest to the origin.
Solution: To minimize the distance from the origin to a point P, it is enough to minimize $w = f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint 2x + y - z - 5 = 0. Solving this for z, we get z = g(x, y) = 2x + y - 5. Thus our problem is reduced to minimizing
$$w = F(x, y) = f \circ \varphi(x, y) = f(x, y, 2x + y - 5) = x^2 + y^2 + (2x + y - 5)^2,$$
where $\varphi(x, y) = (x, y, 2x + y - 5)$. Now, solve $F_x = 2x + 2(2x + y - 5) \cdot 2 = 10x + 4y - 20 = 0$ and $F_y = 2y + 2(2x + y - 5) = 4x + 4y - 10 = 0$, to get $x = \frac{5}{3}$, $y = \frac{5}{6}$. By the second derivative test, $F(\frac{5}{3}, \frac{5}{6})$ is the minimum, since $HF(\frac{5}{3}, \frac{5}{6}) = 24 > 0$ and $F_{xx} = 10 > 0$. Since $z = 2x + y - 5 = -\frac{5}{6}$, $P = (\frac{5}{3}, \frac{5}{6}, -\frac{5}{6})$.
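The linear system $F_x = 0$, $F_y = 0$ of Example 8.8.1 can be solved exactly with rational arithmetic. The sketch below uses Cramer's rule; the helper `solve_2x2` is introduced here for illustration and is not part of the method itself.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11 x + a12 y = b1, a21 x + a22 y = b2 by Cramer's rule."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(b1 * a22 - a12 * b2) / det
    y = Fraction(a11 * b2 - b1 * a21) / det
    return x, y


# F_x = 0 and F_y = 0 read 10x + 4y = 20 and 4x + 4y = 10.
x, y = solve_2x2(10, 4, 4, 4, 20, 10)
z = 2 * x + y - 5               # recover z from the constraint
hessian = 10 * 4 - 4 ** 2       # F_xx F_yy - F_xy^2 with F_xx = 10, F_yy = 4, F_xy = 4

if __name__ == "__main__":
    print(x, y, z, hessian, x * x + y * y + z * z)
```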

Remark: In Example 8.8.1, the constraint represents a level surface in $\mathbb{R}^3$, which can be expressed in various ways, and so the rate of change of w = f(x, y, z) generally depends on which variables are chosen as dependent. As the following example shows, one has to specify what the dependent variables are.

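Example 8.8.2 Find $\dfrac{\partial w}{\partial x}$ of $w = f(x, y, z) = x^2 + y^2 + z^2$ subject to a constraint $x^2 + y^2 - z = 0$.
Solution: (1) If we choose x, y as independent variables and z as a dependent variable, then $z = g(x, y) = x^2 + y^2$, and so $w = f \circ \varphi(x, y) = f(x, y, g(x, y))$ with $\varphi(x, y) = (x, y, g(x, y))$. Thus
$$\begin{bmatrix} \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{bmatrix} = \begin{bmatrix} f_x & f_y & f_z \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ g_x & g_y \end{bmatrix} = \begin{bmatrix} 2x & 2y & 2z \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2x & 2y \end{bmatrix}$$
$$= \begin{bmatrix} 2x + 4x(x^2 + y^2) & 2y + 4y(x^2 + y^2) \end{bmatrix}.$$
(2) If we choose x, z as independent variables and y as a dependent variable, then $y = h(x, z) = \pm\sqrt{z - x^2}$, and so $w = f(x, h(x, z), z) = x^2 + (z - x^2) + z^2 = z + z^2$. Thus
$$\frac{\partial w}{\partial x} = 0.$$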
(3) A geometrical interpretation: the level surface $x^2 + y^2 - z = 0$ is a paraboloid. When the x-coordinate of a point P on the paraboloid varies while y (= 0), as an independent variable, is held fixed, P moves along the parabola $z = x^2$. Thus w, the squared distance from the origin to P, changes, so that $\dfrac{\partial w}{\partial x} = 2x + 4x^3 + 4xy^2 \ne 0$.

[Figure: the paraboloid z = x^2 + y^2 with a point P, the parabola z = x^2 in the plane y = 0, and the horizontal circle c = x^2 + y^2.]

If we take y as a dependent variable, and if the x-coordinate of a point P on the paraboloid varies while z, as an independent variable, is held fixed, then P moves along the circle $c = x^2 + y^2$. Hence the distance from the origin to P remains constant, and so $\dfrac{\partial w}{\partial x} = 0$.
Example 8.8.3 Find the point P = (x, y, z) on the hyperbolic cylinder $x^2 - z^2 - 1 = 0$ closest to the origin.
Solution: Again, to minimize the distance from the origin to a point P, it is enough to minimize $w = f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint $g(x, y, z) = x^2 - z^2 - 1 = 0$.
Note that the constraint g(x, y, z) = 0 describes a level surface S in $\mathbb{R}^3$. On the other hand, $f(x, y, z) = c^2$ is also a level surface, namely the sphere of radius c centered at the origin, so that the points on this sphere are at the same distance from the origin. As the radius increases, when the sphere first touches the hyperbolic cylinder, the point of contact will be the closest point on the hyperbolic cylinder to the origin. At this contact point, the two surfaces will be tangent to each other, and so their normal vectors will be parallel to each other. But those normal vectors are just the gradient vectors, since the surfaces are level surfaces and the gradients are normal to the level surfaces. Hence, at the contact point P, we can write
$$\nabla f(P) = \lambda \nabla g(P), \quad \text{for some } \lambda \in \mathbb{R}.$$
In our problem,
$$\nabla f(P) = (2x, 2y, 2z) = \lambda(2x, 0, -2z) = \lambda \nabla g(P),$$
or $2x = 2\lambda x$, $2y = 0$, $2z = -2\lambda z$.
Since $P \in S$, $x \ne 0$. Thus from the first equation, $\lambda = 1$. Then $2z = -2z$ shows z = 0. Hence P = (x, 0, 0), and it has to satisfy $P \in S$: i.e., $1 = x^2 - z^2 = x^2$, or $x = \pm 1$. Thus, $P = (\pm 1, 0, 0)$.

This method holds in general as follows:


Theorem 8.8.1 Let w = f(x, y, z) be a differentiable function on $U \subset \mathbb{R}^3$, and $C : \alpha(t) = (x(t), y(t), z(t))$ a smooth curve in U. If f has a local extremum at $P_0 = \alpha(t_0) \in C$ relative to its values on C, then $\nabla f(P_0) \perp C$, or $\nabla f(P_0) \cdot \alpha'(t_0) = 0$.
Proof: Note that $F(t) = (f \circ \alpha)(t) = f(x(t), y(t), z(t))$ having a local extremum at $t_0$ means that
$$\left.\frac{d}{dt}F\right|_{t_0} = \nabla f(P_0) \cdot \alpha'(t_0) = 0.$$
Suppose that w = f(x, y, z) and g(x, y, z) are differentiable functions on U, and suppose that w = f(x, y, z) has a local extremum at $P_0$ on the level surface g(x, y, z) = 0 relative to its values on the surface. Then f takes on a local extremum at $P_0$ relative to its values on every differentiable curve through $P_0$ on the surface g(x, y, z) = 0. Therefore, $\nabla f(P_0)$ is orthogonal to the velocity vector of every curve on the surface through $P_0$. Since $\nabla g$ is orthogonal to the level surface g(x, y, z) = 0, we must have, provided $\nabla g(P_0) \ne 0$,
$$\nabla f(P_0) = \lambda \nabla g(P_0), \quad \text{for some } \lambda \in \mathbb{R}.$$
$\lambda$ is called a Lagrange multiplier, and this is called the method of Lagrange multipliers.
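Example 8.8.4 Find the extremum values of z = f(x, y) = xy subject to $C : \dfrac{x^2}{8} + \dfrac{y^2}{2} = 1$.
Solution: With $g(x, y) = \dfrac{x^2}{8} + \dfrac{y^2}{2} - 1$, we have $\nabla f(x, y) = (y, x)$ and $\nabla g(x, y) = (\frac{x}{4}, y)$, so $\nabla f(x, y) = \lambda \nabla g(x, y)$ gives
$$y = \frac{1}{4}\lambda x, \qquad x = \lambda y \implies y = \frac{1}{4}\lambda^2 y.$$
Thus, y = 0 or $\lambda = \pm 2$. However, $y \ne 0$, since otherwise $x = \lambda \cdot 0 = 0$, but $(0, 0) \notin C$. Thus $y \ne 0$, and $\lambda = \pm 2$ means $x = \pm 2y$. Then from the constraint:
$$0 = \frac{x^2}{8} + \frac{y^2}{2} - 1 = y^2 - 1 \implies y = \pm 1,\ x = \pm 2.$$
In fact, $f(\pm 2, \pm 1) = 2$ is the maximum and $f(\pm 2, \mp 1) = -2$ is the minimum.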
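The candidate points of Example 8.8.4 can be checked against the Lagrange condition directly, and the extreme values can be cross-checked by sampling the ellipse. In the sketch below, the parameterization $x = 2\sqrt{2}\cos t$, $y = \sqrt{2}\sin t$ and the function names are choices made here for illustration.

```python
import math


def f(x, y):
    return x * y


def g(x, y):
    """Constraint written as g(x, y) = x^2/8 + y^2/2 - 1 = 0."""
    return x * x / 8 + y * y / 2 - 1


def satisfies_lagrange(x, y, lam, tol=1e-12):
    """Check grad f = lam * grad g together with the constraint g = 0."""
    grad_f = (y, x)
    grad_g = (x / 4, y)
    return (abs(grad_f[0] - lam * grad_g[0]) < tol
            and abs(grad_f[1] - lam * grad_g[1]) < tol
            and abs(g(x, y)) < tol)


def sampled_extrema(n=3600):
    """Min and max of f over n sample points of the ellipse."""
    values = []
    for k in range(n):
        t = 2 * math.pi * k / n
        values.append(f(2 * math.sqrt(2) * math.cos(t), math.sqrt(2) * math.sin(t)))
    return min(values), max(values)


if __name__ == "__main__":
    print(satisfies_lagrange(2, 1, 2), satisfies_lagrange(-2, 1, -2))
    print(sampled_extrema())
```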

Sometimes one is asked to find the extremum values of w = f(x, y, z) subject to two constraints: $g_1(x, y, z) = 0$ and $g_2(x, y, z) = 0$ with $\nabla g_1 \nparallel \nabla g_2$. In this case, they can also be found by introducing two Lagrange multipliers $\lambda_1$ and $\lambda_2$: find the values x, y, z, $\lambda_1$, and $\lambda_2$ from the following equations
$$\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2, \qquad g_1(x, y, z) = 0, \quad g_2(x, y, z) = 0.$$
Geometrically, the two level surfaces of the constraints intersect in a smooth curve C, and we are looking for points on the curve C where f takes local extrema relative to its values on the curve. These are the points where $\nabla f$ is orthogonal to C. Since $\nabla g_1$ and $\nabla g_2$ are both orthogonal to C, $\nabla f$ lies in the plane spanned by $\nabla g_1$ and $\nabla g_2$: i.e., $\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2$ for some $\lambda_1$ and $\lambda_2$.

[Figure: the surfaces g1 = 0 and g2 = 0 intersecting in the curve C, with the vectors ∇g1, ∇g2, and ∇f at a point P0 of C.]

Example 8.8.5 Find the points closest to the origin subject to two constraints $g_1(x, y, z) = x + y + z = 1$ and $g_2(x, y, z) = x^2 + y^2 = 1$.

[Figure: the plane x + y + z = 1 cutting the cylinder x^2 + y^2 = 1 in an ellipse through (1, 0, 0) and (0, 1, 0).]

Solution: The intersection C of the two constraints is an ellipse. Since we want to find the extreme values of $f(x, y, z) = x^2 + y^2 + z^2$ on the ellipse, we solve
$$\nabla f(x, y, z) = \lambda_1 \nabla g_1(x, y, z) + \lambda_2 \nabla g_2(x, y, z),$$
or $(2x, 2y, 2z) = \lambda_1(1, 1, 1) + \lambda_2(2x, 2y, 0)$,
or $2x = \lambda_1 + 2\lambda_2 x$, $2y = \lambda_1 + 2\lambda_2 y$, $2z = \lambda_1$.
Thus, $(1 - \lambda_2)x = z$ and $(1 - \lambda_2)y = z$, and so either $\lambda_2 = 1$ and z = 0, or $\lambda_2 \ne 1$ and $x = y = \dfrac{z}{1 - \lambda_2}$.
If z = 0, then $\lambda_1 = 0$, and from the constraints, x + y = 1 and $x^2 + y^2 = 1$. Thus, $0 = x^2 + (1 - x)^2 - 1 = 2x(x - 1)$ shows x = 0 and y = 1, or x = 1 and y = 0. Hence, at (0, 1, 0) and (1, 0, 0), f = 1 is the minimum.
If x = y, then, from the constraints, 2x + z = 1 or z = 1 - 2x, and $x^2 + y^2 = 2x^2 = 1$ or $x = \pm\frac{\sqrt{2}}{2}$, so that $z = 1 \mp \sqrt{2}$. Hence at $(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}, 1 - \sqrt{2})$ and $(-\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}, 1 + \sqrt{2})$, f takes local maxima on the ellipse, the larger value $4 + 2\sqrt{2}$ being the absolute maximum. Thus the closest points are (0, 1, 0) and (1, 0, 0) on the ellipse.
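The conclusion of Example 8.8.5 can be cross-checked without multipliers by parameterizing the ellipse. The sketch below assumes the parameterization $x = \cos t$, $y = \sin t$, $z = 1 - \cos t - \sin t$, which satisfies both constraints; this is a choice made here, not part of the method above.

```python
import math


def point_on_ellipse(t):
    """A point satisfying x^2 + y^2 = 1 and x + y + z = 1."""
    x, y = math.cos(t), math.sin(t)
    return x, y, 1 - x - y


def f(x, y, z):
    """Squared distance from the origin."""
    return x * x + y * y + z * z


def sampled_extrema(n=3600):
    """Return (min value, argmin point, max value, argmax point) over n samples."""
    samples = [point_on_ellipse(2 * math.pi * k / n) for k in range(n)]
    closest = min(samples, key=lambda p: f(*p))
    farthest = max(samples, key=lambda p: f(*p))
    return f(*closest), closest, f(*farthest), farthest


if __name__ == "__main__":
    print(sampled_extrema())
```

On this parameterization $f = 1 + (1 - \cos t - \sin t)^2$, so the minimum value 1 is attained exactly where $\cos t + \sin t = 1$, that is, at (1, 0, 0) and (0, 1, 0).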