Chain Rule for Multivariable Functions

The Chain Rule:
Recall that the chain rule for a function of one variable says that for differentiable
functions $f$ and $g$,
$$\frac{d}{dx} f(g(x)) = f'(g(x))\, g'(x).$$
Now, we extend the chain rule to functions of two or more variables.
Theorem 1: If f ( x, y ) is a differentiable function of x and y , and both x and y are
differentiable functions of a single variable t , then f ( x ( t ) , y ( t ) ) is differentiable with
respect to t , and its derivative is given by the equation

$$\frac{d}{dt} f(x(t), y(t)) = \frac{\partial f}{\partial x}(x(t), y(t))\,\frac{dx}{dt} + \frac{\partial f}{\partial y}(x(t), y(t))\,\frac{dy}{dt}.$$
Proof: Let $g(t) = f(x(t), y(t))$. Then, by definition of the ordinary derivative, we have
$$g'(t) = \lim_{\Delta t \to 0} \frac{g(t + \Delta t) - g(t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{f(x(t + \Delta t), y(t + \Delta t)) - f(x(t), y(t))}{\Delta t}.$$
For simplicity, if we write
$$\Delta x = x(t + \Delta t) - x(t), \qquad \Delta y = y(t + \Delta t) - y(t)$$
and
$$\Delta f = f(x(t + \Delta t), y(t + \Delta t)) - f(x(t), y(t)),$$
then the preceding equation gives us
$$\frac{d}{dt} f(x(t), y(t)) = g'(t) = \lim_{\Delta t \to 0} \frac{\Delta f}{\Delta t}.$$
Since f is a differentiable function of x and y , by definition, we have that

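$$\Delta f = \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \varepsilon_1 \Delta x + \varepsilon_2 \Delta y \qquad (1)$$
where both $\varepsilon_1 \to 0$ and $\varepsilon_2 \to 0$ as $(\Delta x, \Delta y) \to (0, 0)$. Dividing (1) throughout by $\Delta t$, we get
$$\frac{\Delta f}{\Delta t} = \frac{\partial f}{\partial x}\frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y}\frac{\Delta y}{\Delta t} + \varepsilon_1 \frac{\Delta x}{\Delta t} + \varepsilon_2 \frac{\Delta y}{\Delta t}.$$
Taking the limit as $\Delta t \to 0$, we obtain
$$\lim_{\Delta t \to 0} \frac{\Delta f}{\Delta t} = \frac{\partial f}{\partial x}\lim_{\Delta t \to 0}\frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y}\lim_{\Delta t \to 0}\frac{\Delta y}{\Delta t} + \lim_{\Delta t \to 0}(\varepsilon_1)\lim_{\Delta t \to 0}\frac{\Delta x}{\Delta t} + \lim_{\Delta t \to 0}(\varepsilon_2)\lim_{\Delta t \to 0}\frac{\Delta y}{\Delta t}. \qquad (2)$$
Now,
$$\lim_{\Delta t \to 0}\frac{\Delta x}{\Delta t} = \lim_{\Delta t \to 0}\frac{x(t + \Delta t) - x(t)}{\Delta t} = \frac{dx}{dt} \quad \text{and} \quad \lim_{\Delta t \to 0}\frac{\Delta y}{\Delta t} = \lim_{\Delta t \to 0}\frac{y(t + \Delta t) - y(t)}{\Delta t} = \frac{dy}{dt}.$$
Also, since $x(t)$ and $y(t)$ are differentiable, they are continuous as well, so that
$$\lim_{\Delta t \to 0}\Delta x = \lim_{\Delta t \to 0}\left[x(t + \Delta t) - x(t)\right] = 0, \qquad \lim_{\Delta t \to 0}\Delta y = \lim_{\Delta t \to 0}\left[y(t + \Delta t) - y(t)\right] = 0.$$
Consequently, $(\Delta x, \Delta y) \to (0, 0)$ as $\Delta t \to 0$. Hence, $\lim_{\Delta t \to 0}\varepsilon_1 = 0$ and $\lim_{\Delta t \to 0}\varepsilon_2 = 0$. Thus, using equation (2), we obtain
$$\lim_{\Delta t \to 0}\frac{\Delta f}{\Delta t} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt},$$
that is,
$$\frac{d}{dt} f(x(t), y(t)) = \frac{\partial f}{\partial x}(x(t), y(t))\,\frac{dx}{dt} + \frac{\partial f}{\partial y}(x(t), y(t))\,\frac{dy}{dt}.$$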
Remarks: We consider x and y to be intermediate variables, which are both functions of a

single variable t . If we write w = f ( x, y ) , then the Chain rule is conveniently written in
short form as
$$\frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt}.$$
To remember it, we use the tree diagram as shown below:
Similarly, if w = f ( x, y, z ) is a function of three variables, which are themselves

differentiable functions of a single variable t , then the appropriate Chain rule is
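$$\frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt} + \frac{\partial w}{\partial z}\frac{dz}{dt}.$$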
Example 1: Let $z = f(x, y) = x^2 e^y$, where $x$ and $y$ are functions of one variable $t$ defined by
$x(t) = t^2 - 1$, $y(t) = \sin t$. Find the derivative of $z$ with respect to $t$.

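Solution: First, we compute the derivatives
$$\frac{\partial z}{\partial x} = 2xe^y, \quad \frac{\partial z}{\partial y} = x^2 e^y, \quad x'(t) = 2t, \quad y'(t) = \cos t.$$
By applying Theorem 1, we obtain
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} = 2xe^y(2t) + x^2 e^y(\cos t) = 2(t^2 - 1)e^{\sin t}(2t) + (t^2 - 1)^2 e^{\sin t}(\cos t).$$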
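The result of Example 1 can be sanity-checked numerically. The sketch below is only an illustration, not part of the original notes: it assumes $x(t) = t^2 - 1$ and $y(t) = \sin t$ as in the example, and the helper names (`z_of_t`, `dz_dt_chain`, `central_difference`) and the sample point `t0` are hypothetical choices. It compares the chain-rule formula with a central finite difference of the composite function.

```python
import math

def z_of_t(t):
    # Composite function z(t) = f(x(t), y(t)) with f(x, y) = x^2 e^y
    x = t**2 - 1
    y = math.sin(t)
    return x**2 * math.exp(y)

def dz_dt_chain(t):
    # Chain rule: dz/dt = z_x * x'(t) + z_y * y'(t)
    x = t**2 - 1
    y = math.sin(t)
    dz_dx = 2 * x * math.exp(y)
    dz_dy = x**2 * math.exp(y)
    return dz_dx * (2 * t) + dz_dy * math.cos(t)

def central_difference(func, t, h=1e-6):
    # Symmetric difference quotient approximating func'(t)
    return (func(t + h) - func(t - h)) / (2 * h)

t0 = 1.3  # arbitrary sample point (an assumption for illustration)
chain_value = dz_dt_chain(t0)
numeric_value = central_difference(z_of_t, t0)
print(chain_value, numeric_value)
```

The two printed values should agree to several decimal places, which is consistent with Theorem 1.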
Example 2: Let $w = f(x, y) = x^2 y - y^2$, where $x = \sin t$, $y = e^t$. Find $dw/dt$ when $t = 0$.

Solution: By the Chain rule in Theorem 1, we have
$$\frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt} = 2xy(\cos t) + (x^2 - 2y)e^t = 2\sin t\,(e^t)\cos t + (\sin^2 t - 2e^t)e^t.$$
When $t = 0$, it follows that $dw/dt = -2$.
Remarks: In Examples 1 and 2, we could have first substituted for x and y, and then computed
the derivative of g(t) = f(x(t), y(t)) using the usual rules of differentiation for functions of one
variable. However, as the next example shows, sometimes we do not have any alternative but
to use the chain rule to find the derivative of a function of two variables.
Example 3: Suppose the production of a firm is modeled by the Cobb-Douglas production
function $P(k, l) = 20k^{1/4} l^{3/4}$, where $k$ measures the capital (in millions of dollars) and $l$
measures the labour force (in thousands of workers). Suppose that when $l = 2$ and $k = 6$, the
labour force is decreasing at the rate of 20 workers per year and capital is growing at the rate
of $400,000 per year. Determine the rate of change of production.
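Solution: Let $t$ denote the time in years and $g(t) = P(k(t), l(t))$. From the Chain rule, we
have
$$g'(t) = \frac{\partial P}{\partial k}\frac{dk}{dt} + \frac{\partial P}{\partial l}\frac{dl}{dt} = \frac{\partial P}{\partial k}k'(t) + \frac{\partial P}{\partial l}l'(t).$$
Also, we have $\partial P/\partial k = 5k^{-3/4} l^{3/4}$ and $\partial P/\partial l = 15k^{1/4} l^{-1/4}$. With $l = 2$ and $k = 6$, this gives us
$$\frac{\partial P}{\partial k}(6, 2) \approx 2.1935 \quad \text{and} \quad \frac{\partial P}{\partial l}(6, 2) \approx 19.7411.$$
Also, we have $k'(t) = 0.4$ and $l'(t) = -0.02$. Thus, we obtain
$$g'(t) = \frac{\partial P}{\partial k}k'(t) + \frac{\partial P}{\partial l}l'(t) \approx 2.1935(0.4) + 19.7411(-0.02) = 0.48258.$$
This indicates that the production is increasing at the rate of approximately one-half unit per
year.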
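The arithmetic in Example 3 can be reproduced with a few lines of Python. This is a minimal sketch under the example's stated data (capital in millions of dollars, labour in thousands of workers); the function and variable names are hypothetical and chosen only for illustration.

```python
def dP_dk(k, l):
    # Partial derivative of P(k, l) = 20 k^(1/4) l^(3/4) with respect to k
    return 5 * k**(-0.75) * l**0.75

def dP_dl(k, l):
    # Partial derivative of P(k, l) with respect to l
    return 15 * k**0.25 * l**(-0.25)

k0, l0 = 6.0, 2.0      # capital and labour at the instant of interest
dk_dt = 0.4            # $400,000 per year, in millions of dollars
dl_dt = -0.02          # 20 workers per year lost, in thousands of workers

# Chain rule: g'(t) = P_k * k'(t) + P_l * l'(t)
production_rate = dP_dk(k0, l0) * dk_dt + dP_dl(k0, l0) * dl_dt
print(round(production_rate, 4))
```

Using unrounded partial derivatives gives a value that differs from 0.48258 only in the fifth decimal place, because the notes round the partial derivatives to four decimals before multiplying.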
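Example 4: Two objects are travelling in elliptical paths given by the following parametric
equations.
$$x_1 = 4\cos t \quad \text{and} \quad y_1 = 2\sin t \qquad \text{(first object)}$$
$$x_2 = 2\sin 2t \quad \text{and} \quad y_2 = 3\cos 2t \qquad \text{(second object)}$$
At what rate is the distance between the objects changing when $t = \pi$?

Solution: The motion of the two objects is shown in the figure below.
The distance $s$ between the two objects, as a function of four variables, is given by
$$s = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2},$$
and when $t = \pi$, we have $x_1 = -4$, $y_1 = 0$, $x_2 = 0$, $y_2 = 3$, so that $s = 5$. Therefore, when
$t = \pi$, the partial derivatives of $s$ are as follows:
$$\frac{\partial s}{\partial x_1} = \frac{-(x_2 - x_1)}{s} = -\frac{4}{5}, \quad \frac{\partial s}{\partial y_1} = \frac{-(y_2 - y_1)}{s} = -\frac{3}{5}, \quad \frac{\partial s}{\partial x_2} = \frac{x_2 - x_1}{s} = \frac{4}{5}, \quad \frac{\partial s}{\partial y_2} = \frac{y_2 - y_1}{s} = \frac{3}{5}.$$
Also, at $t = \pi$,
$$\frac{dx_1}{dt} = -4\sin t = 0, \quad \frac{dy_1}{dt} = 2\cos t = -2, \quad \frac{dx_2}{dt} = 4\cos 2t = 4, \quad \frac{dy_2}{dt} = -6\sin 2t = 0.$$
Thus, using the appropriate Chain rule, we obtain that the distance is changing at the rate of
$$\frac{ds}{dt} = \frac{\partial s}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial s}{\partial y_1}\frac{dy_1}{dt} + \frac{\partial s}{\partial x_2}\frac{dx_2}{dt} + \frac{\partial s}{\partial y_2}\frac{dy_2}{dt} = \left(-\frac{4}{5}\right)(0) + \left(-\frac{3}{5}\right)(-2) + \left(\frac{4}{5}\right)(4) + \left(\frac{3}{5}\right)(0) = \frac{22}{5}.$$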
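As a check on the four-variable chain rule used for the two moving objects, the following sketch evaluates $ds/dt$ at $t = \pi$ in two ways: with the chain-rule sum and with a finite difference of the distance function. It assumes the second object follows $x_2 = 2\sin 2t$, $y_2 = 3\cos 2t$; all function names are hypothetical.

```python
import math

def positions(t):
    # Coordinates of the first and second object at time t
    x1, y1 = 4 * math.cos(t), 2 * math.sin(t)
    x2, y2 = 2 * math.sin(2 * t), 3 * math.cos(2 * t)
    return x1, y1, x2, y2

def distance(t):
    x1, y1, x2, y2 = positions(t)
    return math.hypot(x2 - x1, y2 - y1)

def distance_rate_chain(t):
    # ds/dt as the sum of (partial of s) * (derivative of each coordinate)
    x1, y1, x2, y2 = positions(t)
    s = distance(t)
    partials = (-(x2 - x1) / s, -(y2 - y1) / s, (x2 - x1) / s, (y2 - y1) / s)
    velocities = (-4 * math.sin(t), 2 * math.cos(t),
                  4 * math.cos(2 * t), -6 * math.sin(2 * t))
    return sum(p * v for p, v in zip(partials, velocities))

chain_rate = distance_rate_chain(math.pi)
h = 1e-6
numeric_rate = (distance(math.pi + h) - distance(math.pi - h)) / (2 * h)
print(chain_rate, numeric_rate)
```

Both values should be close to 22/5 = 4.4.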
Remarks: We can easily extend the Chain Rule given in Theorem 1 to the case of a function
f ( x, y ) , where both x and y are functions of two independent variables s and t ,
x = x ( s, t ) , y = y ( s, t ) . It is given in the next theorem (without proof).

Theorem 2: If z = f ( x, y ) is a differentiable function of x and y , and both x and y, as
functions of two variables s and t , written as x = x ( s, t ) , y = y ( s, t ) , have first order partial
derivatives, then we have the chain rules:
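$$\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial s};$$
$$\frac{\partial z}{\partial t} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial t}.$$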
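Example 4: Use the Chain Rule to find $\partial w/\partial s$ and $\partial w/\partial t$ for $w = 2xy$, where $x = s^2 + t^2$
and $y = s/t$.
Solution: First, we find the partial derivatives of $x$ and $y$ with respect to $s$ and $t$.
$$\frac{\partial x}{\partial s} = 2s, \quad \frac{\partial x}{\partial t} = 2t, \quad \frac{\partial y}{\partial s} = \frac{1}{t}, \quad \frac{\partial y}{\partial t} = -\frac{s}{t^2}.$$
Hence, by the Chain rule given in Theorem 2, we get
$$\frac{\partial w}{\partial s} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial s} = (2y)(2s) + 2x\left(\frac{1}{t}\right) = 2\left(\frac{s}{t}\right)(2s) + 2(s^2 + t^2)\left(\frac{1}{t}\right) = \frac{6s^2 + 2t^2}{t},$$
$$\frac{\partial w}{\partial t} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t} = (2y)(2t) + 2x\left(-\frac{s}{t^2}\right) = 2\left(\frac{s}{t}\right)(2t) + 2(s^2 + t^2)\left(-\frac{s}{t^2}\right) = \frac{2st^2 - 2s^3}{t^2}.$$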
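The two partial derivatives obtained from Theorem 2 for $w = 2xy$, $x = s^2 + t^2$, $y = s/t$ can be compared against finite differences of the composite function. The sketch below is illustrative only; the sample point `(s0, t0)` and the helper names are assumptions, not part of the original example.

```python
def w_of(s, t):
    # Composite function w(s, t) = 2 x y with x = s^2 + t^2 and y = s / t
    x = s**2 + t**2
    y = s / t
    return 2 * x * y

def dw_ds_formula(s, t):
    # Closed form from the chain rule: (6 s^2 + 2 t^2) / t
    return (6 * s**2 + 2 * t**2) / t

def dw_dt_formula(s, t):
    # Closed form from the chain rule: (2 s t^2 - 2 s^3) / t^2
    return (2 * s * t**2 - 2 * s**3) / t**2

s0, t0 = 1.5, 2.0   # arbitrary sample point with t0 != 0
h = 1e-6
numeric_ds = (w_of(s0 + h, t0) - w_of(s0 - h, t0)) / (2 * h)
numeric_dt = (w_of(s0, t0 + h) - w_of(s0, t0 - h)) / (2 * h)
print(dw_ds_formula(s0, t0), numeric_ds)
print(dw_dt_formula(s0, t0), numeric_dt)
```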
Remarks: The Chain Rule in Theorem 2 can be extended to any number of variables. For
example, if w is a function of n variables x1 , x2 ,..., xn , where each xi is a differentiable
function of the m variables t1 , t2 ,..., tm , then for w = f ( x1 , x2 ,..., xn ) , we have
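$$\frac{\partial w}{\partial t_1} = \frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_1} + \frac{\partial w}{\partial x_2}\frac{\partial x_2}{\partial t_1} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_1}$$
$$\frac{\partial w}{\partial t_2} = \frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_2} + \frac{\partial w}{\partial x_2}\frac{\partial x_2}{\partial t_2} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_2}$$
$$\vdots$$
$$\frac{\partial w}{\partial t_m} = \frac{\partial w}{\partial x_1}\frac{\partial x_1}{\partial t_m} + \frac{\partial w}{\partial x_2}\frac{\partial x_2}{\partial t_m} + \cdots + \frac{\partial w}{\partial x_n}\frac{\partial x_n}{\partial t_m}$$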
The tree diagram below gives the Chain rule for a function of three variables x, y, z, where
each of these intermediate variables is a function of two variables u and v.
Implicit Partial Differentiation:

Suppose that $x$ and $y$ are related by the equation $F(x, y) = 0$, where it is assumed
that $y = f(x)$ is a differentiable function of $x$. The problem is to find $dy/dx$. If $y$ can be
solved explicitly in terms of $x$, then we can find $dy/dx$ by the usual methods of differentiation.
However, if $y$ cannot be solved in terms of $x$, then we use the Chain rule to find $dy/dx$.
Consider the function $w = F(x, y) = F(x, f(x))$. Using the Chain rule, we find that
$$\frac{dw}{dx} = F_x(x, y)\frac{dx}{dx} + F_y(x, y)\frac{dy}{dx}.$$
Since $w = F(x, y) = 0$, we have $dw/dx = 0$, so that
$$F_x(x, y)\frac{dx}{dx} + F_y(x, y)\frac{dy}{dx} = 0.$$
But $dx/dx = 1$, so that if $F_y(x, y) \neq 0$, we get
$$\frac{dy}{dx} = -\frac{F_x(x, y)}{F_y(x, y)}.$$
We formally state this result as a theorem of implicit differentiation:
Theorem 3: If the equation $F(x, y) = 0$ defines $y$ implicitly as a differentiable function of
$x$ such that $F_y(x, y) \neq 0$, then
$$\frac{dy}{dx} = -\frac{F_x(x, y)}{F_y(x, y)}.$$
Similarly, if the equation $F(x, y, z) = 0$ defines $z$ implicitly as a differentiable function of $x$
and $y$ such that $F_z(x, y, z) \neq 0$, then
$$\frac{\partial z}{\partial x} = -\frac{F_x(x, y, z)}{F_z(x, y, z)} \quad \text{and} \quad \frac{\partial z}{\partial y} = -\frac{F_y(x, y, z)}{F_z(x, y, z)}.$$
The above theorem can be extended to differentiable functions defined implicitly with any
number of variables.
Example 5: Find $dy/dx$, given $y^3 + y^2 - 5y - x^2 + 4 = 0$.
Solution: Let $F(x, y) = y^3 + y^2 - 5y - x^2 + 4$. Then we have
$$F_x(x, y) = -2x, \quad F_y(x, y) = 3y^2 + 2y - 5.$$
At every point $(x, y)$ where $F_y(x, y) = 3y^2 + 2y - 5 = (3y + 5)(y - 1) \neq 0$, that is, where $y \neq 1$ and $y \neq -5/3$, it follows by Theorem 3 that
$$\frac{dy}{dx} = -\frac{F_x(x, y)}{F_y(x, y)} = \frac{2x}{3y^2 + 2y - 5}.$$
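The implicit-differentiation formula of Theorem 3 can be tested on the curve of Example 5 without solving for $y$ in closed form. The sketch below is a hedged illustration: it picks the point $(\sqrt{6}, 2)$, which satisfies the equation, solves for nearby $y$ values with Newton's method (a technique not used in the notes, introduced here only for the check), and compares the resulting numerical slope with $-F_x/F_y$. All names are hypothetical.

```python
import math

def F(x, y):
    return y**3 + y**2 - 5 * y - x**2 + 4

def F_x(x, y):
    return -2 * x

def F_y(x, y):
    return 3 * y**2 + 2 * y - 5

def implicit_slope(x, y):
    # Theorem 3: dy/dx = -F_x / F_y, valid where F_y != 0
    return -F_x(x, y) / F_y(x, y)

def solve_y(x, y_guess, steps=50):
    # Newton iteration in y for F(x, y) = 0, starting near y_guess
    y = y_guess
    for _ in range(steps):
        y -= F(x, y) / F_y(x, y)
    return y

x0, y0 = math.sqrt(6.0), 2.0   # a point on the curve: 8 + 4 - 10 - 6 + 4 = 0
h = 1e-5
numeric_slope = (solve_y(x0 + h, y0) - solve_y(x0 - h, y0)) / (2 * h)
formula_slope = implicit_slope(x0, y0)
print(formula_slope, numeric_slope)
```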
Example 6: Find $\partial z/\partial x$ and $\partial z/\partial y$, given $3x^2 z - x^2 y^2 + 2z^3 + 3yz - 5 = 0$.
Solution: Let $F(x, y, z) = 3x^2 z - x^2 y^2 + 2z^3 + 3yz - 5$. Then we have
$$F_x(x, y, z) = 6xz - 2xy^2,$$
$$F_y(x, y, z) = -2x^2 y + 3z,$$
$$F_z(x, y, z) = 3x^2 + 6z^2 + 3y.$$
Hence, we obtain
$$\frac{\partial z}{\partial x} = -\frac{F_x(x, y, z)}{F_z(x, y, z)} = \frac{2xy^2 - 6xz}{3x^2 + 6z^2 + 3y},$$
$$\frac{\partial z}{\partial y} = -\frac{F_y(x, y, z)}{F_z(x, y, z)} = \frac{2x^2 y - 3z}{3x^2 + 6z^2 + 3y}.$$
Directional Derivatives:
Recall that by Chain Rule, if f ( x, y ) is differentiable, then the rate at which f
changes with respect to t along a differentiable curve x = g ( t ) , y = h ( t ) is
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$
At any point P0 ( x0 , y0 ) = P0 ( g ( t0 ) , h ( t0 ) ) , this equation gives the instantaneous rate of
change of f with respect to increasing t and therefore depends, among other things, on the
direction of motion along the curve.
Example 1: Suppose the direction of motion is a straight line and $s$ is the arc length
parameter along the line measured from $P_0(x_0, y_0)$ in the direction of a given unit vector $\mathbf{u}$.
Then $df/ds$ at $P_0$ is the instantaneous rate of change of $f$ with respect to distance in its
domain in the direction of $\mathbf{u}$. By varying $\mathbf{u}$, we find the rates at which $f$ changes with
respect to distance as we move through $P_0$ in different directions.
Example 2: Suppose z = T ( x, y ) gives the temperature at each point ( x, y ) in a region R of

the plane, and let P0 ( x0 , y0 ) be a particular point in R . Then we know that the partial
derivative Tx ( x0 , y0 ) gives the rate at which the temperature changes if we move from P0 in
the x direction, while the rate of temperature change in the y direction is given by
Ty ( x0 , y0 ) . The question is how to find the direction of greatest temperature change, which
may be in a direction not parallel to either of the coordinate axes.
Example 3: While hiking in rugged terrain, we may think of the altitude on a hillside at the
point given by longitude $x$ and latitude $y$ as defining a function $f(x, y)$. If we face due east
(in the direction of the positive x-axis), then the slope of the terrain is given by
the partial derivative $f_x$. Similarly, facing due north, the slope of the terrain is given by
$f_y$. However, how would one compute the slope in some other direction, say north-by-northwest?
Also, how would one find the direction of steepest ascent or descent?
To answer these questions, we now introduce the concept of directional derivative and
understand its geometrical interpretation.
Suppose that we want to find the instantaneous rate of change of $f(x, y)$ at the point $P(a, b)$
in the direction given by the unit vector $\mathbf{u} = u_1\mathbf{i} + u_2\mathbf{j}$. Let $Q(x, y)$ be any point on the
line through $P(a, b)$ in the direction of $\mathbf{u}$ (see figure above). Notice that the vector $\overrightarrow{PQ}$ is
then parallel to $\mathbf{u}$. Since two vectors are parallel if and only if one is a scalar multiple of the
other, we have that $\overrightarrow{PQ} = h\mathbf{u}$ for some scalar $h$, so that
$$(x - a)\mathbf{i} + (y - b)\mathbf{j} = h\mathbf{u} = (hu_1)\mathbf{i} + (hu_2)\mathbf{j}.$$
It then follows that $x - a = hu_1$, $y - b = hu_2$, so that $x = a + hu_1$, $y = b + hu_2$. The point $Q$ is
then described by $(a + hu_1, b + hu_2)$, as indicated in the figure above. The average rate of change
of $z = f(x, y)$ along the line from $P$ to $Q$ is therefore
$$\frac{f(a + hu_1, b + hu_2) - f(a, b)}{h}.$$
The instantaneous rate of change of $f(x, y)$ at the point $P(a, b)$ in the direction of the
unit vector $\mathbf{u}$ is then found by taking the limit as $h \to 0$. We give this limit a special name in
the following definition.
Definition 1: The directional derivative of $f(x, y)$ at the point $(a, b)$ in the direction of
the unit vector $\mathbf{u} = u_1\mathbf{i} + u_2\mathbf{j}$ is given by
$$D_{\mathbf{u}} f(a, b) = \lim_{h \to 0} \frac{f(a + hu_1, b + hu_2) - f(a, b)}{h},$$
provided the limit exists.
Remarks:
a) Notice that this limit resembles the definition of partial derivative, except that in this
case, both variables may change.
b) Calculating directional derivatives by this definition is similar to finding the
derivative of a function of one variable by the limit process. However, a simpler
working formula for finding directional derivatives involving the partial derivatives
f x and f y is given in Theorem 1 below.
c) At a particular point $P(a, b)$, there are infinitely many directional derivatives of the
function $f(x, y)$, one for each direction radiating from $P(a, b)$. Two of these are the
partial derivatives $f_x(a, b)$ and $f_y(a, b)$. To see this, note that if $\mathbf{u} = \mathbf{i}$ (so $u_1 = 1$ and
$u_2 = 0$), then
$$D_{\mathbf{i}} f(a, b) = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h} = f_x(a, b),$$
and if $\mathbf{u} = \mathbf{j}$ (so $u_1 = 0$ and $u_2 = 1$),
$$D_{\mathbf{j}} f(a, b) = \lim_{h \to 0} \frac{f(a, b + h) - f(a, b)}{h} = f_y(a, b).$$
Theorem 1: (Directional derivatives using partial derivatives)
Let $f(x, y)$ be a function that is differentiable at $P(a, b)$. Then $f$ has a directional
derivative in the direction of the unit vector $\mathbf{u} = u_1\mathbf{i} + u_2\mathbf{j}$ given by
$$D_{\mathbf{u}} f(a, b) = f_x(a, b)u_1 + f_y(a, b)u_2 \qquad (1)$$
Proof: We define a function $F$ of a single variable $h$ by $F(h) = f(a + hu_1, b + hu_2)$. Then
$$D_{\mathbf{u}} f(a, b) = \lim_{h \to 0} \frac{f(a + hu_1, b + hu_2) - f(a, b)}{h} = \lim_{h \to 0} \frac{F(h) - F(0)}{h} = F'(0).$$
Writing $x = a + hu_1$, $y = b + hu_2$, and applying the Chain rule to $F$, we obtain
$$F'(h) = \frac{dF}{dh} = \frac{\partial f}{\partial x}\frac{dx}{dh} + \frac{\partial f}{\partial y}\frac{dy}{dh} = f_x(x, y)u_1 + f_y(x, y)u_2.$$
When $h = 0$, we have $x = a$ and $y = b$, so that
$$D_{\mathbf{u}} f(a, b) = F'(0) = f_x(a, b)u_1 + f_y(a, b)u_2.$$
Geometrical Interpretation of Directional Derivative:

The directional derivative of $f(x, y)$ at a point $P(x_0, y_0)$ in the domain of $f$ and in the
direction of the unit vector $\mathbf{u} = u_1\mathbf{i} + u_2\mathbf{j}$ can be interpreted as a slope of the surface
$z = f(x, y)$ at the point $P(x_0, y_0)$ in the direction of $\mathbf{u}$. To see this, we reduce the problem
to two dimensions by intersecting the surface with a vertical plane passing through the point
$P(x_0, y_0)$ and parallel to $\mathbf{u}$, as shown in Figure (b) below.
This vertical plane intersects the surface to form a curve $C$. The slope of the surface at
$(x_0, y_0, f(x_0, y_0))$ in the direction of $\mathbf{u}$ is defined as the slope of the curve $C$ at
$(x_0, y_0, f(x_0, y_0))$. The vertical plane used to form $C$ intersects the xy-plane in a line $L$,
represented by the equations $x = x_0 + tu_1$, $y = y_0 + tu_2$, so that for any value of $t$, the point
$Q(x, y)$ lies on the line $L$. The points on the surface corresponding to $P$ and $Q$ are
$(x_0, y_0, f(x_0, y_0))$ and $(x, y, f(x, y))$, respectively. Since the distance between $P$ and $Q$ is
$$\sqrt{(x - x_0)^2 + (y - y_0)^2} = \sqrt{(tu_1)^2 + (tu_2)^2} = |t|\sqrt{u_1^2 + u_2^2} = |t| \qquad (\mathbf{u} \text{ is a unit vector}),$$
the slope of the secant line through the points $(x_0, y_0, f(x_0, y_0))$ and $(x, y, f(x, y))$ is
$$\frac{f(x, y) - f(x_0, y_0)}{t} = \frac{f(x_0 + tu_1, y_0 + tu_2) - f(x_0, y_0)}{t}.$$
Letting $t \to 0$, we obtain the slope of the tangent line to the curve $C$ at $(x_0, y_0)$. But this is,
by definition, the directional derivative of $f$ at $(x_0, y_0)$ in the direction of $\mathbf{u}$:
$$\lim_{t \to 0} \frac{f(x_0 + tu_1, y_0 + tu_2) - f(x_0, y_0)}{t} = D_{\mathbf{u}} f(x_0, y_0).$$
Thus, geometrically, the directional derivative in a given direction at any point on the surface
gives the slope of the surface in that direction.
Remarks:

1) Note that since $\mathbf{u}$ is a unit vector, we may write $\mathbf{u} = \cos\theta\,\mathbf{i} + \sin\theta\,\mathbf{j}$, where $\theta$ is the
angle that the vector $\mathbf{u}$ makes with the positive x-axis. With this notation, any point
on $L$ can be written as $x = x_0 + t\cos\theta$, $y = y_0 + t\sin\theta$.
2) In order to compute the directional derivative of $f$ at the point $(a, b)$ in the direction
of the vector $\mathbf{v} = v_1\mathbf{i} + v_2\mathbf{j}$, where $\mathbf{v}$ is not a unit vector, $\mathbf{v}$ must be normalized.
That is, consider the unit vector $\mathbf{u} = \mathbf{v}/\|\mathbf{v}\|$ in the direction of $\mathbf{v}$ and then find $D_{\mathbf{u}} f$
using the formula (1) proved in Theorem 1.
Example 1: Find the directional derivative of $f(x, y) = 3 - 2x^2 + y^3$ at the point $P(1, 2)$ in
the direction of the unit vector $\mathbf{u} = (1/2)\mathbf{i} - (\sqrt{3}/2)\mathbf{j}$.
(Ans: $-2 - 6\sqrt{3}$)
Example 2: Find the directional derivative of $f(x, y) = x^2 \sin(2y)$ at the point $P(1, \pi/2)$ in
the direction of the vector $\mathbf{v} = 3\mathbf{i} - 4\mathbf{j}$.
(Ans: $8/5$)

Example 3: For $f(x, y) = x^2 y - 4y^3$, compute $D_{\mathbf{u}} f(2, 1)$ where (i) $\mathbf{u} = (\sqrt{3}/2)\mathbf{i} + (1/2)\mathbf{j}$,
and (ii) $\mathbf{u}$ is in the direction from $(2, 1)$ to $(4, 0)$.
(Ans: $2\sqrt{3} - 4$; $16/\sqrt{5}$)
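The answers to the three exercises above can be checked with formula (1) of Theorem 1. The sketch below is illustrative: `directional_derivative` is a hypothetical helper that normalizes the given direction and forms the sum $f_x u_1 + f_y u_2$; the gradients are entered by hand from the partial derivatives, and the second exercise is assumed to use the point $(1, \pi/2)$.

```python
import math

def directional_derivative(grad, direction):
    # Normalize the direction, then apply D_u f = f_x u_1 + f_y u_2
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    return sum(g * u for g, u in zip(grad, unit))

# Example 1: f = 3 - 2x^2 + y^3 at (1, 2); grad f = (-4x, 3y^2)
d1 = directional_derivative((-4.0, 12.0), (0.5, -math.sqrt(3) / 2))

# Example 2: f = x^2 sin(2y) at (1, pi/2); grad f = (2x sin 2y, 2x^2 cos 2y)
grad2 = (2 * math.sin(math.pi), 2 * math.cos(math.pi))
d2 = directional_derivative(grad2, (3.0, -4.0))

# Example 3: f = x^2 y - 4y^3 at (2, 1); grad f = (2xy, x^2 - 12y^2)
grad3 = (4.0, -8.0)
d3_i = directional_derivative(grad3, (math.sqrt(3) / 2, 0.5))
d3_ii = directional_derivative(grad3, (4.0 - 2.0, 0.0 - 1.0))

print(d1, d2, d3_i, d3_ii)
```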
Gradient of a function of two variables:

Definition 2: Let $z = f(x, y)$ be a function of two variables $x$ and $y$ such that the partial
derivatives $f_x$ and $f_y$ exist. Then the gradient of $f$, denoted by $\nabla f(x, y)$ or $\operatorname{grad} f(x, y)$,
is the vector
$$\nabla f(x, y) = f_x(x, y)\mathbf{i} + f_y(x, y)\mathbf{j}.$$
Note that the gradient $\nabla f(x, y)$ is a vector in the plane and not a vector in space. Also,
think of $\nabla$ as an operator which produces a vector in the plane. We read $\nabla f$ as "del f".
Example 4: The gradient of $f(x, y) = x^2 y + y^3$ is
$$\nabla f(x, y) = \frac{\partial}{\partial x}(x^2 y + y^3)\mathbf{i} + \frac{\partial}{\partial y}(x^2 y + y^3)\mathbf{j} = (2xy)\mathbf{i} + (x^2 + 3y^2)\mathbf{j}.$$
Example 5: Find the gradient of $f(x, y) = y \ln x + xy^2$ at the point $(1, 2)$.
(Ans: $6\mathbf{i} + 4\mathbf{j}$)
Theorem 2: (The gradient formula for the directional derivative)
If $f$ is a differentiable function of $x$ and $y$, then the directional derivative of $f$ at the point
$P(a, b)$ in the direction of the unit vector $\mathbf{u}$ is
$$D_{\mathbf{u}} f(a, b) = \nabla f(a, b) \cdot \mathbf{u}.$$
Proof: Since $\nabla f(x, y) = f_x(x, y)\mathbf{i} + f_y(x, y)\mathbf{j}$ for all $x$ and $y$, and $\mathbf{u} = u_1\mathbf{i} + u_2\mathbf{j}$, we have
$$\nabla f(a, b) \cdot \mathbf{u} = \left[f_x(a, b)\mathbf{i} + f_y(a, b)\mathbf{j}\right] \cdot \left[u_1\mathbf{i} + u_2\mathbf{j}\right] = f_x(a, b)u_1 + f_y(a, b)u_2 = D_{\mathbf{u}} f(a, b),$$
using equation (1) in Theorem 1.

Example 6: Find the directional derivative of f ( x, y ) = ln ( x 2 + y 3 ) at P (1, 3) in the

direction of v = 2i 3 j .
(Ans: 77 13 338 )
Example 7: Find the directional derivative of $f(x, y) = 3x^2 - 2y^2$ at $(-3/4, 0)$ in the
direction of $\overrightarrow{PQ}$, where $P = (-3/4, 0)$ and $Q = (0, 1)$.
(Ans: $-27/10$)
Basic properties of gradient:
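Let $f$ and $g$ be differentiable functions of $x$ and $y$. Then
a) Constant Rule
$\nabla c = \mathbf{0}$ for any constant $c$.
b) Linearity Rule
$\nabla(af + bg) = a\nabla f + b\nabla g$.
c) Product Rule
$\nabla(fg) = f\nabla g + g\nabla f$.
d) Quotient Rule
$$\nabla\left(\frac{f}{g}\right) = \frac{g\nabla f - f\nabla g}{g^2}, \quad (g \neq 0).$$
e) Power Rule
$\nabla f^n = nf^{n-1}\nabla f$.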
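Proof: (a) Since $\frac{\partial}{\partial x}(c) = 0$ and $\frac{\partial}{\partial y}(c) = 0$, we have $\nabla c = \mathbf{0}$ for any constant $c$.
(b) Using the linearity rule of partial (ordinary) derivatives, we get
$$\nabla(af + bg) = \frac{\partial}{\partial x}(af + bg)\mathbf{i} + \frac{\partial}{\partial y}(af + bg)\mathbf{j} = (af_x + bg_x)\mathbf{i} + (af_y + bg_y)\mathbf{j} = a(f_x\mathbf{i} + f_y\mathbf{j}) + b(g_x\mathbf{i} + g_y\mathbf{j}) = a\nabla f + b\nabla g.$$
(c) Using the product rule of partial (ordinary) derivatives, we have
$$\nabla(fg) = \frac{\partial}{\partial x}(fg)\mathbf{i} + \frac{\partial}{\partial y}(fg)\mathbf{j} = (fg_x + f_x g)\mathbf{i} + (fg_y + f_y g)\mathbf{j} = f(g_x\mathbf{i} + g_y\mathbf{j}) + g(f_x\mathbf{i} + f_y\mathbf{j}) = f\nabla g + g\nabla f.$$
(d) Using the quotient rule of partial (ordinary) derivatives, we obtain for $g \neq 0$,
$$\nabla\left(\frac{f}{g}\right) = \frac{\partial}{\partial x}\left(\frac{f}{g}\right)\mathbf{i} + \frac{\partial}{\partial y}\left(\frac{f}{g}\right)\mathbf{j} = \frac{gf_x - fg_x}{g^2}\mathbf{i} + \frac{gf_y - fg_y}{g^2}\mathbf{j} = \frac{g(f_x\mathbf{i} + f_y\mathbf{j}) - f(g_x\mathbf{i} + g_y\mathbf{j})}{g^2} = \frac{g\nabla f - f\nabla g}{g^2}.$$
(e) Using the power rule of partial (ordinary) derivatives, we have
$$\nabla f^n = \frac{\partial}{\partial x}(f^n)\mathbf{i} + \frac{\partial}{\partial y}(f^n)\mathbf{j} = nf^{n-1}f_x\mathbf{i} + nf^{n-1}f_y\mathbf{j} = nf^{n-1}(f_x\mathbf{i} + f_y\mathbf{j}) = nf^{n-1}\nabla f.$$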
Theorem 3: (Maximal Direction Property of the Gradient) Let $f(x, y)$ be a differentiable
function of $x$ and $y$. Let $(a, b)$ be any point in the domain of $f$. Then
1) If $\nabla f(a, b) = \mathbf{0}$, then $D_{\mathbf{u}} f(a, b) = 0$ for any direction $\mathbf{u}$.
2) If $\nabla f(a, b) \neq \mathbf{0}$, then
a) The direction of maximum increase of $f$ at $(a, b)$ is $\nabla f(a, b)$. The
maximum rate of change of $f$ at $(a, b)$ is $\|\nabla f(a, b)\|$.
b) The direction of minimum increase of $f$ at $(a, b)$ is $-\nabla f(a, b)$. The
minimum rate of change of $f$ at $(a, b)$ is $-\|\nabla f(a, b)\|$.
3) The rate of change of $f$ at $(a, b)$ is 0 in the directions orthogonal to $\nabla f(a, b)$.
4) The gradient $\nabla f(a, b)$ is orthogonal to the level curve $f(x, y) = c$ at the point $(a, b)$,
where $c = f(a, b)$.
Proof: 1) Given that $\nabla f(a, b) = \mathbf{0}$, we have $D_{\mathbf{u}} f(a, b) = \nabla f(a, b) \cdot \mathbf{u} = \mathbf{0} \cdot \mathbf{u} = 0$ for any
direction $\mathbf{u}$.
2) Let $\nabla f(a, b) \neq \mathbf{0}$. Then
(a) Using Theorem 2, we have
$$D_{\mathbf{u}} f(a, b) = \nabla f(a, b) \cdot \mathbf{u} = \|\nabla f(a, b)\|\,\|\mathbf{u}\|\cos\theta = \|\nabla f(a, b)\|\cos\theta,$$
where $\theta$ is the angle between the gradient vector at $(a, b)$ and the direction vector $\mathbf{u}$.
Now, $\|\nabla f(a, b)\|\cos\theta$ has its maximum value when $\cos\theta$ assumes its largest value,
that is, when $\theta = 0$, so that $\cos\theta = 1$. Thus, the direction of maximum increase of $f$
at $(a, b)$ is attained when the angle between $\nabla f(a, b)$ and $\mathbf{u}$ is 0. In other words,
$\nabla f(a, b)$ is in the same direction as $\mathbf{u}$. Hence, the direction of maximum increase of
$f$ at $(a, b)$ is $\mathbf{u} = \dfrac{\nabla f(a, b)}{\|\nabla f(a, b)\|}$, and the largest value of $D_{\mathbf{u}} f(a, b)$ is $\|\nabla f(a, b)\|$.
(b) As in part (a), the minimum value of $D_{\mathbf{u}} f(a, b)$ occurs when $\cos\theta = -1$ and
$\theta = \pi$. This value occurs when $\mathbf{u}$ points in the direction of $-\nabla f(a, b)$, and in this direction
$$D_{\mathbf{u}} f = \|\nabla f(a, b)\|(-1) = -\|\nabla f(a, b)\|.$$
3) Since $D_{\mathbf{u}} f(a, b) = \nabla f(a, b) \cdot \mathbf{u}$ and $\nabla f(a, b) \cdot \mathbf{u} = 0$ if and only if $\nabla f(a, b)$ is
orthogonal to $\mathbf{u}$, we get that $D_{\mathbf{u}} f(a, b) = 0$ in the directions orthogonal to $\nabla f(a, b)$.
4) Let the vector equation of the level curve be $\mathbf{r}(t) = x(t)\mathbf{i} + y(t)\mathbf{j}$, where $t$ is a
parameter. Since $f(x(t), y(t)) = c$ for every $t$, we differentiate it with respect to $t$
and apply the Chain rule to obtain
$$\frac{d}{dt} f(x(t), y(t)) = 0 \;\Rightarrow\; \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = 0 \;\Rightarrow\; \nabla f \cdot \mathbf{r}'(t) = 0.$$
This implies that $\nabla f$ is normal to the velocity vector $\mathbf{r}'(t)$ at every point on the level
curve. But $\mathbf{r}'(t)$ is tangent to the level curve at every point, so that $\nabla f$ is normal to
the level curve at every point. In particular, $\nabla f(a, b)$ is orthogonal to the level curve
at the point $(a, b)$.
Example 8: In what direction is the function defined by $f(x, y) = xe^{2y - x}$ increasing most
rapidly at the point $P(2, 1)$, and what is the maximum rate of increase? In what direction is
$f$ decreasing most rapidly?

Solution: By the preceding theorem, the gradient of $f$ at $P$ provides the required answer.
So, we first find the gradient of $f$:
$$\nabla f = f_x\mathbf{i} + f_y\mathbf{j} = \left[e^{2y - x} + xe^{2y - x}(-1)\right]\mathbf{i} + \left[xe^{2y - x}(2)\right]\mathbf{j} = e^{2y - x}\left[(1 - x)\mathbf{i} + 2x\mathbf{j}\right].$$
Now, at the point $P(2, 1)$, we have $\nabla f(2, 1) = -\mathbf{i} + 4\mathbf{j}$. Thus, the most rapid rate of increase
is $\|\nabla f(2, 1)\| = \sqrt{17}$ and it occurs in the direction of $-\mathbf{i} + 4\mathbf{j}$. The most rapid rate of decrease
therefore occurs in the direction $-\nabla f(2, 1) = \mathbf{i} - 4\mathbf{j}$.
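The maximal direction property can be illustrated numerically for Example 8. The sketch below, offered only as an illustration with hypothetical names, sweeps unit vectors $\mathbf{u} = (\cos\theta, \sin\theta)$ over a grid of angles (the 0.1 degree grid is an arbitrary choice) and records where the directional derivative $\nabla f(2, 1) \cdot \mathbf{u}$ is largest.

```python
import math

def grad_f(x, y):
    # Gradient of f(x, y) = x e^(2y - x): e^(2y - x) * ((1 - x), 2x)
    e = math.exp(2 * y - x)
    return (e * (1 - x), 2 * x * e)

gx, gy = grad_f(2.0, 1.0)             # expected to be (-1, 4)
grad_norm = math.hypot(gx, gy)        # expected to be sqrt(17)

# Sweep directions u = (cos a, sin a) and keep the largest D_u f = grad . u
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_rate, best_angle = max((gx * math.cos(a) + gy * math.sin(a), a) for a in angles)

gradient_angle = math.atan2(gy, gx)   # angle of the gradient vector itself
print(best_rate, grad_norm)
print(best_angle, gradient_angle)
```

The largest sampled rate should be close to $\sqrt{17}$, attained at an angle close to that of $-\mathbf{i} + 4\mathbf{j}$.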
Example 9: Find the direction of maximum increase on the contour plot of the surface given
by $f(x, y) = 3x - x^3 - 3xy^2$ from the point $P(0.6, -0.7)$ and sketch the path of steepest
ascent.
Solution: The direction of maximum increase at $P$ is given by $\nabla f(0.6, -0.7)$. We have
$$\nabla f = \langle 3 - 3x^2 - 3y^2,\; -6xy \rangle,$$
so that $\nabla f(0.6, -0.7) = \langle 0.45, 2.52 \rangle$. The unit vector in this direction is then
$\mathbf{u} = \langle 0.176, 0.984 \rangle$, which is the direction of maximum increase. Note that the path of
steepest ascent is a curve that remains perpendicular to each level curve through which it
passes. Finding the path of steepest ascent exactly is difficult. The unit vector $\mathbf{u}$ does not point
towards the maximum (the peak of the surface) at $(1, 0)$. However, a plausible path of
steepest ascent is shown in the figure below:
Remarks: The directional derivative and gradient concepts can easily be extended to
functions of three or more variables. The basic properties of the gradient and the maximal
property remain valid for functions of three or more variables. For example, for a function
$f(x, y, z)$ of three variables, the gradient $\nabla f$ is defined by
$$\nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k},$$
and the directional derivative $D_{\mathbf{u}} f$ of $f(x, y, z)$ at the point $P_0(x_0, y_0, z_0)$ in the direction of the unit
vector $\mathbf{u}$ is given by
$$D_{\mathbf{u}} f(x_0, y_0, z_0) = \nabla f(x_0, y_0, z_0) \cdot \mathbf{u}.$$
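Example 10: Let $f(x, y, z) = xy \sin(xz)$. Find $\nabla f(1, 2, \pi)$ and then compute the directional
derivative of $f$ at $(1, 2, \pi)$ in the direction of the vector $\mathbf{v} = -2\mathbf{i} + 3\mathbf{j} - 5\mathbf{k}$.
Solution: By definition, we have $\nabla f = f_x\mathbf{i} + f_y\mathbf{j} + f_z\mathbf{k}$. We now compute the partial
derivatives at $(1, 2, \pi)$:
$$f_x = \frac{\partial}{\partial x}(xy \sin(xz)) = y\sin(xz) + xy(z\cos(xz)) \;\Rightarrow\; f_x(1, 2, \pi) = -2\pi;$$
$$f_y = \frac{\partial}{\partial y}(xy \sin(xz)) = x\sin(xz) \;\Rightarrow\; f_y(1, 2, \pi) = 0;$$
$$f_z = \frac{\partial}{\partial z}(xy \sin(xz)) = xy\cos(xz)\,x \;\Rightarrow\; f_z(1, 2, \pi) = -2.$$
Thus, the gradient of $f$ at $(1, 2, \pi)$ is $\nabla f(1, 2, \pi) = -2\pi\,\mathbf{i} - 2\mathbf{k}$. To find $D_{\mathbf{v}} f$, we first need
to normalize $\mathbf{v}$ to get the unit vector $\mathbf{u}$ in the direction of $\mathbf{v}$:
$$\mathbf{u} = \frac{\mathbf{v}}{\|\mathbf{v}\|} = \frac{-2\mathbf{i} + 3\mathbf{j} - 5\mathbf{k}}{\sqrt{(-2)^2 + (3)^2 + (-5)^2}} = \frac{1}{\sqrt{38}}\left(-2\mathbf{i} + 3\mathbf{j} - 5\mathbf{k}\right),$$
and then note that $D_{\mathbf{v}} f = D_{\mathbf{u}} f$. Thus, we have
$$D_{\mathbf{u}} f(1, 2, \pi) = \nabla f(1, 2, \pi) \cdot \mathbf{u} = \left(-2\pi\,\mathbf{i} - 2\mathbf{k}\right) \cdot \frac{1}{\sqrt{38}}\left(-2\mathbf{i} + 3\mathbf{j} - 5\mathbf{k}\right) = \frac{1}{\sqrt{38}}(4\pi + 10) \approx 3.66.$$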
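The three-variable computation in Example 10 can be reproduced directly. This sketch assumes the direction vector is $\mathbf{v} = -2\mathbf{i} + 3\mathbf{j} - 5\mathbf{k}$ (the reading consistent with the stated norm $\sqrt{38}$ and the value 3.66); the names `grad_f`, `v`, and `rate` are hypothetical.

```python
import math

def grad_f(x, y, z):
    # Partial derivatives of f(x, y, z) = x y sin(x z)
    fx = y * math.sin(x * z) + x * y * z * math.cos(x * z)
    fy = x * math.sin(x * z)
    fz = x * x * y * math.cos(x * z)
    return (fx, fy, fz)

gradient = grad_f(1.0, 2.0, math.pi)   # approximately (-2*pi, 0, -2)
v = (-2.0, 3.0, -5.0)                  # direction vector (assumed signs)
v_norm = math.sqrt(sum(c * c for c in v))

# D_u f = grad f . (v / |v|)
rate = sum(g * c for g, c in zip(gradient, v)) / v_norm
print(round(rate, 2))
```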
Theorem 4: (Normal Property of the Gradient) Suppose the function $f$ is differentiable at
the point $P(a, b, c)$ and that the gradient at $P$ satisfies $\nabla f(a, b, c) \neq \mathbf{0}$. Then $\nabla f(a, b, c)$ is
orthogonal to the level surface of $f$ through $P$.

Proof: Let $C$ be any smooth curve on the level surface $f(x, y, z) = k$ that passes through $P$
and has the vector equation $\mathbf{R}(t) = x(t)\mathbf{i} + y(t)\mathbf{j} + z(t)\mathbf{k}$ for all $t$ in some interval $I$. Since
$C$ lies on the level surface, any point $(x(t), y(t), z(t))$ on the curve $C$ must satisfy the
equation $f(x(t), y(t), z(t)) = k$. Differentiating this equation with respect to $t$, we obtain
$$\frac{d}{dt} f(x(t), y(t), z(t)) = 0.$$
Applying the Chain rule on the left hand side, we get that for all $t \in I$,
$$f_x(x(t), y(t), z(t))\frac{dx}{dt} + f_y(x(t), y(t), z(t))\frac{dy}{dt} + f_z(x(t), y(t), z(t))\frac{dz}{dt} = 0$$
or
$$\nabla f(x(t), y(t), z(t)) \cdot \frac{d\mathbf{R}}{dt} = 0.$$
In particular, at the point $P(a, b, c)$, we have
$$\nabla f(a, b, c) \cdot \left.\frac{d\mathbf{R}}{dt}\right|_{(a, b, c)} = 0 \qquad (1)$$
But since the curve is smooth, the velocity vector to the curve $C$ at $P$ satisfies
$\left.\dfrac{d\mathbf{R}}{dt}\right|_{(a, b, c)} \neq \mathbf{0}$,
and it is given that $\nabla f(a, b, c) \neq \mathbf{0}$. Therefore, from (1), we must have that the vector
$\nabla f(a, b, c)$ is normal to the vector $\left.d\mathbf{R}/dt\right|_{(a, b, c)}$. That is, $\nabla f(a, b, c)$ is orthogonal to the level
surface of $f$ through $P(a, b, c)$.
Remarks: The preceding theorem implies that the gradient vector at a point $P(a, b, c)$ on the
level surface is normal to the tangent vector $\mathbf{T} = d\mathbf{R}/dt$ of each curve $C$ on the surface that
passes through $P$. Thus, all these tangent vectors lie in a single plane through $P$ with normal
vector $\mathbf{N} = \nabla f(a, b, c)$. This plane is the tangent plane to the surface at $P$. Note that since
$P$ is an arbitrary point on the level surface, there is a unique tangent plane at every point on
the level surface of $f$ where $\nabla f \neq \mathbf{0}$.
Example 11: Find a vector that is normal to the level surface $x^2 + 2xy - yz + 3z^2 = 7$ at the
point $P(1, 1, -1)$.
Solution: By the preceding theorem, the gradient vector $\nabla f(1, 1, -1)$ is normal to the level
surface $f(x, y, z) = 7$, where $f(x, y, z) = x^2 + 2xy - yz + 3z^2$. First, we have
$$\nabla f = f_x\mathbf{i} + f_y\mathbf{j} + f_z\mathbf{k} = (2x + 2y)\mathbf{i} + (2x - z)\mathbf{j} + (6z - y)\mathbf{k}.$$
At the point $(1, 1, -1)$, we have, therefore, $\nabla f(1, 1, -1) = 4\mathbf{i} + 3\mathbf{j} - 7\mathbf{k}$ as the required normal.
Example 12: Sketch the level curve $f(x, y) = c$ corresponding to $c = 1$ for the function
$f(x, y) = x^2 - y^2$ and find a normal vector at the point $P(2, \sqrt{3})$.
Solution: The level curve for $c = 1$ is the hyperbola given by $x^2 - y^2 = 1$ (trace it). The
gradient vector is normal to the level curve, by Theorem 3 part (4). We have
$$\nabla f = f_x\mathbf{i} + f_y\mathbf{j} = 2x\mathbf{i} - 2y\mathbf{j}.$$
Thus, at the point $(2, \sqrt{3})$, we get that $\nabla f(2, \sqrt{3}) = 4\mathbf{i} - 2\sqrt{3}\,\mathbf{j}$ is the required normal.

Example 13: The set of points $(x, y)$ with $0 \le x \le 5$ and $0 \le y \le 5$ is a square in the first
quadrant of the xy-plane. Suppose this square is heated in such a way that the temperature
at the point $P(x, y)$ is given by $T(x, y) = x^2 + y^2$. In what direction will heat flow from the
point $P(3, 4)$?
Solution: The level curves of $T$ are called isothermal curves. From physics it is known that
the flow of heat is perpendicular to the isothermal curves, and points in the direction of
decreasing temperature. Thus, if $\mathbf{H}(x, y)$ denotes the heat flow at a point $(x, y)$ in the
region, then we can express the heat flow as
$$\mathbf{H}(x, y) = -k\nabla T(x, y) \qquad (2)$$
where $k$ is a positive constant (called thermal conductivity). Since $T(3, 4) = 25$, the point
$P(3, 4)$ lies on the isotherm $T(x, y) = 25$, which is part of the circle $x^2 + y^2 = 25$. Because
$\nabla T(x, y) = 2x\mathbf{i} + 2y\mathbf{j}$, we get that $\nabla T(3, 4) = 6\mathbf{i} + 8\mathbf{j}$. Using the heat flow equation (2), we
have that the heat flow at $P(3, 4)$ satisfies
$$\mathbf{H}(3, 4) = -k\left(6\mathbf{i} + 8\mathbf{j}\right).$$
Because the thermal conductivity $k$ is positive, the heat flows from $P(3, 4)$ in the direction
of the unit vector $\mathbf{u}$ given by
$$\mathbf{u} = \frac{-\left(6\mathbf{i} + 8\mathbf{j}\right)}{\sqrt{(-6)^2 + (-8)^2}} = \frac{(-6)}{\sqrt{100}}\mathbf{i} + \frac{(-8)}{\sqrt{100}}\mathbf{j} = -\frac{3}{5}\mathbf{i} - \frac{4}{5}\mathbf{j}.$$
Example 14: Find equations for the tangent plane and the normal line at the point $P(1, -1, 2)$
on the surface $S$ given by $x^2 y + y^2 z + z^2 x = 5$.

Solution: The given surface $S$ can be written as the level surface $F(x, y, z) = 5$, where
$F(x, y, z) = x^2 y + y^2 z + z^2 x$.
The gradient vector $\nabla F$ is normal to the level surface $S$ at $P(1, -1, 2)$. We have
$$\nabla F(x, y, z) = (2xy + z^2)\mathbf{i} + (x^2 + 2yz)\mathbf{j} + (y^2 + 2xz)\mathbf{k}.$$
Thus, a normal vector at $P(1, -1, 2)$ is $\mathbf{N} = \nabla F(1, -1, 2) = 2\mathbf{i} - 3\mathbf{j} + 5\mathbf{k}$. Hence, if $Q(x, y, z)$
is any point on the plane, then the equation for the tangent plane passing through $P(1, -1, 2)$ and
with normal vector $\mathbf{N}$ is given by $\overrightarrow{QP} \cdot \mathbf{N} = 0$, or equivalently
$$2(x - 1) - 3(y + 1) + 5(z - 2) = 0$$
or
$$2x - 3y + 5z = 15.$$
The normal line to the surface at $P(1, -1, 2)$ with direction numbers $\langle 2, -3, 5 \rangle$ is
$$x = 1 + 2t, \quad y = -1 - 3t, \quad z = 2 + 5t, \quad t \in \mathbb{R}.$$
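The tangent plane and normal line of Example 14 follow mechanically from the gradient, so they are easy to check in code. The sketch below is a small illustration with hypothetical helper names: it evaluates $\nabla F$ at $P(1, -1, 2)$, forms the plane constant $\mathbf{N} \cdot P$, and parametrizes the normal line.

```python
def F(x, y, z):
    return x * x * y + y * y * z + z * z * x

def grad_F(x, y, z):
    # Gradient of F: (2xy + z^2, x^2 + 2yz, y^2 + 2xz)
    return (2 * x * y + z * z, x * x + 2 * y * z, y * y + 2 * x * z)

point = (1.0, -1.0, 2.0)
normal = grad_F(*point)

# Tangent plane N . (Q - P) = 0 rewritten as N . Q = N . P
plane_constant = sum(n * p for n, p in zip(normal, point))

def normal_line(t):
    # Point on the normal line P + t N
    return tuple(p + t * n for p, n in zip(point, normal))

print(normal, plane_constant, normal_line(1.0))
```

The printed normal vector and plane constant should match $\langle 2, -3, 5 \rangle$ and 15 from the worked solution.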
Example 15: Find the equation for the tangent plane and the normal line to the cone
$z^2 = x^2 + y^2$
at the point where $x = 3$, $y = 4$, and $z > 0$.

Solution: At the point $(3, 4)$, since $z > 0$, we have $z = \sqrt{3^2 + 4^2} = \sqrt{25} = 5$. If we write
$F(x, y, z) = x^2 + y^2 - z^2$, then the cone can be regarded as the level surface $F(x, y, z) = 0$.
The partial derivatives of $F$ are
$$F_x(x, y, z) = 2x, \quad F_y(x, y, z) = 2y, \quad F_z(x, y, z) = -2z.$$
Therefore, the direction numbers of the normal to the cone at $(3, 4, 5)$ are
$$F_x(3, 4, 5) = 6, \quad F_y(3, 4, 5) = 8, \quad F_z(3, 4, 5) = -10.$$
Thus the tangent plane to the cone at $(3, 4, 5)$ has the equation
$$6(x - 3) + 8(y - 4) - 10(z - 5) = 0 \;\Rightarrow\; 3x + 4y - 5z = 0.$$
The normal line is given parametrically by the equations
$$x = 3 + 6t, \quad y = 4 + 8t, \quad z = 5 - 10t, \quad t \in \mathbb{R}.$$
Extrema of functions of two variables:

Definition: (Absolute Extrema)
(1) We call $f(a, b)$ the absolute maximum of $f$ on the region $R$ if $f(a, b) \ge f(x, y)$ for
all $(x, y) \in R$.
(2) We call $f(a, b)$ the absolute minimum of $f$ on $R$ if $f(a, b) \le f(x, y)$ for all
$(x, y) \in R$.
(3) In either case (1) or (2), $f(a, b)$ is called an absolute extremum of $f$.
Theorem 1: (Extreme value theorem)

Let f be a continuous function of two variables x and y defined on a closed and bounded
region R in the xy plane.
a) There is at least one point in R where f takes on a minimum value.
b) There is at least one point in R where f takes on a maximum value.
Definition: (Relative Extrema)

Let $f$ be a function defined on a region $R$ containing $(a, b)$.
1) The function $f$ has a relative minimum at $(a, b)$ if $f(x, y) \ge f(a, b)$ for all $(x, y)$ in
an open disk in $R$ containing $(a, b)$.
2) The function $f$ has a relative maximum at $(a, b)$ if $f(x, y) \le f(a, b)$ for all $(x, y)$ in
an open disk in $R$ containing $(a, b)$.
The figure given below shows peaks and valleys on the graph of a function $f$, which are
respectively the points of relative maximum and minimum of $f$.
[Figure: Relative Extrema, with a local maximum and a local minimum labelled]
Definition: (Critical Point)

Let $f$ be defined on an open region $R$ containing the point $(x_0, y_0)$. The point $(x_0, y_0)$ is a
critical point of $f$ if one of the following is true:

1) $f_x(x_0, y_0) = 0$ and $f_y(x_0, y_0) = 0$.
2) $f_x(x_0, y_0)$ or $f_y(x_0, y_0)$ does not exist.
Definition: (Saddle Point)

A critical point ( x0 , y0 ) is called a saddle point of f if every open disk centered at ( x0 , y0 )
contains points in the domain of f that satisfy f ( x, y ) > f ( x0 , y0 ) as well as points in the
domain of f that satisfy f ( x, y ) < f ( x0 , y0 ) .
Remarks: Note that if $f$ is differentiable at a critical point $(x_0, y_0)$, then
$$\nabla f(x_0, y_0) = f_x(x_0, y_0)\, i + f_y(x_0, y_0)\, j = 0i + 0j.$$
Therefore, every directional derivative of $f$ at $(x_0, y_0)$ must be 0. This implies that the graph
of the function $f$ has a horizontal tangent plane at the point $(x_0, y_0, f(x_0, y_0))$. Since some
critical points yield saddle points, a critical point $(x_0, y_0)$ is only a possible location for a
relative extremum. See figures below:
Theorem 2: Let f be a function of two variables x and y defined on an open region R

containing the point ( x0 , y0 ) . If f has a relative extremum at ( x0 , y0 ) and partial derivatives
f x and f y both exist at ( x0 , y0 ) , then
f x ( x0 , y0 ) = f y ( x0 , y0 ) = 0 .
Proof: Let $F(x) = f(x, y_0)$. Then $F$ must have a relative extremum at $x = x_0$, so that
$F'(x_0) = 0$. This implies that $f_x(x_0, y_0) = 0$. Similarly, $G(y) = f(x_0, y)$ has a relative
extremum at $y = y_0$, so $G'(y_0) = 0$, which implies that $f_y(x_0, y_0) = 0$. Thus, we must have
both $f_x(x_0, y_0) = 0$ and $f_y(x_0, y_0) = 0$.
Remarks: The preceding theorem implies that there is a horizontal plane at each extreme
point where the first partial derivatives exist. However, the theorem does not say that
whenever a horizontal tangent plane occurs at a point P , there must be an extremum there.
All that we can deduce is that such a point P is a possible location for a relative extremum.
Example 1: Discuss the nature of the critical point $(0, 0)$ for the quadric surfaces
a) $z = x^2 + y^2$.
b) $z = 1 - x^2 - y^2$.
Solution: The graphs of the quadric surfaces are shown below.
Let $f(x, y) = x^2 + y^2$ and $g(x, y) = 1 - x^2 - y^2$. A point $(a, b)$ is a critical point of a
differentiable function $F(x, y)$ if $F_x(a, b) = 0$ and $F_y(a, b) = 0$. We use these equations to
find the critical points of $f$ and $g$ and discuss their nature:
a) $f_x(x, y) = 2x$ and $f_y(x, y) = 2y$. By definition of a critical point, we must have
$2x = 0$ and $2y = 0$, that is, $(x, y) = (0, 0)$. Thus $(0, 0)$ is the only critical point. The function
$f$ has a relative minimum at $(0, 0)$ because $x^2 + y^2 > 0$ for all $(x, y) \ne (0, 0)$.
b) $g_x(x, y) = -2x$ and $g_y(x, y) = -2y$, so again $(0, 0)$ is the only critical point. Since
$z = 1 - x^2 - y^2 = 1 - (x^2 + y^2)$ and $x^2 + y^2 \ge 0$ for all $x$ and $y$, it follows that $z \le 1$
for all $x$ and $y$, with the relative maximum $z = 1$ occurring when $(x, y) = (0, 0)$; that is,
the point $(0, 0)$ is the point of relative maximum.
Example 2: Determine the critical points of $h(x, y) = y^2 - x^2$ and discuss their nature.

Solution: The graph of the quadric surface $z = y^2 - x^2$ is shown below.
Since $h_x(x, y) = -2x$ and $h_y(x, y) = 2y$, the equations $-2x = 0 = 2y$ imply that $(0, 0)$ is the
only critical point of $h$. At $(0, 0)$, we have $z = 0$. However, every open disk $D$ centered at
$(0, 0)$ contains points where $h$ is negative and points where $h$ is positive: on the $x$-axis
($y = 0$) we have $h = -x^2 < 0$ for $x \ne 0$, and on the $y$-axis ($x = 0$) we have $h = y^2 > 0$ for
$y \ne 0$. Thus, the function $h$ has neither a relative maximum nor a relative minimum at $(0, 0)$,
and hence $(0, 0)$ is a saddle point of the hyperbolic paraboloid $h(x, y) = y^2 - x^2$.
Example 3: Determine the relative extrema of

a) $f(x, y) = 2x^2 + y^2 + 8x - 6y + 20$.
b) $g(x, y) = 1 - (x^2 + y^2)^{1/3}$.
Solution: The graphs of the surfaces are shown below:
[Figure: panels (a) and (b)]
We begin by finding the critical points of $f$ and $g$, using their partial derivatives. The
partial derivatives of $f$ and $g$ are
$$f_x(x, y) = 4x + 8, \quad f_y(x, y) = 2y - 6;$$
$$g_x(x, y) = -\frac{2x}{3(x^2 + y^2)^{2/3}}, \quad g_y(x, y) = -\frac{2y}{3(x^2 + y^2)^{2/3}}.$$
(a) The partial derivatives $f_x(x, y) = 4x + 8$ and $f_y(x, y) = 2y - 6$ are defined for all $x$ and
$y$. Therefore, the only critical points of $f$ are those for which both partials $f_x$ and $f_y$
are 0. To locate these points, we solve the equations $4x + 8 = 0$, $2y - 6 = 0$ simultaneously.
This gives $x = -2$, $y = 3$, so $(-2, 3)$ is the only critical point. Since $f(-2, 3) = 3$ and
$$f(x, y) = 2(x + 2)^2 + (y - 3)^2 + 3 > 3 \quad \text{for all } (x, y) \ne (-2, 3),$$
we conclude that a relative minimum of $f$ occurs at $(-2, 3)$, and the value of the
relative minimum is $f(-2, 3) = 3$.
(b) Both partial derivatives $g_x$ and $g_y$ exist for all $(x, y)$ except $(0, 0)$. The critical
points are those points at which one of the partials does not exist or both are 0. For
$(x, y) \ne (0, 0)$, the equations $g_x(x, y) = 0 = g_y(x, y)$ would require $x = 0$ and $y = 0$,
which is impossible, so they have no solution. Hence $(0, 0)$, where the partials do not
exist, is the only critical point of $g$. Since $g(0, 0) = 1$ and
$$g(x, y) = 1 - (x^2 + y^2)^{1/3} < 1 \quad \text{for all } (x, y) \ne (0, 0),$$
we find that $g$ has a relative maximum at $(0, 0)$.
Second Partials Test:
In the preceding examples, it was relatively easy to find the relative extrema of the given
functions or to determine when a critical point is a saddle point, because the arguments were
purely algebraic. For more complicated functions, it is better to rely on the analytical
criterion given by the following second partials test:
Theorem 3: Let $f$ have continuous second order partial derivatives on an open region
containing a point $(x_0, y_0)$ for which
$f_x(x_0, y_0) = 0$ and $f_y(x_0, y_0) = 0$.
To test for relative extrema of $f$ at $(x_0, y_0)$, consider the quantity, called the discriminant of $f$,
defined by the equation
$$D(x_0, y_0) = f_{xx}(x_0, y_0)\, f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0).$$
(1) A relative maximum occurs at $(x_0, y_0)$ if $D(x_0, y_0) > 0$ and $f_{xx}(x_0, y_0) < 0$ (or
equivalently, $D(x_0, y_0) > 0$ and $f_{yy}(x_0, y_0) < 0$).
(2) A relative minimum occurs at $(x_0, y_0)$ if $D(x_0, y_0) > 0$ and $f_{xx}(x_0, y_0) > 0$ (or
equivalently, $D(x_0, y_0) > 0$ and $f_{yy}(x_0, y_0) > 0$).
(3) A saddle point occurs at $(x_0, y_0)$ if $D(x_0, y_0) < 0$.
(4) The test is inconclusive if $D(x_0, y_0) = 0$.
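The four cases of the second partials test translate directly into a small decision function. The following is only a minimal sketch: the function name `second_partials_test` and the tolerance `tol` are illustrative choices made here, not part of the notes, and the caller is assumed to supply the second partials already evaluated at a critical point.

```python
def second_partials_test(fxx, fyy, fxy, tol=1e-12):
    """Classify a critical point from the second partials evaluated there.

    The name and the tolerance are illustrative assumptions, not from the notes.
    """
    d = fxx * fyy - fxy ** 2  # discriminant D = f_xx * f_yy - f_xy^2
    if d > tol:
        # D > 0: the sign of f_xx (equivalently f_yy) separates max from min
        return "relative maximum" if fxx < 0 else "relative minimum"
    if d < -tol:
        return "saddle point"
    return "inconclusive"


# f(x, y) = x^2 + y^2 at (0, 0): f_xx = 2, f_yy = 2, f_xy = 0
print(second_partials_test(2, 2, 0))
# h(x, y) = y^2 - x^2 at (0, 0): h_xx = -2, h_yy = 2, h_xy = 0
print(second_partials_test(-2, 2, 0))
```

The two calls reuse the surfaces of Examples 1 and 2, whose critical point at the origin was classified there by algebraic arguments.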
Example 4: Find the relative extrema of

(a) $f(x, y) = -x^3 + 4xy - 2y^2 + 1$.
(b) $g(x, y) = x^2 y^2$.
Solution: (a) We first find the critical points of $f(x, y) = -x^3 + 4xy - 2y^2 + 1$. Because the partial derivatives
$$f_x(x, y) = -3x^2 + 4y \quad \text{and} \quad f_y(x, y) = 4x - 4y$$
both exist for all $x$ and $y$, the only critical points are those for which both $f_x(x, y)$ and
$f_y(x, y)$ are zero. Thus, we obtain $-3x^2 + 4y = 0$ and $4x - 4y = 0$. The second equation gives
$y = x$, and substituting into the first gives $x(4 - 3x) = 0$, so $x = y = 0$ or $x = y = 4/3$.
That is, the critical points of $f$ are $(0, 0)$ and $(4/3, 4/3)$.
To test these points for relative extrema, we now compute the second partials:
$$f_{xx}(x, y) = -6x, \quad f_{yy}(x, y) = -4, \quad f_{xy}(x, y) = 4.$$
For the critical point $(0, 0)$, we have $f(0, 0) = 1$, and the discriminant of $f$ is
$$D(0, 0) = f_{xx}(0, 0)\, f_{yy}(0, 0) - f_{xy}^2(0, 0) = 0 - 16 < 0,$$
so, by the second partials test, $(0, 0, 1)$ is a saddle point of the graph of $f$. Furthermore, for
the critical point $(4/3, 4/3)$, we have $f_{xx}(4/3, 4/3) = -8 < 0$, and the discriminant of $f$ is
$$D(4/3, 4/3) = f_{xx}(4/3, 4/3)\, f_{yy}(4/3, 4/3) - f_{xy}^2(4/3, 4/3) = (-8)(-4) - 16 = 16 > 0.$$
So, by the second partials test, $f$ has a relative maximum at $(4/3, 4/3)$.
(b) Since both the partial derivatives $g_x(x, y) = 2xy^2$ and $g_y(x, y) = 2x^2 y$ exist for all $x$
and $y$, and both are zero if $x = 0$ or $y = 0$, every point along the $x$-axis
or the $y$-axis is a critical point of $g$. To determine the nature of these critical points,
we find the second partials of $g$:
$$g_{xx}(x, y) = 2y^2, \quad g_{yy}(x, y) = 2x^2, \quad g_{xy}(x, y) = 4xy.$$
If either $x = 0$ or $y = 0$, then
$$D(x, y) = g_{xx}(x, y)\, g_{yy}(x, y) - g_{xy}^2(x, y) = 4x^2 y^2 - 16x^2 y^2 = -12x^2 y^2 = 0.$$
So, the second partials test fails. However, because $g(x, y) = 0$ for every point along the
$x$-axis or the $y$-axis, and $g(x, y) = x^2 y^2 > 0$ for all other points, we conclude that
each of these critical points yields an absolute minimum. (See figure below)
Example 5: Find the absolute extrema of the function $f(x, y) = \sin(xy)$ on the closed
region given by $0 \le x \le \pi$, $0 \le y \le 1$.
Solution: From the partial derivatives
$$f_x(x, y) = y\cos(xy), \quad f_y(x, y) = x\cos(xy),$$
we obtain that the critical points in the domain of $f$ are $(0, 0)$ and each point lying in the
domain and on the hyperbola given by $xy = \pi/2$. Since every point on the hyperbola
$xy = \pi/2$ yields the value
$$f(x, y) = \sin(\pi/2) = 1,$$
which is the absolute maximum of $f$ because the sine never exceeds 1, all the critical points
lying on the hyperbola $xy = \pi/2$ are points of absolute maximum.
The critical point $(0, 0)$ yields an absolute minimum of $f$ because for every point in the
domain $0 \le x \le \pi$, $0 \le y \le 1$, we have $0 \le xy \le \pi$, which implies that
$$0 \le \sin(xy) \le 1, \quad \text{that is,} \quad 0 \le f(x, y) \le 1.$$
To locate other absolute extrema, we consider the points of the boundary of the domain:
$x = 0$, $x = \pi$, $y = 0$, $y = 1$. Since $\sin(xy) = 0$ for all the points on the $x$-axis, the $y$-axis and at
the point $(\pi, 1)$, we get that each of these points yields an absolute minimum of $f$. (See
figure below)
Example 6: Find all critical points on the graph of the following functions and classify each
point as a relative extremum or a saddle point:
(1) $f(x, y) = 8x^3 - 24xy + y^3$.
(2) g ( x , y ) = x 2 y 4 .
(Try on your own and draw the rough sketch of the surface in each case.)
Example 7: Find the absolute extrema of the function $h(x, y) = e^{x^2 - y^2}$ over the disk
$x^2 + y^2 \le 1$.
Solution: Since the partial derivatives $h_x(x, y) = 2x e^{x^2 - y^2}$ and $h_y(x, y) = -2y e^{x^2 - y^2}$ exist for
all the points $(x, y)$, and both are zero only when $x = 0$ and $y = 0$, it follows that $(0, 0)$ is
the only critical point of $h$, and it is inside the unit disk $x^2 + y^2 \le 1$. Also, $h(0, 0) = e^0 = 1$.
The second partials at $(0, 0)$ are $h_{xx}(0, 0) = 2$, $h_{yy}(0, 0) = -2$ and $h_{xy}(0, 0) = 0$, so the
discriminant of $h$ at $(0, 0)$ is $D(0, 0) = (2)(-2) - 0^2 = -4 < 0$. By the second partials test,
$(0, 0)$ is a saddle point, so $h$ has no extremum in the interior of the disk.

Now we examine the values of $h$ on the boundary curve $x^2 + y^2 = 1$. On this
boundary curve, $y^2 = 1 - x^2$, so we obtain $h(x, y) = e^{x^2 - (1 - x^2)} = e^{2x^2 - 1}$. In order to find the
absolute extrema of $h$, we need to find the largest and smallest values of
$$F(x) = e^{2x^2 - 1} \quad \text{for } -1 \le x \le 1.$$
Since $F'(x) = 4x e^{2x^2 - 1} = 0$ only when $x = 0$, and at $x = 0$ we have $y = \pm 1$, the points $(0, 1)$ and
$(0, -1)$ are boundary critical points. At the endpoints of the interval $[-1, 1]$, the
corresponding points are $(1, 0)$ and $(-1, 0)$, which are also candidates for extrema of $h$.
Computing the values of $h$ at all of these candidate points,
$$h(0, 1) = e^{-1}, \quad h(0, -1) = e^{-1}, \quad h(1, 0) = e, \quad h(-1, 0) = e,$$
we find that the absolute maximum value of $h$ on the given unit disk is $e$, which occurs at
$(1, 0)$ and $(-1, 0)$; the absolute minimum value is $e^{-1}$, which occurs at $(0, 1)$ and $(0, -1)$.
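As a rough numerical cross-check of Example 7, one can sample the function on a grid covering the disk and record the smallest and largest values seen. This is only a brute-force sketch, not the method of the solution: it reads the function as $h(x, y) = e^{x^2 - y^2}$, and the helper name `scan_unit_disk` and the grid size are assumptions made here for illustration.

```python
import math


def h(x, y):
    # Function of Example 7, read as h(x, y) = exp(x^2 - y^2)
    return math.exp(x * x - y * y)


def scan_unit_disk(n=200):
    """Return (smallest, largest) value of h over grid points in the unit disk."""
    lo, hi = math.inf, -math.inf
    for i in range(n + 1):
        x = -1 + 2 * i / n
        for j in range(n + 1):
            y = -1 + 2 * j / n
            if x * x + y * y <= 1:  # keep only points of the closed disk
                v = h(x, y)
                lo, hi = min(lo, v), max(hi, v)
    return lo, hi


h_min, h_max = scan_unit_disk()
print(h_min, h_max)  # compare with 1/e and e
```

With an even grid size the points $(\pm 1, 0)$ and $(0, \pm 1)$ lie exactly on the grid, so the scan reproduces the boundary values found analytically.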
Example 8: Find the point on the plane $x + 2y + z = 5$ that is closest to the point $P(0, 3, 4)$.
Solution: Let $Q(x, y, z)$ be any point on the plane $x + 2y + z = 5$. Then $z = 5 - x - 2y$ and
the distance from $P$ to $Q$ is
$$d = \sqrt{x^2 + (y - 3)^2 + (5 - x - 2y - 4)^2}.$$
Since the minimum value of $d$ will occur at the same points where $d^2$ is minimized, we
minimize $d^2 = f(x, y)$, where
$$f(x, y) = x^2 + (y - 3)^2 + (1 - x - 2y)^2.$$
First, we find the critical points of $f$ by solving the system of equations
$$f_x(x, y) = 2x - 2(1 - x - 2y) = 4x + 4y - 2 = 0,$$
$$f_y(x, y) = 2(y - 3) - 4(1 - x - 2y) = 4x + 10y - 10 = 0.$$
Thus, we obtain $x = -5/6$, $y = 4/3$. Since $f_{xx} = 4$, $f_{yy} = 10$, $f_{xy} = 4$, we find that the
discriminant of $f$ at the critical point $(-5/6, 4/3)$ is
$$D(-5/6, 4/3) = f_{xx}\, f_{yy} - f_{xy}^2 = 4(10) - 16 = 24 > 0,$$
and $f_{xx}(-5/6, 4/3) = 4 > 0$, so a relative minimum occurs at $(-5/6, 4/3)$. Note that this
relative minimum must also be an absolute minimum because there must be exactly one point
on the plane that is closest to the given point. The corresponding $z$ value is
$z = 5 - (-5/6) - 2(4/3) = 19/6$. Thus, the closest point on the plane is $Q(-5/6, 4/3, 19/6)$ and
the minimum distance is
$$d = \sqrt{\left(-\frac{5}{6}\right)^2 + \left(\frac{4}{3} - 3\right)^2 + \left(1 + \frac{5}{6} - 2 \cdot \frac{4}{3}\right)^2} = \sqrt{\frac{25}{6}} = \frac{5}{\sqrt{6}}.$$
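The answer to Example 8 can be checked independently by orthogonal projection: moving from $P$ along the plane's normal vector until the plane is reached gives the closest point. This is a different technique from the second partials argument of the solution, shown only as a sanity check; the function name `closest_point_on_plane` and its signature are hypothetical, and integer inputs are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def closest_point_on_plane(normal, d, p):
    """Project p onto the plane normal . (x, y, z) = d (integer inputs assumed)."""
    n_dot_p = sum(n * c for n, c in zip(normal, p))
    n_dot_n = sum(n * n for n in normal)
    t = Fraction(d - n_dot_p, n_dot_n)  # signed step along the normal from p
    q = tuple(c + t * n for c, n in zip(p, normal))
    dist_sq = t * t * n_dot_n  # squared distance from p to the plane
    return q, dist_sq


# Plane x + 2y + z = 5 and point P(0, 3, 4) from Example 8
q, dist_sq = closest_point_on_plane((1, 2, 1), 5, (0, 3, 4))
print(q, dist_sq)
```

The squared distance returned is $d^2$, so the minimum distance itself is its square root.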
Example 9: A rectangular box is resting on the xy plane with one vertex at the origin. The
opposite vertex lies on the plane 6 x + 4 y + 3z = 24 . Find the maximum volume of the box.
See figure below.
Solution: Let $x$, $y$, $z$ represent the length, width and height of the box, respectively. Because
one vertex of the box lies at the origin and the vertex opposite to it lies on the plane
$6x + 4y + 3z = 24$, the point $(x, y, z)$ must satisfy the equation of this plane. Thus, we get
$$z = \frac{1}{3}(24 - 6x - 4y).$$
Thus, we can write the volume $V = xyz$ of the box as a function of two variables as follows:
$$V(x, y) = \frac{1}{3}\, xy\, (24 - 6x - 4y) = \frac{1}{3}\left(24xy - 6x^2 y - 4xy^2\right),$$
where the domain of $V$ is the triangle in the $xy$-plane bounded by the lines $x = 0$, $y = 0$
and $3x + 2y - 12 = 0$. By setting the first partial derivatives equal to 0,
$$V_x(x, y) = \frac{1}{3}\left(24y - 12xy - 4y^2\right) = \frac{y}{3}(24 - 12x - 4y) = 0,$$
$$V_y(x, y) = \frac{1}{3}\left(24x - 6x^2 - 8xy\right) = \frac{x}{3}(24 - 6x - 8y) = 0,$$
we obtain that the critical points are $(0, 0)$ and $(4/3, 2)$. At $(0, 0)$, the volume is 0, so that
point does not yield a maximum value. At the point $(4/3, 2)$, we apply the second partials
test:
$$V_{xx} = -4y, \quad V_{yy} = -\frac{8x}{3}, \quad V_{xy} = \frac{1}{3}(24 - 12x - 8y).$$
Because
$$V_{xx}(4/3, 2)\, V_{yy}(4/3, 2) - V_{xy}^2(4/3, 2) = (-8)(-32/9) - (-8/3)^2 = 64/3 > 0$$
and $V_{xx}(4/3, 2) = -8 < 0$, we conclude that the maximum volume of the box occurs at
$(x, y) = (4/3, 2)$ and the maximum volume is
$$V(4/3, 2) = \frac{1}{3}\left[24\left(\frac{4}{3}\right)(2) - 6\left(\frac{4}{3}\right)^2(2) - 4\left(\frac{4}{3}\right)(2)^2\right] = \frac{64}{9} \text{ cubic units.}$$
Note that the volume is 0 at the boundary points of the triangular domain of V.
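The arithmetic in Example 9 is easy to get wrong by hand, so a short exact check may help. The sketch below evaluates the volume and its first partials at $(4/3, 2)$ with rational arithmetic; the helper names `V`, `V_x` and `V_y` mirror the notation of the solution but are otherwise an illustrative choice.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)


def V(x, y):
    # Volume of the box after eliminating z = (24 - 6x - 4y) / 3
    return THIRD * (24 * x * y - 6 * x**2 * y - 4 * x * y**2)


def V_x(x, y):
    return THIRD * (24 * y - 12 * x * y - 4 * y**2)


def V_y(x, y):
    return THIRD * (24 * x - 6 * x**2 - 8 * x * y)


x0, y0 = Fraction(4, 3), Fraction(2)
print(V_x(x0, y0), V_y(x0, y0))  # both partials should vanish at the critical point
print(V(x0, y0))                 # volume at the critical point
print(V(1, 2), V(Fraction(3, 2), 2))  # two nearby points, for comparison
```

Evaluating a couple of nearby points of the domain gives smaller volumes, in line with the conclusion of the second partials test.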
Example 10: An electronics manufacturer determines that the profit (in dollars) obtained by
producing $x$ units of a DVD player and $y$ units of a DVD recorder is approximated by the
model $P(x, y) = 8x + 10y - 0.001\,(x^2 + xy + y^2) - 10{,}000$. Find the production level that
produces a maximum profit. What is the maximum profit?
Solution: The partial derivatives of the profit function are
$$P_x(x, y) = 8 - 0.001\,(2x + y) \quad \text{and} \quad P_y(x, y) = 10 - 0.001\,(x + 2y).$$
By setting these partial derivatives equal to 0, and after simplifying, we obtain the following
system of equations:
$$2x + y = 8000,$$
$$x + 2y = 10{,}000.$$
Solving this system produces $x = 2000$ and $y = 4000$. The second partial derivatives of $P$ at
$(x, y) = (2000, 4000)$ are
$$P_{xx}(2000, 4000) = -0.002, \quad P_{yy}(2000, 4000) = -0.002, \quad P_{xy}(2000, 4000) = -0.001.$$
Because $P_{xx}(2000, 4000) < 0$ and
$$P_{xx}(2000, 4000)\, P_{yy}(2000, 4000) - P_{xy}^2(2000, 4000) = (-0.002)^2 - (-0.001)^2 > 0,$$
we conclude that the production level of $x = 2000$ units and $y = 4000$ units yields a maximum
profit. The maximum profit is $P(2000, 4000) = 18{,}000$, that is, $18,000.
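The linear system and the profit value in Example 10 can be reproduced in a few lines. The sketch below solves the two equations with Cramer's rule and evaluates the profit model; the helper names `solve_2x2` and `profit` are illustrative, not from the notes.

```python
def profit(x, y):
    # Profit model of Example 10 (in dollars)
    return 8 * x + 10 * y - 0.001 * (x * x + x * y + y * y) - 10_000


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("system is singular")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det


# 2x + y = 8000 and x + 2y = 10,000
x_opt, y_opt = solve_2x2(2, 1, 8000, 1, 2, 10_000)
print(x_opt, y_opt, profit(x_opt, y_opt))
```

Comparing `profit(x_opt, y_opt)` with the profit at neighbouring production levels is a quick way to confirm that the critical point is a maximum.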
Remarks: Note that in the preceding example, it was assumed that the manufacturing plant
is able to produce the required number of units to yield a maximum profit. In actual practice,
the production would be bounded by physical constraints. We will next study such
constrained optimization problems and learn how to deal with them by an ingenious
technique called the method of Lagrange multipliers.
The Method of Lagrange Multipliers

Many optimization problems may have restrictions, or constraints, on the values that
can be used to produce the optimal solution. Such constraints tend to complicate optimization
problems because the optimal solution can occur at a boundary point of the domain. For
solving such problems, we use the Method of Lagrange Multipliers.
To motivate the method of Lagrange multipliers, suppose that we are trying to
maximize a function f ( x, y ) subject to the constraint g ( x, y ) = 0 . Geometrically, this means
that we are looking for a point ( x0 , y0 ) in the domain of f and on the graph of the constraint
curve g ( x, y ) = 0 at which f ( x, y ) is as large as possible. To help locate such a point, let us
construct all the level curves of f ( x, y ) in the same coordinate system as the graph of
g ( x, y ) = 0 .
In figure (a), each point of intersection of $g(x, y) = 0$ with a level curve is a candidate for a
solution, since these points lie on the constraint curve. Among the seven such intersections
shown in the figure, the maximum value of $f(x, y)$ occurs at the intersection $(x_0, y_0)$ where
$f(x, y)$ has a value of 400. Note that at $(x_0, y_0)$, the constraint curve and the level curve just
touch and thus have a common tangent line at this point. Since $\nabla f(x_0, y_0)$ is normal to the
level curve $f(x, y) = 400$ at $(x_0, y_0)$, and since $\nabla g(x_0, y_0)$ is normal to the constraint curve
$g(x, y) = 0$ at $(x_0, y_0)$, we conclude that the vectors $\nabla f(x_0, y_0)$ and $\nabla g(x_0, y_0)$ must be
parallel. That is,
$$\nabla f(x_0, y_0) = \lambda\, \nabla g(x_0, y_0) \qquad (*)$$
for some scalar $\lambda$. The same condition holds at points on the constraint curve where $f(x, y)$
has a minimum. For example, if the level curves are as shown in figure (b), then the
minimum value of $f(x, y)$ occurs where the constraint curve just touches a level curve.
Thus, to find the maximum or minimum of $f(x, y)$ subject to the constraint $g(x, y) = 0$, we
look for points at which equation (*) holds; this is the method of Lagrange multipliers. The
scalar $\lambda$ is called a Lagrange multiplier.
To see how this method works, let us look at the following problem. Suppose we want
to find the greatest and the smallest values that the function
$$f(x, y) = xy$$
takes on the ellipse
$$\frac{x^2}{8} + \frac{y^2}{2} = 1.$$
That is, we want to find the extreme values of $f(x, y) = xy$ subject to the constraint
$$g(x, y) = \frac{x^2}{8} + \frac{y^2}{2} - 1 = 0.$$
The level curves of the function $f(x, y) = xy$ are the hyperbolas $xy = c$, where $c$ is a
constant (see figure below).
The farther the hyperbolas lie from the origin, the larger the absolute value of $f$. We want
to find the extreme values of $f(x, y)$ given that the point $(x, y)$ is in the domain of $f$ and
also lies on the ellipse $x^2 + 4y^2 = 8$. Which hyperbolas intersecting the ellipse lie farthest
from the origin? The hyperbolas that just graze the ellipse, the ones that are tangent to it, are
farthest. To find the appropriate hyperbola, we use the fact that two curves are tangent at a
point if and only if their gradient vectors are parallel. This means that $\nabla f(x, y)$ must be a
scalar multiple of $\nabla g(x, y)$ at the point of tangency, so that
$$\nabla f(x, y) = \lambda\, \nabla g(x, y) \iff y\,i + x\,j = \frac{\lambda}{4}\, x\,i + \lambda y\,j.$$
Thus, we first find the values of $x$, $y$ and $\lambda$ for which $y\,i + x\,j = \frac{\lambda}{4} x\,i + \lambda y\,j$ and $g(x, y) = 0$.
Solving the first equation, we get
$$y = \frac{\lambda}{4}\, x, \quad x = \lambda y \implies y = \frac{\lambda^2}{4}\, y \implies y = 0 \text{ or } \lambda = \pm 2.$$
We now consider these two cases.
Case 1: If $y = 0$, then $x = \lambda y = 0$. But $(0, 0)$ does not satisfy the equation $g(x, y) = 0$, that
is, $(0, 0)$ does not lie on the ellipse. Hence, $y \ne 0$.

Case 2: If $y \ne 0$, then $\lambda = \pm 2$ and $x = \pm 2y$. Substituting this in the equation $g(x, y) = 0$,
we obtain
$$(\pm 2y)^2 + 4y^2 = 8 \implies 4y^2 + 4y^2 = 8 \implies y = \pm 1.$$
Therefore, the function $f(x, y) = xy$ takes on its extreme values on the ellipse at the four points
$(\pm 2, 1)$, $(\pm 2, -1)$.
Hence, the extreme values are $xy = 2$ and $xy = -2$.
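A quick way to corroborate these extreme values without Lagrange multipliers is to parametrize the ellipse and sample $f$ along it. The sketch below uses the parametrization $x = \sqrt{8}\cos t$, $y = \sqrt{2}\sin t$, which is an assumption introduced here for the check; the sample count of 3600 is arbitrary.

```python
import math


def f_on_ellipse(t):
    """Return (f, (x, y)) at parameter t on the ellipse x^2/8 + y^2/2 = 1."""
    x = math.sqrt(8) * math.cos(t)
    y = math.sqrt(2) * math.sin(t)
    return x * y, (x, y)


# Sample the ellipse at 3600 equally spaced parameter values
samples = [f_on_ellipse(2 * math.pi * k / 3600) for k in range(3600)]
f_max, p_max = max(samples)
f_min, p_min = min(samples)
print(f_max, p_max)
print(f_min, p_min)
```

Along this parametrization $f = 4\cos t \sin t = 2\sin 2t$, so the sampled extremes should sit close to $\pm 2$ at points close to $(\pm 2, \pm 1)$.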

Now, we state and prove the necessary conditions for the existence of Lagrange multipliers.
Theorem: (Lagrange Theorem)
Let $f$ and $g$ have continuous first order partial derivatives such that $f$ has an extremum at
a point $(a, b)$ on the smooth constraint curve $g(x, y) = c$. If $\nabla g(a, b) \ne 0$, then there is a
real number $\lambda$ such that
$$\nabla f(a, b) = \lambda\, \nabla g(a, b).$$
Proof: Let us represent the smooth curve given by $g(x, y) = c$ by the vector-valued function
$$r(t) = x(t)\,i + y(t)\,j, \quad r'(t) \ne 0,$$
where $x'$ and $y'$ are continuous on an open interval $I$. Define the function $h$ on $I$ as
$h(t) = f(x(t), y(t))$, and let $t_0$ be the parameter value for which $(x(t_0), y(t_0)) = (a, b)$.
Then, since $f(a, b)$ is an extreme value of $f$ on the curve, we have that
$$h(t_0) = f(x(t_0), y(t_0)) = f(a, b)$$
is an extreme value of $h$. This implies that $h'(t_0) = 0$. But by the chain rule, we have
$$h'(t_0) = f_x(a, b)\, x'(t_0) + f_y(a, b)\, y'(t_0) = \nabla f(a, b) \cdot r'(t_0) = 0.$$
Thus, we obtain that $\nabla f(a, b)$ is orthogonal to $r'(t_0)$. Moreover, $\nabla g(a, b)$ is orthogonal to
$r'(t_0)$, because the gradient of $g$ is normal to the level curve $g(x, y) = c$. Consequently, the
gradients $\nabla f(a, b)$ and $\nabla g(a, b)$ are parallel. Hence, there exists a
scalar $\lambda$ such that $\nabla f(a, b) = \lambda\, \nabla g(a, b)$. This completes the proof.
Method of Lagrange multipliers:

Let $f$ and $g$ satisfy the hypothesis of Lagrange's Theorem, and let $f$ have a minimum or
maximum subject to the constraint $g(x, y) = c$. To find the minimum or maximum of $f$, use
the following steps:
1. Simultaneously solve the equations $\nabla f(x, y) = \lambda\, \nabla g(x, y)$ and $g(x, y) = c$, that is,
solve the system of equations
$$f_x(x, y) = \lambda\, g_x(x, y),$$
$$f_y(x, y) = \lambda\, g_y(x, y),$$
$$g(x, y) = c.$$
2. Evaluate $f$ at each solution point obtained in the first step. The largest value yields
the maximum of $f$ subject to the constraint $g(x, y) = c$, and the smallest value
yields the minimum of $f$ subject to the constraint $g(x, y) = c$.
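Step 2 of the method is a plain comparison of function values at the candidate points. A minimal sketch of that step is given below, applied to the candidates $(\pm 2, \pm 1)$ found above for $f(x, y) = xy$ on the ellipse $x^2/8 + y^2/2 = 1$; the helper name `compare_candidates` is hypothetical, and step 1 (solving the Lagrange system) is assumed to have been done by hand.

```python
def compare_candidates(f, candidates):
    """Step 2: evaluate f at each candidate and return (smallest, largest).

    Each result is a pair (value, point). Candidates are assumed to be the
    solutions of the Lagrange system found in step 1.
    """
    values = [(f(x, y), (x, y)) for x, y in candidates]
    return min(values), max(values)


# Candidates for f(x, y) = xy on the ellipse x^2/8 + y^2/2 = 1
candidates = [(2, 1), (-2, 1), (2, -1), (-2, -1)]
smallest, largest = compare_candidates(lambda x, y: x * y, candidates)
print(smallest, largest)
```

When several candidates share the same value, as here, only one of them is reported, so the full list of extreme points still has to be read off from step 1.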
Example 11: Find the maximum value of $f(x, y) = 4xy$, where $x > 0$ and $y > 0$, subject to
the constraint $x^2/9 + y^2/16 = 1$.
Solution: Geometrically, $f(x, y) = 4xy$ is the area of a rectangle inscribed in the given ellipse.
For the rectangle to be inscribed inside the ellipse, it must have one of its
vertices in the first quadrant. Let $(x, y)$ be the vertex of the rectangle in the first quadrant, so
that $x > 0$ and $y > 0$ (see figure above). Note that the other three vertices of the rectangle are
then determined uniquely as $(-x, y)$, $(-x, -y)$, $(x, -y)$. Now, because the rectangle has sides
of lengths $2x$ and $2y$, its area is given by (the objective function) $f(x, y) = 4xy$. We want
to find $x$ and $y$ such that $f(x, y)$ is a maximum. However, the choice of $(x, y)$ is restricted
to the first-quadrant points that lie on the ellipse (the constraint)
$$\frac{x^2}{9} + \frac{y^2}{16} = 1.$$
We now use the method of Lagrange multipliers. Let $g(x, y) = x^2/9 + y^2/16$, so that the
constraint is $g(x, y) = 1$. By equating
$$\nabla f(x, y) = 4y\,i + 4x\,j \quad \text{and} \quad \lambda\, \nabla g(x, y) = \frac{2\lambda x}{9}\,i + \frac{\lambda y}{8}\,j,$$
we obtain the following system of equations:
$$4y = \frac{2}{9}\,\lambda x,$$
$$4x = \frac{1}{8}\,\lambda y,$$
$$\frac{x^2}{9} + \frac{y^2}{16} = 1 \quad \text{(constraint equation).}$$
From the first equation, we obtain $\lambda = 18y/x$, and substitution into the second equation
produces $4x = \frac{1}{8} \cdot \frac{18y}{x} \cdot y$, that is, $x^2 = \frac{9}{16}\, y^2$. Substituting this value of $x^2$ into the third
equation gives us $y^2 = 8$, so that $y = \pm 2\sqrt{2}$. But $y > 0$, so we take the positive value
$y = 2\sqrt{2}$. This gives $x^2 = 9/2$, or $x = 3/\sqrt{2}$ because $x > 0$. Hence, the maximum value of $f$
occurs at $(3/\sqrt{2},\, 2\sqrt{2})$, and the maximum value of $f$ is
$$f\left(\frac{3}{\sqrt{2}},\, 2\sqrt{2}\right) = 4 \cdot \frac{3}{\sqrt{2}} \cdot 2\sqrt{2} = 24.$$
Geometry of the solution: We consider the constraint equation to be the fixed level curve
$g(x, y) = 1$, where $g(x, y) = \frac{x^2}{9} + \frac{y^2}{16}$. The level curves of $f$, $f^{-1}(k)$, $k$ a constant,
represent a family of hyperbolas $f(x, y) = 4xy = k$ (see figure below).
In this family, the level curves that meet the given constraint correspond to the hyperbolas
that intersect the ellipse. Moreover, to maximize $f(x, y)$, we wanted to find the hyperbola
that just barely satisfies the constraint. The level curve that does this is the one that is tangent
to the ellipse.
Example 12: The Cobb-Douglas production function for a software manufacturer is given by
$$f(x, y) = 100\, x^{3/4} y^{1/4},$$
where $x$ represents the units of labour (at $150 per unit) and $y$ represents the units of capital
(at $250 per unit). The total cost of labour and capital is limited to $50,000. Find the
maximum production level for this manufacturer.
Solution: The limit on the cost of labour and capital produces the constraint equation
$g(x, y) = 50{,}000$, where $g(x, y) = 150x + 250y$.

Thus, $\nabla g(x, y) = 150\,i + 250\,j$. From the given function, we get
$$\nabla f(x, y) = 75\, x^{-1/4} y^{1/4}\, i + 25\, x^{3/4} y^{-3/4}\, j.$$
By Lagrange's theorem, there exists $\lambda$ such that $\nabla f(x, y) = \lambda\, \nabla g(x, y)$. This gives us the
following system of equations:
$$75\, x^{-1/4} y^{1/4} = 150\lambda, \quad 25\, x^{3/4} y^{-3/4} = 250\lambda \quad \text{and} \quad 150x + 250y = 50{,}000.$$
By solving for $\lambda$ in the first equation, we have $\lambda = \frac{1}{2}\, x^{-1/4} y^{1/4}$, and substituting in the
second equation gives $25x = 125y$, that is, $x = 5y$. By putting this value of $x$ into the third
equation, we obtain $y = 50$ and thus $x = 250$. So, the maximum production level is obtained
when 250 units of labour are employed and 50 units of capital are invested, and the maximum
production level is
$$f(250, 50) = 100\,(250)^{3/4} (50)^{1/4} \approx 16{,}719 \text{ product units.}$$
Remarks:
Economists call the Lagrange multiplier obtained in a production function the marginal
productivity of money. For instance, in the preceding example, the marginal productivity of
money at $x = 250$ and $y = 50$ is
$$\lambda = \frac{1}{2}\, x^{-1/4} y^{1/4} = \frac{1}{2}\,(250)^{-1/4} (50)^{1/4} \approx 0.334,$$
which means that for each additional dollar spent on production, an additional 0.334 unit of
the product can be produced.
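The numbers quoted in Example 12 and in the remark can be reproduced directly. The sketch below evaluates the production function, the cost and the multiplier at $(250, 50)$, and compares the production with two other points on the budget line; the function names `production`, `cost` and `multiplier` are illustrative choices, not part of the notes.

```python
def production(x, y):
    # Cobb-Douglas production function of Example 12
    return 100 * x ** 0.75 * y ** 0.25


def cost(x, y):
    # Labour at $150 per unit, capital at $250 per unit
    return 150 * x + 250 * y


def multiplier(x, y):
    # Lagrange multiplier from the first equation: lambda = (1/2) x^(-1/4) y^(1/4)
    return 0.5 * x ** -0.25 * y ** 0.25


x_opt, y_opt = 250, 50
print(cost(x_opt, y_opt), production(x_opt, y_opt), multiplier(x_opt, y_opt))

# Two other points that spend the full $50,000, for comparison
for x in (240, 260):
    y = (50_000 - cost(x, 0)) / 250
    print(x, y, production(x, y))
```

Both comparison points exhaust the same budget but produce slightly less, which is consistent with $(250, 50)$ being the constrained maximum.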
Example 13: At what point(s) on the circle $x^2 + y^2 = 1$ does the function $f(x, y) = xy$
attain its absolute maximum, and what is that maximum?
Solution: The circle x 2 + y 2 = 1 is a closed and bounded set and f ( x, y ) = xy is a continuous

function, so it follows from the Extreme-Value Theorem that f has an absolute maximum
and an absolute minimum on the circle. To find these extrema, we will use Lagrange
multipliers to find the constrained relative extrema, and then evaluate f at those relative
extrema to find the absolute extrema.
We want to maximize $f(x, y) = xy$ subject to the constraint $g(x, y) = 0$, where
$g(x, y) = x^2 + y^2 - 1$.
First, we will find the constrained relative extrema. For this purpose, we need the gradients
$\nabla f(x, y) = y\,i + x\,j$ and $\nabla g(x, y) = 2x\,i + 2y\,j$. Now, $\nabla g(x, y) = 0$ if and only if $x = 0$ and
$y = 0$, so that $\nabla g(x, y) \ne 0$ for any point on the circle $x^2 + y^2 = 1$. Thus, at a constrained
relative extremum, we must have
$$\nabla f(x, y) = \lambda\, \nabla g(x, y) \iff y\,i + x\,j = 2\lambda x\,i + 2\lambda y\,j.$$
This gives us the pair of equations $y = 2\lambda x$, $x = 2\lambda y$. It follows from these equations that if
$x = 0$, then $y = 0$, and if $y = 0$, then $x = 0$. In either case, we have $x^2 + y^2 = 0$, so the
constraint equation $g(x, y) = 0$ is not satisfied. Thus, we can assume that $x$ and $y$ are
nonzero, and we can rewrite the equations as
$$\lambda = \frac{y}{2x} \quad \text{and} \quad \lambda = \frac{x}{2y}.$$
This implies that $x^2 = y^2$, which on substituting in the constraint equation gives $2x^2 - 1 = 0$.
Thus, we obtain $x = \pm 1/\sqrt{2}$, $y = \pm 1/\sqrt{2}$. Hence, the constrained relative extrema occur at
the points
$$\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right), \quad \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right), \quad \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right), \quad \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right).$$
Thus, the absolute maximum of $f(x, y) = xy$ is $1/2$, attained at the points
$\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$ and $\left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$.
Also, note that $f$ has an absolute minimum $-1/2$ at the points
$\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ and $\left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$. In
the figure above, some level curves $xy = c$ and the constraint curve $x^2 + y^2 = 1$ are shown in
the vicinity of the maxima of $f$.
Example 14: Find the maximum and minimum values of the function $f(x, y) = 3x + 4y$ on
the circle $x^2 + y^2 = 1$.
Solution: We model this as a Lagrange multiplier problem with
$$f(x, y) = 3x + 4y, \quad g(x, y) = x^2 + y^2 - 1,$$
and look for the values of $x$, $y$ and $\lambda$ that satisfy the equations
$$\nabla f(x, y) = \lambda\, \nabla g(x, y): \quad 3\,i + 4\,j = 2\lambda x\,i + 2\lambda y\,j;$$
$$g(x, y) = 0: \quad x^2 + y^2 - 1 = 0.$$
The gradient equation implies that $\lambda \ne 0$, and gives $x = \frac{3}{2\lambda}$, $y = \frac{2}{\lambda}$. These equations
tell us that $x$ and $y$ have the same sign. The constraint equation gives us that
$$\left(\frac{3}{2\lambda}\right)^2 + \left(\frac{2}{\lambda}\right)^2 - 1 = 0 \implies \lambda = \pm\frac{5}{2}.$$
Thus, $x = \pm 3/5$ and $y = \pm 4/5$, and $f(x, y) = 3x + 4y$ has extreme values at
$(x, y) = \pm(3/5, 4/5)$ (see figure below). The maximum and minimum values of $f$ are,
respectively, therefore
$$3\left(\frac{3}{5}\right) + 4\left(\frac{4}{5}\right) = \frac{25}{5} = 5 \quad \text{and} \quad 3\left(-\frac{3}{5}\right) + 4\left(-\frac{4}{5}\right) = -\frac{25}{5} = -5.$$
Example 15: Suppose that the temperature of a metal plate is given by
$$T(x, y) = x^2 + 2x + y^2$$
for points $(x, y)$ on the elliptical plate defined by $x^2 + 4y^2 \le 24$. Find the maximum and
minimum temperatures on the plate.
Solution: The plate corresponds to the shaded region $R$ as shown in the figure below.
We first look for critical points of $T(x, y)$ inside the region $R$. The gradient of $T$ is
$$\nabla T(x, y) = (2x + 2)\,i + (2y)\,j.$$
At a critical point, we must have $\nabla T(x, y) = 0$, that is, $2x + 2 = 0$ and $2y = 0$, which implies
$x = -1$, $y = 0$. Thus, the only candidate for a maximum or minimum of $T$ inside the region $R$
is the point $(-1, 0)$, and at this point $T(-1, 0) = -1$.
Now, we look for extrema of $T(x, y)$ on the boundary of the region $R$, that is, the ellipse
$x^2 + 4y^2 = 24$, using the method of Lagrange multipliers. The constraint is
$$g(x, y) = x^2 + 4y^2 - 24 = 0.$$
By Lagrange's theorem, any extremum on the ellipse must satisfy the gradient equation
$$\nabla T(x, y) = \lambda\, \nabla g(x, y).$$
Thus, we must have $(2x + 2)\,i + (2y)\,j = 2\lambda x\,i + 8\lambda y\,j$. This occurs when
$$2x + 2 = 2\lambda x \quad \text{and} \quad 2y = 8\lambda y.$$
The second equation holds when $y = 0$ or $\lambda = 1/4$. If $y = 0$, the constraint equation
$x^2 + 4y^2 = 24$ gives us that $x = \pm\sqrt{24}$. If $\lambda = 1/4$, then the first equation gives $x = -4/3$ and the
constraint equation implies that $y = \pm\sqrt{50}/3$. Now, we compare the values of $T$ at all of
these points, namely, one interior critical point and the candidates for boundary extrema:
$$T(-1, 0) = -1,$$
$$T(\sqrt{24}, 0) = 24 + 2\sqrt{24} \approx 33.8,$$
$$T(-\sqrt{24}, 0) = 24 - 2\sqrt{24} \approx 14.2,$$
$$T\left(-\frac{4}{3}, \frac{\sqrt{50}}{3}\right) = \frac{14}{3} \approx 4.7,$$
$$T\left(-\frac{4}{3}, -\frac{\sqrt{50}}{3}\right) = \frac{14}{3} \approx 4.7.$$
Hence, the minimum value of $T$ is $-1$ at the point $(-1, 0)$ and the maximum value of $T$ is
$24 + 2\sqrt{24}$ at the point $(\sqrt{24}, 0)$.
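The final comparison in Example 15 can be tabulated mechanically. The sketch below evaluates $T$ at the interior critical point and at the four boundary candidates, and checks that the boundary candidates really lie on the ellipse $x^2 + 4y^2 = 24$; the variable names are illustrative.

```python
import math


def T(x, y):
    # Temperature on the plate, Example 15
    return x * x + 2 * x + y * y


r24 = math.sqrt(24)
r50 = math.sqrt(50)
interior = [(-1.0, 0.0)]
boundary = [(r24, 0.0), (-r24, 0.0), (-4 / 3, r50 / 3), (-4 / 3, -r50 / 3)]

# Each boundary candidate should satisfy x^2 + 4y^2 = 24 up to rounding
for x, y in boundary:
    print((x, y), x * x + 4 * y * y)

values = [(T(x, y), (x, y)) for x, y in interior + boundary]
t_min, p_min = min(values)
t_max, p_max = max(values)
print(t_min, p_min)
print(t_max, p_max)
```

The smallest value comes from the interior critical point and the largest from a boundary candidate, which is why both kinds of points have to be compared.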
Example 16: Find the extreme values of $f(x, y) = x^2 + 2y^2 - 2x + 3$ subject to the constraint
$x^2 + y^2 \le 10$.
Solution: This problem is similar to Example 15. We divide the constraint into two cases.
a) For points on the circle $x^2 + y^2 = 10$, use Lagrange multipliers to find that the
maximum value of $f(x, y)$ is 24. This value occurs at $(-1, 3)$ and at $(-1, -3)$. In a
similar way, we can determine that the minimum value of $f$ on the circle is approximately 6.675,
and this value occurs at $(\sqrt{10}, 0)$.
b) For points inside the circle, use the techniques for finding relative extrema to
conclude that the function has a relative minimum of 2 at the point $(1, 0)$.
By combining these two results, we conclude that $f$ has a maximum of 24 at $(-1, \pm 3)$ and a
minimum of 2 at $(1, 0)$, as shown in the figure below.
Example 17: (Lagrange Multipliers and three variables)

Find the minimum value of
$$f(x, y, z) = 2x^2 + y^2 + 3z^2$$
subject to the constraint
$$2x - 3y - 4z = 49.$$
Solution: Let $g(x, y, z) = 0$, where $g(x, y, z) = 2x - 3y - 4z - 49$, represent the constraint
surface (a plane). Because we have
$$\nabla g(x, y, z) = 2\,i - 3\,j - 4\,k \quad \text{and} \quad \nabla f(x, y, z) = 4x\,i + 2y\,j + 6z\,k,$$
and Lagrange's theorem gives us that $\nabla f = \lambda\, \nabla g$, we obtain the following system of
equations:
$$4x = 2\lambda: \quad f_x(x, y, z) = \lambda\, g_x(x, y, z)$$
$$2y = -3\lambda: \quad f_y(x, y, z) = \lambda\, g_y(x, y, z)$$
$$6z = -4\lambda: \quad f_z(x, y, z) = \lambda\, g_z(x, y, z)$$
$$2x - 3y - 4z - 49 = 0: \quad \text{(the constraint)}$$
Solving these equations, we get $\lambda = 6$, $x = 3$, $y = -9$, $z = -4$. So, the optimum value of $f$ is
attained at $(3, -9, -4)$, and the optimum value is
$$f(3, -9, -4) = 2(3)^2 + (-9)^2 + 3(-4)^2 = 147.$$
Note that from the original function and the constraint, $f$ has no maximum. So the optimum
value determined above is a minimum of $f$.
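Because the first three equations of Example 17 express $x$, $y$ and $z$ as multiples of $\lambda$, the system reduces to a single linear equation in $\lambda$. The sketch below carries out that substitution with exact fractions; the variable names (`coef`, `normal`, `lam`) are illustrative.

```python
from fractions import Fraction

# From 4x = 2*lam, 2y = -3*lam, 6z = -4*lam:
# x = lam/2, y = -3*lam/2, z = -2*lam/3
coef = (Fraction(1, 2), Fraction(-3, 2), Fraction(-2, 3))
normal = (2, -3, -4)  # coefficients of the constraint 2x - 3y - 4z = 49

# Substituting into the constraint gives (sum of normal_i * coef_i) * lam = 49
lam = Fraction(49) / sum(n * c for n, c in zip(normal, coef))
x, y, z = (c * lam for c in coef)
f_value = 2 * x**2 + y**2 + 3 * z**2
print(lam, (x, y, z), f_value)
```

The same substitution pattern works whenever the objective is a sum of squares and the constraint is linear.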
Geometrical view of Lagrange multiplier theorem in three variables: A graphical

interpretation of constrained optimization problems in three variables is similar to that of two
variables except that level surfaces are used instead of level curves. For instance, in the
preceding example, the level surfaces of f are ellipsoids centered at the origin, and the
constraint is a plane. The minimum value of f is represented by the ellipsoid that is tangent
to the constraint plane, as shown in the figure below.
The method of Lagrange Multipliers with two constraints:

The problem is to find the minimum or maximum value of a differentiable function $f(x, y, z)$ subject to two constraints $g(x, y, z) = 0$ and $h(x, y, z) = 0$, where $g$ and $h$ are also differentiable. Notice that for both constraints to be satisfied at a point $(x, y, z)$, the point must lie on both surfaces defined by the constraints. Consequently, in order for there to be a solution, we must assume that the two surfaces intersect. We further assume that $\nabla g$ and $\nabla h$ are nonzero and are not parallel, so that the two surfaces intersect in a curve $C$ and are not tangent to one another. If $f$ has an extremum at a point $(x_0, y_0, z_0)$ on the curve $C$, then $\nabla f(x_0, y_0, z_0)$ must be normal to the curve. Notice that since $C$ lies on both constraint surfaces, $\nabla g(x_0, y_0, z_0)$ and $\nabla h(x_0, y_0, z_0)$ are both orthogonal to $C$ at $(x_0, y_0, z_0)$. This implies that $\nabla f(x_0, y_0, z_0)$ must lie in the plane determined by $\nabla g(x_0, y_0, z_0)$ and $\nabla h(x_0, y_0, z_0)$ (see figure below).
That is, for $(x, y, z) = (x_0, y_0, z_0)$ and some constants $\lambda$ and $\mu$ (Lagrange multipliers),
\[ \nabla f(x, y, z) = \lambda \nabla g(x, y, z) + \mu \nabla h(x, y, z). \]
The method of Lagrange multipliers for the case of two constraints then consists of finding the point $(x, y, z)$ and the Lagrange multipliers $\lambda$ and $\mu$ (for a total of five unknowns) satisfying the five equations
\begin{align*}
f_x(x, y, z) &= \lambda g_x(x, y, z) + \mu h_x(x, y, z) \\
f_y(x, y, z) &= \lambda g_y(x, y, z) + \mu h_y(x, y, z) \\
f_z(x, y, z) &= \lambda g_z(x, y, z) + \mu h_z(x, y, z) \\
g(x, y, z) &= 0 \\
h(x, y, z) &= 0.
\end{align*}
Example 18: The plane $x + y + z = 12$ intersects the paraboloid $z = x^2 + y^2$ in an ellipse. Find the point on the ellipse that is closest to the origin.
Solution: The intersection of the given plane and the paraboloid is shown below.
Finding the point on the ellipse that is closest to the origin is the same as finding the point $(x, y, z)$ that minimizes the distance
\[ d = \sqrt{(x - 0)^2 + (y - 0)^2 + (z - 0)^2} = \sqrt{x^2 + y^2 + z^2}, \]
or, equivalently, minimizes the function $f(x, y, z) = x^2 + y^2 + z^2$, subject to the constraints
\[ g(x, y, z) = x + y + z - 12 = 0 \quad \text{and} \quad h(x, y, z) = x^2 + y^2 - z = 0. \]
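At any extremum, we must have, by the method of Lagrange multipliers, that
\[ \nabla f(x, y, z) = \lambda \nabla g(x, y, z) + \mu \nabla h(x, y, z), \]
or
\[ 2x\mathbf{i} + 2y\mathbf{j} + 2z\mathbf{k} = \lambda(\mathbf{i} + \mathbf{j} + \mathbf{k}) + \mu(2x\mathbf{i} + 2y\mathbf{j} - \mathbf{k}). \]
Together with the constraint equations, we now have the system of five equations
\[ 2x = \lambda + 2\mu x, \quad 2y = \lambda + 2\mu y, \quad 2z = \lambda - \mu, \quad x + y + z - 12 = 0, \quad x^2 + y^2 - z = 0. \qquad (*) \]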
From the first two equations, we have $\lambda = 2x - 2\mu x$ and $\lambda = 2y - 2\mu y$. Equating these two expressions for $\lambda$, we obtain
\[ 2x - 2\mu x = 2y - 2\mu y \quad \Longrightarrow \quad (x - y)(1 - \mu) = 0. \]
It follows that either $\mu = 1$ (in which case $\lambda = 0$) or $x = y$. However, if $\mu = 1$ and $\lambda = 0$, then the third equation in $(*)$ gives $z = -1/2$, which contradicts the fact that $z = x^2 + y^2 \ge 0$. Consequently, the only possibility is $x = y$, from which it follows that $z = 2x^2$. Substituting this in the first constraint (the equation of the plane), we obtain
\[ 0 = x + y + z - 12 = x + x + 2x^2 - 12 = 2x^2 + 2x - 12 = 2(x + 3)(x - 2). \]
Thus, either $x = -3$ or $x = 2$. Since $y = x$ and $z = 2x^2$, the extremum points of $f$ are $(-3, -3, 18)$ and $(2, 2, 8)$. Finally, since
\[ f(-3, -3, 18) = 342 \quad \text{and} \quad f(2, 2, 8) = 72, \]
the closest point on the intersection of the two surfaces to the origin is $(2, 2, 8)$. By the same reasoning, the point on the intersection of the two surfaces that is farthest from the origin is $(-3, -3, 18)$. Notice that these are also consistent with what you see in the figure.
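The computation in Example 18 can be checked with a short Python sketch. It assumes the reduction derived above ($x = y$, $z = 2x^2$, so $x^2 + x - 6 = 0$), recovers the two multipliers at each candidate point, and confirms all five equations of the system with exact fractions. The names `lam`, `mu` and `candidates` are illustrative only.

```python
from fractions import Fraction
from math import isqrt

# With x = y and z = 2x^2, the plane equation reduces to x^2 + x - 6 = 0.
a, b, c = 1, 1, -6
root_disc = isqrt(b * b - 4 * a * c)  # the discriminant 25 is a perfect square
xs = [Fraction(-b - root_disc, 2 * a), Fraction(-b + root_disc, 2 * a)]


def f(x, y, z):
    """Squared distance from the origin."""
    return x**2 + y**2 + z**2


candidates = []
for x in xs:
    y, z = x, 2 * x**2
    # Subtracting 2z = lam - mu from 2x = lam + 2*mu*x gives mu; lam follows.
    mu = (2 * x - 2 * z) / (2 * x + 1)
    lam = 2 * z + mu
    # Confirm the three gradient equations and both constraints.
    assert 2 * x == lam + 2 * mu * x
    assert 2 * y == lam + 2 * mu * y
    assert 2 * z == lam - mu
    assert x + y + z == 12 and x**2 + y**2 == z
    candidates.append(((x, y, z), f(x, y, z)))

closest = min(candidates, key=lambda item: item[1])
farthest = max(candidates, key=lambda item: item[1])
print("closest :", closest)
print("farthest:", farthest)
```

Because the ellipse is a closed, bounded curve, $f$ attains both a minimum and a maximum on it, so comparing the values at the two candidates is enough to tell them apart.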
Example 19: Let $T(x, y, z) = 20 + 2x + 2y + z^2$ represent the temperature at each point on the sphere $x^2 + y^2 + z^2 = 11$. Find the extreme temperatures on the curve formed by the intersection of the plane $x + y + z = 3$ and the sphere.
(Ans.: Min. Temp. $T = 25$ and Max. Temp. $T = 91/3$)
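The stated answer to Example 19 can be checked numerically. The sketch below is not the Lagrange-multiplier solution; it instead parametrizes the intersection circle directly, under the observation that the plane $x + y + z = 3$ cuts the sphere in a circle with center $(1, 1, 1)$ and radius $\sqrt{11 - 3} = \sqrt{8}$. The basis vectors `u`, `v` and the sample count are assumptions chosen for illustration.

```python
import math


def temperature(x, y, z):
    return 20 + 2 * x + 2 * y + z**2


# Circle of intersection: center is the point of the plane nearest the origin.
center = (1.0, 1.0, 1.0)
radius = math.sqrt(11 - 3)  # sphere radius^2 minus |center|^2

# Orthonormal vectors spanning the plane direction (both orthogonal to (1,1,1)).
u = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
v = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))


def point(theta):
    """Point on the intersection circle at angle theta."""
    c, s = radius * math.cos(theta), radius * math.sin(theta)
    return tuple(center[i] + c * u[i] + s * v[i] for i in range(3))


samples = 100_000
values = [temperature(*point(2 * math.pi * k / samples)) for k in range(samples)]
t_min, t_max = min(values), max(values)
print("min T ~", t_min, " max T ~", t_max, " (91/3 =", 91 / 3, ")")
```

A dense sampling like this only approximates the extremes, but it is a useful independent check on the values obtained from the five Lagrange equations.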
Example 20: The plane $x + y + z = 1$ cuts the cylinder $x^2 + y^2 = 1$ in an ellipse. Find the points on the ellipse that lie closest to and farthest from the origin.
(Ans.: The closest points are $(1, 0, 0)$ and $(0, 1, 0)$. The farthest point is $\left(-\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}, 1 + \sqrt{2}\right)$.)
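A numerical check of the answer to Example 20 is sketched below. Rather than solving the Lagrange system, it assumes the parametrization $x = \cos t$, $y = \sin t$, $z = 1 - \cos t - \sin t$ of the ellipse and samples it densely; the helper names and the tolerance are illustrative choices.

```python
import math


def point(t):
    """Point on the ellipse: the cylinder fixes (x, y), the plane fixes z."""
    x, y = math.cos(t), math.sin(t)
    return (x, y, 1 - x - y)


def dist_sq(p):
    """Squared distance from the origin."""
    return p[0] ** 2 + p[1] ** 2 + p[2] ** 2


# The sample count is divisible by 8 so that t = 0, pi/2 and 5*pi/4,
# where the extremes occur, lie on the sampling grid.
samples = 100_000
points = [point(2 * math.pi * k / samples) for k in range(samples)]

farthest = max(points, key=dist_sq)
d_min = min(dist_sq(p) for p in points)
closest = [p for p in points if dist_sq(p) - d_min < 1e-12]

print("closest points:", closest)
print("farthest point:", farthest)
```

On the ellipse the squared distance is $1 + z^2$, so the closest points are those with $z = 0$ and the farthest point is the one where $|z|$ is largest, which matches the sampled result.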
