
One-dimensional Taylor series

A Taylor series, or Taylor expansion, is an expression for a function, f(x, y, z, \dots), written in terms of ascending powers of its variables, \{x, y, z, \dots\}. In one dimension, consider a function of x expanded in terms of powers of x:

    f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots   (1)

where the \{a_i\} are constants whose subscript i matches the power of x. How do we find the coefficients?

Consider evaluating the function at x = 0:

    f(0) = a_0 + a_1 \cdot 0 + \dots   (2)

where all terms with subscripts i \geq 1 vanish. We thus find that a_0 = f(0). We also learn from this that our expansion is in fact going to be an expansion about the point x = 0. This is because the first term in the expansion gives us the value of the function at x = 0 and the higher-order terms are telling us how the function changes about this point. So let's move on to finding the rest of the coefficients. We calculate the derivative of the function, and then evaluate at x = 0:
    f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \dots   (3)

    f'(0) = a_1 + 2a_2 \cdot 0 + \dots   (4)
where now all terms subscripted i \geq 2 vanish because they involve powers of x evaluated at x = 0. We have thus found a_1 = f'(0). We now continue. You can easily convince yourself that in order to pull down the coefficient a_i we have to differentiate precisely i times, and then evaluate at x = 0. Doing this will produce a factor of i! in front of the coefficient, leading to
    a_i = \frac{1}{i!} f^{(i)}(0)   (5)

    f(x) = \sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(0)\, x^n   (6)

         = f(0) + f'(0)\, x + \frac{1}{2} f''(0)\, x^2, to quadratic order,   (7)
where the notation f^{(n)} denotes the nth derivative of f.
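To see this procedure in action, here is a minimal sketch in Python (assuming sympy is available; the test function is my own choice for illustration) that pulls each coefficient down by differentiating i times and evaluating at x = 0:

    # Verify a_i = f^(i)(0) / i! for a sample function, f(x) = exp(x) * cos(x).
    import sympy as sp

    x = sp.symbols('x')
    f = sp.exp(x) * sp.cos(x)

    for i in range(5):
        a_i = sp.diff(f, x, i).subs(x, 0) / sp.factorial(i)
        print(f"a_{i} =", sp.simplify(a_i))

    # Cross-check against sympy's built-in expansion about x = 0.
    print(sp.series(f, x, 0, 5))

The printed coefficients should match those of the built-in series term by term.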

We can also see how to generalise this to expanding about any chosen point. We arrange for terms with n \geq 1 to vanish when the function is evaluated at our chosen point, call it x_0, by writing an expansion as

    f(x) = a_0 + a_1 (x - x_0) + a_2 (x - x_0)^2 + \dots   (8)
We then see that evaluating the function at x = x_0 provides a_0 = f(x_0), and in general

    a_i = \frac{1}{i!} f^{(i)}(x_0)   (9)

    f(x) = \sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_0) (x - x_0)^n   (10)

         = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2, to quadratic order.   (11)
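As a concrete illustration (my own example, not from the lectures), expand f(x) = \ln x about x_0 = 1. We have f(1) = 0, f'(x) = 1/x so f'(1) = 1, and f''(x) = -1/x^2 so f''(1) = -1, giving

    \ln x \approx (x - 1) - \frac{1}{2}(x - 1)^2, to quadratic order.

Note that we could not have expanded this function about x_0 = 0, which anticipates the note on convergence below.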

This is often written in an alternative form to emphasise the fact that it is an expansion about some point x = x_0, usually in terms of some small parameter, \delta x (see the note on convergence later). If we write x = x_0 + \delta x and substitute this into our expressions above we arrive immediately at

    f(x_0 + \delta x) = a_0 + a_1\, \delta x + a_2 (\delta x)^2 + \dots   (12)

    a_i = \frac{1}{i!} f^{(i)}(x_0)   (13)

    so f(x_0 + \delta x) = \sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_0) (\delta x)^n   (14)

    = f(x_0) + f'(x_0)\, \delta x + \frac{1}{2} f''(x_0) (\delta x)^2, to quadratic order.   (15)
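Since the remainder of the quadratic expansion starts at the cubic term, the error should shrink like (\delta x)^3 as \delta x decreases. Here is a quick numerical sketch of this (my own check, with f = \sin and x_0 = 1 as arbitrary choices):

    # The error of the quadratic Taylor approximation should scale as dx^3.
    import math

    x0 = 1.0
    f, f1, f2 = math.sin(x0), math.cos(x0), -math.sin(x0)

    for dx in (0.1, 0.05, 0.025):
        quad = f + f1 * dx + 0.5 * f2 * dx**2
        err = abs(math.sin(x0 + dx) - quad)
        print(f"dx = {dx:6.3f}  error = {err:.3e}  error/dx^3 = {err/dx**3:.3f}")

The ratio error/(\delta x)^3 settles to roughly |f'''(x_0)|/3! = \cos(1)/6 \approx 0.09, as the next term in the series predicts.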

We have ignored the issue of whether or not these series exist for any function and about any point. As an obvious counter-example, consider the function f(x) = 1/x. Clearly, we cannot evaluate f, or its derivatives, at x = 0, so no Taylor expansion is possible here (though if so minded, look up the idea of Laurent series, which generalise these ideas to apply to functions with poles or singularities by including negative powers of the expansion variable). Now clearly, with the powers of x ever increasing, evaluating the expressions when x, x - x_0 or \delta x is large will lead to ever-increasing numbers. We therefore require that the coefficients become correspondingly small to counterbalance this. The factor of \frac{1}{n!} does indeed help us in this fashion, but there are also conditions on the derivatives of the functions in question and the size of the expansion parameters. In short, there is no simple answer, and we will leave further discussion of this to a course on analysis.

In two dimensions (or more...)


The extension of these ideas to more than one dimension is straightforward. We simply expand about a (two-dimensional) point in terms of powers of each of the variables, and cross terms between these powers:

    f(x, y) = a_{00} + a_{10} x + a_{01} y + a_{20} x^2 + a_{11} xy + a_{11} yx + a_{02} y^2 + a_{30} x^3 + a_{21} x^2 y + \dots   (16)

Note that because xy = yx we pick up two terms looking like a_{11} xy etc. Now we proceed in the same way; evaluate at the point (0, 0) to leave just a_{00} = f(0, 0). Similarly, partially differentiate with respect to x to leave just

    f_x(x, y) = a_{10} + 2a_{20} x + 2a_{11} y + \dots   (17)

where the higher-order terms contain terms in x and y. Evaluate this at (0, 0) to see that a_{10} = f_x(0, 0). To pick out a_{01} we differentiate with respect to y and evaluate at (0, 0). The second-order terms work similarly: a_{20} comes from f_{xx} evaluated at (0, 0) (which gives 2a_{20} = f_{xx}(0, 0)), and similarly for a_{02}. For a_{11} we differentiate first with respect to x, and then with respect to y, finding

    f_{xy}(x, y) = 2a_{11} + 2a_{21} x + \dots   (18)

and so evaluating at (0, 0) provides 2a_{11} = f_{xy}(0, 0), which is the term (with the factor of 2) appearing in the expansion. Thus, to quadratic order we have (in the three forms we explored for the one-dimensional case)
About (0, 0):
    f(x, y) = f(0, 0) + f_x(0, 0)\, x + f_y(0, 0)\, y + \frac{1}{2} [ f_{xx}(0, 0)\, x^2 + 2 f_{xy}(0, 0)\, xy + f_{yy}(0, 0)\, y^2 ] + \dots   (19)

About (x_0, y_0):
    f(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)   (20)
    + \frac{1}{2} [ f_{xx}(x_0, y_0)(x - x_0)^2 + 2 f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + f_{yy}(x_0, y_0)(y - y_0)^2 ] + \dots   (21)

Using \delta x, \delta y:
    f(x_0 + \delta x, y_0 + \delta y) = f(x_0, y_0) + f_x(x_0, y_0)\, \delta x + f_y(x_0, y_0)\, \delta y   (22)
    + \frac{1}{2} [ f_{xx}(x_0, y_0)(\delta x)^2 + 2 f_{xy}(x_0, y_0)(\delta x)(\delta y) + f_{yy}(x_0, y_0)(\delta y)^2 ] + \dots   (23)
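A short sketch (again assuming sympy; the test function is an arbitrary choice of mine) confirming the two-dimensional coefficient formulas about (0, 0):

    # Read off 2D Taylor coefficients by partial differentiation at (0, 0).
    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.exp(x) * sp.sin(y)                # arbitrary smooth test function

    origin = {x: 0, y: 0}
    a00 = f.subs(origin)
    a10 = sp.diff(f, x).subs(origin)         # a10 = f_x(0,0)
    a01 = sp.diff(f, y).subs(origin)         # a01 = f_y(0,0)
    a20 = sp.diff(f, x, 2).subs(origin) / 2  # 2*a20 = f_xx(0,0)
    a11 = sp.diff(f, x, y).subs(origin) / 2  # 2*a11 = f_xy(0,0)
    a02 = sp.diff(f, y, 2).subs(origin) / 2  # 2*a02 = f_yy(0,0)

    print(a00, a10, a01, a20, a11, a02)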

Critical Points
We can now apply this to determine stationary points. At such a point we require the function to be stationary with respect to all its variables. The one-dimensional case requires simply f'(x) = 0 and is the realm of ordinary differentiation. In the two- or more-dimensional case, we require the function to be stationary with respect to all of its variables. In two dimensions, then,

    \frac{\partial f}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} = 0.   (24)

Anywhere where both of these conditions hold is a stationary point. We would like to be able to classify the stationary points of a function in terms of maxima, minima and saddle points (the names will hopefully become natural when these are discussed below). I'll give you two approaches; one based on the Taylor expansion and then a second that uses ideas from matrix algebra. To begin with, the way we examine the stationary points is by making use of the Taylor expansions we have been discussing above: we let the stationary point be given by (x_0, y_0) and expand about this point:

    f(x_0 + \delta x, y_0 + \delta y) = f(x_0, y_0) + f_x(x_0, y_0)\, \delta x + f_y(x_0, y_0)\, \delta y   (25)
    + \frac{1}{2} [ f_{xx}(x_0, y_0)(\delta x)^2 + 2 f_{xy}(x_0, y_0)(\delta x)(\delta y) + f_{yy}(x_0, y_0)(\delta y)^2 ] + \dots   (26)

Now at a stationary point we have exactly that f_x(x_0, y_0) = 0 and f_y(x_0, y_0) = 0, so that our expansion becomes

    f(x_0 + \delta x, y_0 + \delta y) = f(x_0, y_0) + \frac{1}{2} [ f_{xx}(x_0, y_0)(\delta x)^2 + 2 f_{xy}(x_0, y_0)(\delta x)(\delta y) + f_{yy}(x_0, y_0)(\delta y)^2 ] + \dots   (27)
Therefore, all of our information about the stationary points is held in the set of partial derivatives f_{xx}, f_{xy} and f_{yy}. There are a number of ways we could proceed here, and we will explore two of them. The first is the method closest to what you will have seen in lectures, and the second makes use of the ideas of matrices and diagonalisation which you will meet / have met in linear algebra courses.

The first method involves factorising the expression as follows (doing this is non-obvious, but check the expression and understand its consequences):

    f(x_0 + \delta x, y_0 + \delta y) = f(x_0, y_0)   (28)
    + \frac{1}{2 f_{xx}} \left[ (f_{xx}\, \delta x + f_{xy}\, \delta y)^2 + (\delta y)^2 (f_{xx} f_{yy} - f_{xy}^2) \right]   (29)
    + \dots   (30)

where I've dropped the notation of explicitly writing where the derivatives should be evaluated; all should be evaluated at (x_0, y_0). Now the first term in the square brackets is always greater than or equal to zero, and the second term (multiplying (\delta y)^2) is the definition of D given in lectures. So we see that everything depends on D, and on the sign of f_{xx}, which premultiplies everything. Now in case you feel that we are treating x and y unfairly by factorising out f_{xx} from the expression, note that we can also factorise this in terms of f_{yy} as follows:

    f(x_0 + \delta x, y_0 + \delta y) = f(x_0, y_0)   (31)
    + \frac{1}{2 f_{yy}} \left[ (f_{yy}\, \delta y + f_{xy}\, \delta x)^2 + (\delta x)^2 (f_{xx} f_{yy} - f_{xy}^2) \right]   (32)
    + \dots   (33)

containing once again a non-negative squared term and the D term. By comparing the two expressions we can deduce that, whenever D > 0, the signs of f_{xx} and f_{yy} must be the same (indeed D > 0 requires f_{xx} f_{yy} > f_{xy}^2 \geq 0). Now we analyse the expressions and figure out how their behaviour depends upon the second derivatives of our function.

Define a (local) maximum as a point at which the function is decreasing in all directions about that point. Define a minimum as a point at which the function is increasing in all directions about that point. Finally, define a saddle point as one about which the function increases in one direction, and decreases in another; this leads to graphs which resemble horse saddles. We can combine the ideas in the equations above to determine (as you will have done in lectures):

Define D \equiv f_{xx} f_{yy} - f_{xy}^2. Then

    D > 0, f_{xx} > 0 (or f_{yy} > 0) gives a minimum
    D > 0, f_{xx} < 0 (or f_{yy} < 0) gives a maximum
    D < 0 gives a saddle point
    D = 0 provides no classification information; instead we must look at higher derivatives, as in the one-dimensional case.
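Putting the D test to work, here is a sketch (my own example function, assuming sympy) that locates the stationary points of f(x, y) = x^3 - 3x + y^2 and classifies each one:

    # Find stationary points by solving f_x = f_y = 0, then apply the D test.
    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**3 - 3*x + y**2

    points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)

    fxx, fxy, fyy = sp.diff(f, x, 2), sp.diff(f, x, y), sp.diff(f, y, 2)
    D = fxx * fyy - fxy**2

    for p in points:
        d, fxx_p = D.subs(p), fxx.subs(p)
        if d > 0:
            kind = "minimum" if fxx_p > 0 else "maximum"
        elif d < 0:
            kind = "saddle point"
        else:
            kind = "unclassified (D = 0): look at higher derivatives"
        print(p, "->", kind)

This reports a minimum at (1, 0) and a saddle point at (-1, 0), consistent with the rules above.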

A really nice way to classify stationary points is to use matrices to express the expansion. If you haven't met these ideas yet then save this section for a later date. The power of this approach is in how easily it generalises to higher dimensions. Put simply, the second-order terms in the expansion of the function about its stationary point (see equation (27)) can be written as

    f(x_0 + \delta x, y_0 + \delta y) = f(x_0, y_0) + \frac{1}{2} \delta\mathbf{x}^T \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} \delta\mathbf{x} + \dots   (34)

where we define \delta\mathbf{x}^T = (\delta x, \delta y), name the matrix of second derivatives the Hessian, denoted by H, and of course f_{xy} = f_{yx}. Now the Hessian is a symmetric matrix, because of the equality of the off-diagonal terms. It turns out that it is always possible to diagonalise such a matrix, writing it in terms of its eigenvalues \lambda_1, \lambda_2:

    f(x_0 + \delta x, y_0 + \delta y) = f(x_0, y_0) + \frac{1}{2} \delta\mathbf{x}'^T \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} \delta\mathbf{x}' + \dots   (35)

In this equation, I've introduced a variable \delta\mathbf{x}' = (\delta x', \delta y'), which is related to \delta\mathbf{x} as follows. The diagonalisation procedure highlights the natural directions to use as a basis for a given matrix: it allows us to transform \delta\mathbf{x}, in which the geometric structure of the matrix is hidden, into a new basis \delta\mathbf{x}' which manifests the special directions associated to the matrix, allowing us to write it in its simplest form as above. So with respect to the diagonal basis we can expand out the matrix multiplication to find

    f(x_0 + \delta x, y_0 + \delta y) = f(x_0, y_0) + \frac{1}{2} \left[ \lambda_1 (\delta x')^2 + \lambda_2 (\delta y')^2 \right] + \dots   (36)
2
which contains no cross terms involving \delta x' \delta y'. We can now read off the change in the function as we travel in the directions \delta x' and \delta y', and the classification is in terms of the signs of the eigenvalues \lambda_1, \lambda_2:

    \lambda_1 > 0, \lambda_2 > 0 is a minimum
    \lambda_1 < 0, \lambda_2 < 0 is a maximum
    \lambda_1 < 0, \lambda_2 > 0 or \lambda_1 > 0, \lambda_2 < 0 gives a saddle point, with the function increasing in the direction of the eigenvector corresponding to the positive eigenvalue.
    A zero eigenvalue tells us nothing, and we must look to higher orders.
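This eigenvalue test is easy to automate, and nothing about it is special to two dimensions. A minimal sketch (assuming numpy; the example Hessian is mine):

    # Classify a stationary point from the eigenvalues of its Hessian.
    # eigvalsh is the right tool here, since the Hessian is symmetric.
    import numpy as np

    def classify(hessian):
        evals = np.linalg.eigvalsh(hessian)
        if np.any(np.isclose(evals, 0.0)):
            return "zero eigenvalue: look at higher orders"
        if np.all(evals > 0):
            return "minimum"
        if np.all(evals < 0):
            return "maximum"
        return "saddle point"

    # f(x, y) = x^2 - y^2 at (0, 0) has Hessian [[2, 0], [0, -2]].
    print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # saddle point

The same function classifies stationary points in any number of dimensions, which is the generalisation mentioned at the end of this section.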

To see what this has to do with the quantity D, I need to tell you the expression for the eigenvalues^1 of the Hessian. They are

    \lambda_\pm = \frac{1}{2}(f_{xx} + f_{yy}) \pm \frac{1}{2}\sqrt{(f_{xx} + f_{yy})^2 - 4D}   (38)
with D defined as above. The subscript \pm just distinguishes the eigenvalues associated to the sign taken on the square root. Now we examine what D has to say about the signs of these two eigenvalues:

    If D > 0 then the quantity under the square root is less than (f_{xx} + f_{yy})^2, so both eigenvalues have the same sign, fixed by the first term: if f_{xx} and f_{yy} are positive then both eigenvalues are positive and we have a minimum; if f_{xx} and f_{yy} are negative then both eigenvalues are negative and we have a maximum.

    If D < 0, the quantity under the square root is bigger than the square of the first term, so one eigenvalue is positive (+ sign) and one is negative (- sign), and we have a saddle.

    If D = 0, one of the eigenvalues is zero (the square root exactly cancels the first term)... this means our analysis breaks down and we have to go and look at higher-order derivatives.

This shows that examining the sign of the eigenvalues gives us the same criteria as we found before!
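As a quick numerical consistency check (my own; the Hessian entries are arbitrary), the closed-form eigenvalues in equation (38) should agree with a direct diagonalisation:

    # Compare equation (38) with numpy's diagonalisation of the Hessian.
    import numpy as np

    fxx, fxy, fyy = 3.0, 1.0, 2.0
    D = fxx * fyy - fxy**2

    root = np.sqrt((fxx + fyy)**2 - 4.0 * D)
    lam_plus = 0.5 * (fxx + fyy) + 0.5 * root
    lam_minus = 0.5 * (fxx + fyy) - 0.5 * root

    H = np.array([[fxx, fxy], [fxy, fyy]])
    print(sorted([lam_minus, lam_plus]))       # from equation (38)
    print(np.linalg.eigvalsh(H))               # direct diagonalisation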

These ideas generalise to higher dimensions (in n dimensions, the Hessian is an n \times n symmetric matrix with n eigenvalues) and the classification proceeds as before in terms of the eigenvalues of the matrix.

^1 For current / future reference, the general procedure for finding eigenvalues is to consider the determinant

    \begin{vmatrix} f_{xx} - \lambda & f_{xy} \\ f_{xy} & f_{yy} - \lambda \end{vmatrix} = 0   (37)

which leads to a quadratic equation (f_{xx} - \lambda)(f_{yy} - \lambda) - f_{xy}^2 = 0. Exercise: solve this to show you get the eigenvalues in the text.
