Manuscript received by the Editor August 11, 1998; revised manuscript received June 6, 2000.
Massachusetts Institute of Technology, Earth Resources Laboratory, E34-458, Cambridge, Massachusetts 02139. E-mail: rodi@mit.edu.
GSY-USA, Inc., PMB #643, 2261 Market Street, San Francisco, California 94114-1600. E-mail: randy@gsy-usa.com.
© 2001 Society of Exploration Geophysicists. All rights reserved.
Downloaded 15 May 2010 to 95.176.68.210. Redistribution subject to SEG license or copyright; see Terms of Use at http://segdl.org/
NLCG Algorithm for 2-D MT Inversion 175
Compared to global optimization methods like grid search, Monte-Carlo search, and genetic algorithms, inversion methods that make use of the Jacobian (first-order derivative) of the forward function, like those cited in the previous paragraph, generally require the testing of many fewer models to obtain an optimal solution of an inverse problem. This fact is of critical importance in 2-D and 3-D electromagnetic inverse problems where the forward function entails the numerical solution of Maxwell's equations, and is the reason that iterated, linearized methods have occupied center stage in electromagnetic inversion despite their greater susceptibility to finding locally rather than globally optimal solutions. On the other hand, generation of the Jacobian in these same problems multiplies the computational burden many times over that of evaluating the forward function alone, even when efficient reciprocity techniques (Madden, 1972; Rodi, 1976; McGillivray and Oldenburg, 1990) are exploited. Moreover, iterated, linearized inversion methods, done to prescription, have the additional computational chore of solving a linear system on the model space at each iteration step. These two tasks, generating the Jacobian and linear inversion, dominate the computations in 2-D and 3-D MT inversion, where the numbers of data and model parameters are typically in the hundreds or thousands. The computation of optimal solutions to the 2-D MT inverse problem can require several hours of CPU time on a modern workstation, whereas computing optimal solutions of the 3-D problem is impractical on the computers widely available today.

This computational challenge has motivated various algorithmic shortcuts in 2-D and 3-D MT inversion. One approach has been to approximate the Jacobian based on electromagnetic fields computed for homogeneous or 1-D earth models, which has been used in 2-D MT inversion by Smith and Booker (1991) in their rapid relaxation inverse (RRI) and by Farquharson and Oldenburg (1996) for more general 2-D and 3-D electromagnetic problems. Other workers have sought approximate solutions of the linearized inverse problem. In this category is the method of Mackie and Madden (1993), which solves each step of a Gauss-Newton iteration incompletely using a truncated conjugate gradients technique. In addition to bypassing the complete solution of a large linear system, the algorithm avoids computation of the full Jacobian matrix in favor of computing only its action on specific vectors. Although not as fast as RRI, the Mackie-Madden algorithm does not employ approximations to the Jacobian and requires much less computer time and memory than the traditional iterated, linearized inversion methods (as we will demonstrate in this paper). Also in this category is the subspace method, applied by Oldenburg et al. (1993) to dc resistivity inversion and by others to various other geophysical inverse problems. This method reduces the computational burden by solving each linearized inverse problem on a small set of judiciously calculated search directions in the model space.

In their use of incomplete solutions of the linearized inverse problem, the subspace and Mackie-Madden inversion methods depart from the strict schema of iterated, linearized inversion, with an accompanying reduction in the computer resources needed to solve large, nonlinear inverse problems. In this paper we investigate an approach to electromagnetic inversion that is a further departure from the geophysical tradition: nonlinear conjugate gradients (NLCG), or conjugate gradients applied directly to the minimization of the objective function prescribed for the nonlinear inverse problem. The use of conjugate gradients for function minimization is a well-established optimization technique (Fletcher and Reeves, 1959; Polak, 1971) and was suggested for nonlinear geophysical inverse problems by Tarantola (1987). It has been applied to varied geophysical problems, including crosswell traveltime tomography (Matarese and Rodi, 1991; Matarese, 1993), crosswell waveform tomography (Thompson, 1993; Reiter and Rodi, 1996), and dc resistivity (Ellis and Oldenburg, 1994; Shi et al., 1996).

Our investigation compares the numerical performance of three algorithms for 2-D magnetotelluric inversion: a Gauss-Newton algorithm, the Mackie-Madden algorithm, and a new NLCG algorithm. In tests involving synthetic and real data, the algorithms are applied to the minimization of a common objective function so that algorithm efficiency and accuracy can be compared directly. Rather than implement a published NLCG algorithm (e.g., Press et al., 1992), we designed our NLCG algorithm to avoid excessive evaluations of the forward problem and to fully exploit the computational techniques for Jacobian operations used in the Mackie-Madden algorithm. Conversely, we modified the original Mackie-Madden algorithm to include a preconditioner that we developed for NLCG. Given this, we can state two objectives of our study: to demonstrate quantitatively the computational advantages of the two algorithms that use conjugate gradients (Mackie-Madden and NLCG) over a traditional iterated, linearized inversion scheme (Gauss-Newton); and to determine whether the NLCG framework offers improvements over the Mackie-Madden approach as a conjugate gradients technique. Towards the latter end and as a prelude to future research on the conjugate-gradients approach to nonlinear inversion, we describe the Mackie-Madden and our new NLCG algorithms in common terms and in detail in an attempt to isolate the precise differences between them.

PROBLEM FORMULATION

Forward model for 2-D magnetotellurics

As is customary in 2-D magnetotellurics, we model the solid earth as a conductive halfspace, $z \ge 0$, underlying a perfectly resistive atmosphere. The electromagnetic source is modeled as a plane current sheet at some height $z = -h$. Given that the physical parameters of the earth are independent of one cartesian coordinate ($x$), Maxwell's equations decouple into transverse electric (TE) and transverse magnetic (TM) polarizations. For the purpose of calculating MT data at low frequency, it suffices to solve (see, for example, Swift, 1971)

$$\frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2} = -i\omega\mu\sigma E_x \tag{1}$$

$$\left.\frac{\partial E_x}{\partial z}\right|_{z=-h} = i\omega\mu \tag{2}$$

for the TE polarization, and

$$\frac{\partial}{\partial y}\left(\rho\,\frac{\partial H_x}{\partial y}\right) + \frac{\partial}{\partial z}\left(\rho\,\frac{\partial H_x}{\partial z}\right) = -i\omega\mu H_x \tag{3}$$

$$H_x\big|_{z=0} = 1 \tag{4}$$
176 Rodi and Mackie
for the TM polarization, where $E_x$ ($H_x$) is the $x$ component of the electric (magnetic induction) field, $\omega$ is angular frequency, $\mu$ is the magnetic permeability (assumed to be that of free space), $\sigma$ is the electrical conductivity, and $\rho$ is the inverse of conductivity, or resistivity.

MT data are electric-to-magnetic-field ratios in the frequency domain, which can be expressed as complex apparent resistivities. For the TE polarization, the complex apparent resistivity is defined as

$$\rho_{app} = \frac{i}{\omega\mu}\left(\frac{E_x}{H_y}\right)^2. \tag{5}$$

resistivity for one site is given by the formula

$$\rho_{app} = \frac{i}{\omega\mu}\left(\frac{a^T v}{b^T v}\right)^2 \tag{10}$$

where $a$ and $b$ are given vectors. An analogous discussion applies to the TM polarization, with $v$ being a discretization of the $H_x$ field and with different choices of $K$, $s$, $a$, and $b$.

Inversion method

We can write the inverse problem as
Let $A$ denote the Jacobian matrix of the forward function $F$:

$$A_{ij}(m) = \partial_j F_i(m), \quad i = 1, \ldots, N; \; j = 1, \ldots, M.$$

Given equation (11), we have

$$g(m) = -2A(m)^T V^{-1}\,(d - F(m)) + 2\lambda L^T L\, m \tag{12}$$

$$H(m) = 2A(m)^T V^{-1} A(m) + 2\lambda L^T L - 2\sum_{i=1}^{N} q_i B_i(m) \tag{13}$$

where $B_i$ is the Hessian of $F_i$ and $q = V^{-1}(d - F(m))$.

We also define an approximate objective function and its gradient and Hessian based on linearization of $F$. For linearization about a model $m_{ref}$, define

$$\tilde F(m; m_{ref}) = F(m_{ref}) + A(m_{ref})\,(m - m_{ref})$$

$$\tilde\Psi(m; m_{ref}) = (d - \tilde F(m; m_{ref}))^T V^{-1} (d - \tilde F(m; m_{ref})) + \lambda\, m^T L^T L\, m.$$

It is easy to show that the gradient and Hessian of $\tilde\Psi$ are given by the expressions

$$\tilde g(m; m_{ref}) = -2A(m_{ref})^T V^{-1} (d - \tilde F(m; m_{ref})) + 2\lambda L^T L\, m$$

$$\tilde H(m_{ref}) = 2A(m_{ref})^T V^{-1} A(m_{ref}) + 2\lambda L^T L. \tag{14}$$

$\tilde\Psi$ is quadratic in $m$ (its first argument), $\tilde g$ is linear in $m$, and $\tilde H$ is independent of $m$. In fact,

$$\tilde\Psi(m; m_{ref}) = \Psi(m_{ref}) + g(m_{ref})^T (m - m_{ref}) + \frac{1}{2}(m - m_{ref})^T \tilde H(m_{ref})\,(m - m_{ref}) \tag{15}$$

$$\tilde g(m; m_{ref}) = g(m_{ref}) + \tilde H(m_{ref})\,(m - m_{ref}). \tag{16}$$

Clearly $\tilde F(m_{ref}; m_{ref}) = F(m_{ref})$, $\tilde\Psi(m_{ref}; m_{ref}) = \Psi(m_{ref})$ and $\tilde g(m_{ref}; m_{ref}) = g(m_{ref})$, but $\tilde H(m_{ref})$ is only an approximation to $H(m_{ref})$ obtained by dropping the last term in equation (13).

Gauss-Newton algorithm (GN)

One can describe the Gauss-Newton iteration as recursive minimization of $\tilde\Psi$, i.e. the model sequence satisfies

$$m_0 = \text{given}$$

$$\tilde\Psi(m_{\ell+1}; m_\ell) = \min_m \tilde\Psi(m; m_\ell), \quad \ell = 0, 1, 2, \ldots. \tag{17}$$

A consequence of equation (17) is that the gradient vector, $\tilde g(m_{\ell+1}; m_\ell)$, is zero. In light of equation (16), $m_{\ell+1}$ satisfies the linear vector equation

$$\tilde H_\ell\,(m_{\ell+1} - m_\ell) = -g_\ell, \tag{18}$$

where we make the abbreviations

$$g_\ell \equiv g(m_\ell)$$

$$\tilde H_\ell \equiv \tilde H(m_\ell).$$

Presuming $\tilde H_\ell$ to be nonsingular, this necessary condition is also sufficient and we can write the Gauss-Newton iteration as

$$m_{\ell+1} = m_\ell - \tilde H_\ell^{-1} g_\ell.$$

Levenberg (1944) and Marquardt (1963) proposed a modification of the Gauss-Newton method in which the model increment at each step is damped. The rationale for damping is to prevent unproductive movements through the solution space caused by the nonquadratic behavior of $\Psi$ or poor conditioning of $\tilde H$. In algorithm GN, we employ a simple version of Levenberg-Marquardt damping and replace equation (18) with

$$(\tilde H_\ell + \epsilon_\ell I)\,(m_{\ell+1} - m_\ell) = -g_\ell. \tag{19}$$

Here, $I$ is the identity matrix and $\epsilon_\ell$ is a positive damping parameter allowed to vary with iteration step. Since the objective function we are minimizing includes its own damping in the form of the stabilizing (last) term in equation (11), and since this term is a quadratic function of $m$, a large amount of Levenberg-Marquardt damping is not needed in our problem. Algorithm GN chooses $\epsilon_\ell$ to be quite small after the first few iteration steps and is therefore not a significant departure from the Gauss-Newton method.

Our implementation of the Gauss-Newton algorithm solves equation (19) using a linear, symmetric system solver from the Linpack software library (Dongarra et al., 1979). First, the damped Hessian matrix, $\tilde H_\ell + \epsilon_\ell I$, is factored using Gaussian elimination with symmetric pivoting. The factored system is then solved with $-g_\ell$ as the right-hand side vector. The Jacobian matrix, $A(m_\ell)$, is needed to compute $g_\ell$ and $\tilde H_\ell$ in accordance with equations (12) and (14). GN generates the Jacobian using the reciprocity method of Rodi (1976), which translates the task to that of solving a set of pseudoforward problems having the same structure as equation (9) (see Appendix). The memory requirements of GN are dominated by storage of the Jacobian ($NM$ real numbers) and the Hessian ($M^2$ real numbers). We note that the memory needed for forward modeling and evaluating $\Psi$ scales linearly with $N$ and $M$.

Convergence of the Gauss-Newton, or Levenberg-Marquardt, iteration implies that the sequence $g_\ell$ converges to zero and thus that the solution is a stationary point of $\Psi$. Whether the stationary point corresponds to a minimum or otherwise depends on how strongly nonquadratic $\Psi$ is. When the method does find a minimum of $\Psi$, there is no assurance that it is a global minimum.

Mackie-Madden algorithm (MM)

The second minimization algorithm we study is the algorithm first introduced by Madden and Mackie (1989) and fully implemented and more completely described by Mackie and Madden (1993). As adapted to 3-D dc resistivity inversion, the algorithm is also described by Zhang et al. (1995).

Mackie and Madden (1993) presented their algorithm as iterated, linearized inversion. Solution of the linear inverse problem at each iteration step was formulated in terms of a maximum-likelihood criterion. It is informative and well serves our purpose to recast the Mackie-Madden algorithm as a modification of the Gauss-Newton method which, like Gauss-Newton, performs a minimization of the nonquadratic objective function $\Psi$.
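As a point of reference for the Gauss-Newton baseline that MM modifies, the damped update of equation (19) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `gauss_newton_lm`, the dense-matrix interface, and the linear stand-in for the MT forward solver are all hypothetical, the objective is written in the form implied by the linearized objective above, and the damping rule (0.001 times the current objective value) is taken from the parameter settings quoted for GN in the numerical experiments.

```python
import numpy as np

def gauss_newton_lm(F, jac, d, V_inv, L, lam, m0, n_iter=10, eps_scale=1e-3):
    """Damped Gauss-Newton iteration, equation (19), with a dense Jacobian."""
    m = np.asarray(m0, dtype=float).copy()
    LtL = L.T @ L
    history = []
    for _ in range(n_iter):
        r = d - F(m)                                      # data residual d - F(m)
        psi = r @ V_inv @ r + lam * (m @ LtL @ m)         # objective function value
        history.append(psi)
        A = jac(m)                                        # full N x M Jacobian
        g = -2.0 * A.T @ V_inv @ r + 2.0 * lam * (LtL @ m)    # gradient, eq. (12)
        H = 2.0 * A.T @ V_inv @ A + 2.0 * lam * LtL           # approx. Hessian, eq. (14)
        eps = eps_scale * psi                             # damping parameter
        m = m + np.linalg.solve(H + eps * np.eye(m.size), -g)  # update, eq. (19)
    return m, history

# Hypothetical linear stand-in for the forward function, used only to exercise the code.
rng = np.random.default_rng(0)
G = rng.normal(size=(12, 5))
d = G @ rng.normal(size=5)
V_inv = np.eye(12) / 0.0025
m_est, history = gauss_newton_lm(lambda m: G @ m, lambda m: G, d,
                                 V_inv, np.eye(5), 1e-3, np.zeros(5))
```

Storing `A` and `H` explicitly is what gives GN its $NM + M^2$ memory footprint; the conjugate gradients algorithms avoid both.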
$$m_{\ell,k+1} = m_{\ell,k} + \alpha_{\ell,k}\, p_{\ell,k}, \quad k = 0, 1, \ldots, K-1$$

$$m_{\ell+1} = m_{\ell,K}.$$

For each $k$, the vector $p_{\ell,k}$ is a search direction in model space and the scalar $\alpha_{\ell,k}$ is a step size. Let us make the additional abbreviation

$$\tilde g_{\ell,k} \equiv \tilde g(m_{\ell,k}; m_\ell).$$

In accordance with the CG algorithm (Hestenes and Stiefel, 1952), the step size is given by the formula

$$\alpha_{\ell,k} = -\frac{\tilde g_{\ell,k}^T\, p_{\ell,k}}{p_{\ell,k}^T \tilde H_\ell\, p_{\ell,k}}, \tag{20}$$

which, we point out, solves the univariate minimization problem,

$$\tilde\Psi(m_{\ell,k} + \alpha_{\ell,k}\, p_{\ell,k}; m_\ell) = \min_\alpha \tilde\Psi(m_{\ell,k} + \alpha\, p_{\ell,k}; m_\ell).$$

The search directions are iterated as

$$p_{\ell,0} = -C_\ell\, g_\ell$$

$$p_{\ell,k} = -C_\ell\, \tilde g_{\ell,k} + \beta_{\ell,k}\, p_{\ell,k-1}, \quad k = 1, 2, \ldots, K-1 \tag{21}$$

where the $M \times M$ positive-definite matrix $C_\ell$ is known as a preconditioner, and where the scalars $\beta_{\ell,k}$ are calculated as

$$\beta_{\ell,k} = \frac{\tilde g_{\ell,k}^T C_\ell\, \tilde g_{\ell,k}}{\tilde g_{\ell,k-1}^T C_\ell\, \tilde g_{\ell,k-1}}.$$

The first term of equation (21) is a preconditioned steepest descent direction, which minimizes $p^T \tilde g_{\ell,k}$, the directional derivative of $\tilde\Psi(m; m_\ell)$ at $m = m_{\ell,k}$, with $p^T C_\ell^{-1} p$ fixed. The second term modifies the search direction so that it is conjugate to previous search directions, meaning

$$p_{\ell,k}^T \tilde H_\ell\, p_{\ell,k'} = 0, \quad k' < k. \tag{22}$$

The final ingredient of the conjugate gradients algorithm is iteration of the gradient vectors:

$$\tilde g_{\ell,k+1} = \tilde g_{\ell,k} + \alpha_{\ell,k}\, \tilde H_\ell\, p_{\ell,k}, \quad k = 0, 1, \ldots, K-2,$$

which follows from equation (16).

The main computations entailed in algorithm MM are involved in the evaluation of the forward function, $F(m_\ell)$, for each $\ell$ [needed to compute $\Psi(m_\ell)$ and $g_\ell$], and operation with the Jacobian matrix and its transpose for each $k$ and $\ell$. Regarding the latter, let

$$A_\ell \equiv A(m_\ell)$$

$$\tilde g_{\ell,0} = -2A_\ell^T V^{-1} (d - F(m_\ell)) + 2\lambda L^T L\, m_\ell \tag{24}$$

$$\tilde g_{\ell,k+1} = \tilde g_{\ell,k} + 2\alpha_{\ell,k} A_\ell^T V^{-1} f_{\ell,k} + 2\alpha_{\ell,k}\lambda L^T L\, p_{\ell,k}, \quad k = 0, 1, \ldots, K-2. \tag{25}$$

From equations (23)–(25), we see that $A_\ell$ and $A_\ell^T$ each operate on $K$ vectors, or one each per CG step. Mackie and Madden (1993) showed that operations with the Jacobian and its transpose can be accomplished without computing the Jacobian itself. Instead, the vector resulting from either of these operations can be found as the solution of a single pseudoforward problem requiring the same amount of computation as the actual forward problem, $F$. (We define one forward problem to include all frequencies and polarizations involved in the data vector.) The algorithms for operating with $A_\ell$ and $A_\ell^T$ are detailed in the Appendix. The main memory used by MM comprises several vectors of length $N$ (e.g. $f_{\ell,k}$) and $M$ (e.g. $p_{\ell,k}$, $\tilde g_{\ell,k}$, and $C_\ell\, \tilde g_{\ell,k}$). Our preconditioner ($C_\ell$) requires no storage (see the section "Preconditioning" below). Thus, the memory needed by MM scales linearly with the number of data and model parameters, compared to the quadratic scaling for GN.

We apply algorithm MM using relatively few CG steps per Gauss-Newton step. The main purpose in doing so is to keep the computational effort needed for Jacobian operations under that which would be needed to generate the full Jacobian matrix. The Jacobian operations performed in $K$ CG steps of MM require computations equivalent to solving $2K$ forward problems, as indicated above. The computational effort needed to generate the full Jacobian matrix is harder to characterize in general but, in the usual situation where the station set is common for all frequencies and polarizations, amounts to one forward problem per station. Therefore, MM will do less computation (related to the Jacobian) per Gauss-Newton step than GN when $K$ is less than half the number of stations. Additionally, algorithm MM avoids the factorization of $\tilde H$. Truncating the CG iteration also effects a kind of damping of the Gauss-Newton updates, achieving similar goals as Levenberg-Marquardt damping. It is for this reason that algorithm MM solves the undamped system (18), rather than system (19).

In algorithm MM, the method of conjugate gradients was applied inside a Gauss-Newton-style iteration to incompletely solve a linear system or, equivalently, to incompletely minimize a quadratic approximation to the objective function. Nonlinear conjugate gradients (see, for example, Luenberger, 1984) directly solve minimization problems that are not quadratic, abandoning the framework of iterated, linearized inversion. Algorithm NLCG employs the Polak-Ribiere variant of nonlinear conjugate gradients (Polak, 1971) to minimize the objective function $\Psi$ of equation (11).
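For comparison with the nonlinear scheme, one Gauss-Newton step of MM, that is, $K$ preconditioned CG steps built from equations (20), (21), (24) and (25), can be sketched as follows. This is an illustrative sketch under assumptions rather than the authors' code: the name `mm_gauss_newton_step`, the callables `A_mul` and `At_mul` (standing in for the pseudoforward problems that apply the Jacobian and its transpose), the callable preconditioner `C`, and the linear toy problem are hypothetical. The vector `f` is assumed to be the Jacobian applied to the search direction, which is what equation (25) and the form of $\tilde H_\ell$ in equation (14) imply.

```python
import numpy as np

def mm_gauss_newton_step(m_ref, F, A_mul, At_mul, d, V_inv, LtL, lam, K=3, C=None):
    """K preconditioned CG steps on the objective linearized about m_ref.

    A_mul(m_ref, p) returns A(m_ref) p and At_mul(m_ref, u) returns A(m_ref)^T u,
    so the Jacobian matrix itself is never formed.
    """
    if C is None:
        C = lambda g: g                                  # C = I, no preconditioning
    m = m_ref.copy()
    # Initial gradient, eq. (24): one forward solve and one product with A^T.
    g = -2.0 * At_mul(m_ref, V_inv @ (d - F(m_ref))) + 2.0 * lam * (LtL @ m_ref)
    p, gamma_last = None, None
    for k in range(K):
        h = C(g)
        if k == 0:
            p = -h                                       # preconditioned steepest descent
        else:
            p = -h + ((g @ h) / gamma_last) * p          # conjugate direction, eq. (21)
        gamma_last = g @ h
        f = A_mul(m_ref, p)                              # Jacobian applied to p
        Lp = LtL @ p
        # Step size, eq. (20), using p^T H p = 2 (f^T V^-1 f + lam p^T L^T L p).
        alpha = -(g @ p) / (2.0 * (f @ V_inv @ f + lam * (p @ Lp)))
        m = m + alpha * p
        if k < K - 1:                                    # gradient iteration, eq. (25)
            g = g + 2.0 * alpha * At_mul(m_ref, V_inv @ f) + 2.0 * alpha * lam * Lp
    return m

# Hypothetical linear toy problem (not an MT forward solver).
rng = np.random.default_rng(1)
G = rng.normal(size=(10, 4))
d = G @ np.array([1.0, -2.0, 0.5, 3.0])
V_inv, LtL, lam = np.eye(10) / 0.0025, np.eye(4), 0.1
F = lambda m: G @ m
A_mul = lambda m_ref, p: G @ p
At_mul = lambda m_ref, u: G.T @ u
m_next = mm_gauss_newton_step(np.zeros(4), F, A_mul, At_mul, d, V_inv, LtL, lam, K=3)
```

Each pass through the loop applies the Jacobian once and its transpose at most once, so the storage is limited to a handful of vectors of length $N$ and $M$.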
a minimum with respect to $\alpha$, so the line search might stop prematurely. The benefit ensues when $F$ is approximately linear between $m_{\ell,k}$ and the minimizing model. In this case, the stopping condition will be met and $m_{\ell,k+1}$ will be an accurate result of the line search, even though $\Psi$ and its gradient may have changed greatly from their values at $m_{\ell,k}$. The search stops without additional, unnecessary computations such as an additional update ($m_{\ell,k+2}$) or second derivative information at the new model (requiring $A_{\ell,k+1}\, p_\ell$). Consequently, when the nonlinear CG iteration has progressed to the point where $F$ behaves linearly in all search directions, each line minimization will require only one step ($m_{\ell+1} = m_{\ell,1}$) and the remaining computations will be essentially the same as the linear CG computations in MM, with the exception that the forward function $F$ is evaluated each time the model is updated.

Preconditioning

We recall that algorithms MM and NLCG each provide for the use of a preconditioner, $C_\ell$, in their respective implementations of conjugate gradients. The preconditioner can have a big impact on efficiency in conjugate gradients. Two competing considerations in its choice are the computational cost of applying the preconditioner, and its effectiveness in "steering" the gradient vector into a productive search direction.

This study compares two versions of each of algorithms MM and NLCG: one without preconditioning ($C_\ell = I$) and one using

$$C_\ell = \left(\gamma_\ell I + \lambda L^T L\right)^{-1}, \tag{32}$$

where $\gamma_\ell$ is a specified scalar. In the latter case, we apply the preconditioner to a vector $g$ by solving the linear system for $h$,

$$\left(\gamma_\ell I + \lambda L^T L\right) h = g.$$

We solve this system using a (linear) conjugate gradients technique.

The rationale for equation (32) is to have an operator that can be applied efficiently and that in some sense acts like the inverse of $\tilde H_\ell$, the approximate Hessian matrix. The efficiency of applying $C_\ell$ stems from the simplicity and sparseness of the above linear system for $h$. The amount of computation needed to solve the system is less than one forward function evaluation and, thus, adds little overhead to either algorithm MM or NLCG. The approximation to the inverse Hessian arises from the second term of $C_\ell^{-1}$, but we also attempt to choose $\gamma_\ell$ so that the first term is of comparable size to the matrix $A_\ell^T V^{-1} A_\ell$. In our later examples, we took $\gamma_\ell$ to be a constant (independent of $\ell$) based on the Jacobian matrix of a homogeneous medium.

Theoretical comparison of MM and NLCG

In the three main applications of NLCG presented below ("Numerical Experiments"), updating of the step size, $\alpha$, by cubic interpolation occurred nine times, updating by bisection [formula (31)] occurred zero times, and Gauss-Newton updating [formula (30)] occurred 211 times (for a total of 220 line search steps among the three examples). Moreover, none of the line searches failed to converge within the tolerance given. The line search algorithm in NLCG is thus primarily a univariate Gauss-Newton algorithm, and it is informative to compare a simplified NLCG, in which the line search enhancements (cubic interpolation and bisection) are ignored, to MM.

Algorithms MM and NLCG both generate a doubly indexed sequence of models, $m_{\ell,k}$. In MM, the slower index ($\ell$) indexes a Gauss-Newton iteration while the faster index ($k$) indexes a conjugate gradients loop. In our simplified NLCG, the opposite is the case, with $\ell$ a conjugate gradients counter and $k$ a Gauss-Newton counter. However, the algorithms perform similar calculations at each step of their respective inner loops. The difference between the algorithms can be identified with the frequency with which the following events occur: calculating the forward function ($F$); changing the search direction ($p$) used in conjugate gradients; and resetting the search direction to be the steepest descent direction.

To demonstrate this, we sketch a simple algorithm having a single loop that subsumes MM and NLCG with the restricted line search. The input is a starting model, $m_0$:

    Algorithm CGI ($m_0$)
    $m := m_0$;
    for $\ell = 0, 1, 2, \ldots$
        if new_ref
            $m_{ref} := m$;
            $e := d - F(m_{ref})$;
        else
            $e := e - \alpha f$;
        end
        $g := -2A(m_{ref})^T V^{-1} e + 2\lambda L^T L\, m$;
        $\Psi := e^T V^{-1} e + \lambda\, m^T L^T L\, m$;
        if new_dir
            $h := C(m_{ref})\, g$;
            if steep
                $\beta := 0$;
            else
                $\beta := h^T (g - g_{last}) / \gamma_{last}$;
            end
            $p := -h + \beta p$;
            $g_{last} := g$;
            $\gamma_{last} := h^T g$;
        end
        $f := A(m_{ref})\, p$;
        $\alpha := -p^T g / (f^T V^{-1} f + \lambda\, p^T L^T L\, p)$;
        $m := m + \alpha p$;
    next $\ell$

The reader can verify that this algorithm corresponds to our mathematical descriptions of MM and NLCG. To help, we point out that the formula for $\alpha$ above corresponds to that for $\alpha_{\ell,k}$ in equation (20) (used in MM) but with that for $\alpha_{\ell,k+1} - \alpha_{\ell,k}$ in equation (30) (used in the NLCG line search). Further, CGI replaces iteration of the gradient vector, in equation (25), with iteration of an error vector, $e$.

Algorithm CGI has three flags: new_ref, new_dir, and steep. The flag new_ref is set to 1 (true) if the current model is to be used as a reference model for linearization. The flag new_dir is 1 if the search direction is to be changed. Flag steep is 1 if the newly computed search direction is to be reset to the steepest descent direction, thus breaking the conjugacy condition [equation (28)]. All three flags are initialized to 1. We can characterize algorithms MM and NLCG by how these flags are changed thereafter, as shown in Table 1. Algorithm CGI above does not show tests for line search convergence or failure, but these could be the same as in NLCG.
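The single-loop structure of CGI translates almost line for line into code. The following is a rough sketch under stated assumptions, not a reference implementation: the function name `cgi`, the argument `K`, the callables `A_mul` and `At_mul`, and the linear toy problem are hypothetical. Passing an integer `K` selects the MM flag schedule; passing `K=None` selects a simplified NLCG schedule in which every line search is assumed to converge after one step and never fails, since the line search tests are not part of CGI. The step size is written with the explicit factor of 2 carried by $\tilde H$ in equations (14) and (20).

```python
import numpy as np

def cgi(m0, F, A_mul, At_mul, d, V_inv, LtL, lam, n_updates, K=None, C=None):
    """Single-loop CG inversion sketch; returns the model and the objective history."""
    if C is None:
        C = lambda g: g                              # identity preconditioner
    m = m0.copy()
    p = np.zeros_like(m)
    history = []
    for ell in range(n_updates):
        if K is None:                                # simplified NLCG (assumed schedule)
            new_ref, new_dir, steep = True, True, ell == 0
        else:                                        # MM: restart every K-th update
            new_ref, new_dir, steep = ell % K == 0, True, ell % K == 0
        if new_ref:
            m_ref = m.copy()
            e = d - F(m_ref)                         # true residual: one forward solve
        else:
            e = e - alpha * f                        # linearized residual update
        g = -2.0 * At_mul(m_ref, V_inv @ e) + 2.0 * lam * (LtL @ m)
        history.append(e @ V_inv @ e + lam * (m @ LtL @ m))
        if new_dir:
            h = C(g)
            beta = 0.0 if steep else h @ (g - g_last) / gamma_last
            p = -h + beta * p
            g_last, gamma_last = g, h @ g
        f = A_mul(m_ref, p)
        alpha = -(p @ g) / (2.0 * (f @ V_inv @ f + lam * (p @ LtL @ p)))
        m = m + alpha * p
    return m, history

# Hypothetical linear toy problem used only to exercise both schedules.
rng = np.random.default_rng(2)
G = rng.normal(size=(10, 4))
d = G @ np.array([1.0, -2.0, 0.5, 3.0])
V_inv, LtL, lam = np.eye(10) / 0.0025, np.eye(4), 0.1
F = lambda m: G @ m
A_mul = lambda m_ref, p: G @ p
At_mul = lambda m_ref, u: G.T @ u
m_mm, hist_mm = cgi(np.zeros(4), F, A_mul, At_mul, d, V_inv, LtL, lam, 6, K=3)
m_nlcg, hist_nlcg = cgi(np.zeros(4), F, A_mul, At_mul, d, V_inv, LtL, lam, 4)
```

The only difference between the two runs is how often the flags fire, which is the point of the comparison: the linear algebra per update is identical.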
The main computations in CGI are taken up in the evaluation of the three quantities $F(m_{ref})$, $A(m_{ref})\,p$ and $A(m_{ref})^T V^{-1} e$. Each of these quantities requires the same computational effort (see Appendix). The latter two quantities (operations with $A$ and $A^T$) are done on each pass through the loop unconditionally, while the forward function is done only when new_ref is 1. Therefore, each model update in CGI requires computations equal to two or three forward function evaluations, depending on how new_ref is determined.

Table 1. How flags are set in algorithms MM and NLCG.

    Event        MM                  NLCG
    new_ref=1    Every Kth update    Every update
    new_dir=1    Every update        When line search converges or fails
    steep=1      Every Kth update    When line search fails

NUMERICAL EXPERIMENTS

This section presents results of testing the three MT inversion algorithms described above on synthetic and field data. In each test, algorithms GN, MM and NLCG were applied to the minimization of a common objective function $\Psi$ [equation (11)] with a given data vector $d$, variance matrix $V$, regularization parameter $\lambda$, and regularization operator $L$. The data vector and error variance matrix are described below with each example. The regularization operator for each example was the second-order finite-difference operator described earlier. To choose the regularization parameter, we ran preliminary inversions with a few values of $\lambda$ and then subjectively chose one that gave reasonable data residuals and model smoothness. We point out that none of the three inversion algorithms being tested determines $\lambda$ as an output. Various other parameters specific to the inversion algorithms were selected as follows:

1) In GN, the Levenberg-Marquardt damping parameter was set to 0.001 times the current value of the objective function: $\epsilon_\ell = 0.001\,\Psi(m_\ell)$.
2) In NLCG, the tolerance for deciding convergence of the line minimization, $\tau$, was set to 0.003.
3) In MM and NLCG, the preconditioner was either that defined by equation (32) or, in one experiment, the identity (no preconditioning).
4) In MM, the number of conjugate gradient steps per Gauss-Newton step, $K$, was set to 3.

All results were computed on a 400-MHz Pentium II PC running the Linux operating system. The CPU times stated below are intended to reflect only the relative performance of the algorithms. We emphasize that the intent of these tests was to compare the speed and accuracy of GN, MM and NLCG as minimization algorithms, not the quality of the inversion models in a geophysical sense.

Examples with synthetic data

We generated synthetic data by applying a 2-D MT forward modeling algorithm to specified models of the earth's resistivity and perturbing the results with random noise. The forward modeling algorithm we used for this purpose was intentionally different from that used in our inversion algorithms. Synthetic data were calculated using the finite-element algorithm of Wannamaker et al. (1986), whereas our inversion algorithms employ the transmission-network algorithm of Mackie et al. (1988). Each synthetic data set comprises complex apparent resistivities at multiple station locations, frequencies, and polarizations. Noise was included by adding an error to the complex logarithm of each apparent resistivity: $\log \rho_{app} + e_r + i e_i$, where $e_r$ and $e_i$ are uncorrelated samples from a Gaussian distribution having zero mean and 0.05 standard deviation (5% noise). The noise was uncorrelated between frequencies, stations, and polarizations. For comparison, the accuracy of our forward modeling algorithm is approximately 1–3% for the range of parameters (grid dimensions, frequencies, and resistivities) involved in the test problems below (Madden and Mackie, 1989).

Model 1. Our first tests employ a simple resistivity model consisting of a 10 ohm-m rectangular body embedded in a 100 ohm-m background. The anomalous body has dimensions of 10 × 10 km and its top is 2 km below the earth's surface. The tests use synthetic data for the TM and TE polarizations at seven sites and five frequencies, yielding a total of 140 real-valued data. The frequencies range from 0.01 to 100 Hz and are evenly spaced on a logarithmic scale. The model parameterization for inversion divides the earth into a grid of blocks numbering 29 in the horizontal ($y$) direction and 27 in the vertical ($z$) direction, implying a total of 783 model parameters. The variance matrix ($V$) was set to 0.0025 times the identity matrix, and the regularization parameter ($\lambda$) was chosen as 30. The starting model for each inversion was a uniform halfspace with $\rho$ = 30 ohm-m.

We applied inversion algorithm GN and two versions each of MM and NLCG (with and without preconditioning) to the synthetic data from model 1. Figure 1 shows the performance of each algorithm in terms of the value of the objective function ($\Psi$) it achieves as a function of CPU time expended. CPU time used to compute the objective function for the starting model is ignored, so the first symbol plotted for each algorithm is at zero CPU time. Following this, a symbol is plotted for each iteration step of an algorithm: a Gauss-Newton step for GN and MM, a conjugate gradients step for NLCG. It is immediately evident from Figure 1 that, in both MM and NLCG, the preconditioner enhances performance significantly, especially in the case of MM. With preconditioning, MM and NLCG effectively converge to a final result in less than one minute of CPU time, while without preconditioning, they are far from convergence after a minute. We also infer from the spacing between symbols that preconditioning does not add significantly to the amount of computation in either algorithm. Henceforth, we will consider MM and NLCG only with preconditioning.

Next, we compare algorithms MM, NLCG and GN. We see from Figure 1 that GN, like MM and NLCG, effectively converges in less than one minute of CPU time. However, the rates of convergence differ amongst the algorithms. MM and NLCG reduce the objective function in the early stages of minimization at a noticeably faster rate than GN. This is quantified in Table 2, which gives the amount of CPU time expended by each algorithm to achieve various values of the objective function, determined by interpolating between iteration steps. Values of $\Psi$ are referenced to the smallest value achieved by any of the
algorithms (in this case GN), which is denoted min in the ta- We note that the number of steps until convergence and
ble. It is clear that MM and NLCG achieve each level of the the CPU time used per step differ markedly among the algo-
objective function, down to 1.05 min , much faster than GN, rithms (Figure 1). GN requires the fewest number of steps and
with MM being slightly faster than NLCG. In the later stages takes the longest for each step, whereas NLCG requires the
of minimization (objective function below 1.05 times its minimum), NLCG becomes the most efficient, reaching within 1% of the minimum in about 20% less CPU time than GN and 40% less than MM.

Figure 2 displays one model from the model sequence generated by each of the three algorithms, i.e., the model yielding the objective function value closest to 1.01 times the minimum. The images are truncated spatially to display the best resolved parameters; deeper blocks and those laterally away from the station array are not shown. The models from the different algorithms are clearly very similar. Each model differs (block by block over the portion shown) from the best model generated (that yielding the minimum objective function value) by less than a factor of 1.3 in resistivity, or a difference of 0.1 in log10 resistivity. Models later in each inversion sequence are even closer to each other and to the best model. This confirms numerically the premise of our formulation that it is the minimization criterion, and not the minimization algorithm, that determines the solution of the inverse problem.

Table 2. CPU times versus objective function: first synthetic data set. Column headings give the objective function as a multiple of its minimum value (minimum = 443.82).

          2.0    1.5    1.2    1.1    1.05   1.02   1.01
GN        23     28     33     36     41     45     46
MM        9      12     14     21     31     48     61
NLCG      11     13     18     23     27     32     36

most steps and is fastest per step. In MM and NLCG, the time per iteration step reflects largely the number of forward problems (and pseudoforward problems) invoked. Given our input parameters, algorithm MM solves seven (i.e., 1 + 2K) forward problems per Gauss-Newton step (six devoted to operations with the Jacobian matrix). NLCG solves three forward problems per line search step (two for Jacobian operations). Since the stopping criterion for the line search was rather liberal (a tolerance of 0.003), all but the first three line minimizations converged in one step. (The first three each required two steps.) GN solves eight forward problems per Gauss-Newton step (seven to compute the Jacobian matrix), which is only one greater than MM. However, GN spends significant CPU time creating and factoring the Hessian matrix, which explains why its CPU time per Gauss-Newton step is so much larger than that of MM.

Also of interest in Figure 1 is the observation that MM had a larger initial reduction in the objective function than GN. This difference must be due to the difference between using Levenberg-Marquardt damping and truncated iteration for modifying the Gauss-Newton model update. Since we did not attempt to optimize the choice of the damping parameter in GN or K in MM, we note this difference without drawing a general conclusion about the merits of the two damping techniques.
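The efficiency comparison drawn from Table 2 can be checked with a few lines of arithmetic. The sketch below only transcribes the table; the helper names (`time_at`, `fastest`, `saving`) are illustrative and not part of the original work.

```python
# CPU times transcribed from Table 2 (first synthetic data set).
# Each column is the objective function as a multiple of its minimum.
LEVELS = [2.0, 1.5, 1.2, 1.1, 1.05, 1.02, 1.01]
CPU_TIME = {
    "GN":   [23, 28, 33, 36, 41, 45, 46],
    "MM":   [9, 12, 14, 21, 31, 48, 61],
    "NLCG": [11, 13, 18, 23, 27, 32, 36],
}


def time_at(algorithm, level):
    """CPU time an algorithm needed to reach the given objective level."""
    return CPU_TIME[algorithm][LEVELS.index(level)]


def fastest(level):
    """Name of the algorithm that reached the given level first."""
    return min(CPU_TIME, key=lambda name: time_at(name, level))


def saving(algorithm, reference, level):
    """Fraction of CPU time saved by `algorithm` relative to `reference`."""
    return 1.0 - time_at(algorithm, level) / time_at(reference, level)


for level in LEVELS:
    print(f"{level:5.2f}  fastest: {fastest(level)}")
print(f"NLCG vs GN at 1.01: {saving('NLCG', 'GN', 1.01):.0%} less CPU time")
print(f"NLCG vs MM at 1.01: {saving('NLCG', 'MM', 1.01):.0%} less CPU time")
```

Under this reading of the table, MM leads at the coarser levels and NLCG leads from 1.05 downward, which agrees with the description of the early and late stages of minimization.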
Model 2. The next experiment with synthetic data uses a more complicated model and larger data set. The model represents a block-faulted structure with a resistive unit exposed at the surface of the up-thrown block. In the down-thrown block, the resistive unit is overlain by a more conductive surface layer. The data set comprises complex TM and TE apparent resistivities for 12 sites and ten frequencies between 0.0032 and 100 Hz, giving a total of 480 data. The inversion model has 660 parameters corresponding to a 33 × 20 grid of blocks. The initial model for each algorithm was a homogeneous halfspace of 10 ohm-m. The variance matrix was the same as in the previous example, and the regularization parameter was set to 20.

The performance of the three inversion algorithms is presented in Figure 3 and Table 3. The algorithms differ in a similar manner as in the previous example. In the beginning, the conjugate gradients-based algorithms (MM and NLCG) reduce the objective function much faster than the Gauss-Newton algorithm, with MM noticeably faster than NLCG. In the later stages of minimization, MM exhibits a slow convergence rate and is overtaken first by NLCG and then by GN in reducing the objective function. MM was halted after about 1000 s, at which point the objective function was 2.6% larger than its minimum (which again was achieved by GN); hence the dashes in the last two columns of Table 3. We note that only six of the iterative line searches performed by NLCG took more than a single step, five taking two steps and one taking three.

Table 3. CPU times versus objective function: second synthetic data set. Column headings give the objective function as a multiple of its minimum value (minimum = 1890.7). Times are in seconds.

          2.0    1.5    1.2    1.1    1.05   1.02   1.01
GN        125    139    162    180    222    353    531
MM        47     67     114    201    404    -      -
NLCG      51     61     82     109    150    229    296

Inversion models resulting from the second data set are shown in Figure 4. In the case of GN and NLCG, the models are those for which the objective function equals 1.01 times its minimum; for MM, it is the last model generated (1.026 times the minimum). As in the previous example, there is great similarity among the models, although noticeable differences occur in the conductive overburden as well as beneath its right edge (x ≈ 5, z > 10 km). In the distance and depth range shown, the maximum departure of the displayed GN and NLCG models from the best model computed is a factor of 2 in resistivity, whereas for MM it is a factor of 5. For both GN and NLCG, the departure drops to about 1.5 when the objective function reaches 1.005 times its minimum.

Example with field data

Lastly, we demonstrate the various inversion algorithms on real MT data collected by P. Wannamaker in the Basin and Range (Wannamaker et al., 1997). The data set comprises TM complex apparent resistivities at 58 sites and 17 frequencies per site, for a total of 1972 real-valued data. The inversion model was parameterized with a 118 × 25 grid of blocks, yielding 2950 model parameters. Each algorithm was applied with a homogeneous initial model with resistivity 100 ohm-m. The diagonal elements of the variance matrix (V) were set equal to the squares of the reported standard errors and the off-diagonal ones were set to zero. The regularization parameter was chosen as 8. The results are presented in Figures 5–7 and Table 4.

Looking at Figure 5, it is clear that NLCG and MM perform vastly better than GN on this real data set. NLCG achieved the
smallest objective function value among the algorithms in roughly the same amount of time needed for one step of GN. GN took over 3 CPU hours to reach within 10% of this value (Table 4), and reached only within 4.4% of the minimum when it was halted after about 7 hours. These results demonstrate the poor scalability of algorithm GN with problem size. In this problem, GN solves 59 forward problems per Gauss-Newton step (compared to seven for MM) and must factor a 2950 × 2950 matrix (the damped Hessian). The computer memory requirements are also extensive as the Jacobian matrix contains 5.8 million (real) elements and the Hessian 8.7 million elements. MM and NLCG, on the other hand, require only several vectors of length 2950.

Figure 6 replots the MM and NLCG results on an expanded time scale so that the performance of these conjugate gradients-based algorithms can be compared. We see the same pattern as in the synthetic data examples, only this time MM performs even more favorably than NLCG in the early stages of minimization. NLCG shows faster convergence at the later stages, overtaking MM when the objective function is between 1.2 and 1.1 times the minimum (Table 4). All but seven of the line searches in NLCG converged in a single step, and only the first took as many as three steps.

The MM and NLCG inversion models in Figure 7 yield an objective function of 1.01 times the minimum, whereas the GN model yields 1.044 times the minimum. There are some significant differences between the GN model and the others in a vertical band near the rightmost station (x ≈ 60 km), which are difficult to see since the color scale covers almost a factor of 10 000 in resistivity. Otherwise the models are very similar. The maximum discrepancy from the model yielding the minimum objective function is about a factor of 4 for the GN model and a factor of 2 for the others.

FIG. 6. The results of Figure 5 for algorithms MM and NLCG shown on an expanded time scale.

Table 4. CPU times versus objective function: Basin and Range data set. Column headings give the objective function as a multiple of its minimum value (minimum = 9408.9). Times are in seconds.

          2.0    1.5    1.2    1.1    1.05   1.02   1.01
GN        5143   6216   8245   11343  21608  -      -
MM        65     111    288    501    731    1232   1751
NLCG      158    224    342    425    536    712    827
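The problem dimensions and storage figures quoted for the Basin and Range example follow from simple counting. The sketch below reproduces them; the 8-byte word size used for the memory estimate is an assumption (double precision) and is not stated in the text.

```python
# Problem dimensions quoted in the text for the Basin and Range data set.
n_sites, n_freqs = 58, 17
n_data = n_sites * n_freqs * 2        # real and imaginary part per complex datum
grid = (118, 25)                      # blocks in the inversion grid
n_params = grid[0] * grid[1]

# Dense storage needed by GN.
jacobian_elements = n_data * n_params
hessian_elements = n_params * n_params

# ASSUMPTION: 8 bytes per real element (double precision).
BYTES_PER_REAL = 8
gn_megabytes = (jacobian_elements + hessian_elements) * BYTES_PER_REAL / 1e6

# MM and NLCG only need a handful of model-sized vectors; "several" is
# taken here as 5 purely for illustration.
cg_megabytes = 5 * n_params * BYTES_PER_REAL / 1e6

print(n_data, n_params)                           # 1972 2950
print(jacobian_elements, hessian_elements)        # 5817400 8702500
print(f"GN: about {gn_megabytes:.0f} MB, MM/NLCG: about {cg_megabytes:.2f} MB")
```

The element counts round to the 5.8 million and 8.7 million quoted above; the megabyte figures depend entirely on the assumed word size and vector count.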
A land-locked view of the conductivity structure under the Pacific Ocean: Geophys. J., 95, 181–194.
Mackie, R. L., and Madden, T. R., 1993, Three-dimensional magnetotelluric inversion using conjugate gradients: Geophys. J. Internat., 115, 215–229.
Madden, T. R., 1972, Transmission systems and network analogies to geophysical forward and inverse problems: ONR Technical Report 72-3.
Madden, T. R., and Mackie, R. L., 1989, Three-dimensional magnetotelluric modeling and inversion: Proc. IEEE, 77, 318–333.
Marquardt, D. W., 1963, An algorithm for least-squares estimation of nonlinear parameters: J. Soc. Indust. Appl. Math, 11, 431–441.
Matarese, J. R., 1993, Nonlinear traveltime tomography: Ph.D. thesis, Massachusetts Institute of Technology.
Matarese, J. R., and Rodi, W. L., 1991, Nonlinear traveltime inversion of cross-well seismics: a minimum structure approach: 61st Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 917–921.
McGillivray, P. R., and Oldenburg, D. W., 1990, Methods for calculating Frechet derivatives and sensitivities for the non-linear inverse problem: A comparative study: Geophys. Prosp., 38, 499–524.
Newman, G., 1995, Crosswell electromagnetic inversion using integral and differential equations: Geophysics, 60, 899–911.
Newman, G. A., and Alumbaugh, D. L., 1997, Three-dimensional massively parallel electromagnetic inversion - I. Theory: Geophys. J. Internat., 128, 345–354.
Oldenburg, D. W., McGillivray, P. R., and Ellis, R. G., 1993, Generalized subspace methods for large scale inverse problems: Geophys. J. Internat., 114, 12–20.
Polak, E., 1971, Computational methods in optimization: A unified approach: Academic Press.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P., 1992, Numerical recipes in FORTRAN: The art of scientific computing, 2nd ed.: Cambridge Univ. Press.
Reiter, D. T., and Rodi, W., 1996, Nonlinear waveform tomography applied to crosshole seismic data: Geophysics, 61, 902–913.
Rodi, W. L., 1976, A technique for improving the accuracy of finite element solutions for magnetotelluric data: Geophys. J. Roy. Astr. Soc., 44, 483–506.
Rodi, W. L., 1989, Regularization and Backus-Gilbert estimation in nonlinear inverse problems: Application to magnetotellurics and surface waves: Ph.D. thesis, Pennsylvania State Univ.
Shi, W., Rodi, W., Mackie, R. L., and Zhang, J., 1996, 3-D d.c. electrical resistivity inversion with application to a contamination site in the Aberjona watershed: Proceedings from Symposium on the Application of Geophysics to Environmental and Engineering Problems (SAGEEP): Environmental and Engineering Geophys. Soc., 1257–1267.
Smith, J. T., and Booker, J. R., 1988, Magnetotelluric inversion for minimum structure: Geophysics, 53, 1565–1576.
Smith, J. T., and Booker, J. R., 1991, Rapid inversion of two- and three-dimensional magnetotelluric data: J. Geophys. Res., 96, 3905–3922.
Swift, C. M., Jr., 1971, Theoretical magnetotelluric and Turam response from two-dimensional inhomogeneities: Geophysics, 36, 38–52.
Tarantola, A., 1987, Inverse problem theory: Elsevier.
Tikhonov, A. N., and Arsenin, V. Y., 1977, Solutions of ill-posed problems: V. H. Winston and Sons.
Thompson, D. R., 1993, Nonlinear waveform tomography: Theory and application to crosshole seismic data: Ph.D. thesis, Massachusetts Institute of Technology.
Wannamaker, P. E., Johnston, J. J., Stodt, J. A., and Booker, J. R., 1997, Anatomy of the southern Cordilleran hingeline, Utah and Nevada, from deep electrical resistivity profiling: Geophysics, 62, 1069–1086.
Wannamaker, P. E., Stodt, J. A., and Rijo, L., 1986, Two-dimensional topographic responses in magnetotellurics modeled using finite elements: Geophysics, 51, 2131–2144.
Wu, F. T., 1968, The inverse problem of magnetotelluric sounding: Geophysics, 33, 972–979.
Zhang, J., Mackie, R. L., and Madden, T. R., 1995, 3-D resistivity forward modeling and inversion using conjugate gradients: Geophysics, 60, 1313–1325.
APPENDIX
JACOBIAN COMPUTATIONS
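The Gauss-Newton method (algorithm GN) requires the computation of each element of the Jacobian matrix, $\mathbf{A}$. The Mackie-Madden algorithm (MM) and nonlinear conjugate gradients (NLCG), in contrast, employ $\mathbf{A}$ only in the computation of quantities $\mathbf{A}\mathbf{p}$ and $\mathbf{A}^T\mathbf{q}$ for specific vectors $\mathbf{p}$ and $\mathbf{q}$ [e.g., equations (23) and (25)]. This Appendix describes algorithms for the computation of $\mathbf{A}$, $\mathbf{A}\mathbf{p}$, and $\mathbf{A}^T\mathbf{q}$.

To begin, since each datum is the real or imaginary part of a complex quantity, we will convert our problem to one involving complex variables. Let $\tilde{\mathbf{d}}$ be a complex vector such that each element of $\mathbf{d}$ is the real or imaginary part of a unique element of $\tilde{\mathbf{d}}$:

$$\mathbf{d} = \mathrm{Re}\,\mathbf{E}\tilde{\mathbf{d}}$$

[...]

where $\tilde{\mathbf{F}}$ is a complex function. It follows that

$$\mathbf{A} = \mathrm{Re}\,\mathbf{E}\tilde{\mathbf{A}}$$

with the complex matrix $\tilde{\mathbf{A}}$ being the Jacobian of $\tilde{\mathbf{F}}$:

$$\tilde{A}_{ij}(\mathbf{m}) = \partial_j \tilde{F}_i(\mathbf{m}).$$

We also have

$$\mathbf{A}\mathbf{p} = \mathrm{Re}\,\mathbf{E}\tilde{\mathbf{A}}\mathbf{p}$$

$$\mathbf{A}^T\mathbf{q} = \mathrm{Re}\,\tilde{\mathbf{A}}^T\mathbf{E}^T\mathbf{q}.$$

Our task translates to finding $\tilde{\mathbf{A}}$, $\tilde{\mathbf{A}}\mathbf{p}$, and $\tilde{\mathbf{A}}^T\tilde{\mathbf{q}}$ where $\tilde{\mathbf{q}} = \mathbf{E}^T\mathbf{q}$.

To specify $\tilde{\mathbf{F}}$, it is convenient to consider all frequencies and polarizations involved in the data vector $\mathbf{d}$ simultaneously. Let $\mathbf{v}$ be a vector comprising the parameterized $E_x$ and/or $H_x$ fields for all frequencies, and let the linear equation

[...]

where the vectors $\mathbf{a}_i$ and $\mathbf{b}_i$ are chosen to extract from $\mathbf{v}$ the relevant field averages for the polarization, frequency, and observation site associated with the $i$th complex datum.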
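The real/complex relations $\mathbf{A}\mathbf{p} = \mathrm{Re}\,\mathbf{E}\tilde{\mathbf{A}}\mathbf{p}$ and $\mathbf{A}^T\mathbf{q} = \mathrm{Re}\,\tilde{\mathbf{A}}^T\mathbf{E}^T\mathbf{q}$ can be illustrated numerically. The sketch below assumes a particular form for $\mathbf{E}$, namely one nonzero entry per row equal to 1 (to select a real part) or $-i$ (to select an imaginary part); that form is an assumption made for illustration, and the matrix entries are arbitrary test values.

```python
def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matvec(a, x):
    """Matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))


# Arbitrary complex Jacobian: 2 complex data, 3 model parameters.
A_tilde = [[1 + 2j, 0.5 - 1j, -3j],
           [2 - 1j, 4 + 0j, 1 + 1j]]

# ASSUMED form of E: rows select Re or Im of one complex datum,
# using Re(-i z) = Im(z).
E = [[1, 0], [-1j, 0], [0, 1], [0, -1j]]

# Real Jacobian A = Re(E A_tilde): 4 real data, 3 model parameters.
A = [[z.real for z in row] for row in matmul(E, A_tilde)]

p = [0.3, -1.2, 2.0]            # real model-space vector
q = [1.0, -0.5, 0.25, 2.0]      # real data-space vector

# A p computed directly and via the complex Jacobian.
Ap_direct = matvec(A, p)
Ap_complex = [z.real for z in matvec(E, matvec(A_tilde, p))]

# A^T q computed directly and via q_tilde = E^T q.
ATq_direct = matvec(transpose(A), q)
q_tilde = matvec(transpose(E), q)
ATq_complex = [z.real for z in matvec(transpose(A_tilde), q_tilde)]

print(close(Ap_direct, Ap_complex), close(ATq_direct, ATq_complex))
```

Both identities hold because $\mathbf{p}$ and $\mathbf{q}$ are real, so taking the real part commutes with multiplication by them.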
Computation of $\tilde{\mathbf{A}}$

We consider the computation of $\tilde{\mathbf{A}}$ using two methods described by Rodi (1976). Differentiating equation (A-2),

$$\tilde{A}_{ij} = \tilde{A}_1^{ij} + \tilde{A}_2^{ij}, \tag{A-3}$$

where

$$\tilde{A}_1^{ij} = \left( \frac{2}{\mathbf{a}_i^T\mathbf{v}}\,\partial_j\mathbf{a}_i - \frac{2}{\mathbf{b}_i^T\mathbf{v}}\,\partial_j\mathbf{b}_i \right)^T \mathbf{v}$$

and

$$\tilde{A}_2^{ij} = \mathbf{c}_i^T\,\partial_j\mathbf{v}, \tag{A-4}$$

where the vector $\mathbf{c}_i$ is defined by

$$\mathbf{c}_i = \frac{2}{\mathbf{a}_i^T\mathbf{v}}\,\mathbf{a}_i - \frac{2}{\mathbf{b}_i^T\mathbf{v}}\,\mathbf{b}_i.$$

The matrix $\tilde{\mathbf{A}}_1$ accounts for the dependence of $\rho_{\mathrm{app}}$ on $\mathbf{m}$ through the vectors $\mathbf{a}_i$ and $\mathbf{b}_i$. The matrix $\tilde{\mathbf{A}}_2$ accounts for the dependence of $\mathbf{v}$ on $\mathbf{m}$. We assume the vectors $\mathbf{a}_i$ and $\mathbf{b}_i$ and their partial derivatives can be computed with closed-form expressions so that $\tilde{\mathbf{A}}_1$ can also be computed with such. We turn to the more difficult task of computing $\tilde{\mathbf{A}}_2$.

From equation (A-1), we can infer

$$\mathbf{K}\,\partial_j\mathbf{v} = \partial_j\mathbf{s} - (\partial_j\mathbf{K})\mathbf{v}, \quad j = 1, 2, \ldots, M. \tag{A-5}$$

Again, we assume that $\mathbf{K}$, $\mathbf{s}$, and their partial derivatives are known analytically. The first method described by Rodi (1976) is to solve these $M$ pseudoforward problems for the vectors $\partial_j\mathbf{v}$ and substitute them into equation (A-4).

The second method of Rodi (1976) exploits the reciprocity property of the forward problem, i.e., the symmetry of $\mathbf{K}$. Solving equation (A-5) and plugging into equation (A-4), we get

$$\tilde{A}_2^{ij} = \mathbf{c}_i^T\mathbf{K}^{-1}\left(\partial_j\mathbf{s} - (\partial_j\mathbf{K})\mathbf{v}\right). \tag{A-6}$$

Let the vectors $\mathbf{u}_i$ satisfy

$$\mathbf{K}\mathbf{u}_i = \mathbf{c}_i, \quad i = 1, 2, \ldots, N. \tag{A-7}$$

Given the symmetry of $\mathbf{K}$, we can then write equation (A-6) as

$$\tilde{A}_2^{ij} = \mathbf{u}_i^T\left(\partial_j\mathbf{s} - (\partial_j\mathbf{K})\mathbf{v}\right). \tag{A-8}$$

The second method is to solve equations (A-7) and then evaluate equation (A-8).

The matrices $\partial_j\mathbf{K}$ are very sparse since $\mathbf{K}$ is sparse and each of its elements depends on only a few of the $m_j$. The vectors $\partial_j\mathbf{s}$, $\mathbf{a}_i$, and $\mathbf{b}_i$ are likewise sparse, or zero. Therefore, in either method, construction of the right-hand-side vectors for the pseudoforward problems [equations (A-5) or (A-7)] and evaluation of the expression for $\tilde{A}_2^{ij}$ [equations (A-4) or (A-8)] take relatively little computation. The major computational effort in either method is in solving the appropriate set of pseudoforward problems: equations (A-5) or (A-7). For this reason, the first method [equations (A-4) and (A-5)] is more efficient when $N > M$ (more data than model parameters) while the second, reciprocity method [equations (A-7) and (A-8)] is more efficient when $M > N$.

However, this last statement does not take into account the particular structure of the matrix $\mathbf{K}$ and vectors $\mathbf{a}_i$ and $\mathbf{b}_i$ for 2-D magnetotellurics. $\mathbf{K}$ has a block-diagonal structure with each block corresponding to one polarization and frequency combination. Furthermore, the nonzero elements of $\mathbf{a}_i$ and $\mathbf{b}_i$, for any given $i$, are all associated with a common partition of $\mathbf{v}$ (since one 2-D MT datum conventionally involves only a single polarization and frequency). Therefore, only one block of each pseudoforward problem in equation (A-7) needs to be solved and, what is more, we may choose between the first and second methods independently for each polarization/frequency pair in computing its partition of $\tilde{\mathbf{A}}_2$. The first (second) method is more efficient when the number of data for that polarization/frequency is larger (smaller) than the number of model parameters.

Computation of $\tilde{\mathbf{A}}\mathbf{p}$ and $\tilde{\mathbf{A}}^T\tilde{\mathbf{q}}$

From equation (A-3), we have

$$\tilde{\mathbf{A}}\mathbf{p} = \tilde{\mathbf{A}}_1\mathbf{p} + \tilde{\mathbf{A}}_2\mathbf{p}$$

$$\tilde{\mathbf{A}}^T\tilde{\mathbf{q}} = \tilde{\mathbf{A}}_1^T\tilde{\mathbf{q}} + \tilde{\mathbf{A}}_2^T\tilde{\mathbf{q}}.$$

Again, we assume the first term of each expression can be computed explicitly and we turn our attention to the second terms.

The algorithm of Mackie and Madden (1993) for $\tilde{\mathbf{A}}_2\mathbf{p}$ may be derived as follows. From equation (A-4), we have

$$\sum_j \tilde{A}_2^{ij} p_j = \mathbf{c}_i^T\mathbf{t}, \tag{A-9}$$

where the vector $\mathbf{t}$ is given by

$$\mathbf{t} = \sum_j p_j\,\partial_j\mathbf{v}.$$

From equation (A-5), it is clear that $\mathbf{t}$ satisfies

$$\mathbf{K}\mathbf{t} = \sum_j p_j\left(\partial_j\mathbf{s} - (\partial_j\mathbf{K})\mathbf{v}\right). \tag{A-10}$$

The algorithm for $\tilde{\mathbf{A}}_2\mathbf{p}$ is to solve the single forward problem, equation (A-10), for $\mathbf{t}$ and then evaluate equation (A-9).

The Mackie-Madden method for $\tilde{\mathbf{A}}_2^T\tilde{\mathbf{q}}$ can be derived similarly. From equation (A-8), we have

$$\sum_i \tilde{q}_i \tilde{A}_2^{ij} = \mathbf{r}^T\left(\partial_j\mathbf{s} - (\partial_j\mathbf{K})\mathbf{v}\right), \tag{A-11}$$

where we define the vector $\mathbf{r}$ by

$$\mathbf{r} = \sum_i \tilde{q}_i\,\mathbf{u}_i.$$

From equation (A-7), $\mathbf{r}$ satisfies

$$\mathbf{K}\mathbf{r} = \sum_i \tilde{q}_i\,\mathbf{c}_i. \tag{A-12}$$

The algorithm for $\tilde{\mathbf{A}}^T\tilde{\mathbf{q}}$ is to solve equation (A-12) and substitute into equation (A-11).

The major computation in each of these algorithms is the solution of one pseudoforward problem: for $\mathbf{r}$ in equation (A-12) or $\mathbf{t}$ in equation (A-10).