Monte-Carlo Alternatives
to Linearized Inversion
for Earth Imaging

seminar by
John N. Louie

additional contributions from
Sathish K. Pullammanappallil and Li Li



Seismic Data


As in x-ray tomograms done in a hospital, tomograms of the earth can be constructed by passing rays of seismic energy between a source and a receiver. Active and passive (earthquake) sources can be used. Rays can be direct or bounce off reflectors. Unlike x-rays, in the earth seismic rays bend and diffract significantly.

Each ray produces a seismogram. We arrange seismograms from adjacent rays as strips in an image, with time going down instead of around the drum, and the amplitude of the stylus swing coded as a color. Identifiable ray phases (e.g. a reflection) have lateral coherence in the images.

Basic seismic tomography uses the picked travel times of coherent phases to interpret the rock property of seismic velocity. Velocity distributions can be related to geologic environments. Diffraction tomography can locate seismically reflective structures in the crust.

Tomography is a model-building process, controlled by synthetic times forward-projected through proposed velocity models and compared with the observed times. Linearized, least-squares approaches to this have been very successful in the earth's deep interior. However, the crust is much more heterogeneous.

Monte-Carlo optimization

Linearized inversions often return false results in crustal tomography for velocity modeling from time data. As a matrix inverse problem, seismic tomography exhibits singularity and ill-conditioning. However, the forward-projection problem for times from velocity models is now deterministic and very fast on common computers. This motivates Monte-Carlo optimization solutions requiring only forward-projection and calculation of errors.

A Monte-Carlo optimization must effectively sample the model space to produce a low-error model. But model spaces in seismic tomography are huge. A typical image will have at least 100 x 100 blocks (pixels), each of which could take one of 20 distinguishable velocity values (given the property's range and precision). So the model space is 10,000-dimensional, and the number of different models is 10 to the 80th power, larger than the number of atoms in the universe.

To guide the optimization, we begin with models representing simple geologic options, and then test only variations to these models. The variation is guided by an analogue to some natural process. With a genetic algorithm or simulated annealing, we can find a good model after testing only as many as the dimensionality of the model space, instead of as many as its size.


A Genetic Algorithm (GA)

Li Li and Sushil Louis have developed a genetic algorithm for seismic tomography that uses the absolute value instead of least-squares's RMS for finding error.

A population of 200 models is interbred and tested during each generation. 100 or fewer generations yield a low-error sub-population for a two-dimensional tomogram with a model space dimensionality up to 100,000. Below is the synthetic ``true'' model, and its perfect data fits.

The optimization below fits low-weighted time data points (small points) poorly, concentrating on the highly-weighted data. Each parameter of this result is the average over the low-error sub-population of the final generation. This solution would yield the same geologic interpretation as the ``true'' model.

The GA is highly parallelizable. However, for 3-d tomograms where the dimensionality of the model space can be in the millions, it is possible that either the population size or the number of generations required for a low-error result is too large.


Simulated Annealing (SA)

Simulated annealing considers one model at a time serially, instead of many in parallel in a population. Although many iterations are needed, SA can sometimes produce a ``close'' result faster than the GA.

Simulated annealing models the cooling of crystals from a melt. In the analogue process the alignment and movement of atoms depends strongly on the temperature. From high T and a great variety of possible states a crystal gradually cools to low T, at a single low-energy (i.e. low-error) state.

During SA a working model is randomly modified between each iteration of testing against the data. Each modification is constrained, so is unlikely to wipe out all previous changes. Each proposed model is tested, and only accepted under the conditions outlined by SA.
A new model with lower error is always accepted.
A new model having higher error can be accepted conditionally, with greater errors accepted exponentially less. The acceptance depends on the empirical parameters T and q.

We also exponentially decrease T as the iterations progress, with a cusp at a critical temperature. So the variation in error also decreases as we approach the low-error solution.

We estimate the critical temperature and q by looking at the error left by short runs (having a number of iterations less than the square root of the model space dimension).
As seismic rays never sample all of our models, the singularity found by linearized inversions is expressed in SA as indeterminance among the low-error models. We average this set to estimate what is known, and its standard deviation estimates the degree of indeterminance.


Comparison with Linearized Inversion

At left is a matrix equation representing an iterative, linearized seismic reflection-time tomography problem, where m contains the model, R the data residual, and G the partials relating the two. The lower two rows are smoothness constraints on the model that do allow this to be solved by singular-value decomposition.
The linearized solution is never able to jump to a higher-error model. Thus a correct global solution depends on starting at an initial model that is close enough to the global solution. Otherwise the linearized method will fall into the nearest local error minimum. Most seismic tomography problems are exploratory in purpose, so choosing an initial model is difficult and would add bias.

For SA a correct solution at the global error minimum does not depend on any initial model. Early in the annealing any order is destroyed at high temperature.

The linearized solution also expresses indeterminance as lack of constraint, exhibited by the unphysically high velocity. SA's guided model alterations are easily constrained by any available geologic criteria.

Vertical travel times across the solutions above are an example of a constraint from a separate data set that can be applied easily to a SA process.
SA performs well for this highly non-linear simultaneous inversion for the location of a reflecting structure and the velocity above it. The linearized method requires too much smoothing to accurately locate the change in depth.

Because of the time required to compute the partial derivatives G, the linearized method is no faster than the SA.

A. Tarantola derived the posteriori probability density function for seismic tomography. It says that the probability of a model having a large data error is low, and represents the projection of the data probability density function into the model space. This function will have many peaks (for each local error minimum). A linearized solution requires imposing a PDF from prior knowledge (rho) that dominates the product enough to force interations into the global minimum. Often such procedures just converge to the prior knowledge.

Note how, as a negative exponential of data error, the conditional probability controlling SA has the form of the posteriori PDF. This also means SA is preferentially sampling the model space near the error minima.


Crustal Imaging

The bending and diffraction of seismic rays is particularly severe near earthquake faults. Faults can juxtapose geologic terranes having very different ages and origins. For seismic data sets crossing the Garlock and Hosgri faults we estimate velocity variations with SA. Having such unusually detailed velocity models, we can properly account for ray bending and construct diffraction tomograms, which are large-scale ultrasound scans.

The Garlock fault

Halfway between Reno and Los Angeles, the Garlock fault controls deformation of the Mojave Desert plateau. Finding that the Garlock dips below the Mojave suggested the Mojave is much more active than previously thought. This suggestion was reinforced by the 1992 M7.4 Landers earthquake.

A survey crosses the Garlock where its trace steps left at a pull-apart basin. Gravity measurements and wildcat oil drilling in the 1950s constrain the low-velocity sedimentary basin. We use the 96 receivers recorded at each of 150 source points along a N-S road crossing two branches of the Garlock.
Severe ray distortions at the faulted walls of the basin required optimizing a 50 m resolution 2-d velocity model with 500,000 SA iterations.
The SA optimized simultaneously for the location of the fault reflector and the velocities. The location and length of the fault were allowed to take any of a wide range of values. The data used were time picks of the identifiable fault reflection in a set of several thousand seismograms.

SA found virtually the same velocities and reflector locations despite wildly different initial models. The fault location proves the Garlock extends below the basin and southward under the Mojave. Tectonically detached plateaus have been identified worldwide and are sometimes called ``tectonic flakes.''

The Hosgri fault

The Hosgri fault runs just offshore of Lompoc, Calif. Locating and characterizing its history of motion is important to estimating the degree and type of seismic hazard that exists in the area. A marine seismic survey provides reflections from the fault, but they can't be properly imaged without resolving the large velocity contrast that also exists there.

Our SA technique was able to compute a velocity model with 25 m resolution over a 13.7 by 4 km section, an 87,000-dimensional model space. 250,000 iterations required a month and a half of clock time on a Sun Sparc-10.


Above is an ultrasound-type diffraction image computed without knowing the lateral velocity variations. This standard analysis preferentially yields near-horizontal structure, such as the details of the sedimentary layers. Since these layers have been drilled, sampled, and dated, their configuration is useful in constructing the geologic history of the Hosgri.

A complete diffraction image accounting for the optimized velocity model images the near-vertical fault and the sediment layers equally well. The fault is shown to dip east under the Santa Maria Valley, although recent thrusting motions (such as in the 1994 Northridge earthquake) are constrained to a minimal amount.

Coherency imaging

Monte-Carlo optimizations such as SA involve only forward modeling, so it is easy to add constraints, or even optimize different data in parallel. All the work above simply used travel times picked from direct or reflected seismic rays. But picking depends on being able to observe a phase that is conherent from trace to trace across a data image.

We use the equation above to quantitatively test for coherency, allowing us to optimize with the objective of maximizing coherency along our synthetic travel times. With this criterion we can avoid manually picking the times.

We develop a conditional probability based on maximizing the coherency, but we may concurrently want to minimize the errors in matching any picked times.

A new model is accepted if coherency has increased and time error decreased. If it is worse in only one aspect, that conditional probability is applied. If is is worse in both aspects, the conditional probability is concatenated as above.

As we incorporate additional data, we are able to push the constrained depth of the Garlock velocity models deeper.

Using direct ray times, top, only resolves velocities in the upper 2 km.

Reflected ray times, middle, gets velocities to 4 km but only in part of the model.

Incorporating coherency, bottom, allows constraint over a much larger area of the section.
Resulting Garlock reflector images:

a) and b) from simple velocities constrained only by direct rays.

c) from a complete coherency plus direct ray optimization, finding the greatest overall coherency.

d) from a reflected-ray time optimization, focused strongly on the picked reflector.


This work has been generously supported by the National Earthquake Hazards Reduction Programs of the National Science Foundation and the U.S. Geological Survey.

WWW viewers can access this presentation on the Internet at
http://www.seismo.unr.edu/ftp/pub/louie/talks/math/math95.html


Return To UNRSL Homepage