additional contributions from
Sathish K. Pullammanappallil and Li Li
Each ray produces a seismogram. We arrange seismograms from adjacent
rays as strips in an image, with time going down instead of around the
drum, and the amplitude of the stylus swing coded as a color.
Identifiable ray phases (e.g. a reflection) have lateral coherence in
the images.
Basic seismic tomography uses the picked travel times of coherent phases
to interpret the rock property of seismic velocity. Velocity distributions
can be related to geologic environments. Diffraction tomography can locate
seismically reflective structures in the crust.
Tomography is a model-building process, controlled by synthetic times
forward-projected through proposed velocity models and compared with
the observed times. Linearized, least-squares approaches to this have
been very successful in the earth's deep interior. However, the crust
is much more heterogeneous.
Monte-Carlo optimization
Linearized inversions often return false results in crustal tomography
for velocity modeling from time data.
As a matrix inverse problem, seismic tomography exhibits singularity
and ill-conditioning. However, the forward-projection problem for times
from velocity models is now deterministic and very fast on common
computers. This motivates Monte-Carlo optimization solutions requiring
only forward-projection and calculation of errors.
A Monte-Carlo optimization must effectively sample the model space to produce a low-error model. But model spaces in seismic tomography are huge. A typical image will have at least 100 x 100 blocks (pixels), each of which could take one of 20 distinguishable velocity values (given the property's range and precision). So the model space is 10,000-dimensional, and the number of different models is 10 to the 80th power, larger than the number of atoms in the universe.
To guide the optimization, we begin with models representing simple geologic options, and then test only variations to these models. The variation is guided by an analogue to some natural process. With a genetic algorithm or simulated annealing, we can find a good model after testing only as many as the dimensionality of the model space, instead of as many as its size.
A population of 200 models is interbred and tested during each generation.
100 or fewer generations yield a low-error sub-population for a
two-dimensional tomogram with a model space dimensionality up to 100,000.
Below is the synthetic ``true'' model, and its perfect data fits.
The optimization below fits low-weighted time data points (small points)
poorly, concentrating on the highly-weighted data.
Each parameter of this result is the average over the low-error sub-population
of the final generation.
This solution would yield the same geologic interpretation as the ``true''
model.
The GA is highly parallelizable. However, for 3-d tomograms where the
dimensionality of the model space can be in the millions, it is possible that
either the population size or the number of generations required for a
low-error result is too large.
Simulated annealing models the cooling of crystals from a melt. In the analogue process the alignment and movement of atoms depends strongly on the temperature. From high T and a great variety of possible states a crystal gradually cools to low T, at a single low-energy (i.e. low-error) state.
During SA a working model is randomly modified between each iteration of
testing against the data. Each modification is constrained, so is unlikely
to wipe out all previous changes. Each proposed model is tested, and only
accepted under the conditions outlined by SA.
A new model with lower error is always accepted.

A new model having higher error can be accepted conditionally, with greater
errors accepted exponentially less. The acceptance depends on the
empirical parameters T and q.
We also exponentially decrease T as the iterations progress, with a cusp
at a critical temperature. So the variation in error also decreases as we
approach the low-error solution.
We estimate the critical temperature and q by looking at the error left
by short runs (having a number of iterations less than the square root
of the model space dimension).
As seismic rays never sample all of our models, the singularity found by
linearized inversions is expressed in SA as indeterminance among the
low-error models. We average this set to estimate what is known, and
its standard deviation estimates the degree of indeterminance.
At left is a matrix equation representing an iterative,
linearized seismic reflection-time
tomography problem, where m contains the model, R the data residual, and
G the partials relating the two. The lower two rows are smoothness
constraints on the model that do allow this to be solved by singular-value
decomposition.
The linearized solution is never able to jump to a higher-error model.
Thus a correct global solution depends on starting at an initial model
that is close enough to the global solution.
Otherwise the linearized method will fall into the nearest local error
minimum.
Most seismic tomography problems are exploratory in purpose, so choosing
an initial model is difficult and would add bias.
For SA a correct solution at the global error minimum does not depend on
any initial model. Early in the annealing any order is destroyed at high
temperature.
The linearized solution also expresses indeterminance as lack of constraint,
exhibited by the unphysically high velocity.
SA's guided model alterations are easily constrained by any available
geologic criteria.
Vertical travel times across the solutions above are an example of
a constraint from a separate data set that can be applied easily to
a SA process.
SA performs well for this highly non-linear simultaneous inversion for
the location of a reflecting structure and the velocity above it.
The linearized method requires too much smoothing to accurately
locate the change in depth.
Because of the time required to compute the partial derivatives G,
the linearized method is no faster than the SA.
A. Tarantola derived the posteriori probability density function for
seismic tomography. It says that the probability of a model having a
large data error is low, and represents the projection of the data
probability density function into the model space.
This function will have many peaks (for each local error minimum).
A linearized solution requires imposing a PDF from prior knowledge
(rho) that dominates the product enough to force interations into
the global minimum. Often such procedures just converge to the
prior knowledge.

Note how, as a negative exponential of data error, the conditional
probability controlling SA has the form of the posteriori PDF.
This also means SA is preferentially sampling the model space near
the error minima.
Severe ray distortions at the faulted walls of the basin required optimizing
a 50 m resolution 2-d velocity model with 500,000 SA iterations.
The SA optimized simultaneously for the location of the fault reflector
and the velocities. The location and length of the fault were allowed to
take any of a wide range of values. The data used were time picks of the
identifiable fault reflection in a set of several thousand seismograms.
SA found virtually the same velocities and reflector locations despite
wildly different initial models. The fault location proves the Garlock
extends below the basin and southward under the Mojave. Tectonically
detached plateaus have been identified worldwide and are sometimes
called ``tectonic flakes.''
The Hosgri fault
The Hosgri fault runs just offshore of Lompoc, Calif. Locating and characterizing
its history of motion is important to estimating the degree and type
of seismic hazard that exists in the area. A marine seismic survey provides
reflections from the fault, but they can't be properly imaged without
resolving the large velocity contrast that also exists there.
Our SA technique was able to compute a velocity model with 25 m resolution
over a 13.7 by 4 km section, an 87,000-dimensional model space.
250,000 iterations required a month and a half of clock time on a Sun
Sparc-10.
Above is an ultrasound-type diffraction image computed without knowing
the lateral velocity variations. This standard analysis preferentially
yields near-horizontal structure, such as the details of the sedimentary
layers. Since these layers have been drilled, sampled, and dated, their
configuration is useful in constructing the geologic history of the Hosgri.
A complete diffraction image accounting for the optimized velocity model
images the near-vertical fault and the sediment layers equally well.
The fault is shown to dip east under the Santa Maria Valley, although
recent thrusting motions (such as in the 1994 Northridge earthquake)
are constrained to a minimal amount.



As we incorporate additional data, we are able to push the constrained
depth of the Garlock velocity models deeper.
Using direct ray times, top, only resolves velocities in the upper 2 km.
Reflected ray times, middle, gets velocities to 4 km but only in part of the model.
Incorporating coherency, bottom, allows constraint over a much larger area of
the section.
Resulting Garlock reflector images:
a) and b) from simple velocities constrained only by direct rays.
c) from a complete coherency plus direct ray optimization, finding the greatest overall coherency.
d) from a reflected-ray time optimization, focused strongly on the picked
reflector.
WWW viewers can access this presentation on the Internet at
http://www.seismo.unr.edu/ftp/pub/louie/talks/math/math95.html