University of Nevada
Reno

Linearized and nonlinear travel time tomography for upper
crustal velocity structure of the western Great Basin

A dissertation submitted in partial fulfillment of the
requirements for the degree of Doctor of Philosophy,
in Geophysics

by

Abu Muhammad Asad

Dr. John N. Louie, Dissertation Advisor

May, 1998

Table of Contents

Dedication

To my parents, who sacrificed so much for my well-being,
wife, Tahmina Dolly whose loving company kept me going,
and son, Afif who always smiles my fatigue away.


Acknowledgments

``Journey in the Land, and then behold how
He originated creation''
The Quran

``Yet all experience is an arch wherethro'
Gleams that untravell'd world whose margin fades
For ever and for ever when I move.''
Alfred Tennyson

All human experience stems from inspiration and cooperation from others. The experience of going through the graduate school and reaching the end is no exception. Before mentioning individuals by name, I would like to appreciate all the unnamed people who inspired me all through my life.

I would like to express my heart-felt gratitude and appreciation to my dissertation advisor, Dr. John N. Louie whose unselfish support and positive attitude always helped me through my most difficult graduate days. His overall receptiveness to new ideas and willingness to examine them were always refreshing to me. I appreciate the research funding through the years that he provided me thus enabling me to complete graduate school. I can never appreciate John enough for his understanding of the human problems of his students.

I thank Drs. John Anderson, Jim Brune, Sushil Louis and Yehua Zeng for serving on my thesis committee. I thank all of them again for agreeing to let me defend my thesis in a very short notice.

Graduate classes that particularly helped me with the necessary foundation for my research were John Louie's ``Seismic Imaging'' and ``Geophysical Case Histories'' , Jim Brune's ``Surface Waves'', John Anderson's ``Inverse Problems'', Steve Wesnousky's ``Seismotectonics'', Chuck Langston's (Penn State) ``Seismology'' and Kevin Furlong's (Penn State) ``Geodynamics''. John Louie's classes taught me quite a bit of programming in C as a bonus.

I have to particularly thank Satish Pullammanappallil for all the instructive discussions I had with him at different times, and for helping me with his optimization code, and for being a nice friend. I appreciate the help I enjoyed at different times from Arturo Aburto, Rasool Anooshehpoor, Glenn Biasi, Sergio Chavez-Perez, Diane dePolo, Bill Honjas, Li Li, Serdar Ozalaybey, Baoping Shi, Gordon Shields and Ken Smith. I would thank Gayle Wilson especially for being very helpful whenever I needed it badly.

My life in Reno was never boring because of the intimate friend Dr. Eltagh Mirghani whose personality always impressed me and drew me even closer to him. I appreciate everything he did to make my and my family's life in Reno pleasant. Amr Yassin, Mehmet Kanoglu and Muniruzzaman Khan deserve special thanks for lending me their cars whenever I needed for them at late night. I also thank Aboudou Akambi for being a nice friend and giving me an opportunity to practice my French. My warm thanks also go for Humayun Shamim and Jamaluddin for being very helpful friends.

Outside Reno, Shameem and Sabina Siddiqui, Mahbub Ahmed and Nurul Kabir came forward with unselfish helping hands at different times.

Lastly, my wife Tahmina (Dolly) always tried to understand and appreciate the problems a graduate student has to live with. Her unfailing and soothing support kept me going on at times when I was drooping. I greatly apologize to her and my son Afif for the hardship my student life meant to them for years.


Abstract

In this dissertation, I use linearized and nonlinear travel time tomographic methods to estimate upper crustal velocity structure in the western Great Basin. The northern and southern fringes of this region have seen quite extensive seismic profiling and 2-dimensional velocity modeling works, while the central part, with the exception of Long Valley caldera, has remained like a gap in terms of upper crustal velocity modeling. I obtain an upper crustal Pg refractor velocity map to bridge this gap of information in the central part of the western Great Basin.

Vis-a-vis the dilemma of the linearized methods' being closely dependent on a priori information and the fully nonlinear methods' being oftentimes computationally intractable, I adopt a compromise between these two using different combinations of both as the problem at hand warrants. I use a fully linearized least squares method of travel time inversion for hypocenters and local three-dimensional velocity model in the Eureka Valley area, a conjugate-gradient based method for mapping the Pg refractor at a regional scale and a fully nonlinear method of optimization by simulated annealing for obtaining a starting velocity model in an area where no prior model exists.

In the first part of this dissertation, I use a tomographic scheme in order to invert about 200,000 Pg travel times for an upper crustal refractor velocity mapping in the western Great Basin. The results feature a north-northwest trending upper crustal low velocity corridor that connects the eastern California shear zone with the Furnace Creek-Fish Lake Valley fault. A positive correlation between Pg velocity and gravity along the dominant structural trend of the western Great Basin (north-northwest) has also been inferred.

The second part of the dissertation proposes an inversion methodology composed of an in-tandem combination of nonlinear simulated annealing optimization and linearized inversion in order to solve the problem of determination of hypocenter locations and a 3-d velocity model of the Eureka Valley area. The velocity model derived adequately describes broad structural features such as a sedimentary basin at the local scale. The hypocenter locations, however, are a little scattered.

Finally, I do elaborate two-dimensional synthetic tests to determine an efficient definition of the critical temperature to be used in the cooling schedule of simulated annealing optimization for determination of source locations and velocities. I define the critical temperature on the triple criteria of change of shape of the error-versus-iteration curve for short runs at different temperatures, average least square error as a function of temperature, and number of accepted models as a function of temperature. I also propose a heuristic combination of the conventional ``acceptance'' criterion and a ``rejection'' criterion for probabilistic acceptance of models. Tests for separate velocity model and source location determinations proved promising.


Table of Contents

Dedication
Acknowledgments
Abstract

1. General Introduction

      1.1 Background
      1.2 Organization

2. Upper crustal velocity structure of the western Great Basin by Pg
      tomography of earthquake travel time data

      2.1 Introduction
             Previous works
             Geology
      2.2 Method
      2.3 Data compilation and analysis
      2.4 Results and discussion
             Error estimation
             Resolution tests
             Central part of the model
             Long Valley caldera
             Northern part of the model
             Comparison with gravity and topography
      2.5 Conclusions

3. Travel time inversion for earthquake locations and three-dimensional
      velocity structure in the Eureka Valley area, eastern california

      3.1 Introduction
              Geology
      3.2 Data
      3.3 Method
              Nonlinear optimization
              Minimum 1-D model estimation by VELEST
              Linearized 3-D inversion
              Implementation
      3.4 Results and discussion
              Hypocenter locations
              Velocity model
      3.5 Conclusions

4. On critical temperature, perturbation scheme, and probabilistic acceptance
      of models in simulated annealing optimization for velocity and source
      location estimation

      4.1 Introduction
      4.2 Synthetic model and data
      4.3 Perturbation schemes
      4.4 Determination of the critical temperature
                  Velocity model perturbation only
                  Source location perturbation only
                  Simultaneous perturbation of the velocity model
                   and source locations
      4.5 Probabilistic acceptance of models
      4.6 Results
                  Velocity model determination only
                  Source location determination only
                  Simultaneous determination of the velocity model
                   and source locations
      4.7 Conclusions

5. Conclusions and recommendations
      5.1 General conclusions
      5.2 Recommendations

References

Appendix A: Nature of Pg waves

Chapter 1
General introduction

1.1 Background

The central problem dealt with in this dissertation is that of tomographic estimation of upper-crustal seismic velocities from earthquake travel time data in the Western Great Basin. Knowledge of the velocity field is crucial in interpreting rock behavior from local to global scales. Subsurface seismic velocity can give information of varying precision on rock and mineral type, density, state of compaction, porosity, fracturing, fluid content, pressure and temperature (Schon, 1996).

Use of upper crustal velocity models for determination of sedimentary basin geometry is a standard interpretation tool, along with, or independent of, gravity modeling. A seismic velocity model can efficiently demarcate sedimentary basins from the underlying basement (Hauksson and Haase, 1997; Pullammanappallil and Louie, 1997; Eberhart-Phillips, 1990). Sedimentary rocks, with the exception of salt and evaporites, have significantly lower compressional and shear wave velocities than the igneous and metamorphic rocks they overlie. A two-dimensional velocity section derived for the upper 10 km of the crust can reveal the outline of the base of many sedimentary basins. A three-dimensional velocity model can picture the overall geometry of the basin floor.

One of the most important uses of seismic velocity models in geophysics is in the rather difficult task of relating measurements in the time domain to the subsurface geology in the space domain. A seismic velocity model occupies a pivotal place in the time-to-depth conversion of time domain data. In seismic data processing, the most sophisticated tools of time series analysis and migration techniques cannot compensate for an incorrect velocity model. Placement of impedance discontinuities, whether they are layer boundaries, faults or unconformities, in their true spatial positions depends critically on the velocity model used for migration (Claerbout, 1985; Yilmaz, 1987).

Any earthquake location program incorporates a seismic velocity model representative of the area. The more accurately the velocity model describes the subsurface, the more precise the earthquake locations are expected to be; and precise locations of hypocenters are essential for proper tectonic interpretation of the earthquakes (Nelson and Vidale, 1990; Eberhart-Phillips, 1990).

Velocity modeling has also helped locate magma chambers and estimate their size and fraction of melt within them (Iyer and Dawson, 1993; Toomey and Foulger, 1989). Vp / Vs ratio can be very helpful in characterizing temperature conditions in these cases.

Mapping of a more or less widely present refractor (e.g., upper crustal Pg refractor) velocity can put major crustal faults into feature by the velocity contrast across the fault. Hearn and Clayton (1986) image the trend of the San Andreas fault from a Pg velocity model obtained by travel time tomography.

Methods of tomographic velocity modeling in practice vary widely depending on the nature of the application they are designed for, the computational facilities available, extent of linearization etc. Data type (e.g., travel time, waveform, body wave, surface wave, first arrival etc.) and scale of the model (e.g., local, regional, global) are two very important factors that determine which method to adopt for velocity modeling. Presence or absence of significant prior information also plays a role. In areas where previous works have revealed at least some gross features of the model, a linearized method can very easily be justified. But in virgin areas, from a seismic velocity modeling point of view, the absence of utilizable and dependable prior information understandably warrant as little linearization as is absolutely needed by some other factors, such as limitation of computational facilities.

The dilemma between the inadequacy of the prevalent linearized methods in preserving the nonlinear nature of the problem of travel time inversion for velocities on one hand, and the excessive computational effort demanded by fully nonlinear methods on the other motivated me to search for a compromise. In this dissertation, I use a fully linearized (small perturbations with respect to an initial model) least squares method (Thurber, 1983; Eberhart-Phillips, 1990) of travel time inversion for hypocenters and local three-dimensional velocity model, a conjugate-gradient based method for mapping the Pg refractor at a regional scale (Hearn and Ni, 1994; Nolet and Sneider, 1990; Paige and Saunders, 1982); and a fully nonlinear method of optimization by simulated annealing (Pullammanappallil, 1994) for obtaining a starting velocity model in an area where no prior model exists.

1.2 Organization

Chapters 2, 3 and 4 describe works done for the tomographic estimation of velocity structure and source locations using different combinations of linearized and nonlinear methods.

In Chapter 2, I map an upper upper crustal refraction velocity of the western Great Basin using the so-called Pg first arrival times. This Pg refractor velocity mapping is done with a view to bridging a gap in the central part of the western Great Basin in terms of velocity modeling work, compared to the quite extensive modeling works existing in the northern (Eaton, 1963; Prodehl, 1979; Allmendinger et al., 1987; Benz et al., 1990; Holbrook, 1990; Catchings and Mooney, 1991) and southern (Gibbs and Roller, 1966; Geist and Brocher, 1987; Serpa and de Voogd, 1987; Wernicke et al., 1996; Louie et al., 1997; Chavez-Perez et al., 1998) thirds of the region.

I use the tomographic scheme of Hearn and Ni (1994) in order to invert about 200,000 travel times from 15000 earthquakes occurring in the area and recorded by two regional seismic networks. Earthquakes contributing to the data set come from widely different areas such as the topographically elevated Long Valley Caldera-Mammoth Lakes, about one km lower southern Nevada, Owens Valley, and Mojave Desert. I made an elaborate selection and analysis of the preliminary data in order to discern any effect of the source areas. Eventually I treat all the selected Pg travel times in a combined tomographic scheme and obtain Pg velocity model for this region along with station delays.

My results accompany detailed error estimation using the resampling technique of ``bootstrapping,'' and resolution analysis using sinusoidal checkerboard tests. I also compare three Pg velocity profiles against profiles of Bouguer gravity and topography to find a negative correlation between Pg velocity and Basin and Range topography, and a positive correlation between Pg velocity and gravity along the dominant structural trend of the area (north-northwest). A more or less continuous curvilinear trend of low velocities appears to connect the eastern California shear zone with the Furnace Creek-Fish Lake Valley fault zones.

In Chapter 3, I propose an inversion methodology composed of an in-tandem combination of nonlinear simulated annealing optimization and linearized inversion in order to solve the problem of determination of hypocenter locations and a 3-d velocity model of the Eureka Valley area. This area has little prior information on velocity except for the regional one-dimensional model used by the University of Nevada, Reno Seismological Laboratory for routine preliminary location of earthquakes. I use the simulated annealing scheme of Pullammanappallil and Louie (1993), and the three-dimensional linearized inversion scheme of Thurber (1983), improved by Eberhart-Phillips (1990). The nonlinear optimization finds a realistic starting velocity model that is further fine-tuned in the linearized inversion that follows it. I use about 7000 P and S arrival times recorded at 8 portable and 20 permanent stations from the Eureka Valley earthquake sequence of May-June, 1993. The velocity model derived in Chapter 3 adequately describes broad structural features such as a sedimentary basin at the local scale. The hypocenter locations, however, are a little scattered.

Chapter 4 deals exclusively with two-dimensional synthetic tests to determine an efficient definition of the critical temperature to be used in the cooling schedule of simulated annealing optimization for determination of source locations and velocities. While previous workers define the critical temperature on the sole criterion of lowest average least square error (Pullammanappallil, 1994) or highest correlation (Basu and Frazer, 1990) in ``short runs,'' I define it on the triple criteria of change of shape of the error-versus-iteration curve for short runs at different temperatures, average least square error as a function of temperature, and number of accepted models as a function of temperature. I show that this definition can handle some ``pathological'' computational situations better than the other definition. I also propose a heuristic combination of the conventional ``acceptance'' criterion and a ``rejection'' criterion for probabilistic acceptance of models. This results in a substantially smaller number of probability computations and of models to be saved. Tests for separate velocity model and source location determinations proved promising.


Chapter 2
Upper crustal velocity structure of the western Great Basin by Pg tomography of earthquake travel time data

2.1 Introduction

Previous works

I investigate crustal velocity structure of the western Great Basin (WGB) using the so-called Pg first arrival times from two earthquake networks covering the region. The Pg wave samples the upper 10 km of the crust and can be used to derive important information about crustal extension, fracturing, and pressure and temperature conditions of the upper crust. The northern and southern boundaries of my region of study have been investigated in considerable detail by different geophysical and geological methods, but most of the central part has yet to get enough attention from researchers, except perhaps the Long Valley Caldera area. My objective is to obtain a map of the Pg velocity that would be able to bridge this gap of information in the central part of the western Great Basin.

The northern fringes of the Western Great Basin at around 40° latitude have been the subject of investigation from as early as the King survey of the nineteenth century. The first significant seismic study conducted in the recent past in the northwestern Basin and Range is the Fallon-Eureka refraction profile recorded by the U.S Geological Survey in the 1960s. Eaton (1963) and Prodehl (1979) interpreted this profile following different methodologies and obtaining significantly different results. Priestley and Brune (1978) conducted a surface- wave dispersion analysis in western Utah and Nevada. Priestley et al. (1982) interpreted Pn delay times in terms of varying crustal thickness of the western Great Basin. More recently, the Consortium for Continental Reflection Profiling (COCORP) (Knuepffer et al., 1987; Allmendinger et al., 1987), and the Program for Array Seismic Studies of the Continental Lithosphere (PASSCAL) (Benz et al., 1990; Holbrook, 1990; Catchings and Mooney, 1991) carried out seismic reflection and refraction studies to determine the crustal structure of the Basin and Range Province across one of the biggest areas of Cenozoic extension, roughly along 40° latitude.

The southern boundary area of the WGB comprising such important regions as Death Valley has also had quite a few seismic imaging and velocity modeling studies. (Gibbs and Roller, 1966; Geist and Brocher, 1987; Serpa and de Voogd, 1988; Wernicke et al., 1996; Louie et al., 1997; Chavez-Perez et al., 1998) Almost all the velocity models coming out of the above-mentioned studies give only two-dimensional sectional views of the crust. A roughly north-northwest to south-southeast rectangular region more than 300 km long lies between the two well-studied areas.

Inside this rectangle, the Long Valley Caldera has drawn unique attention from many researchers, and has been fairly well investigated by local earthquake travel time inversion (Romero et al., 1993), seismic reflection and refraction surveys (Hill et al., 1985; Black et al., 1991), and teleseismic studies (Dawson et al., 1990; Steck and Prothero, Jr., 1994; Weiland et al., 1995). My study area also includes the Yucca mountain region, which has drawn considerable interest lately due to a proposed nuclear waste repository. Apart from these local studies, a few regional lithospheric-scale velocity works have also been performed. Ozalaybey et al. (1996) derived one-dimensional shear-velocity structure at nine broadband seismic stations in northern Nevada from combined analysis of receiver function and surface-wave phase velocity data. Hearn et al. (1991, 1994) studied the crustal and upper mantle velocity structure of the western United States by Pn travel time tomography. Jones et al (1992) describe the heterogeneity of the lithosphere in the Basin and Range Province by combining geological, geochemical and geophysical data.

Geology

Fenneman (1931) defined the northern subprovince of the Basin and Range province as the Great Basin. My area of investigation is the western part of this subprovince referred to as Western Great Basin and includes, but is not limited to, the tectonic province of the Walker Lane Belt defined by Stewart (1988). The Walker Lane Belt is characterized by mountain ranges of unusual shapes or trends lying between the Sierra Nevada on the west and areas of long north-northwest-trending basin-range topography on the east (Stewart, 1988). Some of its important characteristics are strike-slip faulting, areas of large-scale extension, detachment faults and metamorphic core complexes. The belt is a northwest-trending zone about 700 km long and 100-300 km wide and is situated between the ``bulk'' of the Basin and Range structures to the east and the Sierra Nevada to the west. The Walker Lane Belt receives approximately 22% of the Pacific-North American plate motion from the San Andreas system via the eastern California shear zone (Unruh et al., 1996). The southern limit of my intended study is marked by the Garlock fault. Because I am combining data sets from the Western Great Basin Seismic Network (WGBSN) and Southern Great Basin Seismic Network (SGBSN) administered by the University of Nevada, Reno Seismological Laboratory (UNRSL) with a data set from the Southern California Seismic Network (SCSN) administered by the California Institute of Technology (CIT) Seismological Laboratory, I actually define my model from 34° to 41° north latitudes and from 114.5° to 121° west longitudes (Figure 2.1). This extends the area to be investigated to the Mojave Deserts in the south and beyond the San Andreas fault in the west. The inclusion of the CIT dataset is crucial for getting better ray coverage of the Death Valley region.

Most of the geologic structures that I am going to concentrate upon are characterized by the Basin and Range features. Some geophysical traits of the region are widespread seismicity, thin crust, low upper mantle P-wave velocity, high heat flow, large scale crustal extension, and long-lived episode of magmatism (Eaton, 1982). The Sierra Nevada in the western part of my model is a relatively stable uplifted batholithic block of Mesozoic age. The eastern escarpment of the Sierra Nevada marks the western margin of the Basin and Range and is characterized by a series of Quaternary to Recent volcanic centers of which Long Valley Caldera is a well-studied one. The extended structures of the greater Death Valley region lie in the south central part of my model area and constitute one of the areas of most active Cenozoic extension in the Basin and Range Province.

The average elevation in the northern half of the area of investigation is about 1.5 km, and that in the southern half is about 0.7 km, with an abrupt step at around 37° latitude (Saltus and Thompson, 1995).

2.2 Method

I use Pg travel time tomography to map the putative Pg refractor velocity in the western Great Basin. The nature of Pg wave is somewhat elusive and poses a problem in parameterization needed for any tomography. As is discussed in Appendix I, researchers are split in their opinion about whether Pg is a turning wave or a head wave. Head-wave parameterization works almost perfectly for Pn tomography, but not always so for Pg tomography. For a single source-receiver pair one may get identical travel times for an infinite set of velocity models starting from a two-layered (head wave) to a continuously varying (turning wave) model (Aki and Richards, 1980). But, for a high number of receivers well-distributed over a range of distances, the non-uniqueness decreases, and, in such cases, head wave type arrivals are expected to render the travel time versus distance curve into a straight line whereas turning wave type arrivals will make the curve concave towards the distance axis (Figure 2.2). I count on this criterion, and in view of the fairly linear trend of the Pg segment of the travel time plot of my data (Figure 2.3), I adopt a head wave type parameterization for the tomography (Figure 2.2b).

I follow the tomographic method of Hearn and Ni (1994), which fits travel time perturbations from the simple linear travel time curve obtained during the prior data analysis and selection stage. Figure 2.2b depicts the schematic path of a refracted Pg wave. The path consists of three parts - i) the ray path from the source to the refractor (S1A1 or S2A2), ii) the ray path along the Pg refractor (A1B1, A1B2 or A2B2), iii) the ray path from the refractor to the receiver (B1R1 or B2R2). This three-part description of the ray path entails the necessity of distributing the travel time perturbations into source and receiver delays and delays due to the refractor velocity heterogeneity. Accordingly, I parameterize the source and receiver components by source and receiver static delays, and refractor velocity heterogeneity by dividing the refractor into a two-dimensional grid of cells. The cell size used is approximately 5 km by 5 km. The total number of cells is 18018.

I adapt the Pn tomographic scheme of Hearn and Ni (1994) to the case of Pg tomography. Following is a short description of the scheme. First a straight line is fit by regression through the Pg arrival times. The slope of the line gives the mean Pg slowness and the intercept gives the mean overburden time. These two quantities constitute the starting model of the tomographic scheme. The residual travel time tij with respect to this starting model for the Pg path between source i and receiver j can be expressed by the modified time term equation (Scheidegger and Willmore, 1957):

t sub ij ~~=~~a sub i~~+~~b sub j~~+~~sum from k=0 to n d sub ijk DELTA s sub k(1)
with dijk the distance traversed in the refractor cell i, ai the delay for source i, bj the delay for station j, DELTAsi is the slowness perturbation of the cell i, and n the number of Pg refractor cells traversed by the ray. Station and source static delays may be described by the equation:
source delay =int from {z sub s} to {z sub g} ~(s sub o sup {2} (z) ~-~s sub g sup 2 ) sup {1 over 2}~dz(2a)
station delay =int from 0 to {z sub g} ~(s sub o sup {2} (z)~-~s sub g sup 2 ) sup {1 over 2}~dz(2b)
where so is the overburden slowness profile as a function of depth, sg is the average Pg refractor slowness, zs is the source depth, and zg is the Pg refractor depth.

The above expressions of source and station delays indicate that both of them represent the combined effect of the concerned thickness and slowness of the overburden (upper crust), and these latter cannot be independently determined from the delay times. But some rough estimates of relative variations in the two parameters can be obtained.

Because of the errors in the origin times and source depths, I restrict myself to the interpretation of the station delays only.

My starting model for the tomographic scheme consists of only two quantities, the mean Pg slowness and the mean overburden delay, which are obtained from the slope and intercept respectively of the best-fit straight line through the travel time versus epicentral distance plot.

Hearn and Ni (1994) developed their tomographic scheme using the preconditioned version of the LSQR algorithm of Paige and Saunders (1982) and Nolet (1984). This algorithm, which is in essence of the conjugate gradient type, is very efficient in solving large systems of equations, suppressing the propagation of data errors and in the rate of convergence. It iteratively finds the least-squares solution to the matrix equation obtained by expressing all the observed travel times as linear combinations of the station delays, source delays and slowness perturbations as in equation (1). Preconditioning is applied through scaling the source and station delays by the square root of number of originating or arriving rays, and scaling the slowness perturbation by the quantity m =sqrt { sum from i=0 to n {D sub i} {d sub i}}, where Di is the total ray path length for ray i, di is the length of ray i within the cell, and n is the number of rays crossing the cell (Hearn and Ni, 1994). The preconditioning assures a fast convergence for the algorithm.

A regularization scheme minimizing the Laplacian of the slowness image (Lees and Crosson, 1989) was applied to the solution in order to adjust for variations in ray density and obtain a smoother image.

2.3 Data compilation and analysis

The tomographic study under discussion involved an elaborate data compilation, selection and analysis process, which need to be described in order to make the interpretation of results adequately meaningful. The travel time data used in the study come from three sources. Travel times from earthquakes occurring between February 1, 1991 and December 31, 1995 and recorded by the network administered by the University of Nevada, Reno Seismological Laboratory (UNRSL) form the first subset. Travel times from events occurring between January 1, 1991 and December 31, 1993 and recorded by Southern California Seismic Network (SCSN) of the California Institute of Technology (CIT) form the second subset. The third subset consists of only about 1000 travel times from 48 explosions shot between 1984 and 1991 in the Nevada Test Site (NTS) and recorded by a maximum of 55 stations in southern Nevada (obtained from Arthur Chen at Rensslaer Polytechnic Institute).

I prepared a composite data base of all the first P and S arrivals in a modified form of the International Seismological Centre (ISC) format from the above three sources. The data base resulted in a text file of 439224 lines of 21 columns. Each line describes all the relevant pieces of information pertaining to a phase arrival in a self-contained fashion. Some of the information contained in each line are station name, source and station coordinates, origin and arrival times, back-azimuth etc.

Of all the events used in the study, locations and origin times of only the 48 explosions at the NTS are known precisely, while those from the two networks are subject to some imprecision due to their having been determined by automatic event location programs used by UNRSL and CIT. I did not try to relocate these events, and assumed that there is no systematic bias in the origin times and locations. The high number of travel times used will statistically minimize the ``noise'' introduced in the data due to incorrect event coordinates in space and time. Of course, datum correction for all the event depths was imperative for the above assumption, as depth of an event mentioned in the original network catalogs is only with respect to the average elevation of stations used in locating the event and not, as would be expected, with respect to a surface of constant elevation like the mean sea level. I made a datum correction for each event depth from the average elevation of all the stations recording direct or Pg arrivals from that event so as to determine the depth with respect to the mean sea level.

Figure 2.3a shows the travel time versus epicentral distance plot in gray scale for most of the phases of the data base. After preliminary examination of different trends in the plot it was found that the Pg arrivals define a fairly straight line segment between epicentral distances of 20 and 130 km. Arrivals at less than 20 km distance define a roughly hyperbolic pattern and represent the direct arrivals, and those at more than 140 km distance represent the Pn arrivals. I selected phases following a general criterion of a minimum of 10 stations per event and 10 events per station. This criterion was relaxed if absolutely necessary, such as for areas with very few events or stations. Figure 2.1 shows the locations of events and stations thus selected. The selected data set consists of 180000 travel times from about 15000 earthquakes to around 340 stations with epicentral distances between 20 and 130 km in the Western Great Basin. A substantial portion of the data comes from the Mammoth Lakes (LVC), Little Skull Mountain (NTS) and Landers (MD) sequences.

The linear trend in the travel time versus epicentral distance plot suggests that most of the arrivals are of the head-wave type of refracted wave. The scatter of the data around the fitted straight line reflects the i) difference of source depths, ii) difference of overburden velocities, iii) lateral heterogeneity of the refractor velocity etc. For a certain source, depth variation and overburden velocity variation will show up as a constant shift of the straight line of fit. But the effect of refractor velocity heterogeneity will not be so simple to observe. It is primarily this heterogeneity that I am investigating. The overburden velocity variation will be considered only when absolutely warranted, such as for very large station residuals.

Figure 2.3b shows a gray scale plot of delay versus epicentral distance for maximum absolute delays of 1.0 s with respect to the best fitting velocity. This and the delay histogram of Figure 2.3c show that the delays are fairly symmetrically distributed with respect to zero. The average velocity and refractor depth obtained by least squares fitting of all the data with less than 1.0 s absolute residual are 6.09 km/s and 7 km respectively which I choose for all the cells of the starting model.

Though I eventually treat all the selected Pg travel times in a combined tomographic scheme, here I conducted a separate analysis of travel times for each of the following six earthquake source regions: a) Mammoth Lakes and Long Valley Caldera (LVC), b) northern Nevada, north of the latitude of Long Valley Caldera (NNEV), c) Eureka Valley (EV), d) Little Skull Mountain and Nevada Test Site (NTS), e) Owens Valley-Indian Wells (immediately north of the Garlock fault and east of Sierra Nevada) (OWV), and f) Landers (in the southern Mojave desert). Figure 2.4 shows the histogram of source depths (before datum correction) for each of the six regions. The striking difference between the representative source depths in the LVC (7-8 km) and NTS (9-11 km) areas is immediately noticeable. If one takes into account the fact that the average elevation of LVC area is more than 1 km higher than that of NTS area, the relative difference of the source depths after datum correction becomes even greater. Linear regression through Pg data points for the LVC and NTS areas give 6.01 km/s and 6.04 km/s respectively for the Pg velocity. The depths to the Pg refractor are 5.1 km and 7.2 km assuming an overburden velocity of 5.7 km/s, and, 6.2 km and 8.2 km assuming an overburden velocity of 5.85 km/s.

Figure 2.5 shows the ray coverage of the data used. The uneven density of rays at different parts of the model is conspicuous.

2.4 Results and discussion

I ran 100 LSQR iterations of the tomographic inversion for the entire region and obtained the Pg velocity image of Figure 2.6. The inverted velocity varies from 5.83 km/s to 6.3 km/s. Figure 2.7 shows the station static delays obtained from the tomography. Figure 2.8 shows delays after the tomographic inversion as a function of epicentral distance (a) and, a delay histogram(b). Comparing with Figure 2.3, one can easily see the improvement in delays achieved. The station delays vary from -0.56 s to 0.82 s. It is essential to look at the ray coverage map (Figure 2.5) in order to properly interpret the velocity and station delay maps, because the reliability of any feature on these maps depends on ray coverage of the feature area. As a general observation, one sees the highest velocity (6.3 km/s) on the southwestern side of the San Andreas fault, which could be the reflection of high velocity oceanic crust of the Pacific plate. The high velocity zone defines the strike of the San Andreas fault quite well. Hearn and Clayton (1986) find a similar Pg velocity pattern in this area.

Returning to the main part of my area of investigation, namely, the western Great Basin and adjacent structures, one can observe the low velocity patches of about 5.85 km/s in the Long Valley Caldera (LVC), Amargosa Valley (AV) and Owens Valley areas, and high velocity (6.15 km/s) in the Sierra Nevada. Southern Nevada near NTS is characterized by the overall average velocity of 6.1 km/s. The occurrence of very large positive delays (maximum 0.82 s) in this area suggests that the velocity could actually be lower, and also, the refractor could be deeper than 7 km. The eastern and northern fringes of the NTS show abnormally low velocities (5.83 km/s) and fairly large negative station delays. This area being part of a region of thicker and older crust (Ozalaybey et al., 1996) is expected to be rather fast. The large negative delays seem to account for this.

Error estimation

Standard errors of the velocity and delay parameters are determined using the resampling technique of ``bootstrapping'' (Efron and Gong, 1983; Efron and Tibshirani, 1991; Koch, 1992). The bootstrap method of error estimation uses an original data set to generate new data sets of equal size by samples with replacement from the original data set. A large number of iterations of the tomographic inversion scheme is run for each bootstrap data set. By using considerable number of bootstrap realizations (50) a set of bootstrap solutions (slowness and station delays) are obtained which is then used to estimate the velocity and station delay standard deviations. Figures 2.9 and 2.10 show the standard deviations of the Pg velocities and station delays. The maximum deviation error is 0.04 km/s. The San Andreas fault and eastern Nevada areas, and part of the Sierra Nevada show the highest velocity uncertainties. Most of the western Great Basin straddling the California-Nevada state border shows velocity errors of about 0.01 km/s.

The station delay errors (Figure 2.10) demonstrate somewhat erratic behavior, some uncertainties being greater than the respective delay magnitudes. Two such stations are to the southeast of the Long Valley Caldera. Three other stations with very large delay error (0.3 s) are situated at locations where ray density just decreases to zero (Figure 2.5) (south of the San Andreas fault, west of the Sierra Navada and Long Valley caldera, and far northern Nevada).

Resolution tests

I use four sinusoidal checkerboard test models to determine the resolution of the tomographic inversion in points of the model. The best-fitting average velocity of 6.09 km/s (which was used as the initial model in the tomographic inversion of the real data) corresponds to a slowness of 0.1642 s/km. For the synthetic test, I made the slowness vary by an amplitude 0.0039 with half-wavelengths of 1/2, 1/4, 1/6 and 1/10 degrees for the four models. This makes the minimum and maximum velocity of the input checkerboard pattern 5.81 km/s and 6.39 km/s respectively. Therefore, a recovery of this velocity range in the output models would be considered 100%. One can easily see how the percentage of recovery varies with the half-wavelength used.

The same distribution of sources and stations as in the case of the real data was used to compute travel times through the synthetic model. Both source and station delays were assumed zero for this computation and no noise was added. Figures 2.11-2.18 show the results of the resolution tests. Figures 2.11, 2.13, 2.15 and 2.17 are the velocity images obtained for half-wavelengths of 1/2, 1/4, 1/6 and 1/10 degrees respectively. Figures 2.12, 2.14, 2.16 and 2.18 show the corresponding station delay maps.

The best-resolved area is evidently that of the Long Valley Caldera (LVC), where checkerboard pattern is fairly well-recovered for all the wavelengths even though the recovered amplitude decreases with wavelength. The amplitude recovery is about 75% for the wavelength of 0.25 degree (Figure 2.13). The southern Owens Valley and the Mojave desert areas both show significant recovery (> 50%) at this wavelength. Recovery in the NTS area is 30-40% only. The region along the California-Nevada state border between LVC and NTS areas show nearly the same type of recovery. The San Andreas fault region shows some recovery only at the 0.5 degree wavelength.

The ideal resolution test result for the station delays would be zero delays for all the stations. One can see in Figures 2.12, 2.14, 2.16 and 2.18 that the test results obtained are far from ideal. A significant number of stations, especially in areas of low ray density show appreciable delays (-0.2 s to +0.18 s for 0.5 degree wavelength). The delay amplitude for these stations decreases with decreasing wavelength. But delay amplitude for stations in areas of good ray density show a gradual increase with decreasing wavelength. This can be attributed to the low wavelength velocity variation in the original synthetic model that is not efficiently recovered in the output velocity models. There seems to be increasing trade-off between velocity and station delays with decreasing wavelength. The high number of stations with nonzero delays in southeastern Nevada at all the wavelengths can partly explain the paradox of obtaining the big low velocity patch (Figure 2.6) in that region.

Central part of the model

In order to look at the velocity image in more detail I produced two expanded displays of the model, one featuring the area between LVC and the Garlock fault (Figure 2.19), and the other featuring the area north of LVC (Figure 2.20). Figure 2.19 shows a general high velocity (6.15-6.2 km/s) in the Sierra Nevada. Of course, a good portion of it lacks enough resolution. This velocity may be representative of the little-altered batholithic granite. A big high velocity patch of 80 km by 80 km dimension, again in a poorly-resolved area, straddles the Garlock fault at its eastern part and thins southwards in eastern Mojave desert. Hearn and Clayton (1986) found a Pg velocity of less than 6.0 km/s in the Mojave region. More recently Magistrale et al. (1992) in their three-dimensional tomographic modeling of the southern California crust, found layers of 6.07 km/s between 4 to 8 km, and 6.24 km/s between 8 to 14 km depth. I obtained Pg velocities in the range 5.95 to 6.2 km/s in the Mojave area (Figures 2.6 and 2.19).

A trend of low velocity (5.9-5.95 km/s) zone (shown by the dashed curve in the Figure 2.6) stretches northwards from central Mojave, crosses the Garlock fault east of the Owens Valley fault zone into the Owens Valley, and then jumps eastward through the northern part of the greater Death Valley area and meets another low velocity trend along the Furnace Creek fault zone. This latter trend of average 5.95 km/s extends from the Amargosa Valley-eastern Death Valley region north-northwestwards along the Furnace Creek fault zone, crosses the Fish Lake Valley and terminates in the Excelsior-Coldale area. In this trend, the Amargosa Valley shows an unusually low velocity of 5.85 km/s. Velocity in the Fish Lake Valley also is about 5.90 km/s. Most of the length of the Furnace Creek fault zone lies within a valley separating high mountains (Stewart, 1988). The low velocity may be the reflection of the zone of weakness of the upper crust underlying this valley. The Fish Lake Valley fault zone is characterized by a high slip rate (> 4 mm/yr) being responsible for about half of the Pacific-North American plate -boundary shear accommodated within the Basin and Range Province at that latitude (Reheis and Sawyer, 1997).

The Amargosa Valley belongs to north-trending structural trough termed Kawich-Greenwater Rift by Carr (1990). This rift includes a series of major volcanic centers near the western edge of the Nevada Test Site. The thick line passing through Amargosa Valley in Figure 2.19 (labeled KGRGL) shows the axis of a gravity low defining the rift. A trend of relatively low Pg velocity (6.0 km/s) extending along KGRGL from Amargosa Valley through Crater Flat and Timber Mountain volcanic complex can be observed. Here is a correlation of low velocity, low gravity and volcanic centers.

Long Valley Caldera

In the northwestern part of Figure 2.19 and southern part of Figure 2.20, the Long Valley Caldera (LVC) features as an area of characteristically low velocity of 5.87 km/s. The low velocity is apparently the expression of the residual magma chamber of the caldera. There is some difference of opinion about the depth of the residual chamber among researchers. Dawson et al. (1990), and Sanders (1993) mention 7.0 km as the starting depth. Steck and Prothero, Jr. (1994) found a tabular low velocity feature between 7 and 11 km depth, whereas Weiland et al. (1995) place the center of the chamber at 11.5 km depth. They found a 25-30% slower velocity at this depth than the surroundings. In my study, the difference between the LVC velocity and its surroundings is not as abrupt (only about 4%). This may be explained by an averaging effect in the velocity sampling of the Pg rays. The dimension of the LVC Pg low velocity feature is about 20 km by 20 km.

Northern part of the model

The ray coverage and resolution obtained in the area north of the LVC in Figure 2.20 are mostly below average. One can observe the northward trend of slightly- lower-than-average velocity in the Dixie Valley area with a little difficulty. Whether this an artifact of the ray distribution or a reflection of the crustal extension and geothermal gradient is hard to ascertain. The cross signs before MNA and KVN denote the location of two braodband stations of those names beneath which Ozalaybey et al. (1996) determined shear-wave velocity structure using a combined analysis of receiver function and surface-wave phase velocity data. Their S velocity model at MNA increases quickly (above 5 km depth) to 3.55 km/s and then maintains roughly the same value all through the crust down to 32 km depth except for a low velocity layer at 7-8.5 km. The S velocity at KVN increases more gently and reaches the 3.5 km/s value at 7 km and then roughly linearly increases to 3.65 km/s over the next 25 km of the crust except for a low velocity zone between 10 and 14 km depth. Assuming that the P wave velocity model mimics the shape of the shear-wave velocity model excepting the velocity inversions, one expects the MNA area to have higher P wave velocity relative to KVN area in the depth range sampled by Pg wave. For a Vp/Vs ratio of 1.732 this means a relative difference of 0.10 km/s in the P wave velocity. My Pg velocity map (Figure 2.20) shows this roughly the same relative difference.

The seismic profiles shot by COCORP (Allmendinger et al., 1987) and PASSCAL (Catchings and Mooney, 1991) are located around the latitude of 40°. The only profile along which there is any ray coverage at all in my study is the northeast-southwest transect of PASSCAL. The thick white dotted line in the northeastern quadrant of Figure 2.20 shows the location of the southernmost 90 km of this line passing through the Dixie Valley. Benz et al. (1990) determined the one-dimensional velocity model from Pg travel times among others and found a Pg velocity jump from 5.5 to 5.8 km/s at 60 km distance, and then a gradual increase to 6.1 km/s. Holbrook finds an average Pg velocity of 6.0 km/s at distances greater than 50 km for this profile. Pg velocities obtained in the this part of the present study lie between 6.0 and 6.05 km/s.

The high velocity (6.12 km/s) patch observed west of Reno (in Truckee) could be explained by the existence of batholithic granites in the area.

Comparison with gravity and topography

Figures 2.21, 2.22, and 2.23 show profiles of Pg velocity, ray density, elevation and Bouguer gravity along two north-south lines and one east-west line. (see Figure 2.6 for location of these lines). The north-south AA' line was chosen to pass through areas of maximum possible ray coverage. As this line does not cross the important structure of Long Valley Caldera, the second north-south line AA'' was also chosen. The location of BB' was chosen to be perpendicular to as many structural trends as possible. Inclusion of the ray density profile in Figures 2.21, 2.22, and 2.23 was intended to facilitate the comparison of the Pg velocities with gravity and elevation wherever the velocity result is supported by a sufficient ray coverage. At long wavelengths, both profiles AA' and AA'' demonstrate that Pg velocities and elevations have an inverse relationship, i.e., low velocities correspond to high elevations and vice versa. Velocities and Bouguer gravities vary in the same sense. Correlation between velocity and gravity can be observed at even lower wavelengths than those between velocity and elevation.

Comparing the different profiles along BB' is not as straightforward partly because of significant lack of ray density at several locations. There is positive correlation between Pg velocity and gravity at some individual highs and lows. Figure 2.24 shows elevation versus Pg velocity and Bouguer gravity versus Pg velocity scatter diagrams for all cells in the model having ray hit counts of 10 or more. These diagrams were done in order to discern any overall correlation between Pg velocity and elevation and gravity. There is no apparent trend in either of the two plots. Perhaps a more restrictive choice of cells (other than basing only on ray density) may feature the relationship observed in the north-south profiles.

2.5 Conclusions

I performed tomographic inversion for crustal refractor velocity in the western Great Basin using the so-called Pg arrival times from several thousand earthquakes. This work is the first Pg refractor velocity mapping done in the western Great Basin using travel time tomography. The results have been presented as a Pg velocity map for all the 5 km by 5 km cells of the model. A by-product of the inversion was a station delay map which gives some idea about major variation of the velocity that avoided being put into the velocity map, or any significant deviation from the constant refractor depth assumed.

At the individual structural level the Long Valley Caldera is characterized by a low Pg velocity of 5.87 km/s, Amargosa Valley by 5.85 km/s, and the Sierra Nevada by a high velocity of 6.15 km/s. At a more regional scale, low velocity trends are found to lie along areas of high slip rate such as Furnace Creek-Fish Lake Valley fault zones, volcanic centers such as the Kawich-Greenwater Rift, or crustal extension such as Death Valley and Amargosa Valley. The more or less continuous trend of low velocities running from the western Mojave Desert region northwards across the Garlock fault into the Owens Valley, and then bending eastwards across the Death Valley extended region to the Furnace Creek fault zone and finally terminating north of the Fish Lake Valley is a significant result of this study. This trend may play a role in the transfer of deformation from the Mojave block (Eastern California shear zone) to the Walker Lane Belt. (Unruh et al., 1996).

A positive correlation between Pg velocity and Bouguer gravity at individual structural and longer wavelength scales and a negative correlation between Pg velocity and elevation at long wavelengths are observed.


Chapter 3
Travel time inversion for earthquake locations and three-dimensional Velocity structure in the Eureka Valley area, eastern California

3.1 Introduction

Like most geophysical problems (Parker, 1994), the task of simultaneous determination of earthquake hypocenter locations and velocity structure of the surrounding area from arrival/travel time data is essentially a nonlinear inverse problem (Lee and Stewart, 1981; Crosson, 1976). The nonlinearity arises from the fact that one needs to compute travel times along raypaths from trial hypocentral locations through trial velocity models to different stations in order to compare them with observed travel times, but the raypaths themselves depend on the location and velocity model which one wants to determine. Most of the methods used to tackle this problem are based on linearization about a reference model by Taylor series expansion (Roecker, 1982; Lay and Wallace, 1995) entailing an unavoidable dependence of the final model on the model one starts with, since the Taylor series is valid only for small perturbations about the reference model. This condition for small perturbation is a big inconvenience of any linearized inversion because of the accompanying risk of entrapment in and inability to escape from local minima. Thus some a priori knowledge about the model becomes necessary for the linearized inversion of travel times to render the expected results. But there are situations where the area under study virtually offers no significant prior information. Keeping such a situation in mind, I propose an inversion methodology composed of a cascaded combination of nonlinear simulated annealing optimization and linearized inversion. I test the method by locating an earthquake sequence in the Eureka Valley area of the Western Great Basin and simultaneously estimating local three-dimensional velocity structure. The occurrence of a magnitude 6.1 earthquake followed by several hundred aftershocks, recorded by permanent networks and portable stations, offered an excellent opportunity for the test as well as geophysical study of an area that has so far lacked detailed local characterization. A better definition of the local structure at the scale of my study is a step forward in understanding local structures of the Great Basin, reducing the non-uniqueness of regional models.

The motivation behind resorting to the inversion scheme to be dealt with in this chapter comes from the absence of even a one-dimensional local velocity model of the area, and also of enough geologic information to make one. Eberhart-Phillips (1993) talks about two possible approaches for setting up an inversion. The first approach is to start with a velocity model based on previous geologic interpretations and try to fit the data with the model by perturbing the latter as warranted by the data. The second approach is to start with a simple 1-D model obtained with the same data and let the inversion solution add complexity wherever needed by the data. Both the approaches have the common implicit assumption of the starting model being close to the true model as required by any linear or linearized inversion process. Eberhart-Phillips (1990, 1993) prefers using a simple minimum 1-D model derived by the program VELEST (Ellsworth 1977; Kissling 1988; Ellsworth et al., 1990) from the same data set as would be used for 3-D inversion. Kissling et al. (1994) prescribe the minimum 1-D model as the ideal initial reference model for local earthquake tomography in order to minimize the artifacts caused by an inappropriate initial model. But the ``recipe'' to calculate the said minimum 1-D models includes the establishment of an a prioi 1-D model from information regarding the stratification of the area as the first step (Kissling et al., 1994). But in a hitherto little studied place like Eureka Valley none of the above approaches seems adequate simply because of the absence of reliable a priori geological and geophysical information. Hence, the necessity of a method that does not depend on the initial model. Two series of methods satisfy this condition. They are iterative strategies requiring matrix manipulations (Spakman, 1993) and global optimization schemes such as simulated annealing (Rothman, 1985, 1986; Sen and Stoffa, 1991) which do not require any matrix manipulation. The sparse station distribution in the Eureka Valley area precludes the effective use of any iterative strategy. Optimization by simulated annealing is appealing in this special case because of its independence of the initial model and ability to produce solutions with very sparse data. So my method of choice for investigating structure and aftershock locations builds on a first step of initial 3-D model estimation using nonlinear optimization by simulated annealing (Pullammanappallil and Louie, 1993), followed by a fine-tuning step using 3-D linearized inversion (Thurber, 1983; Eberhart-Phillips, 1990). I also run a 3-D linearized inversion using a minimum 1-D model calculated by VELEST and compare the two results.

Geology

Eureka Valley is situated in the southwestern Great Basin, roughly midway between Death Valley and Owens Valley (Figure 3.1). Structurally, it is a north-northwest trending valley about 30 to 40 km long, with possibly shallow sediment depths as reflected by moderate (< 10 mGal) gravity anomalies. Its topography conforms to the regional trend, which is extremely rough but gradually rising on average westward to the Sierra Nevada. From a seismotectonic point of view, it belongs to the central portion of the Inyo-Mono section of the wide Walker Lane belt defined by Stewart (1988) and characterized by a complex system of aligned or subparallel dip-slip and strike-slip faults. The Inyo-Mono Section is situated between the Furnace Creek fault zone on the east and the Sierra Nevada on the west. Most of the big mapped faults in this section strike north-northwest, but many smaller mapped faults, including those in the Eureka Valley, trend slightly east of north. The valley fill is mostly Quaternary eolian sand deposits and alluvia flanked on the west by Pliocene volcanics, Mesozoic granite and mostly Cambrian metasedimentary outcrops, and on the east by Carboniferous and Cambrian metasedimentary rocks (Oliver and Robbins, 1978). Berkstresser, Jr. (1974) reports the occurrence in the southern part of Eureka Valley of the tallest sand dune in California rising 208 m above the playa on which it stands.

3.2 Data

More than 500 aftershocks followed the M6.1 Eureka Valley earthquake of May 17, 1993. About 150 permanent stations belonging to the Southern Great Basin Seismic Network (SGBSN), the Western Great Basin Seismic Network (WGBSN), the Southern California Seismic Network (SCSN), and the Northern California Seismic Network (NCSN) recorded the main shock. Five portable stations deployed by the Seismological Laboratory of the University of Nevada, Reno (UNR) and three deployed by the California Institute of Technology (Caltech) seismological laboratory recorded hundreds of aftershocks in a span of 20 days, of which about 500 were coincident with events catalogued by UNR's permanent networks.

Figure 3.1 shows the locations of permanent and temporary stations, along with preliminary UNR catalog locations of the events. At first glance it becomes evident that the spatial distribution of the permanent stations around the epicentral area offers only limited azimuthal coverage. To locate the aftershocks and find the local velocity structure of the epicentral area I used P and S arrivals at 20 permanent stations timed by the UNR Seismological Laboratory and at 8 portable stations that I timed myself.

The ratio between the number of S and P observations is several times greater for the portable stations than for the relatively more distant permanent stations. Figure 3.2 shows a time-distance plot for P and S arrivals from 430 events at portable and permanent stations at maximum 100 km epicentral distance, based on the preliminary estimates of origin times and hypocentral coordinates obtained at the UNR Seismological Laboratory from a one-dimensional regional velocity model and the FASTHYPO program (written by Robert Hermann of Saint Louis University). The figure excludes obvious outliers by keeping only those picks that lie within a window centered around the average straight line fit to the cloud of data points. The arrival/travel times have weights of 0 to 4 assigned depending on quality of pick, 0 being the best and 4 being the least reliable.

3.3 Method

My approach is a combination of nonlinear optimization and linearized inversion (Asad et al., 1994) comprising the following stages:
1) Obtaining an initial three-dimensional velocity model using optimization by simulated annealing (Pullammanappallil and Louie, 1993).
2) Simultaneous relocation of hypocenters and determination of local velocity structure using linearized inversion (Thurber, 1983; Eberhart-Phillips, 1990).

For comparison, I also run an exclusively linearized inversion, where the stage 1 in the above approach is replaced by a step of finding the minimum 1-D model using VELEST (Kissling et al., 1994). A brief description of the three techniques involved in the two parallel approaches follows.

Nonlinear optimizationd

My nonlinear approach for simultaneous determination of hypocenters and velocity structure is an outgrowth of the simulated annealing optimization scheme of Pullammanappallil and Louie (1993,1994) and Pullammanappallil (1994). This scheme in principle does not require any a priori assumptions, and the results obtained by it are independent of any initial model. For the forward problem of travel time computation, the scheme avoids actual ray tracing and uses a fast finite-difference scheme based on a solution to the eikonal equation (Vidale, 1990; Hole et al., 1992), accounting for curved rays through arbitrarily variable velocities, and all types of primary arrivals. S arrivals were used only to increase the number of travel times assuming a constant Vp/Vs ratio of the square root of three, and no inversion for S velocity model was done. Random perturbation of the volume and velocity of a subset of the whole model is done from iteration to iteration. Velocity within this subset is constant for each iteration. The perturbed volume can vary from a single cell to the whole model, while the velocity is allowed to vary from 1.5 km/s to 8 km/s. Random perturbation of all the three coordinates of a randomly chosen hypocenter is done at each iteration. The horizontal hypocentral coordinates and the depth are allowed to vary by a maximum of 5 km and 14 km respectively between two iterations.

I use an iterative Monte-Carlo based optimization process to solve the inverse problem. Each iteration comprises the following steps: 1) travel time computation through the current model and determination of the least-square error; 2) simultaneous random perturbation of origin time, hypocenter location, and velocity structure, and computation of new least square error; 3) acceptance of the new model if the corresponding new error is less than that of the previous iteration, plus a provision for probability-based conditional acceptance of new models if the new error is larger; 4) repetition of the previous steps until the annealing converges, where the difference in the least square error between successive models and probability of accepting new models become very small. An important component of this study is the random forcing of hypocenter location perturbations at each iteration. As I will show below, this forcing appears to allow the optimization of a reliable 3D-velocity model while relegating unacceptable errors in the final, perturbed hypocenter coordinates.

Minimum 1-D model estimation by VELEST

Kissling et al. (1994) defines the minimum 1-D model as the 1-D velocity model that itself represents the least squares solution to the coupled hypocenter velocity model relation equation. In this model the layer velocities are approximately equal to the average velocity in the 3-D tomographic solution within the same depth range. The model is determined by a trial and error process starting with available a priori information about the subsurface structure.

Linearized 3-D inversion

In the linearized approach, I use the program SIMULPS originally developed by Thurber (1983) using approximate ray tracing (ART) and pseudo-bending (PB) algorithms and further improved by Eberhart-Phillips (1990), for forward modeling of P- and S-wave arrival times in an iterative, damped, least-squares inversion for hypocenters and three-dimensional velocity structure. The method also employs the parameter separation technique (Pavlis and Booker, 1980; Spencer and Gubbins, 1980) which allows to computationally split the problem into two, e.g., solving for the hypocenters and velocity model separately, while maintaining the mathematically coupled nature of the overall problem. Evans et. al (1994) describe in detail the technique and all the parameters involved. Hauksson and Haase's (1997) study of the three-dimensional velocity models of Southern California is one of the latest works done using SIMULPS.

Model parameterization in this method assumes a continuous velocity field. The earth structure is represented in three dimensions by velocities defined at discrete points, and velocity at any intervening point is determined by linear interpolation among the surrounding eight grid points. Values at the velocity nodes are systematically perturbed during inversion. Except the outermost nodes, which are always fixed, every node can either be kept fixed or included in the inversion. Derivative weight sum (DWS) is a useful measure of ray density in the neighborhood of a node (Toomey and Foulger, 1989) and is used to determine which of the nodes should be included in the inversion. DWS for a node is similar to the ray hit count, but weighted by the ray-node separation and raypath length in the vicinity of the node.

The ART algorithm selects the path with the least travel time from a suite of circular arcs connecting the source and receiver. The iterative PB method fine tunes the ray path obtained by ART to approximate better the true ray path dictated by local velocity gradients (Um and Thurber, 1987). I adopt Eberhart-Phillips' (1990) method of adjusting an initial ray path rather than Um and Thurber's (1987) method of calculating a new midpoint for each segment because the former allows much easier updating of the raypath each time a source-receiver path has to be calculated.

A circular arc is a poor approximation at large distances and overestimates the path length, because a refracted wave is likely to be the first arrival at such distances. For an area like Eureka Valley, where alluvium is apparently underlain by rigid rocks, refracted waves can start arriving at even 10 km distance. Because of this, and the fact that timing error usually increases with distance, I chose a distance-weighting scheme of weight 1 for source-receiver distances of up to 30 km, and 0 for 80 km and beyond, with a linear tapering in between. The iterative pseudo-bending scheme (Eberhart-Phillips, 1990) reduces the difference between exact and ART raypaths for large source-receiver distances.

Implementation

I use P and S arrival times recorded at the UNR and Caltech portable stations exclusively to perform nonlinear optimization for velocity and hypocenters, and add more picks from nearby permanent stations to perform linearized inversion in and near Eureka Valley. I timed a total of 1492 P wave arrivals and 768 S wave arrivals at 8 portable stations from 429 events (Figure 3.2, lower plot) recorded these stations. The maximum number of arrival time picks per event is 16, automatically limiting the number of events that could be effectively studied, because the linearized inversion required a minimum of four picks per event for location only. For simultaneous relocation and velocity estimation, the required minimum number of data points increases by the number of velocity model parameters. Therefore, I add distance-weighted arrival times from permanent stations up to 80 km epicentral distance increasing the number of observations to 6989 P and 1879 S arrival times from 430 events (including the main shock) recorded at 28 stations. The nonlinear optimization has no theoretical requirement for any minimum number of arrival times.

Velocity model parameterizations for the linear inversion and nonlinear optimization schemes somewhat differed from each other. In the nonlinear optimization, I used a 35 (x) by 35 (y) by 20 (z) grid of 1 km blocks. The linearized inversion used a 13 (x) by 15 (y) by 9 (z) grid of nodes with a node spacing of 2 km at the center and near surface portion of the model and increasing outwards and in depth. A minimum DWS of 7 is used as a condition for any node to be included in the inversion. I plot all my results in the central 30 by 30 by 20 km volume by linearly interpolating wherever necessary.

I relocated the hypocenters of the 430 events and modeled the velocity structure simultaneously using four different processes (see Figure 3.3):

1) Simultaneous relocation and minimum one-dimensional velocity model estimation using VELEST.
2) Simultaneous relocation and three-dimensional velocity estimation by linearized inversion using the minimum 1-D model obtained in process 1 as the starting model.
3) Simultaneous relocation and three-dimensional velocity estimation by nonlinear optimization by simulated annealing.
4) Simultaneous relocation and three-dimensional velocity estimation by linearized inversion using initial locations from the UNR catalog and the three-dimensional velocity model obtained from process 3 as the starting model.

In view of the widely observed phenomenon that earthquakes in the Basin and Range Province occur almost exclusively in the upper 15 km of the crust (Eaton, 1982; Vetter and Ryall, 1983; dePolo et al., 1992), I used the constraint of maximum hypocentral depth of 20 km for all four of the above processes.

3.4 Results and discussion

I present the relocated hypocenters and velocity models in the following two sections. As one can see, the locations in the two methods, i.e., process 1 followed by process 2, and process 3 followed by process 4, follow roughly the same pattern in map view and cross-section, but the velocity models differ significantly from each other.

Hypocenter locations

Figures 3.4 and 3.5 show maps of the epicentral locations obtained by the four processes, and Figures 3.6 and 3.7 show vertical cross-sections of the corresponding hypocenters along the box AB defined in Figures 3.4 and 3.5. Also plotted on the maps are Bouguer anomaly contours (smooth curves; Oliver and Robbins, 1978) and topographic contours (rough curves). The respective relations among gravity, topographic features, and epicentral locations are roughly maintained in all four cases. The details, however, vary from case to case. Figure 3.8 shows the histograms of rms errors of UNR catalog locations and locations obtained by process 4. One can see the overall decrease in rms error.

The most consistent spatial distribution is the north-northwest trend of epicenters, well-defined in both the cases of simultaneous relocation and three-dimensional velocity estimation by linearized inversion (Figures 3.4b) and 3.5b). Cross-sections along AB in Figures 3.6 and 3.7 reflect the same consistency, though the cross-section for the nonlinear optimization case (Figure 3.7a) shows the locations much more scattered in comparison with the other three cases. The main event has been located at depths of 11, 11.5 and 10.5 km by processes 1, 2 and 4 respectively (Figures 3.6a, 3.6b and 3.7b). Process 3, whose input data consisted of arrivals to portable stations only, does not have a mainshock relocation. Deducing a unique fault plane through the hypocentral scatter in any of the cross sections is not straightforward. Relocated hypocenter locations from process 3 are much more scattered than those from the other 3 processes. Average rms residuals of location in processes 3 and 4 are 0.10 and 0.12 respectively. The mainshock has an rms residual of 0.05 in both the processes.

I infer two different planes of roughly same strike (165 degrees) from the scatter of points in the cross sections of Figures 3.6a, 3.6b and 3.7b. One plane, which does not contain the relocated mainshock, dips steeply (about 75 degrees) eastwards; and the other, including the relocated mainshock, dips slightly less steeply (about 60 degrees) westwards. Loper et al. (1993) report a normal mechanism for the mainshock obtained from regional surface wave inversion. By incorporating broadband data from several institutions to the data recorded by Berkeley Digital Seismic Network, they relocated the large aftershocks relative to the mainshock using a master event technique and suggest a relatively complex rupture history for the event based on examination of mainshock and aftershock records, and identify at least two subevents. Massonet and Feigl (1995) model both east and west dips with normal mechanism in their satellite synthetic aperture radar (SAR) interferometric work, though they rule out the eastward dip as a local minimum in their modeling scheme. Their best-fitting focal mechanism has a north-northwestward strike and westward dip. Another SAR study by Peltzer and Rosen (1995) confirms a west-dipping fault with a slightly NNE strike. They reconcile the observations of north-northwest alignment of epicenters and north-northeast strike of mapped local faults to infer that the earthquake occurred on a west-dipping north-northeast-striking fault plane with the rupture propagating diagonally upward and southward. The Caltech TERRAscope moment tensor (MT) and Harvard centroid-moment tensor (CMT) solutions give gentler westward dips. Table 1 summarizes the focal mechanism parameters determined by the different researchers along with those inferred in this study (using standard sign convention of strike and dip):

  	  	Table 1

Method		Strike 	Dip	Rake	Reference
  		(deg)	(deg)	(deg)
TERRAscope MT	-173	48 west	 -	(Caltech website)
Harvard CMT	 210	30 west	 -93	(Dziewonski et al., 1994)
SAR No. 1	 173	54 west	 -	(Massonet and Feigl, 1995)
SAR No. 2	 187	50 west	 -	(Peltzer and Rosen, 1995)
This study	 165	60 west	 -	
One can see that the strike and dip obtained in my study match fairly well with the SAR results of Massonet and Feigl (1995). It is also not very different from the results of the other SAR study and TERRAscope except that their faults strike a little east of the north whereas mine does it north-northwestwards. My results differ considerably from the Harvard CMT solution, the latter giving a 45 degree more eastward strike and dip only half as steep. Finally, the rather complicated distribution of hypocenters on the east-west cross-section (Figures 3.6a, 3.6b and 3.7b), along with the somewhat curved distribution of epicenters (Figures 3.4a, 3.4b and 3.5b), can be the expression of a concave fault involvement and the wide range of values and directions of the strike and dip of the fault found by different methods may also be due to complications brought into play by the concave shape of the fault. A complex rupture history and the existence of two subevents as suggested by Loper et al. (1993) mentioned above can also be the reason behind the observed spatial configuration of the relocated hypocenters.

Velocity model

Table 2 shows the minimum one-dimensional model I obtained by VELEST after many trial and error steps. This minimum 1-D model was the starting model for process 2 (Figure 3.3). The initial velocity model for process 4 (Figure 3.3) was derived by smoothing the output model of the nonlinear optimization (process 3), then constraining the low-velocity anomalies and reversals that appear in the regions not well controlled by the data, in the manner of Pullammanappallil (1994).
	   Table 2

Layer	Velocity (km/s)	Starting Depth (km)
  1	   4.79	 	  0.00
  2	   5.13	 	  2.00
  3	   5.58	 	  4.00
  4	   5.84	 	  6.00
  5	   5.96	 	  8.00
  6	   6.09	 	 12.00
  7	   6.21	 	 16.00
  8	   6.50	 	 20.00
  9	   7.85	 	 30.00

Figures 3.9a and 3.9b show map views at 0 and 6 km depth respectively of the final three-dimensional velocity model obtained by simultaneous relocation and 3-D velocity estimation by linearized inversion with minimum 1-D model as the starting model (process 2 preceded by process 1 - Figure 3.3). Figures 3.10a and 3.10b show corresponding map views for the model obtained by 3-D linearized inversion with starting model estimated by simulated annealing (process 4 preceded by process 3 - Figure 3.3). In general, the velocity model from processes 1+2 has a higher average velocity than that from processes 3+4 (5.97 km/s versus 5.55 km/s). Velocity at surface in the first case varies from 3.6 km/s to 6.8 km/s whereas that in the second case varies from 2.7 km/s to 6.0 km/s only. There is quite good match in the spatial distribution of low and high velocity patches in the northeastern half of the the two maps, though the absolute velocities differ. In the southwestern half, whereas the map for processes 1+2 shows a more or less uniform velocity of about 4.8 km/s, that for processes 3+4 shows a 10 by 10 km high velocity patch of 5 km/s in a background of 4.4 km/s. In both the cases, the north-northwest trending epicenter distribution is underlain by a parallel relatively low velocity area flanked by higher velocities. The maps at 6 km depth have fewer features in common. A relatively low velocity north-northwest trend is present in the northeastern part of the maps. The trend is much better-defined for processes 3+4 than the other case. Also, the high velocity patch featured in the southwestern part of the surface map persists at 6 km. Figures 3.11a and 3.11b present standard errors with respect to velocity models at 6 km depth obtained by processes 2 and 4 respectively. The northeastern half of the models show higher errors than the southwestern half. Treatment of these errors without referring to the distribution of rays can be misleading. Figures 3.12a and 3.12b show the ray density in terms of derivative weight sum (DWS - discussed earlier) and diagonal elements of the resolution matrix at 6 km depth for process 4. The northeastern half shows much higher ray density compared with the other half. The highest resolution of 0.636 is obtained in the northern part of the model. Resolution in the southwestern part of the model is very low. Examining the velocity maps along with the ray density and resolution maps, I can say that even though the northeastern part of the model has higher errors its velocities are more reliable than those in the southwestern half because both ray density and resolution in the latter is low. It means that the 10 by 10 km high velocity block found in the southwestern part of the model by process 4 is not as reliable as the NNE low velocity trend found by both processes 2 and 4.

Figures 3.13 and 3.14 show vertical cross-sections of models obtained by processes 2 and 4 along CC' and DD' defined in Figures 3.9a and 3.10a. The hypocenters shown in the cross-sections are those located inside the volume limited by vertical planes along the dashed lines on either side of CC' and DD'. The low velocity trend observed on the map views are found as a basin in cross-section DD' of both the processes. The basin is about 8 km wide and 3-4 km deep. As observed in the case of map views, the absolute velocity of the basin in the case of process 4 is 0.8 km/s lower than that in the case of process 2. The mainshock is situated at depth under the western flank of the basin.

Because most of the hypocenters are shallower than 10 km, ray coverage at greater depths is poor and as such, the velocity cross-sections are not quite reliable at depths more than 10 km. The near-surface DWS is quite high in the eastern half of both sections CC' and DD' (not shown). Between 5 and 10 km depths the area with high DWS extends farther westwards. Therefore, the low velocity basin featuring in both map views and cross-sections for processes 2 and 4 is a structure supported by data, whereas the existence of the high velocity block found in process 4 is not equally supported by data. But its existence cannot be ruled out altogether. A better coverage of local stations in the southwest quadrant of the model would help in constraining the existence of the high velocity block (see Figure 3.1).

One can see a good correspondence of the gravity and topographic highs and lows with the high and low velocity areas in the velocity. The linear north-northwest low velocity trend coincides with the strike of the alluvial valley and also with a trend between two gravity lows. The 10 by 10 km high velocity block matches with a gravity high of comparable dimension.

In addition to qualitatively matching between trends in topography, gravity, and computed velocity model, I made a quantitative comparison of the fit between gravity anomalies and the two velocity models. Any gravity modeling from seismic velocity data first requires the velocities to be converted to densities. This conversion is almost always a problematic job because of the widely varying conditions under which different researchers have found their empirical relationships. Some of the formulas are derived entirely from studies on igneous and metamorphic rocks (Birch, 1961; Bateman and Eaton, 1967), while others are derived from exclusively sedimentary rock samples (Nafe and Drake, 1963; Gardner et al., 1974). Not a single one of these relationships can be used with perfect relevance to my study area. I used the following empirical formula of Gardner et al. (1974) to convert three-dimensional velocity models into three-dimensional density models:

rho = 0.23V0.25,
where, rho is density in g/cm3 and V is P wave velocity in ft/s. The 3-D density models have as many blocks as the velocity models. Then, using a reference density of 2.67 g/cm3 (this value was used for preparing the Bouguer anomaly map of Oliver and Robbins, 1978), I computed gravity values at 1 km grid space at the surface by adding the vertical components of the anomalies due to all the density blocks for each grid point (Grant and West, 1965).

Figures 3.15a and 3.15b show the computed gravity values (in gray scale) superimposed on the Bouguer gravity anomaly contour map of Oliver and Robbins (1978) for processes 2 and 4. Both the computed gravity maps show maximum contrasts twice as much as that in the Bouguer anomaly map. Whereas the maximum contrast on the Bouguer anomaly map is 25 mGal, those for processes 2 and 4 are 52 and 56 mGal respectively. Computed gravity values for process 4 are mostly negative, while those for process 2 are predominantly positive. Evidently this difference is due to the lower velocities of the process 4 model, which is on average 0.8 km/s slower than the process 2 model. The shape of the valley is modeled well by both the processes. The discrepancy between the magnitudes of gravity contrast on the Bouguer map and my modeled maps may suggest that even though outlines of the major low and high velocity regions have been obtained by the inversion, the details of lateral heterogeneity and of vertical variation of velocity have not been adequately discerned. Of course, another reason for the discrepancy lies in the inevitable difference between the physical conditions under which the empirical relationship was obtained and those prevailing in the Eureka Valley (exclusively sedimentary versus both sedimentary and, igneous and metamorphic rocks).

So far, I have shown that both processes 2 (linearized inversion with initial model obtained by minimum 1-D velocity modeling) and process 4 (linearized inversion with initial model obtained by nonlinear optimization by simulated annealing) map a few common features. The low velocity basin is mapped well by both the processes. But at depth the process 2 performs much more poorly than process 4 (Figures 3.9b and 3.10b). At 6km depth, the general appearance of the process 4 model is much more coherent and realistic. Also, the initial minimum 1-D model used for process 2 has a too high P wave velocity (4.79 km/s) at surface. Even though I attempted quite a few trial and error steps for the minimum 1-D modeling the results probably correspond to a local minimum. This is the inherent danger in any linearized inversion used for solving a nonlinear problem, especially in the absence of prior information. The relatively higher velocity obtained in the process 2 is reflection of the high velocity initial 1-D model.

One can see that the simulated annealing optimization provides a velocity model that can be used as an initial model in a simultaneous linearized inversion of velocity and hypocenters, to obtain realistic velocity models and hypocentral locations. Hypocentral locations as such from the optimization alone are very much scattered. This may be an effect of not allowing the right degree of hypocentral perturbation during the optimization of successive trial models. Pullammanappallil and Louie (1993), in a similar simultaneous optimization of velocities and reflector locations, found that inappropriate constraint of reflector ``hypocenters'' led to poor results. These observations led me to do simultaneous linearized inversion for hypocenters and velocity using the velocity results of nonlinear optimization as the initial velocity model, producing the successful results above.

3.5 Conclusions

I proposed and implemented an efficient optimization-cum-inversion algorithm for studying an area with few a priori information. P and S wave travel time data from a moderate earthquake sequence in Eureka Valley, eastern California, allowed me to delineate the velocity structure of the valley and surrounding areas, and define the fault plane associated with the sequence. I find a north-northwest trending normal fault dipping steeply westwards. This is in conformity with the dominant type of slip mechanism prevalent in the southwestern part of the Great Basin near the Sierra Nevada Range. Although I have enough confidence in the timing of portable station data, their small number per event limited the resolution of the results. On the other hand, the possibility of better results using a higher number of arrivals at nearby permanent stations from the seismic monitoring networks was compromised by lower confidence in the picks.

As a result of combining a linearized inversion with a nonlinear optimization in tandem, I was able to obtain a realistic velocity model and fault plane definition from hypocenters. The velocity model adequately describes broad structural features at the local scale. The nonlinear simulated annealing optimization scheme, by virtue of its capability of finding the global error minimum irrespective of any starting model, produced a velocity model for an area like Eureka Valley where little is known about the local velocity structure. The linearized inversion following the nonlinear optimization could thus start from a fairly valid initial model and preliminary hypocentral locations, and fine tune them.


Chapter 4
On critical temperature, perturbation scheme and probabilistic acceptance of models in simulated annealing optimization for velocity and source location estimation

4.1 Introduction

The nonlinear nature of the problem of simultaneous determination of a subsurface velocity model and earthquake hypocenter locations from travel times has pushed researchers to devise ways of tackling the problem without resorting to indiscriminate linearization, which oftentimes severely degrades the results. Monte Carlo methods were devised with a view to handling nonlinear problems properly. But the purely random way of exploring the model space in typical Monte Carlo methods proved somewhat prohibitive for large models from a computational point view. Modified Monte Carlo methods based on the principle of the so-called ``importance sampling'' provided a more effcient way of model space search. Simulated annealing (SA) and genetic algorithm (GA) are two such methods. Of these two, SA enjoys a better mathematical foundation and proof of convergence than GA (Sen and Stoffa, 1996).

In Chapter 3, I proposed an in-tandem combination of nonlinear optimization by simulated annealing (SA) and linearized inversion as a compromise between computation time economy, which is better achieved by linearized methods than fully nonlinear ones, and proper handling of the nonlinearity of the problem (which linearized methods do away with at the level of mathematical formulation of the problem). One major difficulty encountered in the SA optimization part of the scheme was its production of a fairly realistic velocity model but randomly scattered hypocenter locations. The linearized inversion following the SA trial used the SA-derived velocity model along with initial hypocenter locations from the preliminary catalog as an initial model for further fine-tuning. In this chapter I look in depth into the cooling schedule and scheme of parameter perturbation, and at the overall ``error dynamics'' for the type of SA code developed by Pullammanappallil (1994), with a view to improving its performance. I limit myself to synthetic tests on a simple two-dimensional model.

In the recent years, simulated annealing has gained increasing popularity in solving a wide spectrum of geophysical problems. Rothman (1985, 1986) solved the prevalent problem of ``cycle skips'' in linearized methods of residual statics computation of noisy data by following the Monte Carlo optimization technique of Metropolis et al. (1953) adapted by Kirkpatrick et al (1983) as simulated annealing. He proposed an extension of the the SA technique to the case of estimation of velocity. Use of simulated annealing in inversion of seismic travel times as well as waveforms for velocity estimation has since become an important area of research (Landa et al., 1989; Koren et al., 1991; Mosegaard and Vestergaard, 1991; Sen and Stoffa, 1991; Pullammanappallil and Louie, 1993, 1994).

A very important aspect in any problem of optimization by simulated annealing is the determination of the annealing schedule (also known as temperature or cooling schedule) to be used. The prefect annealing is a very slow cooling process that carefully favors ``crystallization'' and prevents formation of any ``non-crystalline'' structure from the ``melt'' obtained at a high temperature. From computational point of view, the same rate of slow cooling appears simply prohibitive in implementation. Search for parts of the annealing process that can accommodate faster cooling rates led to the inclusion of the concept of ``critical temperature'' into the annealing schedule. In physical annealing, the critical temperature is the temperature in the cooling process where rapid crystallization starts and a phase change occurs (Basu and Frazer, 1990). The cooling rate around this temperature has to be slow enough to favor the crystallization, but the temperature itself should also be high enough for the process to avoid defects (local minima) in the crystal. In simulated annealing, convergence near the critical temperature is rapid. The annealing schedule has to iterate enough above this temperature to ensure that potential local minimum entrapments are avoided. Basu and Frazer (1990) define the critical temperature as representing a balance point at which low-energy states are preferred, but the system is warm enough to tunnel between such states. They equate it to the temperature in the ``short runs'' at which the negative average energy (or the correlation coefficient) is maximum. Pullammanappallil's (1994) definition of the critical temperature is essentially the same. His critical temperature is the temperature in the ``short runs'' at which the average least square error is minimum.

Although there is some kind of consensus among the SA practitioners that a fairly fast rate of cooling above the critical temperature can expedite the annealing process without jeopardizing proper ``crystallization'' (convergence to global minimum), there are considerable differences among them in its implementation. Geman and Geman (1984) use an inverse logarithmic function of the ``annealing time'' (1/log t). Basu and Frazer (1990), and Pullammanappallil (1994) find a cascade function of constant temperatures with short ``staircases'' above and significantly longer ``staircases'' below the critical temperature efficient in solving seismic velocity estimation problems. While I follow the cascade-type cooling schedule of Pullammanappallil (1994), I find his determination of the critical temperature deficient in envisaging some situations that can occur in the annealing schedule and affect the overall convergence of the process in a detrimental way. In this chapter, I make a detailed investigation into the process of determination of the critical temperature and accommodate the abnormal situations into its definition. I also compare several types of velocity and source location perturbation schemes. Finally, I propose a slightly modified criterion of probabilistic acceptance of models.

4.2 Synthetic model and data

Two sets of synthetic first arrival travel times were computed using 20 sources and 39 receivers in a 40 km by 20 km two-dimensional velocity model. I used Vidale's (1988) two-dimensional travel time computation code, which uses a fast finite-difference scheme based on a solution to the eikonal equation. The model in Figure 4.1a (henceforth called model 1) is used for velocity model determination only keeping the locations fixed. That in Figure 4.1b (henceforth called model 2) is used for source location determination keeping the velocity fixed, and also for simultaneous determination of velocity and source location determination. Thus, we have a set of 780 travel times for each of the models. This makes both the problems of velocity model determination (800 velocity model parameters), and combined velocity and source location determination (840 model parameters) under-determined. Only the problem of source location determination keeping the velocity fixed is over-determined (40 source coordinate parameters - 20 x and 20 z coordinates).

4.3 Perturbation schemes

In the following sections I will be referring to several schemes of model perturbation, which are described below:

i) Scheme 1: The same amount of velocity perturbation is executed for all cells belonging to a rectangle of randomly chosen area (varying from zero to the whole model) at each iteration. The resultant velocity after perturbation for any iteration is randomly computed to lie in the range 1.50 km/s and 8 km/s (corresponding to slownesses of 0.667 s/km and 0.125 s/km). This range is much bigger than the range of velocities one expects for most of the cases. But such a big range proved helpful in avoiding entrapment in local minima and expediting convergence (Pullammanappallil, 1994).

ii) Scheme 2: The absolute x and z coordinates (with respect to x=0 and z=0) of one source are guessed separately and randomly in the 0-40 and 0-20 km ranges. Only one source, chosen randomly, is displaced at each iteration.

iii) Scheme 3: The absolute x coordinate of one source and z coordinate of another are guessed separately and randomly in the 0-40 and 0-20 km ranges. The two sources to be perturbed are chosen randomly at each iteration.

iv) Scheme 4: Maximum adjustment of 40 km for the x coordinate and 20 km for the z coordinate of one source with respect to present values is done at each iteration. The amount of x and z perturbations are computed randomly in the 0-40 and 0-20 km ranges respectively with the only constraint that the sources remain within the model area.

Smaller perturbations for the source coordinates, e.g., 5 km, performed very poorly and hence, all the tests I perform in the following sections are limited to the above four types of perturbations.

A least square error type objective function is used in the optimizations discussed in the following sections. For any iteration i, it is defined as:

E sub i ~~~=~~~ {1 over n} sum from j=1 to n {(t sub j sup o ~-~ {t sub j sup c}~)} sup 2
where n is the number of observations, tjo and tjc denote observed and calculated travel times for the jth observation respectively.

4.4 Determination of the critical temperature

I do short run tests to understand the nature of the critical temperature for 7 combinations of the perturbation scheme and synthetic data, using the same probability function for comparison of errors in successive iterations. For each combination, I make short runs of 2000 iterations for temperatures 10, 5, 1, 0.5, 0.1, 0.01, 0.001, 0.0001.

As described earlier, two sets of synthetic first arrival travel times were computed using 20 sources and 39 receivers in a 40 km by 20 km two-dimensional velocity model. For each of the 7 combinations, I show four curves, two describing the shape of the least square error as a function of iteration for the accepted models, one showing the average least square errors of all accepted models at each of the eight temperatures, and one showing the number of accepted models for those temperatures.

Velocity model perturbation only:

In this case, source locations were assumed to be known (those in Figure 4.1a) and the velocity model is to be determined from the 780 synthetic travel times computed through the model 1. A constant initial velocity of 6.0 km/s is used and velocity perturbation is executed according to scheme 1 described in section 4.3.

Least square errors as a function of iteration number showed a highly fluctuating pattern within the range 0.0 to 100.0 for temperatures 10 down to 0.1. At temperature 0.01 the shape suddenly changed to a more or less smooth and monotonically decreasing curve. Figures 4.2a and 4.2b show the shapes of error versus iteration number at temperatures 0.1 and 0.01. Figure 4.2c shows the average error versus temperature and Figure 4.2d shows the number of accepted models versus temperature curves. One can see that the change of shape of the error-versus-iteration curve is reflected also in the average error-versus-temperature and number of accepted models-versus-temperature curves. Both the curves show a steep ramp between temperatures 0.1 and 0.01 and very negligible slope on either side of the ramp (at temperatures higher than 0.1 and lower than 0.01). This is the typical behavior in most of the cases. The lowest average error corresponds to the temperature (0.01) giving the lowest number of accepted models and showing an abrupt change from an oscillating pattern at the next higher temperature to a smoothly decreasing pattern. Both the average error and the number of accepted models maintain a constant value from temperature 0.01 downwards. According to Pullammanappallil's (1994) and Basu and Frazer's (1990) definitions (henceforth referred to as the PBF criterion) the critical temperature for this case would be 0.01. Their only criterion being average error, they do not look at the two other criteria, e.g., change of shape of the error-versus-iteration curve from temperature to temperature, and sudden drop in the number of accepted models versus temperature, that I propose to examine. In the present case, depending only on the average error criterion seems to be sufficient. But this is not always the case, as I will show later.

Source location perturbation only:

In this case, the velocity model was assumed to be known (Figure 4.1b) and the source locations are to be determined from the 780 synthetic travel times computed through the model 1. At the beginning of the optimization, all the sources are assigned to be located at the center of the model (x = 20 km and z = 10 km). Source locations are perturbed using schemes 2, 3 and 4 described in section 4.3.

Figures 4.3, 4.4 and 4.5 show the results of the short runs using schemes 2, 3 and 4 respectively. One can see the same fluctuating pattern in the curve showing least square errors as a function of iteration number in the temperature range 10 down to 0.1 as in the case of velocity model determination only. But the the amplitude of fluctuation is much smaller in this case (Figures 4.3a, 4.4a and 4.5a), evidently due to the small number of parameters (40 source coordinates) being perturbed compared to a significantly higher number of parameters (800 velocity values) in the earlier case. Of course, velocity and source location perturbations have different ways of affecting the error values. Perturbation of a source parameter is bound to change the error, whereas it is possible to have velocity model perturbations in the unsampled part of the model without any change of error. The change in the shape of error-versus-iteration curves occur between temperatures of 0.1 and 0.01 in these cases also. The shape of this curve at temperature 0.01 has an overall monotonically decreasing pattern with some remnant low amplitude oscillations in the first 50-60 iterations for schemes 2 and 4. (Figures 4.3b and 4.5b). Figures 4.3c-4.3d, 4.4c-4.4d and 4.5c-4.5d show average error and number of accepted models as functions of temperature for the three perturbation schemes. The patterns in these two curves also closely mimic those in the case of velocity perturbation only (Figure 4.2c-4.2d). Of course, one can observe a gentler slope of the ramp in the average error curve in this case, conspicuously related to the small amplitude of least square errors (< 5.0 s2). The lowest average error for scheme 2 is achieved at temperature 0.0001 where it is almost negligibly lower than that at 0.001. The average error in the case of perturbation scheme 3 reaches a minimum at temperature 0.01 and increases very slightly at lower temperatures, whereas that for scheme 4 reaches the lowest value at 0.001 and remains constant for lower temperatures. The number of accepted models reaches the lowest value at temperatures 0.001, 0.0001 and 0.001 for schemes 2, 3 and 4 respectively. Only in the case of scheme 3 is there correspondence between the lowest error and lowest number of accepted models. My 3 criteria and the single PBF criterion give the same critical temperature (0.01) in the case of scheme 3. My suggested critical temperatures for both schemes 2 and 4 are 0.01 whereas those according to PBF are 0.0001 and 0.001. In view of the low number of accepted models for temperatures 0.01 downwards for all the three schemes, it is evident that the choice of critical temperatures depending solely on the lowest average error criterion results in a much smaller number of accepted models at several temperatures higher than the chosen critical temperature for schemes 2 and 4, and consequent insufficient evolution of the model before the cooling schedule reaches the critical temperature.

Simultaneous perturbation of the velocity model and source locations:

In this case, both the velocity model and source locations are assumed unknown (Figure 4.1b) and are to be determined from the 780 synthetic travel times computed through the model 1. A constant initial velocity of 6.0 km/s and constant initial location (x = 20 km and z = 10km) for all the sources are used. Source location perturbation is done using the same three schemes (2, 3 and 4) as in the previous section. Velocity perturbation in each of these three cases follows that in scheme 1.

Figures 4.6, 4.7, and 4.8 show the results of this test for the three schemes. All the four curves in the case of scheme 2 (Figures 4.6a-4.6d) seem to be somewhat like weighted averages of the corresponding curves of the velocity-only and source-only perturbation experiments (Figures 4.2a-4.2d and 4.3a-4.3d). The exact mathematical nature of the weighting involved is not clear. I presume it depends mostly on the random number sequences used in the three cases (The number of random numbers used in each iteration of the present case is roughly equal to the sum of those used in the corresponding iteration of the two previous cases). My critical temperature in this case is 0.01 whereas that according to PBF criterion is 0.001. In this case, again, the PBF criterion causes a decrease in the number of accepted models.

The results of experiments using schemes 3 and 4 both reveal a pathological computational phenomenon that I call a ``glitch'' (Figures 4.7b and 4.8b). Whereas the patterns of error-versus-iteration curve at temperature 0.01 for both the schemes are fairly monotonically decreasing for more than 1000 iterations, the error abruptly jumps from 0.266 s2 at iteration 1236 to 31.85 s2 at iteration 1238 for scheme 3, and from 0.32 s2 at iteration 1188 to 20.96 s2 at iteration 1193 for scheme 4. After this ``glitch'' the error again assumes the roughly hyperbolical decreasing pattern. After investigation, I found that these glitches were caused by an unexpected combination of the differential error (i.e, difference between the error of the present iteration and that of the iteration at which the last model was accepted) and the random number used for determining the probability of acceptance. An immediate effect of the occurrence of the glitches is an increased average error and number of accepted models at temperature 0.01 (Figures 4.7c-4.7d and 4.8c-4.8d). As a consequence, the ramp between high average errors at high temperatures and low errors at low temperatures becomes wider than those in all other cases thus far studied.

Even though the number of accepted models increases slightly due to the glitch, the ramp between high and low rates of model acceptance retains roughly the same shape as in the previous cases. What should be the critical temperatures in these two cases? Consideration of the average error as a function of temperature with the exclusion of the shapes of error-versus iteration curve for each temperature and number of accepted models-versus-temperature curve will be very misleading in this case. The average errors at temperatures 0.01 and 0.001 for scheme 3 are 5.53 s2 and 0.71 s2 respectively. One can easily see that the average error in this case is caused by the glitch and does not reflect the ``normal error level'' at this temperatures (the average error over the first 1236 iterations of the short run at temperature 0.01 is only 0.89 s2). The significant difference between the average errors at temperatures 0.01 and 0.001 (5.53 versus 0.71, Figure 4.7c) would, in the absence of other criteria, absolutely justify the choice of 0.001 as the critical temperature. But the unusual glitch occurring in the short run and causing the high value of average error was an artifact of a rare combination of numbers at a particular iteration of the short run, and in its absence (which will, most certainly, be the case in the actual long run) the number of accepted models will be low. For the optimization to sample large enough number of models, a judicious choice of the critical temperature would be 0.01 supported by the criteria of change of pattern of error-versus-iteration curves (Figures 4.7a and 4.7b) and slope of the number of accepted models-versus-temperature curve (Figure 4.7d)

Figure 4.9 shows the evolution of the shape of the error-versus-iteration curves with temperature for a three-dimensional ``blob'' model (a constant high velocity cube in a lower velocity background) using a three-dimensional perturbation scheme similar to scheme 4 for a short run of 1000 iterations (Asad and Louie, 1996). The number of unknown parameters (velocity cells and source coordinates) in this case was 1770. Figures 4.10a and 4.10b show the average error and number of accepted models at different temperatures. In this case too, the PBF criterion suggests a lower critical temperature than would be determined if all the three criteria that I propose are considered.

4.5 Probabilistic acceptance of models

In any realization of simulated annealing the system is ``warmed'' until it is completely ``melted''. In implementation it means starting the optimization at a sufficiently high temperature where the probability of acceptance of models with bigger errors is high, and thus ensuring that any bias to the initial model is destroyed. By keeping an overall high error during the iterations of the optimization at high temperatures, premature entrapment in local minima are also avoided. The cooling schedule has to be designed around such a choice of the critical temperature that there are enough number of iterations at higher temperatures facilitating ``escape'' from potential local minima, but not so many as can unnecessarily delay the decrease of error towards the global minimum. Every SA practititioner faces a dilemma here between the needs of escaping local minima and preservation of model improvement, the latter being roughly defined as decrease in error.

A high number of accepted models causes a good evolution of the model by extensive sampling of the model space. A substantial part of the evolution of the model parameters take place at high temperatures. The rate of evolution decreases with decreasing temperature. Therefore, if the error level does not come down to a significantly low level before the optimization enters the low temperatures, it will be very difficult for the optimization process to reach the neighborhood of the global minimum. That is why proper choice of a critical temperature is so crucial.

In this section, I propose a heuristic combination of the ubiquitous ``acceptance'' criterion and a new ``rejection'' criterion for the probabilistic treatment of the issue of acceptance of new models with a view to handling the above-mentioned dilemma.

The general rule in simulated annealing methods is that a new model with a smaller error than that of the current model is accepted unconditionally and one with bigger error than that of the current model is conditionally accepted with a probability of

Pc = exp(-DELTAE/T),
where, DELTAE is the absolute value of the error difference and T is the temperature. In practice, a random number selected from a uniformly distributed series in the interval (0,1) is compared with Pc (Kirkpatrick et al., 1983). If it is less than Pc, the new model is accepted; if not, the currently accepted model is retained.

I use this scheme of probabilistic acceptance of models with higher error for all the iterations (several thousands) in the starting temperature. This will achieve the state of unbias to the initial model and also let the optimization sample the model space to certain extent. Starting from the next lower temperature I use what I call ``rejection'' criterion, according to which any model with higher error than that of the currently accepted model is rejected unconditionally, while a model with lower error is accepted probabilistically. The same probability function including the same definition of DELTAE as above is used in this case too. The difference comes in the part of comparison with the random number. In this case, the random number has to be bigger than Pc for the model to be accepted. This essentially preserves the Gibbs distribution nature of the probability.

In the acceptance criterion, more models with higher error are accepted at higher than at lower temperatures. In the rejection criterion more models with lower error are rejected at higher temperatures than at lower temperatures. In the acceptance criterion, the lower DELTAE the higher the probability of accepting a model with higher error, whereas, in the rejection criterion, the higher DELTAE the higher the probability of accepting a model with lower error. Both acceptance and rejection based on probability almost ceases at low temperatures resulting in lower error being the only and sufficient practical condition of acceptance of new models.

Three advantages of the use of rejection criterion are:

The main disadvantage of using the rejection criterion is that the ``course of evolution'' of the model is much shorter compared to that in the acceptance criterion because of the difference in the number of models accepted in the two cases. This is partly compensated for in the initial ``warming-to-melt'' stage of the optimization where I use the acceptance criterion at the initial high temperature for a few thousand iterations. The number of iterations necessary will depend on the size of the model.

4.6 Results

I conducted long runs (20000 iterations) of simulated annealing for the following cases:

Velocity model determination only:

I execute two runs, one using the standard ``acceptance'' criterion, and the other using the combined ``acceptance-rejection'' criterion that I proposed in section 4.5, for determining the velocity model from the 780 synthetic travel times computed through the model 1. The source locations are all assumed known. A constant initial velocity model of 6.0 km/s is used.

The total number of accepted models in the standard run was 3457, of which 1846 were accepted with higher errors. Figure 4.11a shows the velocity model obtained at the 18957th iteration. The average least square of the 780 travel times at this iteration is 0.0007 s2. For the sake of comparison I plot the velocity using the same gray scale as that used for the true model (Figure 4.1a). The low velocity anomaly in the right part of the model is quite well recovered, but recovery of the high velocity anomaly at the left part is very poor.

Figure 4.11b shows the velocity model at the 16504th iteration of the combined acceptance-rejection run. It corresponds to a least square error of 0.0008 s2. The total number of accepted models in this case was 4026, of which 2133 were accepted with high errors in the initial ``warming'' part of the process. 73 models were rejected. In this case, the high velocity anomaly at the left is quite well recovered whereas the low velocity anomaly becomes more diffuse. However, the overall recovery of anomaly in this case is more balanced than the other case. Figure 4.12 shows the histograms of final time errors for the two cases.

Source location determination only:

I execute four different long runs for determining source locations assuming the velocity model to be known. Three of these four runs are conducted using perturbation schemes 2, 3 and 4 and the standard ``acceptance'' criterion. The fourth one was conducted using perturbation scheme 3 and the combined acceptance-rejection criterion. The initial locations of all the sources for each of the four cases were set at x = 20 km and z = 10 km. Figure 4.13 shows the results of the four runs. Figure 4.13a shows the true velocity model and source locations. Figure 4.13b shows the locations obtained by the four runs along with the true source locations. The 20 source locations in all the cases (including the true locations) are connected by straight line segments in increasing original order of x in order to facilitate comparison. At the first glance, the overall pattern of source profiles obtained in all the cases seem to be close to the true profile except for two sources in the right part of the profile obtained by scheme 4. These two sources are offset by about 15 km in the x coordinate. The best locations are obtained by the scheme 3 using standard ``acceptance'' criterion (mean absolute shift of 0.33 km for x and 0.25 km for z). Table 1 gives the mean absolute shift in x and z for the four runs:
		             Table 1

 	Scheme	Criterion		x (km)	z (km)
 	  2	acceptance		0.41	0.33
 	  3	acceptance		0.33	0.25
 	  4 	acceptance		2.08	0.56
 	  4	acceptance		0.68	0.43 *
 	  3	acceptance-rejection	0.44	0.29

* excluding the two anomalous sources

In all the runs the z coordinate is more accurately determinated than the x coordinate. The higher accuracy of the scheme 3 compared to scheme 2 suggests that uncoupled perturbation of x and z coordinates of the same source is more efficient in source location determination than coupled perturbation. The poorest quality of locations obtained by the scheme 4 reveals the limitation of adjustment of previous coordinates in contrast to determination of absolute coordinates at each iteration. The errors in locations obtained by scheme 3 and acceptance-rejection criterion are comparable to those obtained by scheme 2 and acceptance criterion.

Simultaneous determination of the velocity model and source locations:

In this case, the same 4 types of source location perturbation as discussed above are each accompanied by a velocity perturbation following scheme 1 for simultaneously determining the velocity model and the source locations. As in the tests of critical temperature in section 4.3, a constant initial velocity of 6.0 km/s and constant initial location (x = 20 km and z = 10km) for all the sources are used for long optimization runs of 20000 iterations.

Figure 4.14 shows the locations obtained for the four schemes. There is obvious deterioration of the quality of the location results in all the schemes with respect to those in the case of determination of source location only (Figure 4.13). Here, most of the source locations are confined towards the center of the model. Despite the considerably shifted locations for individual sources, the source location profiles still retain some resemblance to the true profile. Table 2 gives the mean absolute shift in x and z coordinates in the four cases:

		              Table 2

 	Scheme	    Criterion		x (km)	z (km)
 	  2	    acceptance	 	 2.32	 3.31
 	  3	    acceptance	 	 2.42	 4.17
 	  4	    acceptance	 	 6.29	 6.79
 	  3	acceptance-rejection	 2.62	 2.51

Looking at the Table 2 one can easily conclude that the combination of the perturbation scheme 3 and acceptance-rejection criterion gives the best locations. Performance of the scheme 4 is by far the worst. This is also evident from the Figure 4.14b.

Figures 4.15, 4.16, 4.17, and 4.18 show the velocity models obtained in the four cases. The upper panel in each figure shows the velocity model in the same gray scale as used for the true model in Figure 4.1b. The gray scale level in the lower panel corresponds to the range of velocities in the resultant model. The true and obtained source profiles are plotted on the velocity model in each case. A common feature of all the obtained models is their overall bias towards low velocities. A relatively higher velocity patch (compared to the average velocity) is observed to appear in the left part of the three models obtained by the acceptance criterion. The resemblance between the true and obtained models ends here. As to the velocity model obtained by the scheme 3 and acceptance-rejection criterion it has no resemblance with the true model. Joint examination of the source profile and velocity model obtained in each case reveals the ``trade-off'' between them. The best source location and worst velocity model correspond