next previous
Up: Simultaneous least-squares adjustment of


3 Minimizations

 Up to this point, our approach is not fundamentally different from Morbey's (1975). The difference lies in the way in which we find the minimum of D (Eq. 17). A common procedure is to assume that sufficiently accurate approximations to the orbital parameters were available, as Morbey did. He used these rough approximations to solve the normal equations by linearization and iterations. The possibility that this initial guess might not be in a neighborhood of the global minimum is not considered.

Even with highly precise measurements, the number of local minima of F may be significant. One generally accepts Hoffmann and Salamon (1990) that the number of local minima cab be approximated by $O(\exp(N))$ where N is the number of parameters. One should not suppose any initial guess (even when it is based on a previous orbit determination) will lead to the global minimum by using a local search only. An efficient search for (a neighborhood of) the global minimum is mandatory. A survey of methods for global optimization has been edited recently by Horst and Pardalos(1995).

One could think that the more precise the measurements, the lower the number of local minima. If the observations are error free, that only means the objective function reaches 0 at the global minimum. Such a situation does not prevent one from having a huge set of local minima, mainly due to the highly nonlinear nature of D with respect to its components.

The chance to miss the global minimum even when starting with a quite good first guess is not all that small. During the preparation of this paper, we accidentally hit upon such a case. To test our approach, we generated a set of observations with normal errors and we tried to recover the original orbit (or at least the one that minimizes D). Let $D_{\rm i}$ denote the value of D computed with the orbital parameters used to generate the observations, $D_{\rm l}$ the value of D reached with only the local minimization procedure starting from that orbit and, finally, $D_{\rm g}$ the value of D at the global minimum. We met a case in which

D_{\rm g}<D_{\rm l}<D_{\rm i}\end{displaymath}

where the two inequalities are strict. We hope this example convinces the reader that the probability to miss the global minimum is not as low as current belief would put it.

3.1 Global search

Simulated Annealing Metropolis et al. (1953); Kirkpatrick et al. (1983) has already been successfully applied to the determination of the orbital parameters of visual binaries Pourbaix (1994); Pourbaix and Lampens (1997). The basically same approach is also applied here. While the basic concepts of simulated annealing have been conserved, the efficiency of the algorithm has been highly improved.

The requirements of simulated annealing are:

a working space: each point of this space is potentially a solution;
a scalar objective function: one seeks the global minimum of this function;
a "temperature'' to regulate the "annealing'' process and an algorithm to decrease this temperature;
a point generator to describe how to go from the current trial to the next one.

The working space and the objective function were defined in the previous sections. For the temperature, two features are required: the initial value and a way to set the value of the temperature after k reductions. After many experiments, we decided to fix the initial temperature to the lowest value of D reached in a sample of points randomly chosen in the working space.

There is a lot of theoretical and experimental work which provides guidance for the reduction of the temperature. We chose the approach suggested by Ingber (1993a,b) Ingber (1993a, b) and we implemented this algorithm to allow 1000 reductions of the temperature in such a way that the final temperature is one ten-thousandth of the original temperature
t_k&=&t_0\exp(-c\sqrt[N]{k}),\\ c&=&4\exp(\frac{-3}{N}),\end{eqnarraystar}
where N denotes the dimension of the problem (here, N=10).

The most significant improvement concerns the point generator. Instead of a basic random point generator, we are using a more sophisticated procedure based on the Modified Simplex Method Nelder and Mead (1965) and described by Press et al. (1992).

This algorithm has at least two weaknesses: (a) the algorithm can stop as soon as a local minimum is reached (this is probably implementation dependent); (b) the simplex can degenerate when the dimension of the problem becomes "important'' (10 is already important). If (a) happens, we simply have to restart using a new simplex. A way to avoid (b) would be to replace the algorithm of Nelder and Mead by another kind of multidirectional search method such as the one latelly proposed by Torczon (1991). Torczon proved there is no chance for her simplexes to degenerate; the cost of this proof is the prohibitive computational effort required for this method. We prefer, as in (a), to start again with a new simplex in cases where the current simplex would degenerate.

It is possible to build such simplexes to increase the chances to visit the different regions of the working space. The risk to miss (a neighborhood of) the global minimum is then reduced. At each temperature level, a maximum number (350) of function evaluations is authorized.

An attentive reader could be puzzled by reading that it is possible for SA to fall in a local minimum. It is important to keep in mind that there is, in general, no proof of convergence for that method. The rare existing proofs require an infinite decrease of the "temperature'' (i.e., an infinite computation time). Facing such a situation, we have to compromise between the execution time and the confidence we have in the "global'' nature of the minimum. That explains the potential relative inefficiency of any implementation of SA.

3.2 Local improvement

Even with the Modified Simplex Method as the point generator, one cannot expect to be at the global minimum at the end of the simulated annealing phase. A local search algorithm has to be used to tune the minimum. Here also, different methods have been proposed: the ones not requiring the gradient (Powell, Modified Simplex Method, etc.), the ones using the gradient (Davidon-Fletcher-Powell, Levenberg-Marquard, Broyden-Fletcher-Goldstrab-Shanno (BFGS), etc.).

We chose the BFGS method for its quite efficient behavior independently of the magnitude of the residuals. Our implementation is based on the pseudo-code proposed by Dennis Jr. and Schnabel (1995), but a complete description of the method can be found in Fletcher (1987).

next previous
Up: Simultaneous least-squares adjustment of

Copyright The European Southern Observatory (ESO)