Even with highly precise measurements, the number of local minima of F may be significant. It is generally accepted (Hoffmann and Salamon 1990) that the number of local minima can be approximated by where N is the number of parameters. One should not assume that any initial guess (even one based on a previous orbit determination) will lead to the global minimum through a local search alone. An efficient search for (a neighborhood of) the global minimum is therefore mandatory. A survey of global optimization methods has recently been edited by Horst and Pardalos (1995).
One might think that the more precise the measurements, the fewer the local minima. If the observations are error free, that only means the objective function reaches 0 at the global minimum. Such a situation does not prevent a huge set of local minima from existing, mainly owing to the highly nonlinear dependence of D on its components.
The chance of missing the global minimum, even when starting from a quite good first guess, is not all that small. During the preparation of this paper, we accidentally hit upon such a case. To test our approach, we generated a set of observations with normal errors and tried to recover the original orbit (or at least the one that minimizes D). Let D_0 denote the value of D computed with the orbital parameters used to generate the observations, D_loc the value of D reached by the local minimization procedure alone starting from that orbit and, finally, D_glob the value of D at the global minimum. We met a case in which

D_glob < D_loc < D_0,

where the two inequalities are strict. We hope this example convinces the reader that the probability of missing the global minimum is not as low as current belief would have it.
The requirements of simulated annealing are: a working space, an objective function, a temperature (together with its reduction schedule) and a point generator.
The working space and the objective function were defined in the previous sections. For the temperature, two features are required: the initial value and a way to set its value after k reductions. After many experiments, we decided to set the initial temperature to the lowest value of D found in a sample of points chosen at random in the working space.
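As a minimal sketch of this initialization, assuming a box-shaped working space; the stand-in objective D, the bounds and the sample size below are illustrative choices, not the paper's actual values:

```python
import random

def initial_temperature(objective, bounds, n_samples=100, rng=random):
    """Set the initial SA temperature to the lowest objective value
    found among points drawn uniformly in the working-space box."""
    best = float("inf")
    for _ in range(n_samples):
        point = [rng.uniform(lo, hi) for lo, hi in bounds]
        best = min(best, objective(point))
    return best

# Illustrative stand-in for the orbital objective function D.
def D(p):
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + 1.0

random.seed(42)
T0 = initial_temperature(D, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

With 100 samples the initial temperature is already on the scale of the objective values in the region of interest, which is the property the choice above aims for.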
There is a great deal of theoretical and experimental work providing guidance on the reduction of the temperature. We chose the approach suggested by Ingber (1993a,b) and implemented the algorithm to allow 1000 reductions of the temperature, in such a way that the final temperature is one ten-thousandth of the initial one.
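Ingber's adaptive scheme is more elaborate than a plain geometric schedule, but the stated endpoint (a factor of 10^-4 after 1000 reductions) can be illustrated with a geometric schedule whose per-step factor is tuned accordingly; the functions below are our own sketch, not the paper's code:

```python
def cooling_factor(n_reductions=1000, total_drop=1.0e-4):
    """Per-reduction factor r chosen so that r**n_reductions == total_drop."""
    return total_drop ** (1.0 / n_reductions)

def temperature(T0, k, r):
    """Temperature after k reductions of a geometric schedule T_k = T0 * r**k."""
    return T0 * r ** k

r = cooling_factor()                 # about 0.9908
T_final = temperature(1.0, 1000, r)  # one ten-thousandth of T0
```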
The most significant improvement concerns the point generator. Instead of a basic random point generator, we use a more sophisticated procedure based on the Modified Simplex Method (Nelder and Mead 1965) and described by Press et al. (1992).
This algorithm has at least two weaknesses: (a) the algorithm can stop as soon as a local minimum is reached (this is probably implementation dependent); (b) the simplex can degenerate when the dimension of the problem becomes "important'' (10 is already important). If (a) happens, we simply restart with a new simplex. A way to avoid (b) would be to replace the algorithm of Nelder and Mead with another kind of multidirectional search method, such as the one recently proposed by Torczon (1991). Torczon proved that her simplexes cannot degenerate; the price of this guarantee, however, is the prohibitive computational effort the method requires. We prefer, as in (a), to restart with a new simplex whenever the current simplex is about to degenerate.
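A minimal sketch of this restart strategy, assuming the working space is a box; testing degeneracy through the determinant of the edge matrix, and the tolerance used below, are our own illustrative choices:

```python
import random

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n = len(m)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[p][i]) < 1e-15:
            return 0.0
        if p != i:
            m[i], m[p] = m[p], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return d

def is_degenerate(simplex, tol=1e-10):
    """A simplex of n+1 points in n dimensions degenerates when its edge
    vectors become (nearly) linearly dependent, i.e. when the determinant
    of the edge matrix is (nearly) zero."""
    p0 = simplex[0]
    edges = [[v[c] - p0[c] for c in range(len(p0))] for v in simplex[1:]]
    return abs(det(edges)) < tol

def fresh_simplex(bounds):
    """Restart strategy from the text: draw a brand-new random simplex
    inside the working-space box (bounds is a list of (lo, hi) pairs)."""
    n = len(bounds)
    return [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n + 1)]
```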
It is possible to build such simplexes so as to increase the chances of visiting the different regions of the working space, thereby reducing the risk of missing (a neighborhood of) the global minimum. At each temperature level, a maximum of 350 function evaluations is allowed.
An attentive reader may be puzzled to read that SA can fall into a local minimum. It is important to keep in mind that, in general, there is no proof of convergence for this method; the rare existing proofs require an infinite decrease of the "temperature'' (i.e., an infinite computation time). Faced with such a situation, we must compromise between the execution time and the confidence we have in the "global'' nature of the minimum. This explains the potential relative inefficiency of any implementation of SA.
We chose the BFGS method for its efficient behavior regardless of the magnitude of the residuals. Our implementation is based on the pseudo-code proposed by Dennis Jr. and Schnabel (1995), but a complete description of the method can be found in Fletcher (1987).
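For reference, the standard BFGS update of the Hessian approximation B_k, as given for instance in Fletcher (1987), reads:

```latex
B_{k+1} = B_k
        - \frac{B_k s_k s_k^{\mathsf T} B_k}{s_k^{\mathsf T} B_k s_k}
        + \frac{y_k y_k^{\mathsf T}}{y_k^{\mathsf T} s_k},
\qquad
s_k = x_{k+1} - x_k, \quad
y_k = \nabla F(x_{k+1}) - \nabla F(x_k).
```

The update uses only gradient differences and makes no small-residual assumption, unlike Gauss-Newton-type approximations, which is consistent with the behavior independent of the residual magnitude noted above.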
Copyright The European Southern Observatory (ESO)