Up: Spectral decomposition by genetic
This section details the results of applying Ga-GA to
simulated target data sets with a known level of added noise.
Section 3.1 discusses the performance of Ga-GA
in the absence of data noise (except for very small numerical
rounding errors). Sections 3.2 and 3.3 then test the performance of
Ga-GA against that of the two standard algorithms
mentioned earlier, CURVEFIT and AMOEBA, for data with a realistic
noise level (Sect. 3.2) and with a noisy background
present (Sect. 3.3).
Section 3.3 also shows the ease with which additional
spectral features may be incorporated into the analysis.
We use Ga-GA to analyse three noiseless targets, i.e. we
replace
by 1 in Eq. (3), each
corresponding to a different Gaussian configuration. The three test
targets are: 1) A single "wide" Gaussian with the target genotype
given by three parameters,
. 2) Two "joined" Gaussians corresponding
to the six parameter genotype
, and 3) a more complex
five Gaussian configuration with the fifteen parameter target genotype
given by
.
Each case was analysed ten times (to allow performance statistics
to be compiled), each run with a different initial
population, for a fixed number of generations. It is also possible
to configure Ga-GA to run until it achieves a fixed
although for certain types of analysis this method is
unfavourable (Charbonneau & Knapp 1996). The
number of generations used differs from case to case, however, and
increases with the complexity of the target solution:
target 3 typically requires a 1200 generation run, which is
considerably more than the 200 and 500 generation runs required
for targets 1 and 2 respectively.
The returned parametrisation of each target is given in Table 1.
The subscript T quantities (e.g. XT) are the target parameters
and the subscript G quantities (e.g. XG) are the corresponding
mean values returned by Ga-GA after multiple fixed
generation runs. It is clear from the results presented in
Table 1 that Ga-GA obtains a very good
representation of each target (within the errors).
Table 1:
Results for cases 1), 2) and 3) described above. Subscript T
quantities indicate target parameters, and subscript G quantities
are the mean after multiple evolutionary runs. Similarly, the values
of
are the final mean values of
. The errors for each parameter are calculated as the means
of the ten run ensemble
Figure 2:
Test run for Ga-GA, taken from the ensemble of ten runs,
for the noiseless single Gaussian target (solid line) of Case 1 and
the profile modelled by Ga-GA ( )
The errors in the parameters are global error estimates, calculated
in a Monte Carlo fashion: we perform multiple runs of
Ga-GA, each with a different initial population, which is
achieved by initializing the random number generator with a different
seed (Charbonneau & Knapp 1996). This Monte
Carlo approach "forces" Ga-GA to search the parameter space
from a different starting point each time, and also allows
"mean" values to be calculated for each of the parameters.
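The ensemble procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_run` is a hypothetical stand-in for a full Ga-GA run (it simply scatters a made-up three-parameter target using a seeded generator), and taking the sample standard deviation as the quoted error is our assumption, not something the text specifies.

```python
import random
import statistics


def toy_run(seed, target=(50.0, 10.0, 5.0)):
    """Hypothetical stand-in for one GA run.

    A different seed plays the role of a different initial population;
    the returned "solution" is the (made-up) target plus small scatter.
    """
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, 0.1) for t in target]


def ensemble_statistics(runs):
    """Per-parameter mean and sample standard deviation over all runs."""
    columns = list(zip(*runs))  # transpose: one tuple per parameter
    means = [statistics.mean(c) for c in columns]
    errors = [statistics.stdev(c) for c in columns]  # assumed error estimate
    return means, errors


# Ten runs, each started from a different seed.
runs = [toy_run(seed) for seed in range(10)]
means, errors = ensemble_statistics(runs)
```

Because each run is fully determined by its seed, any individual member of the ensemble can be regenerated later for inspection.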
Figure 2 shows a plot of target 1 (solid line) and
the profile derived from the "fittest" genotype (
) after
only 200 generations with the
. Similarly, Fig. 3 shows the profile constructed from
the fittest genotype,
, for the
double Gaussian configuration of target 2.
Figure 3:
Test run for Ga-GA, taken from the ensemble of ten runs,
for the noiseless double Gaussian target (solid line) of Case 2
and the profile modelled by Ga-GA ( )
Figure 4:
Test run for Ga-GA, taken from the ensemble of ten runs,
for the noiseless five Gaussian target (solid line) of Case 3
and the profile modelled by Ga-GA ( )
Figure 4 demonstrates Ga-GA's handling of
the more complex case 3, resulting in
of the fittest genotype after 1200 generations. For
these test cases final values of
, if we doubled the
number of generations, would be limited by numerical precision and
might be no better than those given. It
must be emphasised that these results are for one particular
run of Ga-GA from the ensemble of ten runs.
We show, in Fig. 5, the decrease in
with generation number for the full ensemble of runs
(indicating the mean
(solid line), extrema (dashed
line) and median (dotted line) for each generation step) for each of the
test cases above. These plots demonstrate the power of Genetic Algorithms as
optimization tools.
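The per-generation ensemble statistics described above (mean, median and extrema) can be computed as in the following sketch. It assumes that each run records one misfit value per generation and that all runs have the same length; the function name and the sample numbers are illustrative only.

```python
import statistics


def convergence_summary(histories):
    """Summarise an ensemble of convergence histories.

    histories[r][g] is the best misfit of run r at generation g.
    Returns one dict per generation with the ensemble mean, median,
    minimum and maximum.
    """
    summary = []
    for values in zip(*histories):  # all runs at one generation
        summary.append({
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        })
    return summary


# Three made-up runs; the third converges poorly.
histories = [
    [10.0, 4.0, 1.0],
    [12.0, 5.0, 1.0],
    [50.0, 30.0, 10.0],
]
summary = convergence_summary(histories)
```

In this toy ensemble the single poor run pulls the mean well above the median at every generation, which is the kind of separation between the two curves that a convergence plot makes visible.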
Figure 5:
Convergence of against generation number for each of
the three cases in Sect. 3.1. Top panel: Case 1
(single Gaussian), Middle panel: Case 2 (two Gaussians) and Bottom
panel: Case 3 (five Gaussians). For each generation step the mean
(solid line), median (dotted line) and extrema (dashed
lines), for the ten run ensemble, are indicated. It is clear that,
when a relatively "poor" parameterisation is present in the ensemble, the
difference between the median and the mean is
demonstrably affected; this effect is evident in the top and
bottom panels
The step-like structure is clearly visible in all
three plots, although to a much greater extent in the uppermost plot.
Such steps occur when Ga-GA suddenly obtains a new "fitter"
value for one (or more) parameter(s). The long flat "plateaus" are
intervals in which the current "best" in the population has not changed, or
in which the population is largely degenerate, i.e. all the individuals have
very similar genotypes. The jumps that end a plateau are driven by mutation: the mutation rate has been
allowed to increase, and thus introduces new genetic material at a
higher frequency.
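One possible way of letting the mutation rate respond to a degenerate population is sketched below. This is not necessarily the rule used by Ga-GA: the spread measure, the thresholds `low` and `high`, the scaling `factor` and the rate limits are all illustrative assumptions, and fitness is assumed to be positive with larger values being better.

```python
import statistics


def adjust_mutation_rate(rate, fitnesses, low=0.05, high=0.25,
                         factor=1.5, rate_min=0.0005, rate_max=0.25):
    """Return an updated mutation rate based on population diversity.

    Diversity is measured (by assumption) as the normalised difference
    between the best and the median fitness in the population.
    """
    best = max(fitnesses)
    median = statistics.median(fitnesses)
    total = best + median
    spread = abs(best - median) / total if total > 0 else 0.0
    if spread <= low:
        # Nearly degenerate population: raise the rate to inject
        # new genetic material.
        return min(rate_max, rate * factor)
    if spread >= high:
        # Diverse population: lower the rate so good genotypes survive.
        return max(rate_min, rate / factor)
    return rate
```

With a rule of this kind, a long plateau (small spread) steadily raises the mutation rate until a fitter genotype appears, after which the rate relaxes again.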
Figure 5 also justifies our earlier claim that more
complex targets (more parameters) require a greater number of
generations in the run. The plots
show how the gradient of the convergence curve lessens as
the number of parameters in the genotype increases. As with any
optimization method, the
number of generations required for a GA to evolve an acceptable
solution increases with the dimension, D, of the search space;
typically it does so in a manner that is highly problem dependent,
but often ends up as a low (order unity) power of D.
Such convergence plots can also provide evidence that we have not
yet evolved a "perfect" match for the target; this may be estimated
by looking at the gradient of the plot at the end of its
evolutionary run. The center and bottom plots in
Fig. 5 show that the evolutionary process may not be
finished.
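The gradient at the end of a run can be quantified with a simple least-squares slope, as in the following sketch. The choice of a logarithmic scale, the window length and the function name are our assumptions; the misfit values must be strictly positive.

```python
import math


def late_log_slope(history, window=50):
    """Least-squares slope of log10(misfit) per generation.

    Only the last `window` generations of the run are used. A slope
    close to zero suggests a plateau; a clearly negative slope suggests
    that the run was still improving when it was stopped.
    """
    tail = history[-window:]
    n = len(tail)
    if n < 2:
        raise ValueError("need at least two generations")
    xs = range(n)
    ys = [math.log10(v) for v in tail]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# A made-up history that falls by a factor of ten every 100 generations.
still_improving = [10.0 ** (-0.01 * g) for g in range(200)]
slope = late_log_slope(still_improving)
```

Note that a near-zero slope does not by itself prove convergence, since a plateau may simply precede the next step.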
Reliable analysis of a "noisy" target must be the benchmark for any
spectral decomposition technique. We therefore compare the performance
of Ga-GA to that of the AMOEBA and CURVEFIT
algorithms in decomposing a "noisy" five Gaussian target, with the
Ga-GA results again being the mean of ten runs. The target is generated
by the same fifteen parameter genotype as case 3 of
Sect. 3.1
(
) to which we now add 15%
"random" noise (see Fig. 6). The noise is set to be
normally distributed about the data with an rms amplitude of
, so
in
Eq. (3).
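As an illustration of how such a noisy target might be produced, the sketch below adds normally distributed noise to a noiseless profile. It rests on an assumption about the noise model: here the standard deviation at each point is taken to be a fixed fraction of the local data value, which is only one possible reading of "normally distributed about the data". The function name and the sample profile are hypothetical.

```python
import random


def add_noise(profile, fraction, seed=0):
    """Return a noisy copy of `profile`.

    Each point y is displaced by a normal deviate with standard
    deviation fraction * |y| (assumed noise model). The seed makes the
    noise realisation reproducible.
    """
    rng = random.Random(seed)
    return [y + rng.gauss(0.0, fraction * abs(y)) for y in profile]


# A made-up noiseless profile with 15% noise added.
clean = [10.0, 40.0, 90.0, 40.0, 10.0]
noisy = add_noise(clean, 0.15, seed=1)
```

Fixing the seed keeps the target identical across all algorithms being compared, so that differences in the returned parameters reflect the algorithms rather than the noise realisation.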
Table 2:
Details of the target parameters (PT), genetically modelled solution
returned by Ga-GA and the deterministic routines for the
fifteen parameter configuration with
normally distributed
random noise. Ga-GA results and CPU times (
) are
the mean of an ensemble of ten runs. The CPU times are
normalised to the CPU time of a CURVEFIT run
The results of the calculations for each algorithm
are shown in Table 2 where Ga-GA achieves the
lowest
(1.889), lower than that of CURVEFIT (12.961) by a factor of almost
seven and lower than that of AMOEBA (18.626) by a factor of about ten. It
must be noted that all three produce "good" parameterisations of the
spectrum given the severe noise present, but bear in mind that the
latter two algorithms are practically given the target parameters
as a starting point, and are hence heavily influenced by the user. This
is definitely not the case with Ga-GA.
Figure 6:
Performance comparison plot between Ga-GA, AMOEBA
and CURVEFIT. They are compared using the target of
Sect. 3.2 with 15% added random noise. See also Table 2
CURVEFIT and AMOEBA also exhibit
another behavioural pattern not observed with Ga-GA: they
will occasionally become "stuck" at points in the solution space
where hope of convergence to the target is lost.
This does not happen in every run, but it indicates to the
user that a single run using either method is not enough to
guarantee a reliable parameterisation.
Figure 6 shows the results of Ga-GA (
),
CURVEFIT (+) and AMOEBA (
) operating
on the fifteen parameter, five Gaussian target. The profile shown for
Ga-GA, as in Sect. 3.1, is the "fittest"
phenotype from the ten different runs. It is clear from the results
in Table 2 and the plots in Fig. 6 that the sharp
features of Gaussian two (at a possible limit of resolution)
present CURVEFIT and AMOEBA with a very awkward test. Indeed,
inspection of the errors quoted in Table 2 reveals
the features that Ga-GA finds most awkward to
"identify": the amplitudes A2, A3 and A4.
We now consider the case where the target has a considerable
background level. A GA approach makes inclusion of such a
background, or continuum, extremely simple. To show this, consider
a parameterisation of the background by the addition of a polynomial of
order n; an example for n=2 (a quadratic) is given in Eq. (4). As
an example, consider a new three Gaussian configuration
[10 90 6 50 70 3 80 40 4] (see Fig. 7) with 5%
noise (
) and background; the
alteration to the fitness evaluation routine is minimal. We add the
quadratic form to the standard phenotype calculation of
Eq. (1), which then becomes:
[Eq. (1) with the quadratic background term in a, b and c added]   (4)
where a, b and c are taken from the adapted genotype, which is formed by
appending them to the Gaussian description parameters.
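To make the change to the phenotype calculation concrete, the sketch below evaluates a sum of Gaussians plus a quadratic background from a single extended genotype. Several details are assumptions rather than statements from the text: the Gaussian form used for each component, the ordering of each triplet as (position, amplitude, width), and the identification of a, b and c with the constant, linear and quadratic coefficients respectively. The function name is hypothetical.

```python
import math


def phenotype(x, genotype, n_gauss):
    """Sum of Gaussians plus a quadratic background at position x.

    Assumed genotype layout (not confirmed by the text):
    [X1, A1, W1, ..., Xn, An, Wn, a, b, c]
    with background a + b*x + c*x**2 appended after the Gaussians.
    """
    value = 0.0
    for i in range(n_gauss):
        pos, amp, width = genotype[3 * i: 3 * i + 3]
        # Assumed Gaussian form; Eq. (1) may use a different width convention.
        value += amp * math.exp(-0.5 * ((x - pos) / width) ** 2)
    a, b, c = genotype[3 * n_gauss: 3 * n_gauss + 3]
    return value + a + b * x + c * x * x


# Three Gaussian configuration and background values quoted for Fig. 7.
target = [10, 90, 6, 50, 70, 3, 80, 40, 4, 30.0, 0.5, 0.002]
profile = [phenotype(x, target, 3) for x in range(101)]
```

The only change relative to a background-free model is the three extra genotype entries and one extra line in the evaluation, which is why the alteration to the fitness routine is so small.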
Figure 7:
Plot of the 3 Gaussian configuration [10 90 6 50 70 3 80 40 4]
and the background parameters, a=30.0, b=0.5, and c=0.002 with a
5% random noise level. See also Table 3
To generate the target the background parameters are assigned the
values a=30.0, b=0.5 and c=0.002.
A plot of the target solution (broken line) and the best phenotype
(
) is shown in Fig. 7. The figure also shows
the profile returned by CURVEFIT (+) and that returned by AMOEBA
(
). Ga-GA's estimates of the background
parameters are a=29.243, b=0.554 and c=0.002 (with respective
errors given below). Ga-GA results were returned after
1000 generations and the mean final
was 0.8664,
with CURVEFIT giving a statistically equivalent fit (0.8600) and
AMOEBA a value roughly a factor of two larger (2.000). The full results of the
parameterisation for all three algorithms are given in Table 3.
Table 3:
Results from Sect. 3.3 for a target (P(T))
with fixed background level and
normally distributed random
noise. Again, Ga-GA results and CPU times (
) are
the mean of an ensemble of ten runs. CPU times are normalised to that
of a CURVEFIT run
Copyright The European Southern Observatory (ESO)