
3 Results

This section details the results of Ga-GA applied to simulated target data sets to which a known level of noise has been added. Section 3.1 discusses the performance of Ga-GA in the absence of data noise (except for very small numerical rounding errors). Sections 3.2 and 3.3 test the performance of Ga-GA against that of the two standard algorithms mentioned earlier, CURVEFIT and AMOEBA, for data with a realistic noise level (Sect. 3.2) and with a noisy background present (Sect. 3.3). Section 3.3 also shows the ease with which additional spectral features may be incorporated into the analysis.

3.1 Application to noiseless target spectra

We use Ga-GA to analyse three noiseless targets, each corresponding to a different Gaussian configuration. For these noiseless cases we replace $\sigma_{\rm data}(\underline{x})$ by 1 in Eq. (3). The three test targets are: 1) a single "wide" Gaussian with the target genotype given by three parameters, $\left[\,X\,A\,W\,\right] = \left[\,50\,100\,20\,\right]$ ; 2) two "joined" Gaussians corresponding to the six parameter genotype $\left[\,40\,100\,20\,\,80\,90\,15\,\right]$ ; and 3) a more complex five Gaussian configuration with the fifteen parameter target genotype given by $\left[\,10\,30\,5\,\,22\,60\,1\,\,26\,40\,3\,\,43\,70\,5\,\,55\,60\,5\, \right]$ .
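As an illustration of how such genotypes map onto target profiles, the following minimal sketch builds the three targets from their $\left[\,X\,A\,W\,\right]$ triples. It rests on assumptions not stated in this section: the width $W$ is treated as the Gaussian standard deviation, the sampling grid `xs` is hypothetical, and the function names are illustrative rather than those of Ga-GA.

```python
import math
from typing import List, Sequence


def gaussian(x: float, centre: float, amplitude: float, width: float) -> float:
    """One Gaussian component; `width` is ASSUMED to be the standard deviation."""
    return amplitude * math.exp(-0.5 * ((x - centre) / width) ** 2)


def phenotype(genotype: Sequence[float], xs: Sequence[float]) -> List[float]:
    """Sum the Gaussians encoded as consecutive [X A W] triples."""
    if len(genotype) % 3 != 0:
        raise ValueError("genotype length must be a multiple of 3")
    triples = [genotype[i:i + 3] for i in range(0, len(genotype), 3)]
    return [sum(gaussian(x, c, a, w) for c, a, w in triples) for x in xs]


# The three noiseless target genotypes quoted in this section.
TARGETS = {
    1: [50, 100, 20],
    2: [40, 100, 20, 80, 90, 15],
    3: [10, 30, 5, 22, 60, 1, 26, 40, 3, 43, 70, 5, 55, 60, 5],
}

xs = list(range(0, 101))  # hypothetical sampling grid
profiles = {case: phenotype(g, xs) for case, g in TARGETS.items()}
```

With this convention the single Gaussian of target 1 peaks at its amplitude, `profiles[1][50] == 100.0`.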

Each case was analysed ten times (to allow performance statistics to be compiled), each run with a different initial population, for a fixed number of generations. It is also possible to configure Ga-GA to run until it achieves a fixed $E(\underline{x})$ , although for certain types of analysis this method is unfavourable (Charbonneau & Knapp 1996). The number of generations used differs from case to case, however, and increases with the complexity of the target solution. Target 3 typically requires a 1200 generation run, which is considerably more than the 200 and 500 generation runs required for targets 1 and 2 respectively.

The returned parametrisation of each target is given in Table 1. The subscript T quantities (e.g. X_T) are the target parameters and the subscript G quantities (e.g. X_G) are the corresponding mean values returned by Ga-GA after multiple fixed generation runs. It is clear from the results presented in Table 1 that Ga-GA obtains a very good representation of each target (within the errors).

**Table 1:** Results for cases 1), 2) and 3) described above. Subscript T quantities indicate target parameters, and subscript G quantities are the mean after multiple evolutionary runs. Similarly, the values of $\langle E(\underline{x}) \rangle$ are the final mean values of $E(\underline{x})$ . The errors for each parameter are calculated as the means of the ten run ensemble
$\begin{tabular} {p{1.3cm}p{1.3cm}p{1.3cm}p{2.3cm}p{2.3cm}p{2.3cm}} \hline\hline ... ...pace & $59.964 \pm 0.014$\space & $5.003 \pm 0.001$\space \\ \hline\end{tabular}$

$\begin{figure} \resizebox {\hsize}{!}{\includegraphics{H0844F2.ps}} \vspace{-2mm}\end{figure}$

Figure 2: Test run for Ga-GA, taken from the ensemble of ten runs, for the noiseless single Gaussian target (solid line) of Case 1 and the profile modelled by Ga-GA ( $\triangle$ )

The errors in the parameters are global error estimates and are calculated in a Monte Carlo fashion, i.e. we perform multiple runs of Ga-GA, each with a different initial population. This is achieved by initializing the random number generator with a different seed for each run (Charbonneau & Knapp 1996). This Monte Carlo approach "forces" Ga-GA to search the parameter space from a different starting point each time, and also allows "mean" values to be calculated for each of the parameters.
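The seeded ensemble procedure can be sketched as follows. This is only an outline under stated assumptions: `toy_run` is a hypothetical stand-in for a single Ga-GA run (it returns the target 1 parameters with a small seeded scatter), and the sample standard deviation is used as the spread estimate, which is an assumption about how the quoted errors are formed.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def ensemble_statistics(
    run: Callable[[int], Sequence[float]], seeds: Sequence[int]
) -> Tuple[List[float], List[float]]:
    """Call `run(seed)` once per seed; return per-parameter mean and sample std."""
    results = [list(run(seed)) for seed in seeds]
    columns = list(zip(*results))  # one tuple of values per parameter
    means = [statistics.mean(col) for col in columns]
    spreads = [statistics.stdev(col) for col in columns]
    return means, spreads


def toy_run(seed: int) -> List[float]:
    """HYPOTHETICAL stand-in for one Ga-GA run: target plus small seeded scatter."""
    rng = random.Random(seed)  # a different seed gives a different "run"
    target = [50.0, 100.0, 20.0]
    return [p + rng.gauss(0.0, 0.01) for p in target]


# Ten runs, each started from a different seed, as in the text.
means, errors = ensemble_statistics(toy_run, seeds=range(10))
```

Replacing `toy_run` by a function that performs a real evolutionary run for a given seed would leave the bookkeeping unchanged.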

Figure 2 shows a plot of target 1 (solid line) and the profile derived from the "fittest" genotype ( $\triangle$ ) after only 200 generations, with $E(\underline{x}) = 2.476 \times 10^{-4}$ . Similarly, Fig. 3 shows the profile constructed from the fittest genotype, $E(\underline{x}) = 3.296 \times 10^{-3}$ , for the double Gaussian configuration of target 2.

$\begin{figure} \resizebox {\hsize}{!}{\includegraphics{H0844F3.ps}} \vspace{-2mm}\end{figure}$

Figure 3: Test run for Ga-GA, taken from the ensemble of ten runs, for the noiseless double Gaussian target (solid line) of Case 2 and the profile modelled by Ga-GA ( $\triangle$ )

$\begin{figure} \resizebox {\hsize}{!}{\includegraphics{H0844F4.ps}} \vspace{-2mm}\end{figure}$

Figure 4: Test run for Ga-GA, taken from the ensemble of ten runs, for the noiseless five Gaussian target (solid line) of Case 3 and the profile modelled by Ga-GA ( $\triangle$ )

Figure 4 demonstrates Ga-GA's handling of the more complex case 3, resulting in $E(\underline{x})=1.984 \times 10^{-4}$ for the fittest genotype after 1200 generations. For these test cases the final values of $E(\underline{x})$ will be limited by numerical precision, so doubling the number of generations would possibly attain no better values than those given. It must be emphasised that these results are for one particular run of Ga-GA from the ensemble of ten runs.

We show, in Fig. 5, the decrease in $E(\underline{x})$ with generation number for the full ensemble of runs for each of the test cases above. At each generation step the mean $E(\underline{x})$ (solid line), the extrema (dashed lines) and the median (dotted line) are indicated. These plots demonstrate the power of Genetic Algorithms as optimization tools.

$\begin{figure} \resizebox {\hsize}{!}{\includegraphics{H0844F5.ps}} \vspace{-2mm}\end{figure}$

Figure 5: Convergence of $E(\underline{x})$ against generation number for each of the three cases in Sect. 3.1. Top panel: Case 1 (single Gaussian), Middle panel: Case 2 (two Gaussians) and Bottom panel: Case 3 (five Gaussians). For each generation step the mean $E(\underline{x})$ (solid line), median (dotted line) and extrema (dashed lines), for the ten run ensemble, are indicated. When a relatively "poor" parameterisation is present in the ensemble, the difference between the median and mean of $E(\underline{x})$ is demonstrably affected; this effect is evident in the top and bottom panels
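The per-generation ensemble statistics plotted in Fig. 5 can be computed as in the sketch below. The $E(\underline{x})$ histories used here are made-up numbers chosen only to show how a single lagging run pulls the mean away from the median; they are not results from Ga-GA.

```python
import statistics
from typing import Dict, List, Sequence


def convergence_summary(histories: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Per-generation mean, median, min and max of E over an ensemble of runs."""
    per_generation = list(zip(*histories))  # one tuple of E values per generation
    return {
        "mean": [statistics.mean(g) for g in per_generation],
        "median": [statistics.median(g) for g in per_generation],
        "min": [min(g) for g in per_generation],
        "max": [max(g) for g in per_generation],
    }


# Made-up E histories (four runs, three generations); the last run lags behind.
histories = [
    [1.0, 0.10, 0.01],
    [1.0, 0.12, 0.01],
    [1.0, 0.09, 0.01],
    [1.0, 0.90, 0.50],
]
summary = convergence_summary(histories)
```

In this toy ensemble the final median stays at the value reached by the three converged runs, while the final mean is more than ten times larger because of the one poor run.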

The steplike structure is clearly visible in all three plots, although to a much greater extent in the uppermost plot. Such steps occur when Ga-GA suddenly obtains a new "fitter" value for one (or more) parameter(s). The long flat "plateaus" are intervals in which the current "best" individual in the population has not changed, or in which the population is largely degenerate, i.e. all the individuals have very similar genotypes. The jumps that end a plateau are driven by mutation: the mutation rate has been allowed to increase, and thus introduces new genetic material at a higher frequency.

Figure 5 also justifies our earlier claim that more complex targets (more parameters) require a greater number of generations in the run. As with any optimization method, the plots show how the gradient of $E(\underline{x})$ lessens as the number of parameters in the genotype increases. The number of generations required for a GA to evolve an acceptable solution increases with the dimension, D, of the search space; the scaling is highly problem dependent, but is often a low (order unity) power of D. Such convergence plots can also indicate whether a "perfect" match for the target has yet been evolved; this may be estimated from the gradient of the plot at the end of its evolutionary run. The center and bottom plots in Fig. 5 show that the evolutionary process may not be finished.

3.2 Application to a "noisy" target spectrum

Reliable analysis of a "noisy" target must be the benchmark for any spectral decomposition technique. We therefore compare the performance of Ga-GA to that of the AMOEBA and CURVEFIT algorithms in decomposing a "noisy" five Gaussian target, with the Ga-GA results again being the mean of ten runs. The target is generated by the same fifteen parameter genotype as case 3 of Sect. 3.1 ( $\left[\,10\,30\,5\,\,22\,60\,1\,\,26\,40\,3\,\,43\,70\,3\,\,55\,60\,5\,\right]$ ), to which we now add $15\%$ "random" noise. The noise is set to be normally distributed about the data with an rms amplitude of $15\%$ , so $\sigma_{\rm data}(\underline{x}) = 0.15\,C(\underline{x})$ in Eq. (3).
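A minimal sketch of how such proportional noise might be generated is given below. It assumes that the noise standard deviation at each point is the stated fraction of the noise-free value, and it uses a flat, made-up "spectrum" so that the recovered rms can be checked easily; the function name and seed are illustrative only.

```python
import random
from typing import List, Sequence


def add_fractional_noise(clean: Sequence[float], fraction: float, seed: int) -> List[float]:
    """Add zero-mean Gaussian noise with standard deviation fraction * |value|."""
    rng = random.Random(seed)  # seeded so the noisy target is reproducible
    return [c + rng.gauss(0.0, fraction * abs(c)) for c in clean]


clean = [100.0] * 20000  # flat, made-up "spectrum" used only for the check
noisy = add_fractional_noise(clean, 0.15, seed=1)

# The rms of the relative residuals should come out close to the requested 15%.
relative = [(n - c) / c for n, c in zip(noisy, clean)]
rms = (sum(r * r for r in relative) / len(relative)) ** 0.5
```

The same function applied to a five Gaussian profile would give a noisy target of the kind analysed in this section, with larger absolute scatter at the line peaks than in the wings.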

**Table 2:** Details of the target parameters ( $P_{T}$ ), the genetically modelled solution returned by *Ga-GA* and the solutions of the deterministic routines for the fifteen parameter configuration with $15\%$ normally distributed random noise. *Ga-GA* results and CPU times ( $T_{\rm CPU}$ ) are the mean of an ensemble of ten runs. The CPU times are normalised to the CPU time of a CURVEFIT run
$\begin{tabular} {ccccc} \hline\hline $P$\space & $P_{T}$\space & \small{AMOEBA} ... ...space & & $18.626$\space & $12.961$\space & $1.889$\space \\ \hline\end{tabular}$

The results of the calculations for each algorithm are shown in Table 2, where Ga-GA achieves the lowest $E(\underline{x})$ (1.889), lower than that of CURVEFIT (12.961) by a factor of almost seven and lower than that of AMOEBA (18.626) by a factor of about ten. It must be noted that all three produce "good" parameterisations of the spectrum given the severe noise present, but bear in mind that the latter two algorithms are practically given the target parameters as a starting point, and are hence heavily influenced by the user. This is definitely not the case with Ga-GA.
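The ratios between the final misfits can be checked directly from the $E(\underline{x})$ values quoted from Table 2; the short snippet below does only this arithmetic.

```python
# Final E values as quoted in the text from Table 2.
E_FINAL = {"AMOEBA": 18.626, "CURVEFIT": 12.961, "Ga-GA": 1.889}

# Ratio of each algorithm's final E to that of Ga-GA.
ratios = {name: value / E_FINAL["Ga-GA"] for name, value in E_FINAL.items()}
for name, ratio in ratios.items():
    print(f"{name:8s} E / E(Ga-GA) = {ratio:.2f}")
```

This gives ratios of about 9.86 for AMOEBA and 6.86 for CURVEFIT.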

$\begin{figure} \resizebox {\hsize}{!}{\includegraphics{H0844F6.ps}} \vspace{-2mm}\end{figure}$

Figure 6: Performance comparison plot between Ga-GA, AMOEBA and CURVEFIT. They are compared using the target of Sect. 3.2 with 15% added random noise. See also Table 2

CURVEFIT and AMOEBA also exhibit another behavioural pattern not observed with Ga-GA; they will occasionally become "stuck" at points in the solution space where hope of convergence to the target is lost. This does not happen in every run, but indicates to the user that a single run using either method is not enough to guarantee a reliable parameterisation.

Figure 6 shows the results of Ga-GA ( $\ast$ ), CURVEFIT (+) and AMOEBA ( $\diamondsuit$ ) operating on the fifteen parameter, five Gaussian target. The profile shown for Ga-GA, as in Sect. 3.1, is the "fittest" phenotype from the ten different runs. It is clear from the results in Table 2 and the plots in Fig. 6 that the sharp features of Gaussian two (at a possible limit of resolution) present CURVEFIT and AMOEBA with a very awkward test. Indeed, by inspection of the errors quoted in Table 2 it is possible to see the features that Ga-GA finds most awkward to "identify": these are the amplitudes $A_{2}$ , $A_{3}$ and $A_{4}$ .

3.3 Application to a target with a background level

We now consider the case where the target has a considerable background level. A GA approach makes inclusion of such a background, or continuum, extremely simple. To show this, consider a parameterisation of the background by the addition of a polynomial of order n; the quadratic case, n=2, is given in Eq. (4). As an example, consider a new three Gaussian configuration $\left[\,10\,90\,6\,\,50\,70\,3\,\,80\,40\,4\,\right]$ with 5% noise ( $\sigma_{\rm data}(\underline{x})=0.05\,C(\underline{x})$ ) and background; the alteration to the fitness evaluation routine is minimal. We add the quadratic form to the standard phenotype calculation of Eq. (1), which then becomes:

$\begin{displaymath} P(\underline{x})_{j} = a + bx + cx^{2} +\,\sum_{i=1}^{N}\,G_{i}(x)\end{displaymath}$

(4)

where a, b, and c are taken from the adapted genotype by adding $\left[\,a\,b\,c\,\right]$ to the Gaussian description parameters.

$\begin{figure} \resizebox {\hsize}{!}{\includegraphics{H0844F7.ps}}\end{figure}$

Figure 7: Plot of the 3 Gaussian configuration [10 90 6 50 70 3 80 40 4] and the background parameters, a=30.0, b=0.5, and c=0.002 with a 5% random noise level. See also Table 3

To generate the target the background parameters are assigned the values a=30.0, b=0.5 and c=0.002.
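The extended phenotype of Eq. (4) can be sketched as below. Two points are assumptions rather than statements from the text: the background parameters $\left[\,a\,b\,c\,\right]$ are placed at the end of the genotype, and the width $W$ of each Gaussian is treated as its standard deviation. The sampling grid `xs` and the function name are likewise hypothetical.

```python
import math
from typing import List, Sequence


def phenotype_with_background(genotype: Sequence[float], xs: Sequence[float]) -> List[float]:
    """Evaluate a + b*x + c*x**2 plus a sum of Gaussians, in the spirit of Eq. (4).

    ASSUMED layout: Gaussian [X A W] triples first, then [a b c] at the end.
    """
    if len(genotype) < 3 or len(genotype) % 3 != 0:
        raise ValueError("expected [X A W] triples followed by [a b c]")
    *gauss_part, a, b, c = genotype
    triples = [gauss_part[i:i + 3] for i in range(0, len(gauss_part), 3)]
    profile = []
    for x in xs:
        background = a + b * x + c * x ** 2
        lines = sum(
            amp * math.exp(-0.5 * ((x - centre) / width) ** 2)  # width ASSUMED = sigma
            for centre, amp, width in triples
        )
        profile.append(background + lines)
    return profile


# Three Gaussian target of this section with background a=30.0, b=0.5, c=0.002.
genotype = [10, 90, 6, 50, 70, 3, 80, 40, 4, 30.0, 0.5, 0.002]
xs = list(range(0, 101))  # hypothetical sampling grid
model = phenotype_with_background(genotype, xs)
```

Because the background enters only through three extra genotype entries, the change relative to a pure Gaussian phenotype is confined to the single `background` line, which reflects how small the alteration to the fitness evaluation is.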

A plot of the target solution (broken line) and the best phenotype ( $\ast$ ) is shown in Fig. 7. The figure also shows the profile returned by CURVEFIT (+) and that returned by AMOEBA ( $\diamondsuit$ ). Ga-GA's estimates of the background parameters are a=29.243, b=0.554 and c=0.002 (with respective errors given below). Ga-GA results were returned after 1000 generations and the mean final $E(\underline{x})$ was 0.8664; CURVEFIT gave a statistically equivalent fit (0.8600), while the AMOEBA fit was poorer by a factor of about two (2.000). The full results of the parameterisation for all three algorithms are given in Table 3.

**Table 3:** Results from Sect. 3.3 for a target ( $P_{T}$ ) with fixed background level and $5\%$ normally distributed random noise. Again, *Ga-GA* results and CPU times ( $T_{\rm CPU}$ ) are the mean of an ensemble of ten runs. CPU times are normalised to that of a CURVEFIT run
$\begin{tabular} {ccccc} \hline\hline $P$\space & $P_{T}$\space & AMOEBA & CURVEF... ...space & & $2.000$\space & $0.8600$\space & $0.8664$\space \\ \hline\end{tabular}$
