# 3. The algorithm

This chapter describes the basic steps of the algorithm; the mathematical details are given in the appendix. The following formulae provide a clear mathematical definition. The whole data set has to have the following form:

This equation can be split into two parts in order to separate one component from the rest of the data set I:

f(x) represents the form and position of the overlapping main structure, and si(x), i=1,...,n, represent the substructures. h and gi are the amplitudes (multipliers) of the respective structures.
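As a reading aid, the symbol definitions above are consistent with a model of the following shape. This is a sketch inferred from the roles of f, si, h and gi, not a reproduction of the original equations; the name S(x) for the combined substructures is introduced here only for illustration.

```latex
% Assumed form, inferred from the symbol definitions in the text
\begin{align}
  I(x) &= h\, f(x) + \sum_{i=1}^{n} g_i\, s_i(x) \\
% Main structure separated from the remainder S(x) (hypothetical name)
  I(x) &= h\, f(x) + S(x), \qquad S(x) = \sum_{i=1}^{n} g_i\, s_i(x)
\end{align}
```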

The following inequalities form the restrictions:

These restrictions are normally not a handicap, for two reasons:

1. Data sets based on histograms (e.g. counting values) always have positive values.
2. It is often possible to shift a data set to values larger than zero.
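The second point can be illustrated with a short sketch. The function name `shift_to_positive` and the `margin` argument are hypothetical; the text does not prescribe how the shift is chosen, so the choice below (move the minimum to `margin`) is only one possibility.

```python
def shift_to_positive(values, margin=1.0):
    """Shift a data set so that all values are larger than zero.

    Assumption: if the minimum is already positive nothing is changed;
    otherwise the whole set is moved so that its minimum equals `margin`.
    Returns the shifted values and the shift that was applied.
    """
    lowest = min(values)
    shift = margin - lowest if lowest <= 0 else 0.0
    return [v + shift for v in values], shift


shifted, shift = shift_to_positive([-2.0, 0.5, 3.0])
```

The applied shift is returned so that it can be undone, or taken into account, after the decomposition.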

The algorithm requires the following main inputs:

• The whole data set I(x).
• A first estimate f0(x) of f(x), the superimposed or overlapping main structure.

The estimate of f(x) concerns only the form and position of the main structure, not the intensity it has in the whole data set. Further parameters of the algorithm are flags which indicate whether each of the two sets has to be smoothed.

The algorithm is an iteration consisting of the following steps:

## 3.1. The preparation phase

Depending on the flags mentioned above, smoothed versions of the sets I and f are built (Is for the data set and a corresponding smoothed version of the estimated main structure). With these sets, errors (e.g. so-called outliers) are eliminated.

The next step is the calculation of an offset Of. If an offset over the whole data set can be found, it has to be taken into account in the subsequent calculations.
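The text does not specify the smoothing method or how the offset is determined. The sketch below is therefore only an illustration under stated assumptions: a running median stands in for the smoothing (it suppresses isolated outliers), and the offset is taken as the lowest value of the smoothed set. The names `smooth`, `estimate_offset` and `half_width` are hypothetical.

```python
from statistics import median


def smooth(values, half_width=1):
    """Running median; the window shrinks at the edges of the set.

    Assumption: a median filter is used here as a stand-in for the
    smoothing step, because it removes isolated outliers.
    """
    n = len(values)
    return [
        median(values[max(0, k - half_width):min(n, k + half_width + 1)])
        for k in range(n)
    ]


def estimate_offset(smoothed):
    """Assumed offset estimate: the lowest value of the smoothed set."""
    return min(smoothed)


data = [2.0, 2.0, 9.0, 2.0, 3.0, 4.0, 3.0, 2.0]  # 9.0 is an outlier
data_s = smooth(data)
offset = estimate_offset(data_s)
```

In this example the isolated spike at the third sample no longer appears in `data_s`, and the offset found is 2.0.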

Referring to Eq. (1), the algorithm has to estimate the parameter h. To obtain a starting value h0 for the iteration, h0 is calculated as follows:

The value of the parameter in this expression depends on the estimated error in the data set. The higher the assumed error (or the ratio between the error and the data set), the higher the value of this parameter has to be.

After these preparations the iteration starts.

## 3.2. The iteration

To obtain hi+1, a function G is calculated which represents a ratio between the whole data set Is and the estimated main structure:

The values of G lie between -1 and arbitrarily large positive values (cf. Appendix, points 1 and 2). In regions where no substructure contributes to the data set, and if the real amplitude h has been found, G(x)=0. The value increases in regions where the substructures become higher than the estimated main structure. This can lead to extremely high values where the main structure is much lower than the substructures and the errors (mostly at the edges of the main structure). In regions where the estimated main component is higher than the substructures, the values of G lie between 0 and 1 (cf. Appendix, point 3).
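The properties listed above (a lower bound of -1, and G(x)=0 where the data consist only of the correctly scaled main structure) are consistent with G being the ratio of the data to the scaled main structure, minus one. The following minimal sketch assumes exactly that form; it is an inference from the stated properties, and the function and argument names are hypothetical.

```python
def ratio_function(data, main_shape, h):
    """Assumed form: G(x) = I(x) / (h * f(x)) - 1.

    Samples where the main structure is zero are returned as None,
    because the ratio is undefined there.
    """
    return [
        i_x / (h * f_x) - 1.0 if f_x > 0 else None
        for i_x, f_x in zip(data, main_shape)
    ]


f = [1.0, 2.0, 4.0, 2.0, 1.0]        # form of the main structure
sub = [0.0, 0.0, 0.0, 3.0, 6.0]      # substructure on the right flank
h_true = 3.0
data = [h_true * f_x + s_x for f_x, s_x in zip(f, sub)]

g = ratio_function(data, f, h_true)
```

With the true amplitude, `g` is zero wherever no substructure is present, lies between 0 and 1 where the main structure dominates (fourth sample), and exceeds 1 where the substructure dominates (last sample, at the edge of the main structure).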

This function is the basis of the so-called correcting parameter a, which is a kind of weighted average of selected values of G. Two parameters are introduced which select the values of G used to build a:

1. The higher the value of f0, the more it influences the main structure of the data set I. A first parameter takes this into account. If there are no substructures, its value could be set to zero.
2. A second parameter is introduced in order to exclude all values of G which exceed a certain value.

Further, a weighting function wa favours those values of G where f0 has its highest values.
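The selection and weighting just described can be sketched as a thresholded weighted average. Everything specific here is an assumption made for illustration: the parameter names `f_min` and `g_max`, the interpretation of the first parameter as a lower threshold on f0, and the choice of f0 itself as the weight.

```python
def correcting_parameter(g_values, f0, f_min, g_max):
    """Weighted average of selected values of G.

    Assumptions (not taken from the text):
      - a sample is used only if f0 >= f_min and G <= g_max,
      - the weight of a sample is its f0 value.
    Returns 0.0 if no sample passes the selection.
    """
    selected = [
        (g, f) for g, f in zip(g_values, f0)
        if g is not None and f >= f_min and g <= g_max
    ]
    total_weight = sum(f for _, f in selected)
    if total_weight == 0:
        return 0.0
    return sum(g * f for g, f in selected) / total_weight


g_values = [0.0, 0.0, 0.0, 0.5, 2.0]
f0 = [1.0, 2.0, 4.0, 2.0, 1.0]

a_selective = correcting_parameter(g_values, f0, f_min=2.0, g_max=1.0)
a_all = correcting_parameter(g_values, f0, f_min=0.0, g_max=10.0)
```

In this example the selective thresholds drop the edge sample with the large G value, so `a_selective` is smaller than `a_all`; this reflects the stated purpose of the two parameters, which is to keep substructure-dominated regions from driving the correction.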

Based on G, the two selection parameters, and the weighting function wa, the correcting parameter a has the following form:

With this parameter, a first improvement of the amplitude hi can be reached:

After the calculation of hi1 it is possible to build a new set by subtracting the estimated main structure from the data set I. No values less than zero should remain, due to condition 2. If such values exist, the main structure has been overestimated. This set can therefore be used to calculate a second parameter b which corrects such an overestimation:

n represents the number of elements of this set. The new amplitude hi+1 is now calculated as follows:

## 3.3. The end of the iteration

The end of the whole iteration is determined by one of the following three factors:

• The difference between hi and hi+1 is smaller than a given limit.
• The variance of all iterated amplitudes is lower than a given limit.
• The variance begins to increase. In this case, the form and/or position of the main component is wrong.

The second and third conditions are necessary for the following reasons:

It is possible that the algorithm does not converge to an exact value. If the algorithm settles on two fixed values and oscillates between them (near the solution), the variance criterion ends this oscillation after a certain time. If the form or position of the main component is wrong, the variance increases after some iterations and the algorithm does not converge.
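The three stopping criteria can be sketched as a single check on the list of amplitudes obtained so far. This is a minimal illustration, not the original implementation: the function name, the returned labels, the use of the population variance, and the `min_iterations` guard (which keeps the variance test from firing on the first few values) are assumptions.

```python
from statistics import pvariance


def stop_reason(amplitudes, step_limit, variance_limit, min_iterations=3):
    """Return the reason for stopping, or None if the iteration continues.

    amplitudes: all iterated amplitudes so far, oldest first.
    """
    if len(amplitudes) < min_iterations:
        return None
    # Criterion 1: the last two amplitudes differ by less than the limit.
    if abs(amplitudes[-1] - amplitudes[-2]) < step_limit:
        return "step below limit"
    current = pvariance(amplitudes)
    previous = pvariance(amplitudes[:-1])
    # Criterion 2: variance of all iterated amplitudes is below the limit
    # (this also ends an oscillation between two nearby values).
    if current < variance_limit:
        return "variance below limit"
    # Criterion 3: the variance begins to increase (wrong form or position).
    if current > previous:
        return "variance increasing"
    return None


converged = stop_reason([1.0, 2.0, 2.0005], 1e-3, 1e-6)
oscillating = stop_reason([2.0, 2.1, 2.0, 2.1], 1e-3, 0.01)
diverging = stop_reason([1.0, 2.0, 4.0, 8.0], 1e-3, 1e-6)
running = stop_reason([1.0, 3.0, 2.0, 2.5], 1e-3, 0.01)
```

Because the text states that the variance increases only "after some iterations" in the failure case, a practical implementation would probably use a larger `min_iterations` than the value chosen here.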