Up: Covered data structures

# 2. Methods currently in use

Many solutions exist for determining the number, form and position of the different structures, together with an estimate of the error. Most of these solutions are specializations of two main methods:

In many cases the maximum likelihood method (ML algorithm) is used to estimate unknown parameters (Sutherland & Saunders 1992). Boller also uses an algorithm based on the likelihood statistic and works with four-dimensional (artificial) Gaussian distributions.
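To illustrate the principle only, the following minimal sketch estimates the parameters of a single one-dimensional Gaussian by maximum likelihood. It is not the algorithm of Sutherland & Saunders (1992) or of Boller; the function names `gaussian_log_likelihood` and `gaussian_ml_estimate` are hypothetical and chosen for this example.

```python
import math


def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of the data under a 1-D Gaussian N(mu, sigma^2)."""
    n = len(data)
    squared_dev = sum((x - mu) ** 2 for x in data)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - squared_dev / (2.0 * sigma ** 2))


def gaussian_ml_estimate(data):
    """Closed-form ML estimates: sample mean and (biased) standard deviation."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values, not real data
    mu_hat, sigma_hat = gaussian_ml_estimate(sample)
    print(mu_hat, sigma_hat, gaussian_log_likelihood(sample, mu_hat, sigma_hat))
```

For a single Gaussian the maximum has a closed form; for mixtures of several Gaussians the likelihood generally has to be maximized numerically.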

Density estimation is another frequently used way to estimate the parameters or shapes of the distributions that make up a mixed distribution. One of the algorithms based on density estimation is the kernel method (De Jager et al. 1986).
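As a rough illustration of the kernel idea, the sketch below builds a one-dimensional density estimate by placing a Gaussian kernel on every sample. The choice of a Gaussian kernel, the fixed bandwidth and the name `kernel_density` are assumptions made for this example and are not taken from De Jager et al. (1986).

```python
import math


def kernel_density(samples, bandwidth):
    """Return a function x -> estimated density, using Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of one Gaussian bump per sample, normalized to unit area.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


if __name__ == "__main__":
    # Two artificial clumps; the values are illustrative only.
    samples = [-2.1, -1.9, -2.0, 2.0, 2.1, 1.9]
    density = kernel_density(samples, bandwidth=0.3)
    for x in (-2.0, 0.0, 2.0):
        print(x, density(x))
```

No parametric model is fitted here; the two clumps appear as two maxima of the estimated density, and the bandwidth controls how strongly they are smoothed.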

The specialized solutions require one or more restrictions, which can affect, for example:

• the number of substructures,
• special parameterizable models such as Gaussian distributions,
• differentiability or
• the number of dimensions.

The most common method for categorizing objects in a color-color diagram is described, for example, in Walker & Cohen (1988) and Walker et al. (1989). Starting from a sample of already identified objects, the authors derive a distribution function using different methods. This function is often based on the normal distribution. Depending on how the results are presented, the following is usually given:

• The parameters of the distribution functions for each group of objects found in the diagram.
• A box (drawn in the diagram) giving the maximal limits of the regions where such a type of object should be found.

Walker et al. (1989) give a diagram with boxes specifying the limits mentioned above. In addition, they give a table containing the parameters $\mu$ and $\sigma$ of Gaussian distributions for many types of objects and for each dimension of the color-color diagram ([12]-[25], [25]-[60], [60]-[100]).
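The box presentation amounts to a simple membership test per color. The sketch below shows this for two colors; the group names and all limits are invented for illustration and are not values from Walker et al. (1989).

```python
def in_box(point, box):
    """point: tuple of colors; box: tuple of (low, high) limits per color."""
    return all(low <= value <= high for value, (low, high) in zip(point, box))


def classify(point, boxes):
    """Return the names of all boxes that contain the point."""
    return [name for name, box in boxes.items() if in_box(point, box)]


# Hypothetical limits in ([12]-[25], [25]-[60]); not taken from the literature.
boxes = {
    "type A": ((0.0, 1.0), (0.0, 1.0)),
    "type B": ((0.8, 2.0), (0.5, 1.5)),
}

if __name__ == "__main__":
    print(classify((0.5, 0.5), boxes))  # inside one box only
    print(classify((0.9, 0.7), boxes))  # inside the overlap of both boxes
    print(classify((3.0, 3.0), boxes))  # outside every box
```

A point in the overlap of two boxes is assigned to both groups without any weighting, since a box carries no information about how likely each group is at that position.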

The disadvantages of such a kind of presentation are the following:

• In the table of Walker et al. (1989), the parameters are given only for one-dimensional Gaussian distributions. Building a multidimensional Gaussian distribution from these parameters leads to errors because a skewness factor is missing. Furthermore, the relations between the different Gaussian distributions are not given. It is therefore not possible to calculate contamination ratios or to distinguish clearly between the different groups.
• The graphical presentation with boxes has the same disadvantages as described above. Furthermore, the limits never represent natural regions.
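The first disadvantage can be made concrete with a small sketch. It assumes that the missing term corresponds to the correlation coefficient `rho` between two colors; this reading, like the function names, is an assumption of the example and not a statement from the cited papers. The code compares the product of two one-dimensional Gaussians with a full bivariate Gaussian.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bivariate_gaussian_pdf(x, y, mu_x, sigma_x, mu_y, sigma_y, rho):
    """Density of a 2-D Gaussian with correlation coefficient rho (|rho| < 1)."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_r2 = 1.0 - rho * rho
    exponent = -(zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * one_minus_r2)
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_r2)
    return math.exp(exponent) / norm


if __name__ == "__main__":
    # Product of the two 1-D marginals: all that a per-dimension table provides.
    product = gaussian_pdf(1.0, 0.0, 1.0) * gaussian_pdf(1.0, 0.0, 1.0)
    # Full 2-D density with an illustrative correlation of 0.8.
    correlated = bivariate_gaussian_pdf(1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.8)
    print(product, correlated)
```

With `rho = 0` the two expressions agree. For correlated colors the product of the marginals underestimates the density along the correlation axis and overestimates it across that axis.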

The new algorithm can be used with natural regions. It is not necessary to fit any distribution function to the data set. To investigate, for example, the occupation zones (OZs) of different types of objects in a color-color diagram, two approaches are conceivable:

1. Use the 4580 so-called unassociated IRAS sources defined in Walker & Cohen (1988) as data set $I$. To calculate the underlying substructures, use the same sets $s_i$ of identified objects as given in Walker et al. (1989). Applying the algorithm to $I$ and $s_i$ yields a distribution function that indicates not only the regions where a given group of objects should be found but also the contamination of the different groups.
2. If a group shows a large dispersion, the Gaussian parameters can be used as starting values instead of $s_i$.
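What a contamination estimate could look like once the relative weights of the groups are known is sketched below in one dimension. This is not the algorithm discussed here: the definition of contamination as the fraction of objects in an interval that belong to the other group, the Gaussian group shapes and the name `contamination` are all assumptions made for this example.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a 1-D Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def contamination(low, high, target, other):
    """Fraction of objects in [low, high] that belong to `other`.

    target, other: tuples (weight, mu, sigma) describing each group.
    """
    def mass(group):
        weight, mu, sigma = group
        return weight * (normal_cdf(high, mu, sigma) - normal_cdf(low, mu, sigma))

    target_mass, other_mass = mass(target), mass(other)
    return other_mass / (target_mass + other_mass)


if __name__ == "__main__":
    # Illustrative groups: (relative weight, mean color, dispersion).
    group_a = (1.0, 0.0, 1.0)
    group_b = (0.5, 2.0, 1.0)
    print(contamination(-1.0, 1.0, group_a, group_b))
```

The relative weights enter the result directly, which is why per-group parameters alone, without the relations between the groups, are not sufficient for such an estimate.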

We are currently investigating the capabilities of the algorithm with respect to this kind of color-color diagram (Kienel & Kimeswenger, in preparation).


Copyright by the European Southern Observatory (ESO)