We use a Neural Network (hereafter NN) method to perform the cleaning. This requires the construction of a large training sample. We build it by cross-identifying each object of our preliminary catalogue with known stars or galaxies. The known stars are taken from the SAO catalogue. The known galaxies are taken from the LEDA database.
The cross-identification is based on the J2000 equatorial coordinates.
The identification of two objects is accepted only when a single object lies
within a radius of 10''.
This strict constraint removes interacting objects, which are not suitable
for a training sample.
So we obtain 54 186 objects classified as galaxies ("G'') and 90 339 classified
as stars ("S'').
Further, 2105 objects are classified as defects ("D'') because of their discrepant
characteristics (e.g., a very elongated matrix with nli/npx > 25 or npx/nli > 25).
Objects with [...] are not used for the construction of the training sample.
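The unique-match criterion described above can be sketched as follows. This is a minimal illustration with hypothetical coordinates; a flat-sky separation formula (RA scaled by cos δ) is assumed, which is adequate at the 10'' scale involved:

```python
import numpy as np

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Approximate angular separation in arcsec for small separations,
    with the RA difference scaled by cos(dec). Inputs in degrees."""
    dra = (ra1 - ra2) * np.cos(np.radians(dec1))
    ddec = dec1 - dec2
    return np.hypot(dra, ddec) * 3600.0

def unique_match(ra, dec, ref_ra, ref_dec, radius=10.0):
    """Return the index of the matching reference object, or None unless
    EXACTLY one reference object lies within `radius` arcsec (the strict
    criterion used to build the training sample)."""
    sep = angular_sep_arcsec(ra, dec, np.asarray(ref_ra), np.asarray(ref_dec))
    inside = np.flatnonzero(sep < radius)
    return int(inside[0]) if inside.size == 1 else None

# toy reference list (hypothetical J2000 coordinates, degrees)
ref_ra = [150.0000, 150.0020, 210.5]
ref_dec = [2.2000, 2.2001, -5.0]

print(unique_match(150.0000, 2.2000, ref_ra, ref_dec))   # two refs within 10'': rejected
print(unique_match(210.5001, -5.0001, ref_ra, ref_dec))  # exactly one: accepted
```

The second call succeeds because only one reference object falls inside the 10'' radius; the first is rejected even though a counterpart exists, exactly to exclude close or interacting pairs.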
The NN has three outputs, G, S and D, for galaxies, stars and defects, respectively. The choice of the input parameters of the NN is important: they must be highly discriminant with respect to the outputs. There is no general rule for choosing these parameters. We define seven parameters which will be tested with our training sample:
Figure 7: Decomposition of a matrix into nine rectangles. This decomposition is used to define the diffraction-cross parameter dc and the defect parameter df.

Figure 8: Histogram of the square of the external perimeter divided by the matrix surface area, for stars (solid line) and galaxies (dashed line).

Figure 9: Histogram of the diffraction-cross parameter, for stars (solid line) and galaxies (dashed line).

Figure 10: Histogram of the object surface area divided by the matrix surface area, for stars (solid line) and galaxies (dashed line).
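Two of the discriminant parameters shown in Figs. 8 and 10 can be computed from the object's pixel matrix roughly as follows. This is a sketch under our own assumptions: the paper's exact perimeter definition is not reproduced here, so a 4-neighbour border-pixel count is assumed:

```python
import numpy as np

def shape_parameters(mask):
    """Hypothetical implementation of two shape parameters:
    (squared external perimeter) / (matrix surface area), and
    (object surface area) / (matrix surface area)."""
    mask = np.asarray(mask, dtype=bool)
    # external perimeter: object pixels with at least one non-object
    # 4-neighbour (matrix edges count as non-object)
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = np.count_nonzero(mask & ~interior)
    matrix_area = mask.size
    object_area = np.count_nonzero(mask)
    return perimeter**2 / matrix_area, object_area / matrix_area

# a compact 4x4 object inside an 8x8 matrix (star-like: low perimeter^2/area)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
print(shape_parameters(mask))  # (2.25, 0.25)
```

A diffuse, irregular object accumulates border pixels faster than area, so its perimeter-squared ratio is larger: this is why the histograms of Fig. 8 separate stars from galaxies.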
After some trial and error, we adopted the NN represented in Fig. 11. Because there are only three output parameters (G, S, D), we adopted a simple NN with a single intermediate layer of 10 neurones, each receiving the 7 input parameters and feeding the 3 outputs. There are thus 7 × 10 + 10 × 3 = 100 free weights W connecting neurones of adjacent layers. The input is a vector with seven components and the output a vector with three components; the expected output vectors are (1, 0, 0), (0, 1, 0) and (0, 0, 1).
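The weight count of this 7-10-3 architecture can be checked in a few lines. Random initialization in [-1, 1] follows the training procedure described below; bias terms are assumed absent, since the text counts exactly 100 weights:

```python
import numpy as np

rng = np.random.default_rng(0)
# one intermediate layer of 10 neurones: 7 inputs -> 10 hidden -> 3 outputs,
# weights drawn uniformly in [-1, 1]
W1 = rng.uniform(-1.0, 1.0, size=(7, 10))   # input  -> hidden connections
W2 = rng.uniform(-1.0, 1.0, size=(10, 3))   # hidden -> output connections
print(W1.size + W2.size)  # 100 free weights, as stated in the text
```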
The different steps of the NN training are the following.
First of all, the weights are randomly chosen between -1 and 1.
Then the training sample is read and each individual parameter is normalized
by subtracting its mean and dividing by its standard deviation, both calculated
from the whole sample.
Each object is thus described by a vector of seven normalized input components Pi.
Then, for each object, the seven input parameters Pi are entered and
propagated down to the last layer (the output layer).
For this purpose, the input X of a given neurone is the weighted sum of its
input connections, while its output Y is calculated through a non-linear
sigmoid function:

$$X_j = \sum_i W_{ij}\, Y_i \eqno(3)$$

$$Y_j = \frac{1}{1 + e^{-X_j}} \eqno(4)$$
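Equations (3) and (4), together with the normalization step, give the following forward-propagation sketch. Bias terms are omitted to match the 100-weight count, and the sample data are toy values; both are assumptions of this illustration:

```python
import numpy as np

def sigmoid(x):
    # Eq. (4): Y = 1 / (1 + exp(-X))
    return 1.0 / (1.0 + np.exp(-x))

def normalize(params):
    """Subtract the mean and divide by the standard deviation of each of
    the seven parameters, both computed over the whole sample."""
    params = np.asarray(params, dtype=float)
    return (params - params.mean(axis=0)) / params.std(axis=0)

def forward(p, W1, W2):
    """Propagate the normalized inputs to the (G, S, D) output layer."""
    hidden = sigmoid(p @ W1)     # Eq. (3): weighted sum into each hidden neurone
    return sigmoid(hidden @ W2)  # same rule applied at the output layer

rng = np.random.default_rng(1)
P = normalize(rng.normal(size=(5, 7)))   # toy sample of 5 objects, 7 parameters
W1 = rng.uniform(-1, 1, (7, 10))
W2 = rng.uniform(-1, 1, (10, 3))
out = forward(P, W1, W2)
print(out.shape)  # (5, 3): one (G, S, D) vector per object
```

The sigmoid keeps every output in (0, 1), which is why the trained components cluster near 0 or 1 rather than reaching them exactly (see Fig. 12 below).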
After each pass, the weights are adjusted so as to bring the calculated output vector closer to the expected one. The process is repeated (i.e., the normalized input parameters are entered and the calculation is done again) until the system becomes stable. In practice, this is done by testing different numbers of iterations.
We performed some preliminary tests on the whole training sample to find the best number of intermediate neurones and the best number of iterations, testing between 7 and 42 intermediate neurones and between 50 and 600 iterations. Finally, we adopted 10 intermediate neurones and 100 iterations.
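Such a search over the two hyper-parameters can be organised as a simple grid scan. The sketch below uses a placeholder scoring function, which is entirely hypothetical, in place of the real train-and-evaluate step:

```python
import itertools

def evaluate(n_hidden, n_iter):
    """Placeholder (hypothetical) scoring function: in practice this would
    train the NN with the given settings and return its success rate on
    the training sample."""
    return 1.0 - abs(n_hidden - 10) / 50 - abs(n_iter - 100) / 2000

# grid over the ranges tested in the text: 7..42 neurones, 50..600 iterations
grid = itertools.product(range(7, 43, 7), (50, 100, 200, 400, 600))
best = max(grid, key=lambda hp: evaluate(*hp))
print(best)
```

The real selection criterion (success rate of the trained network) replaces `evaluate`; everything else, including the step sizes of the grid, is an illustrative choice.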
An efficient way to demonstrate the success of an automatic classification programme is to use "control samples'', i.e. to determine the automated parameters (G, S, D from the NN) for objects whose true classification is known independently, and to compare the two. We built nine such control samples. The whole sample of 132 972 objects with a reliable classification was divided into ten non-overlapping subsamples S0 to S9 of equal size (1/30th of the total sample each). The NN was trained ten times and we kept the configuration (obtained from S0) giving the best result for the whole sample. Then, to prove the validity of the NN configured with S0 only, we applied this configuration to the nine independent samples S1 to S9. The results for these nine control samples are given in Table 1.
Sample | Size | Percentage of success
S1 | 4667 | 94%
S2 | 4667 | 94%
S3 | 4667 | 93%
S4 | 4667 | 93%
S5 | 4667 | 93%
S6 | 4667 | 92%
S7 | 4667 | 94%
S8 | 4667 | 94%
S9 | 4667 | 94%
Total sample | 132 972 | 84%
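The partition into ten equal, non-overlapping subsamples can be sketched as follows. Here only the 10 × 4667 indices actually used are permuted; how this subset was drawn from the full 132 972-object sample is an assumption of the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10 * 4667                          # ten subsamples of 4667 objects each
idx = rng.permutation(n)               # shuffle before splitting
subsamples = np.array_split(idx, 10)   # S0 ... S9, non-overlapping

# S0 configures (trains) the NN; S1..S9 serve as independent control samples
assert all(len(s) == 4667 for s in subsamples)
print(len(subsamples), len(subsamples[0]))
```

Because the subsamples are disjoint, the 92-94% success rates on S1 to S9 measure the network on objects it never saw during configuration.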
Obviously, the components of the calculated output vector (G, S, D) are not exactly 0 or 1. The component G obtained with the training sample is shown in Fig. 12. Most of its values are close to 0 or 1 (i.e., the NN answers either "yes'' if the object is a galaxy, or "no'' if it is not).
Figure 12: NN output G obtained with the training sample. Most of the values are close to zero or one. The components S and D have exactly the same bimodal distribution.
In our control we considered a result as good when the largest component corresponds to the expected one. For instance, if we got the answer G = 0.7, S = 0.6, D = 0.1 for an object known as a galaxy (G = 1, S = 0, D = 0), we concluded that the NN gave the right answer, because the largest component is G.
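This largest-component criterion is a simple argmax comparison:

```python
import numpy as np

def is_success(output, expected):
    """An answer counts as correct when the largest component of the NN
    output vector corresponds to the expected class."""
    return int(np.argmax(output)) == int(np.argmax(expected))

# the example from the text: G=0.7, S=0.6, D=0.1 for a known galaxy (1, 0, 0)
print(is_success([0.7, 0.6, 0.1], [1, 0, 0]))  # True
```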
For the final application of the NN, we imposed more severe constraints in order to reduce the contamination of the galaxy catalogue by stars or defects. The adopted conditions are those given in Table 2. Further, some objects are classified a priori as defects when the parameter P5 (ratio of the object surface area to the matrix surface area) is larger than 0.95 (the case of a matrix almost without sky-background pixels), or when the axis ratio is larger than 100.
Conditions | Classification | Code
[...] | Galaxy | G
[...] | Probable Gal. | g
[...] | Star | S
D > S and D > G | Defect | D
otherwise | Possible Gal. | -
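A sketch of the final decision rule follows. The Table 2 thresholds for G, g and S were not recovered from the source, so the cut `t` is hypothetical; the a-priori defect rules (P5 > 0.95, axis ratio > 100) and the defect row are from the text (with its garbled "E'' read as S):

```python
def classify(G, S, D, p5, axis_ratio, t=0.8):
    """Final classification sketch. `t` is a HYPOTHETICAL threshold
    standing in for the unrecovered Table 2 conditions."""
    if p5 > 0.95 or axis_ratio > 100:
        return "D"                 # a priori defect (rules from the text)
    if D > S and D > G:
        return "D"                 # defect row of Table 2
    if G > t and G > S and G > D:
        return "G"                 # galaxy (strong, clean detection)
    if S > t and S > G and S > D:
        return "S"                 # star
    if G > S and G > D:
        return "g"                 # probable galaxy (weaker G dominance)
    return "-"                     # possible galaxy

print(classify(0.95, 0.02, 0.01, p5=0.3, axis_ratio=2))   # "G"
print(classify(0.60, 0.30, 0.10, p5=0.3, axis_ratio=2))   # "g"
print(classify(0.10, 0.10, 0.10, p5=0.99, axis_ratio=2))  # "D"
```

The ordering of the tests matters: the a-priori defect cuts are applied before the NN outputs are consulted, so a saturated or extremely elongated matrix can never enter the galaxy catalogue.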
Using this NN cleaning we classified [...] objects as galaxies (G), [...] as probable galaxies (g), and [...] as "possible galaxies'' (-). We also classified [...] objects as stars (S), in addition to the catalogue of 47 million stars previously extracted, and [...] as defects (D).
Copyright The European Southern Observatory (ESO)