Up: An image database



3 Preliminary analysis

We constructed 10 training samples in different regions and on different plates. In these samples the objects were classified by eye as Star, Galaxy or Unknown. From the pixel matrix of each classified object we calculated many parameters and systematically plotted them against one another in pairs. We found that the dispersion of the pixel optical densities (i.e. the standard deviation of the pixel intensities) plotted against the inverse of the surface area gives a diagram in which galaxies and stars occupy two distinct zones, as in Fig. 2. The surface area is simply the number of pixels with an intensity $I$ larger than the sky background intensity $I_{\rm bg}$. The dispersion of pixel density, $\sigma$, is calculated with the classical equation:

\begin{displaymath}%
\sigma = \sqrt{ \frac{n \sum I^2- (\sum I)^2}{ n^2}}
\end{displaymath} (1)

where the sums run over the $n$ pixels brighter than $I_{\rm bg}$. An example of this diagram is given in Fig. 2.
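As an illustration of Eq. (1), the two coordinates of the diagram can be computed for one object as in the minimal sketch below. The function name, the flat list of pixel intensities and the numerical values are assumptions made for the example; they are not taken from the actual extraction software.

```python
import math

def dispersion_and_inverse_area(pixels, i_bg):
    """Return (sigma, 1/S) for one object, following Eq. (1)."""
    # Keep only the pixels brighter than the sky background.
    bright = [i for i in pixels if i > i_bg]
    n = len(bright)  # surface area S, in number of pixels
    if n == 0:
        raise ValueError("no pixel above the sky background")
    sum_i = sum(bright)
    sum_i2 = sum(i * i for i in bright)
    # Population variance as in Eq. (1); max() guards against tiny
    # negative values caused by floating-point rounding.
    variance = max(n * sum_i2 - sum_i ** 2, 0.0) / n ** 2
    return math.sqrt(variance), 1.0 / n

# Hypothetical intensities: only 20, 30 and 40 exceed the background of 15.
sigma, inv_area = dispersion_and_inverse_area([10, 12, 20, 30, 40], i_bg=15)
```

Each object then contributes one point (1/S, $\sigma$) to a diagram such as Fig. 2.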

3.1 First star/galaxy recognition


  \begin{figure}
\par\includegraphics[width=6cm,clip]{ds1851f2.eps}\end{figure} Figure 2: Dispersion of density (i.e., standard deviation of the pixel intensities) of a given object versus the inverse of its surface area. Here the diagram is shown for a training field in which stars, galaxies and defects have been classified by a human expert. Stars (crosses) and galaxies (open circles) are well separated. The units are the following: the dispersion of density $\sigma (d_{\rm c})$ is expressed in units of 6553.4 times the actual optical density ( $\log S_{\rm o} /S$) of the plate (see the footnote on the first page). The surface area is simply the number of pixels above the sky background (each pixel has a constant surface area of $1.7\hbox {$^{\prime \prime }$ }\times 1.7\hbox {$^{\prime \prime }$ }$). The choice of these units is not crucial, provided it is the same throughout the work, as it is in the present paper.

These diagrams were plotted for each plate (i.e., 1443 diagrams) and a polynomial separation curve was fitted manually to each of them. Three examples are given in Figs. 3 to 5, from the best to the worst. The frontier separating stars from galaxies is often close to a straight line, as in Fig. 3, where stars and galaxies are well separated, but this is not always the case. It is also common for the separation curve to bend down for large objects (small 1/S), as in Fig. 4. This seems to be due to the saturation of pixel intensities either in the central part of galaxies or in the halo of bright stars. This phenomenon has also been seen in the source extraction from I-band CCD images of the DENIS survey (Mamon, private communication). At low galactic latitude (i.e., $\vert b\vert<18 \ \deg$) the separation between stars and galaxies becomes more difficult, especially for small objects (i.e. large values of the inverse of the surface area), because the two zones progressively merge. Fig. 5 shows an example for a field located at $b=-2 \deg$.


  \begin{figure}
\par\includegraphics[width=6cm,clip]{ds1851f3.eps}\end{figure} Figure 3: Example of the diagram of dispersion of density versus the inverse of the surface area ( $\sigma -1/S$) for a field in the Southern equatorial hemisphere. The separation between stars and galaxies can be inferred from a comparison with Fig. 2. The separation curve between stars and galaxies is linear. The units are the same as in Fig. 2


  \begin{figure}
\par\includegraphics[width=6cm,clip]{ds1851f4.eps}\end{figure} Figure 4: Another example of a diagram of dispersion of density versus the inverse of the surface area ( $\sigma -1/S$) for a field in the Northern equatorial hemisphere. The separation curve between stars and galaxies is not linear. The units are the same as in Fig. 2


  \begin{figure}
\par\includegraphics[width=6cm,clip]{ds1851f5.eps}\end{figure} Figure 5: Another example of a $\sigma -1/S$ diagram, for a field located near the galactic plane ($b=-2 \deg$) in the Southern equatorial hemisphere. The separation between stars and galaxies is more difficult. The units are the same as in Fig. 2
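Once a separation curve has been adopted for a plate, each object can be assigned to one side of it. The sketch below shows one possible way to do this. The polynomial coefficients are hypothetical, and the convention that stars lie above the curve is an assumption made for the example only, since the side occupied by each class must be read from the diagram of the plate.

```python
def separation_curve(inv_area, coeffs):
    """Evaluate the polynomial frontier sigma_sep(1/S) by Horner's rule.

    coeffs are ordered from the highest degree down to the constant term.
    """
    value = 0.0
    for c in coeffs:
        value = value * inv_area + c
    return value

def classify(sigma, inv_area, coeffs):
    # Assumption for illustration: stars lie above the frontier,
    # galaxy candidates on or below it.
    if sigma > separation_curve(inv_area, coeffs):
        return "star"
    return "galaxy"

# Hypothetical straight-line frontier: sigma = 2e4 * (1/S) + 500.
coeffs = [2.0e4, 500.0]
label_a = classify(900.0, 0.01, coeffs)  # above the frontier (700 at 1/S = 0.01)
label_b = classify(600.0, 0.01, coeffs)  # below the frontier
```

A curve that bends down for large objects, as in Fig. 4, would simply use more coefficients in the same list.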

This first discrimination step produces a catalogue with $4\ 349\ 140$ galaxy candidates and $47\ 352\ 280$ star candidates. Hereafter, only the galaxy candidates will be considered. Nevertheless, our process did not remove every star from the galaxy candidate sample. A visual inspection showed that bright stars are sometimes counted as galaxy candidates because of their extended halo as explained above.

The construction of a completeness curve is a general way to check whether a catalogue obeys the expected increase of object number with distance. If we assume that the number of galaxies within a sphere centered on the observer and of radius $r$ increases as $r^3$, it can be shown that the number $N$ of galaxies with an apparent diameter larger than a given limit $D_{\rm lim}$ follows the law: $\log N(D>D_{\rm lim})=-3 \log D_{\rm lim}+ cst$. Generally, this completeness curve is used to check whether a sample is complete up to a given apparent diameter. Here, it is used to check whether the galaxy candidates are homogeneously distributed in space, as expected. Note that this curve is insensitive to the angular coverage of the catalogue or to the galactic extinction.
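The slope of $-3$ can be recovered in a few lines. As a sketch, assume that all galaxies share the same linear diameter $d$ (a simplifying assumption introduced here, not made in the text) and that angles are small, so that a galaxy at distance $r$ has an apparent diameter $D=d/r$. The condition $D>D_{\rm lim}$ is then equivalent to $r<d/D_{\rm lim}$, and since the number of galaxies within $r$ increases as $r^3$:

\begin{displaymath}
N(D>D_{\rm lim}) \propto \left(\frac{d}{D_{\rm lim}}\right)^3
\quad\Longrightarrow\quad
\log N(D>D_{\rm lim}) = -3 \log D_{\rm lim} + cst
\end{displaymath}

If the linear diameters follow a distribution that does not depend on distance, the same factor $D_{\rm lim}^{-3}$ comes out of the sum over $d$, so only the constant changes and the slope is preserved.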

The over-sampling of large objects is confirmed by the completeness curve $\log N-\log D_*$ (Fig. 6), which shows an excess of large galaxies. Here, $D_*=\sqrt{4\, S\hbox{$^{\prime\prime}$ }/ \pi}$ is the equivalent diameter defined from the surface area $S\hbox{$^{\prime\prime}$ }$ in arcsec$^2$. In this paper the surface area $S$ is expressed in number of pixels. Thus, from the pixel size of 1.7 $^{\prime\prime}$ it follows that:

 \begin{displaymath}%
\log D_* = 0.5 \log S +0.283
\end{displaymath} (2)

where $D_*$ is in arcseconds and the surface area $S$ in number of pixels. We note that the completeness curve in diameter is quite linear for objects smaller than $\log D_*=1.25$ (i.e. $D_* \approx 18\hbox{$^{\prime\prime}$ }$, corresponding to 1/S>0.012). The completeness is fulfilled down to $\log D_*=1.04$, i.e. $D_*=10\hbox{$^{\prime\prime}$ }$ in agreement with our cut-off (36 pixels). The catalogue contains one million galaxy candidates larger than $18\hbox{$^{\prime\prime}$ }$, and thus 3.3 million with diameters between $18\hbox{$^{\prime\prime}$ }$ and $10\hbox{$^{\prime\prime}$ }$.
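The constant in Eq. (2) follows directly from the definition of $D_*$ and from the pixel size. The short sketch below checks this arithmetic. The function and variable names are chosen for the example only.

```python
import math

PIXEL_SIZE_ARCSEC = 1.7  # pixel side quoted in the text

def log_equivalent_diameter(area_pixels, pixel_size=PIXEL_SIZE_ARCSEC):
    """Return log10 of D* (in arcsec) for a surface area given in pixels."""
    area_arcsec2 = area_pixels * pixel_size ** 2
    d_star = math.sqrt(4.0 * area_arcsec2 / math.pi)
    return math.log10(d_star)

# Constant term of Eq. (2): 0.5 * log10(4 * 1.7**2 / pi)
constant = 0.5 * math.log10(4.0 * PIXEL_SIZE_ARCSEC ** 2 / math.pi)
print(round(constant, 3))  # prints 0.283
```

For a surface area of 100 pixels, for instance, the function returns 1 plus this constant, as expected from Eq. (2).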


  \begin{figure}
\par\includegraphics[width=8cm]{ds1851f6.eps}\end{figure} Figure 6: Completeness curve $\log N-\log D_*$ built from the galaxy candidate catalogue. The equivalent diameter $D_*$ is expressed in arcseconds. The relation with the surface area $S$ expressed in number of pixels is given by Eq. (2). An excess of large objects is visible (see text)
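A completeness curve of this kind can be built by counting, for each limiting diameter, the objects larger than that limit. The sketch below is one simple way to do it and to measure the slope. It runs on a synthetic toy sample constructed so that $N(D>D_{\rm lim})$ falls as $D_{\rm lim}^{-3}$; this sample is not survey data, and all names are assumptions made for the example.

```python
import math

def completeness_curve(diameters_arcsec, log_d_limits):
    """Return (log D_lim, log N(D* > D_lim)) pairs for the requested limits."""
    curve = []
    for log_d_lim in log_d_limits:
        d_lim = 10.0 ** log_d_lim
        n = sum(1 for d in diameters_arcsec if d > d_lim)
        if n > 0:  # log N is undefined when no object exceeds the limit
            curve.append((log_d_lim, math.log10(n)))
    return curve

def fitted_slope(curve):
    """Least-squares slope of log N against log D_lim (needs >= 2 points)."""
    xs = [x for x, _ in curve]
    ys = [y for _, y in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic, homogeneous toy sample (not survey data): N(>D) falls as D**-3.
n_obj = 10000
toy_diameters = [10.0 * (n_obj / k) ** (1.0 / 3.0) for k in range(1, n_obj + 1)]
toy_curve = completeness_curve(toy_diameters, [1.1, 1.2, 1.3, 1.4, 1.5])
toy_slope = fitted_slope(toy_curve)  # close to -3 for this toy sample
```

On a real catalogue, a slope shallower than $-3$ at large diameters would correspond to the kind of excess of large objects seen in the completeness curve.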

We now have to clean up our galaxy candidate catalogue. This is the subject of the next section.



Copyright The European Southern Observatory (ESO)