


3 Performance test

We examine the performance of the method described in Sect. 2 by Monte Carlo simulations. We investigate the errors in the estimates of position, redshift, and richness, the rate at which existing clusters are missed (incompleteness), and the spurious detection rate. We also compare this method with that of P96. In this section, we adopt $\theta _1 = 0$, $\Delta \theta = 2r_{\rm core}/{\rm d}_A(z=0.15)$ where ${\rm d}_A(z)$ is the angular diameter distance, $n_{\theta} = 5$, $m_1$ (in the B band) $= 14.0$, $\Delta m = 0.5$, and $n_m = 19$. The limiting magnitude is set to $m_B = 23.5$.
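As a quick consistency check on the magnitude binning, the sketch below builds the bin edges implied by $m_1$, $\Delta m$, and $n_m$. It assumes that $m_1$ is the bright edge of the first bin (rather than its center), which this section does not state explicitly; the function name `magnitude_bin_edges` is hypothetical. Under that assumption the faint edge of the last bin coincides with the limiting magnitude $m_B = 23.5$.

```python
def magnitude_bin_edges(m1=14.0, dm=0.5, n_m=19):
    """Return the n_m + 1 edges of n_m magnitude bins of width dm.

    Assumption (not stated in the text): m1 is the bright edge of the
    first bin, not the bin center.
    """
    return [m1 + i * dm for i in range(n_m + 1)]


edges = magnitude_bin_edges()
# Bright edge, faint edge, and number of bins.
print(edges[0], edges[-1], len(edges) - 1)
```

If $m_1$ were instead a bin center, every edge would shift by $\Delta m/2$ and the faint edge would no longer match the limiting magnitude.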

3.1 Estimates of position, redshift, and richness

3.1.1 Monte Carlo simulation

When a cluster is detected, its projected position, redshift, and richness are estimated. Errors in these estimates depend not only on the real redshift and richness (hereafter $z_{\rm real}$ and $N_{\rm real}$, respectively), but also on the limiting magnitude, the color band, and the Galactic absorption. To evaluate the dependence on $z_{\rm real}$ and $N_{\rm real}$, we examine 20 cases with $z_{\rm real}$ = {0.16, 0.20, 0.24, 0.28} for N = 300 and $z_{\rm real}$ = {0.16, 0.20, 0.24, 0.28, 0.35, 0.40, 0.45, 0.50} for N = {1000, 3000}. In the present study we limit the redshift range to $z\leq 0.5$, for which we expect that ample data will be available in the near future. For each case, 500 artificial B-band galaxy samples are generated by Monte Carlo simulation according to the model described in Sect. 2. N = 300 corresponds to MKW-AWM systems (Bahcall 1980), N = 1000 corresponds to Abell richness class 0-1, and N = 3000 corresponds to Abell richness class 2 (similar to the Coma cluster). The relationship between our N and the Abell richness parameter $c \equiv N_{m_3\leq m\leq m_3+2}$ is presented in the Appendix. The Galactic absorption is not considered.

3.1.2 Position

We measure the angular distance between the true position of the cluster center $\vec{x}_0$ and the estimated position $\vec{x}_N$ where $N_{\rm p}$ is maximum in the ``richness image'' for $z_{\rm est}$. Strictly speaking, we should use the position $\vec{x}_{\cal L}$ corresponding to the peak ${\cal L}_{\rm p}$, rather than the peak $N_{\rm p}$. It is, however, much easier to detect a peak in the ``richness image'' than in the ``likelihood image'', as described in Sect. 2. Since $\vec{x}_N$ is close to $\vec{x}_{\cal L}$ (the separation is much less than the core radius), using $\vec{x}_N$ causes almost no problem. The estimated positions are distributed around $\vec{x}_0$ and are well fit by a two-dimensional Gaussian distribution. Figure 4 shows the values of $\sigma_{\rm est}$ of the best-fit Gaussians normalized by the angular core radius. The errors in the estimated positions are about $\theta_{\rm core}$, 0.5 $\theta_{\rm core}$, and 0.3 $\theta_{\rm core}$ for N = 300, 1000, and 3000, respectively. These values are quite small compared with the angular extent of the clusters themselves.
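The scatter of the estimated positions can be summarized as in the following minimal sketch. It is not the fitting procedure used for Fig. 4: it assumes an isotropic two-dimensional Gaussian centered on the true position and uses the maximum-likelihood estimate of $\sigma$ in place of a fit. The function name, the seed, and the illustrative value `true_sigma = 0.5` (in units of $\theta_{\rm core}$) are assumptions.

```python
import math
import random


def sigma_from_offsets(offsets):
    """ML estimate of sigma for an isotropic 2D Gaussian centered on zero.

    offsets: list of (dx, dy) = estimated position minus true position.
    """
    total = sum(dx * dx + dy * dy for dx, dy in offsets)
    # Each of the two coordinates contributes variance sigma**2.
    return math.sqrt(total / (2 * len(offsets)))


# Illustrative data: 500 mock offsets, as in one Monte Carlo case.
rng = random.Random(0)
true_sigma = 0.5  # units of theta_core; hypothetical value
offsets = [(rng.gauss(0.0, true_sigma), rng.gauss(0.0, true_sigma))
           for _ in range(500)]
sigma_est = sigma_from_offsets(offsets)
print(round(sigma_est, 2))
```

With 500 samples the statistical uncertainty on $\sigma$ is a few percent, so differences such as $0.5\,\theta_{\rm core}$ versus $0.3\,\theta_{\rm core}$ are well resolved.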

  
\begin{figure}
\begin{center}
\includegraphics[width=7cm,angle=-90]{ds6487f4.eps}\end{center}\end{figure} Figure 4: Errors in position estimation $\sigma_{\rm est}$ normalized by the angular core radius $\theta_{\rm core}$ as a function of cluster redshift. Filled circles and solid line are for clusters with N = 3000, open circles and dashed line for N = 1000, and open triangles and dotted line for N = 300. The top axis shows how much fainter than $m^*$ we can observe

3.1.3 Redshift and richness

Figure 5 shows the results of the redshift and richness estimations for the 12 nearest cases of the artificial clusters. The plus marks indicate the most probable values and the two contours represent the 68% and 95% confidence levels. The three sets of a plus mark and two contours in each panel are for N = 300, 1000, and 3000. The contours are all elongated from the bottom left to the upper right. This is because the estimates of redshift and richness are coupled with each other: a rich cluster at a large distance looks similar to a less rich, nearer cluster.
  
\begin{figure}
\begin{center}
\includegraphics[width=8cm]{ds6487f5.eps}\end{center}\end{figure} Figure 5: Errors in estimates of redshift and richness. In each of panels a-d, three cases corresponding to N = 3000, 1000 and 300 are shown. The plus mark indicates the most probable value. Inner and outer contours around the plus mark show the 68% and 95% confidence levels, respectively

The direction of the largest dispersion in the distribution of the 500 points ($z_{\rm est}$, $N_{\rm est}$), namely, the direction of the major axis of the contours in Fig. 5, differs among clusters of different richness. This is due to the different ratio of the number of cluster galaxies to that of field galaxies within the cluster region. Figures 6a and 6b show the accuracy of the estimates of redshift and richness, respectively, for all 20 cases. The error bars indicate the widths of the 68% confidence contours in Fig. 5, projected onto the corresponding axis. The errors in the estimates of redshift and richness at z=0.2 are, respectively, about 0.02 and 12% for $N_{\rm real}$ = 3000 clusters and about 0.04 and 30% for $N_{\rm real}$ = 1000 clusters. No systematic deviations from the true values are seen. Thus, redshift and richness estimation by this method works fairly well without any spectroscopic information.
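The direction of largest dispersion of a point cloud can be computed from its covariance matrix, as in the following minimal sketch. It is an illustration rather than the procedure used to draw Fig. 5. Because redshift and richness have different units, the angle depends on how the two axes are scaled (for example, whether richness is normalized by $N_{\rm real}$), and that scaling is left to the caller. The function name is hypothetical.

```python
import math


def major_axis_angle(points):
    """Angle (radians, measured from the first axis) of largest variance.

    points: list of (x, y) pairs, e.g. suitably scaled (z_est, N_est).
    Uses the closed-form principal-axis angle of the 2x2 covariance matrix.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


# Points lying on the diagonal give a 45 degree major axis.
angle_deg = math.degrees(major_axis_angle([(0, 0), (1, 1), (2, 2)]))
print(round(angle_deg, 6))
```

A positive angle between 0 and 90 degrees corresponds to the bottom-left to upper-right elongation described above, that is, a positive correlation between the redshift and richness estimates.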

  
\begin{figure}
\begin{center}
\includegraphics[width=8cm]{ds6487f6.eps}\end{center}\end{figure} Figure 6: Upper panel a) shows errors in redshift estimation, while lower panel b) shows errors in richness estimation. The error bars represent $\pm 1\sigma$, corresponding to the inner contours in Fig. 5. Filled circles and solid line are for clusters with N = 3000, open circles and dashed line for N = 1000, and open triangles and dotted line for N = 300

These errors are internal. In practice, there are external errors in addition to the internal ones investigated above, owing to intrinsic properties of real clusters: dispersion in $M^*$ values, variations in the shapes of luminosity functions and surface density profiles, elongation of clusters, substructures, overlap with other clusters along the line of sight, etc. These uncertainties will affect the estimates $z_{\rm est}$ and $N_{\rm est}$. Moreover, for very distant ($z\sim 1$) clusters, systematic evolution of cluster galaxies or of the clusters themselves may also affect the estimates.

The most direct and serious effect on the redshift estimation comes from the dispersion in $M^*$. Colless (1989) evaluated the upper limit of the dispersion in $M^*$ to be 0.4 mag, which corresponds to a redshift estimation error of $\Delta z\sim 0.03$ in the B band. For the other uncertainties, it is difficult to evaluate quantitatively their effects on the redshift and richness estimations, because the intrinsic properties of real clusters are still unclear. We should therefore study them in more detail, by changing the parameters of the cluster models, after obtaining a ``large and statistically complete'' cluster catalog with an ``objective'' cluster-finding method such as the present one. Spectroscopic observations are also needed to verify the results of the redshift estimations and to study the $M^*$ values and their dispersion, evolution, etc. Several iterations would be needed to establish both a truly objective cluster catalog and a truly objective cluster-finding technique.

3.2 Incompleteness

For a real but very faint (poor and/or distant) cluster, we may miss the likelihood peak, the richness peak, or both. To evaluate the probability of missing real clusters, we again use the 20 $\times$ 500 artificial clusters. We find that our cluster-finding technique can detect almost all clusters up to $z_{\rm real}\sim$ 0.30. In the case of $N_{\rm real}$ = 3000, the missing probability is below 0.2% (namely, no cluster in 500 samples is missed) at $z_{\rm real}\leq$ 0.35. The fraction of missed clusters then increases to $\sim$5% at $z_{\rm real}$ = 0.50. In the case of $N_{\rm real}$ = 1000, incompleteness appears at $z_{\rm real}$ = 0.28 and grows to $\sim$15% at $z_{\rm real}$ = 0.50. Even for poor ($N_{\rm real}$ = 300) clusters, only 8-15% are missed in the range 0.16 $\leq z_{\rm real}\leq$ 0.28.

Gunn et al. (1986) pointed out the large incompleteness of the Abell catalog at $z\sim$ 0.30. There are only 8 Abell clusters in the regions they observed, although they estimated that about 150 clusters exist up to the redshift limit of 0.30. Complete sampling of distant clusters is indispensable for the correct understanding of their nature.

3.3 Spurious detection rate

We study the detection rate of non-physical (spurious) clusters using artificial random distributions of galaxies. Of course, the actual field galaxies have a non-zero angular correlation function (e.g., Davis & Peebles 1983). Therefore the actual spurious detection rate may be slightly different from that based on a random distribution. Even if the distribution of field galaxies is random, some galaxy clumps inevitably arise from projection effects. Searching for clusters by simply finding overdensities of galaxies on the sky will result in detecting a number of such spurious ones. Here we show how well we can suppress spurious detections by taking into account magnitude information and projected positions simultaneously.

We evaluate the spurious detection rate with 1000 sets of artificial $50'\times 50'$ ``field'' data which do not contain any clusters. The limiting magnitude in the B band is 23.5. To evaluate the spurious detection rate rigorously, it would be necessary to obtain ($z_{\rm est}, N_{\rm est}$) for all spurious clusters. However, since this is a time-consuming task, we adopt here a simpler approach that gives a rough upper limit on the spurious detection rate.

In a ``richness image'' for a given filter redshift, we simply count the number of ``richness peaks'' which exceed a given threshold value $N_{\rm th}$ and are separated by no more than 3$\theta_{\rm core}$ from the corresponding ``likelihood peaks''. We perform this task for the 1000 artificial ``fields''. The distribution of the number of ``richness peaks'' per $50'\times 50'$ area over the 1000 fields is very well fit by a Poisson distribution. We compute the best-fit Poisson mean ($\lambda$) by the least-squares method. We then convert $\lambda$ to the value per deg$^2$ and simply regard it as an upper limit on the spurious detection rate. In Fig. 7, the solid lines show the upper limits on the spurious detection rate for four thresholds ($N_{\rm th}$=200, 300, 400, and 500) as a function of filter redshift.
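The conversion from peak counts per field to a rate per deg$^2$ can be sketched as follows. Note one substitution: the text obtains $\lambda$ from a least-squares fit of a Poisson distribution to the histogram of counts, whereas this sketch uses the sample mean, which is the maximum-likelihood estimate of a Poisson mean. The function name and the toy input are hypothetical.

```python
def rate_per_square_degree(peak_counts, field_side_arcmin=50.0):
    """Convert peak counts per square field into a rate per deg^2.

    peak_counts: number of qualifying richness peaks found in each field.
    The Poisson mean is estimated by the sample mean (ML estimate), which
    stands in for the least-squares fit described in the text.
    """
    lam = sum(peak_counts) / len(peak_counts)
    area_deg2 = (field_side_arcmin / 60.0) ** 2  # 50' x 50' is about 0.694 deg^2
    return lam / area_deg2


# Toy input: four fields with a mean of one peak per field.
rate = rate_per_square_degree([1, 0, 2, 1])
print(round(rate, 2))
```

Since a $50'\times 50'$ field covers $(5/6)^2 \approx 0.694$ deg$^2$, the rate per deg$^2$ is 1.44 times the mean count per field.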

To compare these values with those from a traditional method, we calculate the spurious detection rates of the count-in-cells technique with a cell size of 2$\theta_{\rm core}$ for the 2.5$\sigma$ and 3$\sigma$ levels ($\sigma$ is the standard deviation of the distribution of the number of galaxies per cell). They are shown in Fig. 7 by dashed lines. The use of magnitude information clearly suppresses the spurious detection rate, especially at lower redshift.
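For reference, a generic count-in-cells detection can be sketched as below. This is an illustration of the technique in its simplest form, not the implementation used for Fig. 7: the square grid, the handling of the field edges, and both function names are assumptions.

```python
import math


def count_in_cells(positions, field_size, cell_size):
    """Count galaxies in square cells tiling a square field.

    positions: list of (x, y) in the same units as field_size.
    Returns the flattened list of counts per cell.
    """
    n_side = int(field_size // cell_size)
    counts = [[0] * n_side for _ in range(n_side)]
    for x, y in positions:
        i, j = int(x // cell_size), int(y // cell_size)
        if 0 <= i < n_side and 0 <= j < n_side:
            counts[i][j] += 1
    return [c for row in counts for c in row]


def cells_above(counts, k):
    """Indices of cells whose count exceeds mean + k * sigma."""
    n = len(counts)
    mean = sum(counts) / n
    sigma = math.sqrt(sum((c - mean) ** 2 for c in counts) / n)
    return [idx for idx, c in enumerate(counts) if c > mean + k * sigma]


# One strongly overdense cell among 100 is flagged at the 3 sigma level.
flagged = cells_above([1] * 99 + [50], 3.0)
print(flagged)
```

A detection here depends only on the projected number of galaxies in a cell, so any chance clump of field galaxies above the threshold is counted, regardless of the magnitudes of its members.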

Moreover, the values represented by the solid lines in Fig. 7 are just upper limits. We can further suppress the spurious detection rate by examining the shape of the ${\cal L}_{\rm p}-z_{\rm fil}$ curve. For most spurious clusters, the ${\cal L}_{\rm p}-z_{\rm fil}$ curves (e.g., Fig. 3a) do not have a single peak and are sometimes very noisy, so that we can exclude these cluster candidates as ``junk'' from the resulting cluster catalog. For some of the others, however, the ${\cal L}_{\rm p}-z_{\rm fil}$ curves have a good-looking peak just like the one seen in Fig. 3a. These are ``really spurious'' clusters, which we cannot discriminate from real clusters even with the additional information of galaxy magnitudes.

Let us roughly estimate the numbers of ``really spurious'' clusters at z=0.16, 0.20, 0.24, and 0.28. For the case of z=0.16, we first randomly select 10 spurious cluster candidates in the ``richness images'' for $z_{\rm fil}$ = 0.16. We then examine their ${\cal L}_{\rm p}-z_{\rm fil}$ curves to find those with good-looking peaks, and count the number of ``really spurious'' clusters whose $z_{\rm est}$ falls within 0.16 $\pm$ 0.02 (0.02 is half of the interval of $z_{\rm fil}$ for which likelihood values are actually computed). The numbers of ``really spurious'' clusters are found to be 3 and 1 for $z_{\rm fil}$ = 0.16 and 0.20, respectively. No ``really spurious'' clusters are found for $z_{\rm fil}$ = 0.24 and 0.28. The actual spurious detection rate is therefore much lower than the upper limit: it is about 30% at z=0.16, 10% at z=0.20, and less than 10% at z=0.24 and 0.28, of the values shown as the solid lines in Fig. 7.

  
\begin{figure}
\begin{center}
\includegraphics[width=5.5cm,angle=-90]{ds6487f7.eps}\end{center}\end{figure} Figure 7: Spurious detection rates per deg$^2$ as functions of filter redshift. Solid lines show the results (upper limits) of the present method, while dashed lines show those of the count-in-cells technique

Here we have examined only spurious detections caused by simple statistical projection effects. In addition, overlaps of two or more poor groups, superpositions of field galaxies on poor groups, and small clumpy portions in the outskirts of nearby large clusters (see Sect. 4) also contribute to spurious detections. In these cases, the ${\cal L}_{\rm p}-z_{\rm fil}$ curves will also be very noisy, or have several peaks or no peak. Such cluster candidates can easily be excluded from the resulting catalog or flagged as doubtful. Only spectroscopic observations of the galaxies in these cluster candidates can reveal what they in fact are.

Even among conspicuous galaxy clumps found by a simple glance at the galaxy distribution, some eventually turn out to be spurious. On the other hand, some marginal concentrations of galaxies are identified as real clusters. Using projected positions and magnitudes simultaneously, we often obtain results quite different from those of intuitive methods which use only the projected distribution of galaxies. In other words, we can quite easily identify a number of non-physical clusters which could never be discriminated without magnitude information.

3.4 Comparison with the method by P96

This method is a variant of the one by P96. The basic idea is identical, but there are two main differences in the actual procedures.

The first is the form of the likelihood function. While our likelihood function is based on Poisson statistics (Eq. (6)), the one employed by P96 (Eq. (15) of their paper) is based on Gaussian statistics, namely, P96's likelihood is proportional to $\chi ^2$. As they note, this form is valid when there is a sufficient number of background galaxies. However, when the limiting magnitude is brighter and there are fewer galaxies, Gaussian statistics become unsuitable. Moreover, the likelihood function actually employed by P96 (Eq. (16) of their paper) is an approximation to the formal expression, though they mention that maximizing the simplified likelihood is roughly equivalent to maximizing the formal one. We do not investigate how these differences affect the accuracy of the redshift estimation. However, the results are clearly different. In our case, the true and estimated redshifts agree well, and no significant systematic deviation appears up to at least z=0.5. In contrast, the redshift estimated with the method of P96 tends to be systematically smaller than the true value (see Fig. 14 of their paper). Considering the different color bands, limiting magnitudes, and Hubble constants adopted in this work and in P96, z=0.5 in our case corresponds to $z\sim$0.7 and $z\sim$ 1.0 for the V4 and I4 bands, respectively, in their paper. At $z_{\rm true}$ = 1.0 in the lower panel (for the I4 band) of Fig. 14 of P96, the discrepancy between the true redshift and the mean of the estimated ones is no less than 0.2, while that for our method is much less than 0.01 at z=0.50, as shown in Fig. 6a.
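The difference between Poisson and Gaussian statistics at low counts can be illustrated with the generic textbook forms below. These are not Eq. (6) of this paper or Eqs. (15) and (16) of P96, which are not reproduced in this section: the sketch only compares a Poisson log-likelihood for binned counts with a Gaussian approximation whose variance equals the expected count. Function names and test values are hypothetical.

```python
import math


def poisson_log_likelihood(counts, expected):
    """Sum over bins of ln[ mu**n * exp(-mu) / n! ]."""
    return sum(n * math.log(mu) - mu - math.lgamma(n + 1)
               for n, mu in zip(counts, expected))


def gaussian_log_likelihood(counts, expected):
    """Gaussian approximation with mean mu and variance mu in each bin.

    The first term is -chi^2 / 2 with chi^2 = (n - mu)**2 / mu.
    """
    return sum(-0.5 * (n - mu) ** 2 / mu - 0.5 * math.log(2 * math.pi * mu)
               for n, mu in zip(counts, expected))


# Many galaxies per bin: the two forms nearly coincide.
diff_high = abs(poisson_log_likelihood([100], [100.0])
                - gaussian_log_likelihood([100], [100.0]))
# Few galaxies per bin: the Gaussian form departs noticeably.
diff_low = abs(poisson_log_likelihood([0], [0.5])
               - gaussian_log_likelihood([0], [0.5]))
print(diff_high < 0.01, diff_low > 0.1)
```

The two forms agree closely when each bin holds many galaxies and diverge when the expected count per bin is of order unity or less, which is the regime of bright limiting magnitudes mentioned above.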

The second difference is the binning procedure. We bin the galaxies by position and magnitude, while P96 did not. Binning significantly reduces the processing time (down to a tenth) in the same computational environment. This is crucial especially for constructing a huge cluster catalog, where such techniques can show their real worth.



Copyright The European Southern Observatory (ESO)