2. Detection level estimation in the wavelet space

2.1. Model and simulation

Simulations can be used to derive the probability that a wavelet coefficient is not due to the noise (Escalera et al. 1992). Modeling a sky image (i.e. a uniform distribution with Poisson noise) allows the wavelet coefficient distribution to be determined and a detection threshold to be derived. For substructure detection in a cluster, the large-scale structure of the cluster must first be modeled; otherwise noise photons related to the large-scale structure will introduce false detections at lower scales. If a physical model is available, Monte Carlo simulations can also be used (Escalera & Mazure 1992; Grebenev et al. 1995), but this approach requires a long computation time, and the detections will always be model-dependent. Damiani et al. (1996) and Freeman et al. (1996) propose calculating the background from the data in order to derive the fluctuations due to the noise in the wavelet scales. It is regrettable to have to do this, because we lose one of the main advantages of the wavelet transform, which is to be background-free. Indeed, wavelet coefficients have a null mean, and detection is done simply by comparison to a given threshold. Furthermore, background estimation is not an easy task: it generally requires several steps (filtering, interpolation, etc.), and the error on the estimated background is generally difficult to calculate.
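As an illustration of the simulation approach for a uniform sky with Poisson noise, the following Python sketch draws simulated images, computes their first-scale wavelet coefficients, and reads off an empirical detection threshold. It is only a sketch under stated assumptions: the function names, the image size, the number of simulations, the mirror boundary handling, and the use of the first scale of a B3-spline à trous transform are illustrative choices, not taken from the works cited above.

```python
import math
import random

# 1D B3-spline smoothing filter; the 2D filter is its outer product (assumption).
B3 = [1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16]


def poisson_sample(rng, mean):
    """Draw one Poisson variate (Knuth's multiplication method, fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def first_scale_coefficients(image):
    """First-scale coefficients: image minus its B3-smoothed version (mirror borders)."""
    size = len(image)

    def mirror(i):
        if i < 0:
            return -i
        if i >= size:
            return 2 * (size - 1) - i
        return i

    coeffs = []
    for y in range(size):
        for x in range(size):
            smooth = 0.0
            for dy, a in enumerate(B3, start=-2):
                for dx, b in enumerate(B3, start=-2):
                    smooth += a * b * image[mirror(y + dy)][mirror(x + dx)]
            coeffs.append(image[y][x] - smooth)
    return coeffs


def simulated_threshold(mean_counts, size=32, n_sims=20, eps=1e-3, seed=0):
    """Empirical (1 - eps) quantile of noise-only coefficients for a uniform sky."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_sims):
        image = [[poisson_sample(rng, mean_counts) for _ in range(size)]
                 for _ in range(size)]
        pooled.extend(first_scale_coefficients(image))
    pooled.sort()
    index = min(len(pooled) - 1, int((1.0 - eps) * len(pooled)))
    return pooled[index]
```

The threshold returned depends on the assumed mean number of counts per pixel, so it has to be recomputed whenever the flux level changes. This is one reason why the simulation approach is costly.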

2.2. Sigma clipping

A straightforward method for deriving the detection levels, initially proposed by Bijaoui & Giudicelli (1991), is to apply a sigma clipping at each scale. A standard deviation σ_j is thus estimated at each scale j, and wavelet coefficients w_j(x,y) are considered significant if

$$ |w_j(x,y)| > k \, \sigma_j $$

where k is generally taken equal to 3. This method allows us to easily detect strong features, but is certainly not optimal for the detection of weak objects. Indeed, as the noise is not Gaussian, it is difficult to estimate the real probability of false detection using this detection criterion.
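The sigma-clipping criterion can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the cited authors: the function names and the iteration limit are hypothetical, the coefficients of one scale are assumed to be given as a flat list, and the standard deviation is taken as a root mean square because wavelet coefficients have a null mean.

```python
import math


def clipped_sigma(coeffs, k=3.0, max_iter=10):
    """Estimate the noise standard deviation of one scale by iterative k-sigma clipping."""
    values = list(coeffs)
    # Root mean square about zero, since wavelet coefficients have a null mean.
    sigma = math.sqrt(sum(v * v for v in values) / len(values))
    for _ in range(max_iter):
        kept = [v for v in values if abs(v) <= k * sigma]
        if len(kept) == len(values) or len(kept) < 2:
            break  # nothing left to clip, or too few values to continue
        values = kept
        sigma = math.sqrt(sum(v * v for v in values) / len(values))
    return sigma


def significant_mask(coeffs, k=3.0):
    """Flag the coefficients whose absolute value exceeds k times the clipped sigma."""
    sigma = clipped_sigma(coeffs, k)
    return [abs(w) > k * sigma for w in coeffs]
```

Applied to a list of small coefficients containing one strong value, the mask flags only the strong value. The sketch says nothing about the actual false-detection probability, which is the weakness of the criterion for non-Gaussian noise.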

2.3. Local Gaussian noise

Vikhlinin et al. (1995) proposed to assume a locally Gaussian noise, and to estimate the standard deviation map σ(x,y) from the local background. The standard deviation related to a wavelet coefficient w_j(x,y) is then derived from this map using the linearity of the wavelet transform (Starck & Bijaoui 1994). As with sigma clipping, the Gaussian hypothesis does not hold, and the consequence is the same: the real probability of false detection is difficult to estimate. A solution is to use Monte Carlo simulations to set the correspondence between the standard deviation of a wavelet coefficient and the levels of significance, but the simulations must be performed for each image because the significance levels vary strongly with the number of photons (Grebenev et al. 1995).

2.4. Anscombe transform

In Slezak et al. (1994) and Biviano et al. (1996), the Anscombe transform

$$ t(I) = 2 \sqrt{I + \frac{3}{8}} $$

has been used. It acts as if the data arose from a Gaussian white noise model with σ = 1, under the assumption that the mean value of I is large. Simulations have shown (Murtagh et al. 1995) that a number of photons lower than 30 per pixel introduces a bias. In X-ray images, the number of photons is often lower, and can sometimes even be equal to zero. Using the Anscombe transform in this case will lead to an overestimation of the noise level. To overcome this difficulty, the noise standard deviation can be re-estimated, for instance as in Slezak et al. (1994), i.e. by applying a sigma clipping at the first scale of the wavelet transform. However, this approach assumes that the noise is homogeneous, which is not true. Indeed, if the number of photons per pixel is lower than 30, the standard deviation of the noise after the Anscombe transformation varies strongly with the number of photons (Murtagh et al. 1995).
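The behaviour of the Anscombe transform at high and low count levels can be checked numerically. The Python sketch below uses the standard form of the transform, 2 sqrt(I + 3/8), and a simple Poisson sampler. The function names, the sample size and the seed are hypothetical choices made for this illustration.

```python
import math
import random


def anscombe(count):
    """Anscombe variance-stabilizing transform of a Poisson count."""
    return 2.0 * math.sqrt(count + 3.0 / 8.0)


def poisson_sample(rng, mean):
    """Draw one Poisson variate (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def stabilized_std(mean, n_samples=10000, seed=0):
    """Empirical standard deviation of Anscombe-transformed Poisson counts."""
    rng = random.Random(seed)
    values = [anscombe(poisson_sample(rng, mean)) for _ in range(n_samples)]
    average = sum(values) / n_samples
    return math.sqrt(sum((v - average) ** 2 for v in values) / n_samples)
```

With a mean of 30 counts per pixel the transformed data have a standard deviation close to 1, whereas with a mean of 0.1 it is well below 1. Assuming σ = 1 in the low-count regime therefore overestimates the noise level, and the value depends on the local number of photons.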

2.5. Wavelet function histogram

An approach for very small numbers of counts, including frequent zero cases, has been described in Slezak et al. (1993) and Bury (1994), for large scale clustering of galaxies. We have adopted here the same approach to analyze X-ray images.

A wavelet coefficient at a given position (x,y) and at a given scale j is

$$ w_j(x,y) = \sum_{k \in K} n_k \, \psi\left( \frac{x_k - x}{2^j}, \frac{y_k - y}{2^j} \right) $$

where K is the support of the wavelet function ψ (i.e. the box in which ψ is not equal to 0) and n_k is the number of events which contribute to the calculation of w_j(x,y) (i.e. the number of photons included in the support of the dilated wavelet centered at (x,y)).

If a wavelet coefficient w_j(x,y) is due to the noise, it can be considered as a realization of a sum of independent random variables, each with the same distribution as that of the wavelet function (n_k being the number of photons or events used for the calculation of w_j(x,y)). We then compare the wavelet coefficient of the data to the values which can be taken by the sum of n independent variables.

The distribution of one event in the wavelet space is directly given by the histogram H₁ of the wavelet ψ. Since independent events are considered, the distribution of the random variable W_n (to be associated with a wavelet coefficient) related to n events is given by n autoconvolutions of H₁:

$$ H_n = H_1 \otimes H_1 \otimes \cdots \otimes H_1 $$

Figure 1 shows the shape of a set of H_n. For a large number of events, H_n converges to a Gaussian.

Figure 1: Autoconvolution histograms for the wavelet associated with a B₃ spline scaling function for 1 and 2 events (top left), 4 to 64 events (top right), 128 to 2048 (bottom left), and 4096 (bottom right)
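The construction of H_n by repeated autoconvolution can be sketched in Python. As a stand-in for the wavelet function, the sketch assumes the discrete first-scale kernel of the à trous algorithm with a B3-spline filter (the identity minus the 5x5 smoothing filter), with values expressed in units of 1/256 so that the histogram bins are exact integers. It also assumes that one event falls with equal probability on any pixel of the support. All function names are hypothetical.

```python
# B3-spline filter multiplied by 16, so that products are integers.
B3 = [1, 4, 6, 4, 1]


def first_scale_kernel():
    """Values of the first-scale kernel (delta minus smoothing), in units of 1/256."""
    values = []
    for i, a in enumerate(B3):
        for j, b in enumerate(B3):
            centre = 256 if (i == 2 and j == 2) else 0
            values.append(centre - a * b)
    return values


def histogram(values):
    """H_1: probability of each kernel value for one event uniform over the support."""
    lowest = min(values)
    h1 = [0.0] * (max(values) - lowest + 1)
    for v in values:
        h1[v - lowest] += 1.0 / len(values)
    return lowest, h1


def autoconvolve(lowest, h1, n):
    """H_n: distribution of the sum of n independent events, each distributed as H_1."""
    nonzero = [(i, p) for i, p in enumerate(h1) if p > 0.0]
    hn = h1[:]
    for _ in range(n - 1):
        out = [0.0] * (len(hn) + len(h1) - 1)
        for k, q in enumerate(hn):
            if q == 0.0:
                continue
            for i, p in nonzero:
                out[k + i] += q * p
        hn = out
    # Bin i of H_n corresponds to the value n * lowest + i.
    return n * lowest, hn


def moments(lowest, h):
    """Mean and variance of a histogram whose bin i holds the value lowest + i."""
    mean = sum((lowest + i) * p for i, p in enumerate(h))
    variance = sum((lowest + i - mean) ** 2 * p for i, p in enumerate(h))
    return mean, variance
```

Because the kernel sums to zero, H_1 has a null mean, and the variance of H_n is n times that of H_1. This is the scaling in sqrt(n) used when the coefficients are reduced.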

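In order to facilitate the comparisons, the variable W_n of distribution H_n is reduced by

$$ c = \frac{W_n - E(W_n)}{\sigma(W_n)} $$

E being the mathematical expectation and σ(W_n) the standard deviation of W_n, and the cumulative distribution function is

$$ F_n(c) = \int_{-\infty}^{c} H_n(u) \, du $$

where H_n is expressed in terms of the reduced variable. From F_n, we derive c_min and c_max such that F_n(c_min) = ε and F_n(c_max) = 1 − ε, for a given tail probability ε.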

Let us define a reduced wavelet coefficient as

$$ w^r_j(x,y) = \frac{w_j(x,y)}{\sqrt{n} \, \sigma_{\psi_j}} = \frac{w_j(x,y)}{\sqrt{n} \, \sigma_\psi} \, 4^j $$

where σ_ψ is the standard deviation of the wavelet function, σ_ψj is the standard deviation of the dilated wavelet function (σ_ψj = σ_ψ / 4^j), and w_j(x,y) is a wavelet coefficient obtained using the à trous wavelet transform algorithm.

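Therefore a reduced wavelet coefficient w^r_j(x,y), calculated from w_j(x,y) and resulting from n photons or counts, is significant if

$$ w^r_j(x,y) > c_{max} $$

or

$$ w^r_j(x,y) < c_{min} $$

where c_min and c_max are the lower and upper thresholds derived from the cumulative distribution function F_n.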

This detection method presents several advantages: it is independent of any model, no simulation is needed, and it is theoretically rigorous.
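The detection test can be sketched in Python as follows. The example is a minimal illustration under stated assumptions: the one-event histogram is a toy distribution with null mean (value −1 with probability 0.8, value 4 with probability 0.2) rather than the histogram of a real wavelet, the names `c_min`, `c_max`, `eps` and all function names are chosen for this sketch, and the thresholds are read from the discrete cumulative distribution.

```python
import math


def autoconvolve(h1, n):
    """Distribution of the sum of n independent events, each with histogram h1."""
    hn = h1[:]
    for _ in range(n - 1):
        out = [0.0] * (len(hn) + len(h1) - 1)
        for k, q in enumerate(hn):
            for i, p in enumerate(h1):
                out[k + i] += q * p
        hn = out
    return hn


def reduced_thresholds(lowest, h1, n, eps):
    """Reduced thresholds (c_min, c_max) at tail probability eps for n events.

    Bin i of h1 holds the value lowest + i.
    """
    hn = autoconvolve(h1, n)
    values = [n * lowest + i for i in range(len(hn))]
    mean = sum(v * p for v, p in zip(values, hn))
    sigma = math.sqrt(sum((v - mean) ** 2 * p for v, p in zip(values, hn)))
    c_min = c_max = None
    cumulative = 0.0
    for v, p in zip(values, hn):
        cumulative += p
        c = (v - mean) / sigma  # reduced variable
        if c_min is None and cumulative >= eps:
            c_min = c
        if c_max is None and cumulative >= 1.0 - eps:
            c_max = c
    return c_min, c_max


def reduce_coefficient(w, n, sigma_psi_j):
    """Reduced coefficient: w divided by sqrt(n) times the one-event standard deviation."""
    return w / (math.sqrt(n) * sigma_psi_j)


def is_significant(w_reduced, c_min, c_max):
    """A reduced coefficient is significant if it falls outside [c_min, c_max]."""
    return w_reduced > c_max or w_reduced < c_min


# Toy one-event histogram: values -1, 0, 1, 2, 3, 4 (standard deviation 2).
TOY_H1 = [0.8, 0.0, 0.0, 0.0, 0.0, 0.2]
```

For instance, `reduced_thresholds(-1, TOY_H1, 4, 0.01)` gives thresholds that are not symmetric about zero, which reflects the non-Gaussian shape of the distribution for a small number of events. A coefficient is then tested with `is_significant(reduce_coefficient(w, 4, 2.0), c_min, c_max)`.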
