The hyperparameters play a fundamental role in the Bayesian framework. They set the relative weight of the prior information and the likelihood in the Bayesian objective function, and therefore determine where the chosen solution lies between the default image and the maximum likelihood solution. Ultimately, they define the degree of smoothness of the solution.
If we use a single value for all the hyperparameters, we could obtain a reasonable global fit of the data, but the parts of the reconstructed image with low light levels would appear noisy (over-reconstructed), while bright objects (stars) would be too smooth (under-reconstructed). This effect is also present in the Richardson-Lucy algorithm when stopped according to some appropriate stopping rule, and can easily be observed in the normalized residuals comparing the projection of the solution with the data. To compute the normalized residual over a set of S detectors we use the expression:
R = \frac{1}{S} \sum_{j=1}^{S} \frac{\left( p_j - \sum_i f_{ji} a_i \right)^2}{\sum_i f_{ji} a_i + \sigma_j^2}    (24)

where p_j are the counts observed in detector j, \sum_i f_{ji} a_i is the projection of the current solution onto detector j, and \sigma_j^2 is the readout-noise variance (zero for pure Poisson data).
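As an illustrative sketch (not the authors' code), the mean normalized residual of a region can be computed as follows, assuming numpy arrays for the data and for the projection of the current solution:

```python
import numpy as np

def normalized_residual(data, projection, region_mask, readout_var=0.0):
    """Mean squared normalized residual over the S detectors of one region.

    data        : observed counts p_j
    projection  : projection of the current solution, q_j = sum_i f_ji a_i
    region_mask : boolean mask selecting the S detectors of the region
    readout_var : readout-noise variance (0 for pure Poisson data)

    For a statistically feasible solution the returned value should be
    close to 1.0, since the Poisson variance of a detector equals its mean.
    """
    p = data[region_mask].astype(float)
    q = projection[region_mask].astype(float)
    S = p.size
    return np.sum((p - q) ** 2 / (q + readout_var)) / S
```

For Poisson data generated around a known mean, this statistic fluctuates around 1.0, which is the basis of the feasibility band used later in the paper.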
The FMAPE algorithm avoids noise amplification by using a set of space-variant hyperparameters. A variable hyperparameter allows different degrees of smoothing in different regions, ranging from very smooth regions (very similar to the prior) to maximum likelihood regions.
In our implementation of the algorithm, we have computed the hyperparameters for the different regions by using segmentation techniques in conjunction with: a) regional cross-validation and b) regional measurements of average normalized residuals. We present here a segmentation technique based on wavelet decomposition and a classification method (Kohonen self-organizing maps). However, the segmentation of an astronomical image may be as difficult a problem as the reconstruction itself. The segmentation problem thus remains open, and improvements of the technique described here will be needed in the future.
Stars are essentially point sources, but their large range of intensities does not allow us to consider all the stars as belonging to a single region. Thus, each star (or bright object) has been considered a different region. All other elements in the image should be separated into a reasonable number of regions of similar characteristics for the assignment of hyperparameter values. What is meant by similar characteristics is described in the next section.
As stated above, it is possible to use diverse techniques to segment the image. We have experimented with several methods, some of them from the field of medical imaging. However, at this stage, we prefer to use a general-purpose classification method. The self-organizing network (Kohonen 1990) is one of the best understood and appears to be useful for classifying astronomical objects.
The hyperparameters belong to object space (as do the emission densities a_i). Thus, the segmentation must be done in object space and not in data space. In most astronomical cases the data (raw image) resemble the object, and segmentation could seemingly be performed in data space, but this is incorrect in general (in tomographic imaging, for example, the appearance of the data has nothing to do with that of the object, and segmenting the data makes no sense). Some kind of deconvolution should therefore be performed prior to the segmentation process. We use the Maximum Likelihood method (MLE, or Richardson-Lucy) for this deconvolution because it is the most widely studied restoration method in astronomy and gives very good results. We stop the algorithm using cross-validation or feasibility tests for the whole image. Our starting point for the segmentation process is thus the result of a standard reconstruction (Richardson-Lucy method + statistically based stopping criteria).
We have assumed that pixels with a similar mean value should form one region, though two regions with similar mean values but differing high-frequency content should remain distinct. The larger scales of a wavelet decomposition provide the segmentation process with information about average values over extended regions and low frequencies, while the smaller scales provide the high-frequency information. The joint use of all the scales achieves the segmentation objective. As stated above, we carry out the actual segmentation step with a Kohonen self-organizing network. That method allows the automatic use of all the information contained in the wavelet decomposition of the image and converges well. To prepare the MLE image for segmentation by the network, we first represent it as a block of planes containing its wavelet decomposition.
Following Starck &amp; Murtagh (1994), we decompose the MLE image using the discrete wavelet transform algorithm known as "à trous" ("with holes"). An image c_0(k) is successively smoothed as follows:

c_p(k) = \sum_l h(l) \, c_{p-1}(k + 2^{p-1} l)

We use a scaling function which has a cubic B_3 spline profile, giving the kernel h = (1/16, 4/16, 6/16, 4/16, 1/16). Letting

w_p(k) = c_{p-1}(k) - c_p(k)

in which w_p is the wavelet plane at scale p, we can write:

c_0(k) = c_P(k) + \sum_{p=1}^{P} w_p(k)    (25)
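The "à trous" decomposition can be sketched as follows. This is an illustrative implementation, not the authors' code; it uses periodic boundary handling via np.roll for brevity (mirror boundaries are more common in practice):

```python
import numpy as np

# B3 cubic-spline smoothing kernel used by the "a trous" algorithm
H = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def atrous_planes(image, n_scales):
    """Decompose a 2-D image into n_scales wavelet planes plus a residual.

    Returns (planes, residual) with image == residual + sum(planes);
    planes[0] holds the finest scale.
    """
    c = image.astype(float)
    planes = []
    for p in range(n_scales):
        step = 2 ** p  # the "holes" between kernel taps double at each scale
        smoothed = c
        for axis in (0, 1):  # separable smoothing along rows, then columns
            acc = np.zeros_like(smoothed)
            for k, h in enumerate(H):
                shift = (k - 2) * step
                # kernel is symmetric, so the roll direction does not matter
                acc += h * np.roll(smoothed, shift, axis=axis)
            smoothed = acc
        planes.append(c - smoothed)  # wavelet plane w_p = c_{p-1} - c_p
        c = smoothed
    return planes, c
```

Summing all the planes and the final smoothed residual reconstructs the original image exactly, which is the defining property of Eq. (25).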
Which wavelet plane to use can depend on the size of the point spread function. In this paper we use the second wavelet plane, which has little noise but keeps enough signal to be a good representation of the stars and prominent objects. In our experience, this plane was adequate for a wide range of PSF sizes in images from both the Hubble Space Telescope and the ground. The method consists simply of isolating the regions with a number of counts over a certain threshold.
The threshold is set automatically. To do this, we use the IRAF routine "imstatistics" to compute the standard deviation of the data in the background and set the threshold to a multiple of it. The threshold could be tuned interactively by the user, but this interaction would introduce a bias and a user-dependent solution.
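The thresholding step can be sketched as below. This is an illustrative stand-in for the IRAF "imstatistics" step; the factor k is an assumed parameter, not a value stated in the paper:

```python
import numpy as np

def star_mask(wavelet_plane, background_mask, k=3.0):
    """Isolate bright, point-like structures in a wavelet plane.

    wavelet_plane   : one plane of the "a trous" decomposition
    background_mask : boolean mask of pixels considered background
    k               : threshold factor (assumed; the paper derives the
                      threshold from IRAF "imstatistics" automatically)
    """
    sigma = np.std(wavelet_plane[background_mask])  # background noise level
    return wavelet_plane > k * sigma
```

Each connected component of the resulting mask can then be given its own region number, one per star or bright object.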
The codebook vectors are presented to a self-organizing map with a square array of weight vectors, which we have defined to be of dimensions 3 × 3, 4 × 4, or 5 × 5. The network uses square learning regions, with each plane of the image block normalized to a maximum range of 1.0. The "best matching" weight vector is selected by Euclidean distance. Before the training process starts, the order of the codebook vectors is randomized, taking care that no vector is left out of any of the training cycles. The learning parameter starts at 0.1 and ends at 0.05 over 75 complete passes through the codebook vectors.
The outcome of the learning process is 9, 16 or 25 weight vectors that are representative of all the codebook vectors. Invariably, one of the vectors will represent all the stars (this is not the case if the extra plane with all the stars at the same high intensity is not included). A testing step, in which all the codebook vectors are classified by their proximity to the weight vectors, follows the learning process. A map of regions is then generated, in which all MLE image pixels corresponding to a given weight vector appear with the same region number. The stars are made to have region numbers above the 8, 15, or 24 of the smoother elements.
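The training and classification steps can be sketched with a minimal Kohonen map. This is an illustrative implementation under the parameters quoted above (square neighborhoods, Euclidean matching, randomized presentation order, learning rate 0.1 to 0.05 over 75 passes); it is not the authors' code:

```python
import numpy as np

def train_som(codebook, grid=3, passes=75, lr_start=0.1, lr_end=0.05, seed=0):
    """Minimal 2-D Kohonen self-organizing map (sketch).

    codebook : (n_vectors, n_features) array, one vector per pixel
               (each feature = one wavelet plane, normalized to range 1.0)
    grid     : side of the square map (3, 4 or 5 -> 9, 16 or 25 units)
    Returns the (grid*grid, n_features) trained weight vectors.
    """
    rng = np.random.default_rng(seed)
    n_units = grid * grid
    weights = rng.random((n_units, codebook.shape[1]))
    ij = np.stack(np.divmod(np.arange(n_units), grid), axis=1)  # unit coords
    for t in range(passes):
        lr = lr_start + (lr_end - lr_start) * t / max(passes - 1, 1)
        radius = max(int(grid // 2 * (1 - t / passes)), 1)  # at least 1
        for idx in rng.permutation(codebook.shape[0]):  # randomized order
            x = codebook[idx]
            best = np.argmin(np.sum((weights - x) ** 2, axis=1))  # Euclidean BMU
            # square learning region (Chebyshev neighborhood) around the BMU
            near = np.max(np.abs(ij - ij[best]), axis=1) <= radius
            weights[near] += lr * (x - weights[near])
    return weights

def classify(codebook, weights):
    """Assign each codebook vector to its nearest weight vector (region number)."""
    d = ((codebook[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d, axis=1)
```

Mapping each pixel's region number back onto the image grid yields the region map described above.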
Thus, the final result of this process is a map of a few extended regions (9, 16 or 25) plus one more region for each star or bright object.
Once the segmentation process is completed, we can either compute the regional hyperparameters by cross-validation tests or adjust them dynamically by feasibility tests monitoring the residuals. In both cases, we carry out the final Bayesian reconstruction with the FMAPE algorithm given by Eqs. (26), (23) using the original data.
We start with an appropriate initial guess for the hyperparameters, indicated by experience; for astronomical images, a low uniform value for all the pixels is generally a good choice. During the reconstruction, we allow each of the values to change: at the end of each iteration, we use Eq. (30) to compute the mean normalized residual of each region. Then we adjust the corresponding hyperparameter so that the residuals in each region approach 1.0, within a band of ±0.05. This is achieved by the following simple algorithm: in the regions in which the residuals are higher than 1.05 we increase the hyperparameter by a certain amount; likewise, if the residuals are below 0.95, we decrease it. These corrections are carried out every few iterations. The reconstruction is finished when the image of the normalized residuals is featureless and the average residual values for all regions are within the prescribed band.
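One feasibility-driven update of the regional hyperparameters can be sketched as follows. The step size is an assumed parameter (the paper says only "a certain amount"), and the per-pixel hyperparameter map and region labels are assumed to be numpy arrays:

```python
import numpy as np

def adjust_hyperparameters(hyper_map, residuals, region_map, step=0.05,
                           band=(0.95, 1.05)):
    """One update of the per-pixel hyperparameter map from regional residuals.

    hyper_map  : "master" map of hyperparameters, one value per pixel
    residuals  : dict mapping region number -> mean normalized residual
    region_map : integer region label of each pixel
    step       : fractional correction per update (assumed value)

    Regions whose residuals exceed the band receive a larger hyperparameter
    (more smoothing); regions below the band receive a smaller one.
    """
    out = hyper_map.copy()
    for region, r in residuals.items():
        sel = region_map == region
        if r > band[1]:
            out[sel] *= 1.0 + step   # residuals too high: increase smoothing
        elif r < band[0]:
            out[sel] *= 1.0 - step   # residuals too low: decrease smoothing
    return out
```

Iterating this update every few iterations drives the mean residual of every region into the prescribed band.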
We have observed that the above methodology produces reconstructions in which the boundaries of the regions become somewhat visible. To avoid this effect, we keep incrementing or decrementing a "master" map of hyperparameters, but at each iteration we use a smoothed version of the current master map. The smoothing is carried out with a Gaussian kernel of approximately one pixel standard deviation.
The method for obtaining the regional hyperparameters by cross-validation is the following: consider the data split into two halves, A and B. This splitting can be obtained in two ways: a) as the result of two independent but successive exposures, or b) as the result of a thinning process if only one data set is available (see Núñez &amp; Llacer 1993a).
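The thinning split can be sketched as follows (an illustrative version, assuming integer Poisson count data): each recorded count is assigned to half A with probability 1/2, and the remainder goes to half B. Binomial thinning of a Poisson variable yields two independent Poisson variables, which is what the cross-validation test requires:

```python
import numpy as np

def thin_counts(data, p=0.5, seed=0):
    """Split one Poisson count image into two independent halves A and B.

    data : integer count image
    p    : probability of assigning each count to half A
    """
    rng = np.random.default_rng(seed)
    counts = data.astype(int)
    a = rng.binomial(counts, p)  # per-pixel binomial thinning
    return a, counts - a
```

The two halves sum back to the original data exactly, and each is itself Poisson-distributed with half the original mean.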
Once we have obtained the two data sets, the method consists of the following steps: 1) Reconstruct data set A with an initially low uniform hyperparameter for all the pixels, carrying the reconstruction to near convergence. 2) For each segmented region compute two quantities: a) the log likelihood of data set A with respect to the reconstructed image A (direct likelihood), and b) the log likelihood of data set B with respect to the reconstructed image A (cross-likelihood). The regional log likelihoods are computed by applying Eq. (3) to the specific regions separately; if there is no readout noise, Eq. (4) can be used. 3) Increase the uniform hyperparameter to a higher value and repeat steps 1) and 2) until the conditions described in the following paragraphs are met.
A plot of the direct likelihood and cross-likelihood as a function of the hyperparameter should be made for each region. When the uniform hyperparameter values are low, the iterative reconstruction process carried to convergence reconstructs mostly features that are common to data sets A and B (the signal), and both direct likelihood and cross-likelihood for all regions increase with increasing hyperparameter values. As the values increase further, the reconstruction of some regions of data set A will begin to reproduce the specific statistical noise of data set A, which does not correspond to the noise of data set B. The direct likelihood of those regions will continue to increase with higher values, but the cross-likelihood will reach a maximum and start to decrease. Figure 7 shows the cross-validation test curve for region No. 3 of the first example presented in the next section.
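The regional log-likelihood evaluation behind these curves can be sketched as below. This is an illustrative Poisson (no readout noise) version with the constant log(p!) term dropped, not the paper's Eq. (3)/(4) verbatim:

```python
import numpy as np

def regional_loglike(counts, projection, region_mask):
    """Poisson log likelihood of `counts` given `projection` over one region.

    counts      : observed counts (data set A for the direct likelihood,
                  data set B for the cross-likelihood)
    projection  : projection of the image reconstructed from data set A
    region_mask : boolean mask selecting the detectors of the region

    The additive log(counts!) term, independent of the image, is omitted.
    """
    p = counts[region_mask].astype(float)
    q = projection[region_mask].astype(float)
    return np.sum(p * np.log(q) - q)
```

For each trial hyperparameter value, evaluating this on (data A, projection A) gives the direct likelihood and on (data B, projection A) the cross-likelihood; the hyperparameter maximizing the latter is selected for the region.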
For each region, then, the maximum of the cross-likelihood determines the proper hyperparameter value to be used in a final reconstruction. In some cases, mainly for small regions corresponding to stars, the cross-validation curve does not show a maximum; in those cases a standard high value is used. Once the set of hyperparameters is obtained, we carry out the Bayesian reconstruction of data set A with the FMAPE algorithm given by Eqs. (26), (23).
After reconstructing data set A, the process could be repeated for data set B, obtaining the cross-likelihood curves with respect to data set A. In practice, once a set of hyperparameters has been obtained from data set A, the same values can be used to reconstruct data set B. After obtaining the Bayesian reconstructions of the two data sets, both are added to obtain the final result. During this final addition of the two half images, residual artifacts of the reconstruction process that are not common to both reconstructions can also be easily eliminated.
The cross-validation method is robust for the following reasons: a) both data sets have the same signal; b) the statistical nature of the noise is identical in both cases; c) the noise is independent in the two data sets, and d) they have the same Point Spread Function.
Copyright The European Southern Observatory (ESO)