
2 Comparative analysis of ISRA and RLA as descent algorithms

In this section, we recall the origins of ISRA and RLA with the objective of showing that they can be analyzed in a similar way. Writing these algorithms in the form of descent algorithms allows a better understanding of their behavior. We compare the results given by these algorithms on simulated data with different noise levels; we show the overall closeness of the results and we highlight the limits of the algorithms for the different types of noise process.

2.1 Gaussian noise: The Image Space Reconstruction Algorithm (ISRA)

Let us first consider that the data are corrupted by an additive, zero-mean, white Gaussian noise "n", with variance $\sigma_i^2$, and that the data "$y_i$" are independent random variables. The image formation model may be written as y=Hx+n. For a given pixel "i", the datum "$y_i$" is the sum of a deterministic part $(Hx)_i$, which is the mean of the intensity in the considered pixel, and a random noise component "$n_i$". The likelihood, that is, the conditional probability of observing "y" knowing "Hx", is given by:

\begin{displaymath}L(x)=P(y/x)\approx \prod_i\exp\left[-\frac{(y_i-(Hx)_i)^2}{2\sigma_i^2}\right]\cdot
\end{displaymath} (4)

The corresponding log-likelihood is then:

\begin{displaymath}{\rm Log}[P(y/x)]\approx -\frac 12\sum_i\frac{(y_i-(Hx)_i)^2}{\sigma_i^2}
\end{displaymath} (5)

and the solution "x" is obtained by minimization of the weighted Euclidean distance between the raw data and the restored mean:

\begin{displaymath}J(x)=-{\rm Log}[P(y/x)]=\frac 12 \sum_i \frac{(y_i-(Hx)_i)^2}{\sigma_i^2}=\frac 12\vert\vert y-Hx\vert\vert^2_R
\end{displaymath} (6)

where the symbol $\vert\vert\cdots\vert\vert^2$ stands for the squared Euclidean norm and the index R denotes the weighting diagonal matrix $R={\rm diag}\left[\frac{1}{\sigma_i^2}\right]$.
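For concreteness, a minimal sketch of this objective might read as follows. Here, and in the sketches below, we use Python/numpy for illustration (the computations of Sect. 2.4 were made with Mathematica); H, R_diag and y are placeholder inputs.

import numpy as np

def gaussian_objective(x, H, R_diag, y):
    # J(x) = 0.5 * sum_i (y_i - (Hx)_i)^2 / sigma_i^2  (Eq. 6);
    # R_diag holds the inverse variances 1/sigma_i^2.
    r = y - H @ x
    return 0.5 * np.sum(R_diag * r * r)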

Any iterative descent method may be used to minimize J(x).

Making use of the gradient expression:

\begin{displaymath}\nabla_xJ(x)=H^TRHx-H^TRy
\end{displaymath} (7)

a very simple algorithm is then the constant-stepsize gradient method, which can be written in the form:

\begin{displaymath}x^{(k+1)}=x^{(k)}+\alpha[-\nabla J(x^{(k)})]=x^{(k)}+\alpha H^TR(y-Hx^{(k)})
\end{displaymath} (8)

where the descent stepsize $\alpha$ must be such that $0<\alpha<2/\vert\vert H^TRH\vert\vert$ to ensure the convergence of the algorithm.
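A minimal sketch of this constant-stepsize iteration, with the stepsize chosen from the bound above, might be:

import numpy as np

def gradient_descent(H, R_diag, y, n_iter=200):
    # Constant-stepsize gradient method of Eq. (8) for the weighted
    # least-squares objective J(x); R_diag holds 1/sigma_i^2.
    HtRH = H.T @ (R_diag[:, None] * H)       # H^T R H
    alpha = 1.0 / np.linalg.norm(HtRH, 2)    # any 0 < alpha < 2/||H^T R H|| converges
    x = np.zeros(H.shape[1])
    for _ in range(n_iter):
        x += alpha * (H.T @ (R_diag * (y - H @ x)))  # x + alpha H^T R (y - Hx)
    return x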

The solution x* given by this algorithm when $k\rightarrow \infty$ is the minimum norm least squares (MNLS) solution, which may be formally expressed as $x^*=(H^TRH)^{-1}H^TRy$, corresponding to the first-order optimality condition $\nabla J(x^*)=0$.

However, the deconvolution problem is an ill-posed problem, and the MNLS solution suffers from numerical instabilities: the noise affecting the data is amplified during the restoration process.

Moreover, the solution may have negative parts without physical meaning; this latter point is overcome by imposing the constraint of a non-negative solution. Several algorithmic methods have been proposed for that purpose, and in particular ISRA.

ISRA was initially proposed in a multiplicative form by Daube-Witherspoon & Muehllehner (1986). The convergence of the algorithm was then analyzed by De Pierro (1987, 1990, 1993) and Titterington (1987), who showed that the sequence generated by ISRA converges toward the solution of the problem:

\begin{displaymath}\min_x\ J(x)=\frac 12\vert\vert y-Hx\vert\vert^2_R,\ \mbox{ with the constraint } x\geq 0,
\end{displaymath}

and that the Kuhn-Tucker first order optimality conditions are satisfied at the optimum (Luenberger 1973).

In its original form, ISRA is written as:

\begin{displaymath}x_i^{(k+1)}= x_i^{(k)}\cdot\frac{(H^TRy)_i}{(H^TRHx^{(k)})_i}\quad \forall i.
\end{displaymath} (9)

All components of the initial estimate must be positive or equal to zero to satisfy the constraint. Note that if the value of a particular component of an estimate is zero, it remains zero for all successive iterations. This property may be used to apply a support constraint, by setting to zero the components of the initial estimate outside the support.
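A minimal sketch of the multiplicative update of Eq. (9), including the support-constraint behavior just described, might be (the small eps guarding the division is our addition, not part of the original algorithm):

import numpy as np

def isra(H, R_diag, y, x0, n_iter=100, eps=1e-12):
    # Multiplicative ISRA update of Eq. (9); x0 >= 0 componentwise.
    # Components of x0 that are zero (e.g. outside a support mask) stay zero.
    HtRy = H.T @ (R_diag * y)                # (H^T R y)_i, fixed over the iterations
    x = x0.copy()
    for _ in range(n_iter):
        HtRHx = H.T @ (R_diag * (H @ x))     # (H^T R H x^(k))_i
        x = x * HtRy / (HtRHx + eps)         # componentwise multiplicative update
    return x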

The iterative algorithm given by Eq. (9) can straightforwardly be rewritten in the additive form:

\begin{displaymath}x_i^{(k+1)}=x_i^{(k)}+\frac{x_i^{(k)}}{(H^TRHx^{(k)})_i}[H^TR(y-Hx^{(k)})]_i.
\end{displaymath} (10)

Or, in matrix form:

\begin{displaymath}x^{(k+1)}=x^{(k)}+{\rm
diag}\left[\frac{x_i^{(k)}}{(H^TRHx^{(k)})_i}\right]H^TR(y-Hx^{(k)}),
\end{displaymath} (11)

which is clearly a modified gradient form with descent direction:

\begin{displaymath}d^{(k)}={\rm diag}\left[\frac{x_i^{(k)}}{(H^TRHx^{(k)})_i}\right]H^TR(y-Hx^{(k)}).
\end{displaymath} (12)

In the optical deconvolution problem, the matrices H and R have positive entries, and all the components of the initial estimate have positive values to satisfy the constraint. The descent properties of the gradient algorithm are then preserved (De Pierro 1987, 1990). In the gradient form, ISRA may optionally be relaxed to control the convergence speed of the algorithm. The relaxation method (Lascaux & Theodor 1986) can be summarized as follows: for any non-relaxed iterative algorithm giving an estimate $\hat x^{(k+1)}$ from $x^{(k)}$, the relaxed estimate $x^{(k+1)}$, with relaxation factor "$\alpha$", is given by the relation: $x^{(k+1)}=(1-\alpha)x^{(k)}+\alpha\hat x^{(k+1)}$.
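As a sketch, one relaxed step under this scheme is simply a blend of the two estimates:

def relaxed_step(x, x_hat, alpha):
    # x^(k+1) = (1 - alpha) x^(k) + alpha xhat^(k+1);
    # alpha = 1 recovers the non-relaxed algorithm.
    return (1.0 - alpha) * x + alpha * x_hat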

Evidently, for relaxed algorithms, neither the non-negativity of the solution nor the convergence of the algorithm is guaranteed; a specific analysis, which goes beyond the scope of this paper, would then become necessary.

2.2 Poisson noise: The Richardson-Lucy Algorithm (RLA)

For perfect photon detection, the intensity in pixel "i" is a random variable that follows a Poisson law of mean $(Hx)_i$. The likelihood may be written as:

\begin{displaymath}L(x)=P(y/x)=\prod_i\frac{[(Hx)_i]^{y_i}}{y_i!}\exp[-(Hx)_i]\cdot
\end{displaymath} (13)

Using Stirling's formula, the negative Log-likelihood becomes:

\begin{displaymath}T(x)=-{\rm Log}\left[L(x)\right]\approx\sum_i\left\{\left[(Hx)_i-y_i\right]+y_i{\rm Log}\frac{y_i}{(Hx)_i}\right\}\cdot
\end{displaymath} (14)

This function is known as the Csiszár (1991) I-divergence measure between (Hx) and y. This measure generalizes the Kullback divergence, or cross-entropy measure, to accommodate functions whose integrals are not constant, as they would be if they were probability distributions (Snyder et al. 1992). It is the quantity we seek to minimize, keeping the constraint that "x" must be positive. Dropping from T(x) the terms independent of "x", we must minimize, under the positivity constraint, the quantity:

\begin{displaymath}D(x)=\sum_i\left[(Hx)_i-y_i{\rm Log}(Hx)_i\right].
\end{displaymath} (15)
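As a sketch, this objective might be evaluated as follows (the eps guard in the logarithm is our assumption):

import numpy as np

def poisson_objective(x, H, y, eps=1e-12):
    # D(x) = sum_i [ (Hx)_i - y_i Log (Hx)_i ]  (Eq. 15)
    Hx = H @ x
    return np.sum(Hx - y * np.log(Hx + eps))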

Shepp & Vardi (1982) solved this problem using the Expectation-Maximization (EM) method, and obtained the iterative algorithm:

\begin{displaymath}x_i^{(k+1)}=\frac{1}{a_i}x_i^{(k)}\left[H^T\frac{y}{Hx^{(k)}}\right]_i
\end{displaymath} (16)

where the ratio $\frac{y}{Hx^{(k)}}$ is a vector computed componentwise, and with $a_i=\sum_jH_{j,i}$.

This algorithm was previously proposed by Richardson (1972) and by Lucy (1974) in the field of astrophysics. Shepp & Vardi (1982) and Byrne (1993) showed that convergence holds, and that the sequence generated by this algorithm also satisfies the Kuhn-Tucker optimality conditions for the constrained minimization problem: $\min_x D(x)$, subject to $x\geq 0$. As for ISRA, all successive estimates remain positive if the initial estimate is positive. Moreover, if a component of the solution is set to zero, or becomes equal to zero, it remains equal to zero for all successive iterations.
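A minimal sketch of the update of Eq. (16) might be (the eps guard is our addition):

import numpy as np

def rla(H, y, x0, n_iter=100, eps=1e-12):
    # Richardson-Lucy / EM update of Eq. (16); x0 > 0 componentwise.
    # a_i = sum_j H_{j,i}; in our deconvolution problem a_i = 1 for all i.
    a = H.sum(axis=0)                        # column sums a_i (assumed > 0)
    x = x0.copy()
    for _ in range(n_iter):
        ratio = y / (H @ x + eps)            # componentwise ratio y / (Hx^(k))
        x = (x / a) * (H.T @ ratio)          # multiplicative EM update
    return x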

Here again, it is interesting to write RLA in the additive form:

\begin{displaymath}x_i^{(k+1)}=x_i^{(k)}+\frac{x_i^{(k)}}{a_i}\left[H^T{\rm diag}\left[\frac{1}{(Hx^{(k)})_i}\right](y-Hx^{(k)})\right]_i
\end{displaymath} (17)

or in the global matrix form:

\begin{displaymath}x^{(k+1)}=x^{(k)}+{\rm diag}\left[\frac{x_i^{(k)}}{a_i}\right]H^T{\rm diag}\left[\frac{1}{(Hx^{(k)})_i}\right](y-Hx^{(k)}).
\end{displaymath} (18)

Taking into account the expression of the gradient of D(x):

\begin{displaymath}\nabla D(x)=H^T{\rm diag}\left[\frac{1}{(Hx)_i}\right](Hx-y)
\end{displaymath} (19)

and with $x_i^{(k)}\geq 0$ and $a_i>0$, $\forall i$, we recognize a descent algorithm of the form:

\begin{displaymath}x^{(k+1)}=x^{(k)}+{\rm diag}\left[\frac{x_i^{(k)}}{a_i}\right]
\left[-\nabla D(x^{(k)})\right].
\end{displaymath} (20)

The descent direction is no longer the negative gradient direction, but the descent property of the gradient algorithm is preserved. In our particular problem, a supplementary simplification comes from the fact that $a_i=1$, $\forall i$. As for ISRA, relaxation makes it possible to modify the speed of convergence, but as previously mentioned, this requires a specific analysis of the relaxation factor to ensure the non-negativity of the solution and the convergence of the algorithm.

2.3 Comparative analysis of ISRA and RLA

A few remarks can be made on the results presented in the above sections.

i) The additive forms of ISRA and RLA make evident the similar roles played by the weighting matrix $R={\rm diag}\left[\frac{1}{\sigma^2_i}\right]$ in Eq. (11) and by the matrix ${\rm diag}\left[\frac{1}{(Hx^{(k)})_i}\right]$ in Eq. (18). These matrices both weight the residual by the inverse of the signal variance. For the Gaussian process, $\sigma^2_i$ is the variance of the intensity in a pixel. For the Poisson process, $(Hx^{(k)})_i$ also represents the variance of the signal, because for this process the variance equals the mean.

ii) In the Gaussian case, J(x) is a quadratic form; the convexity of this objective function allows the solution to be obtained by solving the linear system $\nabla J(x)=0$. The Gradient or Conjugate Gradient methods in their classical form, leading to linear algorithms, may be used in this case, and the stepsize is known explicitly. We must observe that the non-linearity of ISRA is introduced only by the non-negativity constraint.

By contrast, in the Poisson case, the functional D(x), although convex, is not quadratic. Thus, the non-linearity of the RLA algorithm is due to the particular form of D(x) as well as to the non-negativity constraint.

iii) For the two algorithms, we set the initial estimate to a constant value: $x_i^{(0)}=\frac{\sum_i y_i}{N}$, where N is the number of pixels in the image; moreover, the integral of the image is generally normalized to 1. We must observe that in the RLA algorithm we have, for all k, $\sum_ix_i^{(k)}=\sum_iy_i$, which is not the case for ISRA. This is due to the fact that the functional D(x) implicitly contains the constant-intensity constraint $\sum_i\left[(Hx)_i-y_i\right]$, together with $\sum_j H_{j,i}=a_i=1$, the latter property appearing naturally for the PSF in a deconvolution problem or being imposed in the algorithm.

In the case of ISRA, the conservation of the total intensity of the image is ensured by normalization after each iteration.
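As a sketch, the constant initialization and the per-iteration renormalization used for ISRA might read:

import numpy as np

def flat_start(y, N):
    # Constant initial estimate x_i^(0) = (sum_i y_i) / N.
    return np.full(N, y.sum() / N)

def renormalize(x, y):
    # Rescale x so that sum_i x_i = sum_i y_i (flux conservation for ISRA).
    return x * (y.sum() / x.sum())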

iv) The two algorithms are not regularized. A non-negativity constraint, although necessary, does not produce smooth solutions. When the iteration number becomes too large, the classical phenomena of instability appear in the solutions. From a practical point of view, the iteration number must be limited to obtain an acceptable compromise between resolution and stability of the solutions; this acts as a form of regularization. Alternatively, an explicit regularization may be performed by introducing a supplementary constraint to attenuate the high-frequency components in the solution; this will be proposed in a future paper.

2.4 Numerical simulations

We have performed a numerical simulation to illustrate some features of the deconvolution procedures described in the previous sections; the computations are made using Mathematica (Wolfram 1996).


Figure 1: From left to right: simulated object, Point Spread Function (PSF) and noiseless blurred image. The object represents an ensemble of three stars with various center-to-limb variations. A bright feature is added to the larger star's atmosphere


Figure 2: Simulated noisy images: Poisson statistics. Representations of the blurred image of Fig. 1 observed with $10^4$ photoelectrons (left) and $10^6$ photoelectrons (right)


Figure 3: Simulated noisy images: representations of the blurred image of Fig. 1 observed with a level of additive Gaussian noise of strength comparable to that of Fig. 2

2.4.1 Test data

The simulated data are shown in Figs. 1 to 3. Figure 1 gives, in contour plots, the object, the PSF and the image of the noiseless object-image relationship. Only the central part of the images is shown here (about 30% of the surface of the $64\times 64$ point images). This representation is used throughout the paper for similar figures.

The object x is chosen to represent an ensemble of three stars, with different diameters and center-to-limb variations. A bright and dark sunspot-like feature is added to the larger star's atmosphere. The PSF h is a realistic representation of the PSF of a real telescope. It is computed as the squared modulus of the Fourier Transform of a function representing the telescope aperture with a small phase aberration. It may correspond to observing conditions with the Hubble Space Telescope operating in the far ultraviolet (Gilliland & Dupree 1996). The resulting Optical Transfer Function is then a low-pass filter, limited in spatial frequencies to the extent of the aperture autocorrelation function. The observed filtered image y is then strictly band-limited. This particular property is of interest for the study of the reconstructed image in the Fourier plane discussed below.

Figures 2 and 3 represent possible detected images in the cases of Poisson and Gaussian statistics. Figure 2 represents the observed image assuming perfect photodetection, for two levels of noise corresponding to a total of $10^4$ and $10^6$ photoelectrons in the image. Such images may be obtained in visible observations. Figure 3 represents the observed image corrupted by additive Gaussian noise and may correspond to infrared observations with noise coming from the detector and the background.

2.4.2 Numerical comparison between ISRA and RLA

To implement the ISRA and RLA algorithms numerically, we have used the continuous forms of the algorithms, for which the convolutions can easily be computed via FFTs. The algorithms, written in the form of an iterative multiplication, then become, for ISRA:

\begin{displaymath}x^{(k+1)}(r)=x^{(k)}(r)\frac{h(-r)\otimes y(r)}{h(-r)\otimes h(r)\otimes x^{(k)}(r)}
\end{displaymath} (21)

and for RLA:

\begin{displaymath}x^{(k+1)}(r)=x^{(k)}(r)\left[h(-r)\otimes\left\{\frac{y(r)}{h(r)\otimes x^{(k)}(r)}\right\}\right]
\end{displaymath} (22)

where the symbol $\otimes$ denotes convolution. These well-known forms are easily obtained from the continuous form of the gradients developed in Annexes A and B.
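For illustration, a minimal numpy sketch of these FFT-based forms might be written as follows; x, y and the PSF h are same-size 2-D arrays, circular convolution via FFTs stands in for the continuous convolution, and the eps guards on the divisions are our addition:

import numpy as np

def conv(a, b):
    # Circular 2-D convolution a(r) (x) b(r), computed via FFTs.
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

def corr(a, b):
    # a(-r) (x) b(r): for a real kernel, flipping amounts to conjugation in Fourier space.
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)))

def isra_step(x, y, h, eps=1e-12):
    # One ISRA iteration in the continuous form of Eq. (21).
    return x * corr(h, y) / (corr(h, conv(h, x)) + eps)

def rla_step(x, y, h, eps=1e-12):
    # One RLA iteration in the continuous form of Eq. (22).
    return x * corr(h, y / (conv(h, x) + eps))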

The results of the deconvolutions for the two different statistics are presented in the following three figures. Figures 4 and 5 give the results obtained for the deconvolution of Fig. 2, while Fig. 6 gives an example of the results obtained for an additive, signal-independent Gaussian noise. The results are presented for different iteration numbers "k". As k increases, the image becomes sharper, but the noise increases too. When working with real data, it is extremely difficult to determine which is the best reconstructed image, i.e. when to stop the iterations. In a numerical simulation, we can obviously answer this question. Aside from a visual comparison between the results and the object, we have used as criterion of quality the simple Euclidean Distance (ED) between the true object and the reconstructed image. On this basis, for Poisson noise, we found the results of RLA slightly better than those of ISRA. This is not surprising, since RLA is constructed for the Poisson statistics. However, the differences between the ISRA and RLA results are almost insignificant, as can be deduced from a simple visual inspection of the results. On the contrary, ISRA gives much better results than RLA for Gaussian statistics. This is also expected, but in this case the difference is much stronger, as can be clearly seen in Fig. 6.

Moreover, ISRA is almost insensitive to small negative values present in the signal (these negative values appear when the average value of the background is subtracted). For RLA, an offset must be used to reconstruct the image, in order to deal with positive values only. This difference in behavior can be easily understood considering that ISRA works on the ratio of two smooth expressions ($H^Ty$ over $H^THx$), while RLA applies the smoothing by $H^T$ after taking the ratio of y over Hx.

Figure 7 clearly shows that the reconstructed images present energy outside the frequency limit of the telescope. These effects are in part due to the non-linearity of these algorithms. The three top curves of this figure correspond to the modulus of the Fourier Transforms of the images of Fig. 1. In this representation, the zero frequency is at the center of the image. As already indicated, the blurred image has no frequency power outside the disk-shaped zone corresponding to the aperture autocorrelation function.


Figure 4: Reconstructed objects obtained from the $10^6$ photoelectrons image using ISRA (top) and RLA (bottom) for various iteration numbers. The center images (iteration #164 for ISRA and iteration #134 for RLA) correspond to the best results in terms of the Euclidean Distance (ED) between the object and the reconstructed image. Left and right images correspond to iteration numbers 50 and 200, with comparable EDs. The results obtained using ISRA or RLA are similar


Figure 5: Reconstructed objects obtained from the $10^4$ photoelectrons image using ISRA (top) and RLA (bottom) for various iteration numbers. The center images (iteration #7 for ISRA and iteration #6 for RLA) correspond to the best results in terms of the Euclidean Distance (ED) between the object and the reconstructed image. Left and right images correspond to iteration numbers 2 and 18 or 22, with comparable EDs. Here again, the results obtained using ISRA or RLA are very similar


Figure 6: Reconstructed objects obtained from the image contaminated by additive noise using ISRA (top) and RLA (bottom) for various iteration numbers. In this case, the results obtained using ISRA are better than those obtained with RLA


Figure 7: Effect of the reconstruction procedure in the Fourier plane. The curves represent in gray levels the modulus of the Fourier Transforms of the images. From top to bottom and from left to right: original object, point spread function and noiseless image; results of the RLA algorithm for iteration numbers 1, 5 and 20, for the $10^4$ photoelectrons image. It is clear that the RLA algorithm produces power in the reconstruction outside the limits of the original blurred image. Similar results are obtained using ISRA



Figure 8: Spatial frequency band extension due to the non-linearity of the iterative algorithms. The curves give the ratio of the integral of ${\rm Abs}[X^{(k)}(u)]$ beyond the cut-off frequency to that below the cut-off frequency, as a function of the iteration number k. From top to bottom: RLA and ISRA with $10^4$ photons per image, RLA and ISRA with $10^6$ photons per image

The three bottom curves of Fig. 7 represent the modulus of the Fourier Transforms of the reconstructed images. The example is given for RLA at iteration numbers 1, 5 and 20, for the $10^4$ photoelectrons image, but similar results are obtained for ISRA. The iterative algorithms rapidly produce power outside the limits of the original blurred image, i.e. in a region of the Fourier plane fully dominated by the noise. This extension of the frequency content is inherent to the iterative algorithms used. Figure 8 gives a measure of this excess of signal outside the cut-off frequency by plotting, as a function of the iteration number, the ratio of the integral of ${\rm Abs}[X^{(k)}(u)]$ outside the cut-off frequency to the integral of ${\rm Abs}[X^{(k)}(u)]$ inside the cut-off frequency. This behavior is still observed when a noiseless image is processed.
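As a sketch, the ratio plotted in Fig. 8 could be computed as follows, assuming a boolean mask that is True for the frequencies inside the telescope cut-off (the construction of this mask is not specified here):

import numpy as np

def out_of_band_ratio(x, inside_mask):
    # Ratio of |X^(k)(u)| integrated outside the cut-off frequency
    # to |X^(k)(u)| integrated inside it (the measure of Fig. 8).
    X = np.abs(np.fft.fftshift(np.fft.fft2(x)))
    return X[~inside_mask].sum() / X[inside_mask].sum()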


