(2)
The conditional probability describes the noise in the data and its possible object dependence. It is fully specified in the problem by the likelihood function. As indicated above, we have two independent processes: image formation on the detector array and detector readout. Taking the two processes into account, the compound likelihood is (Núñez & Llacer 1993a):
and its logarithm is:
(3)
(4)
We use entropy to define the prior probability in a generalization of the concepts originally described by Frieden (1972).
Let N be the total energy (usually the number of counts or photons) in the object. Assume that there is an intensity increment, describing an intensity jump, to which we can assign some appropriate physical meaning. Frieden (1972) defines this increment as the finest known intensity jump possible in the object, but we prefer to relax that definition and consider it an adjustable hyperparameter of the Bayesian function.
Assume, in addition, that we have prior information regarding the statistical makeup of the object, obtained, for example, from previous observations or from observations at other wavelengths from which one could infer realistic information about the makeup of the object at the wavelength at which the observation is made. Given the prior energy distribution, the prior probability g_{i} of a photon going to pixel i is:
In that case, the a priori source probability of the quanta distributing randomly over the B pixels is characterized by the multinomial law (Lloyd 1980; Handbook of Applicable Mathematics, Vol. II, p. 92):
(5)
If the prior information is not reliable, it is best to ignore it and assign a uniform value, g_{i} = 1/B. In that case, the multinomial distribution of (5) reduces to a normalized version of the Boltzmann distribution used by Frieden (1972).
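Since (5) is the standard multinomial law, its logarithm can be evaluated stably with log-gamma functions. The following sketch (the pixel counts and prior probabilities are hypothetical, and the quantization in units of the intensity increment is omitted) illustrates both the informative and the uniform-prior cases:

```python
import math

def log_multinomial_prior(counts, g):
    """Log of the multinomial law  M! * prod_i g_i^{n_i} / n_i!
    for quanta distributing randomly over B pixels with prior
    probabilities g_i; lgamma keeps the factorials stable."""
    M = sum(counts)
    logp = math.lgamma(M + 1)
    for n_i, g_i in zip(counts, g):
        logp += n_i * math.log(g_i) - math.lgamma(n_i + 1)
    return logp

B = 4
counts = [3, 1, 0, 2]                 # hypothetical quanta per pixel
g_info = [0.4, 0.2, 0.1, 0.3]         # informative prior g_i
g_flat = [1.0 / B] * B                # unreliable prior ignored: g_i = 1/B
print(log_multinomial_prior(counts, g_info))
print(log_multinomial_prior(counts, g_flat))
```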
Assume, also, that the intensity increment is space variant in the object space. The introduction of this variability is the core of this paper. The increment controls the amount of prior information that is introduced into the solution. If the values of Q_{i} represent a uniform field, the increment will control the degree of smoothness of the solution. Its spatial variability allows the local control of smoothness, avoiding noise amplification in the different areas of the image.
Let the intensity increment now be variable. In place of a fixed number of units, we distribute a number of units that depends on the pixel. The a priori source probability of the quanta distributing randomly over the B pixels is then:
(6)
(7)
(8)
in which we discarded the ½ ln(2πx) term of Stirling's formula. This approximation is adequate for x > 10. Also, the use of the complete approximation would give an additional term in the algorithms that would produce a numerical instability at the background pixels. In some cases (e.g. in regions of very low flux or background in photon-counting devices) the approximation used may produce some error, but we prefer it in order to avoid the mentioned numerical instability. Using (10)
and taking into account (7) the logarithm of the probability is:
(9)
where
If we do not consider the space variation of the intensity increment, and taking into account that the remaining factors are constant, the log of the prior probability is:
(10)
(11)
(12)
where h_{j} is given by:
and the remaining parameter is a Lagrange multiplier for the conservation of counts. Note that the relative weight of the two terms in the Bayesian function (17) is controlled by the set of hyperparameters.
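The truncated Stirling approximation used above, ln x! ≈ x ln x − x with the ½ ln(2πx) term discarded, can be checked numerically; the sample values of x below are arbitrary:

```python
import math

def stirling_truncated(x):
    # ln x!  ~  x ln x - x, with the 0.5*ln(2*pi*x) term discarded
    return x * math.log(x) - x

for x in (5, 10, 50, 200):
    exact = math.lgamma(x + 1)                  # exact ln x!
    rel_err = abs(exact - stirling_truncated(x)) / exact
    print(f"x = {x:3d}   relative error = {rel_err:.2%}")
```

The error is largest at small x, consistent with the loss of accuracy noted above for regions of very low flux or background.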
The nonlinear nature of the reconstruction problem described by (17) suggests an iterative algorithm. We have found that gradient methods, such as steepest ascent or conjugate gradient, although well established, produce negative values for most of the pixels in the low light level part of the image. Also, the well-behaved Expectation Maximization (EM) algorithm, developed by Dempster et al. (1977) and extensively used for maximum likelihood reconstruction after the work of Shepp & Vardi (1982), is difficult to use in the Bayesian case with entropy because it requires the solution of transcendental equations in the M-step. Instead, we have found that the method of Successive Substitutions, described by Hildebrand (1974) and Meinel (1986) and used by the authors in several previous papers, affords us greater flexibility than other methods and results in rapidly converging algorithms.
The Successive Substitutions method can be described as follows: given a system of equations in the unknowns, of the form
(13)
where F is some function of the complete set of variables and K is a normalization constant, (18) can be transformed into a recursive relation:
(14)
Each of the new values a^{(k+1)}_{i} is calculated from all B known values a^{(k)}_{i}, and the complete set is updated at once. The constant K is obtained by invoking the energy conservation law:
(15)
(16)
where
(17)
(18)
(19)
(20)
(21)
The iterative process defined by (26), (23) is the main result of this paper. We call this algorithm FMAPE. The algorithm has a number of desirable characteristics: it solves the cases of both pure Poisson data and Poisson data with Gaussian readout noise; it allows us to adjust the hyperparameters in order to determine the degree of smoothing in different regions or, in connection with known default images, to adjust the proximity between the reconstruction and the default image; it maintains the positivity of the solution; it includes flatfield corrections; it removes background; and it can be accelerated to be faster than the Richardson-Lucy algorithm (Lucy 1974). The main loop (projection and backprojection) of the algorithm is similar in nature to that of the Expectation Maximization algorithm. The algorithm can be applied to a large number of imaging situations, including CCD and pulse-counting cameras, both in the presence and in the absence of background.
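The Successive Substitutions scheme with the energy-conservation normalization can be sketched generically. The fixed-point map F below is a hypothetical toy (a pull toward a flat default image), not the FMAPE update itself:

```python
def successive_substitutions(F, a0, N, iters=100):
    """Generic scheme  a_i <- K * F_i(a): every one of the B values
    is recomputed from the complete previous set at once, and K is
    fixed each pass by the energy conservation law sum_i a_i = N."""
    a = list(a0)
    for _ in range(iters):
        f = F(a)                     # F_i evaluated on the full current set
        K = N / sum(f)               # normalization from energy conservation
        a = [K * fi for fi in f]
    return a

# hypothetical toy map: geometric mean of each pixel and a flat default
N, B = 100.0, 4
default = [N / B] * B
F = lambda a: [(ai * di) ** 0.5 for ai, di in zip(a, default)]
a = successive_substitutions(F, [40.0, 30.0, 20.0, 10.0], N)
print(a)
```

Because the update is multiplicative and K merely rescales, positivity of the iterates is preserved and the total energy N is conserved exactly at every pass.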
The algorithm contains two arbitrary constants, n and C. Both have been introduced in such a way that changing their values only affects the calculated value of K at the end of each iteration. The point of convergence of the algorithm (26), (23) is then solely dependent on the hyperparameters. This can be seen as follows: Eq. (25) represents a set of B simultaneous equations in the unknowns a_{i}. Once the problem is solved by some appropriate method, Eq. (25) will hold independently of the values of C and n. Equation (26) provides a method of solving the set of Eq. (25). If it converges to a solution of (25), that solution will therefore be independent of the values of C and n.
The constant C is introduced to ensure the positivity of the solution. The presence of the negative entropy term
on the right-hand side of (26) means that the positivity of the solution is not automatically guaranteed unless an appropriate constant is introduced. As indicated above, different choices of C will not affect the convergence point. Such a problem is often present in maximum entropy approaches, whether by gradient or EM algorithms.
Since the first term in (26) is always positive, a value of C somewhat larger than
will suffice. In practice, we find that changing C can affect the speed of convergence. Increasing C by a factor of 2 over the approximate smallest value necessary to maintain non-negativity results in slowing down convergence by approximately the same factor. The use of C=1 has usually resulted in adequate convergence speed and non-negativity of the pixels.
The constant n affects the rate of convergence. Since n affects the correction factor in (26), one can expect a range of values of n over which the iterative process is stable. We have observed that for n>1, the convergence rate improves by a factor of approximately n with respect to the rate for n=1. At the beginning of the process, when the correcting factors are rather high, a large n could result in an instability of the algorithm. After several iterations, when the correcting factors are close to unity, large values of n can be used. We have thus obtained stability by using moderate values of n at the beginning of the iterative process and larger values later.
In the absence of readout noise, or when it is negligible, the exponentials in (23) are dominant at k=p_{j}. We note that, by its definition, p^{'}_{j} is always positive, while the original data can be negative. For example, if the background is small, as in the case of the Hubble Space Telescope, the mean background is just above zero, but due to the readout noise we may find a large number of pixels with negative values. Negative values are not allowed in any algorithm with projection-backprojection operations, like the classical Richardson-Lucy algorithm, so a preprocessing of the data or other approximations are common. We do not need any approximation or preprocessing. In fact, p^{'}_{j} is a representation of the data that is positive and close to the data, with a degree of closeness that depends on the projections. In this sense p^{'}_{j} can be considered a Bayesian-filtered version of the data.
The practical computation of p^{'}_{j} by Eq. (23) is not a trivial problem of numerical analysis. The p^{'}_{j}s are numbers in the same range as the data, but to compute them we need to evaluate summations over k from 0 to infinity. In addition, the exponentials and factorials can easily give either machine overflow or underflow, even using quadruple precision. To solve the problem, Snyder et al. (1993b) use Poisson and saddlepoint approximations. We preferred to use a recursive technique, without any approximation, with very good results. Also, it is easy to see that the summations from k=0 to infinity can be reduced to summations over a limited window without loss of precision. By using this method, we can compute the p^{'}_{j} values with high accuracy, without increasing the CPU time appreciably.
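Eq. (23) is not reproduced here, but under the standard Poisson-plus-Gaussian readout model of Snyder et al. (1993b), p^{'}_{j} is the posterior mean of the Poisson count given the datum. The sketch below (function and parameter names are ours, and it uses a log-sum-exp evaluation over a finite window rather than the recursion described above) shows how the overflow problem and the truncation of the infinite sum can be handled:

```python
import math

def posterior_mean_count(d, lam, sigma, width=10.0):
    """E[k | d] for a datum d = k + readout noise, with
    k ~ Poisson(lam) and noise ~ N(0, sigma^2).  Log weights
    avoid overflow of exponentials and factorials; the sum over
    k is truncated to a window carrying essentially all the mass."""
    half = width * (math.sqrt(lam) + sigma)
    k_lo = max(0, int(min(lam, d) - half))
    k_hi = int(max(lam, d) + half) + 1
    ks = range(k_lo, k_hi)
    logw = [k * math.log(lam) - lam - math.lgamma(k + 1)   # log Poisson term
            - 0.5 * ((d - k) / sigma) ** 2                 # log Gaussian term
            for k in ks]
    m = max(logw)                                          # log-sum-exp shift
    wsum = sum(math.exp(lw - m) for lw in logw)
    ksum = sum(k * math.exp(lw - m) for k, lw in zip(ks, logw))
    return ksum / wsum

# even a negative datum, common at low background levels, maps to
# a positive filtered value
print(posterior_mean_count(d=-3.0, lam=1.0, sigma=5.0))
```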
We can obtain a maximum likelihood algorithm from the general FMAPE by taking the limit in which the weight of the prior information (cross-entropy term) in the Bayesian function (17) goes to zero; the solution then becomes a maximum likelihood one. The algorithm for the maximum likelihood (MLE) case in the presence of readout noise and background is:
(22)
where p^{'}_{j} is given by Eq. (23).
In the case of no background and no readout noise, algorithm (28) becomes:
(23)
For n=1 and disregarding the gain (flatfield) corrections (q_{i}=1), (29) is identical to the Richardson-Lucy algorithm.
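That limiting case can be sketched in one dimension; the PSF, test object, and iteration count below are hypothetical, and the scheme shown is the classical Richardson-Lucy update rather than the full FMAPE:

```python
def richardson_lucy(data, psf, iters=200):
    """Minimal 1-D Richardson-Lucy deconvolution:
    a_i <- a_i * sum_j h_{ji} d_j / p_j,  with p_j = sum_i h_{ji} a_i
    the projected (blurred) estimate.  psf[j][i] = h_{ji}; columns
    are assumed normalised, sum_j h_{ji} = 1."""
    B = len(data)
    a = [sum(data) / B] * B                     # flat, positive starting image
    for _ in range(iters):
        p = [sum(psf[j][i] * a[i] for i in range(B)) for j in range(B)]
        c = [sum(psf[j][i] * data[j] / p[j] for j in range(B)) for i in range(B)]
        a = [ai * ci for ai, ci in zip(a, c)]
    return a

# hypothetical example: a point source blurred by a circular 3-pixel PSF
B = 5
psf = [[0.0] * B for _ in range(B)]
for i in range(B):
    for dj, w in ((-1, 0.25), (0, 0.5), (1, 0.25)):
        psf[(i + dj) % B][i] += w
obj = [0.0, 0.0, 10.0, 0.0, 0.0]
data = [sum(psf[j][i] * obj[i] for i in range(B)) for j in range(B)]
restored = richardson_lucy(data, psf)
```

The multiplicative correction keeps every pixel non-negative, and with normalised PSF columns the total flux of the data is conserved at each iteration.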
Copyright The European Southern Observatory (ESO)