Up: Optimal inversion of hard
Let us consider
Eq. (4) with a general form for the cross-section Q(\epsilon, E). In
practical applications the photon spectrum is known only at discrete values of
the photon energy, and the equation we have to deal with is

g(\epsilon_n) = \int_{\epsilon_n}^{\infty} Q(\epsilon_n, E)\, f(E)\, dE, \qquad n = 1, \ldots, N.    (11)
In practice, detectors provide integrals of
the spectrum over finite ranges of photon energy, and a more realistic version
of the discrete equation is represented by Eq. (48) of Piana (1994).
However in that paper it was shown that as far as the inversion is concerned,
the difference between the use of point data and binned data is not
significant, though we need to allow for the effect of binning on photon count
statistics.
We assume that the function f belongs to the Hilbert space of
square-integrable functions, i.e. such that

\int |f(E)|^2\, dE < \infty.

The choice of this source space is made for
mathematical reasons, though it is not physically unrealistic to assume that
the electron distribution functions are square integrable. We introduce the
integral operator A defined by

(A f)_n = \int_{\epsilon_n}^{\infty} Q(\epsilon_n, E)\, f(E)\, dE, \qquad n = 1, \ldots, N.    (12)
Y is a finite-dimensional vector space
equipped with the inner product

(\vec{g}, \vec{h})_Y = \sum_{n,m=1}^{N} w_{nm}\, g_n h_m.    (13)
The most suitable choice for
the weights w_{nm} depends on the sampling adopted
(Bertero et al. 1984). For
example, in the case of uniform sampling,

\epsilon_n = \epsilon_1 + (n-1)\, d, \qquad n = 1, \ldots, N,    (14)
with d constant, the
typical choice is

w_{nm} = d\, \delta_{nm},    (15)
while, in the case of geometric sampling,

\epsilon_n = q^{\,n-1}\, \epsilon_1, \qquad n = 1, \ldots, N,    (16)
with q a constant, it is usually

w_{nm} = \epsilon_n \ln q\; \delta_{nm}.    (17)
Therefore the linear inverse problem with discrete data
(11) we are interested in can be described by the
equation

A f = \vec{g}.    (18)
We note that Riesz's
lemma
(Reed & Simon 1972) allows us to describe the action of the integral
operator by use of the scalar product
in the source space, i.e.

(A f)_n = (f, \varphi_n), \qquad n = 1, \ldots, N,    (19)

with

\varphi_n(E) = Q(\epsilon_n, E) \quad (E \ge \epsilon_n), \qquad \varphi_n(E) = 0 \quad (E < \epsilon_n),    (20)
for n = 1, \ldots, N. We define the adjoint operator
A^{*} of A by

A^{*} \vec{v} = \sum_{n,m=1}^{N} w_{nm}\, v_m\, \varphi_n.    (21)
Then the singular system of the
operator A is the set of triples (\sigma_n; u_n, \vec{v}_n), n = 1, \ldots, N,
which solves the set of problems
(Bertero et al. 1985)

A u_n = \sigma_n \vec{v}_n, \qquad A^{*} \vec{v}_n = \sigma_n u_n, \qquad n = 1, \ldots, N.    (22)

The real non-negative numbers \sigma_n
form a non-increasing sequence and are called
singular values, the functions u_n
are called singular
functions and the vectors \vec{v}_n
are called singular
vectors.
Knowledge of the singular system of the operator A is extremely
useful for the solution of the linear inverse problem with discrete data
(18). In particular:
- the singular system gives information about the degree of instability of the
solution of the inverse problem;
- it provides a basic tool for the computation of regularization algorithms
yielding approximate and stable solutions of the problem;
- from the distribution of the zeros of the singular functions it is possible
to infer the resolution obtainable from the data.
Now we want to describe these three properties of the singular system in
detail.
If Q(\epsilon, E)
is the Kramers or Bethe-Heitler cross-section, the
linear inverse problem of determining the electron distribution function f
from the photon spectrum g when f and g are related by
Eq. (4) is ill-posed in the sense of Hadamard
(Hadamard 1923). A problem is
said to be well-posed in the sense of Hadamard if:
- the solution of the problem exists for each data function;
- the solution is unique;
- the solution depends continuously on the data.
If
at least one of these three requirements is not satisfied, the problem is
ill-posed in the sense of Hadamard. In the case of the inverse problem with
discrete data (18) the solution is clearly not unique - i.e., infinitely
many functions f yield precisely the same data vector \vec{g} - though it can
be shown (Bertero 1989) that the problem of determining the solution among
these which has minimum norm (the so-called generalized solution) is
well-posed. However, though the generalized solution of (18) is unique
for precisely given \vec{g}, it is unstable to noise on \vec{g} - that is, the
problem is "ill-conditioned''. It can be shown that if \delta\vec{g}
and \delta f
are respectively the error on the data and the error on the
(generalized) solution, the inequality

\frac{\|\delta f\|}{\|f\|} \le C(A)\, \frac{\|\delta\vec{g}\|_Y}{\|\vec{g}\|_Y}    (23)
holds. Here \|\cdot\|
and \|\cdot\|_Y
denote respectively
the norm in the source space, i.e.

\|f\| = \left( \int |f(E)|^2\, dE \right)^{1/2},    (24)

and the norm in Y, i.e.

\|\vec{g}\|_Y = (\vec{g}, \vec{g})_Y^{1/2}.    (25)
The number C(A),
which depends on the explicit form of the operator A, is called the condition
number and represents a reliable indicator of the (potential)
instability of the problem: if C(A) is large, then from
Eq. (23) a
small relative error on the data may imply a significant relative error on the
solution. It can be shown that the relation

C(A) = \frac{\sigma_1}{\sigma_N}    (26)

holds, which shows that the singular
value spectrum of the operator carries explicit information about the
instability of the problem.
It is well known that the numerical
instability of the solution of problem (18) can be reduced by applying
regularization. In the case of linear regularization methods a family of linear
operators R_r, with r the so-called
regularization parameter, is introduced and an approximate stable solution of
the problem is sought in the set of regularized solutions
f_r = R_r \vec{g}. The best of these regularized solutions is obtained by choosing an
optimum value of the regularization parameter r, and to this end "ad hoc''
criteria have been formulated
(Davies 1992). Examples of linear regularization
methods (Bertero 1989) are represented by Tikhonov's formulation, where

R_\lambda = (A^{*}A + \lambda I)^{-1} A^{*}    (27)
and \lambda
is real and positive, and by Landweber's iterative
formulation, where

f_j = f_{j-1} + \tau A^{*}(\vec{g} - A f_{j-1}), \qquad f_0 = 0, \qquad R_k \vec{g} = f_k,    (28)

with

0 < \tau < \frac{2}{\|A^{*}A\|}    (29)
and r=k is a positive integer (here \|\cdot\|
denotes the canonical operator norm). The computation of the
singular value decomposition of the operator A is basic for fast application
of both these methods since the action of the operators R_\lambda
and R_k
can be described in terms of the singular system. In fact it can be shown that

f_\lambda(E) = R_\lambda \vec{g} = \sum_{n=1}^{N} \frac{\sigma_n}{\sigma_n^2 + \lambda}\, (\vec{g}, \vec{v}_n)_Y\, u_n(E)    (30)
and

f_k(E) = R_k \vec{g} = \tau \sum_{n=1}^{N} \frac{[1-(1-\tau\sigma_n^2)^k]}{\sigma_n}\, (\vec{g}, \vec{v}_n)_Y\, u_n(E).    (31)
We note that the computation
of the typical criteria for the choice of the regularization parameter is
also simplified if the singular system of A is known
(Bertero 1989).
Knowledge of the behaviour of the singular functions also gives information
about the resolution achievable for given data and a given source space (in the
present case we have assumed the space of square-integrable functions as the
source space). In linear
inverse problem theory the concept of resolution limit is deeply related to a
concept called optimum choice of the experiment, meaning that
measurements are made in such a way as to give the maximum possible source
function resolution consistent with the limits imposed by noise. As already
pointed out, the minimum norm solution, or generalized solution, of problem
(18) exists and is unique; its expression in terms of the singular system
is

f^{\dagger}(E) = \sum_{n=1}^{N} \frac{1}{\sigma_n}\, (\vec{g}, \vec{v}_n)_Y\, u_n(E).    (32)
If we increase the number of
experimental points N, the discretized problem (18) becomes closer and
closer to the continuous problem (4) and, as a consequence of the
ill-posedness of this problem and of the presence of noise on the data, the
generalized solution becomes more and more unstable. In other words, by adding
more points, we do not add more information and, even in the limiting case of
an infinite-dimensional problem, from a given data function, only a finite
amount of information can be derived. Obviously this finite amount of
information can be obtained with some finite number, say N0, of points.
Terminating the sum in (32) at N=N0 is optimal in that it recovers the full
information available while stabilizing the solution against random noise
variations. We could make the solution more stable by reducing N further, but
at the price of reduced resolution in the solution.
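This bias-versus-stability trade-off can be seen in a small numerical experiment: truncating the singular expansion at too small an N0 loses resolution, while too large an N0 lets the small σ_n amplify the noise. The smooth toy kernel, the grid, and the noise level are assumptions of this sketch.

```python
import numpy as np

# Truncated singular-value expansion: keep only the first N0 components.
# Toy smoothing kernel and noise level, illustrative only.
rng = np.random.default_rng(1)
n = 40
x = np.linspace(0.0, 1.0, n)
A = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.02) / n   # smoothing kernel
f_true = np.sin(6 * np.pi * x)                             # oscillatory source
g = A @ f_true + 1e-6 * rng.standard_normal(n)             # noisy data

U, s, Vt = np.linalg.svd(A)

def truncated(N0):
    """Generalized solution truncated to the first N0 singular components."""
    return Vt[:N0].T @ ((U.T @ g)[:N0] / s[:N0])

errs = {N0: np.linalg.norm(truncated(N0) - f_true) for N0 in (3, 8, 35)}
print(errs)   # an intermediate N0 gives the smallest error
```

With 3 components the oscillatory source cannot be represented at all; with 35, components with tiny σ_n turn the noise into an enormous error; an intermediate truncation recovers the source.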
If the N0-th singular function has zeros, then from
Eq. (32) the
generalized solution cannot contain details in intervals smaller than the
distance between adjacent zeros, and this spacing may be
adopted as a measure of the resolution corresponding to N0. In general the
optimum number N0 of points is not known and has to be chosen according to
how we decide to trade off resolution against instability. Let us then consider
the case of a noisy vector \vec{g} of N data points, with N>N0, and let us
apply, for example, Tikhonov's algorithm (30), in which high order terms
in (32) are increasingly suppressed rather than eliminated. In the case
of expansion (30) the number of singular functions which significantly
contribute to the restoration of the source function is in general less than
N. We denote this number again by N0. A possible criterion for choosing
N0 is that the relative error due to the truncation of the expansion
(30) must be just smaller than the measured relative error on
the photon
spectrum. Then the N0-th singular function is the last one giving a
significant contribution to the regularized solution defined in (30).
If p is the ratio between adjacent zeros of the N0-th singular
function so chosen, p determines the corresponding resolution limit (with
"higher resolution'' meaning a smaller spacing between zeros).
The computation of
the singular system of the operator A is based on the observation (cf.
Eq. (22)) that the singular values and the singular vectors are
given respectively by the square roots of the eigenvalues and by the
eigenvectors of the matrix AA^{*}. Furthermore it can be proven that
(Bertero et al. 1985)

A A^{*} = G\, W,    (33)

where
W is the weight matrix whose mn entry is defined by

(W)_{mn} = w_{mn}    (34)

and G is
the Gram matrix whose mn entry is defined by

(G)_{mn} = (\varphi_m, \varphi_n) = \int \varphi_m(E)\, \varphi_n(E)\, dE,    (35)

with the functions \varphi_n,
n = 1, \ldots, N,
defined as in (20).
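Since W is diagonal, the product GW is similar to the symmetric matrix W^{1/2} G W^{1/2}, so the singular values can be obtained from a standard symmetric eigenvalue solver. In the sketch below the energy range (20-1000 keV), N = 25, and the closed-form Kramers-like Gram matrix G_mn = 1/(ε_m ε_n max(ε_m, ε_n)) are all assumptions made for illustration.

```python
import numpy as np

# Singular values of A from the symmetrized eigenproblem for A A*.
# Energy range, N, weights and the Gram-matrix closed form are assumptions.
N = 25
eps = np.geomspace(20.0, 1000.0, N)
q = eps[1] / eps[0]
w = eps * np.log(q)                                 # diagonal weights w_n

# Hypothetical Kramers-like Gram matrix (assumed closed form)
G = 1.0 / (np.multiply.outer(eps, eps) * np.maximum.outer(eps, eps))

B = np.sqrt(w)[:, None] * G * np.sqrt(w)[None, :]   # W^{1/2} G W^{1/2}
lam = np.linalg.eigvalsh(B)                         # ascending eigenvalues
sigma = np.sqrt(lam[::-1])                          # non-increasing sigma_n
print(f"C(A) = sigma_1/sigma_N = {sigma[0]/sigma[-1]:.2e}")
```

Working with the symmetrized matrix keeps the eigenvalues real and non-negative, which is numerically safer than diagonalizing the non-symmetric product GW directly.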
If we adopt the Kramers
cross-section the explicit form of the functions \varphi_n
is given by

\varphi_n(E) = \frac{1}{\epsilon_n E} \quad (E \ge \epsilon_n), \qquad \varphi_n(E) = 0 \quad (E < \epsilon_n),    (36)
and the mn entry of the Gram matrix is, for m \ge n,

[Eq. (37): closed-form Gram-matrix entry for m \ge n]

and, for m < n,

[Eq. (38): closed-form Gram-matrix entry for m < n]
If we adopt the Bethe-Heitler approximation, the explicit form
of the functions \varphi_n
is given by
(Piana 1994)

\varphi_n(E) = \frac{1}{\epsilon_n E}\, \ln\!\left( \frac{1 + \sqrt{1 - \epsilon_n/E}}{1 - \sqrt{1 - \epsilon_n/E}} \right) \quad (E \ge \epsilon_n), \qquad \varphi_n(E) = 0 \quad (E < \epsilon_n),    (39)
and the mn entry of
the Gram matrix is, for m>n:

[Eq. (40): Gram-matrix entry for m > n]

for m=n:

[Eq. (41): Gram-matrix entry for m = n]

and for m<n:

[Eq. (42): Gram-matrix entry for m < n]
In Table 1 we present the values of the condition number for
N=10, 25, 50, 100 geometrically sampled points over a fixed photon energy
range (in keV), in the case of the Kramers and Bethe-Heitler
cross-sections. From these results the following facts can be deduced:
- for both cross-sections the condition number increases
with the number of sampling points. This behaviour can be explained by
observing that in the limit N \to \infty the finite-rank operator
defined in (19) tends to the integral operator defined by the right-hand
side of Eq. (4). This operator is compact and its singular spectrum
tends to zero. More roughly speaking, if the number of sampling points
increases, the discrete ill-conditioned problem becomes more and more similar
to the continuous problem, which is ill-posed and whose ill-conditioning is
formally infinite;
- the condition number is systematically larger if the
Bethe-Heitler cross-section is adopted instead of the Kramers one, because the
former is a smoother kernel. This means that in the case of the Bethe-Heitler
cross-section the inverse problem of determining the electron distribution
function from the photon spectrum is more unstable in the presence of data
noise. The factor of 10 increase in C(A) for N=25 data points, for
example, implies that the data accuracy needed for the recovery of the electron
spectrum with prescribed accuracy is 10 times higher when the Bethe-Heitler
formula is used.
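The first trend above, growth of the condition number with N, can be reproduced with a toy discretization. The Kramers-like kernel, its normalization, and the 20-1000 keV photon energy range in this sketch are assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative check of the Table 1 trend: the condition number of a
# discretized smooth kernel grows with the number N of geometric samples.
def cond_number(N, eps_min=20.0, eps_max=1000.0):
    eps = np.geomspace(eps_min, eps_max, N)          # photon energies
    E = np.geomspace(eps_min, 5.0 * eps_max, 600)    # electron energies
    K = np.where(E[None, :] >= eps[:, None],
                 1.0 / (eps[:, None] * E[None, :]), 0.0)
    K = K * np.sqrt(np.gradient(E))[None, :]         # quadrature measure
    s = np.linalg.svd(K, compute_uv=False)           # non-increasing
    return s[0] / s[-1]                              # C(A) = sigma_1/sigma_N

conds = [cond_number(N) for N in (10, 25, 50, 100)]
print([f"{c:.1e}" for c in conds])   # grows with N
```

As N grows the rows of the discretized kernel become increasingly collinear, so the smallest singular value shrinks and C(A) rises, mirroring the discussion above.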
Table 1:
Condition numbers for the thin-target
model in the case of the Kramers (K) and the Bethe-Heitler
approximations
for different numbers N of sampling points. The photon energy range (in keV)
is fixed and the sampling is geometric.
Figure 1:
Singular functions for the thin-target model in the case of the
Kramers (solid) and the Bethe-Heitler (dashed) cross-sections: a) first
singular function; b) second singular function; c) third singular
function; d) fourth singular function.
In Fig. 1 the first four singular functions in the cases of
the Kramers and Bethe-Heitler cross-sections are plotted for N=25
geometrically sampled points over the same photon energy range. These
singular functions are computed by using the
formula

u_n(E) = \frac{1}{\sigma_n} \sum_{m,l=1}^{N} w_{ml}\, (\vec{v}_n)_l\, \varphi_m(E).    (43)
We note that the singular functions
in the Kramers case are discontinuous at the sampled photon energies
E = \epsilon_m (they are
linear combinations of the functions \varphi_m
in (36), which are
discontinuous at these energies). Furthermore, for both approximations, the
singular function of order n is characterized by n zeros, approximately
distributed according to a geometric progression of ratio p. For a given data
vector \vec{g} (and for a given source space) the value of p in the last
significant singular function (i.e., the last one giving a significant
contribution to Eq. (30)) describes the achievable resolution limit. Therefore
the value of this limit explicitly depends on the level of noise affecting the
value of this limit explicitly depends on the level of noise affecting the
data. However it is possible to roughly foresee the difference in the
resolution limit achievable for the Kramers or the Bethe-Heitler
approximations. For example, let us take N=25 points geometrically sampled
between
keV and
keV. In the Kramers approximation, the
condition number for 25 points is
and the value of p for
the 25-th singular function is
. It follows that the zeros of
the 25-th singular function occur at
|  |
(44) |
so that
|  |
(45) |
is the resolution limit
(
). In the Bethe-Heitler approximation the condition number for 25
points is larger (
). However, if we reduce the number of
points used in the Bethe-Heitler case, we find that the ratio
is approximately equal to the condition number in the
Kramers cross-section case for all 25 points (i.e.,
in the Kramers cross-section case). For the eleventh Bethe-Heitler singular
function the value of p is
and in this case
. From this we conclude that the resolution in the Kramers
cross-section approximation is double that for the Bethe-Heitler, for the same
solution accuracy.
Copyright The European Southern Observatory (ESO)