3 The Backus-Gilbert method

The Backus-Gilbert method is one of a family of methods for attacking inverse problems. There are several general introductions to the method (Parker 1977, gives an excellent review, and Loredo & Epstein 1989, gives an example in an astrophysical context) - we give a minimal introduction here, partly in order to fix our notation. There is a more formal discussion of the method in an appendix, and a thorough treatment can be found in Backus & Gilbert (1970).

We wish to recover an underlying function u(r), in this case, either an intensity, or the run of polarization with radius; the parameter r here is the projected radial distance from the centre of the eclipsed star's disk, in units where the star's radius is r=1. We cannot measure this underlying function directly, but instead measure an integral of it, $F(\vec s)$ , which might be the observed luminosity or polarisation of the eclipsed star. This is related to the underlying function through

$\begin{displaymath} F({\vec s}) = \int_0^1 u(r) K(r;{\vec s}) \,\relax {\rm d}\relax r, \relax \end{displaymath}$ (4)

where $\vec s$ is the vector between the centres of the star and its occultor, and the kernel $K(r;\vec s)$ can be calculated a priori. The set of observations that we make, $f_i\equiv F({\vec s}_i)$ ,is therefore related to this underlying function by

$\begin{displaymath} f_i = \int_0^1 u(r) K_i(r)\,\relax {\rm d}\relax r + n_i, \relax \end{displaymath}$ (5)

where $K_i(r)\equiv K(r;{\vec s}_i)$ and n_i is a random admixture of noise. From these measurements we wish to produce an estimator $\hat u(r)$ of the underlying function u(r).

For the Backus-Gilbert method, we suppose that the mean of the estimator and the underlying function are related by an averaging kernel $\Delta(r,r')$ , through

$\begin{displaymath} \relax E\left(\hat u(r)\right) = \int_0^1 \Delta(r,r') u(r') \,\relax {\rm d}\relax r'\,. \relax \end{displaymath}$ (6)

Since we do not know the underlying function, the averaging kernel is of no use to us directly; however we can study its properties, and use our data f_i in such a way as to optimise those properties, and so minimise the dependence of the estimate $\hat u(r)$ on the underlying function and the noise. It is clear from Eq. (6) that $\hat u(r) = u(r)$ if $\Delta(r,r')$ is the Dirac delta function. However, such a solution is highly sensitive to noise in the data and hence very unstable. We attain stability by increasing the width of $\Delta(r,r')$ , and hence smoothing the recovered value over a wider region of the source - that is, we minimise the width of $\Delta(r,r')$ subject to the conflicting demand that the resolution of $\Delta(r,r')$ is sufficient for the problem at hand, in a sense we shall make more precise below.

We relate our data f_i to our estimator $\hat u(r)$ through a set of response kernels q_i(r), which produce an estimate of the underlying function through

$\begin{displaymath} \hat u(r) = \sum_i q_i(r) f_i. \relax \end{displaymath}$ (7)

If we substitute Eq. (5) into Eq. (7), and assume $\relax E\left(\sum_i q_i(r) n_i\right)=0$ , then we can compare with Eq. (6), and obtain

$\begin{displaymath} \Delta(r,r') = \sum_i q_i(r) K_i(r'). \relax \end{displaymath}$ (8)

This allows us to form some measure of the width of $\Delta(r,r')$ such as

$\begin{eqnarray} \relax \relax \mathcal{A}&\equiv &\int (r-r')^2[\Delta(r,r')... ...\\ &=& {\vec q}(r)^{T} {\vec W}(r) {\vec q}(r), \nonumber \relax \end{eqnarray}$ (9)

which depends on q_i, and depends on K_i through the definition

$\begin{displaymath} W_{ij}(r) \equiv \int_0^1 (r'-r)^2 K_i(r') K_j(r')\,\relax {\rm d}\relax r'. \relax \end{displaymath}$ (10)

This is the standard definition of the width; others are reasonable, and may be preferable in different circumstances.

Again assuming that $\relax E\left(\vec q\cdot\vec n\right)=0$ , we can also form a measure of the stability of Eq. (7)

$\begin{displaymath} \relax \mathcal{B}\equiv\relax \mathop{\rm{Var}}\relax \hat u(r) = {\vec q}(r)^{T} {\vec S} {\vec q}(r), \relax \end{displaymath}$ (11)

which depends on q_i and the noise covariance matrix $S_{ij}\equiv\relax E\left(n_i n_j\right)$ . In our simulations below, we take the n_i to be independent and Gaussian with standard deviation $\sigma$ ; in this case, $S_{ij}=\delta_{ij}\sigma^2$ , and our assumption $\relax E\left(\vec q\cdot\vec n\right)=0$ is true.

Finally, the demand that $\Delta(r,r')$ in Eq. (6) have unit area, leads to the constraint

$\begin{displaymath} {\vec q}(r) \cdot{\vec R} = 1, \relax \end{displaymath}$ (12)

where $R_i\equiv\int K_i(r)\,\relax {\rm d}\relax r$ .

The Backus-Gilbert method consists of finding those q_i(r) which minimise

$\begin{eqnarray} \relax \relax \mathcal{A}+\lambda\relax \mathcal{B }&=& {\ve... ...lambda\relax \mathop{\rm{Var}}\relax \hat u(r), \nonumber \relax \end{eqnarray}$

for some selected parameter $\lambda$ , subject to the constraint ${\vec q}\cdot{\vec R}=1$ . The minimisation problem has explicit analytic solutions

$\begin{displaymath} {\vec q}_\lambda(r) = \frac{[{\vec W}(r) + \lambda {\vec ... ...cdot[{\vec W}(r) + \lambda {\vec S}]^{-1}\cdot{\vec R}} \relax \end{displaymath}$ (13)

in terms of the parameter $\lambda$ , and these different solutions, when combined with the data f_i using Eq. (7), give different estimators $\hat u_\lambda(r)$ . The nature of the trade-off in the minimisation is clear: in order to improve the stability of the recovery, we choose a $\lambda$ which makes $\Delta(r,r')$ broader, and so generate response kernels q_i which extend the weighted average over a greater number of the data points f_i. The cost of this is that the estimate of the recovered point will be biased by the inclusion of the extra data, and this will be more marked when the underlying function is rapidly varying.

The first important point about the Backus-Gilbert method is that the parameter $\lambda$ allows us to adjudicate between the conflicting demands of minimising the width of the kernel $\Delta(r,r')$ and minimising the sensitivity of the recovered value (which is a realisation of the statistical variable $\hat u_\lambda(r)$ ) to the measurement noise, and that this adjudication can be done prior to any data being collected , based only on the characteristics of the kernel $K(r;\vec s)$ and the noise.

Secondly, we must emphasise that the ${\vec q}_\lambda(r)$ we obtain gives us, through Eq. (7), a single point in the recovered function, $\hat u_\lambda(r)$ . This means that in this simplest version of the Backus-Gilbert method we must perform the inversion for each value of r for which we wish to find $\hat u_\lambda(r)$ .Since the calculation of the coefficients ${\vec q}_\lambda(r)$ involves a matrix inversion, which is an n³ procedure, it can be computationally expensive, but this limitation is acceptable in our particular case, as we only wish to recover the polarisations at the limb, r=1. This feature has the compensation that we can if necessary select a different optimal value of $\lambda$ for each recovered point.

Up: Inversion of polarimetric data

	$\begin{eqnarray} \relax \relax \mathcal{A}&\equiv &\int (r-r')^2[\Delta(r,r')... ...\\ &=& {\vec q}(r)^{T} {\vec W}(r) {\vec q}(r), \nonumber \relax \end{eqnarray}$	(9)