
Algorithm: SOLA using a Lagrange multiplier and normal equations.

  1. Transform to standard form
    [equations]
  2. Perform SVD reduction (optional)
    [equations]
  3. Set up the augmented normal equations and compute a factorization of the coefficient matrix
    [equations]
  4. Compute the right-hand side
    [equations]
  5. Solve for the coefficients
    [equations]
  6. Compute the estimates of f and the corresponding variances at the target points
    [equations]
  7. If needed, compute the corresponding averaging kernels
    [equations]
 

In practice one may wish to repeat the algorithm for different shapes of the target function and/or varying values of the regularization parameter. Changing the target function requires re-computing steps 4, 5, 6 and 7; changing the regularization parameter requires re-computing steps 3, 5, 6 and 7. In Sect. 4 we will give a detailed analysis of the number of operations used by the algorithm.
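To make steps 3-7 concrete, here is a minimal NumPy sketch of the Lagrange-multiplier approach, written in generic notation that is ours rather than the paper's: K holds the (standard-form) discretized kernels column by column, t is the sampled target function, w contains the constraint integrals, and mu is the regularization parameter.

```python
import numpy as np

def sola_lagrange(K, t, w, mu):
    """SOLA coefficients via the bordered normal equations (steps 3-5 above).

    K  : (n_r, n) discretized, standard-form kernels (one column per observation)
    t  : (n_r,) sampled target function
    w  : (n,) constraint vector (unimodularity of the averaging kernel)
    mu : regularization parameter
    All names are illustrative; the paper's explicit formulas are not reproduced here.
    """
    n = K.shape[1]
    # Step 3: augmented normal-equation matrix; must be re-factorized when mu changes.
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = K.T @ K + mu * np.eye(n)
    M[:n, n] = w
    M[n, :n] = w
    # Step 4: right-hand side; changes when the target function changes.
    rhs = np.concatenate([K.T @ t, [1.0]])
    # Step 5: coefficients plus the Lagrange multiplier (last entry, discarded).
    c = np.linalg.solve(M, rhs)[:n]
    return c

# Step 6/7 (sketch): estimate = c @ data, averaging kernel = K @ c,
# and with unit data covariance the variance of the estimate is c @ c.
```

Because the regularization parameter sits inside the factorized matrix, every new value forces a new factorization; avoiding exactly that cost is the point of the elimination approach described next.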

2.2. SOLA by elimination of the constraint

 

We can identify Eq. (9) as an instance of the well-known equality constrained least squares problem (LSE); see Lawson & Hanson (1974), Chap. 20. To see this, let us introduce the following quantities:
    [equations]
Using this notation we can write (9) in the following form:
    [equation]
These are in fact the normal equations for a constrained least squares problem, for which we have the equivalent expression:
    [equation]
We can reformulate this as
    [equation]
We now have an equality constrained least squares problem in unit covariance form. Several methods have been developed for solving problem LSE. Here we use the one described in Lawson & Hanson (1974), Chap. 20 and Golub & Van Loan (1989), Sect. 12.1.4. The method has three stages:

  1. Derive a lower-dimensional unconstrained least squares problem from the given problem.
  2. Compute the solution to the lower-dimensional problem.
  3. Transform this solution back to the original setting.
Below we describe in detail how each of these three stages should be implemented to give an efficient SOLA algorithm.
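As a point of reference for these three stages, the following sketch solves a generic problem LSE, min ||C x - d|| subject to w.x = kappa, with off-the-shelf dense tools; the names and the use of scipy.linalg.null_space are our own choices, not the paper's.

```python
import numpy as np
from scipy.linalg import null_space

def lse_nullspace(C, d, w, kappa=1.0):
    """Solve min ||C x - d||_2 subject to w @ x = kappa (one linear constraint).

    Generic dense null-space method, mirroring the three stages above:
    (1) eliminate the constraint, (2) solve the reduced unconstrained
    least squares problem, (3) transform back.  Illustrative names only.
    """
    # Stage 1: particular solution and an orthonormal basis of {x : w @ x = 0}
    x0 = (kappa / np.dot(w, w)) * w
    N = null_space(w[np.newaxis, :])          # shape (m, m-1)
    # Stage 2: reduced, unconstrained least squares problem in z
    z, *_ = np.linalg.lstsq(C @ N, d - C @ x0, rcond=None)
    # Stage 3: back to the original variable
    return x0 + N @ z
```

In the SOLA setting C and d are the stacked unit-covariance quantities and w is the constraint vector; the point of Sects. 2.2.1-2.2.3 is to replace the generic null-space construction and the lstsq call by a single Householder reflection and a bidiagonalization that can be reused for many regularization parameters and target functions.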

2.2.1. Derivation of a lower-dimensional problem

We can determine a parameterization of the set of vectors in R^m that satisfy the constraint in Eq. (15):
    [equations]
To do this we compute an m x m Householder transformation (see Golub & Van Loan 1989, Sect. 5.2) such that
    [equations]
In practice the Householder matrix is not computed explicitly, but is kept in its standard factored form. Its columns form an orthonormal basis of R^m, and conceptually it can be partitioned into its first column and a block containing the remaining m-1 columns:
    [equation]
It follows that every vector satisfying the constraint can be represented as
    [equation]
where the new unknown is an arbitrary (m-1)-vector (since all the columns of the second block are orthogonal to the constraint vector). We can exploit this fact to transform problem LSE into the following form, where we have inserted Eq. (16) into Eq. (15) and collected terms:
    [equations]
We now multiply from the left under the norm sign with the orthogonal matrix
    [equations]
without changing the solution. Using its orthogonality we finally arrive at the reduced problem
    [equation]
This is an ordinary unconstrained least squares problem. Having solved it for the reduced unknown, we obtain the solution to the original constrained problem as
    [equation]
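A compact NumPy sketch of this elimination is given below. The reflector is kept in the factored form I - beta*u*u^T and only products with it are ever computed; Abar, bbar, w and kappa are our illustrative names for the unit-covariance matrix, its right-hand side, the constraint vector and the constraint value.

```python
import numpy as np

def householder_of(w):
    """Reflector data (u, beta, alpha): H = I - beta*outer(u, u) maps w to alpha*e1."""
    u = np.asarray(w, dtype=float).copy()
    alpha = -np.sign(w[0]) * np.linalg.norm(w) if w[0] != 0 else -np.linalg.norm(w)
    u[0] -= alpha
    beta = 2.0 / np.dot(u, u)
    return u, beta, alpha

def eliminate_constraint(Abar, bbar, w, kappa=1.0):
    """Stage 1 for  min ||Abar c - bbar||  subject to  w @ c = kappa.

    Every feasible c equals c0 + Ptilde @ z, with Ptilde the last m-1 columns
    of the reflector H.  H is never formed; only products with it are computed.
    Illustrative names, not the paper's notation.
    """
    u, beta, alpha = householder_of(w)
    m = w.size
    e1 = np.zeros(m)
    e1[0] = 1.0
    # c0 = (kappa/alpha) * (first column of H), so that w @ c0 = kappa.
    c0 = (kappa / alpha) * (e1 - beta * u * u[0])
    # Reduced matrix Abar @ Ptilde = (Abar @ H)[:, 1:], and the reduced right-hand side.
    AH = Abar - beta * np.outer(Abar @ u, u)
    return c0, AH[:, 1:], bbar - Abar @ c0, (u, beta)

def back_transform(c0, reflector, z):
    """Stage 3: map a reduced solution z back to c = c0 + Ptilde @ z."""
    u, beta = reflector
    y = np.concatenate([[0.0], z])          # Ptilde @ z = H @ [0, z]
    return c0 + y - beta * u * (u @ y)
```

The returned problem min ||(Abar Ptilde) z - (bbar - Abar c0)|| corresponds to the reduced unconstrained problem derived above; the next two subsections are concerned with solving it efficiently for many values of the regularization parameter.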

2.2.2. Solution via full bidiagonalization

 

We now describe an algorithm for solving (17) which is efficient when many values of the regularization parameter are required. Following Eldén (1977) we first bring the reduced coefficient matrix into bidiagonal form:
    [equation]
where
    [equations]
Here the two outer factors are orthogonal and the middle factor is lower bidiagonal. Computing this decomposition is the most costly part of the algorithm; the remaining work involves only the bidiagonal matrix, which means that the dimension of the problem is effectively reduced. As in the SVD transformation, the left orthogonal factor is not computed explicitly; instead its transpose times the right-hand side is accumulated ``on the fly''. If the left factor is required for computing the discretized averaging kernel, then the orthogonal transformations of which it consists are stored for later use. For algorithmic details on bidiagonalization see, e.g., Golub & Van Loan (1989), pp. 236-238.
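For illustration, here is a dense Householder-reflector sketch that produces the lower bidiagonal form directly (alternating a right reflector that clears a row with a left reflector that clears a column). It is only a sketch of the decomposition, not the paper's implementation, and it accumulates the orthogonal factors explicitly, which the text explains one would avoid in practice.

```python
import numpy as np

def _reflector(x):
    """v, beta with (I - beta*outer(v, v)) @ x = alpha*e1."""
    v = np.asarray(x, dtype=float).copy()
    alpha = -np.sign(x[0]) * np.linalg.norm(x) if x[0] != 0 else -np.linalg.norm(x)
    v[0] -= alpha
    beta = 0.0 if not v.any() else 2.0 / np.dot(v, v)
    return v, beta

def lower_bidiagonalize(A):
    """Dense sketch: orthogonal U, Vt with U @ B @ Vt == A and B lower bidiagonal."""
    M, k = A.shape                      # assumes M >= k
    B, U, Vt = A.astype(float).copy(), np.eye(M), np.eye(k)
    for j in range(k):
        if j < k - 1:                   # right reflector: clear row j right of the diagonal
            v, beta = _reflector(B[j, j:])
            B[:, j:] -= beta * np.outer(B[:, j:] @ v, v)
            Vt[j:, :] -= beta * np.outer(v, v @ Vt[j:, :])
        if j < M - 2:                   # left reflector: clear column j below the subdiagonal
            v, beta = _reflector(B[j + 1:, j])
            B[j + 1:, :] -= beta * np.outer(v, v @ B[j + 1:, :])
            U[:, j + 1:] -= beta * np.outer(U[:, j + 1:] @ v, v)
    return U, B, Vt

# Sanity check on random data:
# A = np.random.randn(8, 5); U, B, Vt = lower_bidiagonalize(A)
# assert np.allclose(U @ B @ Vt, A)
```

Once the bidiagonal factor is available, every further value of the regularization parameter touches only the small bidiagonal system, which is the source of the savings described below.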

If we make the transformation of variables
    [equations]
then we easily see that (17) is equivalent to
    [equation]
This least squares problem is now solved by determining a series of 2n-1 Givens rotations (see Golub & Van Loan 1989, Sect. 5.2) such that
    [equations]
where the transformed matrix is again lower bidiagonal. The part of the right-hand side marked in the display consists of n non-zero elements introduced along the way, but these are not computed in practice. We illustrate this procedure on a small example with n=3. First a rotation applied to rows n and 2n annihilates the (2n,n)-element, and a non-zero element is created in position (2n,n-1); then we annihilate this element by a rotation applied to rows 2n-1 and 2n:
    [equations illustrating the two rotations]
In the next step the same procedure is applied to the leading sub-matrix to annihilate the element in position (2n-1,n-1), and so on. During the reduction we only need to store the two diagonals of the transformed bidiagonal matrix in a pair of vectors, together with a single scalar that is updated as the reduction proceeds. Now the solution can be computed as
    [equation]
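The rotation scheme above needs only the two stored diagonals and a number of operations proportional to n per value of the regularization parameter. As a plain correctness reference (a dense stand-in, not the Givens scheme itself), the same augmented bidiagonal problem can be solved by forming the stacked matrix explicitly; B, g and mu stand for the bidiagonal factor, the transformed right-hand side and the regularization parameter of this subsection.

```python
import numpy as np

def solve_augmented_bidiagonal(B, g, mu):
    """Dense reference solve of  min || [B; sqrt(mu)*I] y - [g; 0] ||_2.

    B  : (p, n) lower bidiagonal matrix (p >= n)
    g  : (p,) transformed right-hand side
    mu : regularization parameter
    The efficient scheme never forms these matrices; this stand-in does,
    purely so that the reduction above can be checked on small examples.
    """
    n = B.shape[1]
    M = np.vstack([B, np.sqrt(mu) * np.eye(n)])
    rhs = np.concatenate([g, np.zeros(n)])
    y, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return y
```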

2.2.3. Solution via Lanczos bidiagonalization

 

For a large sparse or structured (e.g., Hankel or Toeplitz) coefficient matrix it is prohibitive to compute the full bidiagonalization in Eq. (19) explicitly. Instead we would prefer to keep the matrix in factored form and use an iterative algorithm to solve the system in (17). An efficient and stable algorithm for doing this, based on Lanczos bidiagonalization (Golub & Van Loan 1989, Sect. 9.3.3), is the algorithm LSQR of Paige & Saunders (1982a). This algorithm accesses the coefficient matrix only through matrix-vector multiplications with the matrix and its transpose.
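With SciPy, for example, this matrix-free mode of operation can be sketched as follows: a LinearOperator supplies only the two matrix-vector products, and LSQR's damp argument accounts for the square root of the regularization parameter. The wiring of matvec/rmatvec to the actual kernel and constraint data is left out, and the names are illustrative.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

def solve_reduced_matrix_free(matvec, rmatvec, shape, rhs, mu):
    """Solve min ||A z - rhs||^2 + mu*||z||^2 where A is available only through
    products A @ x (matvec) and A.T @ y (rmatvec), e.g. for Toeplitz/Hankel data."""
    A = LinearOperator(shape, matvec=matvec, rmatvec=rmatvec)
    z = lsqr(A, rhs, damp=np.sqrt(mu))[0]
    return z
```

For structured matrices the two products can be implemented with fast (e.g. FFT-based) routines, which is where the savings over a dense factorization come from.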

An even more efficient approach is to apply the LSQR algorithm with the regularization parameter set to zero and use the fact that the kth iterate is itself a regularized solution. To be more specific, when we apply LSQR to Eq. (17) with zero regularization parameter, i.e., to
    [equation]
then the iteration number k plays the role of the regularization parameter; small k corresponds to a large regularization parameter, and vice versa. The reason is that LSQR captures the spectral components of the solution in order of increasing frequency, hence the initial iterates are smoother than the iterates of later stages, and as a consequence the initial iterates are more regularized.
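With the SciPy routine, the simplest way to exploit this is to keep damp at zero and use the iteration limit as the regularization knob; restarting for each k, as below, is wasteful but shows the idea (a dedicated LSQR loop would record the iterates as it goes, and the default stopping tolerances may end the iteration earlier on easy problems).

```python
from scipy.sparse.linalg import lsqr

def lsqr_iterate_family(A, rhs, k_values):
    """Return {k: k-th LSQR iterate}; small k means heavy regularization."""
    return {k: lsqr(A, rhs, damp=0.0, iter_lim=k)[0] for k in k_values}

# Example: family = lsqr_iterate_family(A, rhs, range(1, 21))
```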

The LSQR algorithm with the regularization parameter set to zero is equivalent to Lanczos bidiagonalization of the coefficient matrix. Specifically, after k steps the Lanczos bidiagonalization algorithm has produced three matrices such that
    [equation]
where two of the matrices have orthonormal columns and the third is bidiagonal. Moreover, all but the last column of each of the two orthonormal matrices, together with the leading sub-matrix of the bidiagonal matrix, are identical to the corresponding quantities from the previous step of the algorithm. The kth iterate is then computed as
    [equation]
where the projected coefficient vector solves the problem
    [equation]
The key to the efficiency of LSQR lies in the way that each iterate is updated from the previous one, while the two matrices with orthonormal columns are never stored; we refer to Paige & Saunders (1982a) for more details.
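The recurrence behind this, the Golub-Kahan (Lanczos) bidiagonalization that LSQR builds on, is short enough to sketch. This plain version stores the orthonormal columns explicitly only so that the relation A V_k = U_{k+1} B_k can be checked, whereas LSQR itself discards them; no reorthogonalization or breakdown handling is included.

```python
import numpy as np

def golub_kahan_bidiag(A, b, k):
    """k steps of Lanczos (Golub-Kahan) bidiagonalization started from b.

    Returns U (m, k+1), V (n, k) with orthonormal columns and the
    (k+1, k) lower bidiagonal B with A @ V = U @ B in exact arithmetic.
    Plain sketch, illustrative only.
    """
    m, n = A.shape
    U = np.zeros((m, k + 1))
    V = np.zeros((n, k))
    alphas, betas = np.zeros(k), np.zeros(k + 1)
    betas[0] = np.linalg.norm(b)
    U[:, 0] = b / betas[0]
    v = np.zeros(n)
    for i in range(k):
        r = A.T @ U[:, i] - betas[i] * v
        alphas[i] = np.linalg.norm(r)
        v = r / alphas[i]
        V[:, i] = v
        p = A @ v - alphas[i] * U[:, i]
        betas[i + 1] = np.linalg.norm(p)
        U[:, i + 1] = p / betas[i + 1]
    B = np.zeros((k + 1, k))
    B[np.arange(k), np.arange(k)] = alphas
    B[np.arange(1, k + 1), np.arange(k)] = betas[1:]
    return U, B, V

# Check: A @ V is close to U @ B, and the k-th LSQR iterate equals V @ w with
# w = argmin || B w - betas[0] * e1 ||.
```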

We emphasize the difference between the two bidiagonalization procedures. The full bidiagonalization (19) is computed explicitly once, and it is independent of the regularization parameter, which enters only when solving the augmented bidiagonal system in Eq. (22). The Lanczos bidiagonalization appears implicitly in the computation of the LSQR iterates, and the sequence of iterates corresponds to sweeping through a range of regularization parameters.

2.2.4. Transformation back to the original setting

To compute the solution to the original problem in (15) it is necessary to transform the reduced solution back to the original variable. Below, the reduced solution and the associated orthogonal factor denote either the quantities from Sect. 2.2.2 or those from Sect. 2.2.3. Inserting Eqs. (18) and (21) in Eq. (2) we get
    [equations]
Similarly, if we insert Eqs. (18), (19) and (21) in Eq. (10) and use the definitions introduced above, then we get
    [equations]
Finally, due to the orthogonality of the transformation matrices, the variance of the error in the estimate due to the noise takes the form
    [equations]

Consider first the full bidiagonalization from Sect. 2.2.2. We notice that the reduced solution vector is the only quantity in the above expressions that depends on the regularization parameter and the target function. Therefore the remaining factors only need to be computed once (in fact, they are already computed in earlier stages of the algorithm). This means that re-computing the solution when changing the regularization parameter or the target function is very cheap. This is a great improvement over the Lagrange-multiplier method from Sect. 2.1, where a matrix factorization must be re-computed each time.
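Putting the earlier sketches together, a sweep over target functions and regularization parameters then looks as follows. It assumes the formulation in which the regularization term is kept separate from the kernel matrix G, so that the constraint elimination and the bidiagonalization are computed once and reused; unlike the paper's algorithm, the sketch stores the left orthogonal factor explicitly for clarity. The helper names are those of the earlier code sketches, not routines from the paper.

```python
import numpy as np

def sola_sweep(G, t_list, w, mus, kappa=1.0):
    """Solutions of  min ||G c - t||^2 + mu*||c||^2  s.t.  w @ c = kappa
    for several targets t and regularization parameters mu, reusing the
    expensive factorizations.  Composition of the earlier sketches."""
    c0, GP, _, reflector = eliminate_constraint(G, np.zeros(G.shape[0]), w, kappa)
    U, B, Vt = lower_bidiagonalize(GP)                 # computed once
    solutions = {}
    for j, t in enumerate(t_list):
        g = U.T @ (t - G @ c0)                         # depends on the target only
        for mu in mus:
            y = solve_augmented_bidiagonal(B, g, mu)   # small system: depends on mu
            z = Vt.T @ y                               # undo the bidiagonal transform
            solutions[(j, mu)] = back_transform(c0, reflector, z)
    return solutions
```

Only the innermost two lines depend on the regularization parameter, and only the right-hand side depends on the target, which is precisely the reuse pattern described in the text.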

Although the left orthogonal factor is never formed explicitly, it is still quite inexpensive to compute the discretized averaging kernels for the different solutions. Again, one of the required terms is already computed in earlier steps of the algorithm, and the remaining term requires only a modest number of additional operations.

Consider next the case from Sect. 2.2.3 where we use Lanczos bidiagonalization. Again, only the reduced solution depends on the target function, but now both it and the associated orthogonal factor depend on the regularization parameter, i.e., the iteration number k. Moreover, they both appear only implicitly in the algorithm, while it is the iteration vector itself that is explicitly computed. It is then more efficient to compute the estimated solution, the variance of the error in the estimate, and the discretized averaging kernel directly from the iteration vector.

2.3. Summary of the new algorithms

Our new algorithms, based on bidiagonalization, are summarized below. They describe the general case where the estimate of f is computed at q target points. The matrix of targets below is again the matrix whose columns contain ``samples'' of the q target functions, one column per target point.
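In this general case the sweep sketched above simply loops over the columns of the target matrix. A toy-sized, self-contained driver might look as follows, with random stand-ins for the kernels, constraint integrals and sampled targets (all data here are synthetic and purely illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, m, q = 200, 30, 5                      # radius samples, data kernels, target points
G = rng.standard_normal((n_r, m))           # stand-in for the discretized kernels
w = G.sum(axis=0)                           # stand-in for the constraint integrals
T = rng.standard_normal((n_r, q))           # one sampled target function per column

targets = [T[:, j] for j in range(q)]
coefficients = sola_sweep(G, targets, w, mus=np.logspace(-6, 0, 7))
print(len(coefficients))                    # q targets x 7 regularization parameters
```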

