
4. The computational efficiency of the algorithms

 

In the following we will consider the computational efficiency of the traditional and the new implementations of the SOLA method. We remark that for sparse and structured matrices, for which the iterative LSQR algorithm should be used, the flop counts depend on the sparsity and structure of the matrix as well as the necessary number of iterations.

We calculate the number of floating point operations used in the algorithms as a function of the problem size. The "size" of the problem is characterized by three constants: the number m of kernels, the number n of points used in the discretization of the target functions and kernels, and the number q of target functions, i.e., the number of points where we wish to estimate the solution. The relative sizes of m, n and q will depend on the actual application area, but it is often the case that m is larger than both n and q.

4.1. Reduction to unit covariance form

 

Since the reduction to unit covariance form is the only part of the algorithms that depends on the covariance matrix, we have chosen to analyze this preprocessing step separately; the algorithms themselves are then analyzed under the assumption that the problem is in unit covariance form, i.e., that the associated covariance matrix is the identity matrix.

The actual amount of statistical information available varies a great deal with the application area, and it is often very difficult to estimate the covariance matrix. In helioseismology, for instance, computing the covariances from the solar data is a complex process, and the correlations are not well known. Experiments on artificial data seem to indicate that the covariance matrix has rather small off-diagonal elements, suggesting that the data may be described satisfactorily by a diagonal covariance matrix or a banded one with small bandwidth; see Schou et al. (1995).

Table 2: Flop counts for transformation to unit covariance form; p is the bandwidth

In Table 2 we list the computational cost of the preprocessing step for a full covariance matrix, a banded covariance matrix with bandwidth p, and a diagonal covariance matrix, respectively. For a full m × m covariance matrix the time spent computing the Cholesky factorization is excessively large. In the more common case where the variances of the individual errors are known in advance, but no information about correlations between the errors is available, the transformation consists of a simple scaling of the rows of the kernel matrix and a similar scaling of the data, and it does not contribute significantly to the overall computation time. We emphasize that this transformation is done only once, even if the problem is solved for several values of the regularization parameter and/or several target functions.
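To make the cost structure of this step concrete, the following is a minimal sketch (with hypothetical variable names, not taken from the paper) of the transformation for a full covariance matrix and for the diagonal special case. In the full case the cost is dominated by the Cholesky factorization, roughly m^3/3 flops for an m × m matrix; the diagonal case needs only one division per matrix entry.

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def to_unit_covariance(A, d, E):
    """Whitening with a full covariance matrix E: factor E = C C^T and
    replace the kernel matrix A and the data d by C^{-1} A and C^{-1} d."""
    C = cholesky(E, lower=True)                # Cholesky factor, ~m^3/3 flops
    A_t = solve_triangular(C, A, lower=True)   # one triangular solve per column of A
    d_t = solve_triangular(C, d, lower=True)   # and one for the data vector
    return A_t, d_t

def to_unit_covariance_diag(A, d, sigma):
    """Diagonal case: only the standard deviations sigma are known, so each
    row of A and each data value is divided by the corresponding sigma."""
    return A / sigma[:, None], d / sigma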

4.2. Comparison of the algorithms

Table 3: Flop counts for Algorithms 2.2 and 2.3. The quantity k̄ is the average number of iterations, and N is the number of flops in a matrix-vector multiplication
Table 4: Flop counts for Algorithm 2.1 with and without SVD preprocessing

In Tables 3 and 4 we summarize the detailed flop counts for Algorithms 2.1-2.3.

In connection with the iterative algorithm used in Step 4 of Algorithm 2.3, the quantity N denotes the number of flops involved in a matrix-vector multiplication with the kernel matrix, and k̄ denotes the average number of iterations (since the iteration count k will typically depend on the regularization parameter). If the kernel matrix is sparse, then N is twice the number of nonzeros in the matrix, and if it has Toeplitz form then N is proportional to m log m, because the multiplication can be carried out by means of the FFT. We shall not comment further on the efficiency of this algorithm, since the flop counts depend strongly on N.
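As a rough illustration of how N depends on the matrix structure, the following sketch (hypothetical dimensions, not from the paper) contrasts a dense, a sparse, and a Toeplitz matrix-vector product; the Toeplitz case uses SciPy's FFT-based matmul_toeplitz, which is what makes the m log m behaviour possible.

import numpy as np
from scipy.linalg import matmul_toeplitz
from scipy.sparse import random as sparse_random

m, n = 2000, 300                         # hypothetical problem dimensions
x = np.random.rand(n)

A_dense = np.random.rand(m, n)           # dense product: about 2*m*n flops
y_dense = A_dense @ x

A_sparse = sparse_random(m, n, density=0.05, format="csr")
y_sparse = A_sparse @ x                  # sparse product: about 2*nnz flops
print("dense N ~", 2 * m * n, "  sparse N ~", 2 * A_sparse.nnz)

col = np.random.rand(m)                  # first column of a Toeplitz matrix
row = np.random.rand(n); row[0] = col[0] # first row (leading entries must agree)
y_toep = matmul_toeplitz((col, row), x)  # FFT-based product: O(m log m) operations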

If we consider the case where one wants to calculate the solution estimates, the corresponding error estimates and the averaging kernels at q specified target points, and q is equal to the number n of discretization points, then the flop counts in Tables 3 and 4 can be compared directly. Obviously, Algorithm 2.1 without the SVD preprocessing is prohibitively slow except for very small problems. The other two approaches are comparable, Algorithm 2.3 being the faster one when m is large compared with n. A question of special interest is the amount of work needed to re-compute the solution when the regularization parameter is changed.
For this task Algorithm 2.3 is clearly superior. In most parameter-choice methods only a few of the computed quantities have to be re-computed, a fact that makes Algorithm 2.3 even more attractive.

A crucial factor in mollifier methods is the shape of the chosen target function. It is therefore likely that solutions corresponding to a series of differently shaped target functions will be computed. When the q target functions are changed, Algorithm 2.3 is again the most efficient, although the difference is not as dramatic as for a change of the regularization parameter.

Pijpers & Thompson (1994) demonstrated that for helioseismic inversion it is possible to determine an automatic scaling of the width of the (Gaussian) target functions. From physical arguments they derive an expression that relates the smallest length scale by which the rotation can be resolved at a given radius to certain physical quantities. Once the optimal target width has been computed at one reference radius, this relation can be used to scale the target width at any other radius. Using this scaling, the optimal value of the regularization parameter for each target radius can be computed efficiently with Algorithm 2.3.
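As an illustration only, here is a hedged sketch (hypothetical names) of how such radius-dependent Gaussian target functions might be set up; the width_scale function is a placeholder for the physical scaling relation derived by Pijpers & Thompson (1994), which is not reproduced here.

import numpy as np
from scipy.integrate import trapezoid

def gaussian_target(r, r0, width):
    """Gaussian target centred at r0, normalised to unit area on the grid r."""
    t = np.exp(-((r - r0) / width) ** 2)
    return t / trapezoid(t, r)

def scaled_targets(r, target_radii, ref_radius, ref_width, width_scale):
    """One target per radius in target_radii; the width computed once at
    ref_radius is scaled by width_scale(r0) / width_scale(ref_radius)."""
    s0 = width_scale(ref_radius)
    return [gaussian_target(r, r0, ref_width * width_scale(r0) / s0)
            for r0 in target_radii]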

