
4. The computational efficiency of the algorithms

 

In the following we will consider the computational efficiency of the traditional and the new implementations of the SOLA method. We remark that for sparse and structured matrices, for which the iterative LSQR algorithm should be used, the flop counts depend on the sparsity and structure of the matrix as well as the necessary number of iterations.

We calculate the number of floating point operations used in the algorithms as a function of the problem size. The "size" of the problem is characterized by three constants: the number m of kernels, the number n of points used in the discretization of the target functions and kernels, and the number q of target functions, i.e., the number of points where we wish to estimate the solution. The relative sizes of m, n and q will depend on the actual application area, but it is often the case that m is larger than both n and q.

4.1. Reduction to unit covariance form

 

Since the reduction to unit covariance form is the only part of the algorithms that depends on the covariance matrix, we have chosen to analyze this preprocessing step separately; the algorithms themselves are then analyzed under the assumption that the problem is in unit covariance form, i.e., that the associated covariance matrix is the identity matrix.

The actual amount of statistical information available varies a great deal with the application area, and it is often very difficult to estimate the covariance matrix. In helioseismology, for instance, computing the covariances from the solar data is a complex process, and the correlations are not well known. Experiments on artificial data seem to indicate that the covariance matrix has rather small off-diagonal elements, suggesting that the data may be described satisfactorily by a diagonal covariance matrix or a banded one with small bandwidth; see Schou et al. (1995).

Table 2: Flop counts for transformation to unit covariance form; p is the bandwidth

In Table 2 we list the computational cost of the preprocessing step for a full covariance matrix, a banded covariance matrix with bandwidth p, and a diagonal covariance matrix, respectively. For a full m × m covariance matrix the time spent computing the Cholesky factorization is excessively large. In the more common case where the variances of the individual errors are known in advance, but no information about correlations between the errors is available, the transformation consists of a simple scaling of the rows of the kernel matrix and a similar scaling of the data, and it does not contribute significantly to the overall computation time. We emphasize that this transformation is done only once, even if the problem is solved for several values of the regularization parameter and/or several target functions.
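To make the cost structure of this step concrete, the following is a minimal sketch (with hypothetical variable names, not taken from the paper) of the transformation for a full covariance matrix and for the diagonal special case. In the full case the cost is dominated by the Cholesky factorization, roughly m^3/3 flops for an m × m matrix; the diagonal case needs only one division per matrix entry.

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def to_unit_covariance(A, d, E):
    """Whitening with a full covariance matrix E: factor E = C C^T and
    replace the kernel matrix A and the data d by C^{-1} A and C^{-1} d."""
    C = cholesky(E, lower=True)                # Cholesky factor, ~m^3/3 flops
    A_t = solve_triangular(C, A, lower=True)   # one triangular solve per column of A
    d_t = solve_triangular(C, d, lower=True)   # and one for the data vector
    return A_t, d_t

def to_unit_covariance_diag(A, d, sigma):
    """Diagonal case: only the standard deviations sigma are known, so each
    row of A and each data value is divided by the corresponding sigma."""
    return A / sigma[:, None], d / sigma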

4.2. Comparison of the algorithms

Table 3: Flop counts for Algorithms 2.2 and 2.3. The quantity k̄ is the average number of iterations, and N is the number of flops in a matrix-vector multiplication
Table 4: Flop counts for Algorithm 2.1 with and without SVD preprocessing

In Tables 3 and 4 we summarize the detailed flop counts for Algorithms 2.1-2.3.

In connection with the iterative algorithm used in Step 4 of Algorithm 2.3, the quantity N denotes the number of flops involved in a matrix-vector multiplication with the kernel matrix, and k̄ denotes the average number of iterations (since the iteration count k will typically depend on the regularization parameter). If the kernel matrix is sparse, then N is twice the number of nonzeros in the matrix, and if it has Toeplitz form then N is proportional to m log m, because the multiplication can be carried out by means of the FFT. We shall not comment further on the efficiency of this algorithm, since the flop counts depend strongly on N.
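As a rough illustration of how N depends on the matrix structure, the following sketch (hypothetical dimensions, not from the paper) contrasts a dense, a sparse, and a Toeplitz matrix-vector product; the Toeplitz case uses SciPy's FFT-based matmul_toeplitz, which is what makes the m log m behaviour possible.

import numpy as np
from scipy.linalg import matmul_toeplitz
from scipy.sparse import random as sparse_random

m, n = 2000, 300                         # hypothetical problem dimensions
x = np.random.rand(n)

A_dense = np.random.rand(m, n)           # dense product: about 2*m*n flops
y_dense = A_dense @ x

A_sparse = sparse_random(m, n, density=0.05, format="csr")
y_sparse = A_sparse @ x                  # sparse product: about 2*nnz flops
print("dense N ~", 2 * m * n, "  sparse N ~", 2 * A_sparse.nnz)

col = np.random.rand(m)                  # first column of a Toeplitz matrix
row = np.random.rand(n); row[0] = col[0] # first row (leading entries must agree)
y_toep = matmul_toeplitz((col, row), x)  # FFT-based product: O(m log m) operations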

If we consider the case where one wants to calculate the solution estimates, the corresponding error estimates and the averaging kernels at q specified target points, and q is equal to the number n of discretization points, then the flop counts in Tables 3 and 4 can be compared directly. Obviously, Algorithm 2.1 without the SVD preprocessing is prohibitively slow except for very small problems. The other two approaches are comparable, Algorithm 2.3 being the faster one when m is large compared with n. A question of special interest is the amount of work needed to re-compute the solution when the regularization parameter is changed.
For this task Algorithm 2.3 is clearly superior. In most parameter-choice methods only a few of the computed quantities have to be re-computed, a fact that makes Algorithm 2.3 even more attractive.

A crucial factor in mollifier methods is the shape of the chosen target function. It is therefore likely that solutions corresponding to a series of differently shaped target functions will be computed. When the q target functions are changed, Algorithm 2.3 is again the most efficient, although the difference is not as dramatic as for a change of the regularization parameter.

Pijpers & Thompson (1994) demonstrated that for helioseismic inversion it is possible to determine an automatic scaling of the width of the (Gaussian) target functions. From physical arguments they derive an expression that relates the smallest length scale by which the rotation can be resolved at a given radius to certain physical quantities. Once the optimal target width has been computed at one reference radius, this relation can be used to scale the target width at any other radius. Using this scaling, the optimal value of the regularization parameter for each target radius can be computed efficiently with Algorithm 2.3.
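As an illustration only, here is a hedged sketch (hypothetical names) of how such radius-dependent Gaussian target functions might be set up; the width_scale function is a placeholder for the physical scaling relation derived by Pijpers & Thompson (1994), which is not reproduced here.

import numpy as np
from scipy.integrate import trapezoid

def gaussian_target(r, r0, width):
    """Gaussian target centred at r0, normalised to unit area on the grid r."""
    t = np.exp(-((r - r0) / width) ** 2)
    return t / trapezoid(t, r)

def scaled_targets(r, target_radii, ref_radius, ref_width, width_scale):
    """One target per radius in target_radii; the width computed once at
    ref_radius is scaled by width_scale(r0) / width_scale(ref_radius)."""
    s0 = width_scale(ref_radius)
    return [gaussian_target(r, r0, ref_width * width_scale(r0) / s0)
            for r0 in target_radii]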

