Up: Parametric representation of helioseismic

3 Mathematical development

3.1 Signal model

The signal that we propose to analyse is the full-disk measurement from GOLF [Gabriel et al. 1995]. It is regularly sampled and does not present any gaps. This signal is well known [Lazrek et al. 1997] for containing l=0 to 3 modes and marginally l=4 and l=5. Higher degree modes are averaged and remain, in particular in the lower frequency range, well below the noise level.

The signal is down-sampled to 80 s after being filtered above 12 MHz to avoid aliasing. A small bandwidth of $\sim 6~ \mu {\rm Hz}$ around an l=1 mode contains only a known number of modes. The wings of the modes surrounding the bandwidth studied can be regarded as an additional white noise.

It can be easily demonstrated that when $T_{\rm s}\nu_0 \ll 1$ where $T_{\rm s}$ is the sampling frequency, the samples from the continuous model (1) verify a second order autoregressive equation:

x_n^(k)=-a₁^(k) x_n-1^(k)-a₂^(k) x_n-2^(k)+e_n^(k),
(2)

where e_n^(k) is a Gaussian white noise excitation. The index k references the mode. This can be proved using the filter representation of the process and converting the analog filter to a discrete filter using the backward approximation. Various statistical justifications of this model can be found for example in [Priestley 1994].

It is worth noting that the width of the analysed frequency bands corresponds to an innovation rate of 2 days. The hypothesis of temporal independence of the excitation signal e_n^(k) is justified by statistical analysis of the mode power already mentioned in Sect. 2 and in particular by the absence of correlation found by Foglizzo et al.

Finally, we assume that the measured signal y_n is the sum of L modes:

$\begin{displaymath} y_n=\sum_{k=1}^L x_n^{(k)}+w_n, n=0,~\ldots,~N-1,\end{displaymath}$ (3)

and a Gaussian white noise with variance $\sigma^2$ . We will assume herein that $\sigma^2$ is known. The set of measurements will be referred ${\cal Y}_N=\{y_0,~\ldots,~y_{N-1}\}$ .

In the model (2,3) the correlation between the different modes excitations is introduced by the covariance of the input terms. Let $\vec{E}_n=(e_n^1,~\ldots e_n^L)^t$ be the vector of excitation and Q its covariance: $\mbox{$E$}[\vec{E}_n \vec{E}_n^t] = Q$ . The correlation coefficient between excitation of mode i and j is defined by $r_{ij}=Q_{ij}/\sqrt{Q_{ii}Q_{jj}}$ .

The problem consists in the estimation of the coefficients of the autoregressive process, denoted $\theta$ , and the covariance matrix Q.

3.2 Position of the problem

An immediate solution is to resort to maximum likelihood estimation. The process y_n is Gaussian with covariance matrix $R_{y,N}(\theta,Q)$ . Consequently, the estimated parameters are the global maximisers of the log-likelihood function or equivalently, after dropping the constant terms, of the function:

$\begin{displaymath} {\cal L}({\cal Y}_N;\theta,Q)=-\log \vert R_{y,N}(\theta,Q)\vert-Y_NR_{y,N}(\theta,Q)^{-1}Y_N^t,\end{displaymath}$ (4)

with $Y_N=(y_0,~\ldots,~y_{N-1})$ . Unfortunately, the maximisation of (4) by an optimisation algorithm is not tractable because:

the dimension $N\times N$ of $R_{y,N}(\theta,Q)$ leads to a large computational cost for the evaluation of ${\cal L}(\theta,Q)$ and the non-linear dependence between the unknown parameters and $R_{y,N}(\theta,Q)$ leads to a cost function that is highly non-linear, see Appendix A.
the maximisation must be performed under the constraints that the autoregressive models are stable and the matrix Q is semi-positive definite. Whereas the first constraint can be removed by a proper reparametrisation of $\theta$ , a simple expression of the second constraint is not available.

A suboptimal solution is to minimise a distance between a M order matrix $R_{y,M}(\theta,Q)$ and its estimated version $\hat{R}_{y,M}$ ,computed from the y_n. This solution significantly reduces the computational cost. If the cost function is defined as:

$\begin{displaymath} {\cal C}(\theta,Q)=\mathrm{trace}(R_{y,M}(\theta,Q)-\hat{R}_{y,M})^2,\end{displaymath}$

(5)

it can be shown that its dependence on the elements of Q is quadratic, see appendix A. However, the problems related to the nonlinear dependence on $\theta$ and the constraint on Q remain.

A major difficulty in this estimation problem comes from the coupling between the two sets of parameter $\theta$ and Q. To address this, we propose a two stage approach that first estimates $\theta$ and then uses this result for the estimation of Q. Besides the different nature of $\theta$ and Q, this solution can be justified by the strong a priori that is available on the shape of the modes that can be incorporated in the first stage. Another argument is given by an analogy with the pure autoregressive case where the power of the excitation is generally computed from the autoregressive parameters and the data, [Stoica & Moses 1997].

3.3 Estimation of the dynamical parameters

For this goal, we will derive a new statistical representation of y_n, i.e. a signal $\bar{y}_n$ with the same autocorrelation sequence as y_n. Consider the signal z_n obtained by filtering y_n:

$\begin{displaymath} H(z)=\prod_{q=1}^L (1+a_1^{(q)}z^{-1}+a_2^{(q)} z^{-2})=\sum_{q=0}^{2L}c_qz^{-q}.\end{displaymath}$

(6)

Hypothesis (2,3) implies that z_n is the sum of each mode filtered by H(z) and can be expressed:

	$\begin{eqnarray} z_n&=&\sum_{k=1}^{L} H(z)x_n^{(k)} + H(z)w_n \\ &=&\sum_{k=1}^{... ..._{q=0}^{2(L-1)} b_q^{(k)} e_{n-q}^{(k)}+\sum_{q=1}^{2L}c_qw_{n-q},\end{eqnarray}$	(7)
		(8)

with:

$\begin{displaymath} \sum_{q=0}^{2(L-1)} b_q^{(k)} z^{-k}=\prod_{q=1,q\not =k}^L (1+a_1^{(q)}z^{-1}+a_2^{(q)} z^{-2}).\end{displaymath}$

(9)

Equation (8) implies $\mbox{$E$}z_nz_{n+l}=0$ for |l|>2L. Then z_n has a moving average representation of order 2L, [Stoica & Moses 1997, p. 102]:

$\begin{displaymath} \bar{z}_n=A(z) \bar{e}_n,\quad A(z)=\sum_{k=1}^{2L} b_k z^{-k}.\end{displaymath}$

(10)

Consequently y_n has an autoregressive moving average representation where the coefficients of the autoregressive part equal the coefficients of H(z):

$\begin{displaymath} \bar{y}_n=\frac{A(z)}{H(z)} \bar{e}_n.\end{displaymath}$

(11)

This result suggests the following algorithm for the estimation of the parameter vector $\theta$ :

1.: Apply to y_n an ARMA estimation algorithm in order to estimate its autoregressive part, H(z).
2.: Compute the filters 1+a₁^(k)z^-1+a₂^(k) z^-2 combining the L pairs of H(z) conjugate roots.

We propose to estimate H(z) using the subspace parameter algorithm described in [Stoica & Moses 1997, p. 115]. This method is based on the singular value decomposition of the estimated Block-Hankel matrix of covariances $\hat{C}_{y,M}$ whose element (i,j) are the estimated autocorrelations y_n at lag |i+j-1|. Let $\hat{\Sigma}$ the diagonal matrix made from the 2L principal singular values of $\hat{C}_{y,M}$ and $\hat{U}$ the matrix containing the associated 2L left singular vectors. Let $\overline{\Gamma}$ and $\underline{\Gamma}$ be the matrices made from the first and last, M-1 rows of $\hat{U}\hat{\Sigma}^{1/2}$ . The roots of H(z) are given by the eigenvalues of matrix $\Phi$ , the least square solution of the system $\overline{\Gamma} \Phi = \underline{\Gamma}$ .

3.4 Estimation of the excitation covariance

The second step of the method consists in replacing $\theta$ by its estimated value $\hat{\theta}$ and estimating the covariance of the vector $\vec{E}_n$ . The derivation of an efficient algorithm for this requires a signal model that explicitly involves this vector process. A solution is to use a state space representation of y_n.

It can be easily verified that, up to a shift in the excitation, each mode can be represented by the following state space representation:

$\begin{displaymath} \vec{X}_{n+1}^{(k)}=F^{(k)} \vec{X}_n+G^{(k)} \vec{E}_n^{(k)},\quad x_n^{(k)}=H^{(k)^t} \vec{X}_n.\end{displaymath}$ (12)

with:

$\begin{displaymath} F^{(k)}= \left( \begin{array} {cc} 0 & -a_2^{(k)} \\ 1 & -... ...uad H^{(k)}=\left(\begin{array} {c} 0 \\ 1\end{array}\right).\end{displaymath}$

Consequently, a state space representation of y_n is obtained by the concatenation of the previous models:

$\begin{displaymath} \vec{X}_{n+1}=F \vec{X}_n+G\vec{E}_n,\quad y_n=H^t \vec{X}_n+w_n,\end{displaymath}$ (13)

where F is a $2L\times 2L$ block diagonal matrix with the matrices F^(k) on the diagonal. H is the $2L \times 1$ vector obtained by concatenation of the H^(k), and G is the $2L \times L$ matrix having as non-zero elements G(2k,2)=1, $k=1,~\ldots,~L$ . The unobserved set of states will be referred ${\cal X}_N=\{\vec{X}_0,~\ldots,~\vec{X}_{N-1}\}$ .

It is worth noting that the model (13), contrary to the model (2,3), allows a simple expression of the power spectrum density (PSD), [Anderson & Moore 1979, p. 75]:

$\begin{eqnarray} \lefteqn{S_y(\omega)=}\\ && [H^t(zI-F)^{-1}GQ\nonumber G^t(z^{-1}I-F^t)^{-1}H+\sigma^2]_{z=e^{j\omega}}.\end{eqnarray}$ (14)

As suggested in Sect. 3.1, we intend to estimate Q as the global maximiser of ${\cal L}({\cal Y}_N,~\hat{\theta},~Q)$ . The dependence on $\hat{\theta}$ will be dropped in the the following. The observation equation of model (13) shows that there exist a many-to-one mapping between ${\cal X}_N$ and ${\cal Y}_N$ . This suggests maximizing the likelihood by an iterative technique: the expectation maximisation algorithm (EM). This method is direct maximising the likelihood of ${\cal Y}_N$ , the "incomplete data'' making an essential use of ${\cal X}_N$ , the "complete data''. The reference paper on EM is Dempster et al. (1977) Moon (1996) has presented a tutorial on the EM algorithm including extensive references.

This algorithm has been recently applied in Deng and Shen (1997) for the estimation of all the parameters of a linear dynamic system. We will derive here the method for the case where the only unknown is Q.

The E step is the computation of the conditional expectation of ${\cal L}({\cal X}_N;Q)$ given ${\cal Y}_N$ and the current estimate of Q, noted $\hat{Q}^{[k]}$ . For the M step, let $\hat{Q}^{[k+1]}$ be that value of Q that maximizes the previous conditional expectation.

In our case, the log likelihood of ${\cal X}_N$ is calculated using the Markov property of (12), [Anderson & Moore 1979]. Dropping all the terms that are not functions of Q, we have:

$\begin{displaymath} {\cal L}({\cal X}_N;Q)=-\frac{N-1}{2}\log \vert Q\vert -\frac{1}{2} \sum_{k=1}^{N-1}G^t\vec{V}_k^tQ^{-1}\vec{V}_kG,\end{displaymath}$

$\begin{displaymath} \vec{V}_k=\vec{X}_k-F\vec{X}_{k-1}.\end{displaymath}$

Taking the conditional expectation of this equality and using the differentiation rules ${\rm d} \log \vert Q\vert/{\rm d}Q=Q^{-1}$ and ${\rm d} \vec{U}^tQ^{-1}\vec{U}/ {\rm d}Q=-Q^{-1}\vec{U}\vec{U}^tQ^{-1}$ gives the following recursion for $\hat{Q}^{[k]}$ :

$\begin{displaymath} \hat{Q}^{[k+1]}=\frac{1}{N-1}\sum_{k=1}^{N-1}G^t \mbox{$E$}[\vec{V}_k\vec{V}_k^t/{\cal Y}_N,\hat{Q}^{[k]}] G.\end{displaymath}$

(15)

Expansion of the inner term shows that each iteration requires the computation of terms $\mbox{$E$}[\vec{X}_k\vec{X}_k^t/{\cal Y}_N,\hat{Q}^{[k]}]$ and $\mbox{$E$}[\vec{X}_k\vec{X}_{k-1}^t/{\cal Y}_N,\hat{Q}^{[k]}]$ . If $\hat{\vec{X}}_{k/n}$ is the conditional expectation of the state with respect to the observation and $\hat{Q}^{[k]}$ , P_k/N its variance and P_k,k-1/N the cross covariance between $\hat{\vec{X}}_{k/n}$ and $\hat{\vec{X}}_{k-1/n}$ , we have:

$\begin{displaymath} \mbox{$E$}[\vec{X}_k\vec{X}_k^t/{\cal Y}_N,\hat{Q}^{[k]}]=P_{k/N}+\hat{\vec{X}}_{k/N}\hat{\vec{X}}_{k/N}^t, \end{displaymath}$

$\begin{displaymath} \mbox{$E$}[\vec{X}_k\vec{X}_{k-1}^t/{\cal Y}_N,\hat{Q}^{[k]}]=P_{k,k-1/N}+\hat{\vec{X}}_{k/N}\hat{\vec{X}}_{k-1/N}^t. \end{displaymath}$

These quantities can then be computed, as in Deng & Shen (1997), using a fixed interval smoothing algorithm, [Anderson & Moore 1979] whose equations are summarized in Appendix B.

3.5 Remarks on the algorithm

The development of Sect. 3.3 can be easily extended to incorporate the iterative estimation of $\sigma^2$ . This solution has not been investigated here because of the high signal to noise ratio that has been observed in the analysed frequency band. In our case $\sigma^2$ has been estimated by the average of the bins of the periodogram surrounding the modes.
The structure of (15) guarantees that $\hat{Q}^{[k]}$ is semi-positive definite.
In Sage & Melsa (1971), it is shown that the maximum a posteriori estimator of Q is:

$\begin{displaymath} \hat{Q}_{\rm MAP}=\frac{1}{N-1}\sum_{k=1}^{N-1} G^t \hat{\vec{V}}_{k/N}\hat{\vec{V}}_{k/N}^tG,\end{displaymath}$ (16)

$\begin{displaymath} \hat{\vec{V}}_{k/N}=\hat{\vec{X}}_{k/N}-F\hat{\vec{X}}_{k-1/N}.\end{displaymath}$ (17)

An iterative implementation of this estimator can be obtained estimating the smoothed state vector at iteration k using $\hat{Q}^{[k]}_{\rm MAP}$ and compute $\hat{Q}^{[k+1]}_{\rm MAP}$ using Eq. (16). It is worthy noting that this algorithm corresponds to the EM algorithm where the update of Q is computed assuming that the matrices P_k/N and P_k,k-1/N equal zero. A major drawback of this solution is that it does not guarantee an increase of the likelihood at each iteration.
A large part of the previous development can be easily extended to the case where the modes are represented by autoregressive models of order higher than two, allowing the relaxation of the hypothesis of temporal independence of the noise input. The major difficulty that arises is to group the roots of H(z) corresponding to the different modes.

Up: Parametric representation of helioseismic

	$\begin{eqnarray} \lefteqn{S_y(\omega)=}\\ && [H^t(zI-F)^{-1}GQ\nonumber G^t(z^{-1}I-F^t)^{-1}H+\sigma^2]_{z=e^{j\omega}}.\end{eqnarray}$	(14)