Up: Parametric representation of helioseismic


3 Mathematical development

3.1 Signal model

  The signal that we propose to analyse is the full-disk measurement from GOLF [Gabriel et al. 1995]. It is regularly sampled and contains no gaps. This signal is known [Lazrek et al. 1997] to contain the l=0 to 3 modes and, marginally, l=4 and l=5. Higher-degree modes are averaged out and remain well below the noise level, in particular in the lower frequency range.

The signal is down-sampled to 80 s after being low-pass filtered above 12 mHz to avoid aliasing. A small bandwidth of $\sim 6~\mu {\rm Hz}$ around an l=1 mode contains only a known number of modes. The wings of the modes surrounding the bandwidth studied can be regarded as additional white noise.

It can easily be demonstrated that, when $T_{\rm s}\nu_0 \ll 1$, where $T_{\rm s}$ is the sampling period, the samples of the continuous model (1) satisfy a second-order autoregressive equation:

\begin{displaymath}
x_n^{(k)}=-a_1^{(k)}\, x_{n-1}^{(k)}-a_2^{(k)}\, x_{n-2}^{(k)}+e_n^{(k)},
\end{displaymath} (2)


where $e_n^{(k)}$ is a Gaussian white noise excitation and the index k references the mode. This can be proved using the filter representation of the process and converting the analog filter to a discrete one via the backward approximation. Various statistical justifications of this model can be found, for example, in [Priestley 1994].
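As an illustration, the recursion (2) is straightforward to simulate. The sketch below uses purely hypothetical mode parameters (the values of $T_{\rm s}$, $\nu_0$ and the damping rate $\alpha$ are not taken from the GOLF data): the AR(2) coefficients are built from a damped-oscillator pole pair $\rho\,{\rm e}^{\pm 2{\rm i}\pi\nu_0 T_{\rm s}}$ with $\rho={\rm e}^{-\alpha T_{\rm s}}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mode parameters (illustration only, not the GOLF values).
Ts = 80.0       # sampling period [s]
nu0 = 3.0e-3    # mode frequency [Hz]
alpha = 1.0e-6  # damping rate [1/s]

# AR(2) coefficients from the pole pair rho * exp(+/- 2i*pi*nu0*Ts).
rho = np.exp(-alpha * Ts)
a1 = -2.0 * rho * np.cos(2.0 * np.pi * nu0 * Ts)
a2 = rho ** 2

# Simulate x_n = -a1 x_{n-1} - a2 x_{n-2} + e_n with white Gaussian e_n.
N = 4096
e = rng.standard_normal(N)
x = np.zeros(N)
for n in range(2, N):
    x[n] = -a1 * x[n - 1] - a2 * x[n - 2] + e[n]
```

By construction the two roots of $z^2+a_1z+a_2$ have modulus $\rho$, so the simulated mode is a narrow spectral peak at frequency $\nu_0$ whose width is set by the damping rate.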

It is worth noting that the width of the analysed frequency bands corresponds to an innovation rate of 2 days. The hypothesis of temporal independence of the excitation signal $e_n^{(k)}$ is justified by the statistical analysis of the mode power already mentioned in Sect. 2, and in particular by the absence of correlation found by Foglizzo et al.

Finally, we assume that the measured signal $y_n$ is the sum of the L modes
\begin{displaymath}
y_n=\sum_{k=1}^L x_n^{(k)}+w_n, \qquad n=0,~\ldots,~N-1,
\end{displaymath} (3)
and of a Gaussian white noise $w_n$ with variance $\sigma^2$. We will assume herein that $\sigma^2$ is known. The set of measurements will be referred to as ${\cal Y}_N=\{y_0,~\ldots,~y_{N-1}\}$.

In the model (2, 3), the correlation between the excitations of the different modes is introduced through the covariance of the input terms. Let $\vec{E}_n=(e_n^{(1)},~\ldots,~e_n^{(L)})^t$ be the excitation vector and Q its covariance matrix: $E[\vec{E}_n \vec{E}_n^t] = Q$. The correlation coefficient between the excitations of modes i and j is defined by $r_{ij}=Q_{ij}/\sqrt{Q_{ii}Q_{jj}}$.
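A correlated excitation sequence with a prescribed Q can be sketched as follows (a toy example with L=2 and an arbitrarily chosen $r_{12}=0.5$); the empirical correlation coefficient recovers $r_{12}=Q_{12}/\sqrt{Q_{11}Q_{22}}$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical covariance for L = 2 mode excitations (illustration only).
Q = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# Draw N excitation vectors E_n with covariance Q via a Cholesky factor.
N = 200_000
C = np.linalg.cholesky(Q)
E = rng.standard_normal((N, 2)) @ C.T

# Empirical covariance and correlation coefficient r_12.
Q_hat = E.T @ E / N
r12_hat = Q_hat[0, 1] / np.sqrt(Q_hat[0, 0] * Q_hat[1, 1])
```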

The problem consists in estimating the coefficients of the autoregressive processes, denoted $\theta$, and the covariance matrix Q.

3.2 Position of the problem

An immediate solution is to resort to maximum likelihood estimation. The process $y_n$ is Gaussian with covariance matrix $R_{y,N}(\theta,Q)$. Consequently, the estimated parameters are the global maximisers of the log-likelihood function or, equivalently, after dropping the constant terms, of the function
\begin{displaymath}
{\cal L}({\cal Y}_N;\theta,Q)=-\log \vert R_{y,N}(\theta,Q)\vert-Y_N R_{y,N}(\theta,Q)^{-1}Y_N^t,
\end{displaymath} (4)
with $Y_N=(y_0,~\ldots,~y_{N-1})$. Unfortunately, the maximisation of (4) by an optimisation algorithm is not tractable: each evaluation of (4) requires the determinant and inverse of the $N\times N$ matrix $R_{y,N}(\theta,Q)$, which is prohibitively expensive for long time series, and the criterion is a nonlinear, non-convex function of the parameters.

A suboptimal solution is to minimise a distance between an order-M covariance matrix $R_{y,M}(\theta,Q)$ and its estimated version $\hat{R}_{y,M}$, computed from the $y_n$. This solution significantly reduces the computational cost. If the cost function is defined as
\begin{displaymath}
{\cal C}(\theta,Q)=\mathrm{trace}\left[(R_{y,M}(\theta,Q)-\hat{R}_{y,M})^2\right],
\end{displaymath} (5)
it can be shown that its dependence on the elements of Q is quadratic (see Appendix A). However, the problems related to the nonlinear dependence on $\theta$ and to the constraint on Q remain.
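For symmetric matrices, criterion (5) is simply the squared Frobenius norm of the covariance mismatch, as the toy computation below illustrates (arbitrary 2x2 matrices, not taken from the data):

```python
import numpy as np

# Arbitrary symmetric "model" and "estimated" covariance matrices.
R = np.array([[2.0, 0.5],
              [0.5, 1.0]])
R_hat = np.array([[1.8, 0.6],
                  [0.6, 1.1]])

D = R - R_hat
C_trace = np.trace(D @ D)               # criterion (5)
C_frob = np.linalg.norm(D, 'fro') ** 2  # squared Frobenius norm: same value
```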

A major difficulty in this estimation problem comes from the coupling between the two sets of parameters $\theta$ and Q. To address this, we propose a two-stage approach that first estimates $\theta$ and then uses this result to estimate Q. Besides the different nature of $\theta$ and Q, this solution is justified by the strong a priori knowledge available on the shape of the modes, which can be incorporated in the first stage. Another argument is given by the analogy with the pure autoregressive case, where the power of the excitation is generally computed from the autoregressive parameters and the data [Stoica & Moses 1997].

3.3 Estimation of the dynamical parameters

  For this purpose, we derive a new statistical representation of $y_n$, i.e. a signal $\bar{y}_n$ with the same autocorrelation sequence as $y_n$. Consider the signal $z_n=H(z)y_n$ obtained by filtering $y_n$ with
\begin{displaymath}
H(z)=\prod_{q=1}^L (1+a_1^{(q)}z^{-1}+a_2^{(q)} z^{-2})=\sum_{q=0}^{2L}c_q z^{-q}.
\end{displaymath} (6)
Hypotheses (2, 3) imply that $z_n$ is the sum of each mode filtered by H(z) and can be expressed as
\begin{eqnarray}
z_n&=&\sum_{k=1}^{L} H(z)x_n^{(k)} + H(z)w_n \quad (7)\\
&=&\sum_{k=1}^{L}\sum_{q=0}^{2(L-1)} b_q^{(k)}\, e_{n-q}^{(k)}+\sum_{q=0}^{2L}c_q w_{n-q}, \quad (8)
\end{eqnarray}
where
\begin{displaymath}
\sum_{q=0}^{2(L-1)} b_q^{(k)} z^{-q}=\prod_{q=1,q\not =k}^L (1+a_1^{(q)}z^{-1}+a_2^{(q)} z^{-2}).
\end{displaymath} (9)
Equation (8) implies $E[z_nz_{n+l}]=0$ for |l|>2L. Hence $z_n$ has a moving average representation of order 2L [Stoica & Moses 1997, p. 102]:
\begin{displaymath}
\bar{z}_n=A(z) \bar{e}_n,\quad A(z)=\sum_{k=0}^{2L} b_k z^{-k},
\end{displaymath} (10)
where $\bar{e}_n$ is a white noise.
Consequently, $y_n$ has an autoregressive moving average (ARMA) representation whose autoregressive coefficients equal those of H(z):
\begin{displaymath}
\bar{y}_n=\frac{A(z)}{H(z)}\, \bar{e}_n.
\end{displaymath} (11)
This result suggests the following two-step algorithm for the estimation of the parameter vector $\theta$:
1. apply to $y_n$ an ARMA estimation algorithm in order to estimate its autoregressive part, H(z);
2. compute the filters $1+a_1^{(k)}z^{-1}+a_2^{(k)} z^{-2}$ by pairing the L conjugate root pairs of H(z).

We propose to estimate H(z) using the subspace parameter estimation algorithm described in [Stoica & Moses 1997, p. 115]. This method is based on the singular value decomposition of the estimated block-Hankel matrix of covariances $\hat{C}_{y,M}$, whose element (i,j) is the estimated autocorrelation of $y_n$ at lag |i+j-1|. Let $\hat{\Sigma}$ be the diagonal matrix made from the 2L principal singular values of $\hat{C}_{y,M}$ and $\hat{U}$ the matrix containing the associated 2L left singular vectors. Let $\overline{\Gamma}$ and $\underline{\Gamma}$ be the matrices made from the first and last M-1 rows of $\hat{U}\hat{\Sigma}^{1/2}$, respectively. The roots of H(z) are given by the eigenvalues of the matrix $\Phi$, the least-squares solution of the system $\overline{\Gamma} \Phi = \underline{\Gamma}$.
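A minimal numerical sketch of this subspace step is given below for a single simulated mode (L=1) with hypothetical pole parameters; it is not the paper's implementation, only an illustration of the SVD/shift-invariance mechanics on the Hankel matrix of estimated autocorrelations:

```python
import numpy as np

rng = np.random.default_rng(2)

# --- Simulate one AR(2) mode (L = 1) plus white observation noise. ---
rho, omega0 = 0.98, 0.6                    # hypothetical pole modulus / angle
a1, a2 = -2.0 * rho * np.cos(omega0), rho ** 2
N = 100_000
e = rng.standard_normal(N)
x = np.zeros(N)
for n in range(2, N):
    x[n] = -a1 * x[n - 1] - a2 * x[n - 2] + e[n]
y = x + 0.5 * rng.standard_normal(N)       # y_n = x_n + w_n

# --- Subspace estimation of the AR part of the ARMA model. ---
L_modes, M = 1, 20
# Estimated autocorrelations of y at lags 0 .. 2M-1.
r = np.array([y[: N - l] @ y[l:] / N for l in range(2 * M)])
# Hankel matrix with element (i, j) = r(i + j - 1), i, j = 1..M.
C_hat = np.array([[r[i + j + 1] for j in range(M)] for i in range(M)])
U, s, _ = np.linalg.svd(C_hat)
Gamma = U[:, :2 * L_modes] * np.sqrt(s[:2 * L_modes])
# Shift invariance: least-squares solution of Gamma_first @ Phi = Gamma_last.
Phi, *_ = np.linalg.lstsq(Gamma[:-1], Gamma[1:], rcond=None)
poles = np.linalg.eigvals(Phi)             # estimated roots of H(z)
```

With these settings the eigenvalues of $\Phi$ come out close to the true conjugate pole pair $\rho\,{\rm e}^{\pm {\rm i}\omega_0}$; the additive white noise $w_n$ only affects the lag-0 autocorrelation, which the lags $\ge 1$ used in the Hankel matrix avoid.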

3.4 Estimation of the excitation covariance

The second step of the method consists in replacing $\theta$ by its estimate $\hat{\theta}$ and estimating the covariance of the vector $\vec{E}_n$. Deriving an efficient algorithm for this requires a signal model that explicitly involves this vector process. A solution is to use a state space representation of $y_n$.

It can easily be verified that, up to a shift in the excitation, each mode admits the state space representation
\begin{displaymath}
\vec{X}_{n+1}^{(k)}=F^{(k)} \vec{X}_n^{(k)}+G^{(k)} e_n^{(k)},\quad x_n^{(k)}=H^{(k)t} \vec{X}_n^{(k)},
\end{displaymath} (12)
with
\begin{displaymath}
F^{(k)}=\left(\begin{array}{cc} 0 & -a_2^{(k)} \\ 1 & -a_1^{(k)} \end{array}\right),\quad
G^{(k)}=H^{(k)}=\left(\begin{array}{c} 0 \\ 1 \end{array}\right).
\end{displaymath}

Consequently, a state space representation of yn is obtained by the concatenation of the previous models:

\begin{displaymath}
\vec{X}_{n+1}=F \vec{X}_n+G\vec{E}_n,\quad y_n=H^t \vec{X}_n+w_n,
\end{displaymath} (13)
where F is the $2L\times 2L$ block diagonal matrix with the matrices F(k) on its diagonal, H is the $2L \times 1$ vector obtained by concatenation of the H(k), and G is the $2L \times L$ matrix whose only non-zero elements are G(2k,k)=1, $k=1,~\ldots,~L$. The unobserved set of states will be referred to as ${\cal X}_N=\{\vec{X}_0,~\ldots,~\vec{X}_{N-1}\}$.

It is worth noting that the model (13), contrary to the model (2, 3), allows a simple expression of the power spectral density (PSD) [Anderson & Moore 1979, p. 75]:
\begin{displaymath}
S_y(\omega)=\left[H^t(zI-F)^{-1}GQG^t(z^{-1}I-F^t)^{-1}H+\sigma^2\right]_{z=e^{j\omega}}.
\end{displaymath} (14)
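The assembly of (13) and the evaluation of (14) can be sketched numerically; the AR coefficients, Q and $\sigma^2$ below are arbitrary illustration values, not estimates from the data:

```python
import numpy as np

# Hypothetical AR(2) coefficients for L = 2 modes (illustration only).
modes = [(-1.9, 0.96), (-1.7, 0.95)]
L = len(modes)

# Concatenate the per-mode models (12) into the global model (13).
F = np.zeros((2 * L, 2 * L))
G = np.zeros((2 * L, L))
H = np.zeros((2 * L, 1))
for k, (a1, a2) in enumerate(modes):
    F[2 * k:2 * k + 2, 2 * k:2 * k + 2] = [[0.0, -a2], [1.0, -a1]]
    G[2 * k + 1, k] = 1.0
    H[2 * k + 1, 0] = 1.0

Q = np.array([[1.0, 0.3], [0.3, 1.0]])  # excitation covariance (arbitrary)
sigma2 = 0.25                           # observation noise variance

def psd(omega):
    """Evaluate S_y(omega) of Eq. (14); on |z| = 1 the second factor
    is the conjugate transpose of the first, since F is real."""
    z = np.exp(1j * omega)
    T = H.T @ np.linalg.inv(z * np.eye(2 * L) - F) @ G  # 1 x L transfer row
    return (T @ Q @ T.conj().T)[0, 0].real + sigma2
```

For each mode block, $H^{(k)t}(zI-F^{(k)})^{-1}G^{(k)} = z^{-1}/(1+a_1^{(k)}z^{-1}+a_2^{(k)}z^{-2})$, so the off-diagonal elements of Q contribute cross terms between the mode transfer functions, which is precisely how excitation correlation shows up in the PSD.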

As suggested in Sect. 3.1, we intend to estimate Q as the global maximiser of ${\cal L}({\cal Y}_N;~\hat{\theta},~Q)$. The dependence on $\hat{\theta}$ will be dropped in the following. The observation equation of model (13) shows that there exists a many-to-one mapping between ${\cal X}_N$ and ${\cal Y}_N$. This suggests maximising the likelihood with an iterative technique: the expectation-maximisation (EM) algorithm. This method maximises the likelihood of ${\cal Y}_N$, the "incomplete data'', making essential use of ${\cal X}_N$, the "complete data''. The reference paper on EM is Dempster et al. (1977); Moon (1996) presents a tutorial on the EM algorithm, including extensive references.

This algorithm was recently applied in Deng & Shen (1997) to the estimation of all the parameters of a linear dynamic system. We derive here the method for the case where the only unknown is Q.

The E step is the computation of the conditional expectation of ${\cal L}({\cal X}_N;Q)$ given ${\cal Y}_N$ and the current estimate of Q, denoted $\hat{Q}^{[k]}$. The M step takes as $\hat{Q}^{[k+1]}$ the value of Q that maximises this conditional expectation.

In our case, the log-likelihood of ${\cal X}_N$ is calculated using the Markov property of (12) [Anderson & Moore 1979]. Dropping all the terms that are not functions of Q, we have:
\begin{displaymath}
{\cal L}({\cal X}_N;Q)=-\frac{N-1}{2}\log \vert Q\vert -\frac{1}{2}\sum_{n=1}^{N-1}\vec{V}_n^t GQ^{-1}G^t\vec{V}_n,
\end{displaymath}
where $\vec{V}_n=\vec{X}_n-F\vec{X}_{n-1}$.

Taking the conditional expectation of this expression and using the differentiation rules ${\rm d} \log \vert Q\vert/{\rm d}Q=Q^{-1}$ and ${\rm d} (\vec{U}^tQ^{-1}\vec{U})/{\rm d}Q=-Q^{-1}\vec{U}\vec{U}^tQ^{-1}$ gives the following recursion for $\hat{Q}^{[k]}$:
\begin{displaymath}
\hat{Q}^{[k+1]}=\frac{1}{N-1}\sum_{n=1}^{N-1} G^t\, E[\vec{V}_n\vec{V}_n^t/{\cal Y}_N,\hat{Q}^{[k]}]\, G.
\end{displaymath} (15)
Expansion of the inner term shows that each iteration requires the computation of the terms $E[\vec{X}_n\vec{X}_n^t/{\cal Y}_N,\hat{Q}^{[k]}]$ and $E[\vec{X}_n\vec{X}_{n-1}^t/{\cal Y}_N,\hat{Q}^{[k]}]$. If $\hat{\vec{X}}_{n/N}$ is the conditional expectation of the state given the observations and $\hat{Q}^{[k]}$, $P_{n/N}$ its variance and $P_{n,n-1/N}$ the cross covariance between $\hat{\vec{X}}_{n/N}$ and $\hat{\vec{X}}_{n-1/N}$, we have:
\begin{displaymath}
E[\vec{X}_n\vec{X}_n^t/{\cal Y}_N,\hat{Q}^{[k]}]=P_{n/N}+\hat{\vec{X}}_{n/N}\hat{\vec{X}}_{n/N}^t,
\end{displaymath}
\begin{displaymath}
E[\vec{X}_n\vec{X}_{n-1}^t/{\cal Y}_N,\hat{Q}^{[k]}]=P_{n,n-1/N}+\hat{\vec{X}}_{n/N}\hat{\vec{X}}_{n-1/N}^t.
\end{displaymath}

These quantities can then be computed, as in Deng & Shen (1997), using a fixed-interval smoothing algorithm [Anderson & Moore 1979], whose equations are summarised in Appendix B.
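The structure of the M step can be checked on simulated data. In the sketch below (hypothetical coefficients and Q, not the paper's estimates) the smoothed moments $E[\vec{V}_n\vec{V}_n^t/{\cal Y}_N]$ are replaced by the products $\vec{V}_n\vec{V}_n^t$ computed from exactly known states, so a single application of (15) already recovers Q:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 2-mode system matrices, as in Eq. (13).
modes = [(-1.9, 0.96), (-1.7, 0.95)]
L = len(modes)
F = np.zeros((2 * L, 2 * L))
G = np.zeros((2 * L, L))
for k, (a1, a2) in enumerate(modes):
    F[2 * k:2 * k + 2, 2 * k:2 * k + 2] = [[0.0, -a2], [1.0, -a1]]
    G[2 * k + 1, k] = 1.0

Q_true = np.array([[1.0, 0.6], [0.6, 2.0]])
C = np.linalg.cholesky(Q_true)

# Simulate the state recursion X_{n+1} = F X_n + G E_n.
N = 100_000
X = np.zeros((N, 2 * L))
for n in range(N - 1):
    X[n + 1] = F @ X[n] + G @ (C @ rng.standard_normal(L))

# M step of Eq. (15), with E[V_n V_n^t | Y_N] replaced by V_n V_n^t from
# the known states: Q_hat = (1/(N-1)) sum_n G^t V_n V_n^t G.
V = X[1:] - X[:-1] @ F.T   # V_n = X_n - F X_{n-1} = G E_{n-1}
Q_hat = G.T @ (V.T @ V / (N - 1)) @ G
```

Since $G^tG=I_L$, the product $G^t\vec{V}_n$ extracts the excitation vector itself, which is why (15) reduces to a sample covariance here; in the real algorithm the smoother supplies these moments from ${\cal Y}_N$ instead of the unobservable states.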

3.5 Remarks on the algorithm


Copyright The European Southern Observatory (ESO)