2 Combined smoothing

In general, two time series of observations are available - one with measured function values of a certain quantity whose analytic expression is unknown (Series 1) and the other with measured time derivatives of the same quantity (Series 2). Both series are given at unequally spaced epochs that need not be necessarily identical, and the individual observations are given with different precision, defined by their formal uncertainties (that can be easily converted into weights). We further assume that both series are independent one from the other, and also that the individual observations of the same series are de-correlated. The situation is illustrated in Fig. 1, where the upper plot shows the observed function values and their uncertainties (Series 1), the lower plot depicts Series 2 with the observed values of the first derivatives with their uncertainties.

$\begin{figure} \par\includegraphics[width=8.8cm,clip]{jv9736f1.eps} \end{figure}$

Figure 1: Illustration of combined smoothing. Full circles and vertical lines represent the observations and their uncertainties, full lines the smoothed curves (top - function values, bottom - first derivatives)

Full lines in both plots represent the smoothed values; the lower one is the first derivative of the upper one.

2.1 Theoretical basis

We are looking for a unique time series defined at all points with observations (no matter whether belonging to Series 1 or 2), lying on a sufficiently smooth curve, whose function values fit well to the values of Series 1 and whose time derivative fits well to the values of Series 2. Let

input Series 1 be given as the observed function values y_j'at instants x_j with weights p_j, $j=1,~2,\ldots n$ ;
input Series 2 as the observed values of first derivatives $\bar{y}_k'$ at instants $\bar{x}_k$ with weights $\bar{p}_k$ , $k=1,~2,\ldots \bar n$ ;
output series of the searched function values on the smoothed curve be y_i at the instants x_i containing all values of x_j and $\bar{x}_k$ , $i=1,~2, \ldots N$ , where $N\le n+\bar n$ .

In further considerations we shall assume that both input series are defined at all N points x_i, and that the respective weights at the points with no observations are equal to zero.

We define three quantities:

1.

Smoothness of the curve (identical with the definition of the original smoothing):

$\begin{displaymath}S = \frac{1}{x_N-x_1} \int_{x_1}^{x_N} \varphi^{\prime \prime \prime 2}(x) {\rm d}x\,. \end{displaymath}$

Analytical expression of the function $\varphi(x)$ is unknown so the value of its third derivative $\varphi'''(x)$ must be estimated numerically from the smoothed data y. The smoothed curve in the interval between two points [x_i+1; y_i+1] and [x_i+2; y_i+2] is defined as a third-order Lagrange polynomial L_i(x) running through the four adjacent points i, i+1, i+2, i+3, i.e.

$\begin{eqnarray*}L_i(x) &=& \sum^3_{k=0}{\left( \prod^3_{\stackrel{\scriptstyl... ...}} \frac{(x-x_{i+j})}{(x_{i+k}-x_{i+j})} \right)y_{i+k} }. \\ \end{eqnarray*}$

Its third derivative is then equal to

$\begin{eqnarray*}L'''_i(x) &=& \sum^3_{k=0}{\left(6 \prod^3_{\stackrel{\script... ...}{(j\ne k)}} \frac{1}{(x_{i+k}-x_{i+j})} \right)y_{i+k} }. \\ \end{eqnarray*}$

The third derivatives between each pair of points are given as constants, thus the integral is replaced by a summation:

S	=	$\displaystyle \frac{1}{x_N - x_1}\sum_{i=1}^{N-3}\int_{x_{i+1}}^ {x_{i+2}}{L'''_i}^2(x){\rm d}x$	(1)
	=	$\displaystyle \sum_{i=1}^{N-3}(a_iy_i+b_iy_{i+1}+c_iy_{i+2}+d_iy_{i+3})^2$

where

$\begin{eqnarray*}a_i &=& \frac{6\sqrt{(x_{i+2}-x_{i+1})/(x_N - x_1)}}{(x_{i }-x_... ... x_1)}}{(x_{i+3}-x_{i })(x_{i+3}-x_{i+1})(x_{i+3}-x_{i+2})}\cdot \end{eqnarray*}$

This definition of the smoothness implies that the ideally smooth function (i.e., the one for which S=0) is a quadratic function (and, consequently, its first derivative a linear function) of time. Therefore, the combined estimator proposed in the present paper is different from the original smoothing applied directly to the observed derivatives alone;

2.

Fidelity of the smoothed curve to the observed values (almost identical with the definition of the original smoothing):

$\begin{displaymath}% F = \frac{1}{n} \sum_{i=1}^N p_i(y_i'- y_i)^2; \end{displaymath}$

(2)

3.

Fidelity of the smoothed curve to the observed first derivatives (new):

$\begin{displaymath}% \bar{F} = \frac{1}{\bar n} \sum_{i=1}^N \bar{p}_i (\bar{y}_i'-\bar{y}_i)^2\,, \end{displaymath}$

(3)

in which the smoothed values of first derivatives $\bar{y}_i$ can be expressed in terms of the smoothed function values y_i.

Using the first derivatives of the same Lagrange polynomials L_i(x) that are used to define S we can express L'_i(x), for x lying anywhere in the interval limited by the four adjacent points i, i+1, i+2, i+3, as

$\begin{displaymath}L'_i(x)=A_i(x) y_i+B_i(x)y_{i+1}+C_i(x) y_{i+2}+D_i(x) y_{i+3} \,, \end{displaymath}$

in which

$\begin{eqnarray*}A_i(x)&=& \frac{\sum^2_{\stackrel{\scriptstyle l=0}{(l\ne 0)}}... ..._{i+m})} {(x_{i+3}-x_i)(x_{i+3}-x_{i+1})(x_{i+3}-x_{i+2})}\cdot \end{eqnarray*}$

To express the smoothed values of first derivatives $\bar{y}_i$ in Eq. (3) in terms of the smoothed function values y_i, we have a certain freedom of choice of the four points surrounding each of the epochs x_i. We adopt, for the sake of time symmetry, the approach leading to exactly the same solution if the time axis is reversed; the following equations formulate the constraints assuring that the values $\bar{y}_i$ lie on the curve that is the time derivative of the one represented by y_i:

1.

For calculating $\bar y_1$ , we use the epochs x₁, x₂, x₃, x₄; it means that we have

$\displaystyle % \bar a_1$	=	$\displaystyle A_1(x_1),~\bar b_1=B_1(x_1),$
$\displaystyle \bar c_1$	=	$\displaystyle C_1(x_1),~\bar d_1=D_1(x_1),$
$\displaystyle \bar y_1$	=	$\displaystyle \bar a_1 y_1+\bar b_1 y_2+\bar c_1 y_3+\bar d_1 y_4;$	(4)

2.

For calculating $\bar{y}_i$ in the first half of the series, i.e. for $i= 2,~3,\ldots N/2$ , we use the epochs x_i-1, x_i, x_i+1, x_i+2; it means that we have

$\displaystyle % \bar a_i$	=	$\displaystyle A_{i-1}(x_i),~\bar b_i=B_{i-1}(x_i),$
$\displaystyle \bar c_i$	=	$\displaystyle C_{i-1}(x_i),~\bar d_i=D_{i-1}(x_i),$
$\displaystyle \bar y_i$	=	$\displaystyle \bar a_i y_{i-1}+\bar b_i y_i+\bar c_i y_{i+1}+ \bar d_i y_{i+2};$	(5)

3.

For calculating $\bar{y}_i$ in the second half of the series, i.e. for $i=N/2+1,~N/2+2,\ldots N-1$ , we use the epochs x_i-2, x_i-1, x_i, x_i+1; it means that we have

$\displaystyle % \bar a_i$	=	$\displaystyle A_{i-2}(x_i),~\bar b_i=B_{i-2}(x_i),$
$\displaystyle \bar c_i$	=	$\displaystyle C_{i-2}(x_i),~\bar d_i=D_{i-2}(x_i),$
$\displaystyle \bar y_i$	=	$\displaystyle \bar a_i y_{i-2}+\bar b_i y_{i-1}+\bar c_i y_i+ \bar d_i y_{i+1};$	(6)

4.

For calculating $\bar y_N$ , we use the epochs x_N-3, x_N-2, x_N-1, x_N; it means that we have

$\displaystyle % \bar a_N$	=	$\displaystyle A_{N-3}(x_N),~\bar b_N=B_{N-3}(x_N),$
$\displaystyle \bar c_N$	=	$\displaystyle C_{N-3}(x_N),~\bar d_N=D_{N-3}(x_N),$
$\displaystyle \bar y_N$	=	$\displaystyle \bar a_N y_{N-3}+\bar b_N y_{N-2}+\bar c_N y_{N-1}+ \bar d_N y_N.$	(7)

The expression (3) then can be rewritten as

$\textstyle \bar F$	$\displaystyle = \frac{1}{\bar n} \left[\bar p_1\left( \bar y_1'-\bar a_1 y_1 -\bar b_1 y_2-\bar c_1 y_3-\bar d_1 y_4\right)^2\right.$	(8)
+	$\displaystyle \sum_{i=2}^{N/2} \bar p_i\left(\bar y_i'-\bar a_i y_{i-1} -\bar b_i y_i-\bar c_i y_{i+1}-\bar d_i y_{i+2}\right)^2$
+	$\displaystyle \sum_{i=N/2+1}^{N-1} \bar p_i\left(\bar y_i'-\bar a_i y_{i-2} -\bar b_i y_{i-1}-\bar c_i y_i-\bar d_i y_{i+1}\right)^2$
+	$\displaystyle \left. \bar p_N\left(\bar y_N'-\bar a_N y_{N-3} -\bar b_N y_{N-2}-\bar c_N y_{N-1}-\bar d_N y_N\right)^2 \right].$

We are looking for the smoothed values y_i (and the first derivatives $\bar{y}_i$ corresponding to them) as a compromise among three different conditions, each one applied separately leading to a different solution:

1.: The curve should be smooth (i.e., minimizing S);
2.: The smoothed values should be close to the observed values of the function (i.e., minimizing F);
3.: The first derivatives of the smoothed curve should be close to the observed values of first derivatives (i.e., minimizing $\bar{F}$ ).

The adjustment is then done by minimizing a combination of the constraints above, i.e. the expression

$\displaystyle % Q = S + \varepsilon F + \bar{\varepsilon} \bar{F}$	=	$\displaystyle {\rm min}$	(9)
$\displaystyle \Rightarrow \frac{\partial Q}{\partial y_i}$	=	$\displaystyle 0 ~~{\rm for} ~i=1, ~2, \ldots N\,,$

in which $\varepsilon \ge 0$ , $\bar\varepsilon \ge 0$ , obviously leading to the system of N linear equations with unknowns y_i. The degree of compromise among the three conditions is achieved by choosing the values of two parameters: the coefficients of smoothing $\varepsilon$ and $\bar{\varepsilon}$ that have respectively the dimensions [time^-6] and [time^-4]. The larger are the values, the larger weight we assign to the fidelity to the observed function values or their first derivatives, and the closer the "smoothed'' values are to the observations. To form these equations, we need to express the partial derivatives of S, F and $\bar{F}$ with respect to y_i, in terms of the unknowns y_i and $\bar{y}_i$ .

The partial derivative of S with respect to y_i can be simply expressed as

$\begin{displaymath}\frac{\partial{S}}{\partial{y_i}} = 2(a_i\Delta_i + b_{i-1}\Delta_{i-1} + c_{i-2}\Delta_{i-2} + d_{i-3}\Delta_{i-3})\,, \end{displaymath}$

in which

$\begin{displaymath}\Delta_i = a_iy_i + b_iy_{i+1} + c_iy_{i+2} + d_iy_{i+3} \end{displaymath}$

and

$\begin{displaymath}a_i=b_i=c_i=d_i=0\quad {\rm for}\quad i\le 0~{\rm or}~i\ge N-2\,. \end{displaymath}$

The partial derivative of function F is even simpler:

$\begin{displaymath}\frac{\partial{F}}{\partial{y_i}} = \frac{2p_i(y_i - y'_i)}{n}\,. \end{displaymath}$

More complicated is to express the partial derivatives of $\bar{F}$ ; they are different for different i, due to different choice of $\bar{F}$ in Eq. (9). For sake of shortness, we are using $\bar{y}_i$ in the subsequent equations instead of y_i; their relations are defined by Eqs. (4-7) above. The first four partial derivatives are

$\begin{eqnarray*}\frac{\partial{\bar F}}{\partial{y_1}}=\frac{2}{\bar n} \left[... ...r y_1-\bar y_1') +\bar p_2\bar a_2(\bar y_2-\bar y_2') \right], \end{eqnarray*}$

$\begin{eqnarray*}\frac{\partial{\bar F}}{\partial{y_2}}=\frac{2}{\bar n} \left[... ...ar y_2-\bar y_2')+ \bar p_3\bar a_3(\bar y_3-\bar y_3')\right], \end{eqnarray*}$

$\begin{eqnarray*}\frac{\partial{\bar F}}{\partial{y_3}}&=&\frac{2}{\bar n} \lef... ...right.\\ &+&\left.\bar p_4\bar a_4(\bar y_4-\bar y_4')\right], \end{eqnarray*}$

$\begin{eqnarray*}\frac{\partial{\bar F}}{\partial{y_4}}&=&\frac{2}{\bar n} \lef... ...ar y_4-\bar y_4')+ \bar p_5\bar a_5(\bar y_5-\bar y_5')\right]. \end{eqnarray*}$

Then follows a group in the first half of the series, for $i=5,~6, \ldots~N/2-2$

$\begin{eqnarray*}\frac{\partial{\bar F}}{\partial{y_i}}&=&\frac{2}{\bar n} \lef... ...eft.\bar p_{i+1}\bar a_{i+1}(\bar y_{i+1}-\bar y_{i+1}')\right], \end{eqnarray*}$

followed by four derivatives around the center of the series

$\begin{eqnarray*}\lefteqn{\frac{\partial{\bar F}}{\partial{y_{\frac{N}{2}-1}}}=\... ...{2}+1}(\bar y_{\frac{N}{2}+1} -\bar y_{\frac{N}{2}+1}')\right], \end{eqnarray*}$

$\begin{eqnarray*}\lefteqn{\frac{\partial{\bar F}}{\partial{y_{\frac{N}{2}}}}=\fr... ...{2}+2}(\bar y_{\frac{N}{2}+2} -\bar y_{\frac{N}{2}+2}')\right], \end{eqnarray*}$

$\begin{eqnarray*}\lefteqn{\frac{\partial{\bar F}}{\partial{y_{\frac{N}{2}+1}}}=\... ...{2}+3}(\bar y_{\frac{N}{2}+3} -\bar y_{\frac{N}{2}+3}')\right], \end{eqnarray*}$

$\begin{eqnarray*}\lefteqn{\frac{\partial{\bar F}}{\partial{y_{\frac{N}{2}+2}}}=\... ...{2}+4}(\bar y_{\frac{N}{2}+4} -\bar y_{\frac{N}{2}+4}')\right]. \end{eqnarray*}$

In the second half of the series, i.e. for i=N/2+3, N/2+4, $\ldots~N-4$ , we have

$\begin{eqnarray*}\lefteqn{\frac{\partial{\bar F}}{\partial{y_i}}= \frac{2}{\bar... ...)+ \bar p_{i+2}\bar a_{i+2}(\bar y_{i+2}-\bar y_{i+2}')\right], \end{eqnarray*}$

and the last four partial derivatives read

$\begin{eqnarray*}\lefteqn{\frac{\partial{\bar F}}{\partial{y_{N-3}}}= \frac{2}{... ...1}-\bar y_{N-1}')+ \bar p_N\bar a_N(\bar y_N-\bar y_N')\right], \end{eqnarray*}$

$\begin{eqnarray*}\lefteqn{\frac{\partial{\bar F}}{\partial{y_{N-2}}}=\frac{2}{\b... ...1}-\bar y_{N-1}')+ \bar p_N\bar b_N(\bar y_N-\bar y_N')\right], \end{eqnarray*}$

$\begin{eqnarray*}\lefteqn{\frac{\partial{\bar F}}{\partial{y_{N-1}}}=\frac{2}{\b... ...1}-\bar y_{N-1}') +\bar p_N\bar c_N(\bar y_N-\bar y_N')\right], \end{eqnarray*}$

$\begin{eqnarray*}\lefteqn{\frac{\partial{\bar F}}{\partial{y_N}}=\frac{2}{\bar n... ...ight.}\\ &&+\left.\bar p_N\bar d_N(\bar y_N-\bar y_N')\right]. \end{eqnarray*}$

2.2 Algorithm to compose the equations

From the preceding subsection it follows that the matrix of the system of linear Eqs. (10) is sparse, with only seven diagonals centered around main diagonal containing non-zero elements. The matrix is symmetric and singular for $\varepsilon=0$ (the sum of elements in the i-th row is equal to $\varepsilon p_i/n$ ). To calculate the elements we use the algorithm described below.

1.

We successively put into each i-th element on the main diagonal of the matrix the values $\varepsilon p_i/n$ , and we put into the right-hand-sides of the equations the values ${\varepsilon p_i y_i'}/{n}$ (this part corresponds to constraint F);

2.

For $i=1,~2, \ldots N-3$ we add successively to each i-th $4\times 4$ submatrix centered around main diagonal the matrix

$\begin{displaymath}\left ( \begin{array}{cccc} a_i^2 & a_i b_i & a_i c_i & a_i ... ..._i\\ d_i a_i & d_i b_i & d_i c_i & d_i^2 \end{array} \right)\end{displaymath}$

(this part corresponds to constraint S).

3.

For $i=1,~2, \ldots N$ we successively calculate the matrix

$\begin{displaymath}\frac{\bar \varepsilon \bar p_i}{\bar n} \left (\begin{array... ... d_i\bar b_i &\bar d_i\bar c_i &\bar d_i^2 \end{array} \right)\end{displaymath}$

and the column vector

$\begin{displaymath}\frac{\bar \varepsilon\bar p_i}{\bar n} \left( \begin{array}... ...bar b_i \bar y_i'\\ \bar d_i \bar y_i' \end{array} \right) \end{displaymath}$

from $\bar a_i$ , $\bar b_i$ , $\bar c_i$ , $\bar d_i$ given by one of the Eqs. (4), (5), (6) or (7), depending on i.

If i=1, we add them respectively to the first $4\times 4$ submatrix centered around the main diagonal (i.e., to the one with indices 1, 2, 3 and 4) and to the four corresponding right-hand-sides.

If $i=2,~3,~\ldots N/2$ , we add them respectively to the $4\times 4$ submatrices centered around the main diagonal with indices i-1, i, i+1 and i+2 and to the four corresponding right-hand-sides.

If $i=N/2+1,~N/2+2,~\ldots N-1$ , we add them respectively to the $4\times 4$ submatrices centered around the main diagonal with indices i-2, i-1, i and i+1 and to the four corresponding right-hand-sides.

Finally, if i=N, we add them to the last $4\times 4$ submatrix centered around the main diagonal (i.e., to the one with indices N-3, N-2, N-1 and N) and to the four corresponding right-hand-sides.

(this part corresponds to constraint $\bar{F}$ ).

If we wish to normalize the weights in both input series (i.e., if we require that the average weights in Series 1 and Series 2 are identically equal to unity), we simply replace $n, \bar n$ in the expressions above by $\sum_1^N p_i$ , $\sum_1^N \bar p_i$ , respectively. The software that we use in all subsequent tests and examples follows namely this approach.

2.3 Numerical solution

By solving the system of linear equations formed above we arrive at the smoothed function values y_i, referred to the instants x_i; to obtain the smoothed first derivative values $\bar{y}_i$ , the constraints (4), (5), (6) or (7) are then used.

To solve the system, we can use a standard Gaussian elimination method, as proposed in original smoothing (Vondrák 1969). This is a very simple and straightforward procedure in which we can easily make use of the important properties of the matrix; it is symmetric and sparse.

However, as we shall prove, the system is positive definite and we know that the symmetric factorization exists and moreover is stable to compute. Apart from the unknowns we sometimes need to estimate the uncertainties of the smoothed values or their functions. Therefore we follow a similar procedure as proposed by us for the case of observed values without first derivatives (Cepek & Vondrák 2000) enabling to calculate easily not only the unknowns but also their covariances.

In matrix notation we can rewrite the system of Eqs. (10) simply as

$\begin{displaymath}% \vec{A}\vec{y}=\vec{B}\left( \begin{array}{c} \vec{y'}\\ ... ...}}' \end{array}\right), \qquad \vec{B}=(\vec{B_1}\vec{B_2}), \end{displaymath}$

(10)

where submatrices $\vec{B_1}, \vec{B_2}$ have the form

$\begin{displaymath}\vec{B_1}=\frac{\varepsilon}{n} \mathrm{diag}\{p_1,\ldots,p_n\} \end{displaymath}$

and

$\begin{displaymath}\vec{B_2}=\frac{\bar\varepsilon}{\bar n}\left( \begin{array}... ..._n\bar c_n \\ 0 & &\bar p_n\bar d_n \\ \end{array}\right). \end{displaymath}$

Matrix A is a symmetric band matrix. The bandwidth of A is equal to only three

$\begin{displaymath}\beta(\vec{A})=\max \{\;\vert i-j\vert\ \vert\ a_{ij} \ne 0\;\}=3. \end{displaymath}$

We store only the symmetric part of the matrix A in N by 3+1 vectors.

Following the algorithm to compose the equations given in the Sect. 2.2 we can express the matrix as the summation

$\begin{displaymath}\vec A = \left(\begin{array}{ccc} \varepsilon p_1/n & ~ & 0 ... ...\\ ~ & \vec D_i & ~ \\ 0 & ~ &\ddots \end{array} \right), \end{displaymath}$

where the structure of coefficients in all diagonal blocks is

$\begin{displaymath}\vec D_i = \left(\begin{array}{cccc} aa & ab & ac & ad \\ ... ... \left(\begin{array}{cccc} a & b & c & d \end{array}\right). \end{displaymath}$

All diagonal elements $\varepsilon p_i/n$ are nonnegative and all diagonal blocks $\vec D_i$ are clearly positive semidefinite. Resulting matrix $\vec A$ is thus positive semidefinite as well.

Supposing that the matrix $\vec A$ is regular, it follows that the semidefinite symmetric matrix $\vec A$ is positive definite and we can use Cholesky factorization to solve the system (10).

We use the variant of Cholesky factorization

$\begin{displaymath}\vec{A} = \vec{U}^{\rm T}\vec{D}\vec{U} \end{displaymath}$

where D is diagonal matrix and U is upper triangular matrix with unit diagonal. The bandwidth $\beta(\vec{U})=3$ is the same as the bandwidth of A. With the unique Cholesky factorization of matrix A the solution of system of Eqs. (10) is obtained simply by solving two band triangular systems (forward and backward substitution)

$\begin{displaymath}\vec{U}^{\rm T}\vec{w} = \vec{B}\vec{y}', \qquad \vec{U}\vec{x} = \vec{D}^{-1}\vec{w}. \end{displaymath}$

We can estimate the standard deviation of smoothed values from (2) as

$\begin{displaymath}\sigma_0=\sqrt{\frac{\sum_{i=1}^N p_i(y_i-y'_i)^2}{n}}=\sqrt F. \end{displaymath}$

Smoothed values are linear combinations of observed values

$\begin{displaymath}\vec{y}=\vec{A}^{-1}\vec z=\vec{A}^{-1}\vec{B} \left(\begin{array}{c} \vec y' \\ \bar{\vec y}' \end{array}\right). \end{displaymath}$

Its covariance matrix $\vec{\Sigma}_{yy}$ is, according to the general principle of propagation of variances and covariances of linear functions

$\begin{displaymath}\vec\Sigma_{yy}\!\! =\!\! \vec A^{-1}\vec\Sigma_{zz}\vec A^{-... ...a_{\bar{y}'\bar{y}'} \end{array}\right)\vec B^{T}\vec A^{-1}. \end{displaymath}$

(11)

In the special case of smoothing observations without measured time derivatives, vector $\bar{\vec y}'$ and submatrix $\vec B_2$ are missing in the Eq. (10), inverse of $\vec{\Sigma}_{yy}$ is a band matrix and we can compute only selected subset band of $\vec{\Sigma}_{yy}$ as described in (Cepek & Vondrák 2000). With measured first derivatives we have to invert the full matrix $\vec A$ to express covariances of smoothed values $\vec y$ , but inverting symmetric positive matrix is an easy task and we can exploit the fact that variance-covariance matrix $\vec\Sigma_{zz}$ is a band matrix with bandwidth three.

2.4 Some properties of the filter, choice of coefficients of smoothing

As already stated above, the matrix of the system is singular for $\varepsilon=0$ (this is equivalent to a case when we have observations of first derivatives only); in such a case, there exists no unique solution of the problem unless we impose an additional constraint. This can be done, e.g., by requiring that the smoothed curve runs through one point, either observed or arbitrarily chosen. In this case, we can simply put 1 into the main diagonal of any row of the matrix (see step 1 of Sect. 2.2) and the chosen function value to the corresponding right-hand-side. From this also follows that we always need at least one observation of function value with non-zero weight (the observations of first derivatives alone are not sufficient).

On the other hand, there arises no problem at all if $\bar\varepsilon=0$ and $\varepsilon \ne 0$ (equivalent to a case with no observations of first derivatives); in this case, the problem reduces to original smoothing, and the observed first derivatives are simply ignored.

A really serious problem arises only if both $\varepsilon=0$ and $\bar\varepsilon=0$ . Then we have a singular matrix with the deficiency equal to three, and any quadratic parabola satisfies the system of Eqs. (10). Then we have a freedom of choosing three additional constraints (not linearly dependent); e.g., we can choose any three values (at different epochs) on the curve, we can do a quadratic regression to the observed function values, a combined regression to fit the parabola both to observed function and first derivative values, etc.

There is a close relation between the coefficients of smoothing and the transfer function T of the filter (i.e., the ratio between the amplitude of the smoothed curve and the observed amplitude of a periodic function with frequency f). The transfer function for the function values (identical with the original smoothing) was analytically expressed by Huang & Zhou (1981,1982), assuming the weights p_i equal to unity, as

$\begin{displaymath}% T=\frac{1}{1+\varepsilon^{-1}(2\pi f)^6}\cdot \end{displaymath}$

(12)

Similarly, we can express the transfer function $\bar{T}$ for the observed first derivatives in terms of $\bar{\varepsilon}$ and frequency f:

$\begin{displaymath}% \bar{T}=\frac{1}{1+\bar{\varepsilon}^{-1}(2\pi f)^4}\cdot \end{displaymath}$

(13)

Both transfer functions are graphically displayed in Figs. 2 and 3, where they are plotted against dimensionless arguments $\varepsilon P^6$ and $\varepsilon P^4$ (P=1/f is the period) in logarithmic scale.

$\begin{figure} \par\hspace*{0mm} \includegraphics[width=8.8cm,clip]{jv9736f2.eps} \end{figure}$

Figure 2: Transfer function T for the measured function values, plotted as function of $\varepsilon P^6$

$\begin{figure} \par\hspace*{0mm} \includegraphics[width=8.8cm,clip]{jv9736f3.eps} \end{figure}$

Figure 3: Transfer function $\bar{T}$ for the measured first derivatives, plotted as function of $\bar\varepsilon P^4$

$\begin{figure} \par\includegraphics[width=12cm,clip]{jv9736f4.eps} \end{figure}$

Figure 4: Combined smoothing of simulated data, consisting of three harmonic oscillations plus random noise. Grey triangles in subplots a) (function values y) and c) (values of first derivative $\bar y$ ) represent the input data, full, dashed and dotted lines in the same subplots depict the smoothed curves with stronger and stronger smoothing. Differences between the weakest smoothing (full lines in subplots a, c)) and original signal are displayed as crosses in subplots b) and d), in highly enlarged scale

$\begin{figure} \par\includegraphics[width=12cm,clip]{jv9736f5.eps} \end{figure}$

Figure 5: Combined smoothing of UT1R-TAI measured by VLBI and lodR measured by GPS, using $\varepsilon$ = 25 10² day^-6, $\bar{\varepsilon}$ = 4 10⁴ day^-4 (leading to dispersions equal to formal standard errors). Smoothed curves are shown as full lines and residuals (enlarged scale on the right) as grey crosses. Reduced values of UT1R-TAI+27.5+0.00183(MJD-48987) are displayed in top, lodR in bottom

It is necessary to say that these transfer functions are strictly valid only in cases when either only function values, or only first derivatives (plus one arbitrarily chosen function value, as mentioned above) are observed. When both types of observations are mixed in one solution, a transfer function lying somewhere between the two is reflected in the smoothed series, depending on the ratio of the numbers of both types of observation.

In principle, there are two possibilities of choosing the coefficients of smoothing $\varepsilon$ and $\bar{\varepsilon}$ , both requiring at least an approximate a priori knowledge of the observed process; we should have a realistic estimation of either the precision of the measurement, or the shortest period contained in the signal:

1.

Knowing a priori average values of standard errors in Series 1 and 2, we try to find such coefficients of smoothing that would yield the same estimated a posteriori standard errors from the dispersion of observations around the smoothed curve (these can be easily obtained as $\sqrt F,~\sqrt{\bar{F}}$ ). This approach requires the solution in several steps, by iterations. It often appears that the satisfactory solution is impossible to find, especially if the a priori standard errors are unrealistically small and both series of observed values and first derivatives are not mutually compatible on this level of accuracy (see next sections for tests with simulated data, practical examples and more detailed discussion). Then higher values than a priori standard errors must be used, or the following possibility of choosing coefficients of smoothing should be taken;

2.

Knowing in advance which frequencies we aim to suppress in both observed series, we use the transfer functions of Eqs. (12), (13) to calculate the corresponding values of $\varepsilon$ and $\bar{\varepsilon}$ , and use the method as a low pass filter. Here we can apply a very simple rule; denoting the period for which half of the amplitude is suppressed as P_0.5, i.e. corresponding to $T=\bar T = 0.5$ (shorter periods are suppressed more, longer periods less), we calculate both coefficients of smoothing as

$\begin{displaymath}% \varepsilon (P_{0.5})=\left(\frac{2\pi}{P_{0.5}}\right)^6, ... ...\bar\varepsilon (P_{0.5})=\left(\frac{2\pi}{P_{0.5}}\right)^4. \end{displaymath}$

(14)

Alternatively, slightly different formulas hold for calculating the coefficients of smoothing if we wish to pass 99% of the amplitude of a periodic process with period P_0.99(corresponding to $T=\bar T = 0.99$ ):

$\displaystyle % \varepsilon (P_{0.99})=99\left(\frac{2\pi}{P_{0.99}}\right)^6, \bar\varepsilon (P_{0.99})=99\left(\frac{2\pi}{P_{0.99}}\right)^4.$

This approach is especially suitable in cases when we require more heavily smoothed series (with only low frequencies retained) for special analyses, or if we apply the method twice, to retain only the frequencies in a specific frequency range as proposed for original smoothing in (Vondrák 1977).

2.5 Testing the method with simulated data

Before using the method with real observations, it is reasonable to test it with simulated data, with the expected results to be known beforehand. We selected simulated data series that are very similar to what can be encountered in reality. Namely we used the signal resembling UT1 series, consisting of three periodic terms - annual, semi-annual and fortnightly - of the form (in time seconds):

$\begin{displaymath}\begin{array}{ll} y=&-0.020\sin 2\pi t + 0.012\cos 2\pi t + ... ...pi t \\ &-0.007\cos 4\pi t + 0.0008\sin 52\pi t, \end{array}\end{displaymath}$

in which t=(x-50448)/365.2422 is the argument expressed in years. Its first derivative with respect to time argument x is then

$\begin{displaymath}\begin{array}{ll} \bar y=&\pi(-0.040\cos 2\pi t - 0.024\sin ... ...&+0.028\sin 4\pi t + 0.0416\cos 52\pi t)/365.2422. \end{array}\end{displaymath}$

We generated the series y, $\bar y$ at one-day spacing, covering the interval 400 days long, and added to both of them Gaussian random noise with rms dispersion equal to $10~\mu {\rm s}$ and $10~\mu {\rm s}$ /day, respectively. Both series (in which all weights were put equal to 1) were then subject to combined smoothing described above, using numerous pairs of coefficients of smoothing $\varepsilon$ , $\bar{\varepsilon}$ . The tests led to thefollowing findings:

In order to get the best approximation of the signal, it is recommendable to choose the values $\varepsilon (P_{0.5})$ , $\bar\varepsilon (P_{0.5})$ as given by Eqs. (14) for P_0.5 lying approximately between one third and one half of the shortest known period contained in the signal. Alternatively, it is also possible to calculate $\varepsilon (P_{0.99})$ , $\bar\varepsilon (P_{0.99})$ from Eqs. (15), in which we put P_0.99 equal to the shortest known period of the signal. Any values of $\varepsilon$ , $\bar{\varepsilon}$ lying in the vicinity
of these yield approximately the same acceptable results;
Substantially larger values of $\varepsilon$ , $\bar{\varepsilon}$ than given above lead to more "ragged'' curves in which the noise of the observations is less suppressed;
Using substantially smaller values of $\varepsilon$ , $\bar{\varepsilon}$ leads to smoother curves in which the short-periodic part of the signal is suppressed;
Using substantially smaller value of $\bar{\varepsilon}$ (with $\varepsilon$ kept according to the rule above) does not do much harm; it only applies lower weights to all observed first derivatives (extreme case $\bar\varepsilon=0$ neglects these observations completely). This variant can be used if we are sure that the quality of these observations is not very good;
Using substantially larger value of $\bar{\varepsilon}$ (keeping $\varepsilon$ as calculated above) leads generally to unsatisfactory results, especially if the epochs of both series do not coincide (see the first example with real data in Sect. 3 below).

An illustration of combining the simulated data with different values of $\varepsilon$ , $\bar{\varepsilon}$ is given in Fig. 4. The simulated observations of y (plot a) and $\bar y$ (plot c) are depicted as grey triangles. Three different sets of $\varepsilon$ , $\bar{\varepsilon}$ are used:

1.: $\varepsilon=\bar\varepsilon = 0$ , leading to a quadratic function of y, and a straight line of $\bar y$ , both being calculated as best fit to both data series. These results are displayed as dotted lines;
2.: $\varepsilon =3.5\,\, 10^{-7}$ day^-6, $\bar\varepsilon = 4.9\,\, 10^{-5}$ day^-4 (corresponding to P_0.5=75days), given as dashed lines. This estimator almost exactly follows the annual and semi-annual term and completely suppresses the fortnightly term;
3.: $\varepsilon =3.9$ day^-6, $\bar\varepsilon = 2.5$ day^-4 (corresponding to P_0.5=5 days), represented by full lines. It almost perfectly reproduces the complete signal. The differences of this smoothed curve from either "observed'' values or the signal are so small that they cannot be seen in the given scale. Therefore, the differences between the smoothed curves and the original signal are displayed in the same graph, highly enlarged (1000 and 100 times in plots b and d, respectively). The rms difference between the smoothed curves of y and $\bar y$ and the original signal is respectively $5.9~\mu {\rm s}$ and $4.3~\mu {\rm s}$ /day, which are equal approximately to only one half of the noise in the input data.

We also made a number of tests with the same simulated data in which the function values y, sampled only once per 7 days, were combined with $\bar y$ values at daily intervals. We also made tests with the epochs of y shifted with respect to $\bar y$ by small fractions of the day (0.1d and 0.01d). The results were almost equivalent to the previous ones, and we came to conclusion that the same rules of choosing the coefficients $\varepsilon$ , $\bar{\varepsilon}$ as described above can be applied even in these cases.

Up: Combined smoothing method and