Let $x_k$ be the values of the signal obtained at times $t_k$, $k=1\ldots N$. In the "local" (or "running") approximations it is usually assumed that the data $x_k$ are fitted by a function $x_c(t;\ t_0, \Delta t)$, which depends not only on the moment $t$, but also on the limits of the fitting interval. Examples of such function fitting of a test signal by various methods are shown in Fig. 1. However, the resulting function is expected to depend on only one argument. Thus it is generally chosen so that the smoothing ("computed") value at the moment $t_0$ is equal to the value of the smoothing function at $t=t_0$:

$x_c(t_0) = x_c(t_0;\ t_0, \Delta t)$   (1)

(cf. Whittaker & Robinson 1928). In this case $\Delta t$ remains a free parameter which determines the statistical and spectral properties of the function $x_c(t_0)$ for a fixed set of data $x_k$. Here we assume that the data are renumbered according to the trial argument interval from $t_0-\Delta t$ to $t_0+\Delta t$. Obviously, such numbering depends both on the "mean argument" $t_0$ and on the "filter half-width" $\Delta t$.
Figure 1: Approximations $x_c(t)$ of the model discrete "signal" $x_k = t_k^3 + t_k^2$, defined at times $t_k = k$, by using the 4 tested fits for $t_0 = 10$ and two values of $\Delta t$ ("wm", "wp" and "um", "up", respectively). Such a difference in $\Delta t$ leads to an equal number $n = 19$ of observations with non-zero weights. The smoothing value of $x_c$ at $t = t_0$ corresponds to the adopted value of $\Delta t$.
In the most common case of linear fits, the function $x_c(t)$ may be expressed as

$x_c(t) = \sum_{\alpha=1}^{m+1} C_\alpha\, f_\alpha(t),$   (2)

where the coefficients $C_\alpha$ may be determined, e.g., by minimizing the weighted sum of the squares of the residuals

$\Phi = \sum_{k=1}^{n} w_k\, (x_k - x_c(t_k))^2.$   (3)

The weights $w_k$ are generally characteristic of the accuracy of the measurements $x_k$ and are equal to $w_k = p_k\,\sigma_0^2/\sigma_k^2$, where $\sigma_0$ is the "unit weight error", if $p_k = 1$ for the data used for the fit (cf. Whittaker & Robinson 1928). The parameter $\sigma_0$ is a scale coefficient which may be set to an arbitrary positive value. It does not affect the smoothing function and its statistical characteristics. The "additional" weights $p_k$ were used in Paper I to make the smoothing function and its first derivative continuous. The following concrete functions were used:

$p_k = 1$ for $|t_k - t_0| \le \Delta t$, and $p_k = 0$ otherwise   (4)

("unweighted" fits), and

$p_k = (1 - z_k^2)^2$ for $|z_k| \le 1$, and $p_k = 0$ otherwise, where $z_k = (t_k - t_0)/\Delta t$   (5)

("weighted" fits). As the basic functions, we have used the polynomials

$f_\alpha(t) = (t - t_0)^{\alpha-1}, \qquad \alpha = 1 \ldots m+1.$   (6)
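As a minimal sketch of the local fit defined by the weighted sum (3), the weights (4)–(5), and the polynomial basic functions, the procedure can be implemented as follows. This is not the authors' code; the function name, the NumPy dependency, and the defaults are assumptions for illustration:

```python
import numpy as np

def local_fit(t, x, t0, dt, m=2, weighted=True, sigma=None):
    """Local least-squares fit of order m around t0 with filter half-width dt.

    Basic functions f_alpha(t) = (t - t0)**(alpha - 1); additional weights
    p_k = 1 inside the interval ("unweighted") or (1 - z^2)^2 ("weighted").
    Returns the coefficients C_alpha; C[0] is the smoothed value x_c(t0).
    """
    t, x = np.asarray(t, float), np.asarray(x, float)
    z = (t - t0) / dt
    p = np.where(np.abs(z) <= 1.0, (1.0 - z**2)**2 if weighted else 1.0, 0.0)
    # w_k = p_k * sigma0^2 / sigma_k^2; here sigma0 = 1 and sigma_k = 1 by default
    sig2 = np.ones_like(t) if sigma is None else np.asarray(sigma, float)**2
    w = p / sig2
    # Design matrix: columns are the basic functions evaluated at t_k
    F = np.vander(t - t0, m + 1, increasing=True)
    A = F.T @ (w[:, None] * F)       # normal-equation matrix
    B = F.T @ (w * x)                # right-hand side
    return np.linalg.solve(A, B)     # coefficients C_alpha
```

For the model signal of Fig. 1 ($x_k = t_k^3 + t_k^2$) a cubic local fit reproduces the signal exactly, so `local_fit(t, x, 10.0, 5.0, m=3)[0]` returns $10^3 + 10^2 = 1100$ up to rounding.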
The minimum of the function $\Phi$ for the fixed data $x_k$ corresponds to a system of "normal" equations:

$\sum_{\beta} A_{\alpha\beta}\, C_\beta = B_\alpha,$   (7)

where

$A_{\alpha\beta} = \sum_k w_k\, f_\alpha(t_k)\, f_\beta(t_k), \qquad B_\alpha = \sum_k w_k\, f_\alpha(t_k)\, x_k.$   (8)

Introducing a vector of values

$\mathbf{x} = (x_1, \ldots, x_n)^{\rm T},$   (9)

one may write the solution of (7) as

$C_\alpha = \sum_\beta A^{-1}_{\alpha\beta}\, B_\beta.$   (10)

In our designations, $A^{-1}$ is the matrix inverse of $A$, and

$x_c(t_0) = \sum_\alpha C_\alpha(t_0, \Delta t)\, f_\alpha(t_0).$   (11)

This means that the coefficients $C_\alpha$ and the basic functions $f_\alpha$ have "interchanged their places": the $C_\alpha$ are now functions of $t_0$ and $\Delta t$, whereas the values of the basic functions are constant. Introducing a vector $\mathbf{h}[x]$ similar to (9), one may write

$x_c(t_0) = \sum_k h[x,k]\, x_k.$   (12)
This vector is also a function of $t_0$ and $\Delta t$. Each $k$-th component $h[x,k]$ of this vector may be interpreted as the dependence of the calculated value $x_c(t_0)$ smoothing the unit value $x_k = 1$, whereas all other signal values are equal to zero. For $N$ evenly distributed observations $t_i$ ($i = 1 \ldots N$) with a constant step, the function $h$ of 3 variables becomes dependent on 2 variables only. In this case, one may write a convolution-type expression

$x_{ci} = \sum_{k=1}^{n} h[x,k]\; x_{i-k'+k}.$   (13)

Here $k'$ is the number corresponding to $t_{k'} = t_0$ in each interval of the local approximation. This equation is valid for $i = k' \ldots N-n+k'$. For the "borders" ($i = 1 \ldots k'-1$ and $i = N-n+k'+1 \ldots N$) one has to redetermine the vector $\mathbf{h}$. In Paper I we determined values of $h$ for the illustrative 9-point "wp" fits.
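Using the interpretation above, the vector $h[x,k]$ for evenly spaced data can be built directly from the normal equations, and the signal smoothed by the convolution-type expression. A sketch under the same assumptions as before (NumPy; "weighted" additional weights, unit measurement errors; helper names are hypothetical):

```python
import numpy as np

def h_vector(n, dt, m=2):
    """Projective vector h[x,k] for n evenly spaced points t_k = 1..n.

    h[x,k] = w_k * sum_b Ainv[1,b] f_b(t_k), so that x_c(t0) = sum_k h[x,k] x_k,
    with t0 placed at the central point k'. Uses p_k = (1 - z^2)^2, sigma_k = 1.
    """
    t = np.arange(1.0, n + 1.0)
    t0 = t[(n - 1) // 2]                              # central point k'
    z = (t - t0) / dt
    w = np.where(np.abs(z) <= 1.0, (1.0 - z**2)**2, 0.0)
    F = np.vander(t - t0, m + 1, increasing=True)     # f_b(t_k) = (t_k - t0)^(b-1)
    A = F.T @ (w[:, None] * F)
    Ainv = np.linalg.inv(A)
    return w * (F @ Ainv[0])                          # h[x,k]

def smooth_even(x, h):
    """Convolution-type smoothing of an evenly sampled signal.

    Valid away from the borders, where the vector h would have to be
    redetermined; border values are returned as NaN here.
    """
    n = len(h)
    kp = (n - 1) // 2
    xc = np.full(len(x), np.nan)
    for i in range(kp, len(x) - (n - 1 - kp)):
        xc[i] = h @ x[i - kp : i - kp + n]
    return xc
```

Since a polynomial fit of order $m$ reproduces any polynomial signal of that order, the components of $h$ sum to 1 and a quadratic test signal is restored exactly away from the borders.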
In this paper we prefer to express all fit functions $X$ (coefficients, derivatives etc.) in terms of the "projective" vectors $h[X,k]$, because this allows one to estimate the accuracy and possible correlations between the parameters. If $\delta X$ and $\delta Y$ are the deviations of the functions $X$ and $Y$ caused by deviations $\delta x_k$ of the observations, then

$\delta X\, \delta Y = \sum_k \sum_l h[X,k]\; h[Y,l]\; \delta x_k\, \delta x_l.$   (14)

The mathematical expectation of the left side of this equation may be calculated, if the correlation matrix of the deviations $\delta x_k\,\delta x_l$ or its mathematical expectation $\mu_{kl}$ is known. For uncorrelated deviations $\delta x_k$, $\mu_{kl} = \sigma_k^2\, \delta_{kl}$, where $\delta_{kl}$ is the Kronecker symbol, and Eq. (14) may be rewritten as

$\mathbf{E}[\delta X\, \delta Y] = \sum_k h[X,k]\; h[Y,k]\; \sigma_k^2.$   (15)

In the particular case $Y = X$ one obtains the variance of $X$:

$\sigma^2[X] = \sum_k h^2[X,k]\; \sigma_k^2.$   (16)
For the coefficients $C_\alpha$ one may obtain the relation

$\mathbf{E}[\delta C_\alpha\, \delta C_\beta] = \sigma_0^2 \sum_\gamma \sum_\delta A^{-1}_{\alpha\gamma}\, \tilde A_{\gamma\delta}\, A^{-1}_{\delta\beta},$   (17)

where

$\tilde A_{\gamma\delta} = \sum_k p_k\, w_k\, f_\gamma(t_k)\, f_\delta(t_k)$   (18)

and $h[C_\alpha,k] = w_k \sum_\gamma A^{-1}_{\alpha\gamma}\, f_\gamma(t_k)$. One may note that for $p_k = p = {\rm const}$, the matrices satisfy $\tilde A = p\,A$. For the unweighted fits one usually suggests $p = 1$, thus $\mathbf{E}[\delta C_\alpha\, \delta C_\beta] = \sigma_0^2\, A^{-1}_{\alpha\beta}$. This last result is usual for least squares approximations (cf. Whittaker & Robinson 1928; Anderson 1958).
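The error propagation of Eqs. (14)–(15) reduces, for uncorrelated deviations, to summing products of projective-vector components. A minimal sketch (function names are assumptions):

```python
import numpy as np

def covariance_from_h(hX, hY, sigma):
    """E[dX dY] for uncorrelated deviations: sum_k hX_k * hY_k * sigma_k^2."""
    s2 = np.asarray(sigma, float)**2
    return np.sum(np.asarray(hX, float) * np.asarray(hY, float) * s2)

def variance_from_h(hX, sigma):
    """Variance of X, the particular case Y = X."""
    return covariance_from_h(hX, hX, sigma)
```

For instance, the mean of $n$ equal-accuracy points corresponds to $h[X,k] = 1/n$, for which this reproduces the familiar $\sigma^2/n$.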
Following Paper I, one may formally separate the "true" (index "t") values of the signal, $x_{tk}$, and the deviations of the real observations from them, $d_k = x_k - x_{tk}$. The values $d_k$ are often believed to be uncorrelated with each other and with the "true" values, to have zero mathematical expectation and a variance $\sigma_k^2$. Usually the "true" values are unknown (except in models with known "signal" and "noise"), but they may have systematic deviations from the corresponding fit, which may be characterized by the parameter $\Phi_t$. The mathematical expectation of the weighted sum (3) of the squares of the residuals is

$\mathbf{E}[\Phi] = \Phi_t + \sigma_0^2 \Big[ \sum_k p_k - \sum_\alpha \sum_\beta A^{-1}_{\alpha\beta}\, \tilde A_{\beta\alpha} \Big],$   (19)

where

$\Phi_t = \sum_k w_k\, (x_{tk} - x_c(t_k))^2.$   (20)
The second summand in the right part of Eq. (19) corresponds to the deviations. Equation (19) allows one to estimate $\sigma_0^2$, which is needed for accuracy determinations. One may note that, for constant weights (4), the expression in brackets in Eq. (19) is equal to $(n-m-1)$ for all non-degenerate systems of basic functions $f_\alpha$. Usually the fits are chosen so that $\Phi_t$ is negligible as compared with the second summand, i.e. it is suggested that the systematic deviation of the fit from the true signal is much less than its statistical error. The values of $\Phi$ and the estimate of $\sigma_0^2$ depend on $t_0$ and $\Delta t$.
The variance of the smoothed value at argument $t_0$ is

$\sigma^2[x_c(t_0)] = \mathbf{E}[(\delta x_c(t_0))^2] = \sum_k h^2[x,k]\; \sigma_k^2.$   (21)

Here $\delta x_c(t_0)$ is the deviation of the smoothed value $x_c(t_0)$ from the true one $x_{ct}(t_0)$. If the argument $t_0$ coincides with the $t_k$ of the $k$-th observation, then one may transform Eq. (23) of Paper I into

$\sigma^2[x_{ck}] = \sigma_0^2 \sum_l \frac{p_l}{w_l}\, h^2[x,l].$   (22)

For polynomial fits $f_\alpha(t_k) = \delta_{\alpha 1}$ at $t_0 = t_k$, and for the weights (4) and (5) $p_k = 1$ at this point, thus $\sigma_k^2 = \sigma_0^2/w_k$. For "constant" weights (4), taking into account that in this case $\sum_l h^2[x,l] = h[x,k']$, one may obtain the even simpler expression $\sigma^2[x_{ck}] = \sigma_k^2\, h[x,k']$.
By summing the left and right parts of Eqs. (22, 19) over all observations (or only a part of them), one may also estimate $\sigma_0^2$, neglecting the systematic deviations of the fit from the true signal as compared with the statistical error of the signal value. These values we will mark correspondingly. Another characteristic value of the variance may be defined as

$\sigma^2 = \frac{\sum_k p_k\,(x_k - x_{ck})^2}{\sum_k p_k}.$   (23)

For a normally distributed uncorrelated signal the two estimates of $\sigma_0^2$ must be very close, as they characterize the same quantity - the unbiased estimate of the unit weight variance. The parameter $\sigma$ defined by Eq. (23) is the r.m.s. deviation of the signal from the fit; its mathematical expectation depends on $\Delta t$ and is biased.
One may note that the general expressions for the smoothed value and its accuracy may cause problems, if the number of points $n_1$ in the subinterval $[t_0-\Delta t,\ t_0+\Delta t]$ is not sufficient. If $n_1 = m+1$ and all the arguments $t_k$ are different, then one obtains the fit interpolating all the values. If the number of different arguments is smaller than $m+1$, the system of normal equations is degenerate and no fit of order $m$ is available. In this case one may either decrease $m$ (which changes the statistical and spectral properties of the fit) or not use the fit at this data point. We prefer the second way when computing the smoothed values and their accuracy estimates. Computation of the smoothed values at the moments of observations is carried out most often to compute estimates of $\sigma_0^2$ and to provide time series analysis of the residuals $x_k - x_{ck}$ of the signal from the fit.
Another application is to compute the fit at an arbitrary argument $t_0$. For this case we propose to use the following restrictions: a) the number of the data points inside the interval must exceed $m+1$ (as was mentioned above); b) the numbers of the data points with $t_k < t_0$ ($j_1$) and $t_k > t_0$ ($j_2$) must both be non-zero; c) the number $j_3$ of the data points with non-negligible weight must exceed some limiting value; d) the accuracy estimate of the smoothed value must not exceed some limiting value, e.g. a default or a manually inserted one; e) the corresponding value must lie within a prescribed interval, for which one may recommend values from 0 to 0.1. These restrictions (some of them may be omitted) allow one to obtain the fit only at arguments $t_0$ where it makes sense, because otherwise one may formally obtain values extrapolating the data at the edge(s) of the subinterval, and apparent waves which are not statistically significant.
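The restrictions above can be sketched as a pre-check before evaluating the fit at a given $t_0$. The thresholds elided in the text are replaced here by illustrative, hypothetical defaults (the weight cut 0.1 and `j3_min=3` are assumptions, as is the use of the "weighted" additional weights):

```python
import numpy as np

def fit_allowed(t, t0, dt, m, j3_min=3, var=None, var_max=None):
    """Check restrictions a)-d) before computing the fit at arbitrary t0.

    a) more than m+1 data points inside [t0-dt, t0+dt];
    b) data points on both sides of t0;
    c) at least j3_min points with non-negligible weight (cut value assumed);
    d) optionally, the variance estimate of the smoothed value below var_max.
    """
    t = np.asarray(t, float)
    inside = np.abs(t - t0) <= dt
    if inside.sum() <= m + 1:                                     # a)
        return False
    if not ((t[inside] < t0).any() and (t[inside] > t0).any()):   # b)
        return False
    z = (t - t0) / dt
    p = np.where(np.abs(z) <= 1.0, (1.0 - z**2)**2, 0.0)
    if (p > 0.1).sum() < j3_min:                                  # c)
        return False
    if var is not None and var_max is not None and var > var_max:  # d)
        return False
    return True
```

A $t_0$ beyond the edge of the data fails restriction b), which is exactly the extrapolation case the restrictions are meant to exclude.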
This is simpler if the observations are evenly sampled, and the coefficients $h[x,k]$ are the same for all intervals (except the edges), as are the matrices $A$, $A^{-1}$, $\tilde A$. Generally one may introduce 2 scale factors, multiplying $\sigma_0$ and $p_k$ by arbitrary constants. Then the weights $w_k$ are rescaled, but the smoothing function and its statistical characteristics do not depend on these factors; thus they may be set to any non-zero value. Practically one may choose $\sigma_0 = 1$ and $p = 1$ (i.e. $w_k = 1/\sigma_k^2$) for unequal weights, and $w_k = 1$ for equal weights. It is important to note that generally the parameter $\sigma_0$ itself does not correspond to the characteristic accuracy of the observations. The accuracy of the fit is defined in a more complex way, by Eq. (21).
Foster (1996c) proposed to introduce the parameter

$\nu = \frac{\sum_k w_k^2}{\left(\sum_k w_k\right)^2}.$   (24)

This quantity is scale-invariant, as it does not depend on the parameters $\sigma_0$ and $p$. It has the physical sense of the (relative) variance of the parameter

$\bar x = \frac{\sum_k w_k\, x_k}{\sum_k w_k},$   (25)

which coincides with a weighted mean. Imposing the normalization

$\sum_k w_k = 1,$   (26)

one may define the "local" ensemble variance

$\sigma_x^2 = \frac{n^*}{n^*-1} \sum_{k=n_1}^{n_2} w_k\, (x_k - \bar x)^2,$   (27)

where $n^* = n_2 - n_1 + 1$ is the number of the data points (from $n_1$ to $n_2$) in the trial interval $[t_0-\Delta t,\ t_0+\Delta t]$. In the previous expressions the sums from 1 to $n$ and from $n_1$ to $n_2$ were equal, as they contained the additional weight $p_k$, which is equal to zero for $k$ outside the interval $[n_1, n_2]$.
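The weighted mean and the "local" ensemble variance can be sketched as below; the normalization of the weights and the $n^*/(n^*-1)$ unbiasing factor follow the definitions above (the function name is an assumption):

```python
import numpy as np

def local_ensemble_variance(x, w):
    """Weighted mean and "local" ensemble variance in the trial interval.

    Only points with non-zero weight (inside the interval) contribute;
    the weights are normalized to sum to 1, and the n*/(n*-1) factor
    is the unbiasing correction.
    """
    x, w = np.asarray(x, float), np.asarray(w, float)
    nz = w > 0                      # points inside the trial interval
    x, w = x[nz], w[nz]
    w = w / w.sum()                 # normalization: sum w_k = 1
    xbar = np.sum(w * x)            # weighted mean
    nstar = len(x)                  # n*, number of points in the interval
    var = nstar / (nstar - 1.0) * np.sum(w * (x - xbar)**2)
    return xbar, var
```

For equal weights this reduces to the ordinary sample mean and the unbiased sample variance.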
The use of points in a local interval is correct from the statistical point of view, but it is inconvenient to use the ensemble variance $\sigma_x^2$, which varies with $t_0$. Thus one may introduce the "global" ensemble variance, which is defined for the whole data set and does not depend on the interval $[t_0-\Delta t,\ t_0+\Delta t]$:

$\sigma_{xg}^2 = \frac{1}{N-1} \sum_{k=1}^{N} (x_k - \bar x_g)^2,$   (28)

where $\bar x_g$ is the mean of all $N$ values. Foster (1996c) proposes to use $\sigma_{xg}^2$ instead of $\sigma_0^2$, as it has a clear physical meaning. In other notation, this corresponds to setting the unit weight error equal to the global r.m.s. deviation of the signal.
From Eqs. (23) and (25) one may define the effective number of data points

$n_{\rm eff} = \frac{\left(\sum_k w_k\right)^2}{\sum_k w_k^2}.$   (29)

With the normalization $\sum_k w_k = 1$ one obtains $n_{\rm eff} = 1/\sum_k w_k^2$, in the form used by Foster (1996c). One may note that $n_{\rm eff} = n$ for equal weights $p_k = p$, and $n_{\rm eff} < n$ for unequal weights. In our notation, these expressions are meaningful for "um" and "wm", as $\bar x$ coincides with the smoothing value at $t_0$. For parabolic and other non-linear fits one may redefine the effective number of data points by using $h[x,k]$ instead of $p_k$ in Eqs. (23, 24, 27):

$n_{\rm eff} = \frac{\left(\sum_k h[x,k]\right)^2}{\sum_k h^2[x,k]}.$   (30)
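The effective number of data points is a one-line, scale-invariant statistic of the weights (or of the components of $h[x,k]$); a minimal sketch:

```python
import numpy as np

def n_eff(w):
    """Effective number of data points: (sum w)^2 / sum w^2.

    Scale-invariant: multiplying all weights by a constant does not change
    it. Equals n for n equal weights and is smaller for unequal weights.
    """
    w = np.asarray(w, float)
    return w.sum()**2 / np.sum(w**2)
```

The same function applies unchanged when the components of the projective vector $h[x,k]$ are substituted for the weights.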