3. Mathematical background of the linear state space model

In this section we briefly introduce the Linear State Space Model (LSSM). For a detailed discussion, see Honerkamp (1993) and Hamilton (1995). The LSSM is a generalization of the autoregressive (AR) model invented by Yule (1927) to model the variability of Wolf's sunspot numbers.

We follow Wold's decomposition theorem (Wold 1938; Priestley 1992; Fuller 1996) which states that any discrete stationary process can be expressed as the sum of two processes uncorrelated with one another, one purely deterministic (i.e. a process that can be forecasted exactly, such as a strictly periodic oscillation) and one purely indeterministic. Further, the indeterministic component, which is essentially the stochastic part, can be written as a linear combination of an innovation process, which is a sequence of uncorrelated random variables.

A given discrete time series x(t) is considered as a sequence of correlated random variables. The AR model expresses the temporal correlations of the time series in terms of a linear function of its past values plus a noise term and is closely related to the differential equation describing the dynamics of the system. The fact that x(t) has a regression on its own past terms gives rise to the terminology "autoregressive process" (for detailed discussions see Scargle 1981; Priestley 1992). A time series is thus a realization of the stochastic process or, more precisely, the observation of a realization of the process during a finite time interval. The AR model expresses the temporal correlations in the process in terms of memory, in the sense that a filter (a_i) remembers, for a while at least, the previous values x(t-i). Thus the influence of a predecessor value decreases as time increases. This fading memory is expressed in the exponential decay of the AR autocorrelation function (see Eq. 10). The variable x(t) of an AR process remembers its own behavior at previous times, expressed as a linear relationship in terms of x(t-i) plus a term epsilon(t), which stands for an uncorrelated (Gaussian) white noise process:

x(t) = \sum_{i=1}^{p} a_i\, x(t-i) + \epsilon(t), \qquad \epsilon(t) \sim N(0, \sigma^2)    (1)
The number of terms p used for the regression of x(t) determines the order of the AR process, which is abbreviated as an AR[p] process. The parameter values a_i have to be restricted for the process to be stationary (Honerkamp 1993). For a first order process this means |a_1| < 1; for a second order process the conditions are |a_2| < 1, a_1 + a_2 < 1 and a_2 - a_1 < 1. Depending on the order p, the parameters a_i of the process represent damped oscillators, pure relaxators or their superpositions. For the first order process AR[1] the relaxation time tau of the system is determined from a_1 by:
\tau = -\frac{1}{\ln |a_1|}    (2)
In the case of a damped oscillator described by an AR[2] process, the parameters a_1 and a_2 are related to the period T and the relaxation time tau by:

a_1 = 2 \cos\left(\frac{2\pi}{T}\right) \exp\left(-\frac{1}{\tau}\right)    (3)

a_2 = -\exp\left(-\frac{2}{\tau}\right)    (4)
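As a brief illustration of these relations, the following sketch (a hypothetical example assuming NumPy only, with arbitrary values for T and tau, both in units of the sampling interval) builds the AR[2] parameters from a chosen period and relaxation time and simulates the corresponding process:

```python
# A minimal sketch, assuming NumPy and illustrative parameter values.
import numpy as np

def ar2_parameters(T, tau):
    """Map period T and relaxation time tau (in sampling units) to (a1, a2)."""
    r = np.exp(-1.0 / tau)                  # damping per time step
    a1 = 2.0 * r * np.cos(2.0 * np.pi / T)  # relation (3)
    a2 = -r**2                              # relation (4)
    return a1, a2

def simulate_ar2(a1, a2, n, sigma=1.0, seed=0):
    """Simulate x(t) = a1*x(t-1) + a2*x(t-2) + eps(t), eps ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(2, n):
        x[t] = a1 * x[t-1] + a2 * x[t-2] + rng.normal(0.0, sigma)
    return x

a1, a2 = ar2_parameters(T=20.0, tau=50.0)   # illustrative values only
x = simulate_ar2(a1, a2, n=2000)
```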
For a given time series the parameters a_i can be estimated, e.g. by the Durbin-Levinson or the Burg algorithm (Honerkamp 1993). By statistical testing it is possible to infer whether a model is compatible with the data.
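The Durbin-Levinson recursion mentioned above can be sketched as follows (an illustrative NumPy-only implementation operating on the sample autocovariance; the time series x is taken from the previous sketch):

```python
# A sketch of the Durbin-Levinson recursion for AR[p] parameter estimation.
import numpy as np

def sample_autocovariance(x, max_lag):
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    return np.array([np.sum(x[:n-k] * x[k:]) / n for k in range(max_lag + 1)])

def durbin_levinson(gamma, p):
    """Return the AR coefficients a_1..a_p and the innovation variance."""
    a = np.zeros(p)
    v = gamma[0]
    for k in range(1, p + 1):
        # partial autocorrelation at lag k
        acc = gamma[k] - np.dot(a[:k-1], gamma[k-1:0:-1])
        phi = acc / v
        a_new = a.copy()
        a_new[k-1] = phi
        a_new[:k-1] = a[:k-1] - phi * a[k-2::-1]
        a = a_new
        v *= (1.0 - phi**2)
    return a, v

gamma = sample_autocovariance(x, max_lag=2)   # x from the AR[2] sketch above
a_hat, var_hat = durbin_levinson(gamma, p=2)  # should be close to (a1, a2)
```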

Figure 2: a) EXOSAT ME X-ray lightcurve of NGC 5506 (Jan. 1986), b) hidden AR[1] process, estimated with the LSSM fit

A first generalization of the AR models is the class of autoregressive moving-average (ARMA) models, which also include past noise terms in the dynamics:

x(t) = \sum_{i=1}^{p} a_i\, x(t-i) + \sum_{j=1}^{q} b_j\, \epsilon(t-j) + \epsilon(t)    (5)
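For illustration, a minimal ARMA[1,1] simulation might look as follows (a toy example with assumed parameter values, not taken from the paper):

```python
# A brief illustrative ARMA[1,1] simulation, assuming NumPy only.
import numpy as np

def simulate_arma11(a1, b1, n, sigma=1.0, seed=0):
    """x(t) = a1*x(t-1) + b1*eps(t-1) + eps(t), with Gaussian white noise eps."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = a1 * x[t-1] + b1 * eps[t-1] + eps[t]
    return x

x_arma = simulate_arma11(a1=0.9, b1=0.4, n=2000)   # illustrative values only
```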
Both models, AR and ARMA processes, assume that the time series is observed without any observational noise. In the presence of such noise the parameters a_i will be underestimated and statistical tests will reject the model even if its order is specified correctly.

LSSMs generalize the AR and ARMA processes by explicitly modelling observational noise. Furthermore, LSSMs use the so-called Markov property, which means that the entire information relevant to the future, i.e. for the prediction, is contained in the present state. The variable x(t) that has to be estimated cannot be observed directly since it is covered by the observational noise eta(t). Exploiting the Markov property it is nevertheless possible to predict the values x(t) recursively.

The measured observation variables y(t) may not necessarily agree with the system variables x(t) that provide the best description of the system dynamics. Thus an LSSM is defined by two equations, the system or dynamical Eq. (6) and the observation Eq. (7):

\vec{x}(t) = A\,\vec{x}(t-1) + \vec{\epsilon}(t), \qquad \vec{\epsilon}(t) \sim N(0, Q)    (6)

y(t) = C\,\vec{x}(t) + \eta(t), \qquad \eta(t) \sim N(0, R)    (7)
This definition is a multivariate description, which means that the AR[p] process is given as a p-dimensional AR process of order one, with a matrix A that determines the dynamics. By combining the components of this multivariate description, the typical AR[p] form (see Eq. 1) can be recovered easily. The observation y(t) is formulated as a linear combination of the state vector x(t) and the noise term eta(t). The matrix C maps the unobservable dynamics to the observation. The terms epsilon(t) and eta(t) represent the dynamical noise with covariance matrix Q and the observational noise with variance R, respectively.
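A sketch of Eqs. (6) and (7) in this companion form might look as follows (assuming NumPy; the parameter values are illustrative only). The AR coefficients fill the first row of A, the remaining rows merely shift the state components, and only the first component of the dynamical noise is non-zero:

```python
# A minimal sketch of the LSSM in companion form, assuming NumPy only.
import numpy as np

def lssm_matrices(a, sigma2, R):
    """Companion matrix A, observation matrix C, dynamical noise covariance Q."""
    p = len(a)
    A = np.zeros((p, p))
    A[0, :] = a                     # AR coefficients in the first row
    A[1:, :-1] = np.eye(p - 1)      # shift the remaining state components
    C = np.zeros((1, p))
    C[0, 0] = 1.0                   # only the first component is observed
    Q = np.zeros((p, p))
    Q[0, 0] = sigma2                # dynamical noise enters the first component
    return A, C, Q, R

def simulate_lssm(A, C, Q, R, n, seed=0):
    """Simulate Eqs. (6)-(7): hidden state x(t) and noisy observation y(t)."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    x = np.zeros(p)
    y = np.zeros(n)
    for t in range(n):
        x = A @ x + rng.multivariate_normal(np.zeros(p), Q)
        y[t] = (C @ x)[0] + rng.normal(0.0, np.sqrt(R))
    return y

A, C, Q, R = lssm_matrices(a=[0.95], sigma2=1.0, R=0.5)   # an AR[1] hidden process
y = simulate_lssm(A, C, Q, R, n=1000)
```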

The estimation of the parameters of an LSSM is more complicated than for AR or ARMA processes. There are two conceptually different procedures available to obtain the maximum likelihood parameter estimates. Both are iterative and start from initial values that have to be specified. The first procedure maximizes the likelihood by explicit numerical optimization. The other applies the so-called Expectation-Maximization (EM) algorithm. The latter procedure is slower but numerically more stable than the former and is described in detail by Honerkamp (1993). The statistical evaluation of a fitted model is generally based on the prediction errors. These are obtained by a Kalman filter, which estimates the unobservable process x(t) (Hamilton 1995). Starting from a given LSSM and the given observations y(t), such a linear filter allows us to arrive at the estimates of the variables describing the system dynamics and at their prediction errors (Brockwell & Davis 1991; Koen & Lombard 1993).
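A minimal Kalman filter sketch, reusing the matrices A, C, Q, R and the simulated observations y from the previous sketch, returns the standardized one-step prediction errors used for the statistical evaluation (an illustrative implementation, not the exact code used in the paper):

```python
# A sketch of the Kalman filter producing one-step prediction errors.
import numpy as np

def kalman_prediction_errors(y, A, C, Q, R):
    p = A.shape[0]
    x = np.zeros(p)                  # filtered state estimate
    P = np.eye(p)                    # filtered state covariance (rough initialization)
    errors = np.zeros(len(y))
    for t, yt in enumerate(y):
        # prediction step
        x = A @ x
        P = A @ P @ A.T + Q
        # innovation (one-step prediction error) and its variance
        v = yt - (C @ x)[0]
        S = (C @ P @ C.T)[0, 0] + R
        errors[t] = v / np.sqrt(S)   # standardized prediction error
        # update step
        K = (P @ C.T / S).reshape(p)
        x = x + K * v
        P = P - np.outer(K, C @ P)
    return errors

e = kalman_prediction_errors(y, A, C, Q, R)   # y, A, C, Q, R from the sketch above
```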

Multiplying the estimated process x(t) by the estimated matrix C yields an estimate of the observed time series y(t). A necessary condition for the model to fit the data is that the difference between observation and estimate represents white noise, i.e. the time series of prediction errors should be uncorrelated. This can be judged, for example, by a Kolmogorov-Smirnov test for a flat spectrum of the prediction errors, or by the Portmanteau test applied to their autocorrelation function. We have used the first method to quantify the goodness of fit of the tested LSSMs (see Table 1).
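The whiteness check can, for instance, be sketched as a Kolmogorov-Smirnov test on the cumulative periodogram of the standardized prediction errors (an assumed implementation; for white noise the cumulative periodogram is approximately a straight line):

```python
# A sketch of a KS test for a flat spectrum of the prediction errors.
import numpy as np

def ks_flat_spectrum(errors, c_alpha=1.36):
    """Return (KS statistic, approx. 5% critical value); white noise stays below it."""
    n = len(errors)
    periodogram = np.abs(np.fft.rfft(errors - np.mean(errors)))**2
    periodogram = periodogram[1:n // 2 + 1]        # drop the zero frequency
    cum = np.cumsum(periodogram) / np.sum(periodogram)
    uniform = np.arange(1, len(cum) + 1) / len(cum)
    ks_stat = np.max(np.abs(cum - uniform))
    return ks_stat, c_alpha / np.sqrt(len(cum))

ks, crit = ks_flat_spectrum(e)   # e from the Kalman filter sketch above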

Another criterion to judge fitted models is the decrease in the variance of the prediction errors with increasing order of the fitted models. A knee in this function gives evidence for the correct model order: any further increase of the model order will not reduce the variance significantly. The so-called Akaike information criterion (AIC) formalizes this procedure, taking into account the different numbers of parameters of the models (Hamilton 1995). Any oscillators and relaxators that might occur in unnecessarily complex LSSMs should be highly damped and can therefore be neglected.
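As an illustration only, an AR-style form of the AIC (an assumption; the exact form used in the paper is not reproduced here) could be written as:

```python
# A sketch of order selection via the AIC, assuming the AR-style form
# AIC = N*ln(variance of prediction errors) + 2*(number of parameters).
import numpy as np

def aic(prediction_errors, n_parameters):
    """Use the raw (unstandardized) one-step prediction errors of a fitted model."""
    n = len(prediction_errors)
    return n * np.log(np.var(prediction_errors)) + 2.0 * n_parameters

# Fit LSSMs of increasing order and look for the minimum (the "knee") of the AIC.
```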

The last method to judge a fitted model is to compare the spectrum that results from the fitted parameters with the periodogram of the sample time series. The spectrum of an LSSM is given by:

S(\omega) = C \left(\mathbf{1} - A\, e^{-i\omega}\right)^{-1} Q \left(\mathbf{1} - A^{T} e^{\,i\omega}\right)^{-1} C^{T} + R    (8)
The superscript T denotes transposition. Spectra of AR or ARMA processes are special cases of Eq. (8). In the simplest case of an AR[1] process modelled with an LSSM, the corresponding spectrum is given by:

S(\omega) = \frac{\sigma^{2}}{1 + a_1^{2} - 2 a_1 \cos\omega} + R    (9)
This function reproduces both the flattening of the power at low frequencies and the decrease of power at intermediate frequencies seen in the periodograms (see e.g. Fig. 4).
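A small sketch of Eq. (9), up to an overall normalization convention (an assumption; NumPy only), illustrates this behaviour:

```python
# A sketch of the AR[1]-LSSM spectrum, up to an overall normalization constant.
import numpy as np

def lssm_ar1_spectrum(freqs, a1, sigma2, R):
    """Spectrum at frequencies given in cycles per time step (0 < f <= 0.5)."""
    omega = 2.0 * np.pi * freqs
    return sigma2 / (1.0 + a1**2 - 2.0 * a1 * np.cos(omega)) + R

freqs = np.linspace(1e-3, 0.5, 500)
spec = lssm_ar1_spectrum(freqs, a1=0.95, sigma2=1.0, R=0.5)  # illustrative values
# flat at low frequencies, falls off at intermediate frequencies, and levels
# off where the observational noise R dominates.
```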

In a first approach, gaps in the observed lightcurve were filled with white noise with the same mean and rms as the original time series in order to create a continuous time series. In a second run these gaps were refilled with the predictions of the Kalman filter plus a white noise realization with the original lightcurve's variance. Generally, gaps in an observed time series can be handled by the LSSM in a natural way, avoiding the filling of gaps with Poisson noise. The key is again the Kalman filter. The Kalman filter takes into account that the intrinsic, decaying dynamics continues even when the object is not observed. In each cycle of the iterative parameter estimation procedure x(t) is estimated based on an internal prediction, corrected by information obtained from the actual data y(t). In the case of gaps no information from y(t) is available and the internal prediction decays in its intrinsic manner until new information is given. In the case of the lightcurve of NGC 5506 the resulting parameters are consistent with those of the first approach due to the high duty cycle of the original time series.
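The gap handling described above can be sketched by skipping the update step of the Kalman filter whenever an observation is missing (an illustrative implementation, assuming gaps are flagged as NaN and reusing A, C, Q, R from the earlier sketches):

```python
# A sketch of Kalman filtering through gaps: during a gap the update step is
# skipped and the internal prediction simply decays with the dynamics A.
import numpy as np

def kalman_with_gaps(y, A, C, Q, R):
    p = A.shape[0]
    x = np.zeros(p)
    P = np.eye(p)
    x_pred = np.zeros((len(y), p))
    for t, yt in enumerate(y):
        # prediction step (always performed, observed or not)
        x = A @ x
        P = A @ P @ A.T + Q
        x_pred[t] = x
        if np.isnan(yt):
            continue                 # gap: no new information, keep the prediction
        # update step with the actual observation
        S = (C @ P @ C.T)[0, 0] + R
        K = (P @ C.T / S).reshape(p)
        x = x + K * (yt - (C @ x)[0])
        P = P - np.outer(K, C @ P)
    return x_pred

y_gapped = y.copy()                  # y from the LSSM simulation sketch
y_gapped[400:500] = np.nan           # an artificial gap
x_pred = kalman_with_gaps(y_gapped, A, C, Q, R)
```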
