Up: Three stage period analysis

4 Bootstrap

Theoretical (e.g. $\chi^2$) and numerical (e.g. Monte Carlo) $\bar{\beta}_{\mathrm{min}}$ error estimates for nonlinear models are discussed, e.g., by Press et al. (1988). We estimate the $\bar{\beta}_{\mathrm{min}}$ errors with the bootstrap for the regression coefficients (Efron & Tibshirani 1986). This bootstrap also yields estimates of "other'' model parameters (and their errors) that would be difficult to solve analytically for $K\!\geq\!2$, e.g. the total amplitude (A) and the epochs of the minima ($t_{\mathrm{min,1}}, t_{\mathrm{min,2}}, ..., t_{\mathrm{min,K}}$). The six stages of our bootstrap are:

1. Minimizing $\chi^2$ (Eq. 8) for $\bar{y}$ with $\bar{w}$ gives the residuals

 \begin{displaymath}{\epsilon}_i = y_i - g(t_i),
\end{displaymath} (11)

i.e. the "empirical distribution of residuals''.

2. A random sample $\bar{\epsilon}^*$ is selected from $\bar{\epsilon}$. Since this is a random sample with replacement, the number of times the same $\epsilon_i$ enters $\bar{\epsilon}^*$ may vary between 0 and n. The $\bar{\epsilon}^*$ determine a unique random sample $\bar{w}^*$, where the connection of wi being the weight of $\epsilon_i$ is preserved.

3. A random sample $\bar{y}^{*} \! = \! \bar{g} \! + \! \bar{\epsilon}^{*}$ is obtained.

4. Minimizing $\chi^2$ (Eq. 8) for $\bar{y}^*$ with $\bar{w}^*$ gives one estimate of $\bar{\beta'}_{\mathrm{min}}$, as well as of the "other'' parameters of higher order models ($K\!\geq\!2$). For example, measuring the difference between the minimum and maximum of $\bar{g}(\bar{\beta'}_{\mathrm{min}})$ gives a numerical A estimate.

5. The bootstrap returns to the 2nd stage, until S estimates of $\bar{\beta'}_{\mathrm{min}}$ and the "other'' parameters have been obtained.

6. The expectation value and variance of any $\bar{\beta}_{\mathrm{min}}$ component are the mean and variance of its S estimates in $\bar{\beta'}_{\mathrm{min}}$. The same applies to the S estimates of the "other'' model parameters.

Note that $\bar{y}$, $\bar{w}$, $\bar{\epsilon}$ and $\bar{g}$ remain unchanged throughout, while $\bar{y}^*$, $\bar{w}^*$ and $\bar{\epsilon}^*$ change at every round, and $\bar{\epsilon}^*$ determines both $\bar{y}^*$ and $\bar{w}^*$.
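The six stages above can be sketched in code. This is a minimal illustration, not the paper's implementation: `fit_line` is a hypothetical stand-in for the $\chi^2$ minimization of Eq. (8), using a simple weighted linear model in place of the nonlinear K-order model g.

```python
import numpy as np

def fit_line(t, y, w):
    """Weighted least-squares fit of y = b0 + b1*t.
    Stand-in for the chi^2 minimization of Eq. (8);
    the paper's model g is nonlinear."""
    X = np.column_stack([np.ones_like(t), t])
    A = X.T @ (w[:, None] * X)
    b = X.T @ (w * y)
    return np.linalg.solve(A, b)

def residual_bootstrap(t, y, w, S=1000, seed=None):
    """Residual bootstrap: each weight w_i stays attached
    to its residual eps_i when resampling."""
    rng = np.random.default_rng(seed)
    beta = fit_line(t, y, w)                 # stage 1: beta_min
    g = beta[0] + beta[1] * t                # fitted values g(t_i)
    eps = y - g                              # empirical residuals (Eq. 11)
    n = len(y)
    boots = np.empty((S, len(beta)))
    for s in range(S):                       # stages 2-5: S rounds
        idx = rng.integers(0, n, size=n)     # sample with replacement
        # stage 3: y* = g + eps*; w* follows the resampled residuals
        boots[s] = fit_line(t, g + eps[idx], w[idx])  # stage 4
    # stage 6: expectation value and error of each parameter
    return boots.mean(axis=0), boots.std(axis=0, ddof=1)
```

Resampling the index array `idx` once and applying it to both `eps` and `w` is what preserves the $\epsilon_i$-$w_i$ connection discussed below.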

We introduce a few notations useful for testing the "Gaussian hypothesis''

HG: An arbitrary $\bar{x}$ with S components represents a random sample drawn from a Gaussian distribution.
These $\bar{x}$ (e.g. $\bar{\epsilon}$, $\bar{M}$, $\bar{A}$, $\bar{P}$ or $\bar{t}_{\mathrm{min,1}}$) are first arranged into ascending (i.e. rank) order $x_1 \! \leq \! x_2 \!\leq ... \leq x_{\mathrm{S}}$ and then transformed to $u_i\!=\!(x_i-m_x)/s_x$, where mx and sx are the mean and standard deviation of $\bar{x}$. The cumulative distribution function is

 \begin{displaymath}F_{\mathrm{S}}(u_i) = i/S,
\end{displaymath} (12)

and the cumulative Gaussian distribution function is

 \begin{displaymath}F(u) = (2\pi)^{-1/2} \int_{-\infty}^{u}
{\mathrm{e}}^{-z^2/2} {\rm d}z.
\end{displaymath} (13)

The preassigned significance level $\alpha \!=\!0.01$ for rejecting HG determines an upper limit $c(\alpha\!=\!0.01,S)$ for the Kolmogorov-Smirnov test statistic $a = {\mathrm{max}} [~\mid F_{\mathrm{S}}(u)- F(u) \mid~]$. HG is rejected if, and only if,

 \begin{displaymath}a \geq c(\alpha\!=\!0.01,S).
\end{displaymath} (14)
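The test of Eqs. (12)-(14) can be sketched as below. Note the hedges: the paper uses tabulated critical values $c(\alpha\!=\!0.01,S)$, whereas this sketch assumes the large-sample approximation $c \approx 1.63/\sqrt{S}$; and because mx and sx are estimated from the same sample, the statistic is of the Lilliefors type.

```python
import numpy as np
from math import erf, sqrt

def ks_statistic(x):
    """a = max |F_S(u) - F(u)| for the standardized, rank-ordered
    sample (Eqs. 12-13)."""
    x = np.sort(np.asarray(x, dtype=float))
    u = (x - x.mean()) / x.std(ddof=1)       # u_i = (x_i - m_x)/s_x
    S = len(u)
    # cumulative Gaussian distribution function F(u), Eq. (13)
    F = 0.5 * (1.0 + np.vectorize(erf)(u / sqrt(2.0)))
    i = np.arange(1, S + 1)
    # the empirical CDF (Eq. 12) steps from (i-1)/S to i/S at each u_i,
    # so check the deviation on both sides of every step
    return max(np.max(i / S - F), np.max(F - (i - 1) / S))

def reject_hg(x, c_coeff=1.63):
    """Reject H_G (Eq. 14) if a >= c(alpha=0.01, S); assumes the
    asymptotic approximation c(0.01, S) ~ 1.63/sqrt(S)."""
    return ks_statistic(x) >= c_coeff / sqrt(len(x))
```

A clearly non-Gaussian sample (e.g. exponentially distributed residuals, as with strongly inhomogeneous data quality) is rejected, while a Gaussian sample of the same size is not.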

One might (correctly) argue that preserving the connection between $\epsilon_i$ and wi during the 2nd stage of the above bootstrap distorts the statistics in the case of inhomogeneous data quality: some low quality data with large $\epsilon_i$ and small wi would then be randomly distributed among the high quality data in each bootstrap sample. This problem is eliminated by checking whether the $\bar{\epsilon}$ distribution is Gaussian. If HG is rejected for $\bar{\epsilon}$ with Eq. (14), i.e. the data quality is inhomogeneous, then the modelling is rejected (see RI in Sect. 6.3). If, on the other hand, HG is not rejected for $\bar{\epsilon}$, preserving the connection between $\epsilon_i$ and wi in our bootstrap is justified.


Copyright The European Southern Observatory (ESO)