# 3 TSPA with weights

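This section contains the detailed formulation of a weighted three stage period analysis for the simple model of Sect. 2.1. The input data are $t_i$ (time), $y_i = y(t_i)$ (observation) and $\sigma_i$ (error), i.e. the weights are $w_i = \sigma_i^{-2}$, where $i = 1, \ldots, n$. The free parameters ($\bar{\beta}$) of our $K$:th order model

$$g(t_i) = g(t_i, \bar{\beta}) = M + \sum_{k=1}^{K} \left[ B_k \cos(2\pi k f t_i) + C_k \sin(2\pi k f t_i) \right] \qquad (1)$$

are: the mean ($M$), the amplitudes ($B_k$, $C_k$) and the frequency ($f$), i.e. $\bar{\beta} = [M, B_1, \ldots, B_K, C_1, \ldots, C_K, f]$. The definition for the phases is $\phi_i = \mathrm{FRAC}[f t_i]$, where FRAC removes the integer part of $f t_i$. The TSPA consists of three consecutive stages: the pilot search (PSch), the grid search (GSch) and the refined search (RSch).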
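As an illustration, the model and the phases can be evaluated numerically as follows. This is only a minimal sketch, assuming that Eq. (1) is the $K$:th order Fourier series $M + \sum_k [B_k \cos(2\pi k f t) + C_k \sin(2\pi k f t)]$ and that FRAC[x] equals x minus its integer part for non-negative x. The function names `model` and `phases` are hypothetical and do not come from the original analysis code.

```python
import numpy as np

def model(t, mean, b, c, f):
    """Evaluate the assumed K:th order model of Eq. (1) at the times t.

    b and c hold the amplitudes B_1..B_K and C_1..C_K; K = len(b).
    """
    t = np.asarray(t, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    k = np.arange(1, len(b) + 1)[:, None]      # harmonic numbers 1..K
    arg = 2.0 * np.pi * k * f * t[None, :]     # shape (K, n)
    return mean + np.sum(b[:, None] * np.cos(arg) + c[:, None] * np.sin(arg), axis=0)

def phases(t, f):
    """Phases FRAC[f * t], i.e. f * t with the integer part removed."""
    return np.mod(f * np.asarray(t, dtype=float), 1.0)
```

With `K = 1` the call `model(t, M, [B1], [C1], f)` reduces to a single sinusoid around the mean, which is the simplest case discussed in the text.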

## 3.1 Pilot Search (PSch)

The order $K$ (Eq. 1) determines the PSch "model" (Eq. 2 below). The PSch searches for the best period "candidates" over a long interval between $P_{\min}$ ($= f_{\max}^{-1}$) and $P_{\max}$ ($= f_{\min}^{-1}$). Typical correlation lengths in time ($D_{\min}$, $D_{\max}$) and phase () are , , and

Note that only is connected to our model (Eq. 1). The consequences of, and the reasons for, selecting adjustable correlation lengths Dmin, Dmax and are discussed at the end of this section, as well as the particular connection between and the modelling order K (see also the beginning of Sect. 3.2).

The PSch uses four sets of pairs: ti,j, yi,j, wi,j and ( and ). The first three, , and , are independent of f, but the are not. The PSch periodogram determined at all integer multiples of between fmin and fmax is

 (3)

where
 (4)

 (5)

The following algorithm is efficient for larger samples of data, because it eliminates the problem that the number of ti,j, yi,j, wi,j and increases with .
1. Because W(ti,j) does not depend on f (Eq. 4), apply this function only once.
2. Divide into bins ( for larger samples), where the bin width is and INT removes the decimal part of .
3. Denote the nq values of ti,j, yi,j and wi,j in the q:th bin by t'k, y'k and w'k. Derive , and for every bin.
4. Calculate the modified PSch periodogram

 (6)

where Z (Eq. 5) is applied to .

This algorithm divides the data into J bins with respect to the time differences ti,j between Dmin and Dmax. The original ti,j, yi,j and wi,j data are replaced by the J averages t'q, y'q and w'q within these bins. The algorithm is efficient, because the Z function in Eq. (6) has to be applied only to J () values at each tested f.

The yi,j differences closer than in are smaller for a good f candidate, i.e. the data form a continuous curve. Such f minimize the PSch periodogram (Eqs. 3 or 6), the case being opposite for poor f candidates. Because W(ti,j) (Eq. 4) excludes yi,j too far in ti,j, phase shifts during time intervals longer than Dmax do not influence the periodogram. The combination of and would include all data (Eq. 4), but Dmin is applied, because yi,j do not contain significant information when ti,j goes below Pmin. Adjusting Dmin and Dmax determines the number of yi,j selected with W(ti,j). The function selects only closer than in (Eq. 5). For example, determines a sinusoidal model (Eqs. 1 and 2). The yi,j on a sinusoid correlate within , or those on a double wave () within . Reduction of enables detection of more complex variation, but reduces the number of yi,j selected with , i.e. requires more data.
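The selection logic described above can be sketched as a pair-difference dispersion statistic. This is a hedged illustration and not the periodogram of Eqs. (3)-(6), whose exact form is not reproduced in this text. It assumes that ti,j = |ti - tj|, yi,j = yi - yj, that the pair weights are wi wj / (wi + wj), that W keeps pairs with Dmin <= ti,j <= Dmax, and that Z keeps pairs whose phase difference FRAC[f ti,j] lies within `d_phi` of 0 or 1. The names `select_pairs`, `pair_dispersion` and `d_phi` are hypothetical. The unbinned pairs are used directly, so the binning of steps 2-4 is not included.

```python
import numpy as np

def select_pairs(t, y, w, d_min, d_max):
    """Build all pairs once and apply the (assumed) time window W.

    This step does not depend on f, so it is done only once.
    """
    i, j = np.triu_indices(len(t), k=1)
    t_ij = np.abs(t[i] - t[j])
    y_ij = y[i] - y[j]
    w_ij = w[i] * w[j] / (w[i] + w[j])   # assumed weight of a difference
    keep = (t_ij >= d_min) & (t_ij <= d_max)
    return t_ij[keep], y_ij[keep], w_ij[keep]

def pair_dispersion(t_ij, y_ij, w_ij, f, d_phi):
    """Weighted mean of squared differences for pairs close in phase at f."""
    phi_ij = np.mod(f * t_ij, 1.0)
    close = (phi_ij <= d_phi) | (phi_ij >= 1.0 - d_phi)   # assumed window Z
    if not np.any(close):
        return np.nan
    return np.sum(w_ij[close] * y_ij[close] ** 2) / np.sum(w_ij[close])

# Usage: noise-free sinusoid sampled at irregular times.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 100.0, 200))
y = np.sin(2.0 * np.pi * 0.37 * t)
w = np.ones_like(t)
pairs = select_pairs(t, y, w, d_min=2.0, d_max=30.0)
good = pair_dispersion(*pairs, f=0.37, d_phi=0.1)   # true frequency
poor = pair_dispersion(*pairs, f=0.53, d_phi=0.1)   # unrelated frequency
```

In this sketch the statistic is small at the true frequency, because pairs that are close in phase are also close in y, and larger at an unrelated frequency, which is the behaviour the paragraph above describes for good and poor f candidates.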

 Figure 1: The second order TSPA for SET=42 (): a) (Eq. 6) is the PSch periodogram between and with correlation lengths , and . The number of independent frequencies is (Eq. 9). The diamonds on mark the five best periods P1, ..., P5. b-f) The with these P1, ..., P5. g-k) The diamonds on (Eq. 7: ) indicate the more accurate P1, ..., P5 obtained with the GSch. l-p) RSch determines the final P1, ..., P5, and their (Eq. 10). The continuous lines connect each to the closest point of the model . q-u) The versus (see end of Sect. 5) of each P1, ..., P5 for given by (Eq. 15). The critical levels for the linear correlations between and are

 Figure 2: The RSch bootstrap of SET=42: a) The model () with already shown in Fig. 1l. b) The FS(u) and F(u) (Eqs. 12 and 13) for the residuals confirm a Gaussian distribution (i.e. HG is not rejected with in Eq. 14). A dark rectangle indicates the location and height of the Kolmogorov-Smirnov test statistic ( ). c and d) The bootstrap M estimates (open squares) also follow a Gaussian distribution. e-j) The same as in c and d) for the P, A and tmin,1 estimates

 Figure 3: Same as in Fig. 1 for SET=114

 Figure 4: Same as in Fig. 2 for SET=114. Note: e and f) is rejected for P (Eq. 14). k) Only estimates are obtained for tmin,2. The bootstrap samples with no tmin,2 are marked with vertical lines

## 3.2 Grid Search (GSch)

The PSch over a long f interval usually detects numerous frequency "candidates" f'. More accurate values for at least the five best f' are determined with the GSch. TSPA applications to real data may sometimes require testing a much larger number of f' detected with the PSch. Here the limit of testing only the five best f' was chosen, because it is convenient for the graphical representation of the results, as in Figs. 1, 3, 5 and 7. The PSch "model" cannot fully constrain the modelling order K (Eqs. 1 and 2), because yi,j closer than in may correlate in several different models. For example, the PSch could detect a box function or a sinusoid with . The model is fixed in the GSch, e.g. if , GSch proceeds with a model (Eq. 1). If PSch detects f', then GSch tests all integer multiples of between and , where is called the "overfilling factor" and . The discrete tested f set within this narrow interval is denser than in the PSch (i.e. ). Standard linear least squares fits to with for every tested f in Eq. (1) (i.e. f is not a free parameter) determine the GSch periodogram

 (7)

where minimizes the residuals (note that ), and the factor 2 adjusts to the quantitative level of . Thus the PSch periodogram with represents an approximation of the GSch periodogram with K.
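The grid search idea, a linear least squares fit at every tested frequency, can be illustrated with the following sketch. It assumes the Fourier-series form of Eq. (1) and uses the weighted sum of squared residuals as the quantity to minimize; the normalization of Eq. (7), including the factor 2, is not reproduced here. The names `design_matrix`, `weighted_fit` and `grid_search`, as well as the grid spacing in the usage lines, are hypothetical choices.

```python
import numpy as np

def design_matrix(t, f, K):
    """Columns: 1, cos(2 pi k f t) for k=1..K, sin(2 pi k f t) for k=1..K."""
    cols = [np.ones_like(t)]
    cols += [np.cos(2.0 * np.pi * k * f * t) for k in range(1, K + 1)]
    cols += [np.sin(2.0 * np.pi * k * f * t) for k in range(1, K + 1)]
    return np.column_stack(cols)

def weighted_fit(t, y, w, f, K):
    """Weighted linear least squares at a fixed f (f is not a free parameter)."""
    a = design_matrix(t, f, K)
    sw = np.sqrt(w)                       # scale rows by sqrt of the weights
    beta, *_ = np.linalg.lstsq(a * sw[:, None], y * sw, rcond=None)
    resid = y - a @ beta
    return beta, np.sum(w * resid ** 2)   # parameters and weighted residual sum

def grid_search(t, y, w, f_grid, K):
    """Fit the model at every grid frequency and return the best one."""
    chi2 = np.array([weighted_fit(t, y, w, f, K)[1] for f in f_grid])
    return f_grid[np.argmin(chi2)], chi2

# Usage: refine a candidate near f' = 0.4 on a dense grid.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 50.0, 150))
y = 1.0 + 0.5 * np.cos(2.0 * np.pi * 0.4 * t) + 0.3 * np.sin(2.0 * np.pi * 0.4 * t)
w = np.ones_like(t)
f_best, chi2 = grid_search(t, y, w, np.linspace(0.38, 0.42, 401), K=1)
```

Because f is held fixed in each fit, the problem is linear in M, Bk and Ck, which is why a dense grid remains inexpensive compared with a nonlinear fit.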

 Figure 5: The fifth order TSPA () for the V magnitudes of the cepheid variable BL Her, otherwise as in Fig. 1

 Figure 6: The fifth order RSch bootstrap with for the V magnitudes of BL Her, otherwise as in Fig. 2. Note: k) Only estimates are obtained for tmin,2

 Figure 7: The first order TSPA () between and for the V magnitudes of SAO50205 during subsets SET=111, 112, 113 and 114, otherwise as in Fig. 1

 Figure 8: The first order RSch bootstrap with for the V magnitudes of SAO50205, otherwise as in Fig. 2

## 3.3 Refined Search (RSch)

The model (Eq. 1) is nonlinear when f is a free parameter. The RSch performs a standard Marquardt iteration (e.g. Press et al. 1988) to compute . But the result of this iterative refinement depends on the trial solution ( ). Large discrete f sets are tested with PSch and GSch only to provide a reliable for RSch. Combining the f' detected in the GSch to (Eq. 7) gives . While the PSch and GSch test discrete f sets, the RSch is continuous in f, because it utilizes the analytical properties of

 (8)

where minimizes the . This enables significance estimates. The number of independent frequencies (m) determines how many independent -tests are done with the model of Eq. (1). Since the TSPA is performed between $f_{\min}$ ($= P_{\max}^{-1}$) and $f_{\max}$ ($= P_{\min}^{-1}$), a logical definition is

 (9)

where . This definition is equivalent to that in Jetsu & Pelt (1996, Eq. 13), who presented an empirical approach to verify Eq. (9) (see also Buccheri & De Jager 1989). To understand the connection between f0 and m, consider an arbitrary test statistic z(f) that depends on . Because for any f, the correlation between z(f) values vanishes within f0 when the order is totally rearranged. An overfilling factor, e.g. in GSch, enables more accurate period determination, but does not degrade the statistics, i.e. testing many frequencies within amounts to testing only one independent frequency. Hence the TSPA performs m independent -tests over the frequency interval [fmin, fmax]. Our "null hypothesis" is:
H0: The data are pure noise.
Under H0, the probability (i.e. the critical level) for reaching a particular, or an even smaller, value, is

 (10)

where is the degree of freedom for . The validity of statistics is verified in the next section by testing the hypothesis that the model residuals and parameters have a Gaussian distribution.
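The role of m in the significance estimate can be illustrated numerically. The sketch below rests on two assumptions that are not confirmed by the text as reproduced here: that f0 is the inverse of the total time span of the data, so that m = INT[(fmax - fmin) / f0], and that the critical level for m independent tests follows the standard relation Q = 1 - (1 - p)^m, where p is the single-test probability under H0. The function names are hypothetical, and p is taken as an input rather than computed from the test statistic.

```python
def independent_frequencies(t_first, t_last, f_min, f_max):
    """Assumed form of Eq. (9): m = INT[(f_max - f_min) / f0], f0 = 1 / (t_last - t_first)."""
    return int((f_max - f_min) * (t_last - t_first))

def critical_level(p_single, m):
    """Probability that at least one of m independent tests reaches a
    single-test probability p_single under the null hypothesis."""
    return 1.0 - (1.0 - p_single) ** m

# Usage: data spanning 128 time units, searched between f = 0.25 and 0.75.
m = independent_frequencies(0.0, 128.0, 0.25, 0.75)
q = critical_level(0.001, m)
```

The example shows why a small single-test probability is not sufficient on its own: with tens of independent frequencies, the chance of a spurious detection somewhere in the interval is correspondingly larger.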


Copyright The European Southern Observatory (ESO)