3. The merging of multi-site velocity residuals

The need for quasi-continuous coverage in the study of any periodic astronomical signal has long been recognized - the introduction of artefacts characteristic of the window function of the data is a well-known phenomenon.

Quasi-periodic gaps are usually treated in time-series analysis by filling them with zero-valued residuals. In a time series whose window function is dominated by a day-night cycle, power is then redistributed into a series of sidebands, which straddle the signal peaks in the frequency domain at multiples of 1/day (about 11.57 μHz). This has two important consequences.

First, power leakage from the main signal peaks reduces the signal-to-noise ratio of the modes. This matters most near the low-frequency end of the p-mode spectrum, where the signal-to-noise ratios observed in full-disc helioseismology data are very small; the impact on mode detectabilities, and on attainable frequency uncertainties, is therefore high.

Secondly, since the ℓ = 0, 2 spacing in solar p-mode spectra is of the order of 1/day over part of the low-frequency range, the presence of sidebands can "pull" the fitted frequencies of modes, giving non-negligible systematic frequency errors.
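The sideband structure can be reproduced with a short numerical experiment. The sketch below is illustrative only: it assumes a single site observing for exactly half of each day, a 40-s cadence and a noise-free sinusoid at 3 mHz (all hypothetical choices), fills the gaps with zeros, and compares the power in the first sideband, 1/day from the main peak, with the power in the main peak. For a 50-per-cent square-wave window this ratio should be close to $4/\pi^2 \approx 0.41$.

```python
import numpy as np

DAY = 86400.0      # length of the day-night cycle, s
CADENCE = 40.0     # sampling interval, s (illustrative assumption)
NU0 = 3.0e-3       # frequency of the test sinusoid, Hz (hypothetical)


def gapped_spectrum(n_days=20, duty=0.5):
    """Power spectrum of a sinusoid seen through a day-night window.

    Gaps are filled with zero-valued residuals; `duty` is the fraction of
    each day during which the (single, hypothetical) site collects data.
    """
    t = np.arange(0.0, n_days * DAY, CADENCE)
    window = ((t % DAY) < duty * DAY).astype(float)   # 1 = observing, 0 = gap
    x = np.sin(2.0 * np.pi * NU0 * t) * window
    freq = np.fft.rfftfreq(t.size, d=CADENCE)
    power = np.abs(np.fft.rfft(x)) ** 2
    return freq, power


freq, power = gapped_spectrum()
k_main = int(np.argmax(power))
# bin closest to the first upper sideband, 1/day above the main peak
k_side = int(np.argmin(np.abs(freq - (freq[k_main] + 1.0 / DAY))))
print(f"main peak at {freq[k_main] * 1e3:.4f} mHz")
print(f"first sideband / main peak power = {power[k_side] / power[k_main]:.3f}")
```

Changing `duty` shows how the sideband pattern depends on the shape of the daily window; a perfectly periodic window is used here only so that the sidebands fall exactly on Fourier bins.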

Various techniques can be applied in an effort to remove, or at least reduce, the effects of the window function. Perhaps the best known of these is the CLEAN algorithm (e.g., Roberts et al. 1987). In addition, given the assumed stochastic nature of the p-mode signal, a direct deconvolution (e.g., Lazrek & Hill 1993) can in principle be performed on the data, although Lazrek & Hill note that some of the assumptions used to justify this procedure might invalidate its use with real data. These authors present simulations of a single, isolated p mode with a high signal-to-noise ratio; the real solar p-mode spectrum is, of course, considerably more complicated, with a wide range of signal-to-noise ratios across the p-mode domain. We are currently investigating the application of such techniques to realistic oscillation spectra, in particular for modes with low signal-to-noise ratios and for p-mode fine-structure spacings near 1/day.

Owing to the high-Q nature of the solar oscillatory modes, it should be possible to fill gaps with reliable estimates of the true data (Brown & Christensen-Dalsgaard 1990). However, because the low-degree p-mode spectrum is so rich, i.e., it contains many modes at fairly close intervals in frequency, Brown & Christensen-Dalsgaard note that the techniques they investigated might not be wholly reliable. In addition, a reasonably high signal-to-noise ratio in the modes is required to give a good reconstruction of the missing signal, and as the fill of a time series of velocity residuals decreases, the signal-to-noise requirements for reconstruction become more severe. From the typical gap distributions observed in BiSON data, we find that filling all gaps of … or less would improve our overall fill by … per cent. It is clear from the signal-to-noise ratios observed in long BiSON spectra that low-frequency (say, below …) and high-frequency (lower-Q) modes might be precluded from any gap-filling analysis. This may also be true for the modes we have observed in full-disc BiSON data (Chaplin et al. 1995b; 1996b). Nevertheless, we are investigating possible solutions to the gap-filling problem.

To provide reliable, round-the-clock coverage of the Sun, some degree of redundancy is desirable in a network. With several network sites fairly evenly distributed in longitude, there will be many instances where data are collected simultaneously at more than one site. These regions of "overlap" offer a potential gain in signal-to-noise ratio, and it is this point we address in detail below.

3.1. The weighted combination of residuals

The principal aims of a multi-station network are to reduce the number, and length, of data gaps in a 1-dimensional time series constructed from the velocity residuals collected at all the sites, and to provide as much signal overlap between stations as possible. The presence of multi-station overlaps, i.e., stretches where data are collected simultaneously from more than one site, provides the opportunity to increase the signal-to-noise ratio in the modes by appropriately combining the overlap data from each site for inclusion in the final time series. The residuals in the overlap region must be properly weighted in order to take full statistical advantage of the extra data (see below).

The presence of low-frequency offsets between stations demands the application of techniques to remove discontinuities in the merged time series - this we do by high-pass filtering the velocity residuals (see below). Here, we concern ourselves with the statistical advantages of merging such data.

Let the appropriate weight (see below for discussion) for the overlap data from site j be $W_j$, and let there be n sites in the overlap ($j = 1, \ldots, n$). With no low-frequency offsets between sites, the data can be combined in the time domain, over the region of the overlap, to give a combined velocity residual $v_{\rm c}$ according to

$$v_{\rm c} = \frac{\sum_{j=1}^{n} W_j\, v_j}{\sum_{j=1}^{n} W_j},$$

where $v_j$ is the residual from site j at a given time.
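A minimal sketch of the time-domain combination, assuming the residuals from the n sites have already been placed on a common time grid over the overlap. The function name and array layout are illustrative choices; the inverse-variance weights in the example anticipate the weighting discussed in Sect. 3.1.2, and the noise levels are hypothetical.

```python
import numpy as np


def weighted_merge(residuals, weights):
    """Combine simultaneous residuals from n sites into one series.

    residuals : array of shape (n_sites, n_points) covering the overlap
    weights   : array of shape (n_sites,), one weight W_j per site
    """
    v = np.asarray(residuals, dtype=float)
    w = np.asarray(weights, dtype=float)
    # dividing by sum(W_j) leaves a signal common to all sites unchanged
    return (w[:, None] * v).sum(axis=0) / w.sum()


rng = np.random.default_rng(0)
sigmas = np.array([1.0, 2.0])                  # hypothetical site noise levels
noise = rng.standard_normal((2, 200_000)) * sigmas[:, None]
merged = weighted_merge(noise, 1.0 / sigmas**2)
print(np.var(merged))   # close to 1 / (1/1**2 + 1/2**2) = 0.8
```

With weights proportional to $1/\sigma_j^2$ the merged noise variance is $(\sum_j \sigma_j^{-2})^{-1}$, smaller than that of the better site alone.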

The data can also be combined in the frequency domain. Here, the frequency spectrum for each site must be calculated over the duration of the overlap; the weighted Fourier components at a given frequency are combined across sites, and the combined residuals are recovered by taking the inverse Fourier transform of the combined spectrum. This is the approach we have adopted with the solar data, since the inherent noise in the solar velocity residuals, and therefore the appropriate weighting factor, can be determined by assessing power levels in the frequency domain. We have performed a variety of rigorous tests to quantify any systematic errors introduced by applying Fourier techniques to short data strings to recover the merged signal; we find that these are of the order of …, i.e., at a level that introduces no significant artefacts into the data.
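The frequency-domain route can be sketched in the same way. Under the assumption of a single, frequency-independent weight per site (a simplification made for this sketch), the Fourier transform is linear, so combining the weighted spectra and inverting gives exactly the time-domain result; the practical advantage is that each weight can be estimated from the spectrum itself. The function name and test data are hypothetical.

```python
import numpy as np


def merge_in_frequency_domain(residuals, weights):
    """Weighted combination of per-site spectra, returned in the time domain."""
    v = np.asarray(residuals, dtype=float)
    w = np.asarray(weights, dtype=float)
    n_points = v.shape[1]
    spectra = np.fft.rfft(v, axis=1)                   # one spectrum per site
    combined = (w[:, None] * spectra).sum(axis=0) / w.sum()
    return np.fft.irfft(combined, n=n_points)          # inverse transform


rng = np.random.default_rng(1)
data = rng.standard_normal((3, 1000))                  # hypothetical residuals
w = np.array([1.0, 0.5, 0.25])
via_fft = merge_in_frequency_domain(data, w)
direct = (w[:, None] * data).sum(axis=0) / w.sum()
print(np.max(np.abs(via_fft - direct)))   # at the level of rounding error
```

Passing `n=n_points` to `irfft` matters: without it an odd-length series would be returned one sample short.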

3.1.1. Signal content

In order to quantify the potential gains from a weighted merge, we must consider some straightforward statistics, and make some assumptions concerning the content of the velocity residuals.

The p-mode signal "seen" by each instrument is assumed to be identical, although this assumption is not strictly true, for the following reasons.

Consider two stations separated by several tens of degrees in longitude - any overlap between these sites will contain data collected during the early part of the day at one of the sites, and the latter part of the day at the other. The line-of-sight velocity of the Sun with respect to each observatory (the solar topocentric velocity) will therefore be different for each set of data in the overlap. Since the passbands of a basic resonance-scattering device are narrow compared to the width of the solar reference line, different parts of the photospheric line - and by implication, different depths in the solar atmosphere - will be sampled by each instrument. Since the amplitude of the p modes changes with height in the solar atmosphere, the measured signal amplitudes for each station in the overlap will differ.

To estimate the magnitude of this effect, we have used the calculated heights of formation of Underhill & Speake (1996). These authors determined the atmospheric heights of formation for the blue and red passbands of a BiSON resonance-scattering spectrometer as a function of the line-of-sight velocity of the Sun. We assume that the effective height sampled by the spectrometer is the average of those sampled by the blue and red passbands. Furthermore, we assume that the exponential variation of the velocity amplitude of the modes with height can be characterized by a velocity-amplitude scale height $H$; Underhill & Speake found $H \approx$ … from an analysis of BiSON data. Taking this value, the estimated fractional difference in the mode velocity amplitudes between the extreme line-of-sight velocities, 0 and …, is approximately -0.07. Since the difference in the line-of-sight velocities of "overlapping" sites is somewhat less than …, we estimate that the typical fractional amplitude difference in overlaps should be only 2 to 3 per cent.
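The estimate above rests on two simple steps, sketched here under stated assumptions: the effective height is the mean of the blue- and red-passband heights, and the velocity amplitude varies as $\exp(h/H)$ (the sign convention, amplitude growing with height, is our assumption). The heights and scale height in the example are hypothetical and are not the Underhill & Speake values.

```python
import math


def effective_height(h_blue, h_red):
    """Effective sampled height: mean of the blue- and red-passband heights."""
    return 0.5 * (h_blue + h_red)


def fractional_amplitude_difference(h1, h2, scale_height):
    """(A(h2) - A(h1)) / A(h1) for amplitudes varying as exp(h / H)."""
    return math.exp((h2 - h1) / scale_height) - 1.0


# Hypothetical heights and scale height in km; not the published values.
h_a = effective_height(200.0, 300.0)     # site observing at one solar velocity
h_b = effective_height(150.0, 250.0)     # site observing at another
print(fractional_amplitude_difference(h_a, h_b, 1000.0))   # exp(-0.05) - 1
```

For height differences small compared with H, the fractional difference is close to $\Delta h / H$.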

The separation of the spectrometer passbands is fixed by the Zeeman splitting induced by a permanent longitudinal magnetic field imposed on the potassium reference vapour in the instrument - different field strengths between network instruments would therefore give rise to a similar (small) effect.

There is also an additional line-of-sight velocity effect now known as Doppler imaging - here, the use of narrow spectral passbands when viewing the rotating Sun results in the "mapping'' of different parts of the solar Fraunhofer line onto different parts of the visible solar disc (Brookes et al. 1978; van der Raay 1991).

We also assume that the predominant velocity noise source over the p-mode region is normally distributed, and that the noise contributions are completely incoherent between stations. This does not fully hold when, for example, the solar background velocity noise is a substantial fraction of the total noise power (see later).

The non-p-mode noise is made up of contributions from photon shot noise, instrumental noise, atmospheric noise and the solar velocity background noise continuum. The first three contributions combine to give a noise background that is approximately Gaussian. The final contribution, however, is a signal common to all stations in an overlap (allowing for some loss of coherence due to Doppler imaging). In addition, near the centre of the p-mode regime there will be a large contribution from the slowly decaying Lorentzian wings of the modes, from the sidebands, and from a small, diffuse high-degree background. For the basic equations given below we consider a single, isolated p mode; the full effects of this additional power are discussed in a later section.

Finally, we assume that there are no large, low-frequency discontinuities between residuals from different sites. As previously noted, for the purposes of the 5-minute analysis the BiSON residuals are high-pass filtered with a moving mean (cutting in just below …). If the differences between the observed signals from each station are assumed to be due purely to Gaussian-like noise, there is then no need to "smooth" the joins between the single-station and overlap parts of the time series.
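A moving-mean high-pass filter of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the BiSON pipeline: the window length is a hypothetical parameter (the cut-in frequency scales roughly as $1/(W\,\Delta t)$ for a window of W samples at cadence $\Delta t$), and near the ends of the series the mean is taken only over the samples that exist, an edge-handling choice made for this sketch.

```python
import numpy as np


def moving_mean_highpass(v, window):
    """Remove slow variations by subtracting a centred running mean.

    window : odd number of samples; near the ends the mean is taken over the
             samples that exist, rather than over zero padding.
    """
    v = np.asarray(v, dtype=float)
    if window % 2 == 0 or window > v.size:
        raise ValueError("window must be odd and no longer than the series")
    box = np.ones(window)
    sums = np.convolve(v, box, mode="same")
    counts = np.convolve(np.ones(v.size), box, mode="same")
    return v - sums / counts


n = np.arange(2000)
offset = 5.0 + 0.001 * n                  # slow drift, e.g. a station offset
fast = np.sin(2.0 * np.pi * n / 8)        # 8-sample period, well above cut-in
filtered = moving_mean_highpass(offset + fast, 201)
```

Away from the ends, the linear drift is removed exactly (a symmetric running mean reproduces a straight line), while the fast oscillation passes almost unchanged.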

3.1.2. Quantitative signal-to-noise gains

We now develop straightforward signal-to-noise expressions for an n-station overlap consisting of N data points. These will be expressed in terms of the straight ratio of the fitted mode-peak power to the mean background noise power level.

First we derive an expression for the noise power from single-site data. Let the sample standard deviation of the "white" noise source, in the time domain, be $\sigma$. Its magnitude is determined by the distribution of noise-source velocities $v_i$: for N points in the time series

$$\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}\left(v_i - \bar{v}\right)^{2}.$$

If the time series of white noise possesses a zero mean level ($\bar{v} = 0$), then for $N \gg 1$ the right-hand side of the above is simply the sum of the powers $P_i$ in the $N/2$ real bins of the frequency domain of the Fourier transform of the data. Therefore, the average power per bin in the frequency domain due to the Gaussian noise source is given by

$$\bar{P} = \frac{2}{N}\sum_{i=1}^{N/2} P_i = \frac{2\sigma^{2}}{N}.$$

If the time series also contains some periodic signal of interest, and if this signal gives rise to a peak in the frequency spectrum of height $P_{\rm s}$, then we may express the signal-to-noise ratio, s/n, in the mode for the duration of the overlap as

$$({\rm s/n}) = \frac{P_{\rm s}}{\bar{P}} = \frac{N P_{\rm s}}{2\sigma^{2}}. \tag{2}$$
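The statement that the mean-square residual equals the summed power holds for a particular spectrum normalization (Parseval's theorem with a one-sided spectrum). The sketch below assumes that normalization, which we adopt for illustration, and checks numerically that the mean power per bin of white noise is close to $2\sigma^2/N$, the denominator of the signal-to-noise ratio.

```python
import numpy as np


def one_sided_power(v):
    """Power per real frequency bin, normalised so that sum(P) = mean(v**2)."""
    v = np.asarray(v, dtype=float)
    n = v.size
    p = np.abs(np.fft.rfft(v)) ** 2 / n**2
    # every bin except DC (and, for even n, the Nyquist bin) appears twice in
    # the full two-sided spectrum, so its power is doubled here
    if n % 2 == 0:
        p[1:-1] *= 2.0
    else:
        p[1:] *= 2.0
    return p


rng = np.random.default_rng(3)
sigma, n = 3.0, 4096                        # hypothetical noise level and length
v = rng.normal(0.0, sigma, n)
p = one_sided_power(v)
print(p.sum(), np.mean(v**2))               # equal (Parseval)
print(p[1:-1].mean(), 2 * np.var(v) / n)    # mean power per bin ~ 2 sigma^2 / N
```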

Now consider the weighted merging of residuals. The weights must be set appropriately, in order to take full statistical advantage of the extra data available in the overlap. Here, the correct weights are proportional to the inverse of the square of the sample standard deviations of the noise sources for each of the overlapping time series.

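If there are n stations in the overlap, each with a white noise source characterized by $\sigma_j$ ($j = 1, \ldots, n$), and if the signal (of peak height $P_{\rm s}$) is common, then Eq. (2) will be modified to

$$({\rm s/n}) = \frac{N P_{\rm s}}{2}\sum_{j=1}^{n}\frac{1}{\sigma_j^{2}}, \tag{3}$$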

assuming that the overlap persists for the full duration of the time series.

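Now consider the case where the overlap persists for some fraction, $f$, of the total time series. Let the equivalent sample standard deviation of the site data present in the single-station parts of the time series be $\sigma_{\rm s}$. In addition, let the equivalent sample standard deviation for the overlap sections be $\sigma_{\rm o}$; for n overlapping sites with noise standard deviations $\sigma_j$ this is, of course, given by

$$\frac{1}{\sigma_{\rm o}^{2}} = \sum_{j=1}^{n}\frac{1}{\sigma_j^{2}}. \tag{4}$$

Since the noise variance of the complete series is the fill-weighted mean of the variances of its parts, while the common signal of peak height $P_{\rm s}$ is unchanged, the modified signal-to-noise ratio is then given by

$$({\rm s/n}) = \frac{N P_{\rm s}}{2\left[(1-f)\,\sigma_{\rm s}^{2} + f\,\sigma_{\rm o}^{2}\right]},$$

which, substituting Eq. (4), gives

$$({\rm s/n}) = \frac{N P_{\rm s}}{2}\left[(1-f)\,\sigma_{\rm s}^{2} + f\left(\sum_{j=1}^{n}\frac{1}{\sigma_j^{2}}\right)^{-1}\right]^{-1}. \tag{5}$$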

A more realistic representation of a real network will consist of different overlap combinations and a variation in the quality of the single-station segments; under these circumstances, Eq. (5) must be expanded to a more general form in order to describe the resulting signal-to-noise ratio. Let there be a total of S single-station regions, each characterized by a fractional fill $f_{{\rm s},k}$ and sample standard deviation $\sigma_{{\rm s},k}$ ($k = 1, \ldots, S$); in addition, let there be a total of O regions of overlap, with fractional fills $f_{{\rm o},l}$ and sample standard deviations $\sigma_{{\rm o},l}$ ($l = 1, \ldots, O$), such that the fills of all single-station and overlap regions together sum to unity. Here, each $\sigma_{{\rm o},l}$ is derived from the sample standard deviations of the sites in the considered overlap (cf. Eq. 4). With N data points and a mode peak height $P_{\rm s}$, the generalized signal-to-noise expression for an "inhomogeneous" time-series composition is then

$$({\rm s/n}) = \frac{N P_{\rm s}}{2}\left[\sum_{k=1}^{S} f_{{\rm s},k}\,\sigma_{{\rm s},k}^{2} + \sum_{l=1}^{O} f_{{\rm o},l}\,\sigma_{{\rm o},l}^{2}\right]^{-1}. \tag{6}$$
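A sketch of the generalized expression, under the same assumptions as the text: a gap-free series whose segment fills sum to one, white noise, and inverse-variance weighting within each overlap. A single-station region is treated as an "overlap" of one site, so one helper covers both kinds of segment. Function names and the example composition are hypothetical.

```python
import numpy as np


def overlap_sigma(site_sigmas):
    """Effective noise sd of an inverse-variance-weighted segment (Eq. 4)."""
    s = np.asarray(site_sigmas, dtype=float)
    return 1.0 / np.sqrt(np.sum(1.0 / s**2))


def snr(n_points, peak_power, segments):
    """Signal-to-noise ratio of Eq. (6) for a gap-free, composite series.

    segments : list of (fractional_fill, site_sigmas); a single-station
               region has one sigma, an overlap region several.
    """
    fills = np.array([fill for fill, _ in segments], dtype=float)
    if not np.isclose(fills.sum(), 1.0):
        raise ValueError("fractional fills must sum to one")
    variance = sum(fill * overlap_sigma(s) ** 2 for fill, s in segments)
    return n_points * peak_power / (2.0 * variance)


# Hypothetical composition: 60% single-station data of two qualities,
# 30% two-station overlap, 10% three-station overlap.
example = [(0.4, [1.0]), (0.2, [2.0]), (0.3, [1.0, 2.0]), (0.1, [1.0, 1.0, 2.0])]
print(snr(n_points=100_000, peak_power=1e-3, segments=example))
```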

What do these equations imply for the potential quantitative improvements? Figure 1 shows the signal-to-noise gains for a variety of two-station overlaps and fractional overlap fills (fills indicated next to each curve). The fractional gains are calculated with respect to a time series in which data from the better site only are used, and are plotted against the ratio of the sample noise standard deviations of the two stations in the simulated overlap.

Figure 1: Fractional signal-to-noise improvements, as derived from Eq. (6), for a variety of two-station overlaps, plotted against the ratio of the sample noise standard deviations of the stations in the overlap. The fractional gains have been calculated with respect to a time series in which data from the better station only are used. Each curve corresponds to a particular fractional overlap fill, as indicated on the plot. The sample standard deviation of the data in the single-station parts of the simulated time series is assumed to equal that of the better station.

A weighted combination of overlap residuals from two sites of similar quality improves the signal-to-noise ratio over the region of the overlap by a factor of 2 (Fig. 1). If the fractional time for which this type of overlap occurs is reduced from 100 to 50 per cent (assuming the single-station parts of the time series to be filled with residuals of similar quality), the signal-to-noise increase is reduced to a factor of about 1.33; with a fractional overlap fill of only 10 per cent, the gain is only about 1.05. The expected gains for overlaps with more stations can be readily extrapolated from these results.

What if the stations in an overlap are of different quality? This is of particular relevance here, since the quality of data differs from site to site within BiSON. As Fig. 1 shows, the potential signal-to-noise gain in the overlap decreases as the quality of the poorer site deteriorates. Once the ratio of the sample noise standard deviations of the sites increases beyond …, the extra gain for all overlap fills shown in Fig. 1 has halved (with respect to a two-site overlap in which the sites are of equal quality). We now go on to consider the effect of merging on BiSON data, i.e., on a complex time series with variable station quality and gaps.
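For the Fig. 1 set-up (two stations, single-station parts at the better station's noise level), the signal-to-noise expressions above reduce to a closed form. Taking the better station's $\sigma$ as unity and the poorer one as r, the noise variance is $1 - f + f r^2/(1 + r^2) = 1 - f/(1 + r^2)$. The sketch below evaluates this and solves for the noise ratio at which the extra gain halves; the result, $r = \sqrt{3 - f}$ (between $\sqrt{2}$ and $\sqrt{3}$), is a consequence of these equations rather than a value read from the figure.

```python
import math


def two_station_gain(ratio, overlap_fill):
    """s/n relative to using the better station only (the Fig. 1 set-up).

    ratio        : sigma_poorer / sigma_better (>= 1)
    overlap_fill : fraction of the series covered by the two-station overlap;
                   single-station parts have the better station's noise level.
    Noise variance with sigma_better = 1 is 1 - f / (1 + r**2).
    """
    return 1.0 / (1.0 - overlap_fill / (1.0 + ratio**2))


def halving_ratio(overlap_fill):
    """Noise ratio at which the extra gain (gain - 1) is half its r = 1 value.

    gain - 1 = f / (1 + r**2 - f); halving it relative to r = 1 gives
    1 + r**2 = 4 - f.
    """
    return math.sqrt(3.0 - overlap_fill)


for f in (1.0, 0.5, 0.1):
    print(f, round(two_station_gain(1.0, f), 3), round(halving_ratio(f), 3))
```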

3.2. BiSON overlap characteristics

Figure 2 indicates where multi-station overlaps were present during the calendar year 1994. Black pixels indicate no overlap; dark grey, a 2-station overlap; light grey, a 3-station overlap; and white, a 4-station overlap. Most of the overlap hours accumulate during the central UT band of each day, when the instruments at Sutherland, Izaña and Las Campanas are collecting data. The total fraction of time for which multi-station overlaps were present in the data for the full calendar year 1994 was ….

Figure 2: Multi-station overlaps for 1994: black pixels indicate no multi-station overlap or no data; dark grey a two-station overlap; light grey a three-station overlap; and white a four-station overlap

As mentioned in the last section, the quality of the instrumentation varies between sites. The best-quality data are collected by the spectrometers at Sutherland, Las Campanas and Narrabri; typical high-frequency noise powers for daily spectra generated from data collected at these sites are of the order of 2 to …. Data collected by the older instruments at Izaña, Carnarvon and Mount Wilson have somewhat larger high-frequency noise powers (largely because of lower counting rates). There can be as much as a difference of 10 in the high-frequency noise powers of data collected at the older and newer sites.

To assess the impact of weighted merging, we have taken the 2-month BiSON window function for the period 1994 July to 1994 August inclusive. The fill of useful residuals for this period was quite high (… per cent), with an overlap fill of … per cent. A damped harmonic oscillator, excited at regular intervals by a "white" forcing term, has been used as a model for the p-mode signal (full details of this model can be found in Chaplin et al. 1996c).
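A stochastically excited mode of this general kind can be sketched as follows. This is not necessarily the scheme of Chaplin et al. (1996c): as an assumption, we kick the oscillator once per sample with Gaussian white noise and write the damped motion as a second-order autoregressive (AR(2)) recursion, whose poles at $e^{-\eta\Delta t}\,e^{\pm 2\pi i \nu_0 \Delta t}$ give a Lorentzian-like peak of full width $\eta/\pi$ in the power spectrum. The cadence, frequency and linewidth are hypothetical.

```python
import numpy as np


def stochastic_mode(n_points, cadence, nu0, linewidth, rng):
    """Damped oscillator kicked by white noise once per sample, as an AR(2) series.

    nu0       : oscillation frequency, Hz
    linewidth : full width at half maximum of the power Lorentzian, Hz
    """
    eta = np.pi * linewidth                  # amplitude damping rate, 1/s
    decay = np.exp(-eta * cadence)           # amplitude decay per sample
    a1 = 2.0 * decay * np.cos(2.0 * np.pi * nu0 * cadence)
    a2 = -decay**2
    kicks = rng.standard_normal(n_points)    # the "white" forcing term
    x = np.zeros(n_points)                   # oscillator starts at rest
    for i in range(2, n_points):
        x[i] = a1 * x[i - 1] + a2 * x[i - 2] + kicks[i]
    return x


rng = np.random.default_rng(4)
cadence = 40.0
x = stochastic_mode(32768, cadence, nu0=3.0e-3, linewidth=1.0e-6, rng=rng)
freq = np.fft.rfftfreq(x.size, d=cadence)
power = np.abs(np.fft.rfft(x)) ** 2
print(freq[np.argmax(power)] * 1e3, "mHz")   # close to 3 mHz
```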

This signal was assumed to be common to all sites over the duration of the real 2-month window function. Appropriate amounts of Gaussian white noise were added to the simulated p-mode time series for each site, in order to match the observed relative noise levels. No allowance was made for the increased levels of noise, caused by atmospheric extinction, at the extremes of the daily data. We note, however, that the real residuals show only a small deterioration in quality there, since the regular day-to-day BiSON analysis excludes data collected at high air masses (greater than …). The artificial data, modulated by the real 2-month window function for each of the stations, were then merged to give a single time series.

The weights for each of the stations in a given overlap were fixed according to the high-frequency power (from 8 to …) in the Fourier spectrum of that site's overlap data. For a Gaussian-like noise source, the mean high-frequency power per bin is proportional to the square of the sample standard deviation characterizing the noise source, so each weight is inversely proportional to that power. The simulated time series was then constructed with the weighted-merging technique; as a reference, a time series was also constructed in which overlaps were treated by taking data from the best-quality station only.
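The two overlap treatments can be compared in a toy version of this experiment. The sketch assumes a single full overlap between two sites, white noise of hypothetical levels, a 40-s cadence and a common 3-mHz sinusoid standing in for the mode; weights are the inverse of the mean power per bin from 8 mHz up to the Nyquist frequency. Reading the lower limit as 8 mHz and taking the band up to Nyquist are our assumptions, since the upper limit is not given here.

```python
import numpy as np


def hf_power(x, cadence, f_lo=8.0e-3):
    """Mean power per bin above f_lo (Hz); proportional to sigma**2 for white noise."""
    freq = np.fft.rfftfreq(x.size, d=cadence)
    spec = np.abs(np.fft.rfft(x)) ** 2
    return spec[freq >= f_lo].mean()


def merge_overlap(site_data, cadence, best_only=False):
    """Weighted merge (weights 1/HF power) or best-station-only selection."""
    site_data = np.asarray(site_data, dtype=float)
    p = np.array([hf_power(x, cadence) for x in site_data])
    if best_only:
        return site_data[np.argmin(p)]
    w = 1.0 / p
    return (w[:, None] * site_data).sum(axis=0) / w.sum()


rng = np.random.default_rng(5)
cadence, n = 40.0, 8192
t = np.arange(n) * cadence
signal = np.sin(2 * np.pi * 3.0e-3 * t)            # common "mode" signal
sigmas = np.array([1.0, 1.5])                      # hypothetical noise levels
sites = signal + rng.standard_normal((2, n)) * sigmas[:, None]
noise_weighted = np.var(merge_overlap(sites, cadence) - signal)
noise_best = np.var(merge_overlap(sites, cadence, best_only=True) - signal)
print(noise_best / noise_weighted)   # roughly 1 + (1.0/1.5)**2, i.e. about 1.44
```

For a full overlap this noise-variance ratio is the signal-to-noise gain; with partial overlaps and mismatched site quality, as in the real window function, the gain is much smaller.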

For a single simulated p mode, forced through the real window function, with relative station noise levels characteristic of the current status of BiSON, the signal-to-noise gain from the weighted-merging technique, relative to the best-quality-station-only technique, was about 4 per cent. This rather modest gain is readily explained by the make-up of the overlaps in the 6-station time series.

About 80 per cent of all overlaps over this two-month stretch occur between sites with an old and a new spectrometer. Only 16 per cent involve sites with at least two new instruments. The distribution of overlap types is similar for other 2-month stretches. As we saw in the previous section (and with reference to the derived equations), in order to provide a reasonable overall signal-to-noise gain, the overlapping data must be of comparable quality. The overwhelming number of overlaps between old and new sites consequently restricts any gains to a low figure. If the older spectrometers were upgraded so that all instruments were of similar quality, the expected overall signal-to-noise gain would increase to about 30 per cent.

In deriving the above gains, we have considered the signal to be composed of a single mode, and we have neglected the common signal from the solar velocity noise continuum. At low frequencies, where this common signal is high (Elsworth et al. 1994a), the solar continuum spectrum constitutes an insurmountable noise floor (the noise power due to the solar velocity continuum is comparable with the expected levels of photon shot noise). Hence, the overall gain for a homogeneous network is reduced from 30 to about 17 per cent at frequencies near …; for the current BiSON configuration, the expected gain at lower frequencies is about 2 per cent.
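Why common noise limits the gain can be seen from a simple illustrative model, which is our simplification and does not reproduce the figures above: n sites of equal quality share a coherent noise component of standard deviation $\sigma_{\rm c}$ on top of independent noise $\sigma_{\rm i}$. A normalized weighted mean averages down only the independent part, so the common part sets a floor. The values in the example are hypothetical.

```python
def full_overlap_gain(sigma_incoherent, sigma_common, n_sites):
    """s/n gain of an n-site full overlap when part of the noise is common.

    The incoherent parts (identical for all sites here) average down as
    1/n_sites; the common part is unchanged by a normalised weighted mean.
    """
    single = sigma_common**2 + sigma_incoherent**2
    merged = sigma_common**2 + sigma_incoherent**2 / n_sites
    return single / merged


print(full_overlap_gain(1.0, 0.0, 2))   # 2.0: no common noise
print(full_overlap_gain(1.0, 1.0, 2))   # 1.333...: common noise equal to incoherent
```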

Furthermore, we also recognise that the single-mode representation constitutes an over-simplification of the true picture in the frequency domain, i.e., the close frequency spacing of modes in the low-degree spectrum gives rise to a substantial amount of power in between the modes from the slowly decaying Lorentzian peaks. This is most prominent near the centre of the p-mode envelope, where the power from the wings of the modes is comparable to, or larger than, the incoherent background noise power arising from instrumental and atmospheric contributions.

The above quantitative calculations have been based upon various assumptions concerning the content of the real data, some of which, as has been pointed out, are questionable; they do, however, provide a useful guide to the expected quantitative gains made available by using all data from regions of overlap in the construction of a multi-station time series. When the weighted-merging technique is applied to real network data over the 2-month period covering 1994 July to 1994 August inclusive, there is no discernible signal-to-noise gain compared with a spectrum generated using the best-station-only overlap technique. This is consistent with the simulations performed above with artificial data, which indicated that only very small fractional gains should be expected. We do, however, remind the reader of the other benefits to be had from the overlaps generated by BiSON, i.e., the study of the solar continuum velocity noise; confirming the solar nature of large transitory or rapid excitation phenomena; and cross-checking timing and calibration between stations.

We have demonstrated that in order to produce substantial signal-to-noise gains by weighted merging (with a view to increasing the quality of long Fourier spectra generated from the coherent analysis of many-station data): the noise level of the data from sites comprising a given overlap should be comparable, i.e., an upgrade of the older BiSON spectrometers would clearly be beneficial; and there should be a large amount of overlap coverage during each observing day, which demands the presence of at least two (ideally more) observing sites in each 120-degree longitude band (i.e., observational redundancy). This would also clearly be desirable in terms of the requirement for a high data-fill, simply via the increased redundancy afforded by a greater number of network sites.

When the data from each station are not high-pass filtered, techniques must be used to smooth or reduce low-frequency steps between sites, since sharp discontinuities can produce extra Fourier components in the vicinity of the mode peaks. While the low-frequency content of the velocity residuals contains large amounts of noise due to atmospheric and instrumental contributions, it also contains signal of interest: the fundamental solar mode is expected to have a period of the order of an hour or more, and the motion of plages and sunspots across the visible solar disc gives rise to a measurable signal on a time scale of several days. Ongoing improvements to the low-frequency behaviour of our instrumentation are being made. However, more complicated methods are still required to merge residuals that retain very-low-frequency information; these will be discussed in a forthcoming paper in this series.
