The need for quasi-continuous coverage in the study of any periodic astronomical signal has long been recognized: gaps in the data introduce artefacts characteristic of the window function, a well-known phenomenon.
Quasi-periodic gaps are usually treated in time-series analysis by the introduction of zero-valued residuals. In a time series whose window function is dominated by a day-night cycle, power is then redistributed into a series of sidebands, which straddle the signal peaks in the frequency domain at multiples of 11.57 μHz (one cycle per day). This has two important consequences.
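The sideband mechanism can be illustrated with a short sketch. It assumes Python with NumPy and purely illustrative parameters (a 40-s cadence, a 10-day span, a 12-h daily duty cycle and a 3-mHz test sinusoid), none of which are taken from BiSON data. The gaps are filled with zero-valued residuals, exactly as described above, and the strongest peak after the main one is located.

```python
import numpy as np

# Assumed, illustrative parameters (not BiSON values).
cadence = 40.0                     # sampling interval, s
n_days = 10
t = np.arange(0.0, n_days * 86400.0, cadence)
span = n_days * 86400.0            # total duration, s (bin width = 1 / span)

k0 = 2592                          # put the signal exactly on bin k0: nu0 = 3.0 mHz
nu0 = k0 / span
signal = np.sin(2.0 * np.pi * nu0 * t)

# Day-night window: 12 h of data, then a 12-h gap of zero-valued residuals.
window = ((t % 86400.0) < 43200.0).astype(float)

power = np.abs(np.fft.rfft(signal * window)) ** 2
freqs = np.fft.rfftfreq(t.size, d=cadence)

main = int(np.argmax(power))
others = power.copy()
others[main] = 0.0                 # suppress the main peak to find the next-largest one
sideband = int(np.argmax(others))

offset_uHz = abs(freqs[sideband] - freqs[main]) * 1e6
ratio = power[sideband] / power[main]
print(f"main peak at {freqs[main] * 1e3:.3f} mHz")
print(f"largest sideband offset: {offset_uHz:.2f} uHz, relative power {ratio:.3f}")
```

With a 50 per cent daily duty cycle the largest sidebands sit 11.57 μHz either side of the main peak, each carrying a relative power close to (2/π)² ≈ 0.41; real window functions, with irregular daily coverage, give a more complicated pattern.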
First, power leakage from the main signal peaks reduces the signal-to-noise ratio in the modes. This matters most near the low-frequency end of the p-mode spectrum, where the observed signal-to-noise ratios in full-disc helioseismology data are very small, so the impact on mode detectabilities, and on attainable frequency uncertainties, is high.
Secondly, since over part of the p-mode spectrum the ℓ = 0, 2 spacing is of the order of the 11.57-μHz sideband spacing, the presence of sidebands can "pull" the fitted frequencies of modes, giving significant systematic frequency errors.
Various techniques can be applied in an effort to remove, or at least reduce, the effects of the window function. Perhaps the best known of these is the CLEAN algorithm (e.g., Roberts et al. 1987). In addition, given the assumed stochastic nature of the p-mode signal, a direct deconvolution (e.g., Lazrek & Hill 1993) can in principle also be performed on the data. Lazrek & Hill note that some of the assumptions used to justify this procedure might invalidate its use with real data. These authors present a series of simulations applied to a single, isolated p mode with a high signal-to-noise ratio; the real solar p-mode spectrum is of course far more complicated, with a wide range of signal-to-noise ratios observed over the p-mode domain. We are currently investigating the application of such techniques to realistic oscillation spectra, in particular for modes with low signal-to-noise ratios, and for p-mode fine-structure spacings near the 11.57-μHz sideband spacing.
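For readers unfamiliar with CLEAN, the following is a compact sketch of a one-dimensional CLEAN in the spirit of Roberts et al. (1987). It is not the authors' implementation. Assumptions: zero-mean residuals, gaps simply absent from the sample list, a uniform frequency grid starting at zero, and an arbitrary loop gain and iteration count. A direct O(NM) Fourier sum is used for clarity rather than speed. At each step the largest residual peak is modelled as a real sinusoid, whose complex amplitude a follows from D(ν_p) = a + a* W(2ν_p), and a fraction of its window pattern at +ν_p and -ν_p is subtracted.

```python
import numpy as np

def dft(t, x, freqs):
    """Dirty spectrum D(nu) = (1/N) sum_k x_k exp(-2 pi i nu t_k), by direct summation."""
    return (x[None, :] * np.exp(-2j * np.pi * np.outer(freqs, t))).mean(axis=1)

def clean_1d(t, x, d_nu, n_freq, gain=0.5, n_iter=50):
    """Sketch of a 1-D CLEAN for gapped data (after Roberts et al. 1987).

    t, x : sample times and zero-mean residuals (gaps simply absent)
    d_nu : frequency step; the grid is nu_p = p * d_nu, p = 0 .. n_freq - 1
    Returns the grid, the residual dirty spectrum and the CLEAN components.
    """
    idx = np.arange(n_freq)
    nu = idx * d_nu
    m = n_freq - 1
    # The spectral window W(nu) is needed on a grid twice as wide, so that
    # W(nu - nu_p), W(nu + nu_p) and W(2 nu_p) can be read off by index.
    window = dft(t, np.ones_like(x), np.arange(-2 * m, 2 * m + 1) * d_nu)
    zero = 2 * m                                     # index of W(0)
    resid = dft(t, x, nu)
    components = []
    for _ in range(n_iter):
        p = int(np.argmax(np.abs(resid[1:]))) + 1    # skip nu = 0
        w2 = window[zero + 2 * p]                    # W(2 nu_p)
        # Solve D(nu_p) = a + conj(a) W(2 nu_p) for the complex amplitude a.
        a = (resid[p] - np.conj(resid[p]) * w2) / (1.0 - abs(w2) ** 2)
        a *= gain
        # Remove the scaled window pattern of the component at +nu_p and -nu_p.
        resid = resid - (a * window[zero + idx - p]
                         + np.conj(a) * window[zero + idx + p])
        components.append((nu[p], a))
    return nu, resid, components

# Usage: a unit-amplitude sinusoid observed only in the first half of each day.
t = np.arange(0.0, 20.0, 0.02)          # days
t = t[(t % 1.0) < 0.5]
x = np.cos(2 * np.pi * 5.0 * t + 0.3)   # 5 cycles per day
nu, resid, comps = clean_1d(t, x, d_nu=0.01, n_freq=1001)
total = sum(a for _, a in comps)        # approaches 0.5 * exp(0.3i)
```

For an isolated, noise-free sinusoid lying on the grid, the summed components recover the true complex amplitude and the daily sidebands are removed from the residual spectrum. With many closely spaced, stochastically excited modes and low signal-to-noise ratios, the situation is far less clean, which is the regime the text is concerned with.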
Owing to the high-Q nature of the solar oscillatory modes, it should be possible to fill gaps with reliable estimates of the true data (Brown & Christensen-Dalsgaard 1990). However, because the low-degree p-mode spectrum is rich, i.e., contains many modes at fairly close intervals in frequency, Brown & Christensen-Dalsgaard note that the techniques they have investigated might not be wholly reliable. In addition, a reasonably high signal-to-noise ratio is required in the modes to give a good reconstruction of the missing signal, and as the fill of a time series of velocity residuals decreases, the signal-to-noise requirements for reconstruction become more severe. From the typical gap distributions observed in BiSON data, we find that by filling all gaps of … or less, we would improve our overall fill by … per cent. It is clear from the observed signal-to-noise ratios in long BiSON spectra that low-frequency (say, below …) and high-frequency (lower-Q) modes might be precluded from any gap-filling analysis. This may also be true for the … modes we have observed in full-disc BiSON data (Chaplin et al. 1995b; 1996b). Nevertheless, we are investigating possible solutions to the gap-filling problem.
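The fill-improvement estimate above amounts to simple bookkeeping on a data mask. A minimal sketch, assuming a boolean mask on a regular time grid and assuming that only gaps bounded by data on both sides could be reconstructed; the function name and the maximum gap length are hypothetical, and no BiSON numbers are implied.

```python
import numpy as np

def fill_after_gap_filling(mask, max_gap):
    """Duty cycle before and after (hypothetically) filling short gaps.

    mask    : boolean array, True where a usable residual exists
    max_gap : longest gap (in samples) that would be filled
    Only gaps with data on both sides are filled, since a reconstruction
    needs signal before and after the gap (an assumption of this sketch).
    """
    mask = np.asarray(mask, dtype=bool)
    filled = mask.copy()
    n = mask.size
    i = 0
    while i < n:
        if mask[i]:
            i += 1
            continue
        j = i
        while j < n and not mask[j]:
            j += 1                      # the gap occupies samples i .. j-1
        if i > 0 and j < n and j - i <= max_gap:
            filled[i:j] = True
        i = j
    return mask.mean(), filled.mean()
```

Applied to a real window function, the difference between the two returned fractions is the gain in fill that a reconstruction of all gaps up to `max_gap` samples would buy.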
In an effort to provide reliable, round-the-clock coverage of the Sun, some degree of redundancy is desirable in a network. With several network sites fairly evenly distributed in longitude, there will be many instances where data are collected simultaneously at more than one site. These regions of "overlap" give rise to a potential gain in signal-to-noise. It is this point we wish to address in detail below.
The principal aims of a multi-station network are to reduce the number, and length, of data gaps in a 1-dimensional time series constructed from the velocity residuals collected at all the sites, and to provide as much signal overlap between stations as possible. The presence of multi-station overlaps, i.e., stretches where data are collected simultaneously from more than one site, provides the opportunity to increase the signal-to-noise ratio in the modes by appropriately combining the overlap data from each site for inclusion in the final time series. The residuals in the overlap region must be properly weighted in order to take full statistical advantage of the extra data (see below).
Low-frequency offsets between stations must be removed to avoid discontinuities in the merged time series; we do this by high-pass filtering the velocity residuals (see below). Here, we concern ourselves with the statistical advantages of merging such data.
Let the appropriate weight (see below for discussion) for the overlap data from site j be $W_j$, and let there be n sites in the overlap (such that $\sum_{j=1}^{n} W_j = 1$). With no low-frequency offsets between sites, the data can be combined in the time domain, over the region of the overlap, to give a combined velocity residual $v_{\rm c}(t_i)$, according to

$$ v_{\rm c}(t_i) = \sum_{j=1}^{n} W_j \, v_j(t_i). $$
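The time-domain combination is a normalised weighted sum of simultaneous residuals, v_c(t_i) = Σ_j W_j v_j(t_i). A minimal sketch, assuming the overlap residuals are already on a common time grid and free of low-frequency offsets; the function name is hypothetical.

```python
import numpy as np

def merge_time_domain(residuals, weights):
    """Weighted combination v_c(t_i) = sum_j W_j v_j(t_i) over an overlap.

    residuals : array (n_sites, N) of simultaneous, offset-free residuals
    weights   : unnormalised weights; they are rescaled to sum to one
    """
    v = np.asarray(residuals, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ v          # sum over sites at each time sample
```

Passing inverse-variance weights, e.g. `sigmas ** -2.0` for per-site noise standard deviations `sigmas`, gives the weighting discussed further on in the text.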
The data can also be combined in the frequency domain. Here, the frequency spectrum for each site over the duration of the overlap must be calculated. The weighted Fourier components at a given frequency are combined from each site, and the combined residuals are then recovered by taking the inverse Fourier transform of the combined Fourier spectrum. This is the approach we have adopted with the solar data, since the inherent noise in the solar velocity residuals, and therefore the appropriate weighting factor, can be determined by assessing power levels in the frequency domain. We have performed a variety of rigorous tests in order to quantify any systematic errors introduced by applying Fourier techniques to short data strings to recover the merged signal; we find that these are of the order of …, i.e., at a level that introduces no significant artefacts into the data.
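A minimal sketch of the frequency-domain route, assuming equal-length overlap segments on a common grid and a single constant weight per site (the function name is hypothetical). Because the Fourier transform is linear, combining weighted Fourier components and transforming back reproduces the time-domain weighted sum; the practical attraction, as the text notes, is that the weights can be estimated from power levels in the same spectra.

```python
import numpy as np

def merge_frequency_domain(segments, weights):
    """Combine overlap data in the frequency domain and transform back.

    segments : array (n_sites, N) covering the same overlap interval
    weights  : one (unnormalised) weight per site
    """
    v = np.asarray(segments, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    spectra = np.fft.rfft(v, axis=1)                 # one spectrum per site
    combined = (w[:, None] * spectra).sum(axis=0)    # weighted Fourier components
    return np.fft.irfft(combined, n=v.shape[1])      # back to the time domain
```

Passing `n` explicitly to `irfft` ensures that odd-length segments are recovered at their original length.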
In order to quantify the potential gains from a weighted merge, we must consider some straightforward statistics, and make some assumptions concerning the content of the velocity residuals.
The p-mode signal "seen" by each instrument is assumed to be identical; this assumption is not strictly true, for the following reasons.
Consider two stations separated by several tens of degrees in longitude - any overlap between these sites will contain data collected during the early part of the day at one of the sites, and the latter part of the day at the other. The line-of-sight velocity of the Sun with respect to each observatory (the solar topocentric velocity) will therefore be different for each set of data in the overlap. Since the passbands of a basic resonance-scattering device are narrow compared to the width of the solar reference line, different parts of the photospheric line - and by implication, different depths in the solar atmosphere - will be sampled by each instrument. Since the amplitude of the p modes changes with height in the solar atmosphere, the measured signal amplitudes for each station in the overlap will differ.
In order to estimate the magnitude of this effect, we have used the calculated heights of formation of Underhill & Speake (1996). These authors determined the atmospheric heights of formation for the blue and red passbands of a BiSON resonance-scattering spectrometer as a function of the line-of-sight velocity of the Sun. We assume that the effective height sampled by the spectrometer is the average of those sampled by the blue and red passbands. Furthermore, we assume that the exponential variation of the mode velocity amplitude with height can be characterized by a velocity-amplitude scale height $H$. Underhill & Speake found $H \approx$ … from an analysis of BiSON data. Taking this value, the estimated fractional difference in the mode velocity amplitudes between the extreme line-of-sight velocities, 0 and …, is approximately -0.07. Since the difference in the line-of-sight velocities of "overlapping" sites is somewhat less than …, we estimate that the typical fractional amplitude difference in overlaps should be of the order of only 2 to 3 per cent.
The separation of the spectrometer passbands is fixed by the Zeeman splitting induced by a permanent longitudinal magnetic field imposed on the potassium reference vapour in the instrument - different field strengths between network instruments would therefore give rise to a similar (small) effect.
There is also an additional line-of-sight velocity effect, now known as Doppler imaging: the use of narrow spectral passbands when viewing the rotating Sun results in the "mapping" of different parts of the solar Fraunhofer line onto different parts of the visible solar disc (Brookes et al. 1978; van der Raay 1991).
We also assume that the predominant velocity noise source over the p-mode region has the characteristics of a normal distribution, and that the noise contributions are completely incoherent between stations. This does not fully apply when, for example, the solar background velocity noise is a substantial fraction of the total noise power (see later).
The non-p-mode noise will be made up of contributions from photon shot noise, instrumental noise, atmospheric noise, and the solar velocity background noise continuum. The first three contributions produce a combined noise background that is approximately Gaussian. However, the final contribution is a signal common to all stations in an overlap (allowing for some loss of coherence due to Doppler imaging). In addition, near the centre of the p-mode regime, there will be a large contribution from the slowly decaying Lorentzian wings of the modes, from the window-function sidebands, and from a small, diffuse high-ℓ background. For the basic equations given below, we shall consider a single, isolated p mode; in a later section, the full effects of this additional power will be discussed.
Finally, we assume that there are no large, low-frequency discontinuities between residuals from different sites. As previously noted, for the purposes of the 5-minute analysis, the BiSON residuals are high-pass filtered with a moving mean that cuts in just below …. If the differences between the signals observed at each station are assumed to be due purely to Gaussian-like noise, there is no need to employ techniques to "smooth" the joins between single-station and overlap parts of the time series.
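A moving-mean high-pass filter simply subtracts a running boxcar average. The sketch below assumes an odd, hypothetical window length in samples; the cut-in frequency actually used for BiSON is not recoverable from this text. A boxcar of duration τ strongly suppresses variations on time-scales much longer than τ, which is what removes slowly varying offsets between stations.

```python
import numpy as np

def moving_mean_highpass(v, width):
    """Subtract a centred moving mean of `width` samples (odd) from v.

    Near the ends the mean is taken over the samples that exist, so a
    constant offset is removed everywhere.
    """
    v = np.asarray(v, dtype=float)
    if width % 2 == 0 or width > v.size:
        raise ValueError("width must be odd and no longer than the series")
    box = np.ones(width)
    total = np.convolve(v, box, mode="same")                 # running sum
    count = np.convolve(np.ones_like(v), box, mode="same")   # samples in each window
    return v - total / count
```

Normalising by the number of samples actually present in each window, rather than by `width`, avoids spurious edge steps at the ends of each station's daily data string.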
We now develop straightforward signal-to-noise expressions for an n-station overlap consisting of N data points. These will be expressed in terms of the straight ratio of the fitted mode-peak power to the mean background noise power level.
First we derive an expression for the noise power from single-site data. Let the sample standard deviation of the "white" noise source, in the time domain, be $\sigma$. Its magnitude is determined by the distribution of noise-source velocities $v_i$: for N points in the time series

$$ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \left( v_i - \bar{v} \right)^2 . $$
If the time series of white noise has a zero mean level, then for N >> 1 the right-hand side of the above is simply the sum of the powers $P_i$ in each of the N/2 real bins of the Fourier transform of the data (Parseval's theorem). Therefore, the average power per bin in the frequency domain, due to the Gaussian noise source, is given by

$$ \bar{P} = \frac{2\sigma^2}{N} . $$
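The mean power per bin follows from the normalisation chosen for the spectrum. A minimal sketch, assuming a one-sided spectrum normalised so that the powers sum to the mean square of the data (Parseval's theorem); this normalisation is an assumption chosen to match the relation above, not a documented BiSON convention.

```python
import numpy as np

def one_sided_power(v):
    """Power per real frequency bin, normalised so that sum(P) = mean(v**2).

    With this (assumed) normalisation, white noise of variance sigma**2
    gives a mean power of about 2 sigma**2 / N per bin.
    """
    v = np.asarray(v, dtype=float)
    n = v.size
    p = 2.0 * np.abs(np.fft.rfft(v)) ** 2 / n ** 2
    p[0] /= 2.0                 # the zero-frequency bin has no negative-frequency twin
    if n % 2 == 0:
        p[-1] /= 2.0            # nor does the Nyquist bin for even N
    return p
```

For a seeded run of Gaussian noise with σ = 3 and N = 4096, the powers sum exactly to the mean square of the series, and the mean power over the interior bins lies within a few per cent of 2σ²/N.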
If the time series also contains some periodic signal of interest, and if this signal gives rise to a peak in the frequency spectrum of height $P_{\rm s}$, then we may express the signal-to-noise ratio, s/n, in the mode for the duration of the overlap according to

$$ {\rm s/n} = \frac{P_{\rm s}}{\bar{P}} = \frac{N P_{\rm s}}{2\sigma^2} . \qquad (2) $$
Now consider the weighted merging of residuals. The weights must be set appropriately, in order to take full statistical advantage of the extra data available in the overlap. Here, the correct weights are proportional to the inverse of the square of the sample standard deviations of the noise sources for each of the overlapping time series.
If there are n stations in the overlap, each with a white noise source characterized by $\sigma_j$ ($j = 1, \ldots, n$), and if the signal is common, then Eq. (2) will be modified to

$$ {\rm s/n} = \frac{N P_{\rm s}}{2} \sum_{j=1}^{n} \sigma_j^{-2} , \qquad (3) $$

assuming that the overlap persists for the full duration of the time series.
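Inverse-variance weighting reduces the noise variance of the merged overlap to 1/Σσ_j⁻², which is what raises the signal-to-noise ratio in the expression for the full-overlap case. A minimal numerical check, assuming three hypothetical station noise levels and a common (here, absent) signal; the function names are illustrative only.

```python
import numpy as np

def combined_sigma(sigmas):
    """Equivalent noise sigma of an inverse-variance-weighted overlap (cf. Eq. 4)."""
    s = np.asarray(sigmas, dtype=float)
    return 1.0 / np.sqrt(np.sum(s ** -2.0))

def snr(peak_height, n_points, sigma):
    """Peak-to-mean-background ratio N P_s / (2 sigma^2), cf. Eq. (2)."""
    return n_points * peak_height / (2.0 * sigma ** 2)

rng = np.random.default_rng(2)
sigmas = np.array([1.0, 1.5, 3.0])            # hypothetical station noise levels
noise = rng.standard_normal((3, 200_000)) * sigmas[:, None]
w = sigmas ** -2.0
merged = (w / w.sum()) @ noise                # inverse-variance weighted merge
print(merged.std(), combined_sigma(sigmas))   # the two agree closely
```

Two stations of equal quality give a combined σ smaller by √2, and hence twice the signal-to-noise ratio of either station alone.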
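Now consider the case where the overlap persists for some fraction, $f_{\rm o}$, of the total time series. Let the equivalent sample standard deviation of the site data present in the single-station parts of the time series be $\sigma_{\rm s}$. In addition, let the equivalent sample standard deviation for the overlap sections be $\sigma_{\rm o}$; this is, of course, given by

$$ \sigma_{\rm o}^{-2} = \sum_{j=1}^{n} \sigma_j^{-2} . \qquad (4) $$

The modified signal-to-noise ratio is then given by

$$ {\rm s/n} = \frac{N P_{\rm s}}{2\left[ (1 - f_{\rm o})\,\sigma_{\rm s}^2 + f_{\rm o}\,\sigma_{\rm o}^2 \right]} , $$

which, substituting Eq. (4), gives

$$ {\rm s/n} = \frac{N P_{\rm s}}{2\left[ (1 - f_{\rm o})\,\sigma_{\rm s}^2 + f_{\rm o} \left( \sum_{j=1}^{n} \sigma_j^{-2} \right)^{-1} \right]} . \qquad (5) $$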
A more realistic representation of a real network will consist of different overlap combinations and a variation in the quality of the single-station segments; under these circumstances, Eq. (5) must be expanded to a more general form in order to describe the resulting signal-to-noise ratio. Let there be a total of S single-station regions, each characterized by a fractional fill $f_{{\rm s},k}$ and sample standard deviation $\sigma_{{\rm s},k}$, such that $\sum_{k=1}^{S} f_{{\rm s},k} = 1 - f_{\rm o}$; in addition, let there be a total of O regions of overlap, with fractional fills $f_{{\rm o},l}$ and sample standard deviations $\sigma_{{\rm o},l}$, such that $\sum_{l=1}^{O} f_{{\rm o},l} = f_{\rm o}$. Here, each $\sigma_{{\rm o},l}$ is derived from the sample standard deviations of the sites in the considered overlap (cf. Eq. 4). The generalized signal-to-noise expression for an "inhomogeneous" time-series composition is then

$$ {\rm s/n} = \frac{N P_{\rm s}}{2\left[ \sum_{k=1}^{S} f_{{\rm s},k}\,\sigma_{{\rm s},k}^2 + \sum_{l=1}^{O} f_{{\rm o},l}\,\sigma_{{\rm o},l}^2 \right]} . \qquad (6) $$
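The generalized expression reduces to a fill-weighted mean noise variance in the denominator, so relative gains can be computed without N or the peak height. A minimal sketch, assuming the set-up of Fig. 1: the better station has σ = 1, the poorer one σ = ratio, and the single-station parts have the better station's σ. The function names are hypothetical.

```python
import numpy as np

def mean_noise_variance(fills_single, sig_single, fills_overlap, sig_overlap):
    """Fill-weighted mean noise variance: the bracket in the denominator of Eq. (6)."""
    fs, ss = np.asarray(fills_single, float), np.asarray(sig_single, float)
    fo, so = np.asarray(fills_overlap, float), np.asarray(sig_overlap, float)
    return np.sum(fs * ss ** 2) + np.sum(fo * so ** 2)

def two_station_gain(f_overlap, ratio):
    """s/n of a weighted merge relative to best-station-only (Fig. 1 set-up)."""
    sigma_o = 1.0 / np.sqrt(1.0 + ratio ** -2.0)     # cf. Eq. (4) for two sites
    merged = mean_noise_variance([1.0 - f_overlap], [1.0], [f_overlap], [sigma_o])
    best_only = mean_noise_variance([1.0], [1.0], [], [])
    return best_only / merged       # N and P_s cancel in the ratio of Eq. (6)

for f in (1.0, 0.5, 0.1):
    print(f, round(two_station_gain(f, 1.0), 3))
```

For two sites of equal quality this gives gains of 2, about 1.33 and about 1.05 for overlap fills of 100, 50 and 10 per cent respectively. Increasing `ratio` lowers each gain, reproducing the trend of the curves in Fig. 1.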
What do these equations imply concerning potential quantitative improvements? Figure 1 (click here) shows the signal-to-noise gains for a variety of two-station overlaps, and fractional overlap fills (fills indicated next to each curve). The fractional signal-to-noise gains are calculated with respect to a time series in which data from the better site only are used, and are plotted with respect to the ratio of the sample noise standard deviations of the two stations in the simulated overlap.
Figure 1: Fractional signal-to-noise improvements, as derived from Eq. (6), for a variety of two-station overlaps, plotted against the ratio of the sample noise standard deviations of the stations in the overlap. The fractional signal-to-noise gains have been calculated with respect to a time series in which data from the better station only are used. Each curve corresponds to a particular fractional overlap fill, as indicated on the plot. The sample standard deviation of the data in the single-station parts of the simulated time series is assumed to equal that of the better station.
A weighted combination of overlap residuals from two sites of similar quality will improve the signal-to-noise ratio, over the region of the overlap, by a factor of 2 (Fig. 1). If the fractional time for which this type of overlap occurs is reduced from 100 to 50 per cent (assuming the single-station parts of the time series to be filled with residuals of similar quality), the signal-to-noise increase is reduced to a factor of about 1.33. With a fractional overlap fill of only 10 per cent, the gain is only about 1.05. The expected gains for overlaps with more stations can be readily extrapolated from these results.
What if the stations in an overlap are of different quality? This is of particular relevance here, since the quality of data differs from site to site within BiSON. As Fig. 1 shows, the potential signal-to-noise gain in the overlap decreases as the quality of the poorer site deteriorates. Once the ratio of the sample noise standard deviations for the sites increases beyond about 1.7, the extra gain for all overlap fills shown in Fig. 1 will have halved (with respect to a two-site overlap in which the quality of the sites is the same). We now go on to consider the effect of merging on BiSON data, i.e., on a complex time series with variable station quality and gaps.
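The halving of the extra gain can be made explicit with a short worked derivation. It assumes the Fig. 1 set-up: the better station has noise σ₁, the poorer σ₂ = rσ₁, the single-station parts have σ₁, the overlap fraction is f, and the mean noise variance of the series is the fill-weighted sum of the single-station and overlap variances (the content of Eq. 5). It is a sketch under those assumptions, not a result quoted from the figure.

```latex
% Better station sigma_1, poorer station sigma_2 = r sigma_1,
% single-station parts with sigma_1, overlap fraction f.
\sigma_{\rm o}^2 = \frac{\sigma_1^2\, r^2}{1 + r^2}, \qquad
G = \frac{\sigma_1^2}{(1-f)\,\sigma_1^2 + f\,\sigma_{\rm o}^2}
  = \frac{1 + r^2}{1 + r^2 - f}, \qquad
G - 1 = \frac{f}{1 + r^2 - f}.
% Equal-quality sites (r = 1): G - 1 = f / (2 - f).
% The extra gain is halved when f/(1 + r^2 - f) = f/[2(2 - f)], i.e.
r^2 = 3 - f \quad\Longrightarrow\quad r = \sqrt{3 - f},
\qquad \sqrt{2} \le r < \sqrt{3} \quad (0 < f \le 1).
```

The threshold ratio therefore depends only weakly on the overlap fill, rising from √2 ≈ 1.41 for a complete overlap towards √3 ≈ 1.73 for very small overlap fills.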
Figure 2 indicates where multi-station overlaps were present during the calendar year 1994. Black pixels indicate no overlap; dark grey, the presence of a two-station overlap; light grey, a three-station overlap; and white, a four-station overlap. Most of the overlap hours are accumulated during the central UT band of each day, when the instruments at Sutherland, Izaña and Las Campanas are collecting data. The total fraction of time for which multi-station overlaps were present in the data for the full calendar year 1994 was …
Figure 2: Multi-station overlaps for 1994: black pixels indicate no
multi-station overlap or no data; dark grey a two-station overlap;
light grey a three-station overlap; and white a four-station overlap
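Overlap statistics of the kind shown in Fig. 2 can be tallied directly from per-station data masks. A minimal sketch, assuming boolean masks on a common time grid; the function name and the toy masks are hypothetical.

```python
import numpy as np

def overlap_statistics(masks):
    """Fill and multi-station overlap fractions from per-station data masks.

    masks : boolean array (n_stations, n_samples) on a common time grid,
            True where a station has a usable residual.
    Returns (fill, overlap_fraction, fraction_by_count), where
    fraction_by_count[k] is the fraction of time with exactly k stations.
    """
    m = np.asarray(masks, dtype=bool)
    count = m.sum(axis=0)                        # stations observing at each sample
    by_count = np.bincount(count, minlength=m.shape[0] + 1) / count.size
    return (count >= 1).mean(), (count >= 2).mean(), by_count
```

For three toy stations with masks `[1, 1, 0, 0]`, `[0, 1, 1, 0]` and `[0, 1, 0, 0]`, the fill is 0.75 and the multi-station overlap fraction is 0.25 (one sample with all three stations).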
As mentioned in the last section, the quality of the instrumentation varies between sites. The best-quality data are collected by the spectrometers at Sutherland, Las Campanas and Narrabri. Typical high-frequency noise powers for daily spectra generated from data collected at these sites are of the order of 2 to …. Data collected by the older instruments at Izaña, Carnarvon and Mount Wilson are characterized by somewhat larger high-frequency noise powers. (This disparity is due largely to lower counting rates.) The high-frequency noise powers of data collected at the older and newer sites can differ by as much as a factor of 10.
In order to assess the impact of weighted merging, we have taken the two-month BiSON window function for the period 1994 July to 1994 August inclusive. The fill of useful residuals for this period was quite high (… per cent), with an overlap fill of … per cent. A damped harmonic oscillator, excited at regular intervals by a "white" forcing term, has been used as a model for the p-mode signal. (Full details of this model can be found in Chaplin et al. 1996c.)
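For orientation, here is a generic sketch of a damped oscillator kicked by white noise at every sample. It is not the model of Chaplin et al. (1996c), whose details are given there. The 40-s cadence, 3-mHz frequency and 1-μHz linewidth used in the usage lines are assumed, illustrative values. The two-term recursion has poles at exp[(-η ± 2πiν₀)Δt], so its impulse response is a damped sinusoid and its spectrum a near-Lorentzian peak of full width η/π.

```python
import numpy as np

def simulate_mode(n, dt, nu0, linewidth, rng):
    """Damped oscillator kicked by white noise at every sample (illustrative).

    The recursion x_k = a1 x_{k-1} + a2 x_{k-2} + e_k has an impulse response
    exp(-eta t) times a sinusoid of frequency nu0, so its power spectrum is a
    near-Lorentzian peak of full width at half maximum eta / pi.
    """
    eta = np.pi * linewidth                      # damping rate, s^-1
    a1 = 2.0 * np.exp(-eta * dt) * np.cos(2.0 * np.pi * nu0 * dt)
    a2 = -np.exp(-2.0 * eta * dt)
    kicks = rng.standard_normal(n)               # "white" forcing term
    x = np.zeros(n)
    for k in range(2, n):
        x[k] = a1 * x[k - 1] + a2 * x[k - 2] + kicks[k]
    return x

# Usage with assumed parameters: 40-s cadence, 3-mHz mode, 1-uHz linewidth.
rng = np.random.default_rng(0)
dt = 40.0
x = simulate_mode(2 ** 16, dt, nu0=3.0e-3, linewidth=1.0e-6, rng=rng)
freqs = np.fft.rfftfreq(x.size, d=dt)
power = np.abs(np.fft.rfft(x)) ** 2
peak_freq = freqs[np.argmax(power)]
```

In a simulation like the one described in the text, a single realization of this signal would be shared by all sites, with independent noise added per site before the real window function is applied.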
This signal was assumed to be common to all sites over the duration of the real two-month window function. Appropriate quantities of Gaussian (normally distributed) noise were added to the simulated p-mode time series for each site, in order to match the observed relative noise levels. No allowance was made for the increased levels of noise, caused by atmospheric extinction, at the extremes of the daily data. We note, however, that the deterioration in quality of the real residuals is only small, since the regular day-to-day BiSON analysis excludes data collected at high air masses (greater than …). The artificial data, modulated by the real two-month window function for each of the stations, were then merged to give a single time series.
The weights for each of the stations in a given overlap were fixed according to the high-frequency power (from 8 to …) of the Fourier spectrum of the overlap data for the site. Assuming a Gaussian-like noise source, the mean high-frequency power per bin is proportional to the square of the sample standard deviation characterizing the noise source, so weights inversely proportional to this power are the inverse-variance weights discussed above. The simulated time series was then constructed with the weighted-merging technique; as a reference, a time series was also constructed in which overlaps were treated by taking data from the best-quality station only.
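A minimal sketch of weights estimated from high-frequency power, assuming equal-length overlap segments on a regular grid. The 8-mHz lower limit follows the text; the upper limit is lost from the source, so the usage below assumes the Nyquist frequency of an assumed 40-s cadence. Names and noise levels are hypothetical.

```python
import numpy as np

def hf_weights(segments, dt, f_lo, f_hi):
    """Weights proportional to 1 / (mean power between f_lo and f_hi).

    For Gaussian-like noise this mean power is proportional to sigma**2,
    so these are inverse-variance weights; they are normalised to sum to one.
    """
    v = np.atleast_2d(np.asarray(segments, dtype=float))
    freqs = np.fft.rfftfreq(v.shape[1], d=dt)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    power = np.abs(np.fft.rfft(v, axis=1)) ** 2
    w = 1.0 / power[:, band].mean(axis=1)
    return w / w.sum()

# Usage: two hypothetical sites with noise standard deviations 1 and 2.
rng = np.random.default_rng(4)
dt = 40.0                                            # assumed cadence, s
seg = rng.standard_normal((2, 8192)) * np.array([[1.0], [2.0]])
w = hf_weights(seg, dt, f_lo=8.0e-3, f_hi=0.5 / dt)  # upper limit: Nyquist (assumed)
```

For these two sites the weights come out close to 0.8 and 0.2, the inverse-variance values. Using a band well above the p-mode region keeps the solar signal from biasing the noise estimate.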
For a single, simulated p-mode, forced through the real window function, with relative station noise levels characteristic of the current status of BiSON, the signal-to-noise gain given by using the weighted-merging technique instead of the best-quality-station-only technique was about 4 per cent. This rather modest gain can be readily explained by considering the make-up of the overlaps in the 6-station time series.
About 80 per cent of all overlaps over this two-month stretch occur between sites with an old and a new spectrometer. Only 16 per cent involve at least two new instruments. The distribution of overlap types is similar for other two-month stretches. As we saw in the previous section (and with reference to the derived equations), a reasonable overall signal-to-noise gain requires the overlapping data to have comparable noise levels. The overwhelming number of overlaps between old and new sites consequently restricts any gains to a low figure. By upgrading the older spectrometers, so that all instruments would be of similar quality, the expected overall signal-to-noise gain would increase to about 30 per cent.
In deriving the above gains, we have considered the signal to be composed of a single mode, and we have neglected the common signal from the solar velocity noise continuum. At low frequencies, where this common signal is high (Elsworth et al. 1994a), the solar continuum constitutes a noise floor that merging cannot lower (the noise power due to the solar velocity continuum is comparable with the expected levels of photon shot noise). Hence, the overall gain for a homogeneous network is reduced from 30 to about 17 per cent at frequencies near …; for the current BiSON configuration, the expected gain at lower frequencies is about 2 per cent.
Furthermore, we also recognise that the single-mode representation constitutes an over-simplification of the true picture in the frequency domain, i.e., the close frequency spacing of modes in the low-degree spectrum gives rise to a substantial amount of power in between the modes from the slowly decaying Lorentzian peaks. This is most prominent near the centre of the p-mode envelope, where the power from the wings of the modes is comparable to, or larger than, the incoherent background noise power arising from instrumental and atmospheric contributions.
The above quantitative calculations have been based upon various assumptions concerning the content of the real data, some of which, as has been pointed out, are questionable; they do, however, provide a useful guide to the expected quantitative gains made available by using all data from regions of overlap in the construction of a multi-station time series. When the weighted-merging technique is applied to real network data over the two-month period covering 1994 July to 1994 August inclusive, there is no discernible signal-to-noise gain compared with a spectrum generated using the best-station-only overlap technique. This supports the simulations performed above with artificial data, which indicated that only very small fractional gains should be expected. We do, however, remind the reader of the other benefits to be had from the overlaps generated by BiSON, i.e., the study of the solar continuum velocity noise; confirming the solar nature of large transitory or rapid excitation phenomena; and cross-checking timing and calibration between stations.
We have demonstrated that in order to produce substantial signal-to-noise gains by weighted merging (with a view to increasing the quality of long Fourier spectra generated from the coherent analysis of many-station data): the noise level of the data from sites comprising a given overlap should be comparable, i.e., an upgrade of the older BiSON spectrometers would clearly be beneficial; and there should be a large amount of overlap coverage during each observing day, which demands the presence of at least two (ideally more) observing sites in each 120-degree longitude band (i.e., observational redundancy). This would also clearly be desirable in terms of the requirement for a high data-fill, simply via the increased redundancy afforded by a greater number of network sites.
When the data from each station are not high-pass filtered, techniques must be used to smooth or reduce low-frequency steps between sites, since sharp discontinuities can produce extra Fourier components in the vicinity of the mode peaks. While the low-frequency content of the velocity residuals contains large amounts of noise due to atmospheric and instrumental contributions, there is also signal content of interest. The fundamental solar mode is expected to possess a period of the order of an hour or more; the motion of plages and sunspots across the visible solar disc gives rise to a measurable signal on a time-scale of several days. Ongoing improvements to the low-frequency behaviour of our instrumentation are being made. However, more complicated methods are still required to merge residuals that retain very low-frequency information; these will be discussed in a forthcoming paper in this series.