
3. Discussion

3.1. Criticism

In some instances the cyclic nature of the FT + inversion process may render it unsuitable. The most intractable cases are those in which something happens at the scan ends which differs in physical process from that governing the general run of the scan. It may be that an undesirable amount of high-frequency component is needed in the filter to fit the extremities, leaving the centre of the baseline fluctuating too violently. Judicious pruning of scan ends (which may contain suspect data anyway) may solve the problem.

But what about buried lines or faint broad lines? Or just plain faint lines which have not been patched? It is fair criticism that such line amplitudes will be reduced: the unseen signal was not ``patched out", and the filter process has indeed removed some of the low-frequency components which would have contributed to their amplitudes. However, no objective technique is going to correct for unseen features; and if features are strong enough to recognize or sense, they are strong enough to patch. The signal reduction which results from not patching very weak lines, signal which is known statistically, can be assessed by Monte-Carlo analysis (e.g. Wall et al. 1982).
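Such a Monte-Carlo check is straightforward to sketch. The following is illustrative only - it is not the procedure of Wall et al. (1982); the line width, amplitude, noise level and number of retained components are arbitrary choices, and a sharp component cut-off stands in for the Gaussian-tapered filter of the method:

import numpy as np

def lowpass_baseline(scan, n_keep):
    """Keep only the lowest n_keep Fourier components as the baseline."""
    Y = np.fft.rfft(scan)
    Y[n_keep:] = 0.0
    return np.fft.irfft(Y, n=len(scan))

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 1024, endpoint=False)       # scan of unit length
sigma, amp, noise_rms, n_keep = 0.01, 0.05, 0.02, 8    # faint, unpatched line

losses = []
for _ in range(500):
    line = amp * np.exp(-0.5 * ((x - 0.5) / sigma) ** 2)
    scan = 1.0 + line + rng.normal(0.0, noise_rms, x.size)
    residual = scan - lowpass_baseline(scan, n_keep)    # line left unpatched
    losses.append(1.0 - residual[x.size // 2] / amp)    # fractional amplitude loss

print(f"mean fractional amplitude loss of the unpatched line: {np.mean(losses):.2f}")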

Techniques of polynomial or spline fitting, or of heavy smoothing, are generally less objective than Fourier analysis. They do not avoid the signal-reduction difficulty, and in each case some form of ``patching" and some treatment of the scan ends will still be required.

Figure 5: Part of the set of distorted continua on which signals are placed to carry out the error analysis. Each continuum of unit length consists of a Gaussian of height b added at the mid-point of a base of unity. The 5 values of b are 4.0, 2.0, 1.0, -.25 and -.50. The calculations were carried out with 6 families of these baselines, the Gaussians having half-widths of 0.250, 0.395, 0.628, 0.995, 1.577 and 2.500 scan-lengths. The first three of these are shown above

3.2. Error analysis

It is difficult to make a comparative error analysis; the numerous and ill-disciplined ways in which baselines or continua are normally derived do not provide standard models against which comparison can be made. One of the advantages of the technique described above is that it enables a formal estimate of the error in signal and/or equivalent width which results from continuum assessment.

The minimum-component technique results in perfect fits and perfect flux measurement (except for noise) if the baseline is linear. It is only when signal sits on bumps or in hollows that error results. To carry out an analysis of such errors, a simple model of a continuum and a signal based on Gaussians was adopted. The geometry, shown in Fig. 5 and Fig. 6, consists of Gaussian signals of dispersion σ_s sitting at the maxima or minima of continua built of a unit dc level plus a centered Gaussian of height b and given FWHM. The data stream is of unit length and the Gaussian signal is of height h(b+1), i.e. h is the ratio of the signal height to the centrepoint of the continuum.
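For concreteness, the test geometry can be written down in a few lines; the grid size and the particular b, FWHM, h and σ_s values below are arbitrary choices within the ranges quoted for Figs. 5-7, not values taken from the paper:

import numpy as np

def gaussian(x, centre, fwhm):
    """Unit-height Gaussian; FWHM = 2*sqrt(2 ln 2)*sigma."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)

x = np.linspace(0.0, 1.0, 1024, endpoint=False)    # data stream of unit length

b, fwhm_cont = 1.0, 0.628        # one member of the Fig. 5 continuum family
h, sigma_s   = 0.5, 0.01         # signal: height h*(b+1), dispersion sigma_s

continuum = 1.0 + b * gaussian(x, 0.5, fwhm_cont)                # unit dc level + bump
signal    = h * (b + 1.0) * np.exp(-0.5 * ((x - 0.5) / sigma_s) ** 2)
scan      = continuum + signal                                   # "observed" data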

Figure 6: Panel a) shows a Gaussian signal of dispersion σ_s = 0.01 sitting at the maximum of the member of the distorted-continuum family of Fig. 5 with b = 1.0, FWHM = 0.395. The true baseline is the solid curve and the minimum-component baseline selected to give a 1% error is the dash-dot curve. The signal amplitude is h(b+1). Panel b) shows the total curve with a patch at ±2σ_s, while panel c) is the resultant baseline after filtering the patched curve of b) by limiting the Fourier components in the reconstruction of the patched approximation as described. Panels d) and e) are panels a) and b) with noise of rms = 0.15h added for verisimilitude, while panel f) shows the entire scan length with the minimum-component continuum. In this example the error in equivalent width with the minimum-component baseline is 13%; if σ_s is doubled to 0.02, this error rises to 50%

In this (or any) model of the continuum the sources of error in signal measurement are the following.

  1. Badly-fitted continuum. If the continuum contains distortions whose scale-length approaches that of the signal, the problem of signal measurement becomes hopeless. In astrophysical assessment, broad continuum features of this type may be due to a) poorly corrected instrumental response, or b) underlying signal features which are intrinsically broad, such as the blue bump or the iron-line complexes in QSO spectra, or very broad line components. In these instances it is essential to decide what is to be measured and to design the baseline accordingly.

    In the current methodology, the baseline may be badly fitted if too few Fourier components are allowed to take part in the assembly of the filtered baseline - we are trying to use the minimum number of components that produces an adequate approximation, so as to avoid including noise or signal. The difficulty, as illustrated in Fig. 6, is that in the presence of noise this badness of fit may go unrecognized.

  2. Deviation due to the patch. In the worst-case situation shown in Fig. 6, patching across the region on which the signal sits exacerbates the poorness of fit of the baseline. The subsequent filtering which smooths the patch merely serves to drop the baseline yet further away from the true level. The larger the patch in comparison with the scale of the baseline distortion, the larger this imposed deviation; moreover, the more false signal from the ``peak" of the continuum is included as signal. (The patch-and-filter step itself is sketched in the code fragment following this list.)
  3. The summing of flux. The region over which the flux is summed is vital and must be constrained. Too large a region drives the baseline error to dominate the flux error totally. (The problem is well recognized in stellar photometry, in which it is customary to measure out to between 80 and 90 percent of the projected light, i.e. down the Gaussian to ~2σ, and to do so for both standard stars and programme stars. But this is only applicable if the profiles or light distributions are identical for programme and standard stars.)
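A minimal sketch of the patch-and-filter step referred to in item 2 above; the straight-line patch, the sharp component cut-off and the component count are simplifying assumptions (the method itself uses a Gaussian taper rather than a sharp cut):

import numpy as np

def patch_and_filter(x, scan, centre, half_width, n_keep):
    """Patch linearly across [centre - half_width, centre + half_width], then
    keep only the lowest n_keep Fourier components as the baseline estimate."""
    patched = scan.copy()
    inside = np.abs(x - centre) <= half_width           # region occupied by the signal
    lo, hi = np.where(inside)[0][0], np.where(inside)[0][-1]
    patched[inside] = np.interp(x[inside], [x[lo - 1], x[hi + 1]],
                                [scan[lo - 1], scan[hi + 1]])   # straight-line patch
    Y = np.fft.rfft(patched)
    Y[n_keep:] = 0.0                                    # minimum-component selection
    return np.fft.irfft(Y, n=len(scan))

# e.g., with the model scan of the earlier sketch (patch of +/- 2 sigma_s):
# baseline = patch_and_filter(x, scan, centre=0.5, half_width=2 * 0.01, n_keep=8)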

Note that calculating equivalent width exacerbates the errors: the excess flux due to the baseline error increases the measured equivalent width over the true value, while the baseline error itself depresses the continuum estimate and so also increases the measured equivalent width over the true value.
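Schematically, with C the true continuum level, F the true line flux, W the width over which the flux is summed, and Δ the fractional amount by which the adopted baseline sits below C (notation introduced here purely for illustration):

EW_true = F/C,        EW_meas ≈ (F + Δ C W) / [(1 - Δ) C]  >  EW_true   for Δ > 0;

the spurious flux in the numerator and the depressed continuum in the denominator both act to raise the measured equivalent width.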

Even with the simplistic model of Fig. 5 and Fig. 6, there is a large parameter space. The estimates carried out here are representative only, but provide a guide from which to determine approximate errors in most such analyses. To restrict the parameter space, two assumptions were made:

  1. The width of the patch was taken to be the same as the width over which the flux in the signal was summed.
  2. ``One percent" baselines were adopted; for each member of the family of baselines in Fig. 5, just enough width was allowed in the filter to include sufficient components so that, when the true baselines were reconstructed after filtering, the maximum difference from the true baseline was one percent. Experience suggests that this is realistic to pessimistic. The situation is shown in Fig. 6. (A sketch of how such a filter width can be chosen follows this list.)
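A minimal sketch of how such a ``one percent" filter width might be chosen; again a sharp component cut-off stands in for the Gaussian taper, and the tolerance is applied to the signal-free (true) baseline exactly as described in assumption 2:

import numpy as np

def components_for_tolerance(true_baseline, tol=0.01):
    """Smallest number of Fourier components whose reconstruction of the
    signal-free baseline deviates from it by at most the fraction tol."""
    Y = np.fft.rfft(true_baseline)
    for n_keep in range(1, len(Y) + 1):
        Yk = Y.copy()
        Yk[n_keep:] = 0.0
        recon = np.fft.irfft(Yk, n=len(true_baseline))
        if np.max(np.abs(recon - true_baseline) / true_baseline) <= tol:
            return n_keep
    return len(Y)

# e.g., with the `continuum` array of the earlier sketch:
# n_keep = components_for_tolerance(continuum, tol=0.01)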

Measured and true equivalent widths were calculated for Gaussian signals centered on baselines in the family of Fig. 5. This may be done analytically because of the additive nature of Fourier transforms: for the patched/filtered scan, the transform of two truncated Gaussians (regions A and C) separated by a rectangular region (B, corresponding to the patch) is multiplied by the filter function, and reverse transformation then yields the signal error directly, with the signal flux measured out to ±2σ_s about its centroid and compared with the true flux.

In practice it was simplest to use the FT system set up to analyze the real data (Laing et al. 1994 and in preparation). Checking such calculations is simple because there are at least three cases in which geometrical analysis gives a close approximation. Consider the following two examples.

  1. The signal is very narrow in comparison with the scale-length (FWHM) of the baseline distortion, so that the baseline appears quasi-flat to it. For the geometry described in Figs. 5 and 6, the true equivalent width follows directly from the ratio of the signal flux to the local continuum level. In this case the error on the baseline due to filtering (inclusion of too few components) dominates: if this error is Δ, the ratio of measured to true equivalent width follows from the extra flux and the depressed continuum which Δ implies, the flux being summed over ±2σ_s about the signal centre. (A plausible reconstruction of these expressions is given after these examples.)

  2. If the signal is broad in comparison with the scan distortion scale-length, then the filtering of the patch has minimal effect. When it has no effect, the algebra of two Gaussians - a narrower one sitting centered on the broader one - gives the measured signal, for measurement over ±2σ_s, as the true signal flux plus two terms representing the area in the top of the Gaussian distortion which is included by virtue of the patch at ±2σ_s. The measured equivalent width then follows on dividing by the adopted continuum level.
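For the narrow-signal limit of example 1, one plausible reconstruction of the expressions referred to above - not taken from the original, and assuming the flux is summed over ±2σ_s while Δ denotes the fractional amount by which the filtered baseline sits below the true continuum - is

EW_true ≈ erf(√2) √(2π) h σ_s ,

EW_meas / EW_true ≈ [1 + 4Δ σ_s (b+1) / F_true] / (1 - Δ) = [1 + 4Δ / (erf(√2) √(2π) h)] / (1 - Δ) ,

where F_true = erf(√2) √(2π) h (b+1) σ_s is the signal flux within ±2σ_s of the centroid, (b+1) is the local continuum level, and the term 4Δ σ_s (b+1) is the spurious flux picked up between the true and the depressed baselines across the summing window.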

Figure 7: Ratio of measured equivalent width to true equivalent width against σ_s, the signal dispersion in units of scan length, for a Gaussian signal sitting at the peak of a continuum whose Gaussian FWHM = 0.628 and b = 1.0. The baseline has been estimated as described in the text, using a filter width to produce (in the absence of signal) a 1% deviation of the filtered baseline from the true baseline. The patch width and the width over which the flux is measured is ±2σ_s. The height of the signal Gaussian is h(b+1), and the curves are computed for different signal strengths with values of h as shown

The computed results (with which these approximations agree) are shown in Figs. 7 to 11. In Fig. 7, the error dependence on width of signal is shown for a given member of the continuum family of Fig. 5. For small and narrow signals, the error is almost totally due to the 1% baseline deviation. The error drops slightly as signal width increases, the total flux of the signal increasingly dominating the extra (spurious) signal. But as signal width increases further, the error rises rapidly as the patching process includes some of the peak of the Gaussian scan distortion in the total flux estimation. For strong signals the situation is completely different: at small σ_s the estimate is accurate, as the 1% baseline error is negligible. But as σ_s increases, the ratio drops below unity because the ±2σ_s patch is not at the base of the signal; the excess signal now pulls up the (relatively weak) baseline, and this overestimate of the baseline dominates the error in measured equivalent width. Hence for very strong sources, as σ_s increases, the equivalent width becomes progressively underestimated. These effects are in general identifiable, depending on the signal-to-noise ratio (Fig. 6).

Figure 8: Ratio of measured to true equivalent width versus width of signal in units of scan-length. The height parameter of the Gaussian signal (total height = h(b+1)) is fixed at h = 0.5, and the patch width is fixed at ±2σ_s. The scale-length of the Gaussian distortion on the unit dc baseline is varied: FWHM = 0.25 scan-length, triple-dot-dash lines; FWHM = 0.395, solid lines; FWHM = 0.628, dashed lines; FWHM = 0.995, dot-dash lines; and FWHM = 1.577, dotted lines. The amplitude of the distortion is varied, with b taking five values, -0.25, -0.5, 1.0, 2.0 and 4.0, marked against the curves. There is a small asymmetry of 0.5% about the value of 1.0, a second-order effect due to the patch pulling the filtered baseline towards it

Figure 9: As for Fig. 8, except that the vertical scale is much expanded to show what happens for the continua of lesser distortion. The curves are for Gaussian distortions on unit baselines of FWHM = 0.395 scan length, solid lines; FWHM = 0.628, dashed lines; FWHM = 0.995, dot-dash lines; FWHM = 1.577, dotted lines; and FWHM = 2.50, triple-dot-dash lines

In Fig. 8, a fixed value of h, signal height, was adopted, and the error is shown again as a function of signal width. Here the effects of the different distortions of baseline are shown, i.e. a representative set of the continuum family of Fig. 5 is introduced. The central gap is the effect of the 1% baseline fit; for a convex scan (positive b) it is to overestimate the equivalent width, while for a concave scan (negative b), it is to underestimate. For smaller scale-length and for the larger b values, the measured equivalent width deviates rapidly from the true value because of error in the continuum estimate and (more important) the inclusion of baseline in the signal estimate. These effects diminish with decreasing b and with increased FWHM. The error is drastically less for members of the family which curve gently, and Fig. 9 shows scales expanded about the central gap to illustrate the dependence of errors in these situations.

Figure 10: Ratio of measured equivalent width to true equivalent width as a function of patch width. The Gaussian signal, of standard deviation σ_s = 0.01 or 0.02 as marked, is positioned at the crest of the distorted baseline with b = 1.0 and FWHM = 0.628. The dot-dash curves are for signal height h = 0.1, the dotted curves for h = 0.5, and the solid curves for h = 10.0. The minimum components have been chosen to give a 1% maximum baseline error

Figure 11: Ratio of measured equivalent width to true equivalent width as a function of distance along scan. Previous error curves have all been computed with the signal centered at the max/min of the baseline. Here the Gaussian signal is moved along the scan from the edge to the centre (at 0.5). The single continuum model adopted has a Gaussian of height b = 1.0, FWHM = 0.628 sitting on a dc level of 1.0. The curves shown are for Gaussian signals of h = 0.1 (solid lines), 0.5 (dashed lines), 1.0 (dash-dot lines), and 5.0 (dotted lines). Patch width and flux measurement is over ±2σ_s in each case. The two lines for each signal height are for σ_s = 0.02 (upper) and 0.01. The minimum components have been chosen to give a 1% maximum baseline error

Figure 12: The spectrum of 3C 67 (Laing et al. 1994) to which bandpass filtering has been applied. The lower panel shows the result of removing the low-frequency components (i.e. the continuum) by the process described above, and of removing the highest-frequency components to improve the signal-to-noise ratio

Figure 13: A portion of the optical spectrum of 3C 191 showing the broad emission line of MgII at λ2800 Å (rest frame), cut by a narrow absorption line. The upper panel shows the emission line together with a baseline determined as for 3C 47; the lower panel shows a ``baseline" in which a broader width was adopted for the Gaussian taper, constructing a ``baseline" from many more low-frequency components. From these two ``continuum" estimates, parameters (equivalent width, etc.) of both the absorption and the emission line could be measured

Figure 14: The spectrum of 3C 48, with baseline assessment by a fully objective procedure. The smooth lines show baseline iterations: 1 - dotted; 2 - dot-dash; 20 - dashed; 150 - full

Figure 10 illustrates the effect of varying the patch width. For the weaker sources (h = 0.1 and 0.5) the equivalent width is consistently overestimated for convex baselines. The overestimate increases dramatically and monotonically with patch width, because the patch lowers the effective continuum used to estimate the equivalent width while increasing the amount of continuum erroneously included in the measurement of signal. For strong sources the ratio becomes insensitive to patch width, as the continuum appears almost flat to the source. The dominant effect is for short patches: as the patch width shrinks below the ±2σ_s over which the flux is summed, signal is underestimated, while the patch sits relatively high on the signal, raising the estimated continuum significantly with respect to the true continuum. Both effects reduce the measured equivalent width.

The following points emerge from the error analysis.

Two things must be borne in mind when using the present results to estimate errors on equivalent widths. Firstly, the worst-case situation has been examined, in which the signal sits at the point of maximum curvature, at the centre of the continuum models. Figure 11 shows that for the Gaussian model continua the magnitude of the errors is on average about half these maximum values, the error on the near-linear flanks of the Gaussian being close to zero. Secondly, the curves have been computed on the basis of ``1%" continuum fits. These too are pessimistic; experience shows that the minimum-component baselines are generally more accurate than this.

The analysis indicates how rapidly errors in equivalent widths can escalate with non-linear continua, even when the procedures for continuum assessment and signal measurement are well defined. When lines with yet broader wings are involved, the errors produced will be substantially greater. The analysis goes some way towards explaining why estimates of line fluxes in the literature can differ by a factor of two, even at reasonable signal-to-noise.

3.3. Further possibilities

Harmonic analysis for scans (data-series) is powerful and versatile. Having assessed the harmonic content via an FT, formal techniques can be developed - on the basis, e.g., of known instrumental parameters - to apply low- and high-frequency filtering to the data automatically, both to remove (or assess) the continuum and to improve the signal-to-noise ratio. This is bandpass filtering, and an example is shown in Fig. 12. There are further uses of the methodology: Fig. 13 shows as an example the evaluation of the equivalent width of an absorption line in the midst of a strong emission line.
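A minimal sketch of such a bandpass filter; the cut-offs n_low and n_high are arbitrary here, and in practice would be set from the continuum scale-length and the instrumental resolution as described above:

import numpy as np

def bandpass(scan, n_low, n_high):
    """Zero the lowest n_low Fourier components (the continuum) and everything
    above n_high (the noise), then transform back."""
    Y = np.fft.rfft(scan)
    Y[:n_low] = 0.0       # remove (or assess) the continuum
    Y[n_high:] = 0.0      # suppress high-frequency noise
    return np.fft.irfft(Y, n=len(scan))

# e.g. filtered = bandpass(scan, n_low=8, n_high=200)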

In addition, the technique advocated here may be automated if some assumptions about the signal are made. For instance, if all signals are unresolved, an iterative procedure may be developed which uses the differences between a first approximation to the baseline and the original data to decide upon regions to patch; patch widths are then simply the instrumental resolution width. A second iteration is then carried out with the new baseline to decide whether additional regions need patching. For the more general case in which the signal is resolved, different algorithms may be appropriate. Figure 14 shows an example. As before, a ``baseline array" is formed, from which a baseline is constructed from the few lowest-spatial-frequency components. In the first instance the baseline array is set equal to the data array, and the first iteration consists of forming a baseline from the lowest-spatial-frequency components. Each subsequent iteration consists of finding the largest difference between the previous baseline and the data array, and then replacing the data in the baseline array, within ± one half-width of the instrument profile about this point, with the data from the previous baseline iteration. Each subsequent iteration is required to replace a different region. The algorithm is very inefficient, but effective for virtually all the spectra tried so far, with the exception of spectra for which very strong broad emission/absorption lines occur at the ends of the scans - but almost all procedures struggle under this circumstance. Many different algorithms could be adopted to improve efficiency: broader patches, a line-list as a starting point, etc. Such procedures resemble the CLEAN technique (Högbom 1974) used in radio-astronomy synthesis mapping. (A sketch of the iteration is given below.)
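A compact sketch of this iteration follows; the component count and the iteration count are placeholders, and the simple low-pass reconstruction again stands in for the Gaussian-tapered minimum-component filter described earlier:

import numpy as np

def lowest_component_baseline(arr, n_keep):
    """Baseline built from the n_keep lowest spatial-frequency components."""
    Y = np.fft.rfft(arr)
    Y[n_keep:] = 0.0
    return np.fft.irfft(Y, n=len(arr))

def iterative_baseline(data, half_width_pix, n_keep=8, n_iter=150):
    """CLEAN-like automated baseline: repeatedly replace the worst-fitting
    region of the baseline array with the previous baseline estimate."""
    baseline_array = data.copy()
    baseline = lowest_component_baseline(baseline_array, n_keep)
    replaced = np.zeros(len(data), dtype=bool)
    for _ in range(n_iter):
        resid = np.abs(data - baseline)
        resid[replaced] = 0.0                     # demand a new region each iteration
        worst = int(np.argmax(resid))
        lo = max(0, worst - half_width_pix)
        hi = min(len(data), worst + half_width_pix + 1)
        baseline_array[lo:hi] = baseline[lo:hi]   # patch with the previous baseline
        replaced[lo:hi] = True
        baseline = lowest_component_baseline(baseline_array, n_keep)
    return baseline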

Acknowledgements

I am grateful to Robert Laing, Charles Jenkins and Steve Unger for permission to use data before publication, and to Pierre Maxted for supplying me with the digitized version of the observation of RZ Cas. I appreciated helpful comments on drafts by David Carter, Charles Jenkins and a referee.

