2. The spatial and spectral MEM algorithm

2.1. The concept

Although the data used in subsequent sections were obtained with a linear array of three antennas as noted above, our ultimate goal for the algorithm is to apply it to data from the 5-element solar interferometer at Owens Valley Radio Observatory (OVRO). This array currently consists of two 27-m antennas and three 2-m antennas with frequency-agile receivers, and is capable of generating 2-dimensional maps of solar sources at 45 frequencies in the range . See Gary & Hurford (1994) for an example of observations with the complete five-element array. The algorithm is thus being developed to handle fully three-dimensional data (one spectral and two spatial dimensions). Since this is the more general case, in this subsection we conceptually describe the complete 3-d algorithm. We implement and apply it to the simpler case of one spatial dimension in subsequent sections.

A typical interferometer array measures the spatial Fourier components of the brightness distribution at each frequency, , and the observations consist of data pairs of flux amplitude, , and phase, , as a function of frequency, , and baseline, b. Each data pair represents one point in a complex visibility plane:

The image of the observed radio source or its ``true" map, , as a function of spatial position and frequency has to be positive everywhere (continuum emission), and is related to the data via a Fourier transform.

with
eqnarray228

eqnarray234
where k is the Boltzmann constant, x and y are the spatial coordinates and and are the corresponding spatial frequencies, which are functions of observing frequency, , and projected baseline lengths, and . For spatially 1-d data (v=0), this relation reduces to

The measurement consists of a finite number of generally unevenly spaced points in the Fourier domain. To reconstruct an image of the radio source, unmeasured Fourier components have to be taken into account. If they are simply set to zero, the resulting ``dirty map" will show strong sidelobes, which can make it difficult to interpret the image. Thus, our objective is to approximate the true image with more likely values than zero for the unmeasured components, but without violating the data constraints; a general discussion of the imaging problem for radio data can be found in Christiansen & Hogbom (1985) and Cornwell & Braun (1989). Since the antennas of the OVRO solar array are equipped with frequency-agile receivers and existing algorithms such as CLEAN (Högbom 1974) or standard MEM do not take advantage of the spatial information available at adjacent frequencies, we wish to develop an algorithm that obtains a global solution to the visibilities in both the spatial and spectral domains.

For this task, following the usual procedure for a MEM algorithm, we chose a maximum entropy method which allows us to reconstruct a temperature map consistent with the data with the least amount of added a priori information or with the fewest assumptions. Narayan & Nityananda (1986) give a good description of MEM and its application to radio data, for a more general description of MEM see, for example, Bevensee (1993). The standard least-squares MEM ensures that the resulting image is spatially smooth and that it is positive everywhere. We add to it the information or assumption that a spectrum at a given spatial position changes only smoothly with frequency. Thus, our goal can be defined as finding the temperature map, , which maximizes the following objective function:

where and are the spatial and the spectral entropy terms, is the data constraint, and are two (adjustable) Lagrange multipliers and n is the iteration index.

As in standard MEM, the data constraint is expressed as a least-squares term

where and are the measured data and and are calculated from the iterated temperature map using Eqs. (3) and (4), is the number of frequencies and is the number of baselines. The squared deviation is weighted by the measurement error .

The spatial MEM term is defined, again as in standard MEM, as follows

with map temperature, T, being a function of spatial position and frequency.

The traditional least-squares MEM maximizes the spatial entropy term, , subject to the data constraints, . In our modified algorithm, we include a spectral entropy term, , which we define in analogy to the spatial term

with

The interpolated temperature, , is a function of spatial position and frequency and is logarithmically interpolated from the temperatures at the two neighboring frequencies and at the same spatial position

This can be alternatively written as
eqnarray331
or further simplified to

In our implementation, we use Eq. (13) to calculate and as can be easily seen in Eq. (14) the exponents are only frequency dependent and can be calculated outside the iteration loop. The difference (Eq. (11)) is small if the map is smooth in frequency and large otherwise, thus ensuring spectral smoothness. Since this difference can be exactly zero, we added a constant of unity so that is zero when equals .

To maximize the objective function we have to calculate its derivative with respect to the temperature, T, at each spatial position and frequency. This equation is the central part of the algorithm. We like to point out that throughout the rest of this paper, as in the actual algorithm, we use an equivalent formalism and minimize the ``negentropy", -S, instead of maximizing the entropy.

The data constraint derivative has the following form
eqnarray382
with and being the pixel size in the x- and y-direction. For 1-d data (v = 0), the data constraint derivative reduces to
eqnarray396

The derivative of the spatial entropy term is simply

The derivative of the spectral entropy term consists of three terms similar to Eq. (18) since T contributes to , , and . The terms and are easily derived from Eq. (14). The superscript (n) is omitted for clarity.
eqnarray426
with

eqnarray443

eqnarray455
where is one frequency up from and is one frequency down from . This means that while is determined from its nearest two neighbors in frequency, is calculated using the nearest four neighbors. At the ends of the frequency range, the highest and lowest frequencies, and , have to be treated separately. The ratios of logarithms of frequencies in Eqs. (21) and (22) are about and are easily extrapolated for the highest and the lowest frequencies by assuming that the spacing between frequencies stays the same. At the upper boundary, we substitute the values of and at for the ones at the frequencies and , and at the lower boundary, we substitute their values at for the ones at the frequencies and .

In standard MEM, the choice of Lagrange multiplier is problematic, and generally must be done with trial and error for a particular application. Our algorithm compounds the problem by introducing an additional parameter . We now discuss the choice of these Lagrange multipliers for this application. The reconstruction starts with an initial temperature map, such as a ``flat" map or a model of the source, and ends with a final map which is the reconstructed image. The Lagrange multipliers, and , have to balance the three terms of in Eq. (15). The spatial entropy term, , is always positive, while the spectral entropy term, , can be either positive, negative or zero. The data term, , is negative if the initial map is a flat map. Thus, the temperature, T, will increase if is negative and decrease if is positive. The initial map is chosen to be smooth everywhere. Therefore, the spectral entropy term will be zero at the first iteration and will remain very small at the first few iterations, and, as a consequence, can be determined from the initial map and the data constraint.

The parameter, , has to be small enough to allow the temperature to increase from the initial value at any point in the map, where the source of interest is present, while should also be large enough to ensure that the temperature stays at the initial value or increases only slightly at other points. If the initial map is a flat map characterized by a single temperature , this requirement leads to the following relation

for all points in the map where the source of interest is present. This relation provides an upper limit for and since the objective is to reproduce the source and at the same time to reduce ripples due to unmeasured Fourier components, this is the optimum value to use.

For the parameter , an analogous relation to Eq. (24) is not useful because most of the iterated maps would have to be taken into account, not just the initial one. To determine the best value of , we reconstructed several maps using different values of with the same , derived from Eq. (24), and then compared the spectra of the different maps. As an initial value of , we note that the term is most likely smaller than since is the difference of two temperatures, thus is a good lower limit. For a small , the reconstruction will not differ much from a reconstruction using the spatial entropy term alone. With increasing , the sidelobes will be steadily reduced, and the spectra will appear steadily smoother, but the overall spectral shape should remain the same. When gets too large, however, the spectral slopes begin to decrease due to overly weighting the spectral smoothness parameter. We have found that this change in spectral character is very pronounced and easily discerned. The optimum value of is then the largest value before which this artificial flattening occurs. We do not yet have enough experience to say whether the optimum value of will be similar for all sources (and spectral shapes), or whether the above procedure must be followed for each individual data set.

2.2. The 1-d data

We apply our 1-d algorithm to observations taken with the OVRO frequency-agile interferometer of active region AR 5417 near the solar limb on March 20, 1989 (vernal equinox) using the two 27-m antennas and the 40-m antenna arranged in a linear east-west array with the two 27-m dishes at 122 and 488 m east and the 40-m dish at 1066 m east. The array was operated with an observing sequence that sampled 45 frequencies in the range 1 - 18 GHz, in both the right- and left-handed circular polarization every 10 s. The amplitudes and phases were calibrated with respect to our primary flux calibrator, 3C 84, which in turn was calibrated relative to the flux standard 3C 286. Unfortunately, the receiver on the 40-m antenna experienced a hardware problem that resulted in no phase lock above 10 GHz, and there was external interference at 1.0 GHz, so the data presented here range from .

For each of the three baselines, we analyze the observations at the 31 frequencies in the range taken between 21:05 UT and 23:58 UT. The projected baselines change during this time due to Earth rotation, resulting in fringe spacings that range from 6.8'' at 10 GHz to 276'' at 1.2 GHz. By dividing the data into 80 time samples, we get Fourier points at each observed frequency. To adequately map the active region, we use a map size of 256'' (with a shift of 23'' from the nominal phase center to bring the solar limb to the center of the map) and a spatial resolution or pixel size of 1''. The lowest frequency determines the necessary map size, while the highest frequency determines the required spatial resolution.

Since the active region was located close to the solar limb, we have to correct for the interferometer response to the solar limb. For this purpose, we calculated its response to a uniform disk of the size of the Sun when pointed to the edge of the disk. We used the quiet Sun temperatures measured by Zirin et al. (1991) as the frequency-dependent disk temperature. The resulting amplitudes were generally less than 0.1 sfu (solar flux units) which is within the assumed measurement error (see Eq. (27) below). We subtracted the limb response from the observed data and used the corrected data pairs for the reconstruction.

2.3. The 1-d implementation

In this section, we describe the current implementation of our MEM algorithm and give parameter values used to reconstruct an image of active region AR 5417. Different implementations of the MEM algorithm have their advantages and disadvantages, as discussed by Narayan & Nityananda (1986). After some testing of different schemes, we chose the following method for its speed.

We start with a uniform ``flat" map which has the same temperature at all frequencies and spatial positions. For each subsequent iteration n, we evaluate the gradient at each point of the map (Eq. (15)) and according to its sign we either increase or decrease the temperature by a fraction of its value
equation532

The gain, , is a function of iteration n only, while and are, in addition, functions of spatial position x and frequency . We decrease exponentially with n to make sure that changes in T are getting smaller the closer T gets to the actual temperature value.
eqnarray553
where , , and are parameters to be specified.

The parameters have to be chosen so that any real solar temperature is within the range of reconstructable temperatures; i.e the initial temperature, , has to be much smaller than any solar temperature and the largest possible iterated temperature has to be much larger than the largest solar temperature. In addition, the parameters have to be chosen so that the increase at the beginning is not too large in order not to ``freeze in" an early iteration; a large requires a small . We set the final gain, , to 0.001 and we use the following additional parameter values of , , to fulfill the constraints.

For the measurement error, (in sfu), we use the following relation

with being the amplitude.

To estimate the Lagrange multiplier for the OVRO data of AR 5417, we used Eq. (24), calculated for the whole map, and from a dirty map reconstruction estimated the position of the source in the map. This led to the following relation at spatial positions where a source is present:
eqnarray580
The chosen flat map temperature adds the factor . The estimated value of is then

With this value of , we calculated several reconstructions using different values of . We found that was too small to yield improvements beyond those for a reconstruction using the spatial entropy term alone and that eliminated all ripples but flattened the spectra at high frequencies. Thus, the ``best" value of for the analyzed data was chosen to be

After the MEM reconstruction was done, the reconstructed map was convolved with the so-called ``clean" beam, a Gaussian fit to the inner part of the actual beam, to limit the effect of ``superresolution" (e.g. Cornwell & Braun 1989). All imaging algorithms produce some degree of superresolution since they estimate unmeasured Fourier components; i.e. the reconstructed image shows structures on spatial scales smaller than the fringe spacing of the largest baseline. A superresolution of a factor two or less is generally acceptable (cf. Narayan & Nityananda 1986). In our case, we have an additional complication due to the added spectral term which causes the reconstructed map to be spatially not as smooth as it would be without this term, which produces ``signal" at high spatial frequencies. We compared spatial power spectra of the convolved restored map with power spectra of the corresponding dirty map and found that the largest spatial frequency present in the reconstructed map is comparable to the one in the dirty map within a factor of 1.5 or less. This implies that convolving with the clean beam sufficiently limits the influence of superresolution and that at the same time its effect is limited enough to preserve the spatial information present in the original data.

In our initial algorithm, used to create the maps in this paper, we directly compute the Fourier transform (Eqs. (5) and (6)) which has the advantage of simplicity over the Fast Fourier Transform (FFT) in that it does not require gridding of the visibilities (see Thompson et al. 1991). However, in looking ahead to the 2-d version of the algorithm, execution time becomes more important than simplicity. Therefore, we have extended our 1-d algorithm to perform gridding and employ the FFT. Because gridding of multifrequency data introduces special problems, we briefly outline the gridding process here. The gridding process is described mathematically in terms of convolution and resampling, where an appropriate convolution function has to be chosen. A simple choice for a convolution function, and the one we employ, is a rectangular function (cell averaging); the (uv) plane is divided into rectangular cells of dimension by centered on the grid point and all measurements within each cell are assigned to the grid point.

Figure 1: A white-light image and a magnetogram of active region AR 5417 near the solar limb on March 20, 1989 (vernal equinox) from BBSO. The radio image at 3.6 GHz (covering 256'') is superposed for comparison

Figure 2: The CLEAN reconstruction. The spatial information encoded in the spectrum is not used. The contours are at logarithmic intervals to bring out the lower level sidelobes. (contour 4: , contour 5: , contour 6: )

Figure 3: The spatial MEM reconstruction. The spatial information encoded in the spectrum is not used. The contours are at logarithmic intervals to bring out the lower level sidelobes. (contour 4: , contour 5: , contour 6: )

Figure 4: The reconstructed map using the spatial/spectral MEM algorithm. The contours are at logarithmic intervals to show an improvement in sidelobe level in comparison with Figs. 2 and 3. (contour 4: , contour 5: , contour 6: )

Figure 5: The brightness temperature maps at frequencies a) 2.0 GHz, b) 5.0 GHz, c) 7.0 GHz, and d) 9.0 GHz of the spatial/spectral MEM reconstruction (filled gray areas), the spatial MEM (solid line) and the CLEAN reconstruction (dotted line)

The problem introduced with gridding of multifrequency data lies in its smoothing effect on the visibilities. For example, if we divide the spatial frequencies between 0 and the Nyquist frequency, , into 512 grid points and use a pixel size of 1'' (), the observations occupy 106 grid points at 10 GHz and only 15 at . To reduce the amount of smoothing at lower frequencies, we divide our frequency range into three ranges with different spatial resolution and grid each range separately. We choose a low-frequency range () with a pixel size of 4'', a mid-frequency range () with 2'', and a high-frequency range () with 1'' to ensure that the smoothing introduced by the gridding process remains within a factor of two to five. The combined effect of using FFT instead of direct Fourier transform, gridding and dividing into ranges with different spatial resolution reduces the execution time of the MEM algorithm by a factor of , for the same 1-d algorithm. The gain is even greater when extended to 2-d.

Up: A spatial and