Temperature anisotropies are described by a 2-dimensional *random* field $\Delta(\hat{n})$, where $\hat{n}$ is a unit vector on the sphere. This means we imagine that the temperature at each point has been randomly selected from an underlying probability distribution, characteristic of the mechanism generating the perturbations (e.g., Inflation). It is convenient to expand the field in spherical harmonics:

$$\Delta(\hat{n}) = \sum_{lm} a_{lm} Y_{lm}(\hat{n}) \tag{1}$$

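For Inflation generated perturbations, the coefficients $a_{lm}$ are Gaussian random variables with zero mean and covariance

$$\langle a_{lm} a^*_{l'm'} \rangle_{ens} = C_l\, \delta_{ll'} \delta_{mm'} \tag{2}$$

This latter equation defines the power spectrum as the set of $C_l$; the average is taken over the theoretical ensemble of possible anisotropy fields. Equivalently, the fluctuations may be described by the two-point correlation function

$$C(\theta) \equiv \langle \Delta(\hat{n}_1) \Delta(\hat{n}_2) \rangle_{ens} = \frac{1}{4\pi} \sum_l (2l+1)\, C_l\, P_l(\mu) \tag{3}$$

where $P_l$ is the Legendre polynomial of order $l$ and $\mu = \cos\theta = \hat{n}_1 \cdot \hat{n}_2$.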

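Observationally, one works with sky brightness integrated over the experimental beam

$$\Delta_b(\hat{n}_p) = \int d\Omega\, B(\hat{n}_p, \hat{n})\, \Delta(\hat{n}) \tag{4}$$

where $B$ is the beam profile and $\hat{n}_p$ is the pointing direction. For a symmetric beam, $B = B(\theta)$, this leads to

$$C_b(\theta) = \frac{1}{4\pi} \sum_l (2l+1)\, C_l\, |B_l|^2\, P_l(\mu) \tag{5}$$

for the beam-smeared correlation function, or covariance between experimental beams separated by $\theta$. The beam harmonic coefficients, $B_l$, are defined by

$$B(\theta) = \frac{1}{4\pi} \sum_l (2l+1)\, B_l\, P_l(\mu) \tag{6}$$

with $\mu = \cos\theta$. For example, for a Gaussian beam, $B(\theta) = \frac{1}{2\pi\sigma^2} e^{-\theta^2/2\sigma^2}$ and $B_l = e^{-l(l+1)\sigma^2/2}$.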
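As a numerical illustration of the beam-smeared correlation function for a Gaussian beam, the following Python sketch sums the multipole series directly. The function names, the toy scale-invariant spectrum, and the beam width are illustrative assumptions of ours, not part of the analysis above; the Legendre polynomials are generated with the standard three-term recurrence.

```python
import math

def legendre_all(lmax, mu):
    """Return [P_0(mu), ..., P_lmax(mu)] from the three-term recurrence
    (l+1) P_{l+1} = (2l+1) mu P_l - l P_{l-1}."""
    p = [1.0, mu]
    for l in range(1, lmax):
        p.append(((2 * l + 1) * mu * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]

def gaussian_beam_bl(l, sigma):
    """Harmonic coefficient B_l of a Gaussian beam of width sigma (radians)."""
    return math.exp(-0.5 * l * (l + 1) * sigma ** 2)

def beam_smeared_correlation(cl, theta, sigma):
    """C_b(theta) = (1/4pi) sum_l (2l+1) C_l |B_l|^2 P_l(cos theta).

    cl[l] holds the power spectrum for l = 0..lmax; angles are in radians.
    """
    lmax = len(cl) - 1
    pl = legendre_all(lmax, math.cos(theta))
    total = 0.0
    for l in range(lmax + 1):
        total += (2 * l + 1) * cl[l] * gaussian_beam_bl(l, sigma) ** 2 * pl[l]
    return total / (4.0 * math.pi)

# Toy spectrum (assumption, for illustration only): l(l+1) C_l / 2pi = 1 for l >= 2.
LMAX = 600
toy_cl = [0.0, 0.0] + [2 * math.pi / (l * (l + 1)) for l in range(2, LMAX + 1)]
SIGMA = math.radians(0.5)  # assumed beam width

c_zero = beam_smeared_correlation(toy_cl, 0.0, SIGMA)                # beam-smeared variance
c_two = beam_smeared_correlation(toy_cl, math.radians(2.0), SIGMA)   # covariance at 2 degrees
```

The series is truncated at `LMAX`; this is harmless here because the Gaussian beam suppresses the high-$l$ terms long before the cutoff.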

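Given these relations and a CMB map, it is now straightforward to construct the likelihood function, whose role is to relate the observed sky temperatures, which we arrange in a *data vector* with elements $d_i \equiv \Delta_b(\hat{n}_i)$, to the model parameters, represented by a *parameter vector* $\vec{\Theta}$. As advertised, for *Gaussian* fluctuations (with Gaussian noise) this is simply a multivariate Gaussian:

$$L(\vec{\Theta}) \equiv \mathrm{Prob}(\vec{d}\,|\,\vec{\Theta}) = \frac{1}{(2\pi)^{N_{pix}/2}\, |\mathbf{C}|^{1/2}}\, e^{-\frac{1}{2} \vec{d}^{\,t} \mathbf{C}^{-1} \vec{d}} \tag{7}$$

The first equality reminds us that the likelihood function is the probability of obtaining the data vector given the model as defined by its set of parameters. In this expression, $\mathbf{C}$ is the pixel covariance matrix:

$$C_{ij} \equiv \langle d_i d_j \rangle_{ens} = T_{ij} + N_{ij} \tag{8}$$

where the expectation value is understood to be over the theoretical ensemble of all possible universes realisable with the same parameter vector. The second equality separates the model's pixel covariance, $\mathbf{T}$, from the noise induced covariance, $\mathbf{N}$. According to Eq. (5), $T_{ij} = C_b(\theta_{ij})$, where $\theta_{ij}$ is the angle between pixels $i$ and $j$. The parameters may be either the individual $C_l$ (or the band-powers discussed below) or the fundamental cosmological parameters.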
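A minimal Python sketch of how such a multivariate Gaussian likelihood might be evaluated is given below. It uses a Cholesky factorization to obtain both the determinant and the quadratic form; the function names and the nested-list matrix representation are our own illustrative choices, and a real analysis would use an optimized linear algebra library.

```python
import math

def cholesky(c):
    """Lower-triangular factor low with low * low^T = c,
    for a symmetric positive-definite matrix stored as nested lists."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def log_likelihood(d, theory_cov, noise_cov):
    """ln L for zero-mean Gaussian data d with covariance C = T + N."""
    n = len(d)
    c = [[theory_cov[i][j] + noise_cov[i][j] for j in range(n)] for i in range(n)]
    low = cholesky(c)
    # Forward substitution: solve low * y = d, so that d^t C^-1 d = y . y
    y = []
    for i in range(n):
        y.append((d[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    chi2 = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (chi2 + log_det + n * math.log(2.0 * math.pi))
```

Working with $\ln L$ rather than $L$ avoids numerical underflow when the number of pixels is large.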

Many experiments report temperature *differences*; and even if the starting point is a true map, one may wish to subject it to a linear transformation in order to define bands in $l$-space over which power estimates are to be given. Thus, it is useful to generalize our approach to arbitrary homogeneous, linear data combinations, represented by a transformation matrix $\mathbf{A}$: $\vec{d}' = \mathbf{A}\vec{d}$. Since the transformation is linear, the new data vector retains a multivariate Gaussian distribution (with zero mean), but with a modified covariance matrix: $\mathbf{C}' = \mathbf{A}\mathbf{C}\mathbf{A}^t$. As a consequence, the transformed pixels, $d'_i$, may be treated in the same manner as the originals, and so we will hereafter use the term *generalized pixels* to refer to the elements of a general data vector which may be either real sky pixels or some transformed version thereof. The elements of the new theory covariance matrix are (using the summation convention)

$$T'_{ij} = \frac{1}{4\pi} \sum_l (2l+1)\, C_l\, W_{ij}(l) \tag{9}$$

where $W_{ij}(l) \equiv A_{im} A_{jn} P_l(\mu_{mn}) |B_l|^2$ and $\mu_{mn} = \hat{n}_m \cdot \hat{n}_n$. The $W_{ij}(l)$ define the window matrix, $\mathbf{W}$, of the experiment.

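An example is helpful. Consider a simple, single difference $d' = \Delta_b(\hat{n}_1) - \Delta_b(\hat{n}_2)$, whose *variance* is given by $\langle d'^2 \rangle = 2[C_b(0) - C_b(\theta)]$. This may be written in terms of multipoles as

$$\langle d'^2 \rangle = \frac{1}{4\pi} \sum_l (2l+1)\, C_l \left\{ 2 |B_l|^2 \left[ 1 - P_l(\mu) \right] \right\} \tag{10}$$

identifying the diagonal elements of $\mathbf{W}$ as the expression in curly brackets. Notice that the power in this variance is localized in $l$-space: it is suppressed at large $l$ by the beam smearing, $|B_l|^2$, and at small $l$ by the difference, $1 - P_l(\mu)$.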

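Band-powers are defined via Eq. (9). One reduces the set of $C_l$ contained within the window to a single number by adopting a spectral form. The so-called *flat band-power*, $\delta T_{fb}$, is established by using $C_l \equiv 2\pi [\delta T_{fb}]^2 / [l(l+1)]$, leading to

$$T'_{ij} = \frac{1}{2} [\delta T_{fb}]^2 \sum_l \frac{2l+1}{l(l+1)}\, W_{ij}(l) \tag{11}$$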
In this fashion, we may write Eq. (7) in terms of the band-power and treat the latter as a parameter to be estimated. This then becomes the band-power likelihood function, $L(\delta T_{fb})$. One obtains the points shown in Fig. 1 by maximizing this likelihood function; the errors are typically found in a Bayesian manner, by integration over $L(\delta T_{fb})$ with a uniform prior. Notice that the variance due to the finite sample size (i.e., the sample variance, also known as cosmic variance when one has full sky coverage) is fully incorporated into the analysis: the likelihood function "knows" how many pixels there are.
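To make the flat band-power normalization concrete, the Python sketch below converts a window function into the factor $\frac{1}{2}\sum_l \frac{2l+1}{l(l+1)} W(l)$ and uses it to turn a variance into a band-power. The Gaussian-shaped toy window and the function names are hypothetical, chosen only so that the round trip can be verified.

```python
import math

def band_response(window):
    """(1/2) sum_l (2l+1)/(l(l+1)) W(l); window[l] for l = 0..lmax.
    The l = 0 term is skipped because the flat spectrum is undefined there."""
    return 0.5 * sum((2 * l + 1) / (l * (l + 1)) * window[l] for l in range(1, len(window)))

def variance_from_spectrum(cl, window):
    """(1/4pi) sum_l (2l+1) C_l W(l)."""
    return sum((2 * l + 1) * cl[l] * window[l] for l in range(len(window))) / (4.0 * math.pi)

def flat_band_power(variance, window):
    """Band-power dT_fb such that variance = dT_fb^2 * band_response(window)."""
    return math.sqrt(variance / band_response(window))

# Hypothetical window centred on l = 100 (illustration only).
LMAX = 300
window = [math.exp(-0.5 * ((l - 100) / 20.0) ** 2) for l in range(LMAX + 1)]
window[0] = 0.0

# Round trip: a flat spectrum with dT_fb = 40 (arbitrary units) should be recovered.
DT_IN = 40.0
cl = [0.0] + [2 * math.pi * DT_IN ** 2 / (l * (l + 1)) for l in range(1, LMAX + 1)]
variance = variance_from_spectrum(cl, window)
dt_out = flat_band_power(variance, window)
```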

An important remark at this stage concerns the construction of Fig. 1. This figure is only valid for Gaussian perturbations, because it relies on Eq. (7), which assumes Gaussianity at the outset. If the sky fluctuations are non-Gaussian, then these estimates must all be re-evaluated based on the true nature of the sky fluctuations, i.e., the likelihood function in Eq. (7) must be redefined. The same comment applies to any experiment which has an important non-Gaussian noise component: the likelihood function must incorporate this aspect in order to properly yield the power estimate and associated error bars.

What is the *raison d'être* for these band powers? The likelihood function is clearly greatly simplified if we can find a transformation $\mathbf{A}$ which diagonalizes $\mathbf{C}$ (signal plus noise). This can be done for a given model, but because $\mathbf{C}$ depends on the model parameters, there is in general no unique such transformation valid for all parameter values. The one exception is for an ideal experiment (no noise, or uniform, uncorrelated noise) with full-sky coverage; in this case the spherical harmonic transformation is guaranteed, by Eq. (2), to diagonalize $\mathbf{C}$ for any and all values of the model parameters. This linear transformation is represented by a matrix $A_{ij} \propto Y^*_{lm}(\hat{n}_j)$, where $i = l^2 + l + m + 1$ is a unidimensional index for the pair $(l,m)$.
It is the role of band-powers to approximately
diagonalize the covariance matrix in
more realistic situations, where sky
coverage is always limited and noise is
never uniform (and sometimes correlated),
and in such a way as to concentrate the
power estimates in as narrow bands as
possible. Since this is not possible for
arbitrary parameter values,
in practice one adopts a fiducial model
(particular values for the parameters)
to define a transformation which compromises between the desires
for narrow and independent bands
(Bond 1995; Tegmark et al. 1997; Tegmark 1997;
Bunn & White 1997).
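The unidimensional index $i = l^2 + l + m + 1$ used for the spherical harmonic transformation is easy to invert, since the indices belonging to a given $l$ run from $l^2 + 1$ to $(l+1)^2$. A short Python sketch, with function names of our own choosing:

```python
import math

def lm_to_index(l, m):
    """Map (l, m) with -l <= m <= l to the 1-based index i = l^2 + l + m + 1."""
    if abs(m) > l:
        raise ValueError("need -l <= m <= l")
    return l * l + l + m + 1

def index_to_lm(i):
    """Inverse mapping: recover (l, m) from the 1-based index i."""
    l = math.isqrt(i - 1)        # indices for a given l span l^2+1 .. (l+1)^2
    return l, i - 1 - l * l - l

# All (l, m) pairs up to l = 3, in index order.
pairs = [index_to_lm(i) for i in range(1, 17)]
```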

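Consider, then, a situation in which the band temperatures (that is, generalized pixels which are the elements of the general data vector $\vec{d}'$) are independent random variables ($\mathbf{T}'$ is diagonal) and that the experimental noise is spatially uncorrelated and uniform:

$$C'_{ij} = (\sigma_M^2 + \sigma_N^2)\, \delta_{ij} \tag{12}$$

where $\sigma_M^2$ is the model-predicted variance and $\sigma_N^2$ is the constant noise variance. For simplicity, we assume that all diagonal elements of $\mathbf{W}$ are the same, implying that $\sigma_M^2$ is a constant, independent of $i$. By Eq. (11), we then have

$$\sigma_M^2 = [\delta T_{fb}]^2 R_{band}, \qquad R_{band} \equiv \frac{1}{2} \sum_l \frac{2l+1}{l(l+1)}\, W_{ii}(l) \tag{13}$$

(independent of $i$; no sum over $i$). The best estimate of the band-power is

$$[\widehat{\delta T}_{fb}]^2 = \frac{1}{R_{band}} \left[ \frac{1}{N_{pix}} \sum_{i=1}^{N_{pix}} d_i'^2 - \sigma_N^2 \right] \tag{14}$$

as follows from maximizing the likelihood function

$$L(\delta T_{fb}) \propto \frac{1}{\left( [\delta T_{fb}]^2 R_{band} + \sigma_N^2 \right)^{N_{pix}/2}} \exp\left[ -\frac{N_{pix}}{2}\, \frac{[\widehat{\delta T}_{fb}]^2 R_{band} + \sigma_N^2}{[\delta T_{fb}]^2 R_{band} + \sigma_N^2} \right] \tag{15}$$

Notice that this is of the form $L \propto X^{N_{pix}/2} e^{-X/2}$, with $X \equiv N_{pix}\, ([\widehat{\delta T}_{fb}]^2 R_{band} + \sigma_N^2) / ([\delta T_{fb}]^2 R_{band} + \sigma_N^2)$, so that the data enter only through $\widehat{\delta T}_{fb}$. It clearly peaks at $\delta T_{fb} = \widehat{\delta T}_{fb}$. Thus, in this ideal case, we have a simple band-power likelihood function, with corresponding best estimator, $\widehat{\delta T}_{fb}$, given by Eq. (14).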
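The ideal case is simple enough to verify by brute force. The Python sketch below computes the band-power estimate from a handful of independent generalized pixels with uniform noise, and confirms by a grid search that the likelihood for such pixels peaks at that estimate. The pixel values, the band normalization, and the noise variance are invented numbers, and the function names are ours.

```python
import math

def band_power_estimate_sq(pixels, r_band, noise_var):
    """Squared band-power estimate: (mean of d_i^2 - noise variance) / R_band."""
    n = len(pixels)
    return (sum(d * d for d in pixels) / n - noise_var) / r_band

def log_likelihood(dt, pixels, r_band, noise_var):
    """ln L (up to a constant) for independent zero-mean Gaussian pixels,
    each with variance dt^2 * R_band + noise_var."""
    n = len(pixels)
    total_var = dt * dt * r_band + noise_var
    return -0.5 * n * math.log(total_var) - 0.5 * sum(d * d for d in pixels) / total_var

# Invented inputs, for illustration only.
PIXELS = [1.2, -0.7, 2.1, -1.5, 0.3, 1.8, -2.2, 0.9]
R_BAND = 1.5
NOISE_VAR = 0.5

dt_hat = math.sqrt(band_power_estimate_sq(PIXELS, R_BAND, NOISE_VAR))

# Grid search over the band-power; the maximum should sit at dt_hat.
grid = [0.001 * k for k in range(3001)]
dt_best = max(grid, key=lambda dt: log_likelihood(dt, PIXELS, R_BAND, NOISE_VAR))
```

Note that the squared estimate can come out negative when the noise dominates; the square root above is only defined for data with a positive estimate.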

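Although not immediately relevant to our present goals, it is all the same instructive to consider the *distribution* of $\widehat{\delta T}_{fb}$. This is most easily done by noting that the quantity

$$\chi^2_{N_{pix}} \equiv \sum_{i=1}^{N_{pix}} \frac{d_i'^2}{\sigma_M^2 + \sigma_N^2} \tag{16}$$

is $\chi^2$-distributed with $N_{pix}$ degrees of freedom. We may express the maximum likelihood estimator for the band-power in terms of this quantity as

$$[\widehat{\delta T}_{fb}]^2 = \frac{1}{R_{band}} \left[ \frac{\sigma_M^2 + \sigma_N^2}{N_{pix}}\, \chi^2_{N_{pix}} - \sigma_N^2 \right]$$

From $\langle \chi^2_{N_{pix}} \rangle = N_{pix}$, we see immediately that the estimator is unbiased:

$$\langle [\widehat{\delta T}_{fb}]^2 \rangle = \frac{\sigma_M^2}{R_{band}} = [\delta T_{fb}]^2$$

Its variance is

$$\mathrm{Var}\left( [\widehat{\delta T}_{fb}]^2 \right) = \frac{2}{N_{pix}} \left( [\delta T_{fb}]^2 + \frac{\sigma_N^2}{R_{band}} \right)^2$$

explicitly demonstrating the influence of sample/cosmic variance (related to $N_{pix}$).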
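The mean and variance of the estimator can be illustrated with a small Monte Carlo experiment. The Python sketch below draws many realizations of independent Gaussian generalized pixels and compares the sample mean and variance of the squared band-power estimate with the values expected for a $\chi^2$ variable with $N_{pix}$ degrees of freedom, namely $[\delta T_{fb}]^2$ and $(2/N_{pix})([\delta T_{fb}]^2 + \sigma_N^2/R_{band})^2$. All numerical inputs are arbitrary choices for illustration.

```python
import math
import random

def simulate_estimates(dt_true, r_band, noise_var, n_pix, n_real, seed=0):
    """Squared band-power estimates from n_real simulated ideal experiments."""
    rng = random.Random(seed)
    sigma = math.sqrt(dt_true ** 2 * r_band + noise_var)  # rms of each generalized pixel
    estimates = []
    for _ in range(n_real):
        sum_sq = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n_pix))
        estimates.append((sum_sq / n_pix - noise_var) / r_band)
    return estimates

# Arbitrary illustrative inputs.
DT_TRUE, R_BAND, NOISE_VAR, N_PIX, N_REAL = 1.0, 2.0, 1.0, 20, 20000

estimates = simulate_estimates(DT_TRUE, R_BAND, NOISE_VAR, N_PIX, N_REAL)
mean_est = sum(estimates) / N_REAL
var_est = sum((e - mean_est) ** 2 for e in estimates) / (N_REAL - 1)

# Expected variance from the chi-square argument.
var_pred = (2.0 / N_PIX) * (DT_TRUE ** 2 + NOISE_VAR / R_BAND) ** 2
```

The sample mean should scatter about `DT_TRUE ** 2` and the sample variance about `var_pred`, with agreement improving as `N_REAL` grows.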

All the above relations are *exact*
for the adopted situation: Eq. (15)
is the *complete* likelihood function for the band-power
defined by the *generalized* pixels satisfying
Eq. (12). Such a situation could
be practically realized on the sky by observing well
separated generalized pixels to the same noise level;
for example, a set of double differences scattered
about the sky, all with the same signal-to-noise.
This is rarely the case, however, as scanning strategies
must be concentrated within a relatively small
area of sky (one makes maps!). This creates important
off-diagonal elements in the theory covariance
matrix $\mathbf{T}$,
representing correlations between
nearby pixels due to long wavelength perturbation
modes. In addition, the noise
level is quite often not uniform and sometimes
even correlated, adding off-diagonal elements
to the noise covariance matrix. Thus, the simple
form proposed in Eq. (12) is
never achieved in actual observations.
Nevertheless, as mentioned, even in this case one
could adopt a fiducial theoretical model
and find a transformation $\mathbf{A}$
which diagonalizes the full covariance
matrix $\mathbf{C}$,
thereby regaining one important
simplifying property of the above ideal situation.
The diagonal elements
of the matrix are then its eigenvalues.
Because of the correlations in the original
matrix, we expect there to be fewer significant
eigenvalues than generalized pixels; this
will be relevant shortly. One could then
work with a reduced matrix consisting
of only the significant eigenvalues, an
approach reminiscent of the signal-to-noise
eigenmodes proposed by Bond (1995), and
also known as the Karhunen-Loeve transform
(Bunn & White 1997; Tegmark et al. 1997).
There remain two technical difficulties.
The first is that the covariance matrix does not remain
diagonal as we move away from the adopted fiducial
model by varying $\delta T_{fb}$;
only when this band-power
corresponds to the fiducial model is the
matrix really diagonal. The second
complicating factor is that the eigenvalues
are not identical, whereas it was their equality
that greatly simplified the previous calculation.

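All of this motivates us to examine the possibility that a likelihood function of the form (15) could be applied, with appropriate redefinitions of $N_{pix}$ and $\sigma_N$. We therefore proceed by renaming these latter $\nu$ and $\beta$, respectively (with the factor $R_{band}$ absorbed into $\beta$), and treating them as parameters to be adjusted to best fit the full likelihood function. Thus, given an actual band-power estimate, $\delta T_{fb}^{(o)}$ (i.e., an experimental result), *we propose $L(\delta T_{fb}) \propto Y^{\nu/2} e^{-Y/2}$ as an ansatz for the band-power likelihood function, with parameters $\nu$ and $\beta$*:

$$Y[\delta T_{fb}] \equiv \nu\, \frac{[\delta T_{fb}^{(o)}]^2 + \beta^2}{[\delta T_{fb}]^2 + \beta^2}$$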
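We have only two parameters, $\nu$ and $\beta$, to determine in order to apply the ansatz. This can be done if two confidence intervals of the complete likelihood function are known in advance. For example, suppose we were given both the 68% ($\sigma^+_{68}$ & $\sigma^-_{68}$) and 95% ($\sigma^+_{95}$ & $\sigma^-_{95}$) confidence intervals; then we could fix the two parameters with the equations

$$\int_{\delta T_{fb}^{(o)} - \sigma^-_{68}}^{\delta T_{fb}^{(o)} + \sigma^+_{68}} L(\delta T_{fb})\, d[\delta T_{fb}] = 0.68, \qquad \int_{\delta T_{fb}^{(o)} - \sigma^-_{95}}^{\delta T_{fb}^{(o)} + \sigma^+_{95}} L(\delta T_{fb})\, d[\delta T_{fb}] = 0.95$$

where $L$ is normalized to unit integral over $\delta T_{fb} \geq 0$. We shall see in the next section (Figs. 2-7) that this produces excellent approximations. This is the main result of this paper.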
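As a rough sketch of how such an ansatz might be evaluated in practice, the Python code below implements a two-parameter likelihood of the form $Y^{\nu/2} e^{-Y/2}$, with $Y = \nu\,([\delta T^{(o)}]^2 + \beta^2)/([\delta T]^2 + \beta^2)$, and computes the fraction of the integrated likelihood that falls inside a given interval. The uniform prior in the band-power, the truncation of the integral at `dt_max`, the midpoint integration rule, and all numerical values are assumptions of this sketch rather than prescriptions.

```python
import math

def log_ansatz(dt, dt_obs, nu, beta):
    """ln L up to a constant, for L proportional to Y^(nu/2) exp(-Y/2),
    with Y = nu (dt_obs^2 + beta^2) / (dt^2 + beta^2)."""
    y = nu * (dt_obs ** 2 + beta ** 2) / (dt ** 2 + beta ** 2)
    return 0.5 * nu * math.log(y) - 0.5 * y

def interval_fraction(lo, hi, dt_obs, nu, beta, dt_max, steps=4000):
    """Fraction of the integrated likelihood inside [lo, hi], assuming a
    uniform prior on the band-power over [0, dt_max] (midpoint rule)."""
    h = dt_max / steps
    peak = log_ansatz(dt_obs, dt_obs, nu, beta)  # subtract the peak to avoid overflow
    inside = 0.0
    total = 0.0
    for k in range(steps):
        dt = (k + 0.5) * h
        weight = math.exp(log_ansatz(dt, dt_obs, nu, beta) - peak)
        total += weight
        if lo <= dt <= hi:
            inside += weight
    return inside / total

# Hypothetical band-power result and ansatz parameters (arbitrary units).
DT_OBS, NU, BETA, DT_MAX = 50.0, 10.0, 30.0, 500.0
frac_narrow = interval_fraction(40.0, 62.0, DT_OBS, NU, BETA, DT_MAX)
frac_wide = interval_fraction(30.0, 80.0, DT_OBS, NU, BETA, DT_MAX)
```

Fixing both parameters from two reported intervals then amounts to a two-dimensional root search on `interval_fraction`, which is not attempted here.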

Figure 3: Comparison to the Saskatoon Q-band 1995 10-point difference. The line-styles are the same as in the previous figure.

Unfortunately, most of the time only the 68% confidence interval is reported along with an experimental result (we hope that in the future authors will in fact supply at least two confidence intervals). Is there any way to proceed in this case? For example, one could try to judiciously choose $\nu$ and then adjust $\beta$ with Eq. (19). The most obvious choice for $\nu$ would be $\nu = N_{pix}$, although from our previous discussion, we expect this to be an upper limit to the number of significant degrees-of-freedom (the significant eigenvalues of $\mathbf{C}$), due to correlations between pixels. The comparisons we are about to make in the following section show that a smaller number of effective pixels (i.e., value for $\nu$) is in fact required for a good fit to the true likelihood function. One could try other games, such as setting $\nu =$ (scan length)/(beam FWHM) for unidimensional scans. This also seems reasonable, and certainly this number is less than or equal to the actual number of pixels in the data set, but we have found that this does not always work satisfactorily. The availability of a second confidence interval permits both parameters, $\nu$ and $\beta$, to be unambiguously determined and in such a way as to provide the best possible approximation with the proposed ansatz.
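When only one confidence interval is available and $\nu$ has been chosen by hand, the remaining parameter $\beta$ can be found with a one-dimensional search. The Python sketch below does this by bisection, using an ansatz of the form $Y^{\nu/2} e^{-Y/2}$ with $Y = \nu\,([\delta T^{(o)}]^2 + \beta^2)/([\delta T]^2 + \beta^2)$. It assumes a uniform prior in the band-power truncated at `dt_max`, and it assumes (without proof) that the likelihood fraction inside a fixed interval decreases monotonically as $\beta$ grows; the numbers are hypothetical and serve only as a self-consistency check.

```python
import math

def log_ansatz(dt, dt_obs, nu, beta):
    """ln L up to a constant for the two-parameter ansatz."""
    y = nu * (dt_obs ** 2 + beta ** 2) / (dt ** 2 + beta ** 2)
    return 0.5 * nu * math.log(y) - 0.5 * y

def interval_fraction(lo, hi, dt_obs, nu, beta, dt_max, steps=4000):
    """Likelihood fraction inside [lo, hi]; uniform prior on [0, dt_max]."""
    h = dt_max / steps
    peak = log_ansatz(dt_obs, dt_obs, nu, beta)
    inside = 0.0
    total = 0.0
    for k in range(steps):
        dt = (k + 0.5) * h
        weight = math.exp(log_ansatz(dt, dt_obs, nu, beta) - peak)
        total += weight
        if lo <= dt <= hi:
            inside += weight
    return inside / total

def solve_beta(lo, hi, dt_obs, nu, dt_max, target=0.68, iterations=40):
    """Bisection for beta such that [lo, hi] encloses the target fraction.
    Assumes the enclosed fraction falls monotonically with increasing beta."""
    beta_lo, beta_hi = 0.0, dt_max
    for _ in range(iterations):
        beta = 0.5 * (beta_lo + beta_hi)
        if interval_fraction(lo, hi, dt_obs, nu, beta, dt_max) > target:
            beta_lo = beta   # likelihood too narrow: a larger beta broadens it
        else:
            beta_hi = beta
    return 0.5 * (beta_lo + beta_hi)

# Self-consistency check with hypothetical numbers: compute the fraction enclosed
# by an interval for a known beta, then recover that beta from the fraction.
TRUE_BETA = 30.0
target = interval_fraction(40.0, 62.0, 50.0, 10.0, TRUE_BETA, 500.0)
beta_fit = solve_beta(40.0, 62.0, 50.0, 10.0, 500.0, target=target)
```

With a real result, `lo` and `hi` would be the reported 68% limits and `target` would be left at its default.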

Bond et al. (2000) have recently examined the nature
of the likelihood function and discussed two
possible approximations. The form of the ansatz just presented is
in fact identical to one of their proposed approximations,
parameterized by *x* and *G*. These parameters
are simply related to our $\nu$ and $\beta$
as follows: $G = \nu$ and $x = \beta^2$.

Notice that the above development and motivation
for the ansatz essentially follow for a single band-power.
A set of uncorrelated power estimates is then easily treated
by simple multiplication. However, the approximation as
proposed does not simultaneously account for several
*correlated* band-powers, and its accuracy is
therefore limited by the extent to which such inter-band
correlations are important in a given data set.
As a further remark along these lines,
we have noted that flat-band estimates of any kind,
be they from a complete likelihood analysis or not, do
not always contain all relevant experimental information
(Douspis et al. 2000); any method based on their use
is then fundamentally limited by the nature of the lost
information.

The only way to test the ansatz is, of course, by direct comparison to the full likelihood function calculated for a number of experiments. If it appears to work for a few such cases, then we may hope that its general application is justified. We now turn to this issue.

Copyright The European Southern Observatory (ESO)