2 Likelihood method

2.1 Generalities

Temperature anisotropies are described by a 2-dimensional random field $\Delta(\hat{n})\equiv (\delta T/T) (\hat{n})$ , where $\hat{n}$ is a unit vector on the sphere. This means we imagine that the temperature at each point has been randomly selected from an underlying probability distribution, characteristic of the mechanism generating the perturbations (e.g., Inflation). It is convenient to expand the field in spherical harmonics:

$\begin{displaymath}% \Delta(\hat{n}) = \sum_{lm} a_{lm} Y_{lm}(\hat{n}). \end{displaymath}$

(1)

For Inflation-generated perturbations, the coefficients $a_{lm}$ are Gaussian random variables with zero mean, $\langle a_{lm}\rangle_{\rm ens} = 0$, and covariance

$\begin{displaymath}% \langle a_{lm}a^*_{l'm'} \rangle_{\rm ens} = C_l \delta_{ll'}\delta_{mm'}. \end{displaymath}$

(2)

This latter equation defines the power spectrum as the set of C_l. The indicated averages are to be taken over the theoretical ensemble of all possible anisotropy fields, of which our observed CMB sky is but one realization. Since the harmonic coefficients are Gaussian variables and the expansion is linear, it is clear that the temperature values on the sky are also Gaussian, and they therefore follow a multivariate Gaussian distribution (with an uncountably infinite number of variables, one for each position on the sky). The covariance of temperatures separated by an angle $\theta$ on the sky is given by the correlation function

$\begin{displaymath}% C(\theta) \equiv \langle \Delta(\hat{n}_1)\Delta(\hat{n}_2)\rangle_{\rm ens} = \frac{1}{4\pi}\sum_{l} (2l+1) C_l P_l(\mu) \end{displaymath}$

(3)

where $P_l$ is the Legendre polynomial of order $l$ and $\mu = \cos\theta = \hat{n}_1\cdot\hat{n}_2$ . The form of this equation, which follows directly from Eq. (2), is dictated by the statistical isotropy of the perturbations - the two-point correlation function can only depend on separation.
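The Legendre sum in Eq. (3) is easy to evaluate numerically. The following is a minimal sketch, assuming the spectrum is supplied as a list `cl` truncated at some finite $l_{\rm max}$ (the truncation and the function names are illustrative choices, not part of the original analysis); the polynomials are generated with the standard three-term recurrence.

```python
import math

def legendre_all(lmax, mu):
    """Return [P_0(mu), ..., P_lmax(mu)] from the recurrence
    (l+1) P_{l+1} = (2l+1) mu P_l - l P_{l-1}."""
    p = [1.0, mu]
    for l in range(1, lmax):
        p.append(((2 * l + 1) * mu * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]

def correlation(cl, theta):
    """Eq. (3): C(theta) = (1/4pi) sum_l (2l+1) C_l P_l(cos theta).
    cl[l] holds C_l for l = 0..lmax; theta is in radians."""
    lmax = len(cl) - 1
    pl = legendre_all(lmax, math.cos(theta))
    total = sum((2 * l + 1) * cl[l] * pl[l] for l in range(lmax + 1))
    return total / (4.0 * math.pi)
```

With a spectrum containing only a quadrupole, `correlation([0.0, 0.0, 1.0], 0.0)` reduces to $5/4\pi$, since $P_l(1)=1$.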

Observationally, one works with sky brightness integrated over the experimental beam

$\begin{displaymath}% \Delta_{\rm b}(\hat{n}_{\rm p}) = \int\; {\rm d}\Omega \Delta(\hat{n}) B(\hat{n}_{\rm p},\hat{n}) \end{displaymath}$

(4)

where $B$ is the beam profile and $\hat{n}_{\rm p}$ gives the position of the beam axis. The beam profile may or may not be a function solely of $\hat{n}_{\rm p}\cdot\hat{n}$ , i.e., of the separation between sky point and beam axis; if it is, then this equation is a simple convolution on the sphere, and we may write

$\begin{displaymath}% C_{\rm b}(\theta) \equiv \langle \Delta_{\rm b}(\hat{n}_1)\Delta_{\rm b}(\hat{n}_2)\rangle_{\rm ens} = \frac{1}{4\pi} \sum_l (2l+1) C_l \vert B_l\vert^2 P_l(\mu) \end{displaymath}$

(5)

for the beam-smeared correlation function, or covariance between experimental beams separated by $\theta$ . The beam harmonic coefficients, B_l, are defined by

$\begin{displaymath}% B(\theta') = \frac{1}{4\pi}\sum_l (2l+1) B_l P_l(\mu') \end{displaymath}$

(6)

with $\hat{n}_{\rm p}\cdot\hat{n} = \cos\theta' = \mu'$ . For example, for a Gaussian beam, $B(\theta) = 1/(2\pi\sigma^2) {\rm e}^{-\theta^2/2\sigma^2}$ and $B_l = {\rm e}^{-l(l+1)\sigma^2/2}$ .
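As an illustration of the Gaussian-beam coefficients just quoted, the sketch below tabulates $B_l$ and the beam-smeared spectrum $C_l\vert B_l\vert^2$ that enters Eq. (5). It assumes the beam width is quoted as a FWHM in radians and converts it with the standard relation $\sigma = {\rm FWHM}/\sqrt{8\ln 2}$; the function names are hypothetical.

```python
import math

def fwhm_to_sigma(fwhm_rad):
    """Gaussian width sigma (radians) from the full width at half maximum."""
    return fwhm_rad / math.sqrt(8.0 * math.log(2.0))

def gaussian_beam_bl(lmax, sigma):
    """B_l = exp[-l(l+1) sigma^2 / 2] for l = 0..lmax."""
    return [math.exp(-0.5 * l * (l + 1) * sigma ** 2) for l in range(lmax + 1)]

def smeared_spectrum(cl, sigma):
    """C_l |B_l|^2, the combination appearing in Eq. (5)."""
    bl = gaussian_beam_bl(len(cl) - 1, sigma)
    return [c * b * b for c, b in zip(cl, bl)]
```

The suppression becomes significant for $l \gtrsim 1/\sigma$, which is the sense in which the beam bounds the accessible multipole range from above.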

Given these relations and a CMB map, it is now straightforward to construct the likelihood function, whose role is to relate the $N_{\rm pix}$ observed sky temperatures, which we arrange in a data vector with elements $d_i \equiv \Delta_b(\hat{n}_i)$ , to the model parameters, represented by a parameter vector $\overrightarrow {\Theta}$ . As advertised, for Gaussian fluctuations (with Gaussian noise) this is simply a multivariate Gaussian:

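$\begin{displaymath}% {\cal L}(\overrightarrow {\Theta}) \equiv {\rm Prob}(\overrightarrow {d}\vert\overrightarrow {\Theta}) = \frac{1}{(2\pi)^{N_{\rm pix}/2} \vert{\bf C}\vert^{1/2}} {\rm e}^{-\frac{1}{2} \overrightarrow {d}^t \cdot {\bf C}^{-1} \cdot \overrightarrow {d}}. \end{displaymath}$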

(7)

The first equality reminds us that the likelihood function is the probability of obtaining the data vector given the model as defined by its set of parameters. In this expression, ${\bf C}$ is the pixel covariance matrix:

$\begin{displaymath}% C_{ij} \equiv \langle d_id_j\rangle_{\rm ens} = T_{ij} + N_{ij} \end{displaymath}$

(8)

where the expectation value is understood to be over the theoretical ensemble of all possible universes realisable with the same parameter vector. The second equality separates the model's pixel covariance, ${\bf T}$ , from the noise-induced covariance, ${\bf N}$ . According to Eq. (5), $T_{ij} = C_{\rm b}(\theta_{ij})$ . The parameters may be either the individual $C_l$ (or band-powers, discussed below), or the fundamental cosmological constants, $\Omega, H_0$ , etc. In the former case, Eq. (5) shows how the parameters enter the likelihood; in the latter situation, the parameter dependence enters through detailed relations of the kind $C_l[\overrightarrow {\Theta}]$ , specified by the adopted model (e.g., Inflation). Notice that if one only desires to determine the $C_l$ , then only the assumption of Gaussianity is required.
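For concreteness, the Gaussian likelihood of Eq. (7) with ${\bf C} = {\bf T} + {\bf N}$ can be evaluated as in the following sketch. The Cholesky factorization used here is simply one convenient way to obtain both $\overrightarrow{d}^t\cdot{\bf C}^{-1}\cdot\overrightarrow{d}$ and $\ln\vert{\bf C}\vert$; it is an illustrative choice, not necessarily the procedure used for the results in this paper, and matrices are assumed to be small nested lists.

```python
import math

def cholesky(c):
    """Lower-triangular factor L with C = L L^t (C symmetric, positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low

def log_likelihood(d, t, n):
    """ln L of Eq. (7) for data d, theory covariance t and noise covariance n."""
    npix = len(d)
    c = [[t[i][j] + n[i][j] for j in range(npix)] for i in range(npix)]
    low = cholesky(c)
    # Forward substitution solves L y = d, so that d^t C^-1 d = y.y
    y = []
    for i in range(npix):
        y.append((d[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    chi2 = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(npix))
    return -0.5 * (chi2 + log_det + npix * math.log(2.0 * math.pi))
```

Working with $\ln{\cal L}$ rather than ${\cal L}$ avoids numerical underflow when $N_{\rm pix}$ is large.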

Many experiments report temperature differences; and even if the starting point is a true map, one may wish to subject it to a linear transformation in order to define bands in l-space over which power estimates are to be given. Thus, it is useful to generalize our approach to arbitrary homogeneous, linear data combinations, represented by a transformation matrix ${\bf A}$ : $\overrightarrow {d}^\prime = {\bf A}\cdot \overrightarrow {d}$ . Since the transformation is linear, the new data vector retains a multivariate Gaussian distribution (with zero mean), but with a modified covariance matrix: ${\bf C}^\prime = {\bf A}\cdot {\bf C}\cdot {\bf A}^t$ . As a consequence, the transformed pixels, $\overrightarrow {d}'$ , may be treated in the same manner as the originals, and so we will hereafter use the term generalized pixels to refer to the elements of a general data vector which may be either real sky pixels or some transformed version thereof. The elements of the new theory covariance matrix are (using the summation convention)

$\begin{displaymath}% T^\prime_{ij} = A_{im}A_{jn}T_{mn} = \frac{1}{4\pi} \sum_l (2l+1) C_l W_{ij}(l) \end{displaymath}$

(9)

where $W_{ij}(l) \equiv A_{im}A_{jn}P_l(\mu_{mn})\vert B_l\vert^2$ . The window function is usually defined as W_ii(l), i.e., the diagonal elements of a more general matrix ${\bf W}(l)$ . Normally, one tries to find a transformation which leads to a strongly diagonal ${\bf W}(l)$ and diagonal noise matrix (see comment below).

An example is helpful. Consider a simple, single difference $\Delta_{\rm diff} \equiv \Delta_b(\hat{n}_1)-\Delta_b(\hat{n}_2)$ , whose variance is given by $\langle \Delta_{\rm diff}^2\rangle_{\rm ens} = 2[C_{\rm b}(0) - C_{\rm b}(\theta)]$ . This may be written in terms of multipoles as

$\begin{displaymath}% \langle \Delta_{\rm diff}^2\rangle = \frac{1}{4\pi} \sum_l (2l+1) C_l \left\{ 2 \vert B_l\vert^2 \left[1 - P_l(\mu)\right] \right\} \end{displaymath}$

(10)

identifying the diagonal elements of ${\bf W}$ as the expression in curly brackets. Notice that the power in this variance is localized in l-space, being bounded towards large l by the beam smearing and towards small l by the difference. The off-diagonal elements of ${\bf C}$ depend on the relative positions and orientations of the differences on the sky; in general these elements are not expressible as simple Legendre series.
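The single-difference window function of Eq. (10) can be tabulated directly. The sketch below assumes a Gaussian beam of width `sigma` (so that $\vert B_l\vert^2 = {\rm e}^{-l(l+1)\sigma^2}$) and a finite $l_{\rm max}$; both the function names and the truncation are illustrative.

```python
import math

def legendre_all(lmax, mu):
    """Return [P_0(mu), ..., P_lmax(mu)] from the standard recurrence."""
    p = [1.0, mu]
    for l in range(1, lmax):
        p.append(((2 * l + 1) * mu * p[l] - l * p[l - 1]) / (l + 1))
    return p[: lmax + 1]

def single_difference_window(lmax, theta, sigma):
    """W(l) = 2 |B_l|^2 [1 - P_l(cos theta)], the curly bracket of Eq. (10)."""
    pl = legendre_all(lmax, math.cos(theta))
    return [2.0 * math.exp(-l * (l + 1) * sigma ** 2) * (1.0 - pl[l])
            for l in range(lmax + 1)]

def difference_variance(cl, window):
    """(1/4pi) sum_l (2l+1) C_l W(l) for spectrum cl and window W."""
    return sum((2 * l + 1) * c * w
               for l, (c, w) in enumerate(zip(cl, window))) / (4.0 * math.pi)
```

The factor $1 - P_l(\mu)$ vanishes at $l=0$ and is small for $l \ll 1/\theta$, while the beam factor cuts off $l \gg 1/\sigma$, which is the localization in $l$-space described above.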

Band-powers are defined via Eq. (9). One reduces the set of C_l contained within the window to a single number by adopting a spectral form. The so-called flat band-power, $\delta T_{\rm fb}$ , is established by using $C_l \equiv 2\pi (\delta T_{\rm fb})^2/[l(l+1)]$ , leading to

$\begin{displaymath}% {\bf T}= \frac{1}{2} \delta T_{\rm fb}^2 \sum_l \frac{(2l+1)}{l(l+1)} {\bf W}(l). \end{displaymath}$

(11)

In this fashion, we may write Eq. (7) in terms of the band-power and treat the latter as a parameter to be estimated. This then becomes the band-power likelihood function, ${\cal L}(\delta T_{\rm fb})$ . One obtains the points shown in Fig. 1 by maximizing this likelihood function; the errors are typically found in a Bayesian manner, by integration over ${\cal L}$ with a uniform prior. Notice that the variance due to the finite sample size (i.e., the sample variance, also known as cosmic variance when one has full sky coverage) is fully incorporated into the analysis - the likelihood function "knows" how many pixels there are.
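The reduction of Eq. (9) to Eq. (11) under the flat band-power form can be checked numerically for a single diagonal element. In the sketch below the sum starts at $l_{\rm min}=2$, an assumption made here to exclude the monopole and dipole and to avoid the divergence of $1/[l(l+1)]$ at $l=0$; the window is supplied as a list `window[l]`, and the names are hypothetical.

```python
import math

def flat_spectrum(dt_fb, lmax, lmin=2):
    """C_l = 2 pi dT_fb^2 / [l(l+1)] for lmin <= l <= lmax, zero below lmin."""
    return [0.0 if l < lmin else 2.0 * math.pi * dt_fb ** 2 / (l * (l + 1))
            for l in range(lmax + 1)]

def band_factor(window, lmin=2):
    """(1/2) sum_l (2l+1)/[l(l+1)] W(l), the sum multiplying dT_fb^2 in Eq. (11)."""
    return 0.5 * sum((2 * l + 1) / (l * (l + 1)) * window[l]
                     for l in range(lmin, len(window)))

def band_variance(dt_fb, window, lmin=2):
    """Theory variance of one generalized pixel for a flat band-power dt_fb."""
    return dt_fb ** 2 * band_factor(window, lmin)
```

Because the theory covariance is simply proportional to $\delta T_{\rm fb}^2$, the band-power likelihood depends on a single amplitude once the window functions are fixed.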

An important remark at this stage concerns the construction of Fig. 1. We see here that this figure is only valid for Gaussian perturbations, because it relies on Eq. (7), which assumes Gaussianity at the outset. If the sky fluctuations are non-Gaussian, then these estimates must all be re-evaluated based on the true nature of the sky fluctuations, i.e., the likelihood function in Eq. (7) must be redefined. The same comment applies to any experiment which has an important non-Gaussian noise component - the likelihood function must incorporate this aspect in order to properly yield the power estimate and associated error bars.

What is the raison d'être for these band powers? The likelihood function is clearly greatly simplified if we can find a transformation ${\bf A}$ which diagonalizes ${\bf C}$ (signal plus noise). This can be done for a given model, but because ${\bf C}$ depends on the model parameters, there is in general no unique such transformation valid for all parameter values. The one exception is for an ideal experiment (no noise, or uniform, uncorrelated noise) with full-sky coverage - in this case the spherical harmonic transformation is guaranteed, by Eq. (2), to diagonalize ${\bf C}$ for any and all values of the model parameters. This linear transformation is represented by a matrix $A_{ij} \equiv Y_i(\hat{n}_j)$ , where i= l² + l + m + 1 is a unidimensional index for the pair (l,m). It is the role of band-powers to approximately diagonalize the covariance matrix in more realistic situations, where sky coverage is always limited and noise is never uniform (and sometimes correlated), and in such a way as to concentrate the power estimates in as narrow bands as possible. Since this is not possible for arbitrary parameter values, in practice one adopts a fiducial model (particular values for the parameters) to define a transformation ${\bf A}$ which compromises between the desires for narrow and independent bands (Bond 1995; Tegmark et al. 1997; Tegmark 1997; Bunn & White 1997).
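The unidimensional index $i = l^2 + l + m + 1$ quoted above maps each pair $(l,m)$, with $-l \le m \le l$, to a unique positive integer. A small sketch of the mapping and its inverse follows (the function names are illustrative).

```python
import math

def lm_to_index(l, m):
    """i = l^2 + l + m + 1 for -l <= m <= l (1-based index)."""
    if l < 0 or abs(m) > l:
        raise ValueError("require l >= 0 and -l <= m <= l")
    return l * l + l + m + 1

def index_to_lm(i):
    """Inverse mapping: for given l the index runs from l^2 + 1 to (l+1)^2."""
    if i < 1:
        raise ValueError("index starts at 1")
    l = math.isqrt(i - 1)
    m = i - 1 - l * l - l
    return l, m
```

All multipoles up to $l_{\rm max}$ therefore occupy the indices $1,\dots,(l_{\rm max}+1)^2$ without gaps.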

2.2 Motivating an ansatz

Given a set of band-powers, how should one proceed to constrain the fundamental cosmological parameters, denoted in this subsection by $\overrightarrow {\Theta}$ ? If we had an expression for ${\cal L}(\overrightarrow{\delta T_{\rm fb}})$ , for our set of band-powers $\overrightarrow{\delta T_{\rm fb}}$ , then we could write ${\cal L}(\overrightarrow{\delta T_{\rm fb}}) = {\rm Prob}(\overrightarrow {d}\vert\overrightarrow{\delta T_{\rm fb}}) = {\rm Prob}(\overrightarrow {d}\vert\overrightarrow{\delta T_{\rm fb}}[\overrightarrow {\Theta}]) = {\cal L}(\overrightarrow {\Theta})$ . Thus, our problem is reduced to finding an expression for ${\cal L}(\overrightarrow{\delta T_{\rm fb}})$ , but as we have seen, this is a complicated function of $\overrightarrow{\delta T_{\rm fb}}$ , requiring use of all the measured pixel values and the full covariance matrix with noise - the very thing we are trying to avoid. Our task then is to find an approximation for ${\cal L}(\overrightarrow{\delta T_{\rm fb}})$ .

In order to better understand the general form expected for ${\cal L}(\overrightarrow{\delta T_{\rm fb}})$ , we shall proceed by first considering a simple situation in which we may find an exact analytic expression for this function. We are guided by the observation that the covariance matrix may always be diagonalized around an adopted fiducial model. Although this remains strictly applicable only for this model, we imagine that the likelihood function could be approximated as a simple product of one-dimensional Gaussians near this point in parameter space. If we further suppose that the diagonal elements of the covariance matrix (its eigenvalues) are all identical, we can find a very manageable analytic expression for the likelihood in terms of the best power estimate. We will then pose this general form as an ansatz for more realistic situations, one which we shall test in the following section. We return to these remarks after developing the ansatz.

Consider, then, a situation in which the band temperatures (that is, generalized pixels which are the elements of the general data vector $\overrightarrow {d}'$ ) are independent random variables ( ${\bf C}$ is diagonal) and in which the experimental noise is spatially uncorrelated and uniform:

$\displaystyle % C_{ij} = (\sigma_M^2 + \sigma_N^2)\delta_{ij}$

(12)

where $\sigma_M^2$ is the model-predicted variance and $\sigma_N^2$ is the constant noise variance. For simplicity, we assume that all diagonal elements of ${\bf W}$ are the same, implying that $\sigma_M^2$ is a constant, independent of i. We discuss shortly the nature of such a data vector in actual observational set-ups. This situation is identical to one where $N_{\rm pix}$ values are randomly selected from a single parent distribution described by a Gaussian of zero mean and variance $\sigma_M^2 + \sigma_N^2$ . The band-power we wish to estimate is proportional to the model-predicted variance according to (i.e., Eq. 11)

$\begin{displaymath}% \sigma_M^2 = \delta T_{\rm fb}^2 \times \frac{1}{2} \sum_l \frac{(2l+1)}{l(l+1)} W_{ii}(l) \equiv \delta T_{\rm fb}^2 {\cal R}_{\rm band} \end{displaymath}$

(13)

(independent of i), and we know that in this situation the maximum likelihood estimator for the model-predicted variance is simply

$\begin{displaymath}% [\hat{\sigma}_M]^2 = \frac{1}{N_{\rm pix}} \sum_{i=1}^{N_{\rm pix}} d_i^2 - \sigma_N^2 \equiv [\hat{\delta T_{\rm fb}}]^2 {\cal R}_{\rm band} \end{displaymath}$

(14)

as follows from maximizing the likelihood function

$\begin{displaymath}{\cal L}(\sigma_M) = \frac{1}{[2\pi(\sigma_M^2+\sigma_N^2)]^{N_{\rm pix}/2}} {\rm e}^{-\frac{N_{\rm pix}(\hat{\sigma}_M^2+\sigma_N^2)}{2(\sigma_M^2+\sigma_N^2)}}. \end{displaymath}$

Notice that this is a function of $\sigma_M$ , which peaks at the best estimate $\hat{\sigma}_M$ , and whose form is specified by the parameters $\hat{\sigma}_M$ , $\sigma_N$ and $N_{\rm pix}$ . To obtain the likelihood function for the band-power, we simply treat this as a function of $\delta T_{\rm fb}$ , using Eq. (13), parameterized by $\hat{\delta T_{\rm fb}}$ , $\sigma_N$ and $N_{\rm pix}$ :

$\begin{displaymath}% {\cal L}(\delta T_{\rm fb}) = \frac{1}{[2\pi(\delta T_{\rm fb}^2{\cal R}_{\rm band}+\sigma_N^2)]^{N_{\rm pix}/2}} {\rm e}^{-\frac{N_{\rm pix}(\hat{\delta T_{\rm fb}}^2{\cal R}_{\rm band}+\sigma_N^2)}{2(\delta T_{\rm fb}^2{\cal R}_{\rm band}+\sigma_N^2)}} \equiv G(\delta T_{\rm fb};\hat{\delta T_{\rm fb}},\sigma_N,N_{\rm pix}). \end{displaymath}$

(15)

It clearly peaks at $\hat{\delta T_{\rm fb}}$ . Thus, in this ideal case, we have a simple band-power likelihood function, with corresponding best estimator, $\hat{\delta T_{\rm fb}}$ , given by Eq. (14).
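The ideal-case likelihood of Eq. (15) is simple enough to code directly. The sketch below returns $\ln G$ (the logarithm is an implementation choice made here for numerical stability) and takes ${\cal R}_{\rm band}$ as an explicit argument `r_band`; the argument names are illustrative.

```python
import math

def log_g(dt, dt_hat, sigma_n, npix, r_band):
    """ln G(dt; dt_hat, sigma_n, npix) of Eq. (15).
    dt      : trial band-power
    dt_hat  : best estimate of the band-power, Eq. (14)
    sigma_n : noise rms per generalized pixel
    npix    : number of generalized pixels
    r_band  : the factor R_band of Eq. (13)"""
    total = dt ** 2 * r_band + sigma_n ** 2          # sigma_M^2 + sigma_N^2
    total_hat = dt_hat ** 2 * r_band + sigma_n ** 2  # its best estimate
    return -0.5 * npix * (math.log(2.0 * math.pi * total) + total_hat / total)
```

Differentiating with respect to `total` shows that the maximum occurs where `total == total_hat`, i.e., at $\delta T_{\rm fb} = \hat{\delta T_{\rm fb}}$ , as stated above.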

Although not immediately relevant to our present goals, it is all the same instructive to consider the distribution of $\hat{\delta T_{\rm fb}}$ . This is most easily done by noting that the quantity

$\begin{displaymath}% \chi^2_{N_{\rm pix}} \equiv \sum_{i=1}^{N_{\rm pix}} \frac{d_i^2}{\sigma_M^2 + \sigma_N^2} \end{displaymath}$

(16)

is $\chi^2$ -distributed with $N_{\rm pix}$ degrees of freedom. We may express the maximum likelihood estimator for the band-power in terms of this quantity as

$\begin{displaymath}% \hat{\delta T_{\rm fb}}^2 = {\cal R}_{\rm band}^{-1} \left[\frac{(\sigma_M^2+\sigma_N^2)}{N_{\rm pix}} \chi^2_{N_{\rm pix}}-\sigma_N^2\right] \end{displaymath}$

(17)

From $\langle\chi^2_{N_{\rm pix}}\rangle = N_{\rm pix}$ , we see immediately that the estimator is unbiased

$\begin{displaymath}\langle\hat{\delta T_{\rm fb}}^2\rangle_{\rm ens} = {\cal R}_{\rm band}^{-1}\sigma_M^2 = \delta T_{\rm fb}^2. \end{displaymath}$

Its variance is

$\begin{eqnarray*}{\rm Var}(\hat{\delta T_{\rm fb}}^2) & = & {\cal R}_{\rm band}^{-2} \frac{(\sigma_M^2 + \sigma_N^2)^2}{N_{\rm pix}^2} {\rm Var}(\chi^2_{N_{\rm pix}}) \\ & = & 2 {\cal R}_{\rm band}^{-2} (\sigma_M^2 + \sigma_N^2)^2/N_{\rm pix} \end{eqnarray*}$

explicitly demonstrating the influence of sample/cosmic variance (related to $N_{\rm pix}$ ).
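The mean and variance of the estimator can be checked with a small Monte Carlo experiment. The sketch below draws $N_{\rm pix}$ independent Gaussian generalized pixels with variance $\sigma_M^2+\sigma_N^2$ , applies Eq. (14), and compares the sample statistics with $\delta T_{\rm fb}^2 = \sigma_M^2/{\cal R}_{\rm band}$ and with $2{\cal R}_{\rm band}^{-2}(\sigma_M^2+\sigma_N^2)^2/N_{\rm pix}$ , which follows from ${\rm Var}(\chi^2_{N_{\rm pix}}) = 2N_{\rm pix}$ . The parameter values and function names are arbitrary illustrations.

```python
import random

def band_power_estimate(d, sigma_n2, r_band):
    """Eq. (14): estimate of dT_fb^2 from the generalized pixels d."""
    return (sum(x * x for x in d) / len(d) - sigma_n2) / r_band

def simulate(n_real, npix, sigma_m2, sigma_n2, r_band, seed=0):
    """Sample mean and variance of the estimator over n_real realizations."""
    rng = random.Random(seed)
    sd = (sigma_m2 + sigma_n2) ** 0.5
    est = [band_power_estimate([rng.gauss(0.0, sd) for _ in range(npix)],
                               sigma_n2, r_band)
           for _ in range(n_real)]
    mean = sum(est) / n_real
    var = sum((e - mean) ** 2 for e in est) / (n_real - 1)
    return mean, var

def predicted(npix, sigma_m2, sigma_n2, r_band):
    """Ensemble mean and variance quoted in the text."""
    return sigma_m2 / r_band, 2.0 * (sigma_m2 + sigma_n2) ** 2 / (r_band ** 2 * npix)
```

Calling `simulate(4000, 50, 1.0, 0.5, 1.0)` and `predicted(50, 1.0, 0.5, 1.0)` should give values that agree to within the Monte Carlo scatter.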

All the above relations are exact for the adopted situation - Eq. (15) is the complete likelihood function for the band-power defined by the generalized pixels satisfying Eq. (12). Such a situation could be practically realized on the sky by observing well separated generalized pixels to the same noise level; for example, a set of double differences scattered about the sky, all with the same signal-to-noise. This is rarely the case, however, as scanning strategies must be concentrated within a relatively small area of sky (one makes maps!). This creates important off-diagonal elements in the theory covariance matrix ${\bf T}$ , representing correlations between nearby pixels due to long wavelength perturbation modes. In addition, the noise level is quite often not uniform and sometimes even correlated, adding off-diagonal elements to the noise covariance matrix. Thus, the simple form proposed in Eq. (12) is never achieved in actual observations.

Nevertheless, as mentioned, even in this case one could adopt a fiducial theoretical model and find a transformation ${\bf A}$ which diagonalizes the full covariance matrix ${\bf C}$ , thereby regaining one important simplifying property of the above ideal situation. The diagonal elements of the matrix are then its eigenvalues. Because of the correlations in the original matrix, we expect there to be fewer significant eigenvalues than generalized pixels; this will be relevant shortly. One could then work with a reduced matrix consisting of only the significant eigenvalues, an approach reminiscent of the signal-to-noise eigenmodes proposed by Bond (1995), and also known as the Karhunen-Loève transform (Bunn & White 1997; Tegmark et al. 1997). There remain two technical difficulties. The first is that the covariance matrix does not remain diagonal as we move away from the adopted fiducial model by varying $\delta T_{\rm fb}$ - only when this band-power corresponds to the fiducial model is the matrix really diagonal. The second is that the eigenvalues are not identical, whereas it was their equality that greatly simplified the previous calculation.

All of this motivates us to examine the possibility that a likelihood function of the form of Eq. (15) could be applied, with appropriate redefinitions of $N_{\rm pix}$ and $\sigma_N$ . We therefore proceed by renaming these latter $\nu$ and $\beta$ , respectively, and treating them as parameters to be adjusted to best fit the full likelihood function. Thus, given an actual band-power estimate, $\delta T_{\rm fb}^{(\rm o)}$ (i.e., an experimental result), we propose $G(\delta T_{\rm fb};\delta T_{\rm fb}^{(\rm o)},\beta,\nu)$ as an ansatz for the band-power likelihood function, with parameters $\nu$ and $\beta$ :

$\begin{displaymath}% {\cal L}(\delta T_{\rm fb}) \propto X^{\nu/2} {\rm e}^{-X/2}, \qquad X[\delta T_{\rm fb}] \equiv \frac{[\delta T_{\rm fb}^{({\rm o})}]^2 + \beta^2}{[\delta T_{\rm fb}]^2 + \beta^2}\,\nu . \end{displaymath}$

(18)
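The ansatz of Eq. (18) translates into a few lines of code. The sketch below evaluates it in logarithmic form and normalizes it to unity at its peak, a normalization chosen here for convenience (the equation itself fixes the likelihood only up to a constant); the function names are hypothetical and $\beta > 0$ is assumed.

```python
import math

def ansatz_log_like(dt, dt_obs, beta, nu):
    """ln of X^(nu/2) exp(-X/2), with X as defined in Eq. (18)."""
    x = nu * (dt_obs ** 2 + beta ** 2) / (dt ** 2 + beta ** 2)
    return 0.5 * nu * math.log(x) - 0.5 * x

def ansatz_like(dt, dt_obs, beta, nu):
    """Ansatz likelihood normalized to 1 at dt = dt_obs, where X = nu."""
    return math.exp(ansatz_log_like(dt, dt_obs, beta, nu)
                    - ansatz_log_like(dt_obs, dt_obs, beta, nu))
```

Since $X^{\nu/2}{\rm e}^{-X/2}$ is maximal at $X=\nu$ , the function peaks at the observed band-power $\delta T_{\rm fb}^{({\rm o})}$ for any choice of $\nu$ and $\beta$ .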

We have only two parameters - $\nu$ and $\beta$ - to determine in order to apply the ansatz. This can be done if two confidence intervals of the complete likelihood function are known in advance. For example, suppose we were given both the 68% ( $\sigma^+_{68}$ & $\sigma^-_{68}$ ) and 95% ( $\sigma^+_{95}$ & $\sigma^-_{95}$ ) confidence intervals; then we could fix the two parameters with the equations

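$\begin{displaymath}% 0.68 = \frac{\int_{\delta T_{\rm fb}^{({\rm o})}-\sigma^-_{68}}^{\delta T_{\rm fb}^{({\rm o})}+\sigma^+_{68}} {\rm d}[\delta T_{\rm fb}]\; {\cal L}(\delta T_{\rm fb})} {\int_0^\infty {\rm d}[\delta T_{\rm fb}]\; {\cal L}(\delta T_{\rm fb})} \end{displaymath}$

(19)

$\begin{displaymath}% 0.95 = \frac{\int_{\delta T_{\rm fb}^{({\rm o})}-\sigma^-_{95}}^{\delta T_{\rm fb}^{({\rm o})}+\sigma^+_{95}} {\rm d}[\delta T_{\rm fb}]\; {\cal L}(\delta T_{\rm fb})} {\int_0^\infty {\rm d}[\delta T_{\rm fb}]\; {\cal L}(\delta T_{\rm fb})}. \end{displaymath}$

(20)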

We shall see in the next section (Figs. 2-7) that this produces excellent approximations. This is the main result of this paper.
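One simple way to solve Eqs. (19) and (20) for $\nu$ and $\beta$ is sketched below. It is only an illustration under stated assumptions, not the procedure used to produce the figures: the integrals are done with a trapezoid rule, the infinite upper limit is replaced by a finite cutoff (adequate when $\nu$ is well above unity, since the ansatz falls off as $\delta T_{\rm fb}^{-\nu}$ ), and the two parameters are found by a brute-force search over user-supplied grids with $\beta>0$ .

```python
import math

def ansatz(dt, dt_obs, beta, nu):
    """Eq. (18), normalized to unity at dt = dt_obs (requires beta > 0)."""
    x = nu * (dt_obs ** 2 + beta ** 2) / (dt ** 2 + beta ** 2)
    return math.exp(0.5 * nu * (math.log(x / nu) + 1.0) - 0.5 * x)

def integrate(f, a, b, n=2000):
    """Composite trapezoid rule on n equal steps."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def enclosed_fraction(lower, upper, dt_obs, beta, nu, cutoff=20.0):
    """Integral ratio of Eqs. (19)-(20) for the interval [lower, upper]."""
    like = lambda dt: ansatz(dt, dt_obs, beta, nu)
    dt_max = cutoff * (dt_obs + beta)  # assumed stand-in for the infinite limit
    lower, upper = max(lower, 0.0), min(upper, dt_max)
    return integrate(like, lower, upper) / integrate(like, 0.0, dt_max)

def fit_ansatz(dt_obs, err68, err95, nu_grid, beta_grid, targets=(0.68, 0.95)):
    """Grid search for (nu, beta); err68 and err95 are (minus, plus) errors."""
    best = None
    for nu in nu_grid:
        for beta in beta_grid:
            f1 = enclosed_fraction(dt_obs - err68[0], dt_obs + err68[1],
                                   dt_obs, beta, nu)
            f2 = enclosed_fraction(dt_obs - err95[0], dt_obs + err95[1],
                                   dt_obs, beta, nu)
            cost = (f1 - targets[0]) ** 2 + (f2 - targets[1]) ** 2
            if best is None or cost < best[0]:
                best = (cost, nu, beta)
    return best[1], best[2]
```

A finer grid, or a proper two-dimensional root finder, would be needed for quantitative work; the point here is only that two confidence intervals supply the two conditions required to fix $\nu$ and $\beta$ .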

$\begin{figure} \par\includegraphics[width=12cm,clip]{saskcomp_95_4.ps} \end{figure}$

Figure 2: Comparison to the Saskatoon Q-band 1995 4-point difference. The value of the likelihood is plotted as a function of the band-power, $\delta T_{\rm fb}$ , in both linear (left) and logarithmic (right) scales. The solid (black) curve in each case gives the true likelihood function, while the dashed (red) curve corresponds to the proposed approximation based on two confidence intervals. The dot-dashed (blue) curve is the ansatz with $\nu =N_{\rm pix}=24$ and $\beta$ adjusted to the 68% confidence interval (see text). A "2-winged Gaussian" with different positive-going and negative-going errors is shown as the three-dotted-dashed (green) curve. All curves have been normalized to unity at their peaks.

$\begin{figure} \par\includegraphics[width=12cm,clip]{saskcomp_95_10.ps}\end{figure}$

Figure 3: Comparison to the Saskatoon Q-band 1995 10-point difference. The line-styles are the same as in the previous figure; here $N_{\rm pix}=48$ for the dot-dashed (blue) line.

Unfortunately, most of the time only the 68% confidence interval is reported along with an experimental result (we hope that in the future authors will in fact supply at least two confidence intervals). Is there any way to proceed in this case? For example, one could try to judiciously choose $\nu$ and then adjust $\beta$ with Eq. (19). The most obvious choice for $\nu$ would be $\nu=N_{\rm pix}$ , although from our previous discussion, we expect this to be an upper limit to the number of significant degrees-of-freedom (the significant eigenvalues of ${\bf C}$ ), due to correlations between pixels. The comparisons we are about to make in the following section show that a smaller number of effective pixels (i.e., value for $\nu$ ) is in fact required for a good fit to the true likelihood function. One could try other games, such as setting $\nu\equiv$ (scan length)/(beam FWHM) for unidimensional scans. This also seems reasonable, and certainly this number is less than or equal to the actual number of pixels in the data set, but we have found that this does not always work satisfactorily. The availability of a second confidence interval permits both parameters, $\nu$ and $\beta$ , to be unambiguously determined and in such a way as to provide the best possible approximation with the proposed ansatz.

Bond et al. (2000) have recently examined the nature of the likelihood function and discussed two possible approximations. The form of the ansatz just presented is in fact identical to one of their proposed approximations, parameterized by x and G. These parameters are simply related to our $\nu$ and $\beta$ as follows: $x=\beta^2$ and $G=\nu$ .

Notice that the above development and motivation for the ansatz essentially follow for a single band-power. A set of uncorrelated power estimates is then easily treated by simple multiplication. However, the approximation as proposed does not simultaneously account for several correlated band-powers, and its accuracy is therefore limited by the extent to which such inter-band correlations are important in a given data set. As a further remark along these lines, we have noted that flat-band estimates of any kind, be they from a complete likelihood analysis or not, do not always contain all relevant experimental information (Douspis et al. 2000); any method based on their use is then fundamentally limited by the nature of the lost information.

The only way to test the ansatz is, of course, by direct comparison to the full likelihood function calculated for a number of experiments. If it appears to work for a few such cases, then we may hope that its general application is justified. We now turn to this issue.
