2 Maximum likelihood estimators

Some of the properties of MLE were given by Toutain & Appourchaux (1994). We repeat them here for completeness. We also address two issues that were not covered in their paper: are MLE biased, and how significant are the estimated parameters?

2.1 Fundamental properties

The aim of this section is to introduce some definitions and properties of MLE. A comprehensive study of this area of statistics can be found, e.g., in Kendall & Stuart (1967). Given a random variable x with a probability distribution $f(x,\vec{\lambda})$, where $\vec{\lambda}$ is a vector of p parameters, we define the logarithmic likelihood function $\ell$ of N independent measurements x_k of x as

$\begin{eqnarray} \ell = -\ln{L} = -\sum_{k=1}^{N} \ln f(x_k,\vec{\lambda}),\end{eqnarray}$

(1)

where L is the likelihood. The main property of $\ell$ is that the position of its minimum in the $\vec{\lambda}$ -space gives an estimate of the most likely value of $\vec{\lambda}$ , denoted hereafter as $\tilde{\vec{\lambda}}$ . Hence $\tilde{\vec{\lambda}}$ is the solution of the set of p simultaneous equations:

$\begin{eqnarray} {{\partial \ell} \over{\partial {\lambda_i}}}=0 &{\rm with}&i = 1, 2, ..., p.\end{eqnarray}$

(2)
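As an illustration of Eqs. (1) and (2), the following minimal sketch minimizes $\ell$ numerically for a single parameter. The exponential model $f(x,\lambda)=e^{-x/\lambda}/\lambda$, the simulated data, and the helper names `neg_log_likelihood` and `golden_section_minimize` are assumptions chosen for this example only; they are not taken from the original paper. For this model Eq. (2) can be solved by hand (the estimate is the sample mean), which provides a check on the numerical minimum.

```python
import math
import random

def neg_log_likelihood(lam, data):
    """Eq. (1) for the assumed model f(x, lam) = exp(-x / lam) / lam."""
    return sum(math.log(lam) + x / lam for x in data)

def golden_section_minimize(func, lo, hi, tol=1e-8):
    """Minimize a unimodal 1-D function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]: shrink from the right.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            # The minimum lies in [c, b]: shrink from the left.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)

# Simulated data (assumption): N = 1000 exponential deviates with mean 2.
rng = random.Random(0)
true_lam = 2.0
data = [rng.expovariate(1.0 / true_lam) for _ in range(1000)]

# Position of the minimum of ell in lambda-space.
lam_hat = golden_section_minimize(
    lambda lam: neg_log_likelihood(lam, data), 1e-3, 100.0
)

# Analytical solution of Eq. (2) for this model: the sample mean.
sample_mean = sum(data) / len(data)
print(lam_hat, sample_mean)
```

A one-dimensional search is used here only to keep the sketch free of dependencies; with p > 1 parameters a general multidimensional minimizer would be needed.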

Moreover, in the limit of a very large sample ($N\rightarrow\infty$), this estimator $\tilde{\vec{\lambda}}$ tends to have a multivariate normal probability distribution. In this case, the estimator is asymptotically unbiased with minimum variance, which implies that its expectation and variance are respectively:

$\begin{eqnarray} \lim_{N\rightarrow\infty} E(\tilde{\vec{\lambda}}) &=& \vec{\lambda}\end{eqnarray}$

(3)

$\begin{eqnarray} \lim_{N\rightarrow\infty} \sigma^{2}(\tilde{\lambda}_{i}) &=& c_{ii}\end{eqnarray}$

(4)

where c_ii are the diagonal elements of the inverse of the Hessian matrix h, with elements:

$\begin{eqnarray} h_{ij}=E({{\partial^{2} \ell} \over{\partial {\lambda_i}\partial {\lambda_j}}}).\end{eqnarray}$

(5)

The covariances between any two components of $\tilde{\vec{\lambda}}$ are given by the corresponding off-diagonal elements of the inverse matrix. Equation (5) is used when computing the so-called formal error bars on $\tilde{\vec{\lambda}}$; strictly speaking, according to the Cramér-Rao theorem, Eq. (5) gives only a lower bound to the error bars (Kendall & Stuart 1967, and references therein). Toutain & Appourchaux (1994) showed that Eq. (5) is valid for most purposes in helioseismology.
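The computation of formal error bars can be sketched as follows for a two-parameter case. This is an illustration under stated assumptions, not the procedure of the original paper: the model is a normal distribution with parameters (m, s), the data are simulated, the helper names are hypothetical, and the expectation in Eq. (5) is approximated by a finite-difference Hessian evaluated at the estimate. For this model the expected results are known analytically, $s/\sqrt{N}$ for m and $s/\sqrt{2N}$ for s, which allows a check.

```python
import math
import random

def neg_log_likelihood(params, data):
    """Eq. (1) for a normal model with mean m and standard deviation s.
    The constant term N/2 * ln(2*pi) is dropped; it does not affect derivatives."""
    m, s = params
    return sum(math.log(s) + 0.5 * ((x - m) / s) ** 2 for x in data)

def numerical_hessian(func, point, step=1e-3):
    """Central finite-difference estimate of the matrix of second derivatives."""
    p = len(point)
    hess = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            def shifted(di, dj):
                q = list(point)
                q[i] += di * step
                q[j] += dj * step
                return func(q)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * step ** 2)
    return hess

def invert_2x2(h):
    """Inverse of a 2x2 matrix; its elements are the c_ij of the text."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [[h[1][1] / det, -h[0][1] / det],
            [-h[1][0] / det, h[0][0] / det]]

# Simulated data (assumption): N = 500 normal deviates, mean 5, sigma 2.
rng = random.Random(1)
data = [rng.gauss(5.0, 2.0) for _ in range(500)]
n = len(data)

# Closed-form MLE for this model: solution of Eq. (2).
m_hat = sum(data) / n
s_hat = math.sqrt(sum((x - m_hat) ** 2 for x in data) / n)

# Observed Hessian at the estimate, used in place of the expectation of Eq. (5).
h = numerical_hessian(lambda q: neg_log_likelihood(q, data), [m_hat, s_hat])
c = invert_2x2(h)

err_m = math.sqrt(c[0][0])   # formal error bar on m
err_s = math.sqrt(c[1][1])   # formal error bar on s
cov_ms = c[0][1]             # covariance between m and s
print(err_m, s_hat / math.sqrt(n))
print(err_s, s_hat / math.sqrt(2 * n))
```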

2.2 Biased or unbiased?

The fact that MLE are asymptotically unbiased does not necessarily mean that this property holds for a finite amount of data. As an example, it is well known that an estimator of the variance ($\sigma^{2}$) of N measurements of a normally distributed random variable x is given by:

$\begin{displaymath} \sigma^{2}=\frac{1}{N-1} \sum_{i=1}^{N} (x_{i}-\tilde{m})^{2} \end{displaymath}$

(6)

where x_i is the i-th measurement of the random variable x and $\tilde{m}$ is an estimate of the mean (the sample mean). It is well known that the $\sigma^{2}$ of Eq. (6) is unbiased. In this case, MLE would give the following estimator:

$\begin{displaymath} \sigma_{\rm MLE}^{2}=\frac{N-1}{N} \sigma^{2} \end{displaymath}$

(7)
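For reference, Eq. (7) can be recovered from Eqs. (1) and (2). The following is a brief sketch, assuming a normal distribution with mean m and variance $\sigma^{2}$ both treated as free parameters. Equation (1) becomes:

$\begin{displaymath} \ell = \frac{N}{2}\ln(2\pi\sigma^{2}) + \frac{1}{2\sigma^{2}} \sum_{i=1}^{N} (x_{i}-m)^{2} \end{displaymath}$

Setting ${\partial \ell}/{\partial m}=0$ gives $\tilde{m}=\frac{1}{N}\sum_{i=1}^{N} x_{i}$, and setting ${\partial \ell}/{\partial \sigma^{2}}=0$ gives:

$\begin{displaymath} \frac{N}{2\sigma^{2}} - \frac{1}{2\sigma^{4}} \sum_{i=1}^{N} (x_{i}-\tilde{m})^{2} = 0 \quad\Rightarrow\quad \sigma_{\rm MLE}^{2}=\frac{1}{N} \sum_{i=1}^{N} (x_{i}-\tilde{m})^{2} \end{displaymath}$

Comparing the last expression with Eq. (6) yields the factor (N-1)/N of Eq. (7).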

Clearly the MLE expression gives a bias that vanishes asymptotically as the number of points tends to infinity. It is often difficult to derive an explicit relation, similar to Eq. (7), between the estimator and the finite number of data points. When an analytical expression cannot be found, we advise using Monte-Carlo simulations to verify the unbiasedness; an example for l = 1 splittings is given in Chang (1996) and Appourchaux et al. (1997).
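A Monte-Carlo check of this kind can be sketched on the variance example, where the answer is known from Eq. (7). The sample size, number of realizations, random seed, and function name below are arbitrary choices made for this illustration; the simulations of l = 1 splittings cited above are not reproduced here.

```python
import random

def variance_estimates(sample):
    """Return (Eq. 6 estimate, MLE estimate of Eq. 7) for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / (n - 1), sum_sq / n

rng = random.Random(42)
n_points = 5        # small N, so that the bias is clearly visible
n_trials = 20000    # number of simulated realizations
true_variance = 1.0

sum_unbiased = 0.0
sum_mle = 0.0
for _ in range(n_trials):
    sample = [rng.gauss(0.0, true_variance ** 0.5) for _ in range(n_points)]
    unbiased, mle = variance_estimates(sample)
    sum_unbiased += unbiased
    sum_mle += mle

# Averages over realizations approximate the expectations of both estimators.
mean_unbiased = sum_unbiased / n_trials
mean_mle = sum_mle / n_trials
print(mean_unbiased, mean_mle, (n_points - 1) / n_points)
```

With N = 5, the average of the MLE estimate should approach (N-1)/N = 0.8 times the true variance, while the average of Eq. (6) should approach the true variance itself.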

In any case MLE are intrinsically biased estimators because they are also minimum variance estimators (Kendall & Stuart 1967). It may be useful to find other estimators that do not bias the estimates (Quenouille 1956); they might not necessarily have minimum variance. These estimators are yet to be found.

2.3 Significance of fitted parameters

When one uses least squares for fitting data, one can test the significance of the fitted parameters using the so-called R test (Frieden 1983). For MLE, a useful test is available: the likelihood ratio test. It was first used by Appourchaux et al. (1994). This method requires maximizing the likelihood $e^{-\ell(\omega_{p})}$ of a given event where p parameters are used to describe the line profile. If one wants to describe the same event with n additional parameters, the likelihood $e^{-\ell(\Omega_{p+n})}$ will have to be maximized. The likelihood ratio test consists in taking the ratio of the two likelihoods (Brownlee 1965). Using the logarithmic likelihood, we can define the ratio $\Lambda$ as:

$\begin{displaymath} \ln(\Lambda)=\ell(\Omega_{p+n})-\ell(\omega_{p}).\end{displaymath}$

(8)

If $\Lambda$ is close to 1, it means that there is no improvement in the maximized likelihood and that the additional parameters are not significant. On the other hand, if $\Lambda \ll 1$, it means that $\ell(\Omega_{p+n}) \ll \ell(\omega_{p})$ and that the additional parameters are very significant. In order to define a significance for the n additional parameters, we need to know the statistics of $\ln(\Lambda)$ under the null hypothesis, i.e. when the n additional parameters are not significant. For this null hypothesis, Wilks (1938) showed that for a large sample size the distribution of $-2\ln\Lambda$ tends to the $\chi^{2}(n)$ distribution.
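The test can be sketched numerically as follows. The function names and the two values of $\ell$ are hypothetical and serve only as an illustration. The $\chi^{2}(n)$ tail probability is computed for integer n from standard closed forms (a finite series for even n, the complementary error function plus a finite series for odd n), so that only the standard library is needed.

```python
import math

def chi2_sf(x, n):
    """Probability that a chi-square variable with n degrees of freedom
    (n a positive integer) exceeds x."""
    if x <= 0.0:
        return 1.0
    y = 0.5 * x
    if n % 2 == 0:
        # Even n: exp(-y) * sum_{j=0}^{n/2-1} y^j / j!
        term = math.exp(-y)
        total = term
        for j in range(1, n // 2):
            term *= y / j
            total += term
        return total
    # Odd n: start from n = 1, erfc(sqrt(y)), then add y^a e^{-y} / Gamma(a+1)
    # for a = 1/2, 3/2, ... up to a = n/2 - 1.
    total = math.erfc(math.sqrt(y))
    a = 0.5
    while a < 0.5 * n:
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return total

def likelihood_ratio_significance(ell_p, ell_pn, n_extra):
    """Probability, under the null hypothesis, of obtaining a value of
    -2 ln(Lambda) at least as large as the observed one, with ln(Lambda)
    from Eq. (8) and the chi-square(n) limit of Wilks (1938)."""
    ln_lambda = ell_pn - ell_p          # Eq. (8); <= 0 for nested fits
    return chi2_sf(-2.0 * ln_lambda, n_extra)

# Hypothetical minimized values of ell for fits with p and p + 2 parameters.
ell_p = 1234.5
ell_pn = 1230.0
prob = likelihood_ratio_significance(ell_p, ell_pn, 2)
print(prob)
```

A small probability indicates that an improvement this large is unlikely to arise by chance when the additional parameters are not significant. Since the $\chi^{2}(n)$ limit holds for large samples, the resulting probability should be treated as approximate for short data sets.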
