

6 The Source Mutual Information (SMI)

After a set of tests we noticed that the previous tools were not sufficient to select the best BSS. We introduced another test based on the mutual information between the sources.

The mutual information.

Let us consider two sources S1 and S2. Their entropies are defined as:
 
\begin{displaymath}
E(S_1) = -\sum_{n} p_{S_1}(n)\log_2 p_{S_1}(n)
\end{displaymath} (16)

\begin{displaymath}
E(S_2) = -\sum_{n} p_{S_2}(n)\log_2 p_{S_2}(n)
\end{displaymath} (17)

where $p_{S_1}(n)$ and $p_{S_2}(n)$ are the probabilities of the pixel value n in sources S1 and S2, respectively.
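As an illustration, the entropy of a single source can be estimated directly from its empirical pixel-value distribution. The following minimal Python/NumPy sketch is ours; the function name source_entropy and the test array are illustrative, not part of the paper:

import numpy as np

def source_entropy(source):
    """Shannon entropy (in bits) of the empirical pixel-value distribution, Eq. (16)."""
    values, counts = np.unique(source, return_counts=True)
    p = counts / counts.sum()          # p_S(n): relative frequency of each level n
    return -np.sum(p * np.log2(p))

# Example: a uniform 8-level source has entropy close to log2(8) = 3 bits.
s1 = np.random.randint(0, 8, size=10000)
print(source_entropy(s1))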

The entropy of the couple S1 and S2 is:

\begin{displaymath}
E(S_1,S_2)=-\sum_{n_1,n_2} p(n_1,n_2)\log_2 p(n_1,n_2)
\end{displaymath} (18)

where p(n1,n2) is the joint probability of pixel value n1 for S1 and n2 for S2. If the sources are independent:

 
\begin{displaymath}
E(S_1,S_2)=E(S_1)+E(S_2).
\end{displaymath} (19)

The quantity:

 
\begin{displaymath}
I(S_1,S_2)=E(S_1)+E(S_2)-E(S_1,S_2)
\end{displaymath} (20)

is called the mutual information between S1 and S2. It measures the information gained about S1 from knowing S2, and conversely (Rényi 1966). From Eqs. (16-18) and (20), I(S1,S2) can be written as:

\begin{displaymath}
I(S_1,S_2)=\sum_{n_1,n_2} p(n_1,n_2)\log_2 {p(n_1,n_2)\over p_{S_1}(n_1)p_{S_2}(n_2)}
\end{displaymath} (21)

which is the Kullback-Leibler divergence between the joint probability and the product of the marginal probabilities (Comon 1994). If the sources are independent, the joint probability is the product of the marginals and this divergence is equal to 0.
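For two sources, Eqs. (18)-(21) can be estimated from the marginal and joint pixel histograms. The sketch below is an illustration only; the binning (64 cells) and the function names are our choices, not those of the paper:

import numpy as np

def entropy_from_counts(counts):
    """Entropy (bits) of a histogram, ignoring empty cells."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(s1, s2, bins=64):
    """I(S1,S2) = E(S1) + E(S2) - E(S1,S2), Eq. (20)."""
    h1, _ = np.histogram(s1, bins=bins)
    h2, _ = np.histogram(s2, bins=bins)
    h12, _, _ = np.histogram2d(np.ravel(s1), np.ravel(s2), bins=bins)
    return (entropy_from_counts(h1) + entropy_from_counts(h2)
            - entropy_from_counts(h12))

# Independent sources give I close to 0; identical sources give I = E(S1).
a = np.random.randn(100000)
b = np.random.randn(100000)
print(mutual_information(a, b), mutual_information(a, a))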

Then, the mutual information of a set of l sources is defined as a generalization of Eq. (21). We have:

 
\begin{displaymath}
I(S_1,\ldots,S_l)=\sum_{n_1,\ldots,n_l} p(n_1,\ldots,n_l)\log_2 {p(n_1,\ldots,n_l)\over \prod_{i=1}^{l} p_{S_i}(n_i)}\cdot
\end{displaymath} (22)

The observed SMI.

The observed SMI can be derived from Eq. (22). We have to extract from the experimental data a reliable estimate of the probability $p(n_1,n_2,\ldots,n_l)$. This probability is obtained from the number of pixels having the value n1 in source S1, n2 in source S2, $\ldots$, nl in source Sl. The pixels are thus distributed among $K^l$ cells, where K is the number of levels per source.
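In practice the joint histogram can be built with a multi-dimensional binning routine. The sketch below uses our own notation and an assumed K = 16 levels; it only illustrates how quickly the $K^l$ cells outgrow the number of available pixels:

import numpy as np

def joint_entropy(sources, K=16):
    """Entropy (bits) of the joint histogram of l sources, binned into K**l cells."""
    data = np.column_stack([s.ravel() for s in sources])   # shape (N_pixels, l)
    counts, _ = np.histogramdd(data, bins=K)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

# With l = 4 and K = 16 there are already 16**4 = 65536 cells, so a
# 256 x 256 image (65536 pixels) gives on average one pixel per cell.
sources = [np.random.randn(256, 256) for _ in range(4)]
print(joint_entropy(sources))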

As $K^l$ increases exponentially with the number of sources, the mean number of pixels per cell decreases rapidly with l, and $E(S_1,S_2,\ldots,S_l)$ is poorly estimated. To avoid this difficulty, we note that we do not need the exact value of $I(S_1,\ldots,S_l)$, but only to determine for which mixing matrix A, or its inverse $B=A^{-1}$, the mutual information is minimum. We can write the approximation:

 
\begin{displaymath}
S=BX.
\end{displaymath} (23)

The sources are obtained from the images through a linear transformation, so the entropy of the source set $E(S_1,\ldots,S_l)$ is the entropy of the image set $E(X_1,\ldots,X_l)$ plus the logarithm of the Jacobian of the transformation:

\begin{displaymath}
E(S_1,\ldots,S_l)=E(X_1,\ldots,X_l)+\log_2\vert\det B\vert.
\end{displaymath} (24)

We obtain:
 
\begin{displaymath}
I(S_1,\ldots,S_l)=\sum_{i=1}^{l} E(S_i)-E(X_1,\ldots,X_l)-\log_2\vert\det B\vert.
\end{displaymath} (25)

Equation (24), and consequently Eq. (25), hold only in the limit of an increasing number of pixels. This number must be large enough to provide an experimental PDF with a very small sampling interval. It is then sufficient to minimize the function (Comon 1994):

\begin{displaymath}
C=\sum_{i=1}^{l} E(S_i)-\log_2\vert\det B\vert.
\end{displaymath} (26)
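A minimal sketch of this contrast, assuming the observed images are stored as the rows of a matrix X and the marginal entropies are estimated with a fixed 64-cell histogram (both are our assumptions), could read:

import numpy as np

def marginal_entropy(x, bins=64):
    """Histogram-based entropy (bits) of one estimated source."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def comon_contrast(B, X):
    """C(B) = sum_i E(S_i) - log2|det B|, Eq. (26); X has one flattened image per row."""
    S = B @ X                                   # Eq. (23): estimated sources
    return (sum(marginal_entropy(s) for s in S)
            - np.log2(abs(np.linalg.det(B))))

# The candidate matrix B giving the smallest contrast is the one to retain.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 100000))
B1, B2 = np.eye(3), rng.standard_normal((3, 3))
print(comon_contrast(B1, X), comon_contrast(B2, X))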

We applied Eq. (26), but the optimal set of sources was not the one retained by visual inspection. The drawback of Eq. (26) is that we never have enough pixels to validate it. We therefore preferred to define the SMI by summing the pairwise mutual information values, as explained below.

The SMI algorithm.

The entropy depends on the coding step between two levels:
-
If the step is too small, the number of pixels per cell is too low and the estimation is not reliable;
-
If the step is too large, the entropy is too small: the PDF is smoothed and no longer sensitive to non-Gaussian features.
The SMI determination is therefore achieved in four steps (a code sketch summarizing them is given at the end of the section):

1.
For each source i we determine the mean value mi and the standard deviation $\sigma_i$ after a $3\sigma_i$ clipping. These parameters are computed iteratively, rejecting at each iteration the values outside the interval $[m_i-3\sigma_i,m_i+3\sigma_i]$. The algorithm converges after a few iterations (4 to 5). For a true Gaussian distribution the resulting mean is correct, while the bias on $\sigma_i$ is of the order of $2\%$. For a non-Gaussian distribution, these parameters define a Gaussian kernel of the PDF, and more values fall outside the interval $[m_i-3\sigma_i,m_i+3\sigma_i]$ than for a Gaussian PDF;
2.
The histogram Hi(k) of source i is determined with a cell size equal to the standard deviation $\sigma_i$. We evaluate the entropy Ei of the source by:

\begin{displaymath}
E_i=-\sum_k {H_i(k)\over N}\log_2 {H_i(k)\over N}
\end{displaymath} (27)

where N is the number of pixels;
3.
We then determine the mutual histogram Hij with the same cell size, and compute the resulting entropy Eij. The mutual information between sources i and j is equal to:

 
\begin{displaymath}
I_{ij}=E_i+E_j-E_{ij}.
\end{displaymath} (28)

For a large number of pixels, this mutual information is independent of the cell size as long as the cell size is smaller than $\sigma_i$; a faint bias is introduced when the number of pixels per cell is small. Choosing a cell size equal to $\sigma_i$ is therefore a reasonable compromise;
4.
We quantify the quality of each separation by the sum:

\begin{displaymath}
I=\sum_{i<j} I_{ij}.
\end{displaymath} (29)

We note that I only takes into account the pixel PDF. Therefore it favors the algorithms based on PDF, such as KL, JADE and FastICA, and penalizes SOBI.
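For reference, a minimal Python/NumPy sketch of the four steps above is given below. The helper names, the fixed number of clipping iterations and the test data are our assumptions; the sketch only reproduces the procedure as described:

import numpy as np
from itertools import combinations

def sigma_clip(x, n_iter=5):
    """Step 1: iterative 3-sigma clipping; returns the mean and standard deviation
    of the pixels kept inside [m - 3*sigma, m + 3*sigma]."""
    m, s = x.mean(), x.std()
    for _ in range(n_iter):
        kept = x[(x > m - 3 * s) & (x < m + 3 * s)]
        m, s = kept.mean(), kept.std()
    return m, s

def entropy_counts(counts, N):
    """Entropy (bits) of a histogram H, using p = H(k)/N as in Eq. (27)."""
    p = counts[counts > 0] / N
    return -np.sum(p * np.log2(p))

def smi(sources):
    """Steps 2-4: SMI = sum of the pairwise mutual information values, Eq. (29)."""
    flat = [s.ravel() for s in sources]
    sigmas = [sigma_clip(x)[1] for x in flat]
    N = flat[0].size
    # Step 2: per-source histograms with a cell size equal to sigma_i.
    def edges(x, sigma):
        return np.arange(x.min(), x.max() + sigma, sigma)
    E = [entropy_counts(np.histogram(x, bins=edges(x, sg))[0], N)
         for x, sg in zip(flat, sigmas)]
    # Steps 3-4: pairwise mutual information (Eq. 28) summed over all pairs.
    total = 0.0
    for i, j in combinations(range(len(flat)), 2):
        Hij, _, _ = np.histogram2d(flat[i], flat[j],
                                   bins=[edges(flat[i], sigmas[i]),
                                         edges(flat[j], sigmas[j])])
        total += E[i] + E[j] - entropy_counts(Hij, N)
    return total

# Nearly independent sources should give an SMI close to zero.
rng = np.random.default_rng(1)
print(smi([rng.standard_normal((128, 128)) for _ in range(3)]))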

