

6 The Source Mutual Information (SMI)

After a set of tests we noticed that the previous tools were not sufficient to select the best BSS. We introduced another test based on the mutual information between the sources.

The mutual information.

Let us consider two sources S1 and S2. Their entropies are defined as:
 
\begin{displaymath}
E(S_1) = -\sum_{n} p_{S_1}(n)\log_2 p_{S_1}(n)
\end{displaymath} (16)

\begin{displaymath}
E(S_2) = -\sum_{n} p_{S_2}(n)\log_2 p_{S_2}(n)
\end{displaymath} (17)

where $p_{S_1}(n)$ and $p_{S_2}(n)$ are the probabilities of the pixel value n in sources S1 and S2, respectively.
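As an illustration, the entropy of a single source can be estimated directly from its empirical pixel-value distribution. The following minimal Python/NumPy sketch is ours; the function name source_entropy and the test array are illustrative, not part of the paper:

import numpy as np

def source_entropy(source):
    """Shannon entropy (in bits) of the empirical pixel-value distribution, Eq. (16)."""
    values, counts = np.unique(source, return_counts=True)
    p = counts / counts.sum()          # p_S(n): relative frequency of each level n
    return -np.sum(p * np.log2(p))

# Example: a uniform 8-level source has entropy close to log2(8) = 3 bits.
s1 = np.random.randint(0, 8, size=10000)
print(source_entropy(s1))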

The entropy of the couple S1 and S2 is:

\begin{displaymath}
E(S_1,S_2)=-\sum_{n_1,n_2} p(n_1,n_2)\log_2 p(n_1,n_2)
\end{displaymath} (18)

where p(n1,n2) is the joint probability of pixel value n1 for S1 and n2 for S2. If the sources are independent:

 
\begin{displaymath}
E(S_1,S_2)=E(S_1)+E(S_2).
\end{displaymath} (19)

The quantity:

 
\begin{displaymath}
I(S_1,S_2)=E(S_1)+E(S_2)-E(S_1,S_2)
\end{displaymath} (20)

is called the mutual information between S1 and S2. It measures the information gained about S1 from knowing S2, and conversely (Rényi 1966). From Eqs. (16-18) and (20), I(S1,S2) can be written as:

\begin{displaymath}
I(S_1,S_2)=\sum_{n_1,n_2} p(n_1,n_2)\log_2 {p(n_1,n_2)\over p_{S_1}(n_1)p_{S_2}(n_2)}
\end{displaymath} (21)

which is the Kullback-Leibler divergence between the joint probability and the product of the marginal probabilities (Comon 1994). If the sources are independent, the joint probability is the product of the marginals and this divergence is equal to 0.
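For two sources, Eqs. (18)-(21) can be estimated from the marginal and joint pixel histograms. The sketch below is an illustration only; the binning (64 cells) and the function names are our choices, not those of the paper:

import numpy as np

def entropy_from_counts(counts):
    """Entropy (bits) of a histogram, ignoring empty cells."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(s1, s2, bins=64):
    """I(S1,S2) = E(S1) + E(S2) - E(S1,S2), Eq. (20)."""
    h1, _ = np.histogram(s1, bins=bins)
    h2, _ = np.histogram(s2, bins=bins)
    h12, _, _ = np.histogram2d(np.ravel(s1), np.ravel(s2), bins=bins)
    return (entropy_from_counts(h1) + entropy_from_counts(h2)
            - entropy_from_counts(h12))

# Independent sources give I close to 0; identical sources give I = E(S1).
a = np.random.randn(100000)
b = np.random.randn(100000)
print(mutual_information(a, b), mutual_information(a, a))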

Then, the mutual information of a set of l sources is defined as a generalization of Eq. (21). We have:

 
\begin{displaymath}
I(S_1,\ldots,S_l)=\sum_{n_1,\ldots,n_l} p(n_1,\ldots,n_l)\log_2 {p(n_1,\ldots,n_l)\over \prod_{i=1}^{l} p_{S_i}(n_i)}\cdot
\end{displaymath} (22)

The observed SMI.

The observed SMI can be derived from Eq. (22). We have to extract from the experimental data a reliable estimate of the probability $p(n_1,n_2,\ldots,n_l)$. This probability is obtained from the number of pixels having the value n1 in source S1, n2 in source S2, $\ldots$, nl in source Sl. The pixels are thus distributed among $K^l$ cells, where K is the number of levels per source.
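In practice the joint histogram can be built with a multi-dimensional binning routine. The sketch below uses our own notation and an assumed K = 16 levels; it only illustrates how quickly the $K^l$ cells outgrow the number of available pixels:

import numpy as np

def joint_entropy(sources, K=16):
    """Entropy (bits) of the joint histogram of l sources, binned into K**l cells."""
    data = np.column_stack([s.ravel() for s in sources])   # shape (N_pixels, l)
    counts, _ = np.histogramdd(data, bins=K)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

# With l = 4 and K = 16 there are already 16**4 = 65536 cells, so a
# 256 x 256 image (65536 pixels) gives on average one pixel per cell.
sources = [np.random.randn(256, 256) for _ in range(4)]
print(joint_entropy(sources))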

As $K^l$ increases exponentially with the number of sources, the mean number of pixels per cell decreases rapidly with l, and $E(S_1,S_2,\ldots,S_l)$ is poorly estimated. To avoid this difficulty, we note that we do not need the exact value of $I(S_1,\ldots,S_l)$, but only to determine for which mixing matrix A, or its inverse $B=A^{-1}$, the mutual information is minimum. We can write the approximation:

 
\begin{displaymath}
S=BX.
\end{displaymath} (23)

The sources are obtained from the images through a linear transformation, so the entropy of the source set $E(S_1,\ldots,S_l)$ is the entropy of the image set $E(X_1,\ldots,X_l)$ plus the logarithm of the Jacobian of the transformation:

\begin{displaymath}
E(S_1,\ldots,S_l)=E(X_1,\ldots,X_l)+\log_2\vert\det B\vert.
\end{displaymath} (24)

We obtain:
 
\begin{displaymath}
I(S_1,\ldots,S_l)=\sum_{i=1}^{l} E(S_i)-E(X_1,\ldots,X_l)-\log_2\vert\det B\vert.
\end{displaymath} (25)

Equation (24), and consequently Eq. (25), hold only in the limit of an increasing number of pixels. This number must be large enough to provide an experimental PDF with a very small sampling interval. It is then sufficient to minimize the function (Comon 1994):

\begin{displaymath}
C=\sum_{i=1}^{l} E(S_i)-\log_2\vert\det B\vert.
\end{displaymath} (26)
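A minimal sketch of this contrast, assuming the observed images are stored as the rows of a matrix X and the marginal entropies are estimated with a fixed 64-cell histogram (both are our assumptions), could read:

import numpy as np

def marginal_entropy(x, bins=64):
    """Histogram-based entropy (bits) of one estimated source."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def comon_contrast(B, X):
    """C(B) = sum_i E(S_i) - log2|det B|, Eq. (26); X has one flattened image per row."""
    S = B @ X                                   # Eq. (23): estimated sources
    return (sum(marginal_entropy(s) for s in S)
            - np.log2(abs(np.linalg.det(B))))

# The candidate matrix B giving the smallest contrast is the one to retain.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 100000))
B1, B2 = np.eye(3), rng.standard_normal((3, 3))
print(comon_contrast(B1, X), comon_contrast(B2, X))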

We applied Eq. (26), but the optimal set of sources was not the one retained by visual inspection. The drawback of Eq. (26) is that we never have enough pixels to validate it. We therefore preferred to define the SMI by summing the pairwise mutual information values, as explained below.

The SMI algorithm.

The entropy depends on the coding step between two levels:
-
If the step is too small, the number of pixels per cell is too low and the estimation is not reliable;
-
If the step is too large, the entropy is too small: the PDF is smoothed and no longer sensitive to non-Gaussian features.
The SMI determination is therefore achieved in four steps (a code sketch summarizing them is given at the end of the section):

1.
For each source i we determine the mean value mi and the standard deviation $\sigma_i$ after a $3\sigma_i$ clipping. These parameters are computed iteratively, rejecting at each iteration the values outside the interval $[m_i-3\sigma_i,m_i+3\sigma_i]$. The algorithm converges after a few iterations (4 to 5). For a true Gaussian distribution the resulting mean is correct, while the bias on $\sigma_i$ is of the order of $2\%$. For a non-Gaussian distribution, these parameters define a Gaussian kernel of the PDF, and more values fall outside the interval $[m_i-3\sigma_i,m_i+3\sigma_i]$ than for a Gaussian PDF;
2.
The histogram Hi(k) of source i is determined with a cell size equal to the standard deviation $\sigma_i$. We evaluate the entropy Ei of the source by:

\begin{displaymath}
E_i=-\sum_k {H_i(k)\over N}\log_2 {H_i(k)\over N}
\end{displaymath} (27)

where N is the number of pixels;
3.
We then determine the mutual histogram Hij with the same cell size, and compute the resulting entropy Eij. The mutual information between sources i and j is equal to:

 
\begin{displaymath}
I_{ij}=E_i+E_j-E_{ij}.
\end{displaymath} (28)

For a large number of pixels, this mutual information is independent of the cell size as long as the cell size is smaller than $\sigma_i$; a faint bias is introduced when the number of pixels per cell is small. Choosing a cell size equal to $\sigma_i$ is therefore a reasonable compromise;
4.
We quantify the quality of each separation by the sum:

\begin{displaymath}
I=\sum_{i<j} I_{ij}.
\end{displaymath} (29)

We note that I only takes into account the pixel PDF. Therefore it favors the algorithms based on PDF, such as KL, JADE and FastICA, and penalizes SOBI.
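For reference, a minimal Python/NumPy sketch of the four steps above is given below. The helper names, the fixed number of clipping iterations and the test data are our assumptions; the sketch only reproduces the procedure as described:

import numpy as np
from itertools import combinations

def sigma_clip(x, n_iter=5):
    """Step 1: iterative 3-sigma clipping; returns the mean and standard deviation
    of the pixels kept inside [m - 3*sigma, m + 3*sigma]."""
    m, s = x.mean(), x.std()
    for _ in range(n_iter):
        kept = x[(x > m - 3 * s) & (x < m + 3 * s)]
        m, s = kept.mean(), kept.std()
    return m, s

def entropy_counts(counts, N):
    """Entropy (bits) of a histogram H, using p = H(k)/N as in Eq. (27)."""
    p = counts[counts > 0] / N
    return -np.sum(p * np.log2(p))

def smi(sources):
    """Steps 2-4: SMI = sum of the pairwise mutual information values, Eq. (29)."""
    flat = [s.ravel() for s in sources]
    sigmas = [sigma_clip(x)[1] for x in flat]
    N = flat[0].size
    # Step 2: per-source histograms with a cell size equal to sigma_i.
    def edges(x, sigma):
        return np.arange(x.min(), x.max() + sigma, sigma)
    E = [entropy_counts(np.histogram(x, bins=edges(x, sg))[0], N)
         for x, sg in zip(flat, sigmas)]
    # Steps 3-4: pairwise mutual information (Eq. 28) summed over all pairs.
    total = 0.0
    for i, j in combinations(range(len(flat)), 2):
        Hij, _, _ = np.histogram2d(flat[i], flat[j],
                                   bins=[edges(flat[i], sigmas[i]),
                                         edges(flat[j], sigmas[j])])
        total += E[i] + E[j] - entropy_counts(Hij, N)
    return total

# Nearly independent sources should give an SMI close to zero.
rng = np.random.default_rng(1)
print(smi([rng.standard_normal((128, 128)) for _ in range(3)]))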

