

3 Introduction to BSS

Independent component analysis (ICA) is a statistical signal processing technique that decomposes a set of m observed signals (random or deterministic) into n independent unobserved source signals, where $m\ge n$, and estimates the mixing matrix A. The simplest model considers the observed mixed signals X as a linear combination of the unobserved source signals S through the mixing matrix A, with added sensor noise N, as shown in Eq. (1). ICA applications include BSS, feature extraction and blind deconvolution. Current solution techniques are based on a white noise model; thanks to the generalized Anscombe transform, this hypothesis is satisfied in our experiments.
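The noisy linear mixing model can be sketched numerically as follows (a minimal illustration with NumPy; the sources, mixing matrix and noise level are hypothetical placeholders, not values from the paper):

```python
import numpy as np

# Minimal sketch of the noisy linear mixing model X = A S + N of Eq. (1).
# All numerical values below are illustrative assumptions.
rng = np.random.default_rng(0)

n = 2          # number of unobserved sources
m = 3          # number of observed mixtures (m >= n)
T = 10_000     # number of samples

S = rng.laplace(size=(n, T))        # non-Gaussian source signals
A = rng.normal(size=(m, n))         # unknown mixing matrix
N = 0.01 * rng.normal(size=(m, T))  # white sensor noise

X = A @ S + N                       # observed mixtures
print(X.shape)                      # (3, 10000)
```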

In the BSS context, a full identification of the mixture matrix A is impossible, because the exchange of a fixed scalar factor between a given source image and the corresponding column of A does not affect the observations. For each observed image Xi:

 \begin{displaymath}%
X_i=\sum_j {a_{ij} \over \alpha_{j}} \alpha_{j} S_j+N_i
\end{displaymath} (3)

where $\alpha_{j}$ is an arbitrary factor. To cater for this indeterminacy, the sources are constrained to have unit variance. A is consequently normalized. The correlation matrices are defined by:

 \begin{displaymath}%
R(\tau)=E[X(t+\tau)\cdot X^*(t)]=A\cdot R_S(\tau)\cdot A^H
\end{displaymath} (4)

where H denotes the complex conjugate transpose of the matrix and $R_S(\tau)$ is the correlation matrix of the source. For unit variance,

 \begin{displaymath}%
R(0)=R_S(0)=I=A\cdot A^H.
\end{displaymath} (5)

The mixture matrix A can be estimated only up to permutations and phase shifts. Additional considerations, developed below, are needed to improve the knowledge of A and S.
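The scale indeterminacy of Eq. (3) can be checked numerically: rescaling source j by a factor $\alpha_j$ while dividing the j-th column of A by the same factor leaves the observations unchanged (the matrices below are arbitrary examples):

```python
import numpy as np

# Numerical illustration of the scale indeterminacy of Eq. (3).
# A, S and alpha are arbitrary illustrative values.
rng = np.random.default_rng(4)
A = rng.normal(size=(3, 2))
S = rng.normal(size=(2, 100))
alpha = np.array([2.5, -0.7])           # arbitrary nonzero factors

X1 = A @ S                              # original observations
X2 = (A / alpha) @ (alpha[:, None] * S) # rescaled A and S
print(np.allclose(X1, X2))              # True
```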

The Karhunen-Loève expansion.

For many decades the KL expansion has been applied to extract the main information from a set of celestial images (see Murtagh & Heck 1987 for references). From the correlation matrix of the observed signals, the eigenvalues are evaluated in decreasing order, and the most significant ones are kept. From the eigenvectors, orthonormal sources are obtained. The KL expansion allows us to whiten the images and is considered the first step of BSS. The resulting demixed images show clearly that non-independence still exists (Fig. 3).
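The KL whitening step described above can be sketched as follows (synthetic data; the keep-the-largest-eigenvalues truncation and unit-variance rescaling follow Eq. (5)):

```python
import numpy as np

# Sketch of the KL expansion as the whitening step of BSS:
# eigendecompose the zero-lag correlation matrix of the observations,
# keep the most significant eigenvalues, and rescale so that the
# resulting sources are orthonormal. Data here are synthetic examples.
rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.5],
              [0.3, 1.0],
              [0.7, 0.2]])
X = A @ S

R0 = X @ X.T / X.shape[1]                      # correlation matrix R(0)
w, V = np.linalg.eigh(R0)                      # eigenvalues, ascending
order = np.argsort(w)[::-1][:2]                # keep the 2 largest
W = np.diag(w[order] ** -0.5) @ V[:, order].T  # whitening matrix
Z = W @ X                                      # whitened sources

# Z now satisfies R_Z(0) = I, cf. Eq. (5)
print(np.allclose(Z @ Z.T / Z.shape[1], np.eye(2)))
```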

The KL expansion is not the only transformation which leads to a diagonal correlation matrix. Any rotation of the resulting sources keeps this property, but the KL expansion is the one which maximizes the energy concentration.

Orthogonality and rotations.

If the image PDFs are Gaussian, uncorrelated data are also independent, and the KL expansion cannot be improved on the basis of the PDFs. In general, however, the distributions are not Gaussian, and a null covariance does not imply that the sources are independent. In the space defined by the sources, any rotation preserves the orthogonality and the norms, but the energy related to each source is spread. We can therefore search for the rotation that optimizes an independence criterion.

The optimal rotation results from n(n-1)/2 elementary rotations of angle $\theta_{ij}$ in the plane defined by sources i and j, where n is the number of sources. This decomposition allows one to design algorithms that optimize an independence criterion.
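A sweep over the n(n-1)/2 elementary rotations can be sketched as below. The kurtosis-based contrast and the grid search over the angle are illustrative choices, not necessarily those used in the paper:

```python
import numpy as np

# Hedged sketch of the elementary-rotation decomposition: for n
# whitened sources, sweep over the n(n-1)/2 planes (i, j) and apply
# the Givens rotation of angle theta that maximizes an independence
# criterion. The contrast below (sum of squared excess kurtoses) is
# one common choice, assumed here for illustration.

def kurtosis_contrast(Z):
    """Sum of squared excess kurtoses of the rows of Z."""
    k = np.mean(Z ** 4, axis=1) - 3.0
    return np.sum(k ** 2)

def rotate_pair(Z, i, j, theta):
    """Elementary rotation of angle theta in the (i, j) plane."""
    c, s = np.cos(theta), np.sin(theta)
    Zi, Zj = Z[i].copy(), Z[j].copy()
    Z[i], Z[j] = c * Zi - s * Zj, s * Zi + c * Zj
    return Z

def sweep(Z, n_angles=180):
    """One sweep over all n(n-1)/2 planes, grid search over theta."""
    n = Z.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            thetas = np.linspace(0.0, np.pi / 2, n_angles)
            scores = [kurtosis_contrast(rotate_pair(Z.copy(), i, j, t))
                      for t in thetas]
            Z = rotate_pair(Z, i, j, thetas[int(np.argmax(scores))])
    return Z

# Demo: two uniform (sub-Gaussian) unit-variance sources, mixed by a
# rotation; the sweep can only increase the contrast (theta = 0 is in
# the grid, so the identity rotation is always available).
rng = np.random.default_rng(2)
S = rng.uniform(-1.0, 1.0, size=(2, 20000)) * np.sqrt(3.0)
theta0 = 0.6
R = np.array([[np.cos(theta0), -np.sin(theta0)],
              [np.sin(theta0),  np.cos(theta0)]])
Z = R @ S
before = kurtosis_contrast(Z)
after = kurtosis_contrast(sweep(Z.copy()))
print(after >= before)
```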

The independence criteria.

Modern ICA work has been inspired by neural networks (Jutten & Hérault 1991). A historical review of this new field can be found in Jutten & Taleb (2000). The ICA concept was defined by Comon (1994). The link between neural networks and entropy was proposed by Bell & Sejnowski (1995), while Amari & Cichocki (1998) introduced an algorithm based on the natural gradient (Amari 1998).

In an environment where no adaptation is needed, one prefers, because of computation time and convergence properties, to use batch optimization algorithms, which act on the whole set of data at once. Second-order algorithms are based on the hypothesis of temporally (or spatially) correlated sources and allow an efficient second-order separation: the cross-correlations between pairs of shifted sources are decreased, while the source autocorrelations are increased. Other batch methods minimize or maximize contrast functions (Comon 1994) based on higher-order cumulants, as in JADE (Cardoso & Souloumiac 1993); they allow signals with non-Gaussian PDFs to be separated. Stochastic gradient methods are implemented in neural networks (Hyvärinen & Oja 1997). These methods are developed below.
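The second-order idea, i.e. exploiting the lagged correlation matrices of Eq. (4) for temporally correlated sources, can be sketched as follows. This is an AMUSE-style illustration under assumed synthetic signals and an arbitrary lag, not the specific algorithm of the paper:

```python
import numpy as np

# Hedged sketch of a second-order separation step for temporally
# correlated sources: whiten the observations (the KL step), then
# take the eigenvectors of the symmetrized lagged correlation matrix
# R(tau) as the rotation that separates the sources. The signals,
# mixing matrix and lag below are illustrative assumptions.
T, tau = 20000, 25
t = np.arange(T)
S = np.vstack([np.sin(0.05 * t),            # sinusoid
               np.sign(np.sin(0.013 * t))]) # square wave
S = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Whitening, cf. Eq. (5)
R0 = X @ X.T / T
w, V = np.linalg.eigh(R0)
W = np.diag(w ** -0.5) @ V.T
Z = W @ X

# Symmetrized lagged correlation R(tau), cf. Eq. (4), and its
# eigen-rotation; the estimated sources are recovered up to
# permutation and sign, as discussed above.
Rtau = Z[:, tau:] @ Z[:, :-tau].T / (T - tau)
Rtau = 0.5 * (Rtau + Rtau.T)
_, U = np.linalg.eigh(Rtau)
Y = U.T @ Z

C = Y @ S.T / T   # correlation of the estimates with the true sources
print(np.round(np.abs(C), 2))
```

Each row of the correlation matrix `C` should contain one entry close to 1 in absolute value, reflecting the permutation and sign indeterminacy.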



Copyright The European Southern Observatory (ESO)