Up: Spectral analysis of stellar networks


3 PCA neural nets  

3.1 Introduction

Principal Component Analysis (PCA) is a widely used technique in data analysis. Mathematically, it is defined as follows: let ${\bf C}=E(\vec{x}\vec{x}^{\rm T})$ be the covariance matrix of L-dimensional zero-mean input data vectors $\vec{x}$. The i-th principal component of $\vec{x}$ is defined as $\vec{x}^{\rm T}\vec{c}(i)$, where $\vec{c}(i)$ is the normalized eigenvector of ${\bf C}$ corresponding to the i-th largest eigenvalue $\lambda(i)$. The subspace spanned by the principal eigenvectors $\vec{c}(1),~\ldots,\vec{c}(M)$, with M<L, is called the PCA subspace of dimensionality M (Oja et al. 1991; Oja et al. 1996). PCA can be realized neurally in various ways (Baldi & Hornik 1989; Jutten & Herault 1991; Oja 1982; Oja et al. 1991; Plumbley 1993; Sanger 1989). The PCA neural network used here is a one-layer feedforward network able to extract the principal components of the stream of input vectors. Typically, Hebbian-type learning rules are used, based on the one-unit learning algorithm originally proposed by Oja (1982). Many versions and extensions of this basic algorithm have been proposed in recent years; see Karhunen & Joutsensalo (1994, 1995), Oja et al. (1996) and Sanger (1989).
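As an illustration of the definition above (not part of the original text), the PCA subspace can be computed directly by eigendecomposition of the sample covariance matrix; the function names below are ours:

```python
import numpy as np

def pca_subspace(X, M):
    """Return the M largest eigenvalues lambda(i) and the corresponding
    normalized eigenvectors c(i) of the covariance matrix C = E[x x^T],
    estimated from zero-mean data X of shape (N, L)."""
    C = X.T @ X / X.shape[0]               # sample covariance matrix (L x L)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder: largest eigenvalue first
    return eigvals[order[:M]], eigvecs[:, order[:M]]

def principal_components(x, eigvecs):
    """The i-th principal component of x is the projection x^T c(i)."""
    return x @ eigvecs
```

The neural algorithms discussed next estimate the same subspace adaptively, from one input vector at a time, instead of forming ${\bf C}$ explicitly.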

3.2 Linear, robust, nonlinear PCA neural nets

The structure of the PCA neural network can be summarised as follows (Karhunen & Joutsensalo 1994; Karhunen & Joutsensalo 1995; Oja et al. 1996; Sanger 1989): there is one input layer and one forward layer of neurons fully connected to the inputs; during the learning phase there are feedback links among the neurons, whose arrangement classifies the network structure as either hierarchical or symmetric. After the learning phase the network becomes purely feedforward. The hierarchical case leads to the well-known GHA algorithm (Karhunen & Joutsensalo 1995; Sanger 1989); the symmetric case gives Oja's subspace network (Oja 1982).

PCA neural algorithms can be derived from optimisation problems, such as variance maximisation and representation error minimisation (Karhunen & Joutsensalo 1994; Karhunen & Joutsensalo 1995), thus obtaining nonlinear algorithms (and the corresponding neural networks). These networks have the same architecture as the linear ones: either hierarchical or symmetric. The learning algorithms can be further classified into robust PCA algorithms and nonlinear PCA algorithms. We call a PCA algorithm robust when the objective function grows less than quadratically (Karhunen & Joutsensalo 1994; Karhunen & Joutsensalo 1995); in that case the nonlinear learning function appears at selected places only. In nonlinear PCA algorithms, all the outputs of the neurons are nonlinear functions of the responses.

3.2.1 Robust PCA algorithms

In the robust generalization of variance maximisation, the objective function f(t) is assumed to be a valid cost function (Karhunen & Joutsensalo 1994; Karhunen & Joutsensalo 1995), such as $\ln\cosh(t)$ or |t|. This leads to the algorithm:
\begin{eqnarray}
\vec{w}_{k+1}(i) &=& \vec{w}_{k}(i)+\mu_{k}\,g(y_{k}(i))\,\vec{e}_{k}(i) \nonumber\\
\vec{e}_{k}(i) &=& \vec{x}_{k}-\sum_{j=1}^{I(i)}y_{k}(j)\,\vec{w}_{k}(j)
\end{eqnarray} (4)
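A minimal NumPy sketch of one update of Eq. (4) may help fix the notation; the function name, the `hierarchical` flag and the default choice $g=\tanh$ (the derivative of $\ln\cosh$) are our illustrative assumptions:

```python
import numpy as np

def robust_pca_step(W, x, mu, g=np.tanh, hierarchical=True):
    """One step of Eq. (4).  W is L x M, column i holding w(i);
    x is one zero-mean input vector; g is the robust learning function
    (derivative of the cost f), e.g. tanh for f(t) = ln cosh(t).
    hierarchical=True gives I(i) = i (GHA-like); False gives I(i) = M."""
    L, M = W.shape
    y = W.T @ x                                 # neuron responses y_k(i)
    W_new = W.copy()
    for i in range(M):
        I = i + 1 if hierarchical else M        # upper summation limit I(i)
        e = x - W[:, :I] @ y[:I]                # error vector e_k(i)
        W_new[:, i] = W[:, i] + mu * g(y[i]) * e
    return W_new
```

With the identity function in place of g, the hierarchical case reduces to the linear GHA update.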
In the hierarchical case we have I(i)=i. In the symmetric case I(i)=M, the error vector $\vec{e}_k(i)$ becomes the same $\vec{e}_k$ for all the neurons, and Eq. (4) can be compactly written as:
\begin{eqnarray}
{\bf W}_{k+1}={\bf W}_k+\mu_k\,\vec{e}_k\,g(\vec{y}_k^{\rm T})
\end{eqnarray} (5)

where $\vec{y}_k={\bf W}^{\rm T}_k\vec{x}_k$ is the instantaneous vector of neuron responses. The learning function g, the derivative of f, is applied separately to each component of its argument vector.

The robust generalisation of the representation error problem (Karhunen & Joutsensalo 1994; Karhunen & Joutsensalo 1995), with $f(t)\le t^2$, leads to the stochastic gradient algorithm:
\begin{eqnarray}
\vec{w}_{k+1}(i) & = & \vec{w}_k(i)+\mu_k\,(\vec{w}_k(i)^{\rm T}g(\vec{e}_k(i))\,\vec{x}_k \nonumber\\
& + & \vec{x}_k^{\rm T}\vec{w}_k(i)\,g(\vec{e}_k(i)))
\end{eqnarray} (6)

This algorithm can again be considered in both the hierarchical and symmetric cases. In the symmetric case I(i)=M, and the error vector is the same $(\vec{e}_k)$ for all the weights $\vec{w}_k(i)$. In the hierarchical case I(i)=i, and Eq. (6) gives the robust counterparts of the principal eigenvectors $\vec{c}(i)$.
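One update of Eq. (6) can be sketched as follows (again an illustration of ours, with the same conventions and assumed $g=\tanh$ as before):

```python
import numpy as np

def robust_repr_error_step(W, x, mu, g=np.tanh, hierarchical=True):
    """One stochastic-gradient step of Eq. (6).  W is L x M,
    x is one zero-mean input vector, g the robust learning function."""
    L, M = W.shape
    y = W.T @ x                                 # responses y_k(i) = w(i)^T x
    W_new = W.copy()
    for i in range(M):
        I = i + 1 if hierarchical else M        # I(i) = i or M
        e = x - W[:, :I] @ y[:I]                # error vector e_k(i)
        ge = g(e)
        W_new[:, i] = (W[:, i]
                       + mu * ((W[:, i] @ ge) * x     # first update term
                               + (x @ W[:, i]) * ge)) # second update term
    return W_new
```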

3.2.2 Approximated algorithms

The first update term $\vec{w}_k(i)^{\rm T}g(\vec{e}_k(i))\vec{x}_k$ in Eq. (6) is proportional to the same vector $\vec{x}_k$ for all weights $\vec{w}_k(i)$. Furthermore, we can assume that the error vector $\vec{e}_k(i)$ should be relatively small after the initial convergence. Hence, we can neglect the first term in Eq. (6), which leads to:
\begin{displaymath}
\vec{w}_{k+1}(i)=\vec{w}_k(i)+\mu_k\,y_k(i)\,g(\vec{e}_k(i))
\end{displaymath} (7)
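The simplified update, keeping only the second term of Eq. (6) and using $y_k(i)=\vec{x}_k^{\rm T}\vec{w}_k(i)$, can be sketched as (illustrative code of ours, same conventions as above):

```python
import numpy as np

def approx_step(W, x, mu, g=np.tanh, hierarchical=True):
    """One step of the approximated rule, Eq. (7):
    w(i) <- w(i) + mu * y(i) * g(e(i))."""
    L, M = W.shape
    y = W.T @ x                                 # y_k(i) = x^T w(i)
    W_new = W.copy()
    for i in range(M):
        I = i + 1 if hierarchical else M        # I(i) = i or M
        e = x - W[:, :I] @ y[:I]                # error vector e_k(i)
        W_new[:, i] = W[:, i] + mu * y[i] * g(e)
    return W_new
```

With the identity in place of g, the hierarchical version of this rule coincides with Sanger's GHA.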

3.2.3 Nonlinear PCA algorithms

Let us now consider the nonlinear extensions of PCA algorithms. We can obtain them in a heuristic way by requiring all neuron outputs in Eq. (4) to be always nonlinear (Karhunen & Joutsensalo 1994; Karhunen & Joutsensalo 1995). This leads to:

\begin{eqnarray}
\vec{w}_{k+1}(i) & = & \vec{w}_{k}(i)+\mu_k\,g(y_{k}(i))\,\vec{b}_{k}(i) \nonumber\\
\vec{b}_{k}(i) & = & \vec{x}_{k}-\sum_{j=1}^{I(i)}g(y_{k}(j))\,\vec{w}_{k}(j)\quad \forall i=1,~\ldots,p
\end{eqnarray} (8)
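A sketch of one update of Eq. (8), where every response passes through the nonlinearity g (again illustrative code of ours, with $g=\tanh$ assumed as default):

```python
import numpy as np

def nonlinear_pca_step(W, x, mu, g=np.tanh, hierarchical=True):
    """One step of the nonlinear PCA rule, Eq. (8).  Unlike the robust
    rules, here the feedback sum also uses the nonlinear responses g(y(j))."""
    L, M = W.shape
    y = W.T @ x
    gy = g(y)                                   # nonlinear responses g(y_k(j))
    W_new = W.copy()
    for i in range(M):
        I = i + 1 if hierarchical else M        # I(i) = i or M
        b = x - W[:, :I] @ gy[:I]               # vector b_k(i) of Eq. (8)
        W_new[:, i] = W[:, i] + mu * gy[i] * b
    return W_new
```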


Copyright The European Southern Observatory (ESO)