(decorative)
(decorative)
 
RESEARCH
[ CNB ]

[ CSIC ]
(decorative)

Pattern recognition and machine learning

We have been working for some time in the development of new types of self-organizing neural networks based on well-defined cost functions.

Motivation:

The well-known self-organizing map (SOM) algorithm (Kohonen, 1997) is extremely simple in its description and practical implementation. It has been demonstrated to be very successful in producing an orderly mapping of high dimensional data items onto a regular low dimensional grid. Specifically, its main property is that it conserves quite consistently the original topological and metric relationships of the items. However, despite the long history and widespread use of the method (Kaski, et. al.; 1998), its theoretical properties are still not fully understood.
The main theoretical approach towards an understanding of the SOM algorithm in general has been based on efforts to solve the following inverse problem: “find the functional (or equivalently, the cost function) whose numerical optimization corresponds to the SOM algorithm”.

Due to these difficulties, some research groups have developed other procedures different from SOM that attempt to conserve the topological structure of the data. The methodology used for the development of the algorithms is based on the optimization of well-defined cost functions. This approach allows a complete mathematical characterization of the mapping.

In this framework, we have developed a new self-organizing map algorithm (Pascual-Marqui, R., Pascual-Montano, A., Kochi, K., Carazo, J.M, Smoothly Distributed Fuzzy c-Means: a New Self-Organizing Map, Pattern Recognition (2001). 34:2395-2402). Unlike the well-known method of Kohonen, the new algorithm corresponds to the optimization of an unambiguously defined cost function. It consists of a modified version of the widely used fuzzy c-means functional, where the code vectors are distributed on a regular low dimensional grid, and a penalization term is added in order to guarantee a smooth distribution for the values of the code vectors on the grid. The mapping properties of the new method are similar to those of Kohonen's algorithm. Unlike the method of Kohonen, the advantage of our new method is that there is complete control over the mapping process.

triangle1.jpg (48594 bytes)ani01.gif (17257 bytes)triangle2.jpg (39677 bytes)

Projection of a 2D data set sampled from a triangle. The items are mapped onto a linear 1D grid forming the famous Peano-like curve.

cactus1.jpg (76745 bytes)Ani2.gif (25019 bytes)cactus2.jpg (50398 bytes)

Projection of a 3D data set sampled from three line segments. The items are mapped onto a squared grid.

A software package for Windows that implements this algorithm is freely available on http://www.unizh.ch/keyinst/SMap1/SMap.htm
In addition, the source code and binaries for Linux/system are included in the Xmipp software package that is also freely available.

A second form of self-organizing map has also being developed, based on a "smoothly distributed kernel density estimator". This new method is based on a cost function and its optimization algorithm explicitly designed to get a set of code vectors whose probability density resembles as best as possible the probability density of the input data. Although this algorithm is not quite as simple as SOM, it has the advantage of offering a better understanding and control of the mapping process. For details see (Pascual-Montano, et. al.; 2001).

Preliminary results show that this method is superior to the classical method of Kohonen in two main aspects: better mapping, and a better estimate of the probability density. The importance of these results is that the new mapping techniques will allow improved pattern recognition in diverse applications (cluster analysis, discriminant analysis, and others).

References

  1. T. Kohonen, Self-Organizing maps, Second Edition, Springer-Verlag (1997)

  2. S. Kaski, J. Kangas, T. Kohonen, Bibliography of Self-Organizing Map (SOM) Papers: 1981--1997, Neural Computing Surveys 1 (1998) 102-350. (Available in electronic form at http://www.icsi.berkeley.edu/~jagota/NCS/vol1.html)

  3. Pascual-Marqui, R.D., Pascual-Montano, A., Kochi, K., Carazo, J.M., (2001) "Smoothly Distributed Fuzzy c-Means: a New Self-Organizing Map". Pattern Recognition, 34:2395-2402.
  4. Pascual-Montano, A., Donate, L.E., Valle, M., Bárcena, M., Pascual-Marqui, R.D, Carazo, J.M. (2001). "A Novel Neural Network Technique for Analysis and Classification of EM Single Particle Images". J. Struct. Biol. 133(2-3):233-45