Pyclustering Library

The library (pyclustering) was born in 2014, when I decided to implement an oscillatory network based on the Kuramoto model to study it in my PhD. I tried to adapt it for solving practical problems and to compare the resulting models with existing algorithms and methods. Due to personal circumstances I work at a big corporation as an R&D engineer in an area completely different from my PhD, so I spend my time on the library in the evenings, at weekends and during vacations. As a result, the pyclustering library was implemented over two years, and it consists of bio-inspired and classical data mining algorithms (cluster analysis, image segmentation, graph coloring algorithms and models of oscillatory networks).

Fig. 1. Example of simulation of LEGION.

When I started my PhD research, I planned to create several oscillatory networks based on various modified Kuramoto models with the most interesting properties. The implemented oscillatory networks based on the Kuramoto model are able to solve clustering (modules syncnet, syncsom and hsyncnet in pyclustering.cluster), image segmentation (module syncsegm in pyclustering.nnet), graph coloring (module syncgcolor in pyclustering.gcolor) and even pattern recognition (module syncpr in pyclustering.nnet) problems. A little later I decided to implement several other oscillatory networks described in the corresponding articles in order to compare them, for example:

  • LEGION – local excitatory global inhibitory oscillatory network (module legion in pyclustering.nnet), created for the image segmentation problem (object extraction). An example of a simulation where three objects are allocated is presented in figure 1; each ensemble of synchronous oscillators encodes exactly one object.
  • PCNN – pulse-coupled neural network (module pcnn in pyclustering.nnet), also implemented for image segmentation (edge extraction). An example of segmentation is presented in figure 2.
  • HHN – Hodgkin-Huxley neural network (module hhn in pyclustering.nnet) for image segmentation (color extraction).
  • SOM – self-organizing feature map (module som in pyclustering.nnet) for encoding the data space. An example of visualizing the data space with distance and density matrices is presented in figure 3.
  • Hysteresis oscillatory network (module hysteresis in pyclustering.nnet), a modification of which is used for graph coloring (module hysteresis in pyclustering.gcolor).

Despite the stated purposes of these networks, they are implemented as abstract models, so they can be simulated and their properties studied without reference to those purposes.
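
The networks above build on the classic Kuramoto model, in which the phase of each oscillator follows dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i). As a rough illustration of that dynamics only (not pyclustering's implementation, whose networks use modified versions of the model), the sketch below integrates the classic equation with forward Euler in pure Python and measures synchrony with the standard order parameter r = |(1/N) Σ_j e^{iθ_j}|. The function names, step size and coupling values are illustrative assumptions.

```python
import math
import random


def simulate_kuramoto(phases, omegas, coupling, dt=0.01, steps=5000):
    """Integrate the classic Kuramoto model with forward Euler:
    dtheta_i/dt = omega_i + (K / N) * sum_j sin(theta_j - theta_i)."""
    n = len(phases)
    theta = list(phases)
    for _ in range(steps):
        # evaluate every derivative on the current state before updating any phase
        derivatives = [
            omegas[i] + coupling / n * sum(math.sin(theta[j] - theta[i]) for j in range(n))
            for i in range(n)
        ]
        # Euler step, keeping phases in [0, 2*pi)
        theta = [(theta[i] + dt * derivatives[i]) % (2.0 * math.pi) for i in range(n)]
    return theta


def order_parameter(theta):
    """Kuramoto order parameter r in [0, 1]; r close to 1 means the phases are locked."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)


if __name__ == "__main__":
    rng = random.Random(1)
    initial = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(10)]
    natural = [1.0] * 10  # identical natural frequencies (an assumption for the demo)
    final = simulate_kuramoto(initial, natural, coupling=2.0)
    print(f"r before: {order_parameter(initial):.3f}, r after: {order_parameter(final):.3f}")
```

With identical natural frequencies and positive coupling, r should approach 1 as the phases lock; with zero coupling, all phases rotate together and r stays where it started.
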
I have studied their abilities and tried to answer the question: “Are they really applicable to practical problems such as cluster analysis, image segmentation, pattern recognition and graph coloring?” I'm not going to present the results of that study in this note, but examples and demos of how each oscillatory network and bio-inspired algorithm works and what it can do are provided in the example modules (for example, pyclustering.nnet.examples.*, where the general features of each implemented oscillatory or neural network are demonstrated). The library also provides tools for visualizing and analyzing the results of network simulation.
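
A typical analysis step after simulating a synchronization-based network is to read the synchronous ensembles off the final phases, grouping oscillators whose phases nearly coincide. The helper below is a hypothetical sketch of that idea (the names allocate_ensembles and phase_distance and the greedy grouping rule are assumptions, not the library's analysis API); it compares phases on the circle so that values near 0 and near 2π fall into the same group.

```python
import math


def phase_distance(a, b):
    """Smallest angular distance between two phases on the circle, in [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def allocate_ensembles(phases, tolerance=0.1):
    """Hypothetical helper: greedily group oscillator indices whose phase is within
    `tolerance` radians of the first member of an existing ensemble."""
    ensembles = []
    for index, phase in enumerate(phases):
        for ensemble in ensembles:
            if phase_distance(phases[ensemble[0]], phase) <= tolerance:
                ensemble.append(index)
                break
        else:
            # no existing ensemble is close enough: start a new one
            ensembles.append([index])
    return ensembles
```

For example, allocate_ensembles([0.0, 0.05, 3.1, 6.25, 3.15], 0.1) yields [[0, 1, 3], [2, 4]]: the phase 6.25 is only about 0.03 rad away from 0.0 on the circle.
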

Fig. 2. Example of segmentation by PCNN.

After some time I had an idea not only to create models of oscillatory networks and bio-inspired algorithms, but also to compare bio-inspired and classical (traditional) algorithms. Almost every article about a bio-inspired method, algorithm or approach claims that the presented solution is the best and in the vast majority of cases shows only its advantages, while almost always keeping silent about its disadvantages. Sometimes such articles also contain an unfair (from my point of view) comparison with classical algorithms. Thus more attention was paid to the implementation of cluster analysis algorithms (the vast majority of them are well known): agglomerative, BIRCH, CLARANS, CURE, DBSCAN, hierarchical Sync (bio-inspired), K-Means, K-Medians, K-Medoids, OPTICS, ROCK, SyncNet (bio-inspired), Sync-SOM (bio-inspired) and X-Means. Each algorithm has examples and demos in the library (the rule of the library: each algorithm is accompanied by a set of examples). If you want to compare these algorithms yourself, you are welcome to use the library for that and draw your own conclusions. An example of visualization of a clustering result is presented in figure 4.
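
For readers less familiar with the classical side of the comparison, here is a minimal sketch of the textbook K-Means procedure (Lloyd's algorithm): assign each point to its nearest center, move each center to the mean of its points, and repeat until the centers stop moving. It is a plain-Python illustration with user-supplied initial centers, not the library's kmeans module; the function name and signature are assumptions.

```python
import math


def kmeans(points, initial_centers, max_iterations=100):
    """Lloyd's K-Means: alternate the assignment and update steps until the centers are stable.
    Returns (clusters as lists of point indices, final centers)."""
    centers = [tuple(c) for c in initial_centers]
    clusters = []
    for _ in range(max_iterations):
        # assignment step: each point goes to its nearest center (Euclidean distance)
        clusters = [[] for _ in centers]
        for index, point in enumerate(points):
            nearest = min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))
            clusters[nearest].append(index)
        # update step: move each center to the mean of its assigned points
        new_centers = []
        for k, members in enumerate(clusters):
            if not members:  # keep an empty cluster's center where it was
                new_centers.append(centers[k])
                continue
            dim = len(points[members[0]])
            new_centers.append(
                tuple(sum(points[i][d] for i in members) / len(members) for d in range(dim))
            )
        if new_centers == centers:  # converged: assignments will no longer change
            break
        centers = new_centers
    return clusters, centers
```

On the points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10) with initial centers (0, 0) and (10, 10), it converges after two iterations to the clusters [[0, 1, 2], [3, 4, 5]]. The result depends on the initial centers, which is one of the properties worth keeping in mind when comparing it with other algorithms.
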

Fig. 3. Example of visualization of the density (P) and distance (U) matrices produced by SOM.

Initially the library was implemented in pure Python, but modeling oscillatory networks requires a lot of computational resources and simulating them in Python is really slow. Therefore the core of the library was written in C/C++ and is used as a shared library, CCORE. CCORE is an independent part of pyclustering where the same algorithms and models are implemented. CCORE doesn't use the Python C API (Python.h) to communicate with the Python part; instead it exposes a special C interface that Python calls via ctypes. This was done to keep the ability to use the CCORE library in other projects and applications where Python is not required.
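
With ctypes, data crosses the boundary as plain C types rather than Python objects. The sketch below shows the general pattern with a hypothetical C struct holding a size and a pointer to doubles (DoubleArray is an illustrative layout, not CCORE's actual interface): Python packs a list into a C buffer, and a result with the same layout is copied back into a Python list. Loading a real shared library with ctypes.CDLL and declaring argtypes and restype for its functions is deliberately left out.

```python
import ctypes


class DoubleArray(ctypes.Structure):
    """Hypothetical C-compatible layout: struct { size_t size; double *data; }."""
    _fields_ = [
        ("size", ctypes.c_size_t),
        ("data", ctypes.POINTER(ctypes.c_double)),
    ]


def to_c_array(values):
    """Pack a list of floats into a C double buffer and a DoubleArray that points at it."""
    buffer = (ctypes.c_double * len(values))(*values)
    package = DoubleArray(len(values), ctypes.cast(buffer, ctypes.POINTER(ctypes.c_double)))
    # the caller must keep `buffer` alive for as long as `package` is in use,
    # because the struct only stores a raw pointer into it
    return package, buffer


def from_c_array(package):
    """Copy the doubles referenced by a DoubleArray back into a Python list."""
    return [package.data[i] for i in range(package.size)]


if __name__ == "__main__":
    package, _buffer = to_c_array([1.5, 2.0, -3.25])
    # a real binding would pass ctypes.byref(package) to a C function here
    print(from_c_array(package))
```

Keeping the interface at this level (sizes, pointers, plain structs) is what lets the same C library be called from Python or linked directly into a C/C++ application.
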

Fig. 4. Example of visualization of clustering results.

The pyclustering library is an open source project: people report bugs, propose new features, contribute and help to develop the library. The library consists of the following general modules:

  • pyclustering.cluster – algorithms for cluster analysis.
  • pyclustering.nnet – models of oscillatory and neural networks.
  • pyclustering.gcolor – algorithms for graph coloring.
  • pyclustering.tsp – algorithms for the travelling salesman problem.
  • pyclustering.container – special data structures that are used by other modules.

The library provides a simple and intuitive API. Here is a listing that simulates a Hodgkin-Huxley oscillatory network (where each oscillator is a Hodgkin-Huxley neuron model) in Python:

from pyclustering.utils import draw_dynamics
from pyclustering.nnet.hhn import hhn_network

# six oscillators in the network
# and an external stimulus for each of them
net = hhn_network(6, [11, 11, 11, 25, 25, 25])

# simulation of the network during 100 time units
# (consider it like time in some units); these
# 100 time units are simulated in 750 steps
# (consider them like iterations).
# Recent pyclustering releases return the time points,
# the dynamics of peripheral neurons and the dynamics
# of central elements; older releases returned only (t, dyn).
(t, dyn_peripheral, dyn_central) = net.simulate(750, 100)

# visualization of the output dynamics of the peripheral neurons
draw_dynamics(t, dyn_peripheral, x_title="t", y_title="V")

So if you need to study, use or simply explore oscillatory networks, clustering algorithms or other data mining algorithms, the pyclustering library, or some part of it, might serve as a handy instrument for that. I hope you will find it useful.

Andrei