Visualization of Genetic Clustering Algorithm

In this technical note I would like to share a visualization of the clustering process performed by a genetic algorithm, which demonstrates so-called evolution.

There are a lot of articles, presentations and books devoted to genetic algorithms, so I am not going to describe them in detail – just a few words. Roughly speaking, a genetic algorithm searches for a global optimum using evolutionary principles that are close to natural selection. The algorithm repeats several general procedures: crossing, mutation, estimation and formation of a new generation. This cycle is repeated until a stop condition is reached, for example, when the number of iterations is exceeded or when convergence of the fitness function slows down.
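The cycle above can be sketched as a minimal, self-contained genetic loop. This is an illustrative toy (maximizing a simple bit-counting fitness function), not the pyclustering implementation; all parameter values are assumptions:

```python
import random

def evolve(fitness, length=16, chromosomes=20, generations=50, mutation_rate=0.05):
    # Random initial population of bit-string chromosomes.
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(chromosomes)]
    for _ in range(generations):
        # Estimation: score every chromosome by the fitness function.
        scored = sorted(population, key=fitness, reverse=True)
        # Selection: the better half becomes parents.
        parents = scored[:chromosomes // 2]
        # Crossing: single-point crossover between two random parents.
        children = []
        while len(children) < chromosomes:
            a, b = random.sample(parents, 2)
            point = random.randrange(1, length)
            child = a[:point] + b[point:]
            # Mutation: flip each bit with small probability.
            children.append([bit ^ 1 if random.random() < mutation_rate else bit
                             for bit in child])
        # Formation of new generation.
        population = children
    return max(population, key=fitness)

# Toy fitness: number of ones in the chromosome ("one-max" problem).
random.seed(42)
best = evolve(fitness=sum)
print(sum(best))
```

After enough generations the population converges towards chromosomes consisting almost entirely of ones.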

One of the implementations of the genetic algorithm for clustering can be found in the pyclustering library (module pyclustering.cluster.ga) that provides an intuitive API for cluster analysis. Here is a usage example where data ‘SAMPLE_SIMPLE3’ is processed to extract three clusters and then the obtained results are visualized (clustering results and clustering process):

from pyclustering.samples.definitions import SIMPLE_SAMPLES;
from pyclustering.cluster.ga import genetic_algorithm, ga_observer, ga_visualizer;
from pyclustering.utils import read_sample;

# Read data from text file
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3);

# Create observer for visualization
observer_instance = ga_observer(True, True, True);

# Create instance of genetic clustering algorithm
ga_instance = genetic_algorithm(data=sample,
                                count_clusters=3,
                                chromosome_count=100,
                                population_count=200,
                                observer=observer_instance);

# Start processing
ga_instance.process();

# Show allocated clusters and fitness function evolution together
ga_visualizer.show_clusters(sample, observer_instance);

# Show animation of evolution
ga_visualizer.animate_cluster_allocation(sample, observer_instance);

A few additional comments on this example. The observer instance of ga_observer can be omitted – it is used only for visualization in the example, so if you only want to extract clusters, you can forget about the observer. Another important parameter here is population_count, which defines the amount of evolution steps (iterations). The other parameters are common for any genetic algorithm.

As mentioned at the beginning, I have rendered several movies related to the evolution process during clustering and some of them are presented below. I hope you will find them useful for understanding the general principles of the clustering process performed by the genetic algorithm.

The first video demonstrates the clustering process of SAMPLE_SIMPLE2 by the genetic algorithm. The best current result (on the current iteration, which is considered as a local optimum) is shown at the left side. The global optimum is the best result obtained up to the current iteration and it is shown at the right side. The fitness functions (local optimum, global optimum and average) are shown below them. Here the amount of chromosomes is reduced and the amount of mutation is increased to make the fitness function much more “aggressive” and “alive”.

The second video demonstrates clustering of SAMPLE_SIMPLE3 with much more peaceful behavior of the fitness functions.

And the last video shows the clustering process of SAMPLE_SIMPLE5 with almost the same parameters as in the previous one.


hSync: Oscillatory based Clustering Algorithm

This technical note is a continuation of the previous one, ‘Sync: Oscillatory based Clustering Algorithm’. The hierarchical Sync algorithm ‘hSync’ for cluster analysis will be considered as one of the representatives of algorithms based on oscillatory networks that use the Kuramoto model for synchronization.

As mentioned in the previous note, sometimes the output dynamic of the Sync algorithm can be analysed to extract the required amount of clusters. It can be useful when the connectivity radius is greater than the real one, because in this case global synchronization will be observed in the oscillatory network, which means that only one cluster is encoded. But the keyword here is ‘sometimes’: it does not mean that we can set a bigger radius for any data to reach global synchronization and then analyse it. So why not? The output dynamic of the Sync algorithm can be analysed only when the oscillatory network consists of local highly coupled sub-networks that have only a few links to each other, as presented in figure 1.

Figure 1. Two highly coupled sub-networks in case of sample ‘TwoDiamonds’. These sub-networks have a few connections to each other.

Global synchronization will be reached for an oscillatory network with such a structure (because there are no sub-networks that are disconnected from each other) and it will be possible to analyse the output dynamic due to the small number of connections between these sub-networks – figure 2. At the beginning synchronization is reached within the sub-networks (two groups of oscillators that are marked as [1], [2]) and after some time these two ensembles synchronize with each other.

Figure 2. Illustration of global synchronization in case of two highly coupled sub-networks.

But in other cases it is hard or even impossible to extract the required amount of clusters. The idea of processing the network dynamic has been successfully implemented in the hSync algorithm.

How does the hSync algorithm work? hSync uses one mandatory parameter – the amount of clusters that should be allocated – instead of the connectivity radius as in Sync. At the beginning the smallest connectivity radius is chosen, for example, the average Euclidean distance to connect 3 nearest neighbors; the initial radius can also be specified by the user as an optional parameter. At the second step the classic Sync algorithm is run to allocate clusters, and if the amount of allocated clusters is greater than required, the radius is increased and the Sync algorithm is run again. The last step is repeated until the required amount of clusters is obtained. The hSync algorithm can be described by the following pseudo-code:

hSync(data D, uint N) {
  radius = k_nearest(D, 3);
  while(true) {
    # hSync algorithm uses Sync algorithm.
    clusters = Sync(radius);
    if (len(clusters) > N) {
      radius = increase(radius);
    }
    else {
      return clusters;
    }
  }
}
The authors of the algorithm do not specify what rule should be used to increase the radius, and in the pyclustering library (module hsyncnet in pyclustering.cluster) this rule can also be specified by the user as an optional parameter of the algorithm. Let’s process the simplest data-set ‘SampleSimple1’ to demonstrate how it works:

from pyclustering.cluster.hsyncnet import hsyncnet;
from pyclustering.nnet.sync import sync_visualizer;
from pyclustering.samples.definitions import SIMPLE_SAMPLES;
from pyclustering.utils import read_sample, draw_clusters;

# Read dataset that consists of 10 objects.
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE1);

# Two clusters should be extracted.
amount_clusters = 2;

# Create hSync clustering algorithm instance.
network = hsyncnet(sample, amount_clusters, ccore = True);

# Start clustering process.
analyser = network.process(collect_dynamic = True);

# Display output dynamic of the network.
sync_visualizer.show_output_dynamic(analyser);

# Display allocated clusters.
clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);

Figure 3. Output dynamic of the hSync algorithm and corresponding clusters.

At the beginning 10 oscillators have their own phases that are initialized randomly. Each oscillator corresponds to exactly one object in the data ‘SampleSimple1’. After the first Sync run is over (t = 1), 7 groups (ensembles) are formed. The connectivity radius is increased because 7 ensembles encode 7 clusters, which is greater than the required 2 clusters, so the simulation is continued with the new radius using Sync. At the end of the second step (t = 2) we have 4 groups, which is still greater, so we repeat the procedure again: increase the radius and continue the simulation. Finally (t = 3), 2 ensembles are formed that encode 2 clusters, which means that the clustering process is over.

Other hSync clustering examples can be found in pyclustering library:

$ python pyclustering/cluster/examples/


Advantages:

1. If the real number of clusters is known, it is much easier to extract them using hSync than to try to find a proper connectivity radius for the Sync algorithm.

2. hSync can be used to build a kind of dendrogram that illustrates the arrangement of clusters, similar to the dendrogram produced by a classical hierarchical algorithm.

3. Clustering abilities (quality) are the same as for Sync, DBSCAN and OPTICS.


Disadvantages:

1. Processing time is the real problem of this algorithm. Sync, which has O(n^3) complexity in the worst case, is run on each iteration, and the amount of iterations depends on the initial radius and the increase rule. Moreover, on each iteration the connectivity radius should be recalculated, and using k-nearest methods this takes O(n^2), i.e. even in the best case one iteration costs O(n^2). So, lacking any information about the connectivity radius, the processing time increases dramatically.

2. Results are not stable, just as for the Sync algorithm, because oscillator ensembles may occupy the same phase coordinates – there is no desynchronization mechanism between ensembles in Sync and hSync.


Sync: Oscillatory based Clustering Algorithm

The Kuramoto model is able to ensure global synchronization in oscillatory networks. Global synchronization is a state of an oscillatory network where each oscillator has the same phase coordinate; this state is not interesting from a practical point of view, because nothing is encoded by the oscillators (see the previous note ‘Oscillatory networks based on Kuramoto model #1 – Introduction’). Thus, the Kuramoto model should be modified to ensure partial synchronization, when there are two or more groups of oscillators and each oscillator within a group has the same phase coordinate. In other words, each group encodes objects that are similar to each other, and such a set of objects is considered as a cluster.

There are different clustering algorithms that are based on the modified Kuramoto model: Sync, hSync, Sync-SOM and so on. Let’s start from the first one and consider the Sync algorithm in this note.

Algorithm Sync

The Sync clustering algorithm is based on an oscillatory network that uses a modified Kuramoto model. Each oscillator corresponds to exactly one object from the input data, thus the amount of oscillators is equal to the amount of objects. Two oscillators are connected if the Euclidean distance between the two corresponding objects is less than a threshold value. The output dynamic of each oscillator is described by the following differential equation:

\dot{\theta}_{i}= \dfrac{K}{N_{i}} \displaystyle\sum_{j \in N_{i}} sin({\theta}_{j} - {\theta}_{i})

This formula is simpler than the original Kuramoto model: the intrinsic frequency is omitted because it affects the phase coordinate only as a permanent bias. N_{i} is the amount of neighbors of oscillator i, and the sum is taken over those neighbors. K is a constant value that denotes the coupling strength between oscillators. The phase coordinate is the main variable of each oscillator.
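A minimal Euler-integration sketch of this equation in pure Python (not the pyclustering implementation; the step size, coupling strength and number of steps are illustrative assumptions):

```python
import math
import random

def simulate_sync(points, radius, k=1.0, dt=0.1, steps=300, seed=0):
    """Euler integration of the modified Kuramoto model used by Sync."""
    n = len(points)
    # Oscillators are connected when their objects are closer than `radius`.
    neighbors = [[j for j in range(n)
                  if j != i and math.dist(points[i], points[j]) < radius]
                 for i in range(n)]
    rng = random.Random(seed)
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        # d(theta_i)/dt = K / N_i * sum over neighbors of sin(theta_j - theta_i)
        theta = [theta[i] + dt * k / max(len(neighbors[i]), 1)
                 * sum(math.sin(theta[j] - theta[i]) for j in neighbors[i])
                 for i in range(n)]
    return theta

# Two distant pairs of objects: each pair synchronizes to its own common phase.
phases = simulate_sync([(0.0, 0.0), (0.1, 0.0), (9.0, 0.0), (9.1, 0.0)], radius=0.5)
print(phases)
```

After the simulation the phases within each connected pair coincide, while the two pairs keep independent phases – two synchronous ensembles encode two clusters.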

When should the simulation process be stopped? In other words, when can we obtain the results of cluster analysis? An estimation of local synchronization is used for that purpose: when it is close to 1.0, clustering is close to the end and the output dynamic of the oscillatory network can be used for extracting clusters. The estimation is defined by the following formula:

r_{c} = \dfrac{1}{N} \left| \displaystyle\sum_{i = 0}^{N} \dfrac{1}{N_{i}} \displaystyle\sum_{j \in N_{i}} e^{\mathrm{i}({\theta}_{j} - {\theta}_{i})} \right|

The ability to estimate the end of the synchronization process is a distinctive feature of oscillatory networks based on the Kuramoto model. Such an estimation helps to simulate the network only the required number of steps, until the desired state of the system is reached.
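A sketch of this estimation, computed as the magnitude of the mean complex exponential of phase differences over each oscillator's neighborhood (a common form of the local order parameter; the exact normalization here is my assumption):

```python
import cmath

def local_order(theta, neighbors):
    """Estimate local synchronization r_c of an oscillatory network.

    theta     -- list of oscillator phases;
    neighbors -- neighbors[i] is the list of oscillators connected to i."""
    total = 0.0 + 0.0j
    for i, hood in enumerate(neighbors):
        if hood:
            total += sum(cmath.exp(1j * (theta[j] - theta[i])) for j in hood) / len(hood)
    return abs(total) / len(theta)

# Fully synchronized ring of four oscillators: the estimation is 1.0.
ring = [[1, 3], [0, 2], [1, 3], [2, 0]]
print(local_order([0.5] * 4, ring))
```

When all connected oscillators share one phase every exponential equals 1 and the estimation reaches 1.0; scattered phases cancel each other and push it towards 0.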

Implementation of the Sync algorithm can be found in pyclustering library. Using pyclustering let’s perform cluster analysis of sample LSUN:

from pyclustering.cluster.syncnet import syncnet;
from pyclustering.nnet.sync import sync_visualizer;

from pyclustering.samples.definitions import FCPS_SAMPLES;
from pyclustering.utils import read_sample, draw_clusters;

# Read sample 'Lsun' from file.
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN);

# Create oscillatory network based algorithm Sync, using connectivity
# radius 0.45 and ccore calculation to make processing faster.
network = syncnet(sample, 0.45, ccore = True);

# Start simulation process until rc = 0.998 (default value, can be changed).
analyser = network.process();

# Show output dynamic of the network.
sync_visualizer.show_output_dynamic(analyser);

# Allocate clusters and display them.
clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);

Here the connectivity radius is 0.45 – as mentioned, this parameter is similar to the connectivity radius in DBSCAN and OPTICS. Two oscillators are connected if the Euclidean distance between their objects is less than 0.45. The estimation of local synchronization is considered as a stop condition for the simulation – the time point when the clustering process is finished. In our example the local synchronization threshold is 0.998 by default.

Let’s consider output dynamic of the oscillatory network – figure 1.

Figure 1. Output dynamic of the oscillatory network where three clusters are encoded by three synchronous ensembles (groups).

Oscillators that have the same phase coordinate encode a cluster, and therefore it is easy to extract the clusters, using the pyclustering library and the method draw_clusters() from the package pyclustering.utils:

clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);

Or using an instance of the class cluster_visualizer from the package pyclustering.cluster:

from pyclustering.cluster import cluster_visualizer;

clusters = analyser.allocate_clusters();
visualizer = cluster_visualizer();
visualizer.append_clusters(clusters, sample);;

Figure 2. Clustering result of LSUN by the Sync algorithm where the connectivity radius is 0.45.

Points that have the same color belong to one cluster and, as can be observed in figure 2, three clusters are allocated by the Sync algorithm. The class cluster_visualizer and the function draw_clusters can visualize only 1-, 2- and 3-dimensional data, but clusters can always be printed to a file or to the console: the clustering result is represented by a list of clusters, where each cluster is a list of object indexes from the input data.
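For data with more than three dimensions the result can simply be printed, since it is just a list of index lists (the clusters below are a hypothetical example, not an actual result):

```python
# Hypothetical clustering result: three clusters as lists of object indexes.
clusters = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]

for index, cluster in enumerate(clusters):
    print("Cluster %d (%d objects): %s" % (index, len(cluster), cluster))
```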

The animation of the clustering process of the sample ‘Hepta’ by the Sync algorithm is shown in the following movie:


Advantages:

1. The clustering results of Sync are similar to the DBSCAN algorithm when the amount of neighbors is equal to 1. For example, this algorithm is able to process data with elongated clusters of homogeneous density.

2. The output dynamic can be analysed to extract the true number of clusters in some cases, when the connectivity radius is slightly bigger (this idea has been extended in the hSync algorithm that is considered in the next note ‘hSync: Oscillatory based Clustering Algorithm’). An illustration of cluster allocation by output dynamic analysis is shown in figure 3.

Figure 3. Illustration of extracting two clusters (the true number of clusters) in case of global synchronization.

3. The differential equation that is used to calculate the phase coordinate of each oscillator can be simplified, which increases performance (pyclustering uses the simplified equation by default):

{\theta}_{i}[k + 1] = {\theta}_{i}[k] +\dfrac{K}{N_{i}} \displaystyle\sum_{j \in N_{i}} sin({\theta}_{j}[k] - {\theta}_{i}[k])


Disadvantages:

1. Groups of oscillators do not communicate with each other and may occasionally occupy the same phase coordinate; in this case it is not possible to extract the true number of clusters or to use the output dynamic. There is nothing to do about it – the simulation should simply be repeated.

2. Complexity tends to O(n^3) in case of elongated clusters such as LSUN. The execution time for processing each sample from the FCPS collection is much longer compared to traditional algorithms (DBSCAN, OPTICS).

3. It is hard to find a correct connectivity radius and for some samples it is even impossible.


Pyclustering Library

The library (pyclustering) was born in 2014 when I decided to implement an oscillatory network based on the Kuramoto model to study it for my PhD. I have tried to adapt it for solving practical problems and to compare the obtained models with existing algorithms and methods. Due to personal circumstances I work at a big corporation as an R&D engineer in an area absolutely different from my PhD, thus I spend time on the library in the evenings, at the weekends and during vacations. As a result, the pyclustering library has been implemented over two years and it consists of bio-inspired and classical data mining algorithms (cluster analysis, image segmentation, graph coloring algorithms and models of oscillatory networks).

Fig. 1. Example of simulation of LEGION.

When beginning my PhD research I planned to create several oscillatory networks based on various modified Kuramoto models which have the most interesting properties. The implemented oscillatory networks based on the Kuramoto model are able to solve clustering (modules syncnet, syncsom, hsyncnet in pyclustering.cluster), image segmentation (module syncsegm in pyclustering.nnet), graph coloring (module syncgcolor in pyclustering.gcolor) and even pattern recognition (module syncpr in pyclustering.nnet) problems. A little bit later I decided to implement several other oscillatory networks using the corresponding articles in order to compare them, for example:

  • LEGION – local excitatory global inhibitory oscillatory network (module legion in pyclustering.nnet) that has been created for the image segmentation problem (object extraction; an example of simulation where three objects are allocated is presented in figure 1, each ensemble of synchronous oscillators encodes exactly one object);
  • PCNN – pulse-coupled neural network (module pcnn in pyclustering.nnet) that is also implemented for image segmentation (edge extraction; an example of segmentation is presented in figure 2);
  • HHN – Hodgkin-Huxley neural network (module hhn in pyclustering.nnet) for image segmentation (color extraction);
  • SOM – self-organized feature map (module som in pyclustering.nnet; an example of visualization of a data space by distance and density matrices is presented in figure 3) for encoding a data space;
  • Hysteresis oscillatory network (module hysteresis in pyclustering.nnet) whose modification is used for graph coloring (module hysteresis in pyclustering.gcolor);

and so on. Despite the specified purposes of the mentioned networks, they are presented as abstract models, which ensures the ability to simulate them and to study their properties without reference to those purposes.
I have studied their abilities and have tried to find an answer to the question: “Are they really applicable to practical problems such as cluster analysis, image segmentation, pattern recognition and graph coloring?”. I am not going to present the results of the study in this note, but examples and demos of “how they work” and “what they can do” can be found for each oscillatory network and bio-inspired algorithm in the example modules (for example, pyclustering.nnet.examples.* – where the general features of each implemented oscillatory or neural network are demonstrated). The library provides tools for visualizing and analyzing the results of network simulation.

Fig. 2. Example of segmentation by PCNN.

After some time I had an idea not only to create models of oscillatory networks and bio-inspired algorithms but also to compare bio-inspired and classical (traditional) algorithms, because almost every article about a bio-inspired method, algorithm or approach says that the presented solution is the best, in the vast majority of cases shows only advantages, and almost always keeps silent about disadvantages. Sometimes such articles even contain unfair comparisons (from my point of view) with classical algorithms. Thus more attention was paid to the implementation of cluster analysis algorithms (the vast majority of them are well-known): agglomerative, BIRCH, CLARANS, CURE, DBSCAN, hierarchical Sync (bio-inspired), K-Means, K-Medians, K-Medoids, OPTICS, ROCK, SyncNet (bio-inspired), Sync-SOM (bio-inspired), X-Means. Each algorithm has examples and demos in the library (the rule of the library: each algorithm is followed by a set of examples). If you want to compare the mentioned algorithms by yourself, you are welcome to use the library for that and make your own conclusions. An example of visualization of a clustering result is presented in figure 4.

Fig. 3. Example visualization of density (P) and distance (U) matrices allocated by SOM.

Initially the library had been implemented using only Python code, but modeling oscillatory networks requires a lot of computational resources and simulating the networks in pure Python is really slow; therefore the core of the library has been written in C/C++ and is used as a shared library, CCORE. CCORE is an independent part of pyclustering where the same algorithms and models are implemented. CCORE doesn’t use the Python library (Python.h) to communicate with the Python part; instead it exposes a special C interface that is used from Python via ctypes. It was done this way to keep the ability to use the CCORE library in other projects and applications where Python is not required.
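The ctypes pattern described above looks roughly like this (an illustrative sketch that loads the C standard library instead of CCORE; the actual CCORE interface is different):

```python
import ctypes
import ctypes.util

# Load the C standard library as a stand-in for a shared library such as CCORE.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the argument and return types of a plain C function before calling it.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))
```

Because the exported interface is plain C (no Python.h), the same shared library can be loaded by any language with a C foreign-function interface, not only by Python.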

Fig. 4. Example of visualization of clustering results.

The pyclustering library is an open source project; people report bugs, propose new features, contribute and help to develop the library. The library consists of the following general modules:

  • pyclustering.cluster – algorithms for cluster analysis.
  • pyclustering.nnet – models of oscillatory and neural networks.
  • pyclustering.gcolor – algorithms for graph coloring.
  • pyclustering.tsp – algorithms for travelling salesman problem.
  • pyclustering.container – special data structures that are used by other modules.

The library provides a simple and intuitive API. Here is a listing for simulation of a Hodgkin-Huxley oscillatory network (where each oscillator is a Hodgkin-Huxley neuron model) using Python code:

from pyclustering.utils import draw_dynamics;
from pyclustering.nnet.hhn import hhn_network;

# six oscillators in the network
# and external stimulus for them
net = hhn_network(6, [11, 11, 11, 25, 25, 25]);

# simulation of the network during
# 100 time units (consider it like time in
# some units); these 100 time units
# are simulated by 750 steps (consider
# them like iterations).
(t, dyn) = net.simulate(750, 100);

# visualization of the output dynamic
draw_dynamics(t, dyn, x_title = "t", y_title = "V");

Thus, if you need to study, to use, or just to look at oscillatory networks, clustering algorithms and other data mining algorithms, the pyclustering library or some part of it might be a useful instrument for that. I hope you will find it useful.


Comparison of Clustering Algorithms

There are a lot of clustering algorithms. Nevertheless, questions remain: how to decide which algorithm should be used, how fast are they, and what kind of data can they process properly? In this technical note I would like to compare several classical and bio-inspired clustering algorithms (the latter based on oscillatory networks).

For the comparison, the Python implementation of the clustering algorithms is used (library pyclustering, branch 0.6.dev0). The library provides the opportunity to use the CCORE library (its C/C++ part), but currently the amount of clustering algorithms implemented in Python is greater than the amount implemented in C/C++. Nevertheless, functionality and cluster analysis abilities can be compared using only the Python implementation.

The following algorithms are considered in this technical note: Agglomerative, BIRCH, CLARANS, CURE, DBSCAN, HSyncNet, K-Means, K-Medians, K-Medoids (PAM), OPTICS, ROCK, SyncNet, SYNC-SOM, X-Means. Almost all of them are classical algorithms, except the bio-inspired ones: HSyncNet, SyncNet, SYNC-SOM. Let’s divide them into the following categories:

  • Hierarchical algorithms: Agglomerative, BIRCH, CURE, HSyncNet.
  • Density based algorithms: DBSCAN, OPTICS, ROCK, SyncNet, SYNC-SOM.
  • Partitioning based algorithms: K-Means, K-Medians, K-Medoids (PAM), X-Means.

Almost all these algorithms require the input data in RAM, except BIRCH and SYNC-SOM, and thus there is a limitation on using them for processing large databases. BIRCH encodes the input data space (reduces it) by a CF-tree and SYNC-SOM encodes the input data by a SOM layer (self-organized feature map). The FCPS data set collection for clustering testing is used. This collection consists of 2D and 3D data sets of various complexity. Examples of the data and proper clustering results (DBSCAN with correct parameters) are presented in figure 1.

Figure 1. Clustering results of FCPS data by DBSCAN algorithm.

One more example of clustering, where some clusters are allocated incorrectly (K-Medians), is presented in figure 2.

Figure 2. Clustering results of FCPS data by the K-Medians algorithm.

Processing time and correctness of cluster allocation for FCPS (proper allocation is marked by green color, improper allocation by red color) are presented in table 1.

Table 1. Processing time and correctness of clustering algorithms.

Some words about the bio-inspired algorithms: SyncNet, HSyncNet and SYNC-SOM are able to allocate clusters, but the allocation depends on the output of these oscillatory networks. The approach is pretty simple: each oscillator of the first layer (SyncNet and HSyncNet have only one layer) corresponds to exactly one object from the input data space (SYNC-SOM’s neurons of the first layer compete with each other to capture objects). After simulation (processing of the input data space) the oscillatory network has one or several ensembles of synchronous oscillators, where each ensemble corresponds to exactly one cluster. But ensembles might have the same phases, because the considered algorithms have no desynchronization mechanism – ensembles just do not interact with each other. It means that sometimes the clustering result might be incorrect and the simulation should be repeated. This problem can be mitigated by using a high level of local synchronization, but it leads to increased simulation time.
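The final extraction step described above can be sketched as grouping oscillators whose final phases are close (a toy illustration, assuming the simulation has already converged; the tolerance value is an assumption):

```python
def allocate_by_phase(phases, tolerance=0.1):
    """Group oscillator indexes whose phases differ by less than tolerance."""
    ensembles = []
    for index, phase in enumerate(phases):
        # Attach the oscillator to the first ensemble with a close phase.
        for ensemble in ensembles:
            if abs(phases[ensemble[0]] - phase) < tolerance:
                ensemble.append(index)
                break
        else:
            # No close ensemble found: the oscillator starts a new one.
            ensembles.append([index])
    return ensembles

# Two synchronous ensembles: oscillators 0, 2 and oscillators 1, 3.
print(allocate_by_phase([0.50, 3.10, 0.52, 3.14]))
```

This also makes the failure mode visible: if two ensembles happen to converge to the same phase, they are indistinguishable here and would be merged into one cluster.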