hSync: Oscillatory based Clustering Algorithm

This technical note is a continuation of the previous ‘Sync: Oscillatory based Clustering Algorithm‘. Hierarchical Sync algorithm ‘hSync’ for cluster analysis will be considered as a one of the representatives of algorithms based on oscillatory networks that use Kuramoto model for synchronization.

As it was mentioned in the previous note, sometimes output dynamic of the Sync algorithm can be analysed to extract required amount of clusters. It can be useful when connectivity radius is greater than real because in this case global synchronization will be observed in the oscillatory network and it means that only one cluster is encoded. But keyword here is ‘sometimes’, it does not mean that we can set bigger radius for every data to reach global synchronization and then analyse it, so the question is why? Output dynamic of Sync algorithm can be analysed only when oscillatory network consists of local highly coupled sub-networks that have a few links to each other as it presented on figure 1.

Sync_Two_highly_coupled_networks
Figure 1. Two highly coupled sub-networks in case of sample ‘TwoDiamonds’. These sub-networks have a few connections to each other.

Global synchronization will be reached for the oscillatory network with such structure (because there no any sub-networks that are not connected to each other) and it will be possible to analyse output dynamic due to small number of connections between these sub-networks – figure 2. At the beggining synchronization will be reached in sub-networks (two groups of oscillators that are marked as [1], [2]) and after time these two ensembles will be synchronized with each other.

Sync_TwoDiamonds_Global_Synchronization
Figure 2. Illustration of global synchronization in case of two highly coupled sub-networks.

But in other cases it is hard or even impossible to extract required amount of clusters. An idea to process network dynamic has been successfully implemented in hSync algorithm.

How does hSync algorithm work? hSync algorithm uses one mandatory parameter – amount of clusters that should be allocated instead of connectivity radius as in Sync. At the beginning the smallest connectivity radius is chosen, for example, average Euclidean distance to connect 3-nearest neighbors, or initial radius can be specified as an optional parameter by user. At the second step classic Sync algorithm is run to allocate clusters and if amount of allocated clusters is greater than required then radius is increased and Sync algorithm is rerun again. The last step is repeated until required amount of clusters is obtained. hSync algorithm can be described by following pseudo-code:

hSync(data D, uint N) {
  radius = k_nearest(D, 3);
  while(true) {
    # hSync algorithm uses Sync algorithm.
    clusters = Sync(radius);
    if (len(clusters) > N) {
      radius = increase(radius);
    }
    else {
      return clusters;
    }
  }
}

Authors of the algorithm do not specify what is rule should be used for radius increasing and this parameter can be also specified by user as an optional parameter of the algorithm in case pyclustering library (module hsyncnet in pyclustering.cluster). Let’s process the simplest data-set ‘SampleSimple01’ to demonstrate how does it work:

# Read dataset that consists 10 objects.
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE1);

# Two clusters should be extracted.
amount_clusters = 2;

# Create hSync clustering algorithm instance.
network = hsyncnet(sample, amount_clusters, ccore = True);

# Start clustering process.
analyser = network.process(collect_dynamic = True);

# Display output dynamic of the network.
sync_visualizer.show_output_dynamic(analyser);

# Display allocated clusters.
clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);
hSync_SampleSimple01
Figure 3. Output dynamic of the hSync algorithm and corresponding clusters.

At the beginning 10 oscillators has own phases that are initialized randomly. Each oscillator corresponds to only one object in the data ‘SampleSimple1’. After the first step of when Sync processing is over (t = 1), 7 groups (ensembles) are formed. Connectivity radius is increased because 7 ensembles encodes 7 clusters and it is greater than required 2 clusters, so simulation with new radius is continued using Sync. At the end of the second step (t = 2) we have 4 groups and it is still greater, and we repeat the last procedure again: increase radius and continue simulation. Finally (t = 3), 2 ensembles are formed that encode 2 clusters and it means that clustering process is over.

Other hSync clustering examples can be found in pyclustering library:

$ python pyclustering/cluster/examples/hsync_examples.py

Advantages

1. If real number of clusters is known then it is much easy to extract them using hSync than try to find out proper connectivity radius for Sync algorithm.

2. hSync can be used to build some kind of dendrogram that is used to illustrate the arrangements of clusters that is similar to dendrogram that is produced by classical hierarchical algorithm.

3. Clustering abilities (quality) are the same as for Sync, DBSCAN, OPTICS.

Disadvantages

1. Time processing is the real problem for this algorithm. It is used Sync on each iteration that have O(n^3) complexity in worse case and amount of iterations depends on initial radius and increasing rule. Moreover on each iteration connectivity radius should be calculated and using k-nearest methods it takes O(n^2), i.e. even in sunny case one iteration cost is O(n^2). So, in case of lack of any information about connectivity radius time processing is dramatically increased.

2. Results are not stable as well as for Sync algorithm, because oscillator ensembles may occupy the same phase coordinates because there is no any desynchronization mechanism between ensembles in Sync and hSync.

Andrei

Sync: Oscillatory based Clustering Algorithm

Kuramoto model is able to ensure global synchronization in oscillatory networks. Global synchronization is a state of oscillatory network where each oscillator has the same phase coordinate and this state is not interested from practical point of view, because nothing is encoded by oscillators (see previous note ‘Oscillatory networks based on Kuramoto model #1 – Introduction‘). Thus, Kuramoto model should be modified to ensure partial synchronization when there are two or more groups of oscillators where each oscillator within group has the same phase coordinate. In other words each group encodes objects that are similar to each other that and such set of objects is considered as a cluster.

There are different clustering algorithms that are based on modified Kuramoto model: Sync, hSync, Sync-SOM and so on. Let’s start from the first one and consider Sync algorithm in this note.

Algorithm Sync

Sync clustering algorithm is based on oscillatory network that uses modified Kuramoto model. Each oscillator corresponds to only one object from input data and thus amount of oscillators is equal to amount of objects. Two oscillators are connected if Euclidean distance between two corresponding objects is less than threshold value. Output dynamic of each oscillator is described by following differential equation:

\dot{\theta}_{i}= \dfrac{K}{N_{i}} \displaystyle\sum_{j \in N_{i}} sin({\theta}_{j} - {\theta}_{i})

This formula is simpler than original Kuramoto model: intrinsic frequency is omitted because it affects phase coordinate as a permanent bias. {N_{i}} is a amount of neighbors of oscillator i. And sum is for neighbors of oscillator i as well. K is a constant value that denotes coupling strength between oscillators. Phase coordinate is main variable of each oscillator.

When should simulation process be stopped? Or in other words, when can we obtain results of cluster analysis? Estimation of local synchronization is used for that purpose, when it close to 1.0 then clustering is close to the end and output dynamic of the oscillatory network can be used for extracting clusters. The estimation is defined by following formula:

r_{c} = \left| \displaystyle\sum_{i = 0}^{N} \dfrac{1}{N_{i}} \displaystyle\sum_{j \in N_{i}} e^{{\theta}_{j} - {\theta}_{i}} \right |

Ability to estimate end of the synchronization process is distinctive feature of oscillatory networks based on Kuramoto model. Such estimation helps to simulate network only required number of steps until desire state of system is not reached.

Implementation of the Sync algorithm can be found in pyclustering library. Using pyclustering let’s perform cluster analysis of sample LSUN:

from pyclustering.cluster.syncnet import syncnet, syncnet_visualizer;

from pyclustering.samples.definitions import FCPS_SAMPLES;
from pyclustering.utils import draw_clusters;

# Read sample 'Lsun' from file.
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN);

# Create oscillatory network based algorithm Sync, using connectivity
# radius 0.45 and ccore calculatation to make process faster.
network = syncnet(sample, 0.45, ccore = True);

# Start simulation process until rc = 0.998 (default value, can be changed).
analyser = network.process();

# Show output dynamic of the network.
syncnet_visualizer.show_output_dynamic(analyser);

# Allocate clusters and display them
clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);

Here, connectivity radius is 0.45 – as it has been mentioned this parameter is similar to connectivity radius in DSCAN and OPTICS. Two oscillators are connected if Euclidean distance between their objects is less than 0.45. Estimation of local synchronization is considered as a stop condition for simulation – time point when clustering process is finished. In our example local synchronization is 0.998 by default.

Let’s consider output dynamic of the oscillatory network – figure 1.

Sync_OutputDynamic_Lsun
Figure 1. Output dynamic of the oscillatory network where three clusters are encoded by three synchronous ensembles (groups).

Oscillators that have the same phase coordinate encode cluster and therefore it is easy to extract clusters. Using pyclustering library and method draw_clusters() from package pyclustering.utils:

clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);

Or using instance of the class cluster_visualizer from package pyclustering.cluster:

clusters = analyser.allocate_clusters();
visualizer = cluster_visualizer();
visualizer.append_clusters(clusters, sample);
visualizer.show();
Sync_ClusteringResult_Lsun
Figure 2. Clustering result of LSUN by Sync algorithm where connectivity radius is 0.45.

Points that have the same color belong to the one cluster and as it can be observed on figure 2 – three clusters are allocated by the Sync algorithm. Class cluster_visualizer or function draw_clusters can visualize only 1-, 2-, 3-dimensional data, but anyway clusters can be printed to file or to console, clustering result is represented by list of clusters, where each cluster is represented by list of object indexes from input data.

Animation of clustering process of sample ‘Hepta’ by Sync algorithm is shown in following movie:

Advantages

1. Clustering results of Sync are similar to DBSCAN algorithm when amount of neighbors is equal to 1. For example, this algorithm is able to process data with elongated clusters with homogenous density.

2. Output dynamic can be analysed to extract true number of clusters in some cases (this idea has been extended in hSync algorithm that will be considered in the next note ‘hSync: Oscillatory based Clustering Algorithm‘) when connectivity radius is slightly bigger. Illustration of the cluster allocation by output dynamic analysis is shown on figure 3.

Sync_OutputDynamic_TwoDiamonds_GlobalSync
Figure 3. Illustration of extracting of two clusters (true number of clusters) in case of global synchronization.

3. Differential equation that used to calculate phase coordinate of each oscillator can be simplified and as a result increase performance (pyclustering uses simplified equation by default):

{\theta}_{i}[k + 1] = {\theta}_{i}[k] +\dfrac{K}{N_{i}} \displaystyle\sum_{j \in N_{i}} sin({\theta}_{j}[k] - {\theta}_{i}[k])

Disadvantages

1. Groups of oscillators do not communicate to each other and they may occasionally occupy the same phase coordinate and in this case it will not be possible to extract true number of clusters and use output dynamic. In this case there is nothing to do, simulation should be repeated again.

2. Complexity tends to O(n^3)  in case of elongated clusters such as LSUN. Execution time of processing of each sample from FCPS collection is much slower in compare to traditional algorithms (DBSCAN, OPTICS).

3. It is hard to find correct connectivity radius and for some samples it is even impossible.

Andrei

Introduction to Oscillatory Networks based on Kuramoto Model

What is an oscillatory network? Oscillatory network is a special type of neural networks that is represented by dynamic system that consists of oscillatory elements. The general distinction from classical neural networks is dynamic output like in real neurons.

There is an assumption that synchronization processes are used to implement cognitive functions, for example, vision, motion or memory. Oscillatory networks use this assumption to encode patterns and features. How the networks does encodes features in synchronization? The answer is simple: oscillators that have the same phase coordinate forms special groups – synchronous ensembles. An oscillatory network might have more than one synchronous ensemble and it means that the network encodes more than one feature where each ensemble of synchronous oscillators encodes only one feature. For example, we want to extract colors from an image – color image segmentation; in this case, each synchronous ensemble of oscillators encodes only one color. Let us consider another example – data clustering, in this case synchronous ensemble encodes only one cluster of data.

How does synchronization look like in the network? Suppose we have network with four oscillators and the first, the second encodes a feature A and the third, and the fourth encodes a feature B. Thus there are two synchronous ensembles [0, 1] and [2, 3] – figure 1.

Lesson 01 - Synchronous ensembles example
Figure 1. Example of two synchronous ensembles – [neuron #1, neuron #2] and [neuron #3, neuron #4].

Spike neuron output is used for simplicity in the figure 1. Here neuron #1 and neuron #2 have the same phase coordinate and at the same time, their phases are different from phases of neuron #3 and #4 that from another synchronous ensemble. Output of neuron depends on type of transfer function, for example, it might have wave-form or even chaotic-form. Let us return to our example, suppose we have image 2×2 where two pixels have blue color and other two has green color. Each ensemble encodes only one feature as we mentioned it before, so for such image neuron #1 and #2 correspond to pixel #1 and #2 (blue color) and neuron #3 and #4 correspond to pixel #3 and #4 (green color).

Now we understood how oscillatory network encodes features, but there is next question: “How to synchronize oscillators?”. We need to synchronize oscillators that should correspond to the same feature. The easiest way to do it is to use Kuramoto model. Original Kuramoto model ensures global synchronization in a full-interconnected network that uses phase oscillator:

\dot{\theta}_{i}=\omega_{i} + \sum_{j=0}^{N}K_{ij}\sin(\theta_{j} - \theta_{i})

Here θi – phase that is considered as a current state of oscillator. ωi – oscillator intrinsic frequency that can be considered as a oscillator bias. Kij – is a connection between i and j oscillators. Oscillatory network based on Kuramoto model is able to ensure global synchronization when all oscillators have the same phase. Let us simulate the oscillatory network using 100 oscillators using pyclustering library 0.6.0:

from pyclustering.nnet import conn_type;
from pyclustering.nnet.sync import sync_network, sync_visualizer;

# amout of oscillators in the network
num_osc = 100;

# multiplier for coupling strength between oscillators
k = 1;

# type of network structure - all-to-all (full-interconnected)
conn = conn_type.ALL_TO_ALL;

# permission to use core of the library to reach maximum performance
ccore_flag = True;

# create the oscillatory network
network = sync_network(num_osc, k, type_conn = conn, ccore = ccore_flag);

# simulate it until global synchronization and collect out
sync_output_dynamic = network.simulate_dynamic(collect_dynamic = True);

# show output dynamic of oscillators in the network
sync_visualizer.show_output_dynamic(sync_output_dynamic);

# show animation of synchronization process in polar coordinates
sync_visualizer.animate_output_dynamic(sync_output_dynamic);

The model of oscillatory network is simple and it can be easily implemented. Moreover, simple Euler method can be used for solving the differential equation for each oscillator, no need to implement classical Runge-Kutta fourth order method. Nevertheless, it is enough for review to use pyclustering library, therefore the library is used in this technical note.

The presented code simulate oscillatory network where global synchronization is observed at the end of simulation. It means that oscillators have the phase coordinate. Amount of simulation steps that is required to reach global synchronization is different for various network structures as you can see it in figure 2.

network_structures_synchronizations
Figure 2. Synchronization process in oscillatory networks with various structures.

As you can see from example above synchronization is reached in all networks. Pay attention to forms of output dynamic for grid and bidirectional list structures. I have prepared several movies devoted to the global synchronization in Kuramoto model.

Kuramoto model is interesting system and many researches devoted to it. Using pyclustering library you can perform your own investigation and study the model. And I hope this article encourage you to check what this model can do, try to play with coupling strength, added new components of the differential equation such as noise or try to substitute sinus function to another.

Now we have next question that should be clarified to understand how to apply it for practical problem, for example, cluster analysis, graph coloring, image segmentation and so on. We have global synchronization, but how to get local synchronization when oscillatory network has more than one synchronous ensemble? This issue will be considered in the next technical note – Sync: Oscillatory based Clustering Algorithm.

Andrei