Sync: Oscillatory based Clustering Algorithm

Kuramoto model is able to ensure global synchronization in oscillatory networks. Global synchronization is a state of oscillatory network where each oscillator has the same phase coordinate and this state is not interested from practical point of view, because nothing is encoded by oscillators (see previous note ‘Oscillatory networks based on Kuramoto model #1 – Introduction‘). Thus, Kuramoto model should be modified to ensure partial synchronization when there are two or more groups of oscillators where each oscillator within group has the same phase coordinate. In other words each group encodes objects that are similar to each other that and such set of objects is considered as a cluster.

There are different clustering algorithms that are based on modified Kuramoto model: Sync, hSync, Sync-SOM and so on. Let’s start from the first one and consider Sync algorithm in this note.

Algorithm Sync

Sync clustering algorithm is based on oscillatory network that uses modified Kuramoto model. Each oscillator corresponds to only one object from input data and thus amount of oscillators is equal to amount of objects. Two oscillators are connected if Euclidean distance between two corresponding objects is less than threshold value. Output dynamic of each oscillator is described by following differential equation:

\dot{\theta}_{i}= \dfrac{K}{N_{i}} \displaystyle\sum_{j \in N_{i}} sin({\theta}_{j} - {\theta}_{i})

This formula is simpler than original Kuramoto model: intrinsic frequency is omitted because it affects phase coordinate as a permanent bias. {N_{i}} is a amount of neighbors of oscillator i. And sum is for neighbors of oscillator i as well. K is a constant value that denotes coupling strength between oscillators. Phase coordinate is main variable of each oscillator.

When should simulation process be stopped? Or in other words, when can we obtain results of cluster analysis? Estimation of local synchronization is used for that purpose, when it close to 1.0 then clustering is close to the end and output dynamic of the oscillatory network can be used for extracting clusters. The estimation is defined by following formula:

r_{c} = \left| \displaystyle\sum_{i = 0}^{N} \dfrac{1}{N_{i}} \displaystyle\sum_{j \in N_{i}} e^{{\theta}_{j} - {\theta}_{i}} \right |

Ability to estimate end of the synchronization process is distinctive feature of oscillatory networks based on Kuramoto model. Such estimation helps to simulate network only required number of steps until desire state of system is not reached.

Implementation of the Sync algorithm can be found in pyclustering library. Using pyclustering let’s perform cluster analysis of sample LSUN:

from pyclustering.cluster.syncnet import syncnet, syncnet_visualizer;

from pyclustering.samples.definitions import FCPS_SAMPLES;
from pyclustering.utils import draw_clusters;

# Read sample 'Lsun' from file.
sample = read_sample(FCPS_SAMPLES.SAMPLE_LSUN);

# Create oscillatory network based algorithm Sync, using connectivity
# radius 0.45 and ccore calculatation to make process faster.
network = syncnet(sample, 0.45, ccore = True);

# Start simulation process until rc = 0.998 (default value, can be changed).
analyser = network.process();

# Show output dynamic of the network.
syncnet_visualizer.show_output_dynamic(analyser);

# Allocate clusters and display them
clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);

Here, connectivity radius is 0.45 – as it has been mentioned this parameter is similar to connectivity radius in DSCAN and OPTICS. Two oscillators are connected if Euclidean distance between their objects is less than 0.45. Estimation of local synchronization is considered as a stop condition for simulation – time point when clustering process is finished. In our example local synchronization is 0.998 by default.

Let’s consider output dynamic of the oscillatory network – figure 1.

Sync_OutputDynamic_Lsun
Figure 1. Output dynamic of the oscillatory network where three clusters are encoded by three synchronous ensembles (groups).

Oscillators that have the same phase coordinate encode cluster and therefore it is easy to extract clusters. Using pyclustering library and method draw_clusters() from package pyclustering.utils:

clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);

Or using instance of the class cluster_visualizer from package pyclustering.cluster:

clusters = analyser.allocate_clusters();
visualizer = cluster_visualizer();
visualizer.append_clusters(clusters, sample);
visualizer.show();
Sync_ClusteringResult_Lsun
Figure 2. Clustering result of LSUN by Sync algorithm where connectivity radius is 0.45.

Points that have the same color belong to the one cluster and as it can be observed on figure 2 – three clusters are allocated by the Sync algorithm. Class cluster_visualizer or function draw_clusters can visualize only 1-, 2-, 3-dimensional data, but anyway clusters can be printed to file or to console, clustering result is represented by list of clusters, where each cluster is represented by list of object indexes from input data.

Animation of clustering process of sample ‘Hepta’ by Sync algorithm is shown in following movie:

Advantages

1. Clustering results of Sync are similar to DBSCAN algorithm when amount of neighbors is equal to 1. For example, this algorithm is able to process data with elongated clusters with homogenous density.

2. Output dynamic can be analysed to extract true number of clusters in some cases (this idea has been extended in hSync algorithm that will be considered in the next note ‘hSync: Oscillatory based Clustering Algorithm‘) when connectivity radius is slightly bigger. Illustration of the cluster allocation by output dynamic analysis is shown on figure 3.

Sync_OutputDynamic_TwoDiamonds_GlobalSync
Figure 3. Illustration of extracting of two clusters (true number of clusters) in case of global synchronization.

3. Differential equation that used to calculate phase coordinate of each oscillator can be simplified and as a result increase performance (pyclustering uses simplified equation by default):

{\theta}_{i}[k + 1] = {\theta}_{i}[k] +\dfrac{K}{N_{i}} \displaystyle\sum_{j \in N_{i}} sin({\theta}_{j}[k] - {\theta}_{i}[k])

Disadvantages

1. Groups of oscillators do not communicate to each other and they may occasionally occupy the same phase coordinate and in this case it will not be possible to extract true number of clusters and use output dynamic. In this case there is nothing to do, simulation should be repeated again.

2. Complexity tends to O(n^3)  in case of elongated clusters such as LSUN. Execution time of processing of each sample from FCPS collection is much slower in compare to traditional algorithms (DBSCAN, OPTICS).

3. It is hard to find correct connectivity radius and for some samples it is even impossible.

Andrei

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s