hSync: Oscillatory based Clustering Algorithm

This technical note is a continuation of the previous ‘Sync: Oscillatory based Clustering Algorithm‘. Hierarchical Sync algorithm ‘hSync’ for cluster analysis will be considered as a one of the representatives of algorithms based on oscillatory networks that use Kuramoto model for synchronization.

As it was mentioned in the previous note, sometimes output dynamic of the Sync algorithm can be analysed to extract required amount of clusters. It can be useful when connectivity radius is greater than real because in this case global synchronization will be observed in the oscillatory network and it means that only one cluster is encoded. But keyword here is ‘sometimes’, it does not mean that we can set bigger radius for every data to reach global synchronization and then analyse it, so the question is why? Output dynamic of Sync algorithm can be analysed only when oscillatory network consists of local highly coupled sub-networks that have a few links to each other as it presented on figure 1.

Sync_Two_highly_coupled_networks
Figure 1. Two highly coupled sub-networks in case of sample ‘TwoDiamonds’. These sub-networks have a few connections to each other.

Global synchronization will be reached for the oscillatory network with such structure (because there no any sub-networks that are not connected to each other) and it will be possible to analyse output dynamic due to small number of connections between these sub-networks – figure 2. At the beggining synchronization will be reached in sub-networks (two groups of oscillators that are marked as [1], [2]) and after time these two ensembles will be synchronized with each other.

Sync_TwoDiamonds_Global_Synchronization
Figure 2. Illustration of global synchronization in case of two highly coupled sub-networks.

But in other cases it is hard or even impossible to extract required amount of clusters. An idea to process network dynamic has been successfully implemented in hSync algorithm.

How does hSync algorithm work? hSync algorithm uses one mandatory parameter – amount of clusters that should be allocated instead of connectivity radius as in Sync. At the beginning the smallest connectivity radius is chosen, for example, average Euclidean distance to connect 3-nearest neighbors, or initial radius can be specified as an optional parameter by user. At the second step classic Sync algorithm is run to allocate clusters and if amount of allocated clusters is greater than required then radius is increased and Sync algorithm is rerun again. The last step is repeated until required amount of clusters is obtained. hSync algorithm can be described by following pseudo-code:

hSync(data D, uint N) {
  radius = k_nearest(D, 3);
  while(true) {
    # hSync algorithm uses Sync algorithm.
    clusters = Sync(radius);
    if (len(clusters) > N) {
      radius = increase(radius);
    }
    else {
      return clusters;
    }
  }
}

Authors of the algorithm do not specify what is rule should be used for radius increasing and this parameter can be also specified by user as an optional parameter of the algorithm in case pyclustering library (module hsyncnet in pyclustering.cluster). Let’s process the simplest data-set ‘SampleSimple01’ to demonstrate how does it work:

# Read dataset that consists 10 objects.
sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE1);

# Two clusters should be extracted.
amount_clusters = 2;

# Create hSync clustering algorithm instance.
network = hsyncnet(sample, amount_clusters, ccore = True);

# Start clustering process.
analyser = network.process(collect_dynamic = True);

# Display output dynamic of the network.
sync_visualizer.show_output_dynamic(analyser);

# Display allocated clusters.
clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);
hSync_SampleSimple01
Figure 3. Output dynamic of the hSync algorithm and corresponding clusters.

At the beginning 10 oscillators has own phases that are initialized randomly. Each oscillator corresponds to only one object in the data ‘SampleSimple1’. After the first step of when Sync processing is over (t = 1), 7 groups (ensembles) are formed. Connectivity radius is increased because 7 ensembles encodes 7 clusters and it is greater than required 2 clusters, so simulation with new radius is continued using Sync. At the end of the second step (t = 2) we have 4 groups and it is still greater, and we repeat the last procedure again: increase radius and continue simulation. Finally (t = 3), 2 ensembles are formed that encode 2 clusters and it means that clustering process is over.

Other hSync clustering examples can be found in pyclustering library:

$ python pyclustering/cluster/examples/hsync_examples.py

Advantages

1. If real number of clusters is known then it is much easy to extract them using hSync than try to find out proper connectivity radius for Sync algorithm.

2. hSync can be used to build some kind of dendrogram that is used to illustrate the arrangements of clusters that is similar to dendrogram that is produced by classical hierarchical algorithm.

3. Clustering abilities (quality) are the same as for Sync, DBSCAN, OPTICS.

Disadvantages

1. Time processing is the real problem for this algorithm. It is used Sync on each iteration that have O(n^3) complexity in worse case and amount of iterations depends on initial radius and increasing rule. Moreover on each iteration connectivity radius should be calculated and using k-nearest methods it takes O(n^2), i.e. even in sunny case one iteration cost is O(n^2). So, in case of lack of any information about connectivity radius time processing is dramatically increased.

2. Results are not stable as well as for Sync algorithm, because oscillator ensembles may occupy the same phase coordinates because there is no any desynchronization mechanism between ensembles in Sync and hSync.

Andrei

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s