# hSync: Oscillatory based Clustering Algorithm

This technical note is a continuation of the previous ‘Sync: Oscillatory based Clustering Algorithm‘. Hierarchical Sync algorithm ‘hSync’ for cluster analysis will be considered as a one of the representatives of algorithms based on oscillatory networks that use Kuramoto model for synchronization.

As it was mentioned in the previous note, sometimes output dynamic of the Sync algorithm can be analysed to extract required amount of clusters. It can be useful when connectivity radius is greater than real because in this case global synchronization will be observed in the oscillatory network and it means that only one cluster is encoded. But keyword here is ‘sometimes’, it does not mean that we can set bigger radius for every data to reach global synchronization and then analyse it, so the question is why? Output dynamic of Sync algorithm can be analysed only when oscillatory network consists of local highly coupled sub-networks that have a few links to each other as it presented on figure 1.

Global synchronization will be reached for the oscillatory network with such structure (because there no any sub-networks that are not connected to each other) and it will be possible to analyse output dynamic due to small number of connections between these sub-networks – figure 2. At the beggining synchronization will be reached in sub-networks (two groups of oscillators that are marked as [1], [2]) and after time these two ensembles will be synchronized with each other.

But in other cases it is hard or even impossible to extract required amount of clusters. An idea to process network dynamic has been successfully implemented in hSync algorithm.

How does hSync algorithm work? hSync algorithm uses one mandatory parameter – amount of clusters that should be allocated instead of connectivity radius as in Sync. At the beginning the smallest connectivity radius is chosen, for example, average Euclidean distance to connect 3-nearest neighbors, or initial radius can be specified as an optional parameter by user. At the second step classic Sync algorithm is run to allocate clusters and if amount of allocated clusters is greater than required then radius is increased and Sync algorithm is rerun again. The last step is repeated until required amount of clusters is obtained. hSync algorithm can be described by following pseudo-code:

```hSync(data D, uint N) {
while(true) {
# hSync algorithm uses Sync algorithm.
if (len(clusters) > N) {
}
else {
return clusters;
}
}
}
```

Authors of the algorithm do not specify what is rule should be used for radius increasing and this parameter can be also specified by user as an optional parameter of the algorithm in case pyclustering library (module hsyncnet in pyclustering.cluster). Let’s process the simplest data-set ‘SampleSimple01’ to demonstrate how does it work:

```# Read dataset that consists 10 objects.

# Two clusters should be extracted.
amount_clusters = 2;

# Create hSync clustering algorithm instance.
network = hsyncnet(sample, amount_clusters, ccore = True);

# Start clustering process.
analyser = network.process(collect_dynamic = True);

# Display output dynamic of the network.
sync_visualizer.show_output_dynamic(analyser);

# Display allocated clusters.
clusters = analyser.allocate_clusters();
draw_clusters(sample, clusters);
```

At the beginning 10 oscillators has own phases that are initialized randomly. Each oscillator corresponds to only one object in the data ‘SampleSimple1’. After the first step of when Sync processing is over (t = 1), 7 groups (ensembles) are formed. Connectivity radius is increased because 7 ensembles encodes 7 clusters and it is greater than required 2 clusters, so simulation with new radius is continued using Sync. At the end of the second step (t = 2) we have 4 groups and it is still greater, and we repeat the last procedure again: increase radius and continue simulation. Finally (t = 3), 2 ensembles are formed that encode 2 clusters and it means that clustering process is over.

Other hSync clustering examples can be found in pyclustering library:

```\$ python pyclustering/cluster/examples/hsync_examples.py
```

1. If real number of clusters is known then it is much easy to extract them using hSync than try to find out proper connectivity radius for Sync algorithm.

2. hSync can be used to build some kind of dendrogram that is used to illustrate the arrangements of clusters that is similar to dendrogram that is produced by classical hierarchical algorithm.

3. Clustering abilities (quality) are the same as for Sync, DBSCAN, OPTICS.

1. Time processing is the real problem for this algorithm. It is used Sync on each iteration that have $O(n^3)$ complexity in worse case and amount of iterations depends on initial radius and increasing rule. Moreover on each iteration connectivity radius should be calculated and using k-nearest methods it takes $O(n^2)$, i.e. even in sunny case one iteration cost is $O(n^2)$. So, in case of lack of any information about connectivity radius time processing is dramatically increased.