In this technical note I would like to share visualization of clustering process that was done genetic algorithm that demonstrates so called evolution.
There are a lot of articles, presentations and books devoted to genetic algorithms, therefore I am not going to describe genetic algorithm in details, only few words about them. Roughly speaking, the genetic algorithm is for searching global optimum using evolutionary principles that are close to natural selection. There are several general procedures that are used by the algorithm: crossing, mutation, estimation, formation of new generation. This cycle is repeated until stop condition is reached, for example, amount of iterations for processing is exceeded, or deceleration of convergence of fitness function.
One of the implementations of the genetic algorithm for clustering can be found in pyclustering library (module: pyclustering.cluster.ga) that provide intuitive API for cluster analysis. There is usage example where data ‘SAMPLE_SIMPLE3’ is processed to extract three clusters and then obtained results are visualized (clustering results and clustering process):
from pyclustering.samples.definitions import SIMPLE_SAMPLES; from pyclustering.cluster.ga import genetic_algorithm, ga_observer, ga_visualizer; from pyclustering.utils import read_sample; # Read data for text file sample = read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3); # Create observer for visualization observer_instance = ga_observer(True, True, True); # Create instance of genetic clustering algorithm ga_instance = genetic_algorithm(data=sample, count_clusters=4, chromosome_count=100, population_count=200, count_mutation_gens=2, coeff_mutation_count=0.8, select_coeff=0.3, observer=observer_instance); # Start processing ga_instance.process(); # Show allocated clusters and fitness function evolution together ga_visualizer.show_clusters(sample, observer_instance); # Show animation of evolution ga_visualizer.animate_cluster_allocation(sample, observer_instance);
Just few additional comments to this example. Observer instance of ga_observer can be omitted, it is used only for visualization process in the example, so if you want to extract clusters then forget about observer. Another important parameter here is population_count defines amount of evolution steps (iterations). Others parameters are common for any genetic algorithm.
As it was mentioned at the begging I have rendered several movies related to evolution process during clustering and some of them are presented below. I hope you will find them useful to understand general principles of clustering process that is performed by genetic algorithm.
The first video demonstrates clustering process of SAMPLE_SIMPLE2 by the genetic algorithm. The best current result (on current particular iteration that is considered as a local optimum) is shown at the lest side. The global optimum is the best result that was obtained up to current iteration and it is shown at the right side. Fitness functions (local optimum, global optimum and average) are shown below them. Here amount of chromosomes is reduced and amount of mutation is increased to make fitness function much more “aggressive” and “alive”.
The second video demonstrates clustering of SAMPLE_SIMPLE3 with much more peaceful behavior of fitness functions.
And the last video shows clustering process of SAMPLE_SIMPLE5 with almost the same parameters as in the previous.