Petrov, Dmitri, Che, Jianwei, Zhou, Bin, Santrosyan, Andrey, Zhou, Yingyao, Hadj Khodabakhshi, Alireza, Tanaseichuk, Olga and Jiang, Tao (2015) An Efficient Hierarchical Clustering Algorithm for Large Datasets. Austin Journal of Proteomics, Bioinformatics, 2 (1). pp. 1-6. ISSN 2471-0423


Hierarchical clustering is a widely adopted unsupervised learning
algorithm for discovering intrinsic groups embedded within a dataset. Standard
implementations of the exact algorithm for hierarchical clustering require O(n²)
time and O(n²) memory and thus are unsuitable for processing datasets
containing more than 20 000 objects. In this study, we present a hybrid
hierarchical clustering algorithm requiring approximately O(n√n) time and
O(n√n) memory while still preserving the most desirable properties of the exact
algorithm. The algorithm was capable of clustering one million compounds within
a few hours on a single processor. The clustering program is freely available to
the research community at

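To make the scaling claim concrete, the following back-of-the-envelope sketch compares the number of values an exact method would store with an n√n count at the two dataset sizes mentioned in the abstract. It is an illustration only, not the authors' code: the function names are hypothetical, the figure of 4 bytes per stored value is an assumption, and constant factors hidden by the O-notation are ignored.

```python
import math

BYTES_PER_VALUE = 4  # assumption: one 32-bit float per stored value


def pairwise_cost(n: int) -> int:
    """Distinct object pairs, n*(n-1)/2, as kept in a full distance matrix."""
    return n * (n - 1) // 2


def hybrid_cost(n: int) -> int:
    """Illustrative n*sqrt(n) count; constant factors are ignored."""
    return n * math.isqrt(n)


for n in (20_000, 1_000_000):
    exact_gb = pairwise_cost(n) * BYTES_PER_VALUE / 1e9
    hybrid_gb = hybrid_cost(n) * BYTES_PER_VALUE / 1e9
    print(f"n={n:>9,}: n^2 scale ~{exact_gb:,.1f} GB, n*sqrt(n) scale ~{hybrid_gb:,.3f} GB")
```

Under these assumptions a full pairwise matrix needs roughly 0.8 GB at 20 000 objects and roughly 2 TB at one million, whereas the n√n count at one million corresponds to about 4 GB.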
