An Efficient Hierarchical Clustering Algorithm for Large Datasets
Petrov, Dmitri, Che, Jianwei, Zhou, Bin, Santrosyan, Andrey, Zhou, Yingyao, Hadj Khodabakhshi, Alireza, Tanaseichuk, Olga and Jiang, Tao (2015) An Efficient Hierarchical Clustering Algorithm for Large Datasets. Austin Journal of Proteomics, Bioinformatics, 2 (1). pp. 1-6. ISSN 2471-0423
Abstract
Hierarchical clustering is a widely adopted unsupervised learning
algorithm for discovering intrinsic groups embedded within a dataset. Standard
implementations of the exact algorithm for hierarchical clustering require $O(n^2)$
time and $O(n^2)$ memory and thus are unsuitable for processing datasets
containing more than 20 000 objects. In this study, we present a hybrid
hierarchical clustering algorithm requiring approximately $O(n\sqrt{n})$ time and
$O(n\sqrt{n})$ memory while still preserving the most desirable properties of the exact
algorithm. The algorithm was capable of clustering one million compounds within
a few hours on a single processor. The clustering program is freely available to
the research community at http://carrier.gnf.org/publications/cluster.
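
The memory figures quoted in the abstract can be made concrete with a short back-of-the-envelope calculation. The sketch below is not taken from the paper. It assumes that the exact algorithm stores every distinct pairwise distance as one 64-bit float, and it assumes a constant factor of 1 for the $n\sqrt{n}$ estimate, so the numbers indicate orders of magnitude only. The function names are hypothetical.

```python
import math

# Assumption: each stored distance is one 64-bit float.
BYTES_PER_DISTANCE = 8


def pairwise_matrix_bytes(n: int) -> int:
    """Bytes needed for the n*(n-1)/2 distinct pairwise distances."""
    return n * (n - 1) // 2 * BYTES_PER_DISTANCE


def n_sqrt_n_bytes(n: int) -> float:
    """Bytes if storage grew as n*sqrt(n), with an assumed constant factor of 1."""
    return n * math.sqrt(n) * BYTES_PER_DISTANCE


if __name__ == "__main__":
    # Dataset sizes mentioned in the abstract.
    for n in (20_000, 1_000_000):
        quadratic_gb = pairwise_matrix_bytes(n) / 1e9
        hybrid_gb = n_sqrt_n_bytes(n) / 1e9
        print(f"n = {n:>9,}: n^2 storage ~ {quadratic_gb:,.1f} GB, "
              f"n*sqrt(n) storage ~ {hybrid_gb:,.2f} GB")
```

Under these assumptions, the pairwise distances for 20 000 objects occupy about 1.6 GB, while one million objects would need roughly 4 TB. Storage growing as $n\sqrt{n}$ would stay near 8 GB at one million objects, which is consistent with the single-processor claim only in order of magnitude.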
Item Type: | Article |
---|---|
Date Deposited: | 12 Oct 2016 00:45 |
Last Modified: | 12 Oct 2016 00:45 |
URI: | https://oak.novartis.com/id/eprint/11040 |