Learning from the data: mining of large high-throughput screening databases.

Yan, Frank and King, Frederick and He, Yun and Caldwell, Jeremy and Zhou, Yingyao (2006) Learning from the data: mining of large high-throughput screening databases. Journal of Chemical Information and Modeling, 46 (6). pp. 2381-2395. ISSN 1549-9596

Abstract

High-throughput screening (HTS) campaigns in pharmaceutical companies have accumulated a large amount of data for several million compounds over a couple of hundred assays. Despite the general awareness that rich information is hidden inside the vast amount of data, little has been reported for a systematic data mining method that can reliably extract relevant knowledge of interest for chemists and biologists. We developed a data mining approach based on an algorithm called ontology-based pattern identification (OPI) and applied it to our in-house HTS database. We identified nearly 1500 scaffold families with statistically significant structure-HTS activity profile relationships. Among them, dozens of scaffolds were characterized as leading to artifactual results stemming from the screening technology employed, such as assay format and/or readout. Four types of compound scaffolds can be characterized based on this data mining effort: tumor cytotoxic, general toxic, potential reporter gene assay artifact, and target family specific. The OPI-based data mining approach can reliably identify compounds that are not only structurally similar but also share statistically significant biological activity profiles. Statistical tests such as Kruskal-Wallis test and analysis of variance (ANOVA) can then be applied to the discovered scaffolds for effective assignment of relevant biological information. The scaffolds identified by our HTS data mining efforts are an invaluable resource for designing SAR-robust diversity libraries, generating in silico biological annotations of compounds on a scaffold basis, and providing novel target family specific scaffolds for focused compound library design.
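
The abstract mentions applying the Kruskal-Wallis test to discovered scaffolds in order to attach biological information to them. As a rough illustration only, and not the authors' implementation, the sketch below computes the tie-corrected Kruskal-Wallis H statistic in plain Python for one scaffold whose compound activity scores are grouped by assay category. The function name `kruskal_wallis_h`, the grouping by assay format, and all activity values are assumptions made up for this example; none of them come from the paper.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic.

    `groups` is a list of samples (each a list of numbers), one per category.
    Hypothetical helper for illustration; not taken from the paper.
    """
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Assign 1-based ranks; tied values share the mean of their ranks.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_sum / float(n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Made-up activity scores for one scaffold, grouped by assumed assay format.
scaffold_activity = {
    "reporter_gene": [0.82, 0.91, 0.77, 0.88],
    "biochemical": [0.12, 0.20, 0.15, 0.09],
    "cell_viability": [0.18, 0.25, 0.11, 0.22],
}
h_stat = kruskal_wallis_h(list(scaffold_activity.values()))
print("Kruskal-Wallis H =", round(h_stat, 3))
```

Under the null hypothesis, H approximately follows a chi-square distribution with (number of groups - 1) degrees of freedom, so a large H for a scaffold would suggest that its activity depends on the assay category. A statistics library such as SciPy (`scipy.stats.kruskal`, `scipy.stats.f_oneway`) can supply the p-values for this test and for one-way ANOVA.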

Item Type: Article
Additional Information: archiving not formally supported by this publisher
Date Deposited: 14 Dec 2009 14:04
Last Modified: 31 Jan 2013 01:27
URI: https://oak.novartis.com/id/eprint/134