Activity-aware clustering of high throughput screening data and elucidation of orthogonal structure–activity relationships
Lounkine, Eugen, Nigsch, Florian, Jenkins, Jeremy and Glick, Meir (2011) Activity-aware clustering of high throughput screening data and elucidation of orthogonal structure–activity relationships. Journal of Chemical Information and Modeling, 51 (12). pp. 3158-3168. ISSN 1549-9596
Abstract
Molecular clustering of large and diverse compound data sets, such as hit lists from high throughput screening (HTS) campaigns, can facilitate the identification of structure-activity relationships (SAR) and molecular scaffolds characteristic of active compounds. However, typical clustering techniques rely on a general notion of chemical similarity or standard rules of scaffold decomposition, and are thus insensitive to molecular features that are enriched in biologically active compounds. By contrast, Bayesian models can identify activity-characteristic features, even in diverse and noisy data sets.
In the present study, we combine molecular similarity and Bayesian models and introduce (I) a robust, activity-aware molecular similarity and clustering approach that uses structural features weighted according to their posterior probability in a Bayesian activity model and (II) a feature mapping approach for elucidation of distinct SAR determinants in polypharmacologic compounds. We applied the approach to over 450 assays from the PubChem BioAssay repository.
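The idea of weighting structural features by a Bayesian activity model can be illustrated with a minimal Python sketch. The abstract does not specify the estimator or the similarity formula, so everything here is an assumption: the weights use a Laplacian-corrected naive Bayes estimate (a common choice for HTS data), the similarity is a weighted Tanimoto coefficient with negative weights clipped to zero, and the function names and toy feature labels are hypothetical.

```python
import math


def bayesian_feature_weights(compounds, actives):
    """Return a log-scale activity weight per feature.

    compounds: dict mapping compound name -> set of feature labels
    actives:   set of compound names flagged active in the assay
    Assumption: Laplacian-corrected estimate log((A_F + 1) / (N_F * P + 1)),
    where P is the overall fraction of actives.
    """
    n_total = len(compounds)
    n_active = sum(1 for name in compounds if name in actives)
    base_rate = n_active / n_total
    counts = {}  # feature -> [compounds with feature, actives with feature]
    for name, features in compounds.items():
        for feature in features:
            entry = counts.setdefault(feature, [0, 0])
            entry[0] += 1
            if name in actives:
                entry[1] += 1
    return {
        feature: math.log((a + 1.0) / (n * base_rate + 1.0))
        for feature, (n, a) in counts.items()
    }


def weighted_tanimoto(features_a, features_b, weights):
    """Tanimoto coefficient in which each feature counts by its clipped weight.

    Assumption: features with non-positive weight contribute nothing, so
    similarity is driven only by activity-enriched features.
    """
    def clipped(feature):
        return max(weights.get(feature, 0.0), 0.0)

    shared = sum(clipped(f) for f in features_a & features_b)
    union = sum(clipped(f) for f in features_a | features_b)
    return shared / union if union > 0 else 0.0


# Toy data: "core" occurs only in actives, "ring" mostly in inactives.
compounds = {
    "a1": {"core", "ring"},
    "a2": {"core", "chain"},
    "i1": {"ring", "chain"},
    "i2": {"ring"},
}
actives = {"a1", "a2"}
weights = bayesian_feature_weights(compounds, actives)
sim_actives = weighted_tanimoto(compounds["a1"], compounds["a2"], weights)
sim_inactives = weighted_tanimoto(compounds["i1"], compounds["i2"], weights)
```

In this toy set the two actives share only the "core" feature, so their unweighted Tanimoto similarity would be 1/3, whereas the weighted similarity is 1.0 because "core" is the only feature with a positive weight. A pairwise similarity of this kind could then be passed to any standard clustering routine.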
Activity-aware clustering grouped compounds according to biologically active molecular cores that were specific for the target or pathway at hand, rather than grouping inactive scaffolds commonly found in compound series. Furthermore, we used Bayesian weights to highlight the activity-conferring molecular features, thereby providing an easily accessible, visual interpretation of the results. Numerical comparison of the molecular feature maps derived from different bioassays allowed identification of orthogonal SAR for individual compounds across multiple assays. Our method was able to identify the structural prerequisites for polypharmacology, i.e., multiple bioactive regions within a single compound, as well as highlight selectivity determinants, i.e., differing bioactive regions across compounds.
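As a rough illustration of comparing feature maps numerically, the following Python sketch projects per-feature weights onto atoms and compares the resulting maps from two assays. The abstract does not describe how maps are built or compared, so the per-atom summation, the cosine measure, the function names, and the example weights are all assumptions made for illustration.

```python
import math


def atom_map(feature_atoms, weights, n_atoms):
    """Sum each feature's weight onto the atoms that feature covers.

    feature_atoms: dict mapping feature label -> list of atom indices
    weights:       dict mapping feature label -> weight in one assay
    """
    scores = [0.0] * n_atoms
    for feature, atoms in feature_atoms.items():
        w = weights.get(feature, 0.0)
        for idx in atoms:
            scores[idx] += w
    return scores


def cosine(u, v):
    """Cosine similarity of two per-atom maps; 0.0 if either map is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical six-atom compound with two non-overlapping substructures.
feature_atoms = {"left_ring": [0, 1, 2], "right_chain": [3, 4, 5]}
weights_assay_a = {"left_ring": 1.0, "right_chain": 0.0}  # assumed weights
weights_assay_b = {"left_ring": 0.0, "right_chain": 1.0}  # assumed weights

map_a = atom_map(feature_atoms, weights_assay_a, n_atoms=6)
map_b = atom_map(feature_atoms, weights_assay_b, n_atoms=6)
overlap = cosine(map_a, map_b)
```

Here the two maps highlight disjoint regions of the same compound, so their cosine similarity is 0. Under the assumptions above, a low overlap between maps that each contain activity-enriched atoms would be one way to flag a candidate for orthogonal SAR, while a high overlap would suggest that the same region drives activity in both assays.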
The method presented here is generally applicable to any type of activity data and may help elucidate SAR in heterogeneous compound data sets.
Item Type: | Article |
---|---|
Related URLs: | |
Additional Information: | archiving not formally supported by this publisher |
Date Deposited: | 13 Oct 2015 13:15 |
Last Modified: | 13 Oct 2015 13:15 |
URI: | https://oak.novartis.com/id/eprint/5536 |