Profile-QSAR 2.0: Kinase Virtual Screening Accuracy comparable to 4-Concentration IC50 for Realistically Novel Compounds

Martin, Eric, Polyakov, Valery and Tian, Li (2017) Profile-QSAR 2.0: Kinase Virtual Screening Accuracy comparable to 4-Concentration IC50 for Realistically Novel Compounds. Journal of Chemical Information and Modeling, 57 (8). pp. 2077-2088. DOI 10.1021/acs.jcim.7b00166

Official URL: http://pubs.acs.org/action/doSearch?AllField=pqsar...

Abstract

Conventional random forest regression (RFR) virtual screening models appear to have excellent accuracy on random held-out test sets, but prove lacking in actual practice. Analysis of 18 historical virtual screens shows that random test sets are far more similar to their training sets than are the compounds teams actually order. A new, cluster-based “realistic test set”, which mirrors the chemical novelty of real-life virtual screens, recapitulates the poor predictive power of RFR models in real projects. The original Profile-QSAR method greatly broadens the domain of application by using as independent variables a profile of activity predictions from all historical assays in a large protein family. However, it still falls short of experiment on the realistic test sets. The improved “Profile-QSAR 2.0” method replaces probabilities of activity from Pipeline Pilot naïve Bayes categorical models at several thresholds with predicted IC50s from RFR models. Besides increasing accuracy, this reduces the number of independent variables by about four-fold, allowing smaller training sets. Although the individual RFRs are slower to compute, they replace expensive software with open-source programs distributed across the cluster for much faster overall calculation. Unexpectedly, the accuracy is greatly improved by also removing the RFR model for the actual assay of interest from the independent variable profile. With these improvements, Profile-QSAR activity predictions are now statistically comparable to medium-throughput 4-concentration IC50 measurements even on the realistic test set. Beyond the yes/no activity predictions from typical HTS and conventional virtual screens, these semi-quantitative IC50 predictions allow for predicted potency, ligand efficiency, lipid efficiency and selectivity against undesirable anti-targets. They also enable virtual screening panels such as toxicity panels and overall promiscuity predictions.
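
The contrast between a random test set and a cluster-based one can be illustrated with a small sketch. This is not the authors' procedure: the toy fingerprints, the greedy leader clustering, the 0.5 similarity threshold, the "smallest clusters first" hold-out rule, and every function name below are assumptions chosen only to show why holding out whole clusters makes test compounds less similar to the training set.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def make_toy_library(series_sizes, rng):
    """Hypothetical library: each series shares a 20-bit core plus 5 random extra bits."""
    common = set(range(9000, 9005))  # bits shared by every compound
    fps, series_of = [], []
    for s, size in enumerate(series_sizes):
        core = set(range(s * 100, s * 100 + 20))
        for _ in range(size):
            extras = set(rng.sample(range(s * 100 + 20, s * 100 + 60), 5))
            fps.append(common | core | extras)
            series_of.append(s)
    return fps, series_of


def leader_cluster(fps, threshold):
    """Greedy leader clustering: join the first cluster whose leader is
    at least `threshold` similar, otherwise start a new cluster."""
    leaders, clusters = [], []
    for i, fp in enumerate(fps):
        for k, leader in enumerate(leaders):
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[k].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


def cluster_split(fps, threshold=0.5, test_fraction=0.25):
    """Hold out whole clusters, smallest first, until the test set is large enough."""
    clusters = sorted(leader_cluster(fps, threshold), key=len)
    target = test_fraction * len(fps)
    test = []
    for members in clusters:
        if len(test) >= target:
            break
        test.extend(members)
    held_out = set(test)
    train = [i for i in range(len(fps)) if i not in held_out]
    return train, sorted(test)


def mean_nearest_train_similarity(fps, train, test):
    """Average, over test compounds, of the similarity to the closest training compound."""
    return sum(max(tanimoto(fps[t], fps[r]) for r in train) for t in test) / len(test)


rng = random.Random(0)
fps, series_of = make_toy_library([30, 25, 20, 15, 10, 8, 6, 6], rng)

# Cluster-based split: entire series end up in the test set.
train_c, test_c = cluster_split(fps)

# Random split of the same size, for comparison.
test_r = sorted(rng.sample(range(len(fps)), len(test_c)))
held_out_r = set(test_r)
train_r = [i for i in range(len(fps)) if i not in held_out_r]

cluster_sim = mean_nearest_train_similarity(fps, train_c, test_c)
random_sim = mean_nearest_train_similarity(fps, train_r, test_r)
print(f"random split : mean nearest-training similarity = {random_sim:.2f}")
print(f"cluster split: mean nearest-training similarity = {cluster_sim:.2f}")
```

In this toy setting the randomly held-out compounds almost always have a close analogue from the same series in the training set, whereas the cluster-held-out compounds do not. That is the kind of gap in chemical novelty the abstract describes between random test sets and the compounds teams actually order.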

Item Type:	Article
Keywords:	Profile-QSAR, virtual screen, overfitting, random forest regression, training/test set split
Date Deposited:	09 Sep 2017 00:45
Last Modified:	09 Sep 2017 00:45
URI:	https://oak.novartis.com/id/eprint/31984
