The experimental uncertainty of heterogeneous public Ki data

Kramer, Christian, Kalliokoski, Tuomo, Gedeck, Peter and Vulpetti, Anna (2012) The experimental uncertainty of heterogeneous public Ki data. Journal of Medicinal Chemistry, 55 (11). pp. 5165-5173. ISSN 0022-2623

Official URL: http://pubs.acs.org/doi/abs/10.1021/jm300131x

Abstract

The maximum achievable accuracy of all in silico models relies on the quality of the experimental data. Therefore, experimental uncertainty defines a natural upper limit to the achievable predictive performance. Models that yield errors smaller than the experimental uncertainty are necessarily overtrained. For these reasons, a reliable estimate of the experimental uncertainty is of high importance to all users and creators of in silico models.
There are two fundamentally different kinds of experimental uncertainties. Repeatability within the same laboratory is reported by the authors of research papers and is the uncertainty measure of choice for models developed on a consistently measured series of biological data. Reproducibility between laboratories is important for models validated on mixed Ki data from different sources, which applies to most large public Ki datasets for common biological targets. The latter also gives an indication of how much measurement bias there might be in Ki measurements.
The availability of large databases of biological measurements such as ChEMBL makes it possible to identify multiple independent biological measurements for a compound, which can be used to estimate reproducibility. However, conclusions derived directly from the raw pairs of measurements from ChEMBL (and probably all other large databases) are misleading. More than 90% of the measurements are in fact not truly independent measurements. In this contribution it is shown that data on multiple measurements derived from the ChEMBL database have to be further processed, since they contain systematically detectable unit-transcription errors, undifferentiated stereoisomers and repeated citations of single measurements. After careful removal of all dubious pairs of measurements, a mean error of 0.44 pKi units (corresponding to a factor of 2.8 in Ki), a standard deviation of 0.54 pKi units and a median error of 0.34 pKi units were derived as error estimates for individual published Ki values. The maximum possible performance on large datasets was estimated to be R² (Pearson, max) = 0.81. These are important numbers for everybody who works with and judges in silico models.
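
As a rough illustration of the quantities quoted in the abstract, the following minimal sketch shows how an error in pKi units maps to a multiplicative factor in Ki, and how simple difference statistics could be computed from pairs of measurements of the same compound. The function names and the example pairs are hypothetical and chosen only for illustration; this is not the authors' procedure, which additionally involves filtering out unit-transcription errors, undifferentiated stereoisomers and repeated citations.

```python
import statistics


def pki_error_to_fold(delta_pki):
    """Convert an error in pKi (log10) units to a multiplicative factor in Ki."""
    return 10 ** delta_pki


def pair_difference_stats(pairs):
    """Summarize differences between paired pKi measurements.

    `pairs` is a list of (pKi_a, pKi_b) tuples for the same compound and
    target, assumed here to come from independent sources (hypothetical input).
    """
    signed = [a - b for a, b in pairs]
    absolute = [abs(d) for d in signed]
    return {
        "mean_abs_diff": statistics.mean(absolute),
        "median_abs_diff": statistics.median(absolute),
        "sd_diff": statistics.stdev(signed),
    }


# Hypothetical example pairs, not taken from ChEMBL or from the paper.
pairs = [(7.2, 7.6), (6.1, 5.9), (8.0, 8.5), (5.4, 5.4)]
stats = pair_difference_stats(pairs)

# A mean error of 0.44 pKi units corresponds to a factor of about 2.8 in Ki.
fold = pki_error_to_fold(0.44)
print(stats)
print(round(fold, 1))
```

Working on the log scale keeps the error symmetric: a deviation of 0.44 pKi units means the reported Ki may be roughly 2.8 times higher or lower than the reference value.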

Item Type:	Article
Related URLs:	http://www.ncbi.nlm.nih.gov/pubmed?term=...
Additional Information:	archiving not formally supported by this publisher
Date Deposited:	13 Oct 2015 13:14
Last Modified:	13 Oct 2015 13:14
URI:	https://oak.novartis.com/id/eprint/6818