Optimal deconvolution of transcriptional profiling data using quadratic programming with application to complex clinical blood samples
Gong, Ting, Hartmann, Nicole, Kohane, Isaac S., Brinkmann, Volker, Staedtler, Frank, Letzkus, Martin and Szustakowski, Joseph (2010) Optimal deconvolution of transcriptional profiling data using quadratic programming with application to complex clinical blood samples. RECOMB2011-Journal of Computational Biology.
Abstract
Large-scale molecular profiling technologies have enabled measurements of mRNA expression on the scale of whole genomes, which may assist the identification of disease biomarkers and facilitate the basic understanding of cellular processes. Specifically, peripheral blood is the most readily accessible human tissue for studies of disease association and drug response in clinical trials. However, samples collected from human subjects in clinical trials possess a level of complexity that can hinder or obfuscate the analysis of data derived from them. Solid tissues can vary in composition depending on disease state, anatomical location, and collection method. Even so-called simple samples such as blood represent a complex mixture of circulating cell types of varying origin and function. Failure to incorporate these sources of heterogeneity into an analysis can have widespread and detrimental effects on subsequent statistical analyses.
We introduce an approach that explicitly builds upon a linear latent variable model, in which expression from a mixed cell population is modeled as the weighted average of expression from the different cell types. We employ quadratic programming to efficiently search for the globally optimal solution in the linear latent model framework while preserving non-negativity of the cell fractions. We applied our method to data from various existing platforms to estimate proportions of different pure cell and tissue types and gene expression profiles of distinct phenotypes, with a focus on complex samples collected in clinical trials. Our method solves one of the open questions regarding the analysis of complex transcriptional data: namely, how to identify the optimal mixing fractions in a given experiment.
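To make the linear latent variable model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a signature matrix `S` (genes by cell types) of pure cell type expression profiles and a mixed-sample expression vector `m`, and estimates fractions `f` by minimizing the squared residual between `S f` and `m` subject to `f >= 0`. The sum-to-one constraint, the function name `estimate_fractions`, the choice of SciPy's SLSQP solver, and the synthetic data are all assumptions made for illustration; the abstract states only that quadratic programming is used and that non-negativity is preserved.

```python
import numpy as np
from scipy.optimize import minimize


def estimate_fractions(signatures, mixture):
    """Estimate mixing fractions by constrained least squares (a convex QP).

    signatures: array of shape (n_genes, n_cell_types), pure-type profiles.
    mixture:    array of shape (n_genes,), expression of the mixed sample.
    Hypothetical helper for illustration only.
    """
    S = np.asarray(signatures, dtype=float)
    m = np.asarray(mixture, dtype=float)
    n_genes, n_types = S.shape

    def objective(f):
        # Half the mean squared residual; quadratic in f.
        r = S @ f - m
        return 0.5 * (r @ r) / n_genes

    def gradient(f):
        return S.T @ (S @ f - m) / n_genes

    result = minimize(
        objective,
        x0=np.full(n_types, 1.0 / n_types),  # start from equal fractions
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, None)] * n_types,      # non-negativity of fractions
        constraints=[{
            "type": "eq",                    # assumed: fractions sum to one
            "fun": lambda f: f.sum() - 1.0,
            "jac": lambda f: np.ones_like(f),
        }],
        options={"ftol": 1e-9, "maxiter": 500},
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


# Synthetic, noise-free example (not data from the study).
rng = np.random.default_rng(0)
S_demo = rng.uniform(0.0, 10.0, size=(200, 3))
f_true = np.array([0.6, 0.3, 0.1])
f_hat = estimate_fractions(S_demo, S_demo @ f_true)
print(np.round(f_hat, 3))
```

Because the objective is convex and the constraints are linear, any local minimum the solver finds is also a global one, which is the property the abstract relies on when it refers to a globally optimal solution.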
We have tested our method on several well-controlled benchmark data sets with known mixing fractions. As expected, predicted and actual mixing fractions agreed closely, and this agreement was robust to the experimental system. In addition, we have applied our method to more challenging mRNA expression profiling data from whole blood samples collected in a clinical trial (CFTY720D2201, ClinicalTrials.gov identifier NCT00333138). Our method was able to predict mixing fractions for more than ten species of circulating cells, and was even able to provide accurate estimates for relatively rare cell types (<10% total population). The concordance of our predictions with measured Complete Blood Counts (CBC) was very good (correlation > 0.75). In addition, our method was able to accurately identify changes in leukocyte trafficking associated with FTY720 treatment that are consistent with previous results generated by both Complete Blood Counts and flow cytometry.
Item Type: | Article |
---|---|
Additional Information: | This is for submission to RECOMB2011; if accepted, it will also be published in JCB or Genome Research. |
Keywords: | microarray, deconvolution, complex tissues, quadratic programming, whole blood samples |
Date Deposited: | 13 Oct 2015 13:16 |
Last Modified: | 13 Oct 2015 13:16 |
URI: | https://oak.novartis.com/id/eprint/3421 |