Background: The last decade has seen a major increase in the availability of genomic data. This includes expert-curated databases that describe the biological activity of genes, as well as high-throughput assays that measure the gene expression of bulk tissue and single cells. Integrating these heterogeneous data sources can generate new hypotheses about biological systems. Our primary objective is to combine population-level drug-response data with patient-level single-cell expression data to predict how any gene will respond to any drug for any patient.
Methods: We use a “dual-channel” random walk with restart algorithm to perform three analyses. First, we use glioblastoma single cells from five individual patients to discover genes whose functions differ between cancers. Second, we use drug screening data from the Library of Integrated Network-Based Cellular Signatures (LINCS) to show that a cell-specific drug-response signature can be accurately predicted from a baseline (drug-free) gene co-expression network. Finally, we combine both data streams to predict how any gene will respond to any drug for each of the five glioblastoma patients.
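To make the random walk with restart idea concrete, the following minimal sketch propagates seed scores over a toy gene network using the standard update p ← (1 − r)·W·p + r·p₀, where W is the column-normalized adjacency matrix and r is the restart probability. The abstract does not specify how the two channels are constructed, so the split used here (positive seed weights feed an "up" channel and negative seed weights feed a "down" channel, each propagated independently) is an illustrative assumption, as are the function names `rwr` and `dual_channel_rwr`, the restart value, and the four-gene chain network.

```python
def column_normalize(adj):
    """Scale each column of a non-negative adjacency matrix to sum to 1."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]


def rwr(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart from a non-negative seed vector."""
    n = len(adj)
    total = sum(seed)
    if total == 0:
        return [0.0] * n  # no seeds in this channel, nothing to propagate
    p0 = [s / total for s in seed]  # restart distribution
    w = column_normalize(adj)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def dual_channel_rwr(adj, signed_seed, restart=0.5):
    """Hypothetical dual-channel split: propagate up and down seeds separately."""
    up_seed = [max(s, 0.0) for s in signed_seed]     # assumed "up-regulated" channel
    down_seed = [max(-s, 0.0) for s in signed_seed]  # assumed "down-regulated" channel
    return rwr(adj, up_seed, restart), rwr(adj, down_seed, restart)


# Toy co-expression network: four genes connected in a chain 0-1-2-3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Gene 0 is a known up-regulated seed, gene 3 a known down-regulated seed.
up, down = dual_channel_rwr(adj, [1.0, 0.0, 0.0, -1.0])
```

In this toy setup each channel returns a probability distribution over genes that is concentrated near its own seed, so a gene can receive separate "up" and "down" scores rather than a single signed score.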
Conclusions: Our manuscript introduces two innovations to the integration of heterogeneous biological data. First, we use a “dual-channel” method to predict up-regulation and down-regulation separately. Second, we use individualized single-cell gene co-expression networks to make personalized predictions. These innovations let us predict gene function and drug response for individual patients. When applied to real data, our method identifies a number of genes that exhibit a patient-specific drug response, including the pan-cancer oncogene EGFR.