Background: Bridging heterogeneous mutation data fills the gaps between different data categories and advances the discovery of disease-related genes. Genome-wide association studies (GWAS) infer significant mutation associations that link genotype to phenotype, but they are underpowered for pinpointing causal genes because of high false-positive and false-negative rates. Meanwhile, mutation events widely reported in the literature reveal the underlying functional biological processes, including mutation types such as gain-of-function and loss-of-function.
Methods: To bring together these heterogeneous mutation data, we propose a pipeline, “Gene-Disease Association prediction by Mutation Data Bridging (GDAMDB)”, built on a statistical generative model. The model learns the distribution parameters of mutation associations and mutation types, and recovers false-negative GWAS mutations that fail to pass the significance test but are supported by literature evidence of functional biological processes.
Results: We applied GDAMDB to Alzheimer’s disease (AD), a common heritable neurodegenerative disorder whose pathological mechanism is unknown, and it predicted 79 AD-associated genes. In addition to 12 genes that come from the original GWAS, 57 are supported as AD-related by other GWAS or literature reports.
Conclusion: Our model enhances GWAS-based gene association discovery by effectively combining it with text-mining results. The positive result indicates that bridging heterogeneous mutation data contributes to the discovery of novel disease-related genes.