Background: As a pioneer in Literature-based Discovery (LBD), Swanson’s ABC model brought together isolated pieces of public knowledge and assembled them into putative hypotheses with logical connections. Existing LBD studies rely on co-occurrences and semantic triples to represent knowledge. A simple entity-based knowledge graph with co-occurrences and/or semantic triples as basic building blocks has shown potential for inferring implicit knowledge in LBD. However, our analysis of a knowledge graph constructed for a recent binary LBD system reveals limitations that arise from pairwise relationships and that, in turn, negatively impact knowledge inference. Using LBD as the context and motivation for this work, we explore the limitations of using only pairwise relationships for knowledge representation in knowledge graphs, and we identify the impacts of these limitations on knowledge inference. We find enhanced knowledge representation beneficial for biological knowledge representation in general, as well as for both the quality and the specificity of hypotheses proposed with LBD.
Results: Based on a systematic analysis of one binary LBD system focusing on Alzheimer’s Disease, we identify 7 types of pairwise relationship limitations in a standard knowledge graph (Lack of context for a general concept, More than two entities interacting together, Mechanism/process from a modified entity, etc.) and 3 types of negative impacts on knowledge inferred with the graph (Experimentally infeasible hypotheses, Literature-inconsistent hypotheses, and Oversimplified hypotheses explanations). We also discover an indicative distribution of the different types of relationships. Pairwise relationships are an essential component of the representation frameworks for all analyzed knowledge discoveries: 20% of discoveries are perfectly represented with pairwise relationships alone, 73% are represented by combining pairwise relationships and nested relationships, and the remaining 7% are represented with pairwise relationships, nested relationships, and hypergraphs.
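To make the three representation levels concrete, the following is a minimal Python sketch, not the representation used in the analyzed LBD system. The class names (`Relation`, `Hyperedge`), the `features` helper, and all entity and predicate names are hypothetical placeholders chosen for illustration. It assumes that a nested relationship is one whose subject or object is itself a relationship, and that a hyperedge groups entities that act together as a single unit.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Hyperedge:
    """Hypothetical: several entities interacting together as one unit."""
    members: FrozenSet[str]


@dataclass(frozen=True)
class Relation:
    """Hypothetical subject-predicate-object statement. Either end may be a
    plain entity, another Relation (nesting), or a Hyperedge."""
    subject: "Node"
    predicate: str
    obj: "Node"


Node = Union[str, Hyperedge, Relation]


def features(node: Node) -> set:
    """Return the representation features needed to express `node`."""
    if isinstance(node, str):
        return set()                      # a bare entity needs no relation
    if isinstance(node, Hyperedge):
        return {"hypergraph"}
    needed = {"pairwise"}                 # every Relation is a pairwise link
    for end in (node.subject, node.obj):
        if isinstance(end, Relation):
            needed.add("nested")          # a relation used as an entity
        needed |= features(end)           # collect features of sub-structures
    return needed


# Placeholder entities only; no biological claim is intended.
simple = Relation("entity_A", "affects", "entity_B")
nested = Relation(simple, "affects", "process_C")
group = Hyperedge(frozenset({"entity_A", "entity_B", "entity_D"}))
collective = Relation(group, "affects", nested)
```

Under these assumptions, `features(simple)` needs only pairwise relationships, `features(nested)` additionally needs nesting, and `features(collective)` needs all three, which mirrors the three groups of discoveries reported above.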
Conclusion: We argue that the standard entity-based knowledge graph has important limitations for biological knowledge representation and for downstream tasks such as proposing meaningful discoveries in LBD. Nevertheless, the pairwise relationships adopted in a standard knowledge graph remain essential for knowledge representation in LBD. Their limitations can be mitigated by integrating more semantically complex knowledge representation strategies, including capturing collective interactions and allowing for nested entities. More sophisticated knowledge representation will benefit biological fields by providing more expressive knowledge graphs. Downstream tasks of knowledge graphs, such as LBD, can benefit as well, allowing for the generation of implicit knowledge discoveries and explanations for disease diagnosis, treatment, and mechanism that are more biologically meaningful.
Availability: Corpus and annotation are available at https://github.com/READ-BioMed/readbiomed-semantics/tree/main.