Presentation
Constructing a Large-Scale Biomedical Knowledge Graph and Its Applications in Drug Discovery
Description
In the past few decades, the biomedical research community has accumulated a wealth of knowledge, much of it stored in the scientific literature as unstructured text. Converting this text into a structured form is crucial for developing new methods and applications that can make full use of this knowledge. Doing so requires solving two basic problems: named entity recognition (NER) and relation extraction (RE). NER identifies the concepts or entities mentioned in text, such as diseases, genes/proteins, and chemical compounds. RE extracts the relationships between these entities. The output of NER and RE can be used to build knowledge graphs, in which nodes represent entities and edges represent the relationships between them.

This presentation will discuss our team's work on the LitCoin NLP Challenge organized by the NIH, in which we were awarded first place. Using the pipelines developed for the challenge, we processed all PubMed articles and built a large-scale biomedical knowledge graph. We estimated the accuracy of this large-scale relation extraction by manually verifying a sample of the extracted data and found it to be comparable to human annotation. We also incorporated relation information from 40 public databases, as well as relations inferred from publicly available genomics datasets. The resulting knowledge graph contains over 11 million entities and more than 40 million relations. We have developed versatile query functions and knowledge discovery tools for accessing and mining the structured data in the knowledge graph. Finally, we will discuss several drug discovery applications enabled by this large-scale knowledge graph.
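The NER-then-RE route to a knowledge graph can be illustrated with a minimal in-memory sketch. This is not the team's pipeline. It assumes NER output arrives as (entity ID, entity type) pairs and RE output as (head, relation, tail) triples. The entity IDs, relation names, source labels, and the `KnowledgeGraph` class are all hypothetical. A graph with over 11 million entities and 40 million relations would need a graph database or indexed storage rather than Python dictionaries. The `two_hop_paths` query is only a toy stand-in for the kind of indirect link a discovery tool might surface.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation type, tail entity)


@dataclass
class KnowledgeGraph:
    # Node store: entity ID -> entity type (e.g. "Disease", "Gene", "Chemical").
    entities: Dict[str, str] = field(default_factory=dict)
    # Edge store: (head, relation, tail) -> set of sources supporting the edge.
    edges: Dict[Triple, Set[str]] = field(default_factory=lambda: defaultdict(set))

    def add_entity(self, entity_id: str, entity_type: str) -> None:
        # An NER hit becomes (or updates) a node.
        self.entities[entity_id] = entity_type

    def add_relation(self, head: str, relation: str, tail: str, source: str) -> None:
        # An RE hit becomes an edge. The same triple reported by several sources
        # (e.g. literature and a database) is merged into one edge.
        if head not in self.entities or tail not in self.entities:
            raise KeyError("add both entities before relating them")
        self.edges[(head, relation, tail)].add(source)

    def neighbors(self, entity_id: str) -> List[Tuple[str, str, str]]:
        # (relation, other entity, direction) for every edge touching entity_id.
        result = []
        for head, relation, tail in self.edges:
            if head == entity_id:
                result.append((relation, tail, "out"))
            elif tail == entity_id:
                result.append((relation, head, "in"))
        return sorted(result)

    def two_hop_paths(self, start: str) -> List[Tuple[str, str, str, str, str]]:
        # Enumerate start -r1-> mid -r2-> end paths along outgoing edges
        # (toy illustration of an indirect-link query).
        paths = []
        for h1, r1, mid in self.edges:
            if h1 != start:
                continue
            for h2, r2, end in self.edges:
                if h2 == mid and end != start:
                    paths.append((start, r1, mid, r2, end))
        return sorted(paths)


if __name__ == "__main__":
    kg = KnowledgeGraph()
    # Hypothetical NER output: placeholder IDs and types.
    kg.add_entity("CHEM:A", "Chemical")
    kg.add_entity("GENE:B", "Gene")
    kg.add_entity("DIS:C", "Disease")
    # Hypothetical RE output, plus a database record for the same triple.
    kg.add_relation("CHEM:A", "inhibits", "GENE:B", source="literature")
    kg.add_relation("GENE:B", "associated_with", "DIS:C", source="literature")
    kg.add_relation("CHEM:A", "inhibits", "GENE:B", source="database")
    print(len(kg.entities), len(kg.edges))  # 3 2
    print(kg.neighbors("GENE:B"))
    print(kg.two_hop_paths("CHEM:A"))
```

Keying edges by the full triple lets evidence from text mining and from curated databases accumulate on a single edge. The set of sources can then serve as simple provenance when the graph is queried.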