RECAP-KG: Mining Knowledge Graphs from Raw GP Notes for Remote COVID-19 Assessment in Primary Care (2306.17175v2)
Abstract: Clinical decision-making is a fundamental stage in delivering appropriate care to patients. In recent years, several decision-making systems designed to aid the clinician in this process have been developed. However, the technical solutions currently in use are based on simple regression models and can only take into account simple, pre-defined multiple-choice features, such as patient age, pre-existing conditions, and smoker status. One particular source of patient data that available decision-making systems are incapable of processing is the collection of GP notes from patient consultations. These notes contain crucial signs and symptoms, the information clinicians use to make a final decision and direct the patient to the appropriate care. Extracting information from GP notes is a technically challenging problem, as they tend to include abbreviations, typos, and incomplete sentences. This paper addresses this open challenge. We present a framework that performs knowledge graph construction from raw GP medical notes written during or after patient consultations. By relying on support phrases mined from the SNOMED ontology, as well as predefined supported facts from values used in the RECAP (REmote COVID-19 Assessment in Primary Care) patient risk prediction tool, our graph generative framework is able to extract structured knowledge graphs from the highly unstructured and inconsistent format in which consultation notes are written. Our knowledge graphs include information about existing patient symptoms, their duration, and their severity. We apply our framework to consultation notes of COVID-19 patients in the UK COVID-19 Clinical Assessment Service (CCAS) patient dataset. We provide a quantitative evaluation of the performance of our framework, demonstrating that our approach has better accuracy than traditional NLP methods when answering questions about patients.