
Leveraging deep active learning to identify low-resource mobility functioning information in public clinical notes (2311.15946v1)

Published 27 Nov 2023 in cs.CL

Abstract: Function is increasingly recognized as an important indicator of whole-person health, although it receives little attention in clinical natural language processing research. We introduce the first public annotated dataset specifically on the Mobility domain of the International Classification of Functioning, Disability and Health (ICF), aiming to facilitate automatic extraction and analysis of functioning information from free-text clinical notes. We utilize the National NLP Clinical Challenges (n2c2) research dataset to construct a pool of candidate sentences using keyword expansion. Our active learning approach, using query-by-committee sampling weighted by density representativeness, selects informative sentences for human annotation. We train BERT and CRF models, and use predictions from these models to guide the selection of new sentences for subsequent annotation iterations. Our final dataset consists of 4,265 sentences with a total of 11,784 entities, including 5,511 Action entities, 5,328 Mobility entities, 306 Assistance entities, and 639 Quantification entities. The inter-annotator agreement (IAA), averaged over all entity types, is 0.72 for exact matching and 0.91 for partial matching. We also train and evaluate common BERT models and state-of-the-art Nested NER models. The best F1 scores are 0.84 for Action, 0.7 for Mobility, 0.62 for Assistance, and 0.71 for Quantification. Empirical results demonstrate promising potential of NER models to accurately extract mobility functioning information from clinical text. The public availability of our annotated dataset will facilitate further research to comprehensively capture functioning information in electronic health records (EHRs).
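The selection step described in the abstract (query-by-committee sampling weighted by density representativeness) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that committee disagreement is measured by token-level vote entropy, that density is the mean cosine similarity of a sentence embedding to the rest of the pool, and that the two are combined multiplicatively with an exponent `beta`. The function names, the `beta` parameter, and the toy inputs are all hypothetical.

```python
import numpy as np

def vote_entropy(committee_labels):
    """Mean per-token vote entropy for one sentence.

    committee_labels: one label sequence per committee member
    (for example, one from a BERT tagger and one from a CRF), all the same length.
    """
    n_members = len(committee_labels)
    n_tokens = len(committee_labels[0])
    if n_tokens == 0:
        return 0.0
    total = 0.0
    for t in range(n_tokens):
        votes = [labels[t] for labels in committee_labels]
        _, counts = np.unique(votes, return_counts=True)
        p = counts / n_members
        total += float(-np.sum(p * np.log(p)))
    return total / n_tokens

def density(embeddings):
    """Mean cosine similarity of each sentence to every other sentence in the pool."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    n = len(embeddings)
    # Subtract the diagonal so a sentence is not compared with itself.
    return (sim.sum(axis=1) - np.diag(sim)) / max(n - 1, 1)

def select_for_annotation(committee_preds, embeddings, k, beta=1.0):
    """Rank pool sentences by disagreement * density**beta and return the top k indices."""
    disagreement = np.array([vote_entropy(p) for p in committee_preds])
    dens = np.clip(density(embeddings), 0.0, None)  # assumption: ignore negative similarity
    scores = disagreement * dens ** beta
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist(), scores

# Toy pool of three two-token sentences with hypothetical labels and 2-d embeddings.
preds = [
    [["O", "O"], ["O", "O"]],                # committee agrees
    [["B-Action", "O"], ["O", "O"]],         # disagreement on one token
    [["B-Action", "O"], ["O", "B-Action"]],  # disagreement on both tokens
]
emb = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
picked, scores = select_for_annotation(preds, emb, k=1)
```

In this toy pool the third sentence has the highest disagreement but is an outlier in embedding space, so the density weighting favours the second sentence. Setting `beta=0` switches the weighting off and recovers plain query-by-committee ranking.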

Authors (6)
  1. Tuan-Dung Le
  2. Zhuqi Miao
  3. Samuel Alvarado
  4. Brittany Smith
  5. William Paiva
  6. Thanh Thieu