
A Span-based Model for Extracting Overlapping PICO Entities from RCT Publications (2401.06791v1)

Published 8 Jan 2024 in cs.IR, cs.AI, and cs.CL

Abstract:
Objectives: Extraction of PICO (Populations, Interventions, Comparison, and Outcomes) entities is fundamental to evidence retrieval. We present a novel method, PICOX, to extract overlapping PICO entities.
Materials and Methods: PICOX first identifies entity boundaries by assessing whether a word marks the beginning or end of an entity. It then uses a multi-label classifier to assign one or more PICO labels to each span candidate. PICOX was evaluated against one of the best-performing baselines on EBM-NLP and three more datasets (PICO-Corpus and RCT publications on Alzheimer's Disease (AD) or COVID-19), using entity-level precision, recall, and F1 scores.
Results: PICOX achieved superior precision, recall, and F1 scores across the board, with the micro F1 score improving from 45.05 to 50.87 (p ≪ 0.01). On the PICO-Corpus, PICOX obtained higher recall and F1 scores than the baseline and improved the micro recall score from 56.66 to 67.33. On the COVID-19 dataset, PICOX also outperformed the baseline and improved the micro F1 score from 77.10 to 80.32. On the AD dataset, PICOX achieved F1 scores comparable to the baseline with higher precision.
Conclusion: PICOX excels in identifying overlapping entities and consistently surpasses a leading baseline across multiple datasets. Ablation studies reveal that its data augmentation strategy effectively minimizes false positives and improves precision.
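
The two-step idea described in the abstract (find boundaries, then give each span candidate one or more labels) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 0.5 thresholds, the `max_len` cap, and the exact-match definition of entity-level scoring are all assumptions made for illustration, and the probabilities are hand-written rather than produced by a model.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]          # inclusive token indices (start, end)
Entity = Tuple[int, int, str]   # (start, end, label)


def candidate_spans(start_probs: List[float], end_probs: List[float],
                    threshold: float = 0.5, max_len: int = 10) -> List[Span]:
    """Pair every likely start token with every likely end token at or after it.

    `threshold` and `max_len` are hypothetical hyperparameters.
    """
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    return [(i, j) for i in starts for j in ends if i <= j < i + max_len]


def assign_labels(label_scores: Dict[str, float],
                  threshold: float = 0.5) -> Set[str]:
    """Multi-label decision: keep every label whose score clears the threshold.

    A span can therefore receive several labels, which is what lets
    overlapping entities share the same text.
    """
    return {label for label, score in label_scores.items() if score >= threshold}


def micro_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level micro precision, recall, and F1 (exact match assumed)."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with made-up boundary probabilities for a 4-token sentence.
spans = candidate_spans([0.9, 0.1, 0.8, 0.2], [0.1, 0.7, 0.2, 0.9])
labels = assign_labels({"P": 0.8, "I": 0.6, "O": 0.1})
gold = {(0, 1, "P"), (0, 1, "I"), (2, 3, "O")}
pred = {(0, 1, "P"), (2, 3, "O"), (0, 3, "I")}
scores = micro_prf(gold, pred)
```

In the toy example, the span (0, 1) carries both a P and an I label in the gold set, which a single-label tagger could not represent; thresholding each label independently is one simple way to allow that.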

Authors (6)
  1. Gongbo Zhang
  2. Yiliang Zhou
  3. Yan Hu
  4. Hua Xu
  5. Chunhua Weng
  6. Yifan Peng
