
MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization (2405.04163v2)

Published 7 May 2024 in cs.CL

Abstract: This work presents a dynamic vocabulary adaptation strategy, MEDVOC, for fine-tuning pre-trained language models (PLMs) like BertSumAbs, BART, and PEGASUS for improved medical text summarization. In contrast to existing domain adaptation approaches in summarization, MEDVOC treats vocabulary as an optimizable parameter and optimizes the PLM vocabulary based on fragment score conditioned only on the downstream task's reference summaries. Unlike previous works on vocabulary adaptation (limited only to classification tasks), optimizing vocabulary based on summarization tasks requires an extremely costly intermediate fine-tuning step on large summarization datasets. To that end, our novel fragment score-based hyperparameter search very significantly reduces this fine-tuning time -- from 450 days to less than 2 days on average. Furthermore, while previous works on vocabulary adaptation are often primarily tied to single PLMs, MEDVOC is designed to be deployable across multiple PLMs (with varying model vocabulary sizes, pre-training objectives, and model sizes) -- bridging the limited vocabulary overlap between the biomedical literature domain and PLMs. MEDVOC outperforms baselines by 15.74% in terms of Rouge-L in zero-shot setting and shows gains of 17.29% in high Out-Of-Vocabulary (OOV) concentrations. Our human evaluation shows MEDVOC generates more faithful medical summaries (88% compared to 59% in baselines). We make the codebase publicly available at https://github.com/gb-kgp/MEDVOC.


Summary

  • The paper introduces MEDVOC, a method that dynamically adjusts vocabularies in pre-trained language models to better summarize medical texts.
  • It uses a fragment score to identify poorly tokenized out-of-vocabulary terms, filters them by intersecting with a large medical corpus, and cuts intermediate fine-tuning time from roughly 450 days to under 2 days.
  • Experimental results show a 15.74% average improvement in Rouge-L scores and enhanced factual consistency, highlighting its cross-PLM applicability.

MEDVOC: Dynamic Vocabulary Adaptation for Medical Text Summarization

Introduction

Let’s talk about summarizing medical texts. It’s a genuinely useful task in healthcare, where concise, accurate summaries of clinical records, consumer health questions, or radiology reports can save real time. But here’s the catch: most existing summarization models (like BertSumAbs, BART, and PEGASUS) were pre-trained on open-domain text, so their tokenizers and vocabularies handle medical terminology poorly. That’s where this paper comes in: it introduces MEDVOC, a method that adapts the vocabularies of pre-trained language models (PLMs) so they summarize medical text more effectively.

How MEDVOC Works

MEDVOC treats the vocabulary as a dynamic, tunable parameter rather than something fixed. Its guiding signal is the fragment score, which measures how heavily a tokenizer splits the words of the downstream reference summaries into subword pieces; a lower score means medical terms survive tokenization more intact. A minimal sketch of this idea appears below, followed by the steps MEDVOC uses to adapt the vocabulary.
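As a rough illustration (not the authors' exact formulation), here is one way a fragment score could be computed with a Hugging Face tokenizer, assuming it is the average number of subword pieces per word in the reference summaries:

```python
# Minimal sketch of a fragment score, assuming it is the average number of
# subword pieces per word in the reference summaries (lower = less fragmentation).
# Not the authors' released code; names here are illustrative.
from transformers import AutoTokenizer

def fragment_score(reference_summaries, tokenizer):
    pieces = words = 0
    for summary in reference_summaries:
        for word in summary.split():
            pieces += len(tokenizer.tokenize(word))
            words += 1
    return pieces / max(words, 1)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")
refs = ["Pneumothorax resolved after thoracostomy tube placement."]
print(fragment_score(refs, tokenizer))  # medical terms inflate this value
```

A vocabulary that contains more whole medical terms drives this number down, and that is exactly the signal MEDVOC optimizes.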

Dynamic Vocabulary Construction

  1. Identify Relevant Subwords: MEDVOC first identifies out-of-vocabulary (OOV) words in the reference summaries that the PLM's tokenizer fragments into many subword pieces.
  2. Intersect with a Medical Corpus: It then narrows down this list by intersecting it with a set of frequently occurring medical terms from a large corpus of PubMed articles.
  3. Hyperparameter Tuning: By treating vocabulary size and composition as hyperparameters, MEDVOC runs an efficient search to optimize these values based on the fragment score, skipping the need for extremely lengthy intermediate fine-tuning.

This procedure cuts the intermediate fine-tuning cost from an estimated 450 days to less than 2 days on average.
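To make the three steps concrete, here is a hypothetical sketch of the adaptation loop; function names, thresholds, and candidate sizes are assumptions for illustration, not the released MEDVOC implementation:

```python
# Hypothetical sketch of the three-step vocabulary adaptation described above.
from collections import Counter

def fragment_score(summaries, tokenizer, added_vocab=frozenset()):
    """Average subword pieces per word, counting words in added_vocab as one piece."""
    pieces = words = 0
    for summary in summaries:
        for word in summary.split():
            pieces += 1 if word in added_vocab else len(tokenizer.tokenize(word))
            words += 1
    return pieces / max(words, 1)

def adapt_vocabulary(reference_summaries, pubmed_frequent_terms, tokenizer,
                     candidate_sizes=(500, 1000, 5000), min_pieces=3):
    # Step 1: words in the reference summaries that the tokenizer fragments heavily.
    fragmented = Counter(
        word
        for summary in reference_summaries
        for word in summary.split()
        if len(tokenizer.tokenize(word)) >= min_pieces
    )
    # Step 2: keep only candidates that also occur frequently in the medical corpus.
    medical = [w for w, _ in fragmented.most_common() if w.lower() in pubmed_frequent_terms]
    # Step 3: treat the number of added terms as a hyperparameter and pick the size
    # that minimizes the fragment score on the reference summaries.
    best = min(candidate_sizes,
               key=lambda k: fragment_score(reference_summaries, tokenizer,
                                            frozenset(medical[:k])))
    return medical[:best]
```

The terms that survive this selection would then be added to the PLM's tokenizer and embedding matrix before fine-tuning on the medical summarization data.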

Key Points

  • Cross-PLM Applicability: One standout aspect of MEDVOC is its flexibility. Unlike earlier vocabulary-adaptation methods tied to a single model, MEDVOC can be applied across PLMs with different tokenizers, vocabulary sizes, and pre-training objectives.
  • Efficient Fine-Tuning: The fragment score-based hyperparameter search allows MEDVOC to avoid the traditionally expensive intermediate fine-tuning steps.
  • Faithful Summaries: In human evaluation, MEDVOC-generated summaries were judged faithful to the source far more often than baseline summaries (88% versus 59%).

Experimental Results

The numerical results from the experiments are pretty compelling. In zero-shot scenarios (where the model generates summaries without any task-specific fine-tuning), MEDVOC outperformed the baselines by an average of 15.74% in Rouge-L scores. The method also showed average improvements of 17.29% in situations with high OOV concentrations. Besides Rouge-L, which measures the overall content similarity, MEDVOC also improved on Concept Score, signifying better inclusion of medical concepts.
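For readers who want to reproduce this kind of evaluation, one common way to compute Rouge-L is Google's rouge-score package; whether the authors used this exact implementation is an assumption here:

```python
# Scoring a generated summary with Rouge-L via the `rouge-score` package
# (pip install rouge-score); illustrative strings, not data from the paper.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
reference = "The patient presented with acute pancreatitis and was managed conservatively."
generated = "The patient had acute pancreatitis and received conservative management."
scores = scorer.score(reference, generated)
print(scores["rougeL"].fmeasure)  # longest-common-subsequence F1, higher is better
```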

Practical Implications

  1. Real-Time Summarization: Medical practitioners could potentially use MEDVOC to generate real-time, accurate summaries of medical documents.
  2. Enhanced NLP Tools: Integrating MEDVOC into NLP pipelines could improve the relevance and faithfulness of automatically generated medical records and query responses.

Future Directions

Summarizing complicated medical texts at the press of a button is still some way off, but this paper moves us closer. The authors suggest extending MEDVOC to multi-document summarization, which could be another meaningful step. And since MEDVOC improves the factual consistency of summaries, it could complement ongoing work on improving the factual accuracy of generated text.

Overall, MEDVOC is not pitched as revolutionary, but it is a thoughtfully designed, effective method for improving medical text summarization. By optimizing vocabularies dynamically and efficiently, it offers a practical way to handle domain-specific terms across PLMs, and it could have genuine real-world applications.
