MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization (2405.04163v2)
Abstract: This work presents a dynamic vocabulary adaptation strategy, MEDVOC, for fine-tuning pre-trained language models (PLMs) like BertSumAbs, BART, and PEGASUS for improved medical text summarization. In contrast to existing domain adaptation approaches in summarization, MEDVOC treats vocabulary as an optimizable parameter and optimizes the PLM vocabulary based on fragment score conditioned only on the downstream task's reference summaries. Unlike previous works on vocabulary adaptation (limited to classification tasks), optimizing the vocabulary for summarization tasks requires an extremely costly intermediate fine-tuning step on large summarization datasets. To that end, our novel fragment score-based hyperparameter search reduces this fine-tuning time drastically -- from 450 days to less than 2 days on average. Furthermore, while previous works on vocabulary adaptation are typically tied to a single PLM, MEDVOC is designed to be deployable across multiple PLMs (with varying model vocabulary sizes, pre-training objectives, and model sizes), bridging the limited vocabulary overlap between the biomedical literature domain and PLMs. MEDVOC outperforms baselines by 15.74% in terms of Rouge-L in the zero-shot setting and shows gains of 17.29% under high out-of-vocabulary (OOV) concentrations. Our human evaluation shows that MEDVOC generates more faithful medical summaries (88% compared to 59% for baselines). We make the codebase publicly available at https://github.com/gb-kgp/MEDVOC.
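The fragment score-driven vocabulary search described in the abstract can be illustrated with a short sketch. The snippet below is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that the fragment score is approximated as the average number of subword pieces per word in the target-domain reference summaries, that candidate vocabularies are built by adding the most frequent heavily fragmented domain words, and that `facebook/bart-base` stands in for one of the PLMs; the function names and candidate sizes are hypothetical.

```python
# Minimal sketch: pick a vocabulary-extension size by minimizing a
# fragment-score-style metric on reference summaries, instead of running
# an intermediate fine-tuning step per candidate. Assumptions are noted
# in the lead-in; this is not the paper's exact procedure.
from collections import Counter
from transformers import AutoTokenizer

def fragment_score(tokenizer, summaries):
    """Average number of subword pieces per whitespace word over reference summaries."""
    pieces, words = 0, 0
    for text in summaries:
        for word in text.split():
            pieces += len(tokenizer.tokenize(word))
            words += 1
    return pieces / max(words, 1)

def search_vocab_size(base_model, summaries, candidate_sizes=(1000, 5000, 10000)):
    """Choose how many domain tokens to add by minimizing the fragment score."""
    base = AutoTokenizer.from_pretrained(base_model)
    # Rank candidate tokens: whole words that the base tokenizer fragments.
    freq = Counter(w for text in summaries for w in text.split()
                   if len(base.tokenize(w)) > 1)
    ranked = [w for w, _ in freq.most_common()]

    best_size, best_score = 0, fragment_score(base, summaries)
    for size in candidate_sizes:
        tok = AutoTokenizer.from_pretrained(base_model)
        tok.add_tokens(ranked[:size])  # extend the vocabulary with domain words
        score = fragment_score(tok, summaries)
        if score < best_score:
            best_size, best_score = size, score
    return best_size, best_score

if __name__ == "__main__":
    refs = ["Metformin reduces hepatic gluconeogenesis in type 2 diabetes mellitus."]
    print(search_vocab_size("facebook/bart-base", refs))
```

In an actual fine-tuning pipeline, the model's embedding matrix would also need to be resized to the extended vocabulary (e.g., `model.resize_token_embeddings(len(tok))` in Hugging Face Transformers) before training on the downstream summarization data.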
- Improving the factual accuracy of abstractive clinical text summarization using multi-objective optimization. In 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 1615–1618. IEEE, 2022.
- Another look at the data sparsity problem. In Text, Speech and Dialogue: 9th International Conference, TSD 2006, Brno, Czech Republic, September 11-15, 2006. Proceedings 9, pages 327–334. Springer, 2006.
- On the summarization of consumer health questions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2228–2234, July 2019.
- Rethinking why intermediate-task fine-tuning works. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 706–713, November 2021.
- MEDITRON-70B: Scaling medical pretraining for large language models. arXiv preprint arXiv:2311.16079, 2023.
- BDKG at MEDIQA 2021: System report for the radiology report summarization task. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 103–111, June 2021.
- Taming pre-trained language models with n-gram representations for low-resource domain adaptation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3336–3349, August 2021.
- Improving zero and few-shot abstractive summarization with intermediate fine-tuning and data augmentation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 704–717, June 2021.
- SummEval: Re-evaluating summarization evaluation. Transactions of the Association for Computational Linguistics, 9:391–409, 2021.
- Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare, 3(1), October 2021.
- damo_nlp at MEDIQA 2021: Knowledge-based preprocessing and coverage-oriented reranking for medical question summarization. In Proceedings of the 20th Workshop on Biomedical Language Processing, pages 112–118, June 2021.
- AVocaDo: Strategy for adapting vocabulary to downstream domain. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4692–4700, November 2021.
- PubMedQA: A dataset for biomedical research question answering. In Proceedings of the EMNLP-IJCNLP 2019, pages 2567–2577, November 2019.
- Attention-based clinical note summarization. In Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, SAC ’22, page 813–820, 2022.
- Vocabulary modifications for domain-adaptive pretraining of clinical language models. In Proceedings of the 15th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2022) - HEALTHINF, pages 180–188. INSTICC, 2022.
- Domain adaptation with pre-trained transformers for query-focused abstractive text summarization. Computational Linguistics, 48(2):279–320, June 2022.
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, September 2019.
- BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, 2020.
- Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, 2004.
- Text summarization with pretrained encoders. In Proceedings of EMNLP-IJCNLP, pages 3730–3740, 2019.
- Task-adaptive tokenization: Enhancing long-form text generation efficacy in mental health and beyond. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15264–15281, 2023.
- Development of a corpus for evidence based medicine summarisation. In Proceedings of the Australasian Language Technology Association Workshop, pages 86–94, 2011.
- Pre-training transformers on Indian legal text, 2022.
- Sentence encoders on STILTs: Supplementary training on intermediate labeled-data tasks, 2019.
- Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- How good is your tokenizer? on the monolingual performance of multilingual language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 3118–3135, August 2021.
- Rare words: A major problem for contextualized embeddings and how to fix it by attentive mimicking. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8766–8774, Apr. 2020.
- Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers, pages 1073–1083, 2017.
- QuickUMLS: a fast, unsupervised approach for medical concept extraction. In MedIR workshop, SIGIR, pages 1–4, 2016.
- Intermediate domain finetuning for weakly supervised domain-adaptive clinical NER. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 320–325, July 2023.
- exBERT: Extending pre-trained models with domain-specific vocabulary under constrained training resources. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1433–1439, November 2020.
- An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics, 16:138, 2015.
- Google’s neural machine translation system: Bridging the gap between human and machine translation, 2016.
- PMC-LLaMA: Towards building open-source language models for medicine, 2023.
- Pre-trained language models with domain knowledge for biomedical extractive summarization. Knowledge-Based Systems, 252:109460, 2022.
- FactReranker: Fact-guided reranker for faithful radiology report summarization. arXiv preprint arXiv:2303.08335, 2023.
- A survey for biomedical text summarization: From pre-trained to large language models, 2023.
- Vocabulary learning via optimal transport for neural machine translation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7361–7373, August 2021.
- Retrieval-augmented domain adaptation of language models. In Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023), pages 54–64, July 2023.
- CHQ-Summ: A dataset for consumer healthcare question summarization. arXiv preprint arXiv:2206.06581, 2022.
- BioBART: Pretraining and evaluation of a biomedical generative language model. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 97–109, May 2022.
- PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 11328–11339, 13–18 Jul 2020.
- BERTScore: Evaluating text generation with BERT. In 8th International Conference on Learning Representations, ICLR, 2020.
- Leveraging pretrained models for automatic summarization of doctor-patient conversations. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 3693–3712, November 2021.
- BiomedGPT: A unified and generalist biomedical generative pre-trained transformer for vision, language, and multimodal tasks. arXiv preprint arXiv:2305.17100, 2023.
- FaMeSumm: Investigating and improving faithfulness of medical summarization. arXiv preprint arXiv:2311.02271, 2023.
- Parameter-efficient fine-tuning with layer pruning on free-text sequence-to-sequence modeling, 2023.