
WisPerMed at BioLaySumm: Adapting Autoregressive Large Language Models for Lay Summarization of Scientific Articles (2405.11950v2)

Published 20 May 2024 in cs.CL and cs.LG

Abstract: This paper details the efforts of the WisPerMed team in the BioLaySumm 2024 Shared Task on automatic lay summarization in the biomedical domain, which aims to make scientific publications accessible to non-specialists. Large language models (LLMs), specifically the BioMistral and Llama 3 models, were fine-tuned and employed to create lay summaries from complex scientific texts. Summarization performance was enhanced through several approaches, including instruction tuning, few-shot learning, and prompt variations tailored to incorporate specific context information. The experiments showed that fine-tuning generally led to the best performance across most evaluated metrics. Few-shot learning notably improved the models' ability to generate relevant and factually accurate texts, particularly when using a well-crafted prompt. Additionally, a Dynamic Expert Selection (DES) mechanism was developed to choose, among candidate outputs, the text that scores best on readability and factuality metrics. Out of 54 participants, the WisPerMed team placed 4th, as measured by readability, factuality, and relevance. In terms of overall score, the approach improved upon the baseline by approximately 5.5 percentage points and was only about 1.5 percentage points behind first place.
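
To make the Dynamic Expert Selection idea concrete, the sketch below is a minimal, hypothetical illustration, not the authors' implementation: it assumes a single readability metric (Flesch Reading Ease via the textstat package), a stand-in factuality proxy (the paper uses dedicated metrics such as AlignScore/SummaC), and a simple weighted combination to pick one summary from a pool of candidate model outputs.

```python
# Minimal sketch of a DES-style output selector (assumptions: metric choices,
# weights, and the factuality proxy are illustrative, not the authors' method).
import textstat  # provides flesch_reading_ease()


def factuality_proxy(source: str, summary: str) -> float:
    """Crude token-overlap stand-in for a factuality metric (e.g. AlignScore)."""
    src_tokens = set(source.lower().split())
    sum_tokens = set(summary.lower().split())
    return len(src_tokens & sum_tokens) / max(len(sum_tokens), 1)


def select_best_summary(source: str, candidates: list[str],
                        w_read: float = 0.5, w_fact: float = 0.5) -> str:
    """Return the candidate lay summary with the best weighted score of
    normalized readability and factuality."""
    def score(summary: str) -> float:
        readability = textstat.flesch_reading_ease(summary) / 100.0
        factuality = factuality_proxy(source, summary)
        return w_read * readability + w_fact * factuality

    return max(candidates, key=score)


# Usage (hypothetical): candidates come from differently fine-tuned or prompted
# models, e.g. BioMistral and Llama 3 runs.
# best = select_best_summary(article_text, [biomistral_output, llama3_output])
```

In the paper's setup, such a selector is applied per article over outputs from several fine-tuned and prompted model variants; the exact metrics and weighting used by the authors may differ from this sketch.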
