SoftTiger: A Clinical Foundation Model for Healthcare Workflows (2403.00868v3)

Published 1 Mar 2024 in cs.CL and cs.AI

Abstract: We introduce SoftTiger, a clinical LLM (CLaM) designed as a foundation model for healthcare workflows. The narrative, unstructured nature of clinical notes is a major obstacle to healthcare intelligentization. We address the critical problem of structuring clinical notes into clinical data according to international interoperability standards. We collect and annotate data for three subtasks, namely international patient summary, clinical impression, and medical encounter. We then perform supervised fine-tuning of a state-of-the-art LLM using public and credentialed clinical data. The training is orchestrated so that the target model first supports basic clinical tasks such as abbreviation expansion and temporal information extraction, and then learns to perform more complex downstream clinical tasks. Moreover, we address several modeling challenges in the healthcare context, e.g., an extra-long context window. Our blind pairwise evaluation shows that SoftTiger outperforms other popular open-source models and GPT-3.5, is comparable to Gemini-Pro, and trails GPT-4 by only a mild gap. We believe that LLMs may become a stepping stone towards healthcare digitalization and democratization. We therefore publicly release SoftTiger models at scales of 13 billion and 70 billion parameters, as well as the datasets and code for our scalable evaluation, hoping to make a significant contribution to the healthcare industry.

Introducing SoftTiger: Elevating Healthcare Workflows through Clinical LLM

Overview

The healthcare industry confronts a myriad of challenges, exacerbated by overburdened clinicians facing unsustainable demands on their time and expertise. The burgeoning field of LLMs offers a promising remedy: streamlining healthcare workflows through intelligent processing of clinical notes. In this context, the paper presents SoftTiger, a clinical LLM specifically engineered to restructure clinical notes into structured data adhering to international interoperability standards. Beyond presenting the model, the authors elaborate on their methods for tackling domain-specific challenges, such as handling extended context windows and medical jargon. This summary explores the foundational aspects of SoftTiger, unpacking its contributions, methodologies, and implications for the future of healthcare digitalization.

The Genesis of SoftTiger

At the heart of healthcare digitalization is the challenge of distilling actionable insights from the extensive narratives found in clinical notes. Traditional LLMs face two significant hurdles when applied to clinical practice: aligning with healthcare workflows and managing the extensive length of clinical notes. Through a meticulous analysis of existing clinical tasks, the authors identify the critical gap SoftTiger aims to bridge: transforming unstructured clinical notes into structured clinical data. The model hence focuses on essential clinical subtasks, namely Patient Clinical Summary, Clinical Impression, and Medical Encounter, aiming to provide a robust foundation for various healthcare applications.
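
To make the structuring task concrete, here is a hypothetical illustration of the kind of input and output involved. The note fragment and field names below are invented for exposition and are not the paper's exact annotation schema, which follows international interoperability standards.

```python
# Hypothetical example of the clinical-structuring task: free-text note in,
# interoperable fields out. The schema below is illustrative, not the paper's.
note = (
    "Pt is a 67 y/o M admitted w/ SOB and CP. Hx of HTN, T2DM. "
    "Started on lisinopril 10 mg daily."
)

# A model like SoftTiger would be prompted to emit structured fields such as:
structured = {
    "patient_summary": {
        "age": 67,
        "sex": "male",
        "conditions": ["hypertension", "type 2 diabetes mellitus"],
        "medications": [
            {"name": "lisinopril", "dose": "10 mg", "frequency": "daily"}
        ],
    },
    "clinical_impression": "Shortness of breath and chest pain on admission.",
    "medical_encounter": {"type": "inpatient admission"},
}
print(structured["patient_summary"]["conditions"])
```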

Model Architecture and Training

SoftTiger operates at the cutting edge of LLM technology, fine-tuning a state-of-the-art foundation model specifically for the healthcare domain. Key takeaways from the training phase:

  • Foundation Model Selection: Building on TigerBot, a foundation model known for its rich biomedical vocabulary and general-purpose task handling, SoftTiger enriches its capabilities through supervised fine-tuning (SFT), optimizing it for clinical applications.
  • Training Data: The core of SoftTiger’s training regime comprised 134 million tokens from a mix of domain-specific and general-purpose datasets, with a significant emphasis on clinical structuring tasks.
  • Challenges in Context: Addressing the challenge of lengthy clinical notes, SoftTiger handles a context window of up to 8k tokens, a feature critical for the domain; a minimal sketch of such a fine-tuning setup appears after this list.
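
As a concrete reference point, the following is a minimal sketch of a supervised fine-tuning setup of this kind using the Hugging Face stack. The checkpoint name, data file, and hyperparameters are assumptions for illustration, not the authors' exact configuration; the paper's references point to large-scale infrastructure such as Megatron-DeepSpeed (ref. 9).

```python
# Minimal SFT sketch on a TigerBot-style base model using Hugging Face.
# Checkpoint name, data file, and hyperparameters are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "TigerResearch/tigerbot-13b-base"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Clinical-structuring examples serialized as a single "text" field per record.
dataset = load_dataset("json", data_files="clinical_sft.jsonl")["train"]

def tokenize(batch):
    # Truncate to the 8k-token context window discussed above.
    return tokenizer(batch["text"], truncation=True, max_length=8192)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="softtiger-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        num_train_epochs=2,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```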

Evaluation and Contributions

The evaluation of SoftTiger, performed through next-token prediction and blind pairwise comparisons (LLM-as-a-Judge), shows it outperforming widely used models such as Llama-2 and even GPT-3.5, with only a mild gap from GPT-4. These evaluations emphasize SoftTiger's capability to accurately process and structure clinical notes. The paper's contributions are manifold (a sketch of the pairwise judging protocol follows the list below):

  • Release of SoftTiger Models: The authors have made two configurations of SoftTiger (13 billion and 70 billion parameters) publicly available, catering to a broad spectrum of clinical data processing needs.
  • Bespoke Training and Evaluation Frameworks: By innovating in training methodologies and employing a cost-effective evaluation mechanism, the paper lays a foundation for future refinement and adaptation in the healthcare sector.
  • Open-Source Contributions: In addition to the models, the authors release their training datasets, code, and evaluation protocols, ensuring transparency and fostering community engagement in further development.
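
To illustrate the blind pairwise protocol, here is a hedged sketch in the spirit of LLM-as-a-Judge (reference 19). The judge model, prompt wording, and verdict format are assumptions, not the paper's exact procedure.

```python
# Hedged sketch of blind pairwise judging in the spirit of LLM-as-a-Judge
# (reference 19). Judge model, prompt, and verdict format are assumptions.
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_pair(note: str, answer_a: str, answer_b: str) -> str:
    """Return 'A', 'B', or 'tie' for two anonymous structurings of one note."""
    # Randomize presentation order so the judge cannot exploit position bias.
    swapped = random.random() < 0.5
    first, second = (answer_b, answer_a) if swapped else (answer_a, answer_b)
    prompt = (
        "Two anonymous systems structured the same clinical note.\n"
        f"Note:\n{note}\n\nResponse 1:\n{first}\n\nResponse 2:\n{second}\n\n"
        "Which response is more accurate and complete? "
        "Reply with exactly '1', '2', or 'tie'."
    )
    verdict = client.chat.completions.create(
        model="gpt-4",  # assumed judge model
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content.strip()
    if verdict == "tie":
        return "tie"
    # Map the positional verdict back to the original labels.
    if swapped:
        return "B" if verdict == "1" else "A"
    return "A" if verdict == "1" else "B"
```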

Future Directions and Implications

The unveiling of SoftTiger opens up a wealth of opportunities for advancing healthcare digitalization. By effectively structuring clinical data, SoftTiger not only streamlines clinical workflows but also paves the way for more sophisticated AI-driven diagnostic and treatment strategies. As future directions, the authors aim to tackle hallucination issues and to explore retrieval-augmented generation (RAG), promising avenues that could further enhance the reliability and utility of clinical LLMs.
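
For a rough sense of what retrieval-augmented generation could look like in this setting, here is a toy sketch. The embedding model, corpus, and prompt format are all illustrative assumptions; the paper names RAG only as a future direction.

```python
# Toy sketch of retrieval-augmented generation for clinical grounding.
# Embedding model, corpus, and prompt format are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed retriever

# Toy corpus of reference snippets (in practice: guidelines, drug monographs).
corpus = [
    "Lisinopril is an ACE inhibitor indicated for hypertension.",
    "Metformin is first-line therapy for type 2 diabetes mellitus.",
]
corpus_emb = embedder.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    # With normalized embeddings, cosine similarity is a plain dot product.
    q = embedder.encode([query], normalize_embeddings=True)[0]
    order = np.argsort(-(corpus_emb @ q))
    return [corpus[i] for i in order[:k]]

evidence = retrieve("What class of drug is lisinopril?")[0]
prompt = f"Context: {evidence}\nQuestion: What class of drug is lisinopril?"
# `prompt` would then be passed to the clinical LLM for grounded generation.
```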

SoftTiger represents a pivotal step towards the digitalization and democratization of healthcare, encapsulating the tremendous potential of LLM technology to revolutionize an industry on the brink of a transformative shift.

References (19)
  1. A. Maria Nancy, R. 2020. A Review on Unstructured Data in Medical Data. Journal of Critical Reviews, 7.
  2. The global health workforce stock and distribution in 2020 and 2030: a threat to equity and ‘universal’ health coverage? BMJ Global Health.
  3. Language Models are Few-Shot Learners. arXiv:2005.14165v4 [cs.CL].
  4. TigerBot: An Open Multilingual Multitask LLM. arXiv:2312.08688 [cs.CL].
  5. MEDITRON-70B: Scaling Medical Pretraining for Large Language Models. arXiv:2311.16079 [cs.CL].
  6. MIMIC-IV, a freely accessible electronic health record dataset. Nature Scientific Data.
  7. Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes. arXiv:2309.00237 [cs.CL].
  8. Medical error—the third leading cause of death in the US. BMJ.
  9. Microsoft. 2023. Megatron-DeepSpeed. GitHub repository.
  10. Understanding the perceived role of electronic health records and workflow fragmentation on clinician documentation burden in emergency departments. Journal of the American Medical Informatics Association.
  11. Burden of serious harms from diagnostic error in the USA. BMJ Quality and Safety.
  12. Patterns in Physician Burnout in a Stable-Linked Cohort. JAMA Network Open.
  13. Pichai, S. 2023. Introducing Gemini: our largest and most capable AI model. https://blog.google/technology/ai/google-gemini-ai/.
  14. Revisiting the Time Needed to Provide Adult Primary Care. Society of General Internal Medicine.
  15. Allocation of Physician Time in Ambulatory Practice: A Time and Motion Study in 4 Specialties. Annals of Internal Medicine.
  16. Distill and Replay for Continual Language Learning. In Proceedings of the 28th International Conference on Computational Linguistics.
  17. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288 [cs.CL].
  18. The shaky foundations of large language models and foundation models for electronic health records. NPJ Digital Medicine.
  19. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. arXiv:2306.05685 [cs.CL].
Authors (5)
  1. Ye Chen (52 papers)
  2. Igor Couto (1 paper)
  3. Wei Cai (130 papers)
  4. Cong Fu (24 papers)
  5. Bruno Dorneles (1 paper)