Overview of "QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training"
QoQ-Med targets clinical decision support by integrating medical imaging, physiological time-series signals, and textual reports into a single learning framework. The paper introduces QoQ-Med-7B/32B, foundation models designed to generalize across diverse clinical domains, as a step toward robust clinical multimodal understanding. It pairs them with a new training method, Domain-aware Relative Policy Optimization (DRPO), built to handle the data heterogeneity and imbalance that characterize clinical datasets spanning multiple modalities and specialties.
Key Contributions
- Integration of Multimodal Data: Unlike prior models that handle a single modality or only limited input dimensionalities, QoQ-Med jointly processes 1D, 2D, and 3D inputs, supporting clinical reasoning over time-series signals such as ECG and EEG, medical images such as CT and MRI, and unstructured clinical text.
- Domain-aware Relative Policy Optimization (DRPO): DRPO is a training scheme designed to counteract the skewed data distributions common in multimodal clinical corpora. It scales the RL objective with learned, domain-specific scaling factors, promoting balanced learning across both abundant and scarce domains.
- Enhanced Diagnostic Performance: Empirically, QoQ-Med improves macro-F1 by 43% across visual clinical domains compared with existing RL approaches, and achieves a 10-fold improvement in Intersection over Union (IoU) when highlighting diagnostically relevant regions in medical images (a minimal sketch of these metrics follows this list).
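To make the reported metrics concrete, here is a minimal sketch of how domain-averaged macro-F1 and bounding-box IoU are commonly computed. The function names and the per-domain averaging scheme are illustrative assumptions, not code from the QoQ-Med release.

```python
import numpy as np
from sklearn.metrics import f1_score

def macro_f1_across_domains(y_true, y_pred, domains):
    """Average the macro-F1 computed separately within each clinical domain.

    y_true, y_pred: array-like class labels
    domains: array-like domain identifiers (e.g. "CT", "ECG", "X-ray")
    """
    y_true, y_pred, domains = map(np.asarray, (y_true, y_pred, domains))
    scores = [
        f1_score(y_true[domains == d], y_pred[domains == d], average="macro")
        for d in np.unique(domains)
    ]
    return float(np.mean(scores))

def bbox_iou(box_a, box_b):
    """Intersection over Union for two (x1, y1, x2, y2) bounding boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```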
Methodology
Model Architecture: The model couples a vision encoder for image data and time-series encoders for physiological signals with an LLM backbone that handles textual input. Its outputs include predicted diagnoses, interpretable reasoning traces, and spatial annotations (bounding boxes) that highlight the regions supporting each diagnosis.
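A schematic of this fusion pattern, assuming a Hugging Face-style LLM interface (`inputs_embeds`) and hypothetical encoder attributes such as `out_dim`, might look like the sketch below; it conveys the general design rather than reproducing the paper's actual components.

```python
import torch
import torch.nn as nn

class MultimodalClinicalModel(nn.Module):
    """Schematic fusion model: each modality encoder projects its features
    into the LLM's embedding space, and the LLM decodes the diagnosis,
    reasoning trace, and bounding boxes as part of its output sequence."""

    def __init__(self, vision_encoder, ts_encoder, llm, d_model):
        super().__init__()
        self.vision_encoder = vision_encoder  # e.g. a ViT over 2D/3D images (assumed)
        self.ts_encoder = ts_encoder          # e.g. a 1D transformer over ECG (assumed)
        self.llm = llm                        # autoregressive language backbone
        self.vision_proj = nn.Linear(vision_encoder.out_dim, d_model)
        self.ts_proj = nn.Linear(ts_encoder.out_dim, d_model)

    def forward(self, images, signals, text_embeds):
        # Encode each modality into a sequence of embedding vectors.
        img_tokens = self.vision_proj(self.vision_encoder(images))
        ts_tokens = self.ts_proj(self.ts_encoder(signals))
        # Prepend modality tokens to the text-prompt embeddings and let the
        # LLM generate diagnosis, reasoning, and box coordinates as text.
        fused = torch.cat([img_tokens, ts_tokens, text_embeds], dim=1)
        return self.llm(inputs_embeds=fused)
```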
DRPO Training: The DRPO pipeline is built around a hierarchical scaling mechanism: reward signals are rescaled according to domain rarity and estimated difficulty, focusing learning on underrepresented clinical problems rather than letting abundant, easier cases dominate the optimization.
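The exact scaling rule is the paper's contribution and is not reproduced here; the sketch below only illustrates the general pattern of rescaling GRPO-style group-relative advantages with per-domain factors, using a simple inverse-frequency weight (times an optional difficulty term) as a stand-in for DRPO's hierarchically estimated factors.

```python
import numpy as np

def grpo_advantages(rewards):
    """GRPO-style group-relative advantage: normalize each rollout's
    reward against the mean and std of its sampling group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def domain_scaled_advantages(rewards, domain, domain_counts, difficulty=1.0):
    """Stand-in for DRPO's domain-aware scaling: upweight groups drawn from
    rare (and optionally harder) domains so they are not drowned out by
    abundant, easy domains during policy optimization."""
    total = sum(domain_counts.values())
    rarity = total / (len(domain_counts) * domain_counts[domain])  # inverse frequency (assumed)
    return rarity * difficulty * grpo_advantages(rewards)

# Example: four rollouts for one question from a rare "ECG" domain.
counts = {"ECG": 200, "CT": 5000, "X-ray": 8000}
adv = domain_scaled_advantages([1.0, 0.0, 0.5, 1.0], "ECG", counts)
```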
Implications and Future Directions
The QoQ-Med models contribute to computational medicine by enabling more holistic patient assessment through a unified model that processes heterogeneous data efficiently. In practice, this points toward improvements in automated diagnostics and interpretable AI applications in healthcare, which can foster trust and broader adoption.
Theoretically, DRPO offers a template for RL in multi-domain settings: adaptive weighting mechanisms can balance learning across heterogeneous domains, an approach that could transfer beyond medicine to other fields facing similar data imbalance and modality diversity.
Future work could improve sample efficiency, incorporate supervised signals to augment reasoning, and extend the range of modalities and clinical settings the model handles. As these models mature, they may help narrow gaps in healthcare access, particularly in resource-constrained regions where direct clinical expertise is scarce. The public release of the model weights and training pipeline gives the research community a foundation for further work on AI-driven medical diagnosis and decision-support systems.