BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation

Published 4 Dec 2025 in cs.AI | (2512.04629v1)

Abstract: Molecules play a crucial role in biomedical research and discovery, particularly in the field of small molecule drug development. Given the rapid advancements in LLMs, especially the recent emergence of reasoning models, it is natural to explore how a general-purpose LLM can be efficiently adapted for molecular science applications. In this work, we introduce BioMedGPT-Mol, a molecular LLM designed to support molecular understanding and generation tasks. By curating and unifying existing public instruction datasets, we have assembled a large-scale, comprehensive, and high-quality training dataset. The model is then fine-tuned through a meticulously designed multi-task learning framework. On a consolidated benchmark derived from LlaSMol, TOMG-Bench, and MuMOInstruct, BioMedGPT-Mol achieves remarkable performance. Our experimental results demonstrate that a general-purpose reasoning model can be effectively and efficiently post-trained into a professional molecular LLM through a well-structured multi-task curriculum. Leveraging the power of it, we further explore retrosynthetic planning task, and the performance on RetroBench demonstrates its competitive capability of acting as an end-to-end retrosynthetic planner. We anticipate that our approach can be extended to other biomedical scientific domains.