InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models (2504.10479v3)

Published 14 Apr 2025 in cs.CV

Abstract: We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. Rather than adapting a text-only LLM into a multimodal LLM (MLLM) that supports visual inputs, InternVL3 jointly acquires multimodal and linguistic capabilities from both diverse multimodal data and pure-text corpora during a single pre-training stage. This unified training paradigm effectively addresses the complexities and alignment challenges commonly encountered in conventional post-hoc training pipelines for MLLMs. To further improve performance and scalability, InternVL3 incorporates variable visual position encoding (V2PE) to support extended multimodal contexts, employs advanced post-training techniques such as supervised fine-tuning (SFT) and mixed preference optimization (MPO), and adopts test-time scaling strategies alongside an optimized training infrastructure. Extensive empirical evaluations demonstrate that InternVL3 delivers superior performance across a wide range of multi-modal tasks. In particular, InternVL3-78B achieves a score of 72.2 on the MMMU benchmark, setting a new state-of-the-art among open-source MLLMs. Its capabilities remain highly competitive with leading proprietary models, including ChatGPT-4o, Claude 3.5 Sonnet, and Gemini 2.5 Pro, while also maintaining strong pure-language proficiency. In pursuit of open-science principles, we will publicly release both the training data and model weights to foster further research and development in next-generation MLLMs.

Authors (51)
  1. Jinguo Zhu
  2. Weiyun Wang
  3. Zhe Chen
  4. Zhaoyang Liu
  5. Shenglong Ye
  6. Lixin Gu
  7. Yuchen Duan
  8. Hao Tian
  9. Weijie Su
  10. Jie Shao
  11. Zhangwei Gao
  12. Erfei Cui
  13. Yue Cao
  14. Yangzhou Liu
  15. Weiye Xu
  16. Hao Li
  17. Jiahao Wang
  18. Han Lv
  19. Songze Li
  20. Yinan He