M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning (2306.04387v2)

Published 7 Jun 2023 in cs.CV and cs.CL

Abstract: Instruction tuning has significantly advanced large language models (LLMs) such as ChatGPT, enabling them to align with human instructions across diverse tasks. However, progress in open vision-language models (VLMs) has been limited due to the scarcity of high-quality instruction datasets. To tackle this challenge and promote research in the vision-language field, we introduce the Multi-Modal, Multilingual Instruction Tuning (M$^3$IT) dataset, designed to optimize VLM alignment with human instructions. Our M$^3$IT dataset comprises 40 carefully curated datasets, including 2.4 million instances and 400 manually written task instructions, reformatted into a vision-to-text structure. Key tasks are translated into 80 languages with an advanced translation system, ensuring broader accessibility. M$^3$IT surpasses previous datasets regarding task coverage, instruction number and instance scale. Moreover, we develop Ying-VLM, a VLM trained on our M$^3$IT dataset, showcasing its potential to answer complex questions requiring world knowledge, generalize to unseen video tasks, and comprehend unseen instructions in Chinese. We have open-sourced the dataset to encourage further research.

Authors (12)
  1. Lei Li (1293 papers)
  2. Yuwei Yin (21 papers)
  3. Shicheng Li (23 papers)
  4. Liang Chen (360 papers)
  5. Peiyi Wang (48 papers)
  6. Shuhuai Ren (30 papers)
  7. Mukai Li (17 papers)
  8. Yazheng Yang (16 papers)
  9. Jingjing Xu (80 papers)
  10. Xu Sun (194 papers)
  11. Lingpeng Kong (134 papers)
  12. Qi Liu (485 papers)
Citations (102)