
i-Code: An Integrative and Composable Multimodal Learning Framework (2205.01818v2)

Published 3 May 2022 in cs.LG, cs.AI, cs.CL, cs.CV, and eess.AS

Abstract: Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel attention mechanisms and other architectural innovations to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.
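One of the pretraining objectives named in the abstract, cross-modality contrastive learning, can be illustrated with a symmetric InfoNCE-style loss that pulls together the fused embeddings of paired inputs from two modalities and pushes apart mismatched pairs. The sketch below is an assumption for illustration, not the paper's exact formulation: the function name, temperature value, and use of a plain NumPy implementation are all hypothetical.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def cross_modal_contrastive_loss(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of modality embeddings.

    emb_a, emb_b: (batch, dim) arrays where row i of each is a positive pair
    (e.g. the vision and language embeddings of the same video clip).
    """
    # L2-normalize so dot products become cosine similarities.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch) similarity matrix

    # Positive pairs sit on the diagonal; score them against both the
    # row-wise (a -> b) and column-wise (b -> a) softmax distributions.
    diag = np.arange(len(a))
    loss_a2b = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_b2a = -log_softmax(logits, axis=0)[diag, diag].mean()
    return (loss_a2b + loss_b2a) / 2
```

With well-separated, perfectly aligned embeddings the loss approaches zero; shuffling one modality's batch (breaking the pairing) drives it up, which is what gives the objective its alignment pressure.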

Authors (20)
  1. Ziyi Yang (77 papers)
  2. Yuwei Fang (31 papers)
  3. Chenguang Zhu (100 papers)
  4. Reid Pryzant (17 papers)
  5. Dongdong Chen (164 papers)
  6. Yu Shi (153 papers)
  7. Yichong Xu (42 papers)
  8. Yao Qian (37 papers)
  9. Mei Gao (8 papers)
  10. Yi-Ling Chen (13 papers)
  11. Liyang Lu (15 papers)
  12. Yujia Xie (29 papers)
  13. Robert Gmyr (20 papers)
  14. Noel Codella (21 papers)
  15. Naoyuki Kanda (61 papers)
  16. Bin Xiao (93 papers)
  17. Lu Yuan (130 papers)
  18. Takuya Yoshioka (77 papers)
  19. Michael Zeng (76 papers)
  20. Xuedong Huang (22 papers)
Citations (42)