
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces (2407.11895v1)

Published 16 Jul 2024 in cs.CV

Abstract: Recently, human-computer interaction with various modalities has shown promising applications, like GPT-4o and Gemini. Given the foundational role of multimodal joint representation in understanding and generation pipelines, high-quality omni joint representations would be a step toward co-processing more diverse multimodal information. In this work, we present OmniBind, large-scale multimodal joint representation models ranging in scale from 7 billion to 30 billion parameters, which support 3D, audio, image, and language inputs. Due to the scarcity of data pairs across all modalities, instead of training large models from scratch, we propose remapping and binding the spaces of various pre-trained specialist models together. This approach enables "scaling up" by indirectly increasing the model parameters and the amount of seen data. To effectively integrate various spaces, we dynamically assign weights to different spaces by learning routers with two objectives: cross-modal overall alignment and language representation decoupling. Notably, since binding and routing spaces both only require lightweight networks, OmniBind is extremely training-efficient. Learning the largest 30B model requires merely unpaired unimodal data and approximately 3 days on a single 8-4090 node. Extensive experiments demonstrate the versatility and superiority of OmniBind as an omni representation model, highlighting its great potential for diverse applications, such as any-query and composable multimodal understanding.
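The abstract's core idea — projecting frozen specialist embeddings into a shared space and letting a lightweight learned router weight each source space per input — can be sketched roughly as below. This is an illustrative reconstruction, not the authors' code: the function name, the single router matrix, and the plain softmax weighting are assumptions; the paper's actual routers are trained with cross-modal alignment and language-decoupling objectives not shown here.

```python
import numpy as np

def bind_and_route(embeddings, projectors, router_w):
    """Hypothetical sketch of binding pre-trained spaces with a learned router.

    embeddings: list of (d_i,) vectors from frozen specialist models
                (e.g. separate 3D, audio, image, and language encoders).
    projectors: list of (d_i, shared_dim) matrices remapping each
                specialist space into the shared "bound" space.
    router_w:   (sum(d_i), n_spaces) matrix scoring how much each
                source space should contribute for this input.
    """
    # Remap every specialist embedding into the shared space.
    projected = np.stack([e @ P for e, P in zip(embeddings, projectors)])
    # Router assigns a softmax weight to each source space, so the
    # combination adapts per input rather than being a fixed average.
    scores = np.concatenate(embeddings) @ router_w
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted combination, L2-normalized for retrieval-style comparison.
    combined = weights @ projected
    return combined / np.linalg.norm(combined)
```

Because only the projectors and router are trained while the large specialist models stay frozen, the trainable parameter count stays small — consistent with the abstract's claim that the 30B model trains in about 3 days on a single 8-GPU node.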

Authors (8)
  1. Zehan Wang (38 papers)
  2. Ziang Zhang (25 papers)
  3. Hang Zhang (164 papers)
  4. Luping Liu (16 papers)
  5. Rongjie Huang (62 papers)
  6. Xize Cheng (29 papers)
  7. Hengshuang Zhao (118 papers)
  8. Zhou Zhao (219 papers)
Citations (3)