Alt-MoE: Multimodal Alignment via Alternating Optimization of Multi-directional MoE with Unimodal Models (2409.05929v2)

Published 9 Sep 2024 in cs.LG and cs.AI

Abstract: Recent Large Multi-Modal Models (LMMs) have made significant advancements in multi-modal alignment by employing lightweight connection modules to facilitate the representation and fusion of knowledge from existing pre-trained uni-modal models. However, these methods still rely on modality-specific and direction-specific connectors, leading to compartmentalized knowledge representations and reduced computational efficiency, which limits the model's ability to form unified multi-modal representations. To address these issues, we introduce a novel training framework, Alt-MoE, which employs a Mixture of Experts (MoE) as a unified multi-directional connector across modalities, together with a multi-step sequential alternating unidirectional alignment strategy that converges to bidirectional alignment over iterations. Extensive empirical studies reveal the following key points: 1) Alt-MoE achieves competitive results by integrating diverse knowledge representations from uni-modal models. This approach seamlessly fuses the specialized expertise of existing high-performance uni-modal models, effectively synthesizing their domain-specific knowledge into a cohesive multi-modal representation. 2) Alt-MoE efficiently scales to new tasks and modalities without altering its model architecture or training strategy. Furthermore, Alt-MoE operates in latent space, supporting vector pre-storage and real-time retrieval via the lightweight multi-directional MoE, thereby facilitating large-scale data processing. Our methodology has been validated on several well-performing uni-modal models (LLAMA3, Qwen2, and DINOv2), achieving competitive results on a wide range of downstream tasks and datasets.
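
The abstract's two main ideas, a shared multi-directional MoE connector over frozen unimodal latents and an alternating unidirectional alignment schedule, can be made concrete with a short sketch. The PyTorch code below is a minimal illustration under stated assumptions, not the authors' implementation: the `MoEConnector` class, its expert width and count, the cosine-similarity objective, and the even/odd alternation rule are all hypothetical choices introduced for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEConnector(nn.Module):
    """A lightweight MoE connector shared across modalities and alignment
    directions (hypothetical layout; expert width and count are assumptions)."""
    def __init__(self, dim: int = 768, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = F.softmax(self.router(x), dim=-1)              # (B, E)
        topk_vals, topk_idx = gate.topk(self.top_k, dim=-1)   # (B, k)
        # Keep only the top-k gate weights per input and renormalise them.
        sparse_gate = torch.zeros_like(gate).scatter(-1, topk_idx, topk_vals)
        sparse_gate = sparse_gate / sparse_gate.sum(dim=-1, keepdim=True)
        # Dense evaluation of all experts for clarity; a production router
        # would dispatch each vector only to its selected experts.
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        return torch.einsum("be,bed->bd", sparse_gate, expert_out)


def alternating_alignment_step(connector, img_latent, txt_latent, optimizer, step):
    """One step of an alternating unidirectional alignment schedule: even steps
    align image->text, odd steps text->image, so repeated steps approximate
    bidirectional alignment. The cosine objective is an illustrative choice."""
    if step % 2 == 0:
        pred, target = connector(img_latent), txt_latent.detach()
    else:
        pred, target = connector(txt_latent), img_latent.detach()
    loss = 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Frozen unimodal encoders are stand-ins here: random latents of equal width.
    connector = MoEConnector(dim=768)
    opt = torch.optim.AdamW(connector.parameters(), lr=1e-4)
    img, txt = torch.randn(8, 768), torch.randn(8, 768)
    for t in range(4):
        print(alternating_alignment_step(connector, img, txt, opt, t))
```

Because only the connector is trained while the unimodal encoders stay frozen, their latents can be precomputed and stored, which is what makes the vector pre-storage and real-time retrieval mentioned in the abstract practical.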

Authors (11)
  1. Hongyang Lei (3 papers)
  2. Xiaolong Cheng (1 paper)
  3. Dan Wang (154 papers)
  4. Qi Qin (20 papers)
  5. Huazhen Huang (2 papers)
  6. Yetao Wu (5 papers)
  7. Qingqing Gu (10 papers)
  8. Zhonglin Jiang (11 papers)
  9. Yong Chen (299 papers)
  10. Luo Ji (16 papers)
  11. Kun Fan (5 papers)