
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation (2305.15296v3)

Published 24 May 2023 in cs.CV, cs.AI, and cs.LG

Abstract: The recent popularity of text-to-image diffusion models (DMs) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows one to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.

Authors (14)
  1. Marco Bellagente
  2. Manuel Brack
  3. Hannah Teufel
  4. Felix Friedrich
  5. Björn Deiseroth
  6. Constantin Eichenberg
  7. Andrew Dai
  8. Robert Baldock
  9. Souradeep Nanda
  10. Koen Oostermeijer
  11. Andres Felipe Cruz-Salinas
  12. Patrick Schramowski
  13. Kristian Kersting
  14. Samuel Weinbach