
EasyGen: Easing Multimodal Generation with BiDiffuser and LLMs (2310.08949v3)

Published 13 Oct 2023 in cs.AI, cs.CL, and cs.CV

Abstract: We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and LLMs. Unlike existing multimodal models that predominantly depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge modalities, EasyGen leverages BiDiffuser, a bidirectional conditional diffusion model, to foster more efficient modality interactions. EasyGen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilitates image generation by training an adapter to align the LLM's text space with the BiDiffuser's image space. Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation. The source code is available at https://github.com/zxy556677/EasyGen.

Authors (5)
  1. Xiangyu Zhao (192 papers)
  2. Bo Liu (484 papers)
  3. Qijiong Liu (22 papers)
  4. Guangyuan Shi (8 papers)
  5. Xiao-Ming Wu (91 papers)
Citations (5)
