ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation (2112.15283v1)

Published 31 Dec 2021 in cs.CV and cs.CL

Abstract: Conventional methods for image-text generation tasks mainly tackle the two naturally bidirectional generation tasks separately, focusing on designing task-specific frameworks to improve the quality and fidelity of the generated samples. Recently, Vision-Language Pre-training models have greatly improved the performance of image-to-text generation tasks, but large-scale pre-training models for the text-to-image synthesis task are still underdeveloped. In this paper, we propose ERNIE-ViLG, a unified generative pre-training framework for bidirectional image-text generation with a transformer model. Based on image quantization models, we formulate both image generation and text generation as autoregressive generative tasks conditioned on the text/image input. The bidirectional image-text generative modeling eases semantic alignment across vision and language. For the text-to-image generation process, we further propose an end-to-end training method to jointly learn the visual sequence generator and the image reconstructor. To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion-parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs. The model achieves state-of-the-art performance on both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and the best results on COCO-CN and AIC-ICC for image captioning.
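The abstract's central idea is that once an image is quantized into discrete codebook tokens, both directions become the same kind of task: predict the next token of the target modality given the other modality. The sketch below illustrates that formulation only, under stated assumptions. A plain nearest-neighbour codebook lookup stands in for the paper's image quantization model, and the vocabulary layout (IMAGE_OFFSET), the separator ids BOI and BOT, and the helper names quantize and make_example are all hypothetical, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical vocabulary layout (not from the paper): text ids occupy
# [0, IMAGE_OFFSET), image codebook ids are shifted up by IMAGE_OFFSET,
# and two hypothetical separator tokens mark where the target begins.
IMAGE_OFFSET = 100
BOI = 200  # hypothetical "begin of image" separator (text -> image)
BOT = 201  # hypothetical "begin of text" separator (image -> text)


def quantize(patches: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each patch feature with the index of its nearest codebook
    vector under squared Euclidean distance (VQ-style tokenization)."""
    codes = []
    for p in patches:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in codebook]
        codes.append(min(range(len(codebook)), key=lambda k: dists[k]))
    return codes


def make_example(cond: List[int], target: List[int],
                 sep: int) -> Tuple[List[int], List[int], List[int]]:
    """Build one next-token-prediction example from cond + [sep] + target.

    labels[i] is the token that should follow inputs[i]; loss_mask is 1
    only where that label is a target token, so the loss trains the model
    to generate the target conditioned on the input modality.
    """
    seq = cond + [sep] + target
    inputs, labels = seq[:-1], seq[1:]
    loss_mask = [1 if i >= len(cond) else 0 for i in range(len(labels))]
    return inputs, labels, loss_mask


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # toy 3-entry codebook
    patches = [[0.9, 1.1], [0.1, -0.1], [0.2, 0.8]]  # toy patch features
    image_tokens = [IMAGE_OFFSET + k for k in quantize(patches, codebook)]
    text_tokens = [5, 17, 42]  # toy caption ids

    # The same sequence format serves both generation directions.
    t2i = make_example(text_tokens, image_tokens, BOI)  # text -> image
    i2t = make_example(image_tokens, text_tokens, BOT)  # image -> text
    print(image_tokens)  # [101, 100, 102]
    print(t2i)
    print(i2t)
```

Because both directions reduce to masked next-token prediction over one token sequence, a single autoregressive transformer can in principle be trained on a mix of text-to-image and image-to-text examples. How ERNIE-ViLG actually shares parameters between directions, and how its end-to-end reconstructor training works, is not specified in the abstract and is not modeled here.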

Authors (10)
  1. Han Zhang (338 papers)
  2. Weichong Yin (8 papers)
  3. Yewei Fang (7 papers)
  4. Lanxin Li (3 papers)
  5. Boqiang Duan (1 paper)
  6. Zhihua Wu (24 papers)
  7. Yu Sun (226 papers)
  8. Hao Tian (146 papers)
  9. Hua Wu (191 papers)
  10. Haifeng Wang (194 papers)
Citations (59)