OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation (2310.07749v2)

Published 11 Oct 2023 in cs.CV

Abstract: This work investigates a challenging task named open-domain interleaved image-text generation, which generates interleaved texts and images following an input query. We propose a new interleaved generation framework based on prompting large language models (LLMs) and pre-trained text-to-image (T2I) models, namely OpenLEAF. In OpenLEAF, the LLM generates textual descriptions, coordinates T2I models, creates visual prompts for generating images, and incorporates global contexts into the T2I models. This global context improves the entity and style consistencies of images in the interleaved generation. For model assessment, we first propose to use large multi-modal models (LMMs) to evaluate the entity and style consistencies of open-domain interleaved image-text sequences. According to the LMM evaluation on our constructed evaluation set, the proposed interleaved generation framework can generate high-quality image-text content for various domains and applications, such as how-to question answering, storytelling, graphical story rewriting, and webpage/poster generation tasks. Moreover, we validate the effectiveness of the proposed LMM evaluation technique with human assessment. We hope our proposed framework, benchmark, and LMM evaluation can help establish the intriguing interleaved image-text generation task.
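The abstract describes the LLM as a coordinator that writes text, produces per-image visual prompts, and injects a shared global context into the T2I model so that entities and style stay consistent across images. The following is a minimal, hypothetical Python sketch of that control flow only. It assumes the LLM and T2I model are exposed as plain callables, and that the global context is injected by prepending it to each visual prompt. The names `plan_fn`, `context_fn`, `t2i_fn`, `Step`, and `build_image_prompt` are illustrative and are not OpenLEAF's API. The paper's actual prompting and context-injection mechanism may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step planned by the (hypothetical) LLM: a text segment plus a visual prompt."""
    text: str
    visual_prompt: str


@dataclass
class InterleavedItem:
    """One text/image pair in the generated interleaved sequence."""
    text: str
    image_prompt: str  # the prompt actually sent to the T2I model
    image: object      # whatever the T2I backend returns


def build_image_prompt(global_context: str, visual_prompt: str) -> str:
    # Assumed injection scheme: prepend the shared entity/style description
    # so every image is conditioned on the same global context.
    if not global_context:
        return visual_prompt
    return f"{global_context}. {visual_prompt}"


def generate_interleaved(
    query: str,
    plan_fn: Callable[[str], List[Step]],
    context_fn: Callable[[str, List[Step]], str],
    t2i_fn: Callable[[str], object],
) -> List[InterleavedItem]:
    # 1. The LLM turns the query into text segments and visual prompts.
    steps = plan_fn(query)
    # 2. The LLM summarizes the entities and style shared across all steps.
    global_context = context_fn(query, steps)
    # 3. Each image is generated from its own prompt plus the shared context.
    items = []
    for step in steps:
        prompt = build_image_prompt(global_context, step.visual_prompt)
        items.append(InterleavedItem(step.text, prompt, t2i_fn(prompt)))
    return items


if __name__ == "__main__":
    # Stub "models" standing in for a real LLM and a real T2I model.
    def plan(q):
        return [Step("Crack the eggs into a bowl.", "eggs cracked into a bowl"),
                Step("Whisk until smooth.", "a whisk beating eggs")]

    def context(q, steps):
        return "watercolor style, the same blue ceramic bowl"

    def fake_t2i(prompt):
        return f"<image for: {prompt}>"

    for item in generate_interleaved("How do I make scrambled eggs?", plan, context, fake_t2i):
        print(item.text, "|", item.image_prompt)
```

Keeping the shared context as a single string reused by every T2I call is the simplest way to express "one global context, many images". A real system could instead pass the context through a separate conditioning channel.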

Authors (8)
  1. Jie An (36 papers)
  2. Zhengyuan Yang (86 papers)
  3. Linjie Li (89 papers)
  4. Jianfeng Wang (149 papers)
  5. Kevin Lin (98 papers)
  6. Zicheng Liu (153 papers)
  7. Lijuan Wang (133 papers)
  8. Jiebo Luo (355 papers)
Citations (10)