
GlyphDiffusion: Text Generation as Image Generation (2304.12519v2)

Published 25 Apr 2023 in cs.CL, cs.CV, and cs.LG

Abstract: Diffusion models have become a new generative paradigm for text generation. Considering the discrete categorical nature of text, in this paper, we propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image generation. Our key idea is to render the target text as a glyph image containing visual language content. In this way, conditional text generation can be cast as a glyph image generation task, and it is then natural to apply continuous diffusion models to discrete texts. Specifically, we utilize a cascaded architecture (i.e., a base and a super-resolution diffusion model) to generate high-fidelity glyph images, conditioned on the input text. Furthermore, we design a text grounding module to transform and refine the visual language content from generated glyph images into the final texts. In experiments over four conditional text generation tasks and two classes of metrics (i.e., quality and diversity), GlyphDiffusion achieves comparable or even better results than several baselines, including pretrained LLMs. Our model also improves significantly over a recent diffusion-based text generation model.
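The central idea is that text is first rendered as a glyph image, so that a continuous image diffusion model can be trained on it. The minimal sketch below (not the authors' code) illustrates only this rendering step with Pillow; the canvas size, font, and layout parameters are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch, assuming a simple left-to-right layout on a square grayscale
# canvas. The font, image size, and margins are illustrative choices, not the
# settings used in GlyphDiffusion.
from PIL import Image, ImageDraw, ImageFont


def render_glyph_image(text: str, size: int = 256, line_height: int = 18) -> Image.Image:
    """Render `text` as a glyph image: black characters on a white canvas."""
    canvas = Image.new("L", (size, size), color=255)  # white background
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.load_default()  # assumed font; the paper does not fix one here

    # Naive word wrapping: place words left to right, start a new line when
    # the current one would overflow the canvas width.
    x, y, margin = 4, 4, 4
    for word in text.split():
        width = draw.textlength(word + " ", font=font)
        if x + width > size - margin:
            x, y = margin, y + line_height
        draw.text((x, y), word, fill=0, font=font)  # black glyphs
        x += width
    return canvas


if __name__ == "__main__":
    img = render_glyph_image(
        "Diffusion models have become a new generative paradigm for text generation."
    )
    img.save("glyph_example.png")
```

In the paper, generation runs in the opposite direction: the cascaded diffusion model (base plus super-resolution) produces such a glyph image conditioned on the input text, and the text grounding module reads the rendered visual content back out as the final output text.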

Authors (4)
  1. Junyi Li (92 papers)
  2. Wayne Xin Zhao (196 papers)
  3. Jian-Yun Nie (70 papers)
  4. Ji-Rong Wen (299 papers)
Citations (2)
