
Image-to-Image Translation with Text Guidance (2002.05235v1)

Published 12 Feb 2020 in cs.CV, cs.CL, and cs.LG

Abstract: The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, so that text descriptions determine the visual attributes of synthetic images. We propose four key components: (1) part-of-speech tagging to filter out non-semantic words in the given description, (2) an affine combination module to effectively fuse text and image features from different modalities, (3) a novel refined multi-stage architecture to strengthen the differential ability of the discriminators and the rectification ability of the generators, and (4) a new structure loss that further helps the discriminators distinguish real from synthetic images. Extensive experiments on the COCO dataset demonstrate that our method achieves superior performance in both visual realism and semantic consistency with the given descriptions.
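
As a rough illustration of component (1), the sketch below filters a description that has already been part-of-speech tagged. It is not the authors' implementation: the function name `filter_semantic_words`, the use of Penn Treebank tags, and the choice to keep nouns, adjectives and verbs are all assumptions made for this example, since the abstract does not say which tags count as semantic.

```python
# Assumption: tokens carry Penn Treebank tags, and only nouns (NN*),
# adjectives (JJ*) and verbs (VB*) are treated as "semantic" words.
# The abstract does not specify the actual tag set used in the paper.
SEMANTIC_TAG_PREFIXES = ("NN", "JJ", "VB")


def filter_semantic_words(tagged_tokens):
    """Return the words whose POS tag starts with one of the kept prefixes.

    tagged_tokens: iterable of (word, tag) pairs from any POS tagger.
    """
    return [
        word
        for word, tag in tagged_tokens
        if tag.startswith(SEMANTIC_TAG_PREFIXES)
    ]


# Hypothetical, hand-tagged description used only for illustration.
tagged = [
    ("a", "DT"), ("brown", "JJ"), ("horse", "NN"), ("standing", "VBG"),
    ("in", "IN"), ("the", "DT"), ("field", "NN"),
]
print(filter_semantic_words(tagged))  # ['brown', 'horse', 'standing', 'field']
```

Taking pre-tagged input keeps the sketch independent of any particular tagger; in practice the (word, tag) pairs would come from whichever POS tagger is in use.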

Authors (4)
  1. Bowen Li
  2. Xiaojuan Qi
  3. Philip H. S. Torr
  4. Thomas Lukasiewicz
Citations (20)
