
Semantically Invariant Text-to-Image Generation (1809.10274v1)

Published 27 Sep 2018 in cs.LG, cs.CL, cs.CV, and stat.ML

Abstract: Image captioning has demonstrated models that are capable of generating plausible text given input images or videos. Further, recent work in image generation has shown significant improvements in image quality when text is used as a prior. Our work ties these concepts together by creating an architecture that can enable bidirectional generation of images and text. We call this network Multi-Modal Vector Representation (MMVR). Along with MMVR, we propose two improvements to text-conditioned image generation. First, an n-gram metric based cost function is introduced that generalizes the caption with respect to the image. Second, multiple semantically similar sentences are shown to help in generating better images. Qualitative and quantitative evaluations demonstrate that MMVR improves upon existing text-conditioned image generation results by over 20%, while integrating visual and text modalities.

Authors (7)
  1. Shagan Sah
  2. Dheeraj Peri
  3. Ameya Shringi
  4. Chi Zhang
  5. Miguel Dominguez
  6. Andreas Savakis
  7. Ray Ptucha
Citations (9)
