
TOSS: High-quality Text-guided Novel View Synthesis from a Single Image (2310.10644v1)

Published 16 Oct 2023 in cs.CV

Abstract: In this paper, we present TOSS, which introduces text to the task of novel view synthesis (NVS) from just a single RGB image. While Zero-1-to-3 has demonstrated impressive zero-shot open-set NVS capability, it treats NVS as a pure image-to-image translation problem. This approach suffers from the challengingly under-constrained nature of single-view NVS: the process lacks means of explicit user control and often results in implausible NVS generations. To address this limitation, TOSS uses text as high-level semantic information to constrain the NVS solution space. TOSS fine-tunes text-to-image Stable Diffusion pre-trained on large-scale text-image pairs and introduces modules specifically tailored to image and camera pose conditioning, as well as dedicated training for pose correctness and preservation of fine details. Comprehensive experiments are conducted with results showing that our proposed TOSS outperforms Zero-1-to-3 with more plausible, controllable and multiview-consistent NVS results. We further support these results with comprehensive ablations that underscore the effectiveness and potential of the introduced semantic guidance and architecture design.

Authors (10)
  1. Yukai Shi (44 papers)
  2. Jianan Wang (44 papers)
  3. He Cao (18 papers)
  4. Boshi Tang (11 papers)
  5. Xianbiao Qi (38 papers)
  6. Tianyu Yang (67 papers)
  7. Yukun Huang (39 papers)
  8. Shilong Liu (60 papers)
  9. Lei Zhang (1691 papers)
  10. Heung-Yeung Shum (32 papers)
Citations (15)

