Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text (2204.11964v3)

Published 25 Apr 2022 in cs.CV

Abstract: In this paper, we extend scene understanding to include that of human sketch. The result is a complete trilogy of scene representation from three diverse and complementary modalities -- sketch, photo, and text. Instead of learning a rigid three-way embedding and be done with it, we focus on learning a flexible joint embedding that fully supports the ``optionality" that this complementarity brings. Our embedding supports optionality on two axes: (i) optionality across modalities -- use any combination of modalities as query for downstream tasks like retrieval, (ii) optionality across tasks -- simultaneously utilising the embedding for either discriminative (e.g., retrieval) or generative tasks (e.g., captioning). This provides flexibility to end-users by exploiting the best of each modality, therefore serving the very purpose behind our proposal of a trilogy in the first place. First, a combination of information-bottleneck and conditional invertible neural networks disentangle the modality-specific component from modality-agnostic in sketch, photo, and text. Second, the modality-agnostic instances from sketch, photo, and text are synergised using a modified cross-attention. Once learned, we show our embedding can accommodate a multi-facet of scene-related tasks, including those enabled for the first time by the inclusion of sketch, all without any task-specific modifications. Project Page: \url{http://www.pinakinathc.me/scenetrilogy}

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Pinaki Nath Chowdhury (37 papers)
  2. Ayan Kumar Bhunia (63 papers)
  3. Aneeshan Sain (40 papers)
  4. Subhadeep Koley (21 papers)
  5. Tao Xiang (324 papers)
  6. Yi-Zhe Song (120 papers)
Citations (23)

Summary

We haven't generated a summary for this paper yet.

Youtube Logo Streamline Icon: https://streamlinehq.com