
Adversarial Open Domain Adaptation for Sketch-to-Photo Synthesis (2104.05703v2)

Published 12 Apr 2021 in cs.CV and cs.AI

Abstract: In this paper, we explore open-domain sketch-to-photo translation, which aims to synthesize a realistic photo from a freehand sketch with its class label, even if the sketches of that class are missing in the training data. It is challenging due to the lack of training supervision and the large geometric distortion between the freehand sketch and photo domains. To synthesize the absent freehand sketches from photos, we propose a framework that jointly learns sketch-to-photo and photo-to-sketch generation. However, the generator trained from fake sketches might lead to unsatisfying results when dealing with sketches of missing classes, due to the domain gap between synthesized sketches and real ones. To alleviate this issue, we further propose a simple yet effective open-domain sampling and optimization strategy to "fool" the generator into treating fake sketches as real ones. Our method takes advantage of the learned sketch-to-photo and photo-to-sketch mapping of in-domain data and generalizes it to the open-domain classes. We validate our method on the Scribble and SketchyCOCO datasets. Compared with the recent competing methods, our approach shows impressive results in synthesizing realistic color, texture, and maintaining the geometric composition for various categories of open-domain sketches. Our code is available at https://github.com/Mukosame/AODA

Authors (6)
  1. Xiaoyu Xiang (26 papers)
  2. Ding Liu (52 papers)
  3. Xiao Yang (158 papers)
  4. Yiheng Zhu (18 papers)
  5. Xiaohui Shen (67 papers)
  6. Jan P. Allebach (17 papers)
Citations (36)

Summary

Adversarial Open Domain Adaptation for Sketch-to-Photo Synthesis

The paper "Adversarial Open Domain Adaptation for Sketch-to-Photo Synthesis" presents a novel framework for open-domain sketch-to-photo translation: synthesizing a realistic photo from a freehand sketch and its class label, even when no sketches of that class appear in the training data. The task is challenging both because supervision is absent for the missing classes and because of the large geometric distortions between the sketch and photo domains.

Framework Overview

The proposed solution introduces an Adversarial Open Domain Adaptation (AODA) framework. It jointly learns the sketch-to-photo and photo-to-sketch mappings using generative adversarial networks (GANs), an approach that has proven effective across many image synthesis tasks. The architecture comprises two generators that translate between the photo and sketch domains, supported by discriminators that push the synthetic outputs to be indistinguishable from real data in their respective domains. Additionally, a photo classifier enforces that generated photos match their specified class labels, improving the fidelity of the outputs.
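The joint two-generator structure can be illustrated with a minimal sketch. The toy "generators" below are hypothetical linear maps (the paper uses deep convolutional networks); they are chosen to invert each other exactly so the cycle-consistency loss that ties the two mappings together comes out near zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two generators: linear maps that are
# exact inverses, so a photo -> sketch -> photo round trip reconstructs
# the input. Real AODA generators are deep conv networks.
W_s2p = rng.standard_normal((8, 8)) * 0.1 + np.eye(8)
W_p2s = np.linalg.inv(W_s2p)

def g_photo(sketch):           # sketch-to-photo generator
    return sketch @ W_s2p

def g_sketch(photo):           # photo-to-sketch generator
    return photo @ W_p2s

photo = rng.standard_normal((1, 8))
fake_sketch = g_sketch(photo)             # photo-to-sketch branch
reconstruction = g_photo(fake_sketch)     # sketch-to-photo branch

# Cycle-consistency (L1) loss: near zero because the toy maps invert
# each other; training drives real generators toward this behavior.
cycle_loss = np.abs(photo - reconstruction).mean()
```

In the full framework this cycle term is combined with the adversarial losses from the two discriminators and the classification loss from the photo classifier.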

The primary innovation is a novel open-domain sampling and optimization strategy. This approach trains the generator to treat synthesized sketches as if they were real, improving its ability to generalize and to synthesize realistic outputs for classes absent from the training data. By leveraging the mappings learned on in-domain classes, the method extends to open-domain classes and bridges the domain gap between synthesized and real sketches.
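The sampling strategy can be sketched as a simple fallback rule. Everything below is illustrative: the class names, pools, and mixing probability are hypothetical, and the real method applies this idea inside the GAN training loop rather than over string placeholders:

```python
import random

random.seed(0)

# Hypothetical data pools: "cat" has real freehand sketches (in-domain),
# "giraffe" has only synthesized sketches (open-domain, photos only).
real_sketches = {"cat": ["cat_real_1", "cat_real_2"]}
fake_sketches = {"cat": ["cat_fake_1"], "giraffe": ["giraffe_fake_1"]}

def sample_sketch(cls):
    """Open-domain sampling: for in-domain classes, mix real and fake
    sketches; for open-domain classes, fall back to a synthesized sketch
    but label it as real, so the generator is optimized as if the fake
    sketch came from the true sketch domain ("fooling" the generator)."""
    if cls in real_sketches and random.random() < 0.5:
        return real_sketches[cls][0], True   # genuinely real sample
    sketch = fake_sketches[cls][0]
    treat_as_real = cls not in real_sketches  # open-domain: lie to the generator
    return sketch, treat_as_real
```

For the open-domain class, `sample_sketch("giraffe")` returns the synthesized sketch flagged as real, which is exactly the signal that closes the gap between fake and real sketches during optimization.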

Numerical Results and Claims

The performance of the proposed framework was validated on the Scribble and SketchyCOCO datasets, which contain a wide array of sketch categories. The results highlight the ability of the AODA framework to synthesize high-quality outputs that maintain realistic textures, colors, and compositions, even in the face of significant class and data variability.

Quantitatively, the paper uses the Fréchet Inception Distance (FID), which measures the distance between the feature distributions of generated and real images, along with the classification accuracy of generated images under a trained classifier. The proposed method outperforms competing approaches on all evaluated datasets: it achieves lower FID scores, indicating higher fidelity, and higher classification accuracy, confirming the realism of the generated images. A user preference study further supports these results, with human evaluators preferring outputs from the proposed method over those of existing competitors.
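FID itself has a closed form once each image set is summarized by the mean and covariance of its Inception features: FID = ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^{1/2}). A minimal implementation (feature statistics here are placeholder Gaussians, not actual Inception features):

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet Inception Distance between two Gaussians fitted to the
    Inception features of real and generated image sets."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)   # matrix square root
    if np.iscomplexobj(covmean):              # discard tiny imaginary parts
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# Identical distributions give FID = 0; lower is better.
mu, sigma = np.zeros(4), np.eye(4)
score = fid(mu, sigma, mu, sigma)
```

Shifting one mean or covariance away from the other increases the score, which is why lower FID is read as higher fidelity.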

Implications and Future Directions

The implications of this work extend across both theoretical and practical domains. Theoretically, it provides insights into the generalizability of GANs in scenarios where training data is explicitly incomplete or missing. Practically, the framework broadens the applicability of sketch-based content creation tools, enhancing user-oriented applications such as sketch-based image retrieval, augmented reality, and rapid content prototyping.

Future research directions could explore further enhancements of the AODA framework by integrating more sophisticated sketch abstraction techniques, leveraging higher resolution output generation, and adapting the framework to other modalities like 3D shapes or animations. Additionally, examining alternative adversarial training paradigms or architectures to further fine-tune generator and discriminator interactions may yield even better results in open-domain settings.

In conclusion, the research presented in this paper makes a significant contribution by advancing the capabilities of sketch-to-photo synthesis through adversarial open domain adaptation, offering a promising direction for future exploration in computer vision and image synthesis fields.
