- The paper introduces a novel Cross-domain Correspondence Network (CoCosNet) that aligns diverse image representations to enhance localized style transfer.
- It employs a spatially adaptive translation network with weakly supervised learning to generate photo-realistic images with high semantic consistency.
- Experimental results on benchmarks like ADE20k and CelebA-HQ show superior image quality and style adherence compared to previous methods.
Cross-domain Correspondence Learning for Exemplar-based Image Translation
The paper presents an approach to exemplar-based image translation that establishes and exploits dense cross-domain correspondence. Given an input from one domain, such as a semantic segmentation mask, an edge map, or pose keypoints, together with an exemplar image from another domain, the method generates a photo-realistic output whose content follows the input and whose style follows the exemplar. The compelling aspect of the approach is that it maintains both semantic consistency with the input and faithful, localized style adherence to the exemplar.
Key Contributions
- Cross-domain Correspondence Network: The authors introduce the Cross-domain Correspondence Network (CoCosNet), which embeds inputs from both domains into a shared intermediate domain where dense semantic correspondence can be established. This overcomes a key limitation of earlier exemplar-based methods, which encode only a global style vector, and enables faithful localized style transfer across domains (see the correspondence sketch after this list).
- Image Translation Network: In tandem with the correspondence network, a translation network synthesizes the final output using spatially adaptive denormalization, injecting style statistics per spatial location rather than globally. Because it is fed the exemplar warped by the dense correspondence, it can align semantic structures between input and exemplar effectively (a denormalization sketch also follows the list).
- Weakly Supervised Learning Framework: The two networks are trained jointly, so that the translation objective indirectly supervises the correspondence and vice versa. This bypasses the need for explicit correspondence annotations, which are rarely available in practice, and makes the model adaptable to varied translation tasks (a training sketch appears below).
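To make the correspondence step concrete, the following is a minimal PyTorch sketch of the mechanism, assuming both inputs have already been embedded into the shared intermediate domain by domain-specific encoders. The function name, shapes, and temperature value are illustrative rather than the authors' code; the softmax-weighted warping, however, is the standard way such dense correspondence is applied.

```python
import torch
import torch.nn.functional as F

def warp_by_correspondence(feat_input, feat_exemplar, exemplar_rgb, tau=0.01):
    """Warp an exemplar image to the spatial layout of the input.

    feat_input:    (B, C, H, W) features of the input (e.g., a segmentation mask)
    feat_exemplar: (B, C, H, W) features of the exemplar photo
    exemplar_rgb:  (B, 3, H, W) exemplar image, downsampled to the feature size
    Both feature maps are assumed to live in the shared intermediate domain.
    """
    B, C, H, W = feat_input.shape
    # Flatten spatial dims and normalize channel-wise so the correlation
    # matrix contains cosine similarities between locations.
    a = F.normalize(feat_input.view(B, C, H * W), dim=1)     # (B, C, HW)
    b = F.normalize(feat_exemplar.view(B, C, H * W), dim=1)  # (B, C, HW)
    corr = torch.bmm(a.transpose(1, 2), b)                   # (B, HW, HW)

    # Soft attention over exemplar locations; a small temperature sharpens
    # the weights toward a (soft) one-to-one correspondence.
    attn = F.softmax(corr / tau, dim=-1)                     # (B, HW, HW)

    # Each input location receives a weighted average of exemplar colors.
    colors = exemplar_rgb.view(B, 3, H * W)                  # (B, 3, HW)
    warped = torch.bmm(colors, attn.transpose(1, 2))         # (B, 3, HW)
    return warped.view(B, 3, H, W)
```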
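The translation network then consumes the warped exemplar through spatially adaptive denormalization layers. Below is a minimal sketch in the spirit of SPADE, where per-pixel scale and bias are predicted from the conditioning input instead of a global style code; the channel counts and hidden width are assumptions for illustration.

```python
import torch
import torch.nn as nn

class SpatiallyAdaptiveNorm(nn.Module):
    """Normalize features, then modulate them with per-pixel scale and bias
    predicted from a conditioning image (here: the warped exemplar)."""

    def __init__(self, num_features, cond_channels, hidden=128):
        super().__init__()
        # Parameter-free normalization; the modulation re-introduces statistics.
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(cond_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.to_gamma = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)

    def forward(self, x, cond):
        # Resize the conditioning input to the current feature resolution.
        cond = nn.functional.interpolate(cond, size=x.shape[2:], mode="nearest")
        h = self.shared(cond)
        gamma, beta = self.to_gamma(h), self.to_beta(h)
        # Spatially varying affine transform: style is injected per location.
        return self.norm(x) * (1 + gamma) + beta
```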
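As for the weak supervision, one way to understand it is through the construction of training pairs: since a ground-truth photo exists for each input, a pseudo exemplar can be made by geometrically distorting that photo, and reconstructing the original then supervises both networks jointly. The sketch below illustrates only this idea; the actual training combines several additional losses (perceptual, adversarial, and correspondence regularizers) not shown here, and all names are hypothetical.

```python
import torch
import torchvision.transforms as T

# Random geometric distortion used to turn a ground-truth photo into a
# pseudo exemplar: same content, different geometry.
augment = T.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1))

def training_step(model, input_map, ground_truth):
    pseudo_exemplar = augment(ground_truth)      # warp the ground-truth photo
    output = model(input_map, pseudo_exemplar)   # translate using the exemplar
    # Reconstructing ground_truth supervises both networks at once: the
    # correspondence must undo the warp, and the generator must keep the style.
    return torch.nn.functional.l1_loss(output, ground_truth)
```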
Experimental Results
The method is evaluated on the ADE20k, CelebA-HQ, and DeepFashion benchmarks against established methods including Pix2pixHD, SPADE, and MUNIT. The assessment covers image quality (measured with FID and SWD), semantic consistency, and style adherence.
- Image Quality: CoCosNet achieves lower FID and SWD scores than all compared methods, indicating that its outputs lie closer to the distribution of real images in both overall structure and fine texture and color (a hedged FID sketch follows this list).
- Semantic Consistency: Measured by comparing high-level features from a pretrained VGG network, CoCosNet maintains stronger semantic alignment between its outputs and the original inputs than prior methods (see the similarity sketch below).
- Style Relevance: A highlight of the paper is CoCosNet's ability to preserve instance-level style details from the exemplar, such as the color and texture of individual objects, a significant improvement over methods that transfer only a global style.
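For reference, FID compares the statistics of Inception features between real and generated images: $\mathrm{FID} = \lVert \mu_r - \mu_g \rVert^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)$, with lower being better. A minimal NumPy/SciPy sketch, assuming the Inception activations have already been extracted:

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """Fréchet Inception Distance between two sets of Inception activations,
    each an array of shape (num_images, feature_dim)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the product of the two covariance matrices.
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)
    covmean = covmean.real  # discard tiny imaginary parts from numerics
    diff = mu_r - mu_g
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)
```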
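Semantic consistency can be probed by comparing high-level features of an ImageNet-pretrained VGG network between the output and the input's ground truth. The sketch below shows one such metric; the specific layer cut (after relu4_4 of VGG19) is an illustrative assumption, not necessarily the layer used in the paper.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Truncate VGG19 after relu4_4 (feature index 26) as a high-level probe.
vgg = vgg19(weights="IMAGENET1K_V1").features[:27].eval()

@torch.no_grad()
def semantic_consistency(img_a, img_b):
    """Mean cosine similarity between high-level VGG features of two image
    batches, each a (B, 3, H, W) tensor normalized with ImageNet statistics."""
    fa, fb = vgg(img_a), vgg(img_b)
    return F.cosine_similarity(fa.flatten(1), fb.flatten(1)).mean()
```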
Practical Applications
The paper demonstrates practical applications of the method, including semantic image editing and makeup transfer. Both highlight CoCosNet's ability to transfer and adapt style at a detailed, instance level, showing flexibility and precision in style-driven edits.
Challenges and Future Directions
The paper notes certain limitations of CoCosNet, such as maintaining style adherence when multiple instances of the same semantic class appear, and the computational cost of dense correspondence at high resolution (the correlation matrix grows quadratically with the number of spatial locations). Future work is suggested to improve the scalability and efficiency of the network, particularly toward higher-resolution outputs and faster inference.
Overall, the paper presents a robust framework for exemplar-based image translation built on cross-domain correspondence learning. It opens new possibilities for highly controllable image generation, with implications for visual content creation, artistic style transfer, and automated editing, and it sets a foundation for further work on correspondence-driven image synthesis.