- The paper introduces QS-Attn, a module that selects anchor features for contrastive learning by the entropy of their attention distributions, improving unpaired I2I translation.
- The method computes attention within a shared-encoder framework on the source domain only, then uses the selected low-entropy queries to route features identically in both domains, preserving critical image details.
- Experiments on Cityscapes, Horse→Zebra, and Cat→Dog show QS-Attn outperforming baselines such as CUT and CycleGAN on image-fidelity metrics.
An Expert Overview of "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation"
The paper "QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation" presents a query-selected attention mechanism for unpaired image-to-image (I2I) translation. It addresses a key limitation of existing methods, the indiscriminate selection of features for contrastive learning, and in doing so improves the quality and relevance of the translated images.
Core Contribution
At the heart of this research is the Query-Selected Attention (QS-Attn) module, which refines contrastive learning by choosing anchor points according to feature significance in the source domain. Earlier contrastive approaches such as CUT select features at random, which can include patches with little relevance to the translation (e.g., background texture in a Horse→Zebra task). QS-Attn instead measures the entropy of each query's attention distribution over all spatial positions: a low-entropy (peaked) distribution means the query attends to a few specific locations and therefore carries distinctive, domain-relevant information. Focusing the contrastive loss on these queries drives more precise and coherent translations.
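The selection criterion can be made concrete in a few lines. Below is a minimal PyTorch sketch of entropy-based query selection; the function name, tensor shapes, and the value of `k` are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def select_queries_by_entropy(feat: torch.Tensor, k: int):
    """Pick the k most significant queries from N flattened spatial features.

    feat: (N, C) feature matrix from the source image.
    Returns the indices of the k lowest-entropy queries and the
    reduced (k, N) attention matrix formed from their rows.
    """
    # Global attention: every feature attends to every other feature.
    attn = F.softmax(feat @ feat.t(), dim=1)                # (N, N), rows sum to 1
    # Shannon entropy per row; a peaked (low-entropy) row means the query
    # attends to a few specific locations, i.e. it is distinctive.
    entropy = -(attn * torch.log(attn + 1e-8)).sum(dim=1)   # (N,)
    idx = torch.argsort(entropy)[:k]                        # k lowest-entropy rows
    return idx, attn[idx]
```

In practice the features would come from intermediate layers of the generator's encoder, so `N` is the number of spatial positions at a given layer.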
Methodology
The QS-Attn module slots into a CUT-style I2I framework. A shared encoder first extracts features from both the source image and its translation. QS-Attn then computes an attention matrix by comparing query and key features within the source domain only, and measures the entropy of each row. The low-entropy queries are retained to form a reduced attention matrix, which routes (aggregates) value features identically in the source and target domains, ensuring that critical inter-feature relationships are preserved during translation.
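To make the routing step concrete, here is a hedged PyTorch sketch of how the reduced attention matrix can aggregate features in both domains before a PatchNCE-style contrastive loss, reusing `select_queries_by_entropy` from the sketch above. The values of `k` and the temperature `tau` are illustrative, and this is a simplification under those assumptions, not a reproduction of the authors' released code.

```python
import torch
import torch.nn.functional as F

def qs_attn_contrastive_loss(feat_src, feat_tgt, k=256, tau=0.07):
    """QS-Attn-style contrastive loss (sketch).

    feat_src, feat_tgt: (N, C) features of the source and translated images
    from the same shared encoder. Attention and query selection use the
    source only; the identical routing is then applied to both domains so
    inter-feature relations are compared consistently across translation.
    """
    idx, attn = select_queries_by_entropy(feat_src, k)
    # Route (aggregate) value features in both domains with the same matrix.
    v_src = attn @ feat_src                              # (k, C)
    v_tgt = attn @ feat_tgt                              # (k, C)
    # PatchNCE: the routed target feature at position i is the positive for
    # the routed source feature at i; all other positions are negatives.
    v_src = F.normalize(v_src, dim=1)
    v_tgt = F.normalize(v_tgt, dim=1)
    logits = (v_tgt @ v_src.t()) / tau                   # (k, k)
    labels = torch.arange(k, device=logits.device)
    return F.cross_entropy(logits, labels)
```

Minimizing this loss pulls each translated patch toward its source counterpart while pushing it away from the other selected patches, which is what preserves content through the translation.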
Experimental Evaluation
The effectiveness of QS-Attn was validated on three datasets: Cityscapes, Horse→Zebra, and Cat→Dog. QS-Attn outperformed existing methods such as CUT and CycleGAN on Fréchet Inception Distance (FID) and Sliced Wasserstein Distance (SWD), indicating higher fidelity and realism in the generated images. On Cityscapes, where outputs can additionally be scored by running a pretrained segmentation network against the ground-truth labels, the method achieved state-of-the-art mean average precision (mAP), pixel accuracy, and class accuracy.
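As a point of reference, FID between a folder of translated images and a folder of real target-domain images is commonly computed with the third-party pytorch-fid package. The directory paths below are placeholders, and this is standard evaluation tooling rather than anything specific to the paper.

```python
import torch
from pytorch_fid import fid_score  # pip install pytorch-fid

# Placeholder paths: real target-domain images vs. the model's outputs.
paths = ["data/zebra_real", "results/horse2zebra_fake"]
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 2048-dim Inception-v3 pool3 features, the standard FID configuration.
fid = fid_score.calculate_fid_given_paths(paths, batch_size=50,
                                          device=device, dims=2048)
print(f"FID: {fid:.2f}")  # lower is better
```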
Implications
Practically, QS-Attn offers a more reliable pathway for I2I translation by balancing computational cost against translation quality: selecting queries shrinks the attention matrix rather than adding learned parameters. Theoretically, the work demonstrates how attention can steer contrastive learning, a pattern that could be adapted to other domains requiring nuanced feature distinction and translation.
Future Directions
The scalability of QS-Attn to various translation tasks, including those with multi-directional or multi-domain requirements, represents a promising avenue for future research. Additionally, exploring alternative metrics for feature significance and incorporating hybrid attention mechanisms could further enhance the robustness and versatility of the model.
In conclusion, the QS-Attn module marks a meaningful refinement of contrastive learning strategies for I2I translation, advancing both image synthesis quality and computational efficiency. The work offers a compelling blueprint for leveraging attention-based mechanisms to strengthen generative models in unsupervised learning contexts.