AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks (1911.11897v5)

Published 27 Nov 2019 in cs.CV, cs.LG, and eess.IV

Abstract: State-of-the-art methods in image-to-image translation are capable of learning a mapping from a source domain to a target domain with unpaired image data. Though the existing methods have achieved promising results, they still produce visual artifacts, being able to translate low-level information but not high-level semantics of input images. One possible reason is that generators do not have the ability to perceive the most discriminative parts between the source and target domains, thus making the generated images low quality. In this paper, we propose a new Attention-Guided Generative Adversarial Networks (AttentionGAN) for the unpaired image-to-image translation task. AttentionGAN can identify the most discriminative foreground objects and minimize the change of the background. The attention-guided generators in AttentionGAN are able to produce attention masks, and then fuse the generation output with the attention masks to obtain high-quality target images. Accordingly, we also design a novel attention-guided discriminator which only considers attended regions. Extensive experiments are conducted on several generative tasks with eight public datasets, demonstrating that the proposed method is effective to generate sharper and more realistic images compared with existing competitive models. The code is available at https://github.com/Ha0Tang/AttentionGAN.

Authors (5)
  1. Hao Tang (379 papers)
  2. Hong Liu (395 papers)
  3. Dan Xu (120 papers)
  4. Philip H. S. Torr (219 papers)
  5. Nicu Sebe (270 papers)
Citations (187)

Summary

AttentionGAN: Unpaired Image-to-Image Translation using Attention-Guided Generative Adversarial Networks

The paper introduces "AttentionGAN," a novel approach for unpaired image-to-image translation utilizing Attention-Guided Generative Adversarial Networks. This method addresses the limitations of conventional GAN models, which often struggle with visual artifacts and fail to fully translate high-level semantics across domains.

Key Contributions

  1. Attention-Guided Generators and Discriminators: AttentionGAN employs attention-guided generators capable of identifying key discriminative foreground elements while maintaining background integrity. This is complemented by attention-guided discriminators that focus solely on the prominent regions, enhancing the translation quality.
  2. Two Attention-Guided Generation Schemes: The paper proposes two schemes. Scheme I integrates attention into the basic CycleGAN architecture, showing strengths in facial expression translation. Scheme II, with separate networks for attention and content masks, tackles more complex translations such as horse-to-zebra, ensuring better foreground translation without altering the background (a fusion sketch follows this list).
  3. Enhanced Loss Functions: The introduction of attention loss and an updated cycle-consistency loss facilitates effective generator training, steering the attention masks toward relevant image regions.
  4. Extensive Experimental Validation: Experiments across eight datasets demonstrate superior performance of AttentionGAN against various state-of-the-art models. Notably, it outperforms alternatives on tasks with significant semantic changes and complex backgrounds.
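To make the Scheme II fusion concrete, below is a minimal PyTorch sketch of combining foreground content masks with attention masks while copying the background directly from the input. The tensor shapes, mask count `n`, and the function name `fuse_attention` are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def fuse_attention(x, content_masks, attention_masks):
    """Sketch of attention-guided fusion (shapes are assumptions).

    x:               input image,                        (B, 3, H, W)
    content_masks:   n-1 foreground RGB content masks,   (B, n-1, 3, H, W)
    attention_masks: n attention masks, softmaxed over the mask dimension,
                     last one treated as background,     (B, n, 1, H, W)
    """
    fg_attention = attention_masks[:, :-1]   # (B, n-1, 1, H, W)
    bg_attention = attention_masks[:, -1]    # (B, 1, H, W)

    # Foreground: each generated content mask is weighted by its attention mask.
    foreground = (content_masks * fg_attention).sum(dim=1)  # (B, 3, H, W)

    # Background: copied straight from the input, so unattended regions
    # are preserved rather than re-synthesized.
    background = x * bg_attention                            # (B, 3, H, W)

    return foreground + background


# Toy usage with random tensors (B=2, n=10 masks, 64x64 images).
B, n, H, W = 2, 10, 64, 64
x = torch.rand(B, 3, H, W)
content = torch.tanh(torch.rand(B, n - 1, 3, H, W))
attention = torch.softmax(torch.rand(B, n, 1, H, W), dim=1)
print(fuse_attention(x, content, attention).shape)  # torch.Size([2, 3, 64, 64])
```

Because the background term reuses the input pixels, the generator only needs to synthesize the attended foreground, which is the mechanism the paper credits for avoiding background artifacts.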

Strong Results

  • Quantitative Improvements: The model achieves lower FID and KID scores than competing state-of-the-art methods, confirming improved realism and quality of the generated images (a short sketch of the FID computation follows this list).
  • Visual Fidelity: AttentionGAN maintains clear contextual details in its outputs, substantiating its effectiveness through Amazon Mechanical Turk (AMT) perceptual studies and visual inspections.
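For reference, FID measures the Fréchet distance between Gaussians fitted to Inception-v3 features of real and generated images. The snippet below sketches only that distance computation; the feature-extraction step is omitted, and the random vectors stand in for pooled Inception features.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# Toy example: in practice these features come from an Inception-v3 pooling
# layer applied to real and generated images; random vectors stand in here.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 64))
fake_feats = rng.normal(loc=0.1, size=(500, 64))

mu_r, sig_r = real_feats.mean(0), np.cov(real_feats, rowvar=False)
mu_f, sig_f = fake_feats.mean(0), np.cov(fake_feats, rowvar=False)
print(frechet_distance(mu_r, sig_r, mu_f, sig_f))
```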

Implications and Future Directions

The proposed methodology sets a precedent for leveraging attention mechanisms within unsupervised domain translation tasks, suggesting applications in fields that require precise, localized visual alterations, such as medical imaging or autonomous driving.

Future research could explore extending these mechanisms to multi-modal translations or integrating additional domain knowledge to further constrain and guide the generation process. Further, investigating lightweight models or real-time implementations could facilitate broader applicability in latency-sensitive applications.

In summary, AttentionGAN represents an advanced strategy in unpaired image-to-image translation, showcasing the substantial benefits of incorporating attention into GAN frameworks. The approach paves the way for more nuanced and effective domain translation techniques, inspiring continued exploration in GAN research.