Attention-Guided Generative Adversarial Networks for Unsupervised Image-to-Image Translation
This paper introduces the Attention-Guided Generative Adversarial Network (AGGAN), a novel approach to unsupervised image-to-image translation with Generative Adversarial Networks (GANs). Traditional GAN models, while capable of learning mappings between image domains from unpaired data, often produce visual artifacts and have difficulty transferring high-level semantic content. AGGAN addresses these shortcomings by building attention mechanisms into both the generator and discriminator modules, focusing the translation on semantically important regions to improve image quality.
Key Contributions
- Attention-Guided Generators: AGGAN incorporates an attention mechanism directly into the generators, which learn to detect discriminative semantic objects while leaving non-discriminative areas largely unchanged. Each generator produces an attention mask that isolates the important image regions, yielding better semantic conversion and sharper imagery (see the sketch after this list).
- Attention-Guided Discriminators: The paper proposes novel discriminator architectures that consider only the regions highlighted by the attention masks. Restricting adversarial feedback to the attended areas improves the adversarial training process and helps the generators produce more accurate image translations.
- Comprehensive Loss Function: AGGAN is optimized with a combination of adversarial, cycle-consistency, pixel, and attention losses. This multi-term objective balances semantic accuracy against image quality while maintaining consistency and discouraging unnecessary alterations (an illustrative combination follows the sketch below).
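The core mechanism behind the first two contributions is that the generator predicts an attention mask alongside the translated content and uses it to blend the translation with the input, while the discriminator only examines the attended region. The following PyTorch-style sketch illustrates this idea; the backbone's channel layout and the helper names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class AttentionGuidedGenerator(nn.Module):
    """Minimal sketch: a backbone predicts both a translated image and a
    single-channel attention mask; the mask blends the translation with
    the input so non-attended regions pass through unchanged."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        # Assumed: backbone maps B x 3 x H x W -> B x 4 x H x W
        self.backbone = backbone

    def forward(self, x):
        feats = self.backbone(x)
        content = torch.tanh(feats[:, :3])        # candidate translated image
        mask = torch.sigmoid(feats[:, 3:4])       # attention mask in [0, 1]
        out = mask * content + (1.0 - mask) * x   # keep background from the input
        return out, mask


def attended_discriminator_input(image, mask):
    """Attention-guided discrimination (sketch): the discriminator only
    sees the attended (foreground) region of the image."""
    return image * mask
```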
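The full objective combines the four loss terms listed above. The sketch below shows one plausible way to combine them; the least-squares adversarial formulation, the total-variation regularizer on the attention mask, and the weights `lam_*` are assumptions chosen for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def aggan_style_generator_loss(d_fake_logits, x, x_rec, y_fake, mask,
                               lam_cycle=10.0, lam_pixel=1.0, lam_attn=1e-4):
    """Illustrative combination of adversarial, cycle-consistency, pixel,
    and attention losses; weights and formulations are assumptions."""
    # Adversarial term (least-squares variant, one common choice)
    adv = F.mse_loss(d_fake_logits, torch.ones_like(d_fake_logits))
    # Cycle-consistency: reconstruct x after a round trip through both domains
    cyc = F.l1_loss(x_rec, x)
    # Pixel term: keep the translation close to the input outside the edited content
    pix = F.l1_loss(y_fake, x)
    # Attention term: total-variation smoothness on the mask (one possible regularizer)
    tv = (mask[:, :, 1:, :] - mask[:, :, :-1, :]).abs().mean() + \
         (mask[:, :, :, 1:] - mask[:, :, :, :-1]).abs().mean()
    return adv + lam_cycle * cyc + lam_pixel * pix + lam_attn * tv
```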
Experimental Evaluation
AGGAN is evaluated on several facial datasets, including CelebA, RaFD, AR Face, and BU-3DFE, which together cover a diverse range of facial expressions and backgrounds. The experimental results demonstrate AGGAN's ability to generate high-quality, realistic images that more accurately reflect the intended semantic transformations than existing models such as CycleGAN and StarGAN.
Quantitatively, AGGAN shows improvements in metrics such as Peak Signal-to-Noise Ratio (PSNR) and Mean Squared Error (MSE), outperforming many baseline models in producing visually compelling images. Furthermore, a user study conducted on Amazon Mechanical Turk (AMT) indicates a higher human preference for AGGAN-generated images over those of the baseline models.
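For reference, the two quantitative metrics mentioned above are standard image-quality measures: higher PSNR and lower MSE indicate reconstructions closer to the ground truth. A minimal NumPy implementation, assuming 8-bit images and not tied to the paper's evaluation code, looks like this:

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two images with values in [0, 255]."""
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)


def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher is better (i.e., lower MSE)."""
    err = mse(a, b)
    return float("inf") if err == 0 else 10.0 * np.log10((max_val ** 2) / err)
```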
Implications and Future Work
The introduction of attention mechanisms into both the generator and discriminator modules represents a significant advancement in the development of unsupervised image translation models. By focusing computational effort on the most semantically relevant areas, AGGAN reduces artifacts and enhances image quality. This attention-guided approach can be a valuable addition to other GAN-based frameworks, potentially extending beyond face translation tasks to broader domains requiring fine-grained semantic manipulation.
Future research could explore the integration of AGGAN with other GAN variants and the potential adaptation of its attention mechanisms for tasks involving more complex and diverse datasets. Moreover, further exploration into reducing model complexity while maintaining performance could enhance AGGAN's applicability in real-time or resource-constrained environments.
Ultimately, AGGAN exemplifies how incorporating attention can refine unsupervised image-to-image translation, marking a notable contribution to the field and paving the way for more sophisticated models in artificial intelligence.