
AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer (2108.03647v2)

Published 8 Aug 2021 in cs.CV

Abstract: Fast arbitrary neural style transfer has attracted widespread attention from academic, industrial and art communities due to its flexibility in enabling various applications. Existing solutions either attentively fuse deep style feature into deep content feature without considering feature distributions, or adaptively normalize deep content feature according to the style such that their global statistics are matched. Although effective, leaving shallow feature unexplored and without locally considering feature statistics, they are prone to unnatural output with unpleasing local distortions. To alleviate this problem, in this paper, we propose a novel attention and normalization module, named Adaptive Attention Normalization (AdaAttN), to adaptively perform attentive normalization on per-point basis. Specifically, spatial attention score is learnt from both shallow and deep features of content and style images. Then per-point weighted statistics are calculated by regarding a style feature point as a distribution of attention-weighted output of all style feature points. Finally, the content feature is normalized so that they demonstrate the same local feature statistics as the calculated per-point weighted style feature statistics. Besides, a novel local feature loss is derived based on AdaAttN to enhance local visual quality. We also extend AdaAttN to be ready for video style transfer with slight modifications. Experiments demonstrate that our method achieves state-of-the-art arbitrary image/video style transfer. Codes and models are available.

Citations (258)

Summary

  • The paper introduces the AdaAttN module that computes spatial attention using both low-level and high-level features for precise local style transfer.
  • A novel local feature loss preserves fine content details while enhancing artistic style patterns in the output images.
  • The method extends to video style transfer by computing attention scores with cosine similarity and adding an image-wise similarity loss to achieve temporal consistency.

An Overview of AdaAttN: Revisiting Attention Mechanism in Arbitrary Neural Style Transfer

The paper "AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer" introduces a method for improving arbitrary neural style transfer by attending to local feature statistics. Existing arbitrary style transfer methods typically match only global characteristics of the style and content images, which can produce suboptimal local stylization with unnatural distortions and artifacts. The paper proposes the Adaptive Attention Normalization (AdaAttN) module to make stylized outputs more realistic and aesthetically pleasing.
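In symbols, the per-point transform the paper describes can be written as follows. The notation is a reconstruction rather than the paper's own: Q and K are learned embeddings of mean-variance-normalized content and style features, V is an embedding of the style feature, all flattened over spatial positions, and Norm denotes channel-wise mean-variance normalization.

```latex
\begin{align*}
  A      &= \operatorname{softmax}\!\left(Q K^{\top}\right)
         && \text{attention of each content point over all style points} \\
  M      &= A V
         && \text{per-point weighted mean of style features} \\
  S      &= \sqrt{A \,(V \odot V) - M \odot M}
         && \text{per-point weighted standard deviation} \\
  F_{cs} &= S \odot \operatorname{Norm}(F_c) + M
         && \text{adaptive attention normalization}
\end{align*}
```

Each content position thus receives its own scale S and shift M, computed from the style positions it attends to, rather than a single global pair shared across the image.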

Key Contributions

  1. Adaptive Attention Normalization Module:
    • The AdaAttN module concurrently utilizes low-level and high-level features from both style and content images to calculate spatial attention scores.
    • It operates on a per-point basis, aligning the local statistics of the content feature map with attention-weighted statistics of the style feature map, yielding more localized and detailed style transfer (see the code sketch after this list).
  2. Local Feature Loss:
    • A novel local feature loss operates alongside the global style loss, ensuring that local features are preserved and enhanced in the stylized result and helping to produce natural-looking imagery.
  3. Extension to Video Style Transfer:
    • By computing attention scores with cosine similarity and introducing an image-wise similarity loss, the paper extends AdaAttN to video style transfer, achieving temporal consistency without requiring optical-flow constraints.
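To make the per-point statistics concrete, here is a minimal PyTorch-style sketch of the AdaAttN transform. It is a reconstruction from the description above, not the authors' released code: the module and argument names (`fc`, `fs`, `fc_key`, `fs_key`), the 1×1 embedding convolutions, and the assumption that the caller supplies the concatenated shallow-and-deep key features are all illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mean_variance_norm(feat, eps=1e-5):
    # Channel-wise mean-variance normalization over the spatial dimensions.
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.std(dim=(2, 3), keepdim=True) + eps
    return (feat - mean) / std


class AdaAttN(nn.Module):
    """Sketch of per-point adaptive attention normalization.

    fc, fs: content/style features from one VGG layer, shape (B, C, H, W).
    fc_key, fs_key: shallow-through-deep features concatenated by the caller,
    used only to compute the attention score.
    """

    def __init__(self, channels, key_channels):
        super().__init__()
        self.f = nn.Conv2d(key_channels, key_channels, 1)  # query embedding
        self.g = nn.Conv2d(key_channels, key_channels, 1)  # key embedding
        self.h = nn.Conv2d(channels, channels, 1)          # value embedding

    def forward(self, fc, fs, fc_key, fs_key, eps=1e-5):
        b, c, hc, wc = fc.shape
        # Attention scores from normalized shallow+deep features.
        q = self.f(mean_variance_norm(fc_key)).flatten(2).transpose(1, 2)  # (B, Nc, Ck)
        k = self.g(mean_variance_norm(fs_key)).flatten(2)                  # (B, Ck, Ns)
        v = self.h(fs).flatten(2).transpose(1, 2)                          # (B, Ns, C)
        attn = F.softmax(torch.bmm(q, k), dim=-1)                          # (B, Nc, Ns)

        # Per-point weighted mean and standard deviation of the style values:
        # each content position sees its own distribution over style positions.
        mean = torch.bmm(attn, v)                   # (B, Nc, C)
        var = torch.bmm(attn, v * v) - mean ** 2    # E[v^2] - (E[v])^2
        std = torch.sqrt(var.clamp(min=0.0) + eps)

        mean = mean.transpose(1, 2).reshape(b, c, hc, wc)
        std = std.transpose(1, 2).reshape(b, c, hc, wc)
        # Scale and shift the normalized content feature by the attended stats.
        return std * mean_variance_norm(fc) + mean
```

As the paper describes them, the local feature loss of contribution 2 compares VGG features of the stylized image against the output of a parameter-free variant of this transform (the 1×1 embeddings dropped), and the video extension of contribution 3 replaces the softmax above with an attention score based on cosine similarity.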

Discussion of Results

Experiments with AdaAttN demonstrate state-of-the-art performance on both image and video style transfer. Qualitatively, the stylized outputs show a markedly better balance between applying style patterns and preserving content structure. User studies favored AdaAttN over existing methods, particularly for content preservation and overall visual appeal.

Implications and Future Directions

AdaAttN offers a compelling advance in arbitrary style transfer, underscoring the importance of local feature statistics for high-quality stylization. Practical applications are broad, from artistic software tools to virtual and augmented reality aesthetics.

Future development could explore optimizing the computational efficiency of such attention mechanisms to leverage them in real-time applications. Moreover, investigating other potential applications in image translation or synthesis tasks might further demonstrate the versatility of the AdaAttN module. Researchers could also explore integrating AdaAttN into different network architectures to understand its potential beyond traditional convolutional setups.

Conclusion

The Adaptive Attention Normalization module, which complements traditional global feature transformations with per-point attentive normalization, represents a significant advance in neural style transfer. By addressing the intricacies of local detail transfer and introducing a novel local feature loss, the paper sets out a framework that improves on current models and adapts robustly to the video domain. AdaAttN marks an important stride toward more refined artistic stylization in computational imaging.