
AesFA: An Aesthetic Feature-Aware Arbitrary Neural Style Transfer (2312.05928v3)

Published 10 Dec 2023 in cs.CV and cs.AI

Abstract: Neural style transfer (NST) has evolved significantly in recent years. Yet, despite this rapid progress, existing NST methods either struggle to transfer aesthetic information from a style effectively or suffer from high computational costs and inefficient feature disentanglement because they rely on pre-trained models. This work proposes a lightweight but effective model, AesFA (Aesthetic Feature-Aware NST). The primary idea is to decompose the image by frequency to better disentangle aesthetic styles from the reference image, while training the entire model end-to-end so that pre-trained models are excluded completely at inference. To improve the network's ability to extract more distinct representations and further enhance stylization quality, this work introduces a new aesthetic feature: contrastive loss. Extensive experiments and ablations show that the approach not only outperforms recent NST methods in stylization quality but also achieves faster inference. Code is available at https://github.com/Sooyyoungg/AesFA.
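
To make the two ideas in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a simple block-average low-pass filter for the frequency split and a generic InfoNCE-style formulation for the contrastive loss; the function names `split_frequencies` and `info_nce`, the `factor` and `temperature` parameters, and their defaults are all hypothetical. The actual AesFA decomposition and loss may differ.

```python
import numpy as np


def split_frequencies(image, factor=2):
    """Split an (H, W) array into low- and high-frequency parts.

    Assumption: the low band is a block average upsampled back to full
    size, and the high band is the residual. H and W must be divisible
    by `factor`.
    """
    h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")
    # Average each factor x factor block (a crude low-pass filter).
    blocks = image.reshape(h // factor, factor, w // factor, factor)
    coarse = blocks.mean(axis=(1, 3))
    # Nearest-neighbour upsampling back to the input resolution.
    low = np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)
    # Whatever the low band cannot represent is treated as high frequency.
    high = image - low
    return low, high


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss on L2-normalised feature vectors.

    anchor, positive: shape (D,); negatives: shape (K, D).
    """
    def normalise(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    a, p, n = normalise(anchor), normalise(positive), normalise(negatives)
    # Cosine similarities: the positive pair first, then each negative.
    logits = np.concatenate(([a @ p], n @ a)) / temperature
    logits -= logits.max()  # stabilise the softmax numerically
    # Negative log-probability assigned to the positive pair.
    return float(-(logits[0] - np.log(np.exp(logits).sum())))
```

In this sketch the two bands sum back to the input exactly, so nothing is lost by the split, and the loss shrinks as the anchor's features move toward the positive sample and away from the negatives.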
