
Short-Time Fourier Transform for deblurring Variational Autoencoders (2401.03166v1)

Published 6 Jan 2024 in eess.IV and cs.CV

Abstract: Variational Autoencoders (VAEs) are powerful generative models; however, their generated samples are known to suffer from a characteristic blurriness compared to the outputs of alternative generative techniques. Extensive research effort has been devoted to this problem, and several works have focused on modifying the reconstruction term of the evidence lower bound (ELBO). In particular, many have experimented with augmenting the reconstruction loss with losses in the frequency domain. Such loss functions usually employ the Fourier transform to explicitly penalise the lack of higher-frequency components in the generated samples, which are responsible for sharp visual features. In this paper, we explore aspects of these previous approaches that are not well understood, and we propose an augmentation to the reconstruction term in response. Our reasoning leads us to use the short-time Fourier transform and to emphasise local phase coherence between the input and output samples. We illustrate the potential of our proposed loss on the MNIST dataset with both qualitative and quantitative results.
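The abstract describes augmenting the VAE reconstruction loss with a short-time Fourier transform (STFT) term that compares local frequency content and emphasises local phase coherence. The paper's exact formulation is not given here, so the following is only a minimal illustrative sketch: it slides a Hann-windowed patch over two images, takes the 2D FFT of each patch, and penalises both the magnitude difference and a magnitude-weighted local phase misalignment. The function name, window size, hop length, and weighting scheme are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def stft_recon_loss(x, y, win=8, hop=4):
    """Illustrative STFT-based reconstruction penalty (hypothetical, not
    the paper's loss). Compares windowed 2D Fourier transforms of an
    input image x and a reconstruction y, penalising magnitude error
    plus local phase incoherence."""
    window = np.outer(np.hanning(win), np.hanning(win))  # 2D Hann window
    mag_err = phase_err = 0.0
    n_patches = 0
    H, W = x.shape
    for i in range(0, H - win + 1, hop):
        for j in range(0, W - win + 1, hop):
            Fx = np.fft.fft2(x[i:i + win, j:j + win] * window)
            Fy = np.fft.fft2(y[i:i + win, j:j + win] * window)
            # magnitude term: lack of high-frequency energy shows up here
            mag_err += np.mean(np.abs(np.abs(Fx) - np.abs(Fy)))
            # phase term: local phase coherence, weighted by the reference
            # magnitude so low-energy bins do not dominate the penalty
            w = np.abs(Fx)
            phase_diff = np.abs(np.angle(Fx * np.conj(Fy)))
            phase_err += np.sum(w * phase_diff) / (np.sum(w) + 1e-8)
            n_patches += 1
    return (mag_err + phase_err) / n_patches
```

In a training loop, a term like this would be added to the usual ELBO reconstruction loss with some weighting hyperparameter; a blurred reconstruction loses high-frequency magnitude and drifts in local phase, so it scores strictly worse than a sharp one under this sketch.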

