
SC-VAE: Sparse Coding-based Variational Autoencoder with Learned ISTA (2303.16666v2)

Published 29 Mar 2023 in cs.CV and eess.IV

Abstract: Learning rich data representations from unlabeled data is a key challenge for applying deep learning algorithms in downstream tasks. Several variants of variational autoencoders (VAEs) have been proposed to learn compact data representations by encoding high-dimensional data in a lower-dimensional space. Two main classes of VAE methods may be distinguished depending on the characteristics of the meta-priors that are enforced in the representation learning step. The first class of methods derives a continuous encoding by assuming a static prior distribution in the latent space. The second class of methods instead learns a discrete latent representation using vector quantization (VQ) along with a codebook. However, both classes of methods suffer from certain challenges, which may lead to suboptimal image reconstruction results: the first class suffers from posterior collapse, whereas the second class suffers from codebook collapse. To address these challenges, we introduce a new VAE variant, termed sparse coding-based VAE with learned ISTA (SC-VAE), which integrates sparse coding within the variational autoencoder framework. The proposed method learns sparse data representations that consist of a linear combination of a small number of predetermined orthogonal atoms. The sparse coding problem is solved using a learnable version of the iterative shrinkage thresholding algorithm (ISTA). Experiments on two image datasets demonstrate that our model achieves improved image reconstruction results compared to state-of-the-art methods. Moreover, we demonstrate that the use of learned sparse code vectors allows us to perform downstream tasks like image generation and unsupervised image segmentation through clustering image patches.
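To make the sparse coding step concrete, the following is a minimal NumPy sketch of plain (non-learned) ISTA for the lasso problem min_z 0.5||y − Dz||² + λ||z||₁. It is an illustration of the classical algorithm the paper builds on, not the SC-VAE implementation itself: the learned variant (LISTA) replaces the fixed gradient-step matrices and threshold below with trainable parameters. The dictionary `D`, step size, and iteration count here are assumptions chosen for demonstration.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the l1 norm: shrinks each entry toward zero by t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, y, lam, n_iters=500):
    """Solve min_z 0.5*||y - D z||^2 + lam*||z||_1 by iterative
    shrinkage-thresholding.

    D : dictionary of atoms, shape (n_features, n_atoms)
    y : signal to encode, shape (n_features,)
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient
    # (largest squared singular value of D).
    L = np.linalg.norm(D, ord=2) ** 2
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ z - y)          # gradient of the smooth term
        z = soft_threshold(z - grad / L, lam / L)
    return z

# Toy usage: recover a 2-sparse code from a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 10))
z_true = np.zeros(10)
z_true[[2, 7]] = [1.5, -2.0]
y = D @ z_true
z_hat = ista(D, y, lam=0.01)
```

In LISTA (Gregor & LeCun, 2010), a small fixed number of such iterations is unrolled into a network, and the matrices multiplying `y` and `z` plus the threshold are learned, which is what allows SC-VAE to train the sparse encoder end-to-end.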

Authors (6)
  1. Pan Xiao (4 papers)
  2. Peijie Qiu (35 papers)
  3. Sungmin Ha (1 paper)
  4. Abdalla Bani (1 paper)
  5. Shuang Zhou (65 papers)
  6. Aristeidis Sotiras (29 papers)
Citations (2)
