Variational Classification (2305.10406v5)

Published 17 May 2023 in cs.LG, cs.AI, and cs.CV

Abstract: We present a latent variable model for classification that provides a novel probabilistic interpretation of neural network softmax classifiers. We derive a variational objective to train the model, analogous to the evidence lower bound (ELBO) used to train variational auto-encoders, that generalises the softmax cross-entropy loss. Treating inputs to the softmax layer as samples of a latent variable, our abstracted perspective reveals a potential inconsistency between their anticipated distribution, required for accurate label predictions, and their empirical distribution found in practice. We augment the variational objective to mitigate such inconsistency and induce a chosen latent distribution, instead of the implicit assumption found in a standard softmax layer. Overall, we provide new theoretical insight into the inner workings of widely-used softmax classifiers. Empirical evaluation on image and text classification datasets demonstrates that our proposed approach, variational classification, maintains classification accuracy while the reshaped latent space improves other desirable properties of a classifier, such as calibration, adversarial robustness, robustness to distribution shift and sample efficiency useful in low data settings.
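The idea of augmenting the softmax cross-entropy loss with a term that induces a chosen latent distribution can be sketched as follows. This is a minimal illustration, not the paper's actual objective: the names (`gaussian_latent_penalty`, `variational_classification_loss`), the choice of an isotropic class-conditional Gaussian as the target latent distribution, and the weighting `lam` are all assumptions made for the example.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Standard softmax cross-entropy, averaged over the batch."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

def gaussian_latent_penalty(z, labels, class_means, sigma=1.0):
    """Illustrative penalty: squared distance of each latent (the input to the
    softmax layer) from its class mean, standing in for a term that encourages
    latents to follow a chosen class-conditional Gaussian."""
    diffs = z - class_means[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1)) / sigma ** 2

def variational_classification_loss(z, W, b, labels, class_means, lam=0.1):
    """Cross-entropy on the softmax outputs plus a latent-shaping term,
    mirroring the augmented variational objective described in the abstract."""
    logits = z @ W + b
    return (softmax_cross_entropy(logits, labels)
            + lam * gaussian_latent_penalty(z, labels, class_means))
```

With `lam = 0` this reduces to ordinary softmax cross-entropy; the extra term reshapes the latent space toward the chosen distribution, which is the mechanism the abstract credits for improved calibration and robustness.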

