Towards noise contrastive estimation with soft targets for conditional models (2404.14076v2)

Published 22 Apr 2024 in cs.LG, cs.CV, and stat.ML

Abstract: Soft targets combined with the cross-entropy loss have been shown to improve the generalization performance of deep neural networks on supervised classification tasks. The standard cross-entropy loss, however, assumes data to be categorically distributed, which may often not be the case in practice. In contrast, InfoNCE does not rely on such an explicit assumption but instead implicitly estimates the true conditional distribution through negative sampling. Unfortunately, it cannot be combined with soft targets in its standard formulation, hindering its use in combination with sophisticated training strategies. In this paper, we address this limitation by proposing a loss function that is compatible with probabilistic targets. Our new soft target InfoNCE loss is conceptually simple, efficient to compute, and can be motivated through the framework of noise contrastive estimation. Using a toy example, we demonstrate the shortcomings of the categorical distribution assumption of cross-entropy, and discuss the implications of sampling from soft distributions. We observe that soft target InfoNCE performs on par with strong soft target cross-entropy baselines and outperforms hard target NLL and InfoNCE losses on popular benchmarks, including ImageNet. Finally, we provide a simple implementation of our loss, geared towards supervised classification and fully compatible with deep classification models trained with cross-entropy.
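The core idea described in the abstract, keeping InfoNCE's negative-sampling structure while replacing the single hard positive with a probabilistic target, can be sketched in a few lines of PyTorch. The snippet below is an illustrative approximation only, not the paper's exact soft target InfoNCE formulation; the function name, the uniform negative sampler, and the temperature parameter are assumptions made for this example.

```python
import torch


def soft_target_infonce_sketch(logits, soft_targets, num_negatives=16, temperature=1.0):
    """Illustrative sketch only -- not the exact loss proposed in the paper.

    Keeps the InfoNCE structure (a candidate class scored against sampled
    "noise" classes) but weights each candidate's contribution by a
    probabilistic target instead of a single hard label.

    logits:       (batch, num_classes) unnormalized class scores
    soft_targets: (batch, num_classes) rows summing to 1, e.g. from label
                  smoothing, mixup, or a teacher model
    """
    batch_size, num_classes = logits.shape
    scores = logits / temperature

    # Sample negative ("noise") class indices uniformly for each example.
    negatives = torch.randint(num_classes, (batch_size, num_negatives), device=logits.device)
    neg_scores = torch.gather(scores, 1, negatives)                      # (B, K)

    # Score of every class when treated as the candidate positive,
    # contrasted against the same set of sampled negatives.
    candidates = torch.cat(
        [scores.unsqueeze(-1),                                           # (B, C, 1)
         neg_scores.unsqueeze(1).expand(-1, num_classes, -1)],           # (B, C, K)
        dim=-1,
    )
    log_win = scores - torch.logsumexp(candidates, dim=-1)               # (B, C)

    # Expected InfoNCE-style objective under the soft target distribution;
    # with one-hot targets this reduces to the usual hard-target objective.
    return -(soft_targets * log_win).sum(dim=-1).mean()
```

Usage would mirror a standard cross-entropy setup, e.g. `loss = soft_target_infonce_sketch(model(images), soft_targets)` with mixup- or label-smoothing-style targets. With one-hot targets the weighted sum collapses to the sampled-negatives objective for the labeled class; collisions between a candidate class and its sampled negatives are ignored here for brevity.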
