
Bayesian Self-Supervised Contrastive Learning (2301.11673v4)

Published 27 Jan 2023 in cs.LG

Abstract: Recent years have witnessed many successful applications of contrastive learning in diverse domains, yet its self-supervised version still poses many exciting challenges. Because negative samples are drawn from unlabeled datasets, a randomly selected sample may actually be a false negative for an anchor, leading to incorrect encoder training. This paper proposes a new self-supervised contrastive loss, the BCL loss, which still uses random samples from the unlabeled data while correcting the resulting bias with importance weights. The key idea is to design the desired sampling distribution for sampling hard true negative samples under a Bayesian framework. The prominent advantage is that the desired sampling distribution has a parametric structure, with a location parameter for debiasing false negatives and a concentration parameter for mining hard negatives. Experiments validate the effectiveness and superiority of the BCL loss.
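
The abstract describes correcting negative-sampling bias with importance weights, where a location-like parameter debiases false negatives and a concentration-like parameter emphasizes hard negatives. The snippet below is a minimal PyTorch sketch of an importance-weighted, debiased InfoNCE objective in that spirit, not the paper's exact BCL loss; the function name `weighted_debiased_info_nce` and the parameters `beta` (hardness concentration) and `tau_plus` (false-negative prior) are illustrative assumptions.

```python
import math

import torch
import torch.nn.functional as F


def weighted_debiased_info_nce(anchor, positive, negatives,
                               temperature=0.5, beta=1.0, tau_plus=0.1):
    """Importance-weighted, debiased InfoNCE sketch (names illustrative).

    anchor:    (B, D) anchor embeddings
    positive:  (B, D) embeddings of the positive (augmented) views
    negatives: (B, K, D) embeddings of K randomly drawn unlabeled samples
    beta:      concentration-style parameter; larger values up-weight hard negatives
    tau_plus:  assumed prior probability that a random sample is a false negative
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos = torch.exp((anchor * positive).sum(dim=-1) / temperature)                 # (B,)
    neg = torch.exp(torch.einsum('bd,bkd->bk', anchor, negatives) / temperature)   # (B, K)

    # Importance weights: negatives more similar to the anchor (i.e. harder)
    # receive larger weight; normalize so the weights average to 1 per anchor.
    weights = neg.detach() ** beta
    weights = weights / weights.mean(dim=-1, keepdim=True)

    # Debiased negative term: subtract the expected false-negative contribution
    # and clamp at the theoretical minimum exp(-1 / temperature).
    k = negatives.shape[1]
    reweighted = (weights * neg).mean(dim=-1)                                      # (B,)
    neg_term = (reweighted - tau_plus * pos) / (1.0 - tau_plus)
    neg_term = torch.clamp(neg_term, min=math.exp(-1.0 / temperature))

    loss = -torch.log(pos / (pos + k * neg_term))
    return loss.mean()
```

Setting beta to 0 and tau_plus to 0 recovers the standard treatment of random negatives; increasing beta concentrates weight on high-similarity (hard) negatives, while tau_plus controls how strongly the expected false-negative contribution is subtracted.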
