Approximately Bayes-Optimal Pseudo Label Selection (2302.08883v5)

Published 17 Feb 2023 in stat.ML, cs.AI, cs.LG, and stat.ME

Abstract: Semi-supervised learning by self-training heavily relies on pseudo-label selection (PLS). The selection often depends on the initial model fit on labeled data. Early overfitting might thus be propagated to the final model by selecting instances with overconfident but erroneous predictions, often referred to as confirmation bias. This paper introduces BPLS, a Bayesian framework for PLS that aims to mitigate this issue. At its core lies a criterion for selecting instances to label: an analytical approximation of the posterior predictive of pseudo-samples. We derive this selection criterion by proving Bayes optimality of the posterior predictive of pseudo-samples. We further overcome computational hurdles by approximating the criterion analytically. Its relation to the marginal likelihood allows us to obtain an approximation based on Laplace's method and the Gaussian integral. We empirically assess BPLS for parametric generalized linear and non-parametric generalized additive models on simulated and real-world data. When faced with high-dimensional data prone to overfitting, BPLS outperforms traditional PLS methods.
