
Analysis of Estimating the Bayes Rule for Gaussian Mixture Models with a Specified Missing-Data Mechanism (2210.13785v2)

Published 25 Oct 2022 in stat.ML and cs.LG

Abstract: Semi-supervised learning (SSL) approaches have been successfully applied in a wide range of engineering and scientific fields. This paper investigates the generative model framework with a missingness mechanism for unclassified observations, as introduced by Ahfock and McLachlan (2020). We show that in a partially classified sample, a classifier using the Bayes rule of allocation with a missing-data mechanism can surpass a fully supervised classifier in a two-class normal homoscedastic model, especially with moderate to low overlap and a moderate to low proportion of missing class labels, or with large overlap but few missing labels. It also outperforms a classifier with no missing-data mechanism regardless of the overlap region or the proportion of missing class labels. Our exploration of two- and three-component normal mixture models with unequal covariances through simulations further corroborates our findings. Finally, we illustrate the use of the proposed classifier with a missing-data mechanism on interneuronal and skin lesion datasets.
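For readers unfamiliar with the Bayes rule of allocation mentioned in the abstract, the following is a minimal sketch for the two-class normal homoscedastic case, assuming the mixing proportion, means, and common covariance matrix are known. The function name `bayes_rule_two_class` and its parameters are hypothetical and not taken from the paper. The sketch does not cover parameter estimation from partially classified data or the missing-data mechanism itself.

```python
import numpy as np


def bayes_rule_two_class(x, pi1, mu1, mu2, sigma):
    """Allocate each row of x to class 1 or 2 under a two-class normal
    model with common covariance matrix sigma (illustrative sketch only).

    Parameters are assumed known here; in practice they would be
    replaced by estimates.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    # Coefficients of the linear discriminant function beta0 + x @ beta,
    # which equals the log posterior odds of class 1 versus class 2.
    beta = np.linalg.solve(sigma, mu1 - mu2)
    beta0 = np.log(pi1 / (1.0 - pi1)) - 0.5 * (mu1 + mu2) @ beta

    score = beta0 + x @ beta
    # Assign to class 1 when the log posterior odds are positive.
    return np.where(score > 0, 1, 2)
```

With equal mixing proportions, means at (1, 0) and (-1, 0), and identity covariance, a point at (2, 0) is allocated to class 1 and a point at (-2, 0) to class 2.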

References (37)
  1. Statistics and Computing pp. 1–12 (2020)
  2. Econometrics and Statistics 26, 124–138 (2023)
  3. Biometrika 50(1/2), 17–21 (1963)
  4. In: Proceedings of the eleventh annual conference on Computational learning theory, pp. 92–100 (1998)
  5. MIT Press, Cambridge, MA, USA. Cited in page (s) 21(1), 2 (2010)
  6. Journal of Artificial Intelligence Research 23, 331–366 (2005)
  7. Technometrics 53(4), 406–413 (2011)
  8. Pattern recognition 42(3), 334–348 (2009)
  9. Efron, B.: The efficiency of logistic regression compared to normal discriminant analysis. Journal of the American Statistical Association 70(352), 892–898 (1975)
  10. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(3), 424–437 (2008)
  11. Gilbert, E.S.: The effect of unequal variance-covariance matrices on Fisher’s linear discriminant function. Biometrics pp. 505–515 (1969)
  12. Han, C.P.: Distribution of discriminant function when covariance matrices are proportional. The Annals of Mathematical Statistics 40(3), 979–985 (1969)
  13. The Canadian Journal of Statistics/La Revue Canadienne de Statistique pp. 261–270 (1982)
  14. In: Eleventh Annual Conference of the International Speech Communication Association (2010)
  15. In: ICML, vol. 99, pp. 200–209 (1999)
  16. Pattern recognition 40(4), 1207–1221 (2007)
  17. Frontiers in Cellular Neuroscience 17 (2023)
  18. Journal of Machine learning research 5(Jan), 27–72 (2004)
  19. Statistics and Computing 24(2), 181–202 (2014)
  20. Diagnostics 10 (2020)
  21. arXiv preprint arXiv:2302.13206 (2023)
  22. Journal of the American Statistical Association 69(346), 555–559 (1974)
  23. McLachlan, G.J.: Iterative reclassification procedure for constructing an asymptotically optimal rule of allocation in discriminant analysis. Journal of the American Statistical Association 70(350), 365–369 (1975)
  24. McLachlan, G.J.: Some expected values for the error rates of the sample quadratic discriminant function. Australian Journal of Statistics 17(3), 161–165 (1975)
  25. McLachlan, G.J.: Estimating the linear discriminant function from initial samples containing a small number of unclassified observations. Journal of the American Statistical Association 72(358), 403–406 (1977). DOI 10.1080/01621459.1977.10481009
  26. Statistics in Medicine 8(10), 1291–1300 (1989). DOI 10.1002/sim.4780081012
  27. Biometrika 102(4), 995–1000 (2015)
  28. Scientific data 6(1), 1–6 (2019)
  29. O’Neill, T.J.: Normal discrimination with unclassified observations. Journal of the American Statistical Association 73(364), 821–826 (1978)
  30. Bioinformatics 22(19), 2388–2395 (2006)
  31. Rubin, D.B.: Inference and missing data. Biometrika 63(3), 581–592 (1976)
  32. Journal of Computational Biology 17(8), 953–967 (2010)
  33. Advances in neural information processing systems 14 (2001)
  34. Vapnik, V.: The support vector method of function estimation pp. 55–85 (1998)
  35. Scientific Reports 11, 17611 (2021). DOI 10.1038/s41598-021-96745-2
  36. International Journal of Molecular Sciences 24 (2023)
  37. Advances in neural information processing systems 16 (2003)
