Classifying Overlapping Gaussian Mixtures in High Dimensions: From Optimal Classifiers to Neural Nets (2405.18427v1)

Published 28 May 2024 in stat.ML, cs.AI, and cs.LG

Abstract: We derive closed-form expressions for the Bayes optimal decision boundaries in binary classification of high-dimensional overlapping Gaussian mixture model (GMM) data, and show how they depend on the eigenstructure of the class covariances for particularly interesting structured data. Through experiments on synthetic GMMs inspired by real-world data, we empirically demonstrate that deep neural networks trained for classification learn predictors that approximate the derived optimal classifiers. We further extend our study to networks trained on authentic data, observing that their decision thresholds correlate with the covariance eigenvectors rather than the eigenvalues, mirroring our GMM analysis. This provides theoretical insight into neural networks' ability to perform probabilistic inference and distill statistical patterns from intricate distributions.
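The paper's closed-form boundary expressions are not reproduced on this page, but the general rule they specialize is standard: for a binary mixture with priors pi_0, pi_1, means mu_0, mu_1, and covariances Sigma_0, Sigma_1, the Bayes optimal classifier thresholds the posterior log-odds, yielding a decision boundary that is quadratic in x (the quadratic discriminant rule). Below is a minimal sketch in Python/NumPy, assuming known mixture parameters; the function and variable names are illustrative, not taken from the paper.

    import numpy as np

    def log_gaussian(x, mu, Sigma):
        # Log density of a multivariate normal N(mu, Sigma) at x.
        d = x - mu
        _, logdet = np.linalg.slogdet(Sigma)
        quad = d @ np.linalg.solve(Sigma, d)  # d^T Sigma^{-1} d
        return -0.5 * (quad + logdet + x.size * np.log(2.0 * np.pi))

    def bayes_optimal_label(x, mu0, Sigma0, mu1, Sigma1, pi1=0.5):
        # Bayes rule: predict class 1 iff the posterior log-odds are positive.
        # The boundary {x : score(x) = 0} is quadratic in x, and its shape is
        # governed by the eigenstructure of Sigma0 and Sigma1.
        score = (log_gaussian(x, mu1, Sigma1) + np.log(pi1)
                 - log_gaussian(x, mu0, Sigma0) - np.log(1.0 - pi1))
        return int(score > 0)

    # Toy check: equal means, one class with a single inflated variance
    # direction, loosely mimicking the structured-covariance setting the
    # paper studies.
    rng = np.random.default_rng(0)
    dim = 50
    mu = np.zeros(dim)
    Sigma0 = np.eye(dim)
    Sigma1 = np.eye(dim)
    Sigma1[0, 0] = 4.0  # one spiked eigenvalue
    x = rng.multivariate_normal(mu, Sigma1)
    print(bayes_optimal_label(x, mu, Sigma0, mu, Sigma1))

When the class means coincide, the score reduces to the quadratic form 0.5 * d^T (Sigma0^{-1} - Sigma1^{-1}) d plus a log-determinant and prior offset, which is why the optimal boundary is determined by the covariance eigenvectors and eigenvalues rather than by any single linear direction.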

