Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption (2310.13434v1)

Published 20 Oct 2023 in cs.LG, cs.AI, and stat.ML

Abstract: We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime. In particular, we introduce QLDS, a linear classification model in which the low density separation assumption is implemented via quadratic margin maximization. The algorithm has an explicit solution with rich theoretical properties, and we show that particular cases of our algorithm are the least-squares support vector machine in the supervised case, spectral clustering in the fully unsupervised regime, and a class of semi-supervised graph-based approaches. As such, QLDS establishes a smooth bridge between these supervised and unsupervised learning methods. Using recent advances in random matrix theory, we formally derive a theoretical evaluation of the classification error in the asymptotic regime. As an application, we derive a hyperparameter selection policy that finds the best balance between the supervised and unsupervised terms of our learning criterion. Finally, we provide extensive illustrations of our framework, as well as an experimental study on several benchmarks, demonstrating that QLDS, while being computationally more efficient, improves over cross-validation for hyperparameter selection, which indicates the high promise of random matrix theory for semi-supervised model selection.
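
The abstract names two limiting cases of QLDS: the least-squares support vector machine in the fully supervised regime and spectral clustering in the fully unsupervised one, with the learning criterion balancing a supervised and an unsupervised term in between. The snippet below is only a hedged sketch of that bridge, not the paper's estimator: it assumes a simplified setting in which the supervised endpoint is a ridge/least-squares fit of ±1 labels, the unsupervised endpoint is the unit-norm direction maximizing the quadratic margin sum_j (w^T x_j)^2 over unlabeled data (i.e., the top principal direction), and a hypothetical weight `alpha` interpolates between the two normalized directions. All function names, the mixing rule, and the toy data are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only (see assumptions above); NOT the QLDS estimator
# or its hyperparameter selection policy from the paper.
import numpy as np

def supervised_direction(X_l, y_l, reg=1e-2):
    """Ridge / least-squares fit of +-1 labels: w = (X^T X + reg*I)^{-1} X^T y.
    Plays the role of the supervised (LS-SVM-like) endpoint."""
    d = X_l.shape[1]
    return np.linalg.solve(X_l.T @ X_l + reg * np.eye(d), X_l.T @ y_l)

def quadratic_margin_direction(X_u):
    """Unit-norm w maximizing sum_j (w^T x_j)^2 over centered unlabeled data,
    i.e., the leading eigenvector of their second-moment matrix."""
    X_c = X_u - X_u.mean(axis=0)
    second_moment = X_c.T @ X_c / len(X_c)
    _, eigvecs = np.linalg.eigh(second_moment)  # eigenvalues in ascending order
    return eigvecs[:, -1]

def interpolated_classifier(X_l, y_l, X_u, alpha=0.5, reg=1e-2):
    """alpha = 1: purely supervised endpoint; alpha = 0: purely unsupervised.
    Intermediate values blend the two unit-normalized directions (a hypothetical
    mixing rule, chosen only to illustrate the supervised/unsupervised bridge)."""
    w_sup = supervised_direction(X_l, y_l, reg)
    w_sup /= np.linalg.norm(w_sup)
    w_uns = quadratic_margin_direction(X_u)
    if w_sup @ w_uns < 0:  # resolve the eigenvector's sign ambiguity
        w_uns = -w_uns
    w = alpha * w_sup + (1.0 - alpha) * w_uns
    return w / np.linalg.norm(w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two Gaussian classes, a handful of labels, many unlabeled points.
    mu = np.array([1.5, 0.0, 0.0])
    X = np.vstack([rng.normal(mu, 1.0, (100, 3)), rng.normal(-mu, 1.0, (100, 3))])
    y = np.r_[np.ones(100), -np.ones(100)]
    labeled = rng.choice(len(X), size=10, replace=False)
    w = interpolated_classifier(X[labeled], y[labeled], X, alpha=0.5)
    print(f"toy accuracy at alpha=0.5: {np.mean(np.sign(X @ w) == y):.2f}")
```

The paper's actual contribution is to choose this balance analytically, via a random-matrix-theory evaluation of the asymptotic classification error, rather than by cross-validation.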
