Sharp analysis of EM for learning mixtures of pairwise differences (2302.10066v2)

Published 20 Feb 2023 in math.ST, cs.LG, stat.ML, and stat.TH

Abstract: We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
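Below is a minimal sketch of the kind of EM iteration the abstract refers to, under an assumed model (the page itself does not spell it out): observations y_k = z_k (theta*_{i_k} - theta*_{j_k}) + sigma * eps_k with random signs z_k, so each covariate is a pairwise difference x_k = e_{i_k} - e_{j_k}. All function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def em_step(theta, pairs, y, sigma):
    """One EM update for the symmetric two-component mixture with
    pairwise-difference covariates (a hedged sketch, not the paper's code)."""
    d = theta.shape[0]
    i, j = pairs[:, 0], pairs[:, 1]
    # Inner products <x_k, theta> = theta_i - theta_j at the current iterate.
    diffs = theta[i] - theta[j]
    # E-step: posterior mean of the hidden sign z_k given (x_k, y_k) and theta.
    w = np.tanh(y * diffs / sigma**2)
    # M-step: solve (sum_k x_k x_k^T) theta = sum_k w_k y_k x_k.
    # With pairwise differences the Gram matrix is a graph Laplacian, which is
    # singular along the all-ones vector, so we use a least-squares solve.
    L = np.zeros((d, d))
    b = np.zeros(d)
    np.add.at(L, (i, i), 1.0)
    np.add.at(L, (j, j), 1.0)
    np.add.at(L, (i, j), -1.0)
    np.add.at(L, (j, i), -1.0)
    np.add.at(b, i, w * y)
    np.add.at(b, j, -w * y)
    theta_new, *_ = np.linalg.lstsq(L, b, rcond=None)
    # Center to remove the unidentifiable global shift.
    return theta_new - theta_new.mean()

# Toy usage: local initialization near the ground truth, as in the abstract.
rng = np.random.default_rng(0)
d, n, sigma = 20, 2000, 0.1
theta_star = rng.normal(size=d)
theta_star -= theta_star.mean()
pairs = rng.integers(0, d, size=(n, 2))
pairs = pairs[pairs[:, 0] != pairs[:, 1]]
signs = rng.choice([-1.0, 1.0], size=len(pairs))
y = signs * (theta_star[pairs[:, 0]] - theta_star[pairs[:, 1]]) \
    + sigma * rng.normal(size=len(pairs))
theta = theta_star + 0.05 * rng.normal(size=d)
for _ in range(50):
    theta = em_step(theta, pairs, y, sigma)
# The target is identifiable only up to a global sign flip.
err = min(np.max(np.abs(theta - theta_star)), np.max(np.abs(theta + theta_star)))
print(f"ell_inf error up to global sign: {err:.4f}")
```

The tanh weight is the standard posterior for a symmetric two-component Gaussian mixture of regressions; the least-squares solve and final centering handle the rank deficiency that the pairwise-comparison design introduces.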
