Learning Mixtures of Gaussians with Censored Data (2305.04127v2)

Published 6 May 2023 in cs.LG and stat.ML

Abstract: We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees are missing even for simple latent variable models such as Gaussian mixtures. Formally, we are given censored data from a mixture of univariate Gaussians $$\sum_{i=1}^k w_i \mathcal{N}(\mu_i, \sigma^2),$$ i.e., a sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.
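To make the data model concrete, below is a minimal sketch of how censored samples from such a mixture could be generated. The specific weights, means, and the interval chosen for $S$ are illustrative placeholders rather than values from the paper, and the rejection-style sampler is just one way to realize "observed only if it lies inside $S$" with a shared variance $\sigma^2$ as in the abstract.

```python
import numpy as np

# Illustrative censored-data model: draw from the univariate mixture
# sum_i w_i * N(mu_i, sigma^2) and observe a sample only if it lies in S.
# All concrete values here are hypothetical, chosen for demonstration.

rng = np.random.default_rng(0)

w = np.array([0.3, 0.7])    # mixture weights w_i (must sum to 1)
mu = np.array([-2.0, 2.0])  # component means mu_i
sigma = 1.0                 # shared standard deviation

def in_S(x):
    # Example censoring set S: a half-line; the paper allows general sets.
    return x > -1.0

def draw_censored(n_observed):
    """Draw from the mixture until n_observed samples fall inside S."""
    observed = []
    while len(observed) < n_observed:
        i = rng.choice(len(w), p=w)   # pick a mixture component
        x = rng.normal(mu[i], sigma)  # sample from N(mu_i, sigma^2)
        if in_S(x):                   # censoring: keep only samples in S
            observed.append(x)
    return np.array(observed)

samples = draw_censored(1000)
print(samples[:5])
```

A learner in this setting sees only `samples` (and typically knows $S$); the challenge the paper addresses is recovering $w_i$ and $\mu_i$ from such conditionally observed data.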
