Learning Mixtures of Gaussians with Censored Data (2305.04127v2)
Published 6 May 2023 in cs.LG and stat.ML
Abstract: We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees are missing even for simple latent variable models such as Gaussian mixtures. Formally, we are given censored data from a mixture of univariate Gaussians $$ \sum_{i=1}^{k} w_i \mathcal{N}(\mu_i,\sigma^2), $$ i.e., a sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.
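The censoring model in the abstract can be made concrete with a small simulation: draws from the mixture are discarded unless they land in the observation set $S$. The sketch below is illustrative only; the specific choice $S = [0, \infty)$, the function name, and its parameters are assumptions for demonstration, not details from the paper.

```python
import random

def sample_censored_mixture(weights, means, sigma, in_S, n, seed=0):
    """Draw n observations from the univariate mixture
    sum_i w_i * N(mu_i, sigma^2), keeping a draw only if it lies in
    the observation set S (membership tested by the predicate in_S).
    Returns the observed samples and the total number of raw draws."""
    rng = random.Random(seed)
    observed, draws = [], 0
    while len(observed) < n:
        draws += 1
        # pick a mixture component according to the weights w_i
        i = rng.choices(range(len(weights)), weights=weights)[0]
        x = rng.gauss(means[i], sigma)
        if in_S(x):  # censoring: the sample is seen only if x is in S
            observed.append(x)
    return observed, draws

# Example: two components with shared sigma, censored to S = [0, +inf)
xs, draws = sample_censored_mixture(
    weights=[0.3, 0.7], means=[-1.0, 2.0], sigma=1.0,
    in_S=lambda x: x >= 0.0, n=5000)
```

Note that the learner only ever sees `xs`; recovering the weights and means from such conditioned samples is exactly what makes the problem harder than the uncensored case.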
- T. Amemiya. Regression analysis when the dependent variable is truncated normal. Econometrica: Journal of the Econometric Society, pages 997–1016, 1973.
- N. Balakrishnan and E. Cramer. The art of progressive censoring. Statistics for industry and technology, 2014.
- D. Bernoulli. Essai d’une nouvelle analyse de la mortalité causée par la petite vérole, et des avantages de l’inoculation pour la prévenir. Histoire de l’Acad., Roy. Sci.(Paris) avec Mem, pages 1–45, 1760.
- A. C. Cohen. Truncated and Censored Samples: Theory and Applications. CRC Press, 2016.
- S. Dasgupta. Learning mixtures of Gaussians. In 40th Annual Symposium on Foundations of Computer Science (Cat. No. 99CB37039), pages 634–644. IEEE, 1999.
- C. Daskalakis, C. Tzamos, and M. Zampetakis. Ten steps of EM suffice for mixtures of two Gaussians. In Conference on Learning Theory, pages 704–710. PMLR, 2017.
- C. Daskalakis, T. Gouleakis, C. Tzamos, and M. Zampetakis. Efficient statistics, in high dimensions, from truncated samples. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 639–649. IEEE, 2018.
- C. Daskalakis, T. Gouleakis, C. Tzamos, and M. Zampetakis. Computationally and statistically efficient truncated regression. In Conference on Learning Theory, pages 955–960. PMLR, 2019.
- I. Diakonikolas and D. M. Kane. Recent advances in algorithmic high-dimensional robust statistics. arXiv preprint arXiv:1911.05911, 2019.
- I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Being robust (in high dimensions) can be practical. In International Conference on Machine Learning, pages 999–1008. PMLR, 2017.
- I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robustly learning a Gaussian: Getting optimal error, efficiently. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2683–2702. SIAM, 2018.
- I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. SIAM Journal on Computing, 48(2):742–864, 2019.
- Optimal estimation of high-dimensional Gaussian mixtures. arXiv preprint arXiv:2002.05818, 2020.
- R. Fisher. Properties and applications of Hh functions. Mathematical Tables, 1:815–852, 1931.
- F. Galton. An examination into the registered speeds of american trotting horses, with remarks on their value as hereditary data. Proceedings of the Royal Society of London, 62(379-387):310–315, 1898.
- M. Hardt and E. Price. Tight bounds for learning a mixture of two Gaussians. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 753–760, 2015.
- J. A. Hausman and D. A. Wise. Social experimentation, truncated distributions, and efficient estimation. Econometrica: Journal of the Econometric Society, pages 919–938, 1977.
- S. B. Hopkins, G. Kamath, and M. Majid. Efficient mean estimation with pure differential privacy via a sum-of-squares exponential mechanism. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1406–1417, 2022.
- A. T. Kalai, A. Moitra, and G. Valiant. Efficiently learning mixtures of two Gaussians. In Proceedings of the forty-second ACM symposium on Theory of computing, pages 553–562, 2010.
- K. A. Lai, A. B. Rao, and S. Vempala. Agnostic estimation of mean and covariance. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 665–674. IEEE, 2016.
- A. Lee. Table of the Gaussian "tail" functions; when the "tail" is larger than the body. Biometrika, 10(2/3):208–214, 1914.
- G. Lee and C. Scott. EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Computational Statistics & Data Analysis, 56(9):2816–2829, 2012.
- B. G. Lindsay. Mixture models: theory, geometry and applications. In NSF-CBMS regional conference series in probability and statistics, pages i–163. JSTOR, 1995.
- X. Liu, W. Kong, S. Kakade, and S. Oh. Robust and differentially private mean estimation. Advances in Neural Information Processing Systems, 34:3887–3901, 2021.
- G. S. Maddala. Limited-Dependent and Qualitative Variables in Econometrics. Number 3. Cambridge University Press, 1986.
- G. McLachlan and P. Jones. Fitting mixture models to grouped and truncated data via the EM algorithm. Biometrics, pages 571–578, 1988.
- A. Moitra. Super-resolution, extremal functions and the condition number of Vandermonde matrices. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 821–830, 2015.
- A. Moitra and G. Valiant. Settling the polynomial learnability of mixtures of Gaussians. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 93–102. IEEE, 2010.
- S. G. Nagarajan and I. Panageas. On the analysis of EM for truncated mixtures of two Gaussians. In Algorithmic Learning Theory, pages 634–659. PMLR, 2020.
- K. Pearson. Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London. A, 185:71–110, 1894.
- K. Pearson. On the systematic fitting of curves to observations and measurements. Biometrika, 1(3):265–303, 1902.
- K. Pearson and A. Lee. On the generalised probable error in multiple normal correlation. Biometrika, 6(1):59–68, 1908.
- A Fourier approach to mixture learning. Advances in Neural Information Processing Systems, 35:20850–20861, 2022.
- O. Regev and A. Vijayaraghavan. On learning mixtures of well-separated Gaussians. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 85–96. IEEE, 2017.
- H. Schneider. Truncated and censored samples from normal populations. Marcel Dekker, Inc., 1986.
- J. Tobin. Estimation of relationships for limited dependent variables. Econometrica: Journal of the Econometric Society, pages 24–36, 1958.
- S. Vempala and G. Wang. A spectral algorithm for learning mixture models. Journal of Computer and System Sciences, 68(4):841–860, 2004.
- Y. Wu and P. Yang. Optimal estimation of Gaussian mixtures via denoised method of moments. arXiv preprint arXiv:1807.07237, 2018.
- J. Xu, D. J. Hsu, and A. Maleki. Global analysis of expectation maximization for mixtures of two Gaussians. Advances in Neural Information Processing Systems, 29, 2016.