A Non-Classical Parameterization for Density Estimation Using Sample Moments (2201.04786v5)
Abstract: Probability density estimation is a core problem of statistics and signal processing. Moment methods are an important means of density estimation, but they generally depend strongly on the choice of feasible functions, which severely affects their performance. In this paper, we propose a non-classical parametrization for density estimation using sample moments, which does not require the choice of such functions. The parametrization is induced by the squared Hellinger distance; its solution, which is proved to exist and to be unique subject to a simple prior that does not depend on the data, can be obtained by convex optimization. Statistical properties of the density estimator, together with an asymptotic upper bound on the estimation error, are established for the estimator by power moments. Applications of the proposed density estimator to signal processing tasks are given. Simulation results validate the performance of the estimator through a comparison to several prevailing methods. To the best of our knowledge, the proposed estimator is the first one in the literature for which the power moments up to an arbitrary even order exactly match the sample moments, while the true density is not assumed to fall within specific function classes.
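The moment-matching idea at the heart of the abstract can be illustrated with a minimal sketch. The code below is not the paper's Hellinger-induced parametrization; it is a stand-in maximum-entropy (exponential-family) fit that likewise matches sample power moments up to an even order via a convex dual optimization. All variable names, the choice of order `K = 4`, and the truncated support `[a, b]` are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

rng = np.random.default_rng(0)
samples = rng.normal(0.5, 0.3, size=2000)  # toy data (illustrative)

K = 4  # match power moments up to an even order (here 4)
m = np.array([np.mean(samples**k) for k in range(1, K + 1)])  # sample moments
a, b = samples.min() - 1.0, samples.max() + 1.0  # truncated support for integration

def poly(lam, x):
    # sum_{k=1}^{K} lam[k-1] * x**k
    return sum(l * x ** (k + 1) for k, l in enumerate(lam))

def dual(lam):
    # Convex dual of the max-entropy problem: log Z(lam) - <lam, m>;
    # its gradient is E_p[x^k] - m_k, so the minimizer matches the moments.
    Z, _ = quad(lambda x: np.exp(poly(lam, x)), a, b)
    mom = np.array([quad(lambda x: x**k * np.exp(poly(lam, x)), a, b)[0] / Z
                    for k in range(1, K + 1)])
    return np.log(Z) - lam @ m, mom - m

res = minimize(dual, np.zeros(K), jac=True, method="BFGS")
lam = res.x
Z, _ = quad(lambda x: np.exp(poly(lam, x)), a, b)

# Moments of the fitted density p(x) = exp(poly(x)) / Z
fitted = np.array([quad(lambda x: x**k * np.exp(poly(lam, x)) / Z, a, b)[0]
                   for k in range(1, K + 1)])
print(np.max(np.abs(fitted - m)))  # small: fitted moments match sample moments
```

Because the dual objective is convex with an exact gradient, a quasi-Newton solver drives the fitted moments onto the sample moments; the paper's contribution, by contrast, achieves exact moment matching without restricting the density to such a function class.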