Unsupervised tree boosting for learning probability distributions (2101.11083v7)
Abstract: We propose an unsupervised tree boosting algorithm for inferring the underlying sampling distribution of an i.i.d. sample by fitting additive tree ensembles in a fashion analogous to supervised tree boosting. Integral to the algorithm is a new notion of "addition" on probability distributions that leads to a coherent notion of "residualization", i.e., subtracting a probability distribution from an observation to remove the distributional structure from the sampling distribution of the latter. We show that these notions arise naturally for univariate distributions through cumulative distribution function (CDF) transforms and compositions, owing to several "group-like" properties of univariate CDFs. While the traditional multivariate CDF does not preserve these properties, a new definition of the multivariate CDF restores them, allowing "addition" and "residualization" to be formulated in multivariate settings as well. This gives rise to an unsupervised boosting algorithm based on forward-stagewise fitting of an additive tree ensemble, which sequentially reduces the Kullback-Leibler divergence from the truth. The algorithm allows analytic evaluation of the fitted density and outputs a generative model that can readily be sampled from. We enhance the algorithm with scale-dependent shrinkage and a two-stage strategy that separately fits the marginals and the copula. The algorithm then performs competitively with state-of-the-art deep-learning approaches in multivariate density estimation on multiple benchmark data sets.
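The univariate case of the "residualization" and "addition" described in the abstract rests on two standard facts: if X ~ F with continuous CDF F, then U = F(X) ~ Uniform(0, 1) (the probability integral transform), and G⁻¹(U) ~ G for any target CDF G. The following is a minimal sketch of those two facts only, not the paper's algorithm; the Gamma and normal distributions here are arbitrary illustrative choices.

```python
# Sketch of univariate "residualization" via the probability integral
# transform (illustrative only; not the paper's implementation).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Draw an i.i.d. sample from a known F (here Gamma(shape=2, scale=1.5)).
x = rng.gamma(shape=2.0, scale=1.5, size=10_000)

# "Residualize": applying the CDF F removes F's distributional structure,
# leaving an approximately Uniform(0, 1) sample.
u = stats.gamma.cdf(x, a=2.0, scale=1.5)
print(stats.kstest(u, "uniform"))  # high p-value: u is ~ Uniform(0, 1)

# "Add" another distribution G by composing with its inverse CDF:
# G^{-1}(F(X)) ~ G. Here G is the standard normal.
z = stats.norm.ppf(u)
print(stats.kstest(z, "norm"))     # high p-value: z is ~ N(0, 1)
```

The paper's contribution is a multivariate analogue of these CDF compositions, built from tree-based transforms so that each boosting stage residualizes the sample toward uniformity.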