Logistic-beta processes for dependent random probabilities with beta marginals (2402.07048v2)
Abstract: The beta distribution serves as a canonical tool for modelling probabilities in statistics and machine learning. However, there is limited work on flexible and computationally convenient stochastic process extensions for modelling dependent random probabilities. We propose a novel stochastic process called the logistic-beta process, whose logistic transformation yields a stochastic process with common beta marginals. Logistic-beta processes can model dependence on both discrete and continuous domains, such as space or time, and have a flexible dependence structure through correlation kernels. Moreover, the normal variance-mean mixture representation of the process leads to effective posterior inference algorithms. We illustrate the benefits through nonparametric binary regression and conditional density estimation examples, both in simulation studies and in a pregnancy outcome application.
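The link between the logistic transformation and beta marginals can be illustrated at a single index point. The sketch below is not the authors' construction of the process and does not use the normal variance-mean mixture representation; it only assumes the standard univariate fact that if p ~ Beta(a, b), then eta = logit(p) has density sigmoid(eta)^a * sigmoid(-eta)^b / B(a, b), so applying the logistic function to eta recovers a Beta(a, b) variable. All function names are hypothetical and chosen for illustration.

```python
import math
import random


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p) - math.log1p(-p)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softplus(x):
    """Stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic_beta_logpdf(eta, a, b):
    """Log density of eta = logit(p) when p ~ Beta(a, b).

    Equals a*log(sigmoid(eta)) + b*log(sigmoid(-eta)) - log B(a, b).
    """
    log_beta_fn = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return -a * softplus(-eta) - b * softplus(eta) - log_beta_fn


def sample_logistic_beta(a, b, rng):
    """Draw one univariate logistic-beta variate by transforming a beta draw.

    Assumption: a and b are moderate, so the beta draw does not round
    to exactly 0 or 1 in floating point.
    """
    return logit(rng.betavariate(a, b))


# Illustrative check: the logistic transform of the draws should have
# mean close to a / (a + b), the mean of Beta(a, b).
rng = random.Random(0)
a, b = 2.0, 3.0
draws = [sample_logistic_beta(a, b, rng) for _ in range(20000)]
mc_mean = sum(sigmoid(eta) for eta in draws) / len(draws)
print("Monte Carlo mean of sigmoid(eta):", mc_mean, "target:", a / (a + b))
```

This covers only the marginal distribution at one location. Dependence across locations, which the abstract attributes to correlation kernels, is not modelled here.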