Deep Horseshoe Gaussian Processes (2403.01737v1)
Abstract: Deep Gaussian processes have recently been proposed as natural objects to fit, similarly to deep neural networks, the possibly complex features present in modern data samples, such as compositional structures. Adopting a Bayesian nonparametric approach, it is natural to use deep Gaussian processes as prior distributions, and to use the corresponding posterior distributions for statistical inference. We introduce the deep Horseshoe Gaussian process (Deep-HGP), a new, simple prior based on deep Gaussian processes with a squared-exponential kernel, which in particular enables data-driven choices of the key lengthscale parameters. For nonparametric regression with random design, we show that the associated tempered posterior distribution recovers the unknown true regression curve at the optimal rate for the quadratic loss, up to a logarithmic factor, in an adaptive way. The convergence rates adapt simultaneously to the smoothness of the regression function and to its compositional structure. The dependence of the rates on the dimension is explicit, allowing in particular for input spaces whose dimension increases with the number of observations.
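To make the construction concrete, here is a minimal sketch (in Python/NumPy) of a draw from a deep GP prior with squared-exponential kernels and half-Cauchy (horseshoe-type) lengthscales, evaluated on a finite grid. This is an illustration under stated assumptions, not the paper's exact specification: the two-layer depth, the half-Cauchy scale, the rescaling of the hidden layer to [0, 1], and the jitter level are all choices made for the sketch.

```python
import numpy as np

def se_kernel(x, y, lengthscale):
    """Squared-exponential kernel k(x, y) = exp(-|x - y|^2 / (2 * lengthscale^2))."""
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def sample_gp(x, lengthscale, rng, jitter=1e-6):
    """Draw one centered GP path at the points x under the SE kernel."""
    K = se_kernel(x, x, lengthscale) + jitter * np.eye(len(x))  # jitter for stability
    return np.linalg.cholesky(K) @ rng.standard_normal(len(x))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)

# Horseshoe-type lengthscales: half-Cauchy draws (heavy mass near 0, heavy tails),
# one per layer -- a loose analogue of the horseshoe's behavior for sparse signals.
ell = np.abs(rng.standard_cauchy(2))

# Compose the layers: the first GP warps the inputs, the second is evaluated
# at the warped locations, giving one draw of the composition g(h(x)).
h = sample_gp(x, ell[0], rng)
h = (h - h.min()) / (h.max() - h.min() + 1e-12)  # rescale hidden layer to [0, 1]
f = sample_gp(h, ell[1], rng)
```

The heavy-tailed lengthscale draws are the point of the sketch: near-zero values produce rough layers and large values produce nearly flat ones, so the prior spans very different effective smoothness levels per layer, which is the mechanism behind the data-driven lengthscale adaptation described in the abstract.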