Deep Horseshoe Gaussian Processes
Abstract: Deep Gaussian processes have recently been proposed as natural objects for fitting, similarly to deep neural networks, the possibly complex features present in modern data samples, such as compositional structures. Adopting a Bayesian nonparametric approach, it is natural to use deep Gaussian processes as prior distributions and to use the corresponding posterior distributions for statistical inference. We introduce the deep Horseshoe Gaussian process (Deep-HGP), a new simple prior based on deep Gaussian processes with a squared-exponential kernel, which in particular enables data-driven choices of the key lengthscale parameters. For nonparametric regression with random design, we show that the associated posterior distribution recovers the unknown true regression curve optimally in terms of quadratic loss, up to a logarithmic factor, in an adaptive way. The convergence rates are simultaneously adaptive to the smoothness of the regression function and to its compositional structure. The dependence of the rates on the dimension is explicit, allowing in particular for input spaces of dimension growing with the number of observations.
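To make the construction concrete, below is a minimal Python sketch of one draw from a deep Gaussian process prior with squared-exponential kernels whose per-coordinate inverse lengthscales receive horseshoe-type (half-Cauchy) priors. The two-layer depth, the `tau` scale, and the exact placement of the half-Cauchy prior on the inverse lengthscales are illustrative assumptions, not the paper's precise specification.

```python
import numpy as np

def se_kernel(x, y, inv_ls):
    """Squared-exponential kernel with per-coordinate inverse lengthscales."""
    d = (x[:, None, :] - y[None, :, :]) * inv_ls
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1))

def sample_inverse_lengthscales(dim, tau, rng):
    """Horseshoe-type scales: half-Cauchy draws, many close to zero.
    A near-zero inverse lengthscale effectively removes a coordinate,
    mimicking the data-driven lengthscale selection described above.
    (Illustrative parameterization, not the paper's exact prior.)"""
    return tau * np.abs(rng.standard_cauchy(dim))

def sample_deep_gp_prior(X, n_layers=2, tau=1.0, jitter=1e-8, rng=None):
    """One sample path of a deep GP prior: each layer is a centered GP
    with SE kernel, evaluated at the previous layer's output."""
    rng = rng or np.random.default_rng()
    H = X
    for _ in range(n_layers):
        inv_ls = sample_inverse_lengthscales(H.shape[1], tau, rng)
        K = se_kernel(H, H, inv_ls) + jitter * np.eye(len(H))
        # Draw a centered Gaussian vector with covariance K.
        H = np.linalg.cholesky(K) @ rng.standard_normal((len(H), 1))
    return H.ravel()

# One prior draw on a random design of 50 points in dimension 3.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))
f = sample_deep_gp_prior(X, rng=rng)
```

The spike of the half-Cauchy near zero is what allows whole coordinates to be pruned in a data-driven way, mirroring the role of the horseshoe prior in sparse sequence models; posterior inference (e.g. by MCMC) is not shown here.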