
Deep Horseshoe Gaussian Processes (2403.01737v1)

Published 4 Mar 2024 in math.ST, stat.ML, and stat.TH

Abstract: Deep Gaussian processes have recently been proposed as natural objects to fit, similarly to deep neural networks, possibly complex features present in modern data samples, such as compositional structures. Adopting a Bayesian nonparametric approach, it is natural to use deep Gaussian processes as prior distributions, and use the corresponding posterior distributions for statistical inference. We introduce the deep Horseshoe Gaussian process (Deep-HGP), a new simple prior based on deep Gaussian processes with a squared-exponential kernel, that in particular enables data-driven choices of the key lengthscale parameters. For nonparametric regression with random design, we show that the associated tempered posterior distribution recovers the unknown true regression curve optimally in terms of quadratic loss, up to a logarithmic factor, in an adaptive way. The convergence rates are simultaneously adaptive to both the smoothness of the regression function and to its structure in terms of compositions. The dependence of the rates on the dimension is explicit, allowing in particular for input spaces of dimension increasing with the number of observations.
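The construction described in the abstract can be illustrated with a minimal sketch: a deep Gaussian process built by composing GP layers with a squared-exponential kernel, where per-dimension lengthscales are drawn from a heavy-tailed, horseshoe-type scale mixture (local and global half-Cauchy factors). This is not the paper's exact prior specification; the sampler below is a simplified illustration, and the function names and hyperparameter choices (e.g. the jitter level) are assumptions.

```python
import numpy as np

def se_kernel(x, y, lengthscales):
    # Squared-exponential kernel with per-dimension lengthscales.
    d = (x[:, None, :] - y[None, :, :]) / lengthscales
    return np.exp(-0.5 * np.sum(d**2, axis=-1))

def sample_horseshoe_scales(dim, rng):
    # Horseshoe-type scales: |local half-Cauchy| x |global half-Cauchy|.
    local = np.abs(rng.standard_cauchy(dim))
    glob = np.abs(rng.standard_cauchy())
    return local * glob

def sample_deep_gp(x, depth, rng, jitter=1e-6):
    # Compose GP layers: each layer's output is the next layer's input.
    h = x
    for _ in range(depth):
        ls = sample_horseshoe_scales(h.shape[1], rng)
        K = se_kernel(h, h, ls) + jitter * np.eye(len(h))
        L = np.linalg.cholesky(K)          # K is PSD; jitter keeps it PD
        h = (L @ rng.standard_normal(len(h)))[:, None]
    return h[:, 0]

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(50, 2))   # random design on [0, 1]^2
f = sample_deep_gp(x, depth=2, rng=rng)   # one draw from the deep prior
```

The heavy tails of the half-Cauchy factors are what allow data-driven lengthscale selection: a very large lengthscale in some input dimension effectively switches that dimension off, mirroring how the horseshoe shrinks irrelevant coordinates.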
