Nonparametric Automatic Differentiation Variational Inference with Spline Approximation (2403.06302v1)

Published 10 Mar 2024 in stat.ML and cs.LG

Abstract: Automatic Differentiation Variational Inference (ADVI) is an efficient method for learning probabilistic models. Classical ADVI relies on parametric distributions to approximate the posterior. In this paper, we develop a spline-based nonparametric approximation approach that enables flexible posterior approximation for distributions with complicated structures, such as skewness, multimodality, and bounded support. Compared with widely used nonparametric variational inference methods, the proposed method is easy to implement and adaptive to various data structures. By adopting the spline approximation, we derive a lower bound of the importance weighted autoencoder and establish its asymptotic consistency. Experiments demonstrate the efficiency of the proposed method in approximating complex posterior distributions and improving the performance of generative models with incomplete data.
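
The abstract describes the method only at a high level. As a rough illustration of the core idea — using a spline density as the variational family so the approximation can capture skewness and multimodality — here is a minimal PyTorch sketch. It is a toy under stated assumptions, not the paper's algorithm: it fits a piecewise-linear spline density to a 1D bimodal target by minimizing a quadrature-based KL divergence rather than the stochastic ADVI/IWAE-style objective the paper derives, and the knot grid, target, and function names are all invented for illustration.

```python
import torch

# Toy sketch (assumptions, not the paper's algorithm): approximate an
# unnormalized 1D target density with a piecewise-linear spline density
# q(z) on a fixed knot grid, minimizing KL(q || p) via trapezoidal
# quadrature instead of a Monte Carlo ADVI/IWAE bound.

lo, hi, K = -4.0, 4.0, 41                   # assumed support and knot count
knots = torch.linspace(lo, hi, K)
theta = torch.zeros(K, requires_grad=True)  # unconstrained spline coefficients

def log_target(z):
    # Example bimodal target: equal mixture of N(-2, 1) and N(2, 1),
    # known only up to a normalizing constant.
    return torch.logsumexp(
        torch.stack([-0.5 * (z - 2.0) ** 2, -0.5 * (z + 2.0) ** 2]), dim=0
    )

def q_at_knots():
    heights = torch.nn.functional.softplus(theta)  # nonnegative knot heights
    norm = torch.trapz(heights, knots)             # exact for piecewise-linear q
    return heights / norm                          # normalized density at knots

opt = torch.optim.Adam([theta], lr=0.05)
for step in range(2000):
    q = q_at_knots()
    # KL(q || p), up to the constant log-normalizer of the target
    kl = torch.trapz(q * (torch.log(q + 1e-12) - log_target(knots)), knots)
    opt.zero_grad()
    kl.backward()
    opt.step()

# After training, q_at_knots() evaluates a flexible, here bimodal,
# spline approximation on the knot grid.
```

The piecewise-linear density is the simplest spline family that makes the normalizing constant exact under the trapezoid rule; the paper's approach is more general, but the sketch conveys why a spline-based q can adapt to multimodal or bounded-support posteriors where a single parametric family cannot.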
