Last layer state space model for representation learning and uncertainty quantification (2307.01566v1)
Abstract: As sequential neural architectures become deeper and more complex, uncertainty estimation becomes increasingly challenging. Efforts to quantify uncertainty often rely on specific training procedures and incur additional computational costs due to the dimensionality of such models. In this paper, we propose to decompose a classification or regression task into two steps: a representation learning stage that learns low-dimensional states, and a state space model for uncertainty estimation. This approach separates representation learning from the design of generative models. We demonstrate how predictive distributions can be estimated on top of an existing, trained neural network by adding a state space-based last layer whose parameters are estimated with Sequential Monte Carlo methods. We apply the proposed methodology to the hourly estimation of Electricity Transformer Oil temperature, a publicly benchmarked dataset. Our model accounts for the noisy data structure, due to unknown or unavailable variables, and provides confidence intervals on predictions.
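To make the two-step idea concrete, below is a minimal sketch (not the authors' code) of a state space "last layer" filtered with a bootstrap particle filter on top of features from a frozen, pre-trained encoder. The linear-Gaussian transition and emission model, the matrices `A`, `B`, `C`, the noise scales, and the `extract_features` stub are all illustrative assumptions standing in for the paper's actual model and data.

```python
# Sketch: frozen encoder features feed a last-layer state space model whose
# latent states are tracked with a bootstrap particle filter (SMC), yielding
# predictive intervals. All specific choices here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def extract_features(raw_inputs):
    """Stand-in for a trained deterministic encoder (e.g. an RNN/Transformer)
    mapping raw inputs to low-dimensional states. Hypothetical placeholder."""
    return np.tanh(raw_inputs @ rng.normal(size=(raw_inputs.shape[1], 4)))

def bootstrap_filter(features, y, n_particles=500, sigma_x=0.1, sigma_y=0.2):
    """Bootstrap particle filter for an assumed last-layer model:
        x_t = A x_{t-1} + B u_t + sigma_x * eps_t   (latent state, u_t = features)
        y_t = C x_t + sigma_y * eta_t               (observed target)
    Returns predictive means and 90% intervals for y_t."""
    T, d = features.shape
    A = 0.9 * np.eye(d)            # assumed transition matrix
    B = np.eye(d)                  # assumed input matrix
    C = np.ones(d) / d             # assumed emission vector
    particles = rng.normal(size=(n_particles, d))
    means, lows, highs = [], [], []
    for t in range(T):
        # Propagate particles through the transition model (bootstrap proposal).
        particles = (particles @ A.T + features[t] @ B.T
                     + sigma_x * rng.normal(size=(n_particles, d)))
        y_loc = particles @ C                                   # per-particle mean of y_t
        y_draws = y_loc + sigma_y * rng.normal(size=n_particles)  # predictive samples
        means.append(y_draws.mean())
        lows.append(np.quantile(y_draws, 0.05))
        highs.append(np.quantile(y_draws, 0.95))
        # Reweight by the Gaussian observation likelihood, then resample (multinomial).
        logw = -0.5 * ((y[t] - y_loc) / sigma_y) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]
    return np.array(means), np.array(lows), np.array(highs)

# Toy usage on synthetic data standing in for the oil-temperature series.
raw = rng.normal(size=(200, 8))
features = extract_features(raw)
y = features.sum(axis=1) + 0.2 * rng.normal(size=200)
mean, lo, hi = bootstrap_filter(features, y)
print(f"empirical 90% interval coverage: {np.mean((y >= lo) & (y <= hi)):.2f}")
```

In this sketch the encoder is kept fixed, so only the low-dimensional state space model is filtered; parameter estimation of the last layer (e.g. via SMC-based likelihood estimates or an EM-style procedure) would sit on top of this filter and is omitted here.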