Sparse Deep Learning for Time Series Data: Theory and Applications (2310.03243v1)

Published 5 Oct 2023 in stat.ML, cs.AI, and cs.LG

Abstract: Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory of sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and that their predictions are asymptotically normally distributed under appropriate assumptions, enabling prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal prediction, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order of a time series and outperform existing methods in large-scale model compression. The proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern.
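
The abstract describes fitting sparse RNNs to dependent data and forming prediction intervals from the asymptotic normality of their predictions. The sketch below is a minimal, hypothetical illustration of that general workflow, not the authors' algorithm: it fits a small LSTM with an L1 penalty (a crude stand-in for the paper's sparsity-inducing prior) to a synthetic AR(2) series and forms a Gaussian prediction interval from the residual standard deviation rather than from the paper's asymptotic-normality result. The data, network size, penalty strength, and interval construction are all illustrative assumptions.

```python
# Hypothetical sketch only: sparsity-regularized LSTM forecasting with a crude
# Gaussian prediction interval. The paper's method is a sparse Bayesian framework
# with asymptotic-normality guarantees; this code merely illustrates the workflow.
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)

# Synthetic AR(2) series: x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + eps_t
n = 600
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + np.random.normal(scale=0.1)

# Build (lagged window -> next value) pairs with a window of 10 lags.
lag = 10
X = np.stack([x[t - lag:t] for t in range(lag, n)])
y = x[lag:n]
X_t = torch.tensor(X, dtype=torch.float32).unsqueeze(-1)  # (N, lag, 1)
y_t = torch.tensor(y, dtype=torch.float32).unsqueeze(-1)  # (N, 1)

class SmallLSTM(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.rnn = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):
        out, _ = self.rnn(seq)
        return self.head(out[:, -1, :])  # predict the next value from the last state

model = SmallLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
lam = 1e-4  # L1 strength; a stand-in for the paper's sparsity-inducing prior

for epoch in range(200):
    opt.zero_grad()
    pred = model(X_t)
    l1 = sum(p.abs().sum() for p in model.parameters())
    loss = nn.functional.mse_loss(pred, y_t) + lam * l1
    loss.backward()
    opt.step()

# Crude 95% interval from the in-sample residual spread; the paper instead derives
# intervals from the asymptotic normality of sparse-RNN predictions.
with torch.no_grad():
    resid = (model(X_t) - y_t).numpy().ravel()
    sigma = resid.std()
    last_window = torch.tensor(x[-lag:], dtype=torch.float32).reshape(1, lag, 1)
    point = model(last_window).item()
print(f"next-step forecast: {point:.3f} +/- {1.96 * sigma:.3f}")
```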
