
High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization (2401.09346v1)

Published 17 Jan 2024 in stat.ML and cs.LG

Abstract: Uncertainty quantification for estimates obtained from stochastic optimization in an online setting has recently gained popularity. This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level. Specifically, we propose to use a small number of independent multi-runs to acquire distribution information and construct a t-based confidence interval. Our method requires minimal additional computation and memory beyond the standard updating of estimates, making the inference process almost cost-free. We provide a rigorous theoretical guarantee for the confidence interval, demonstrating that the coverage is approximately exact with an explicit convergence rate, which allows for high confidence level inference. In particular, a new Gaussian approximation result is developed for the online estimators to characterize the coverage properties of our confidence intervals in terms of relative errors. Our method also allows for leveraging parallel computing to further accelerate calculations using multiple cores. It is easy to implement and can be integrated with existing stochastic algorithms without complicated modifications.
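
The abstract describes forming a t-based confidence interval from a small number of independent parallel runs of an online estimator. The sketch below is a minimal illustration of that idea under assumptions not taken from the paper: the base estimator is Polyak-Ruppert averaged SGD on a synthetic linear regression stream, K = 8 runs are used, and the interval is coordinate-wise. The function names (`averaged_sgd`, `t_based_ci`), the step-size schedule, and all hyperparameters are illustrative, not the paper's implementation.

```python
import numpy as np
from scipy import stats

def averaged_sgd(sample_stream, theta0, n_steps, lr0=0.5, alpha=0.6, rng=None):
    """Polyak-Ruppert averaged SGD on a squared loss (illustrative base estimator).

    `sample_stream(rng)` yields one (x, y) pair per call; the gradient below is
    for the loss 0.5 * (x @ theta - y)**2.
    """
    theta = theta0.copy()
    theta_bar = np.zeros_like(theta)
    for t in range(1, n_steps + 1):
        x, y = sample_stream(rng)
        grad = (x @ theta - y) * x            # stochastic gradient
        theta -= lr0 * t ** (-alpha) * grad   # polynomially decaying step size
        theta_bar += (theta - theta_bar) / t  # running average of the iterates
    return theta_bar

def t_based_ci(estimates, coord=0, level=0.99):
    """t-based confidence interval built from K independent parallel runs."""
    vals = np.asarray([est[coord] for est in estimates])
    K = len(vals)
    center = vals.mean()
    se = vals.std(ddof=1) / np.sqrt(K)          # sample std error across runs
    q = stats.t.ppf(0.5 + level / 2, df=K - 1)  # two-sided t quantile
    return center - q * se, center + q * se

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K, n_steps = 5, 8, 20_000
    theta_star = np.ones(d)

    def stream(rng):
        x = rng.normal(size=d)
        y = x @ theta_star + rng.normal()
        return x, y

    # K independent runs; in practice each run would sit on its own core
    # with its own data stream / seed.
    runs = [averaged_sgd(stream, np.zeros(d), n_steps, rng=rng) for _ in range(K)]
    lo, hi = t_based_ci(runs, coord=0, level=0.99)
    print(f"99% CI for theta[0]: ({lo:.4f}, {hi:.4f})")
```

The only state kept beyond the K running averages is the K final estimates themselves, which is consistent with the abstract's claim that the inference step adds essentially no computation or memory on top of the standard updates.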
