
Stochastic Halpern iteration in normed spaces and applications to reinforcement learning (2403.12338v4)

Published 19 Mar 2024 in math.OC, cs.LG, and stat.ML

Abstract: We analyze the oracle complexity of the stochastic Halpern iteration with minibatch, where we aim to approximate fixed-points of nonexpansive and contractive operators in a normed finite-dimensional space. We show that if the underlying stochastic oracle has uniformly bounded variance, our method exhibits an overall oracle complexity of $\tilde{O}(\varepsilon^{-5})$ to obtain $\varepsilon$ expected fixed-point residual for nonexpansive operators, improving recent rates established for the stochastic Krasnoselskii-Mann iteration. Also, we establish a lower bound of $\Omega(\varepsilon^{-3})$ which applies to a wide range of algorithms, including all averaged iterations even with minibatching. Using a suitable modification of our approach, we derive a $O(\varepsilon^{-2}(1-\gamma)^{-3})$ complexity bound to obtain an approximation of the fixed-point in the case in which the operator is a $\gamma$-contraction. As an application, we propose new model-free algorithms for average and discounted reward MDPs. For the average reward case, our method applies to weakly communicating MDPs without requiring prior parameter knowledge.
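The method the abstract describes combines the classical Halpern anchoring step, $x_{k+1} = \beta_k x_0 + (1-\beta_k)\,T(x_k)$, with a minibatch estimate of the operator evaluation. The sketch below is an illustrative reconstruction under stated assumptions, not the paper's exact algorithm: the operator `T`, the step sizes $\beta_k = 1/(k+2)$, and the fixed batch size are all choices made here for demonstration, using a synthetic $\gamma$-contraction whose stochastic oracle has bounded variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative gamma-contraction on R^d: T(x) = gamma * (Q @ x) + b,
# with Q orthogonal so that ||T(x) - T(y)|| = gamma * ||x - y||.
d, gamma = 5, 0.9
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
b = rng.standard_normal(d)

def oracle(x, sigma=0.1):
    """Stochastic oracle: unbiased evaluation of T(x) with bounded variance."""
    return gamma * (Q @ x) + b + sigma * rng.standard_normal(d)

def stochastic_halpern(x0, iters=2000, batch=64):
    """Halpern iteration with a minibatch-averaged oracle:
    x_{k+1} = beta_k * x0 + (1 - beta_k) * T_hat(x_k)."""
    x = x0.copy()
    for k in range(iters):
        beta = 1.0 / (k + 2)  # classical Halpern anchoring weights
        T_hat = np.mean([oracle(x) for _ in range(batch)], axis=0)
        x = beta * x0 + (1.0 - beta) * T_hat
    return x

x = stochastic_halpern(np.zeros(d))
# Exact fixed-point residual ||T(x) - x|| (computable here since T is known).
residual = np.linalg.norm(gamma * (Q @ x) + b - x)
```

The anchoring weight $\beta_k \to 0$ pulls the iterate away from the starting point over time, while minibatching tames the oracle noise; the paper's analysis chooses batch sizes adaptively to achieve the stated complexity bounds, whereas this sketch fixes the batch size for simplicity.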
