
Operator Splitting Value Iteration (2211.13937v1)

Published 25 Nov 2022 in cs.LG, cs.AI, cs.SY, eess.SY, math.OC, and stat.ML

Abstract: We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment to accelerate the convergence of the value function. Inspired by the splitting approach in numerical linear algebra, we introduce Operator Splitting Value Iteration (OS-VI) for both Policy Evaluation and Control problems. OS-VI achieves a much faster convergence rate when the model is accurate enough. We also introduce a sample-based version of the algorithm called OS-Dyna. Unlike the traditional Dyna architecture, OS-Dyna still converges to the correct value function in the presence of model approximation error.
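
The splitting idea in the abstract can be illustrated for policy evaluation on a small tabular MDP. The sketch below is one natural reading of it, not the authors' reference implementation. It writes the Bellman equation as (I - γP)V = r, splits the matrix as (I - γP̂) - γ(P - P̂) with P̂ the approximate model, and iterates V ← (I - γP̂)⁻¹(r + γ(P - P̂)V). The function names, the random test MDP, and the 2% model perturbation are all assumptions made for illustration.

```python
import numpy as np

def exact_value(P, r, gamma):
    """Reference solution of (I - gamma * P) V = r."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

def value_iteration_pe(P, r, gamma, iters):
    """Standard policy-evaluation value iteration: V <- r + gamma * P V."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = r + gamma * P @ V
    return V

def os_vi_pe(P, P_hat, r, gamma, iters):
    """Splitting-based iteration (illustrative sketch).

    Each step solves a linear system in the approximate model P_hat,
    with the model error (P - P_hat) moved to the right-hand side.
    """
    n = P.shape[0]
    A_hat = np.eye(n) - gamma * P_hat
    V = np.zeros(n)
    for _ in range(iters):
        V = np.linalg.solve(A_hat, r + gamma * (P - P_hat) @ V)
    return V

def random_stochastic(n, rng):
    """Random row-stochastic matrix (hypothetical test dynamics)."""
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, gamma = 20, 0.9
P = random_stochastic(n, rng)                      # true dynamics under the policy
P_hat = 0.98 * P + 0.02 * random_stochastic(n, rng)  # assumed slightly wrong model
r = rng.random(n)                                  # rewards under the policy

V_true = exact_value(P, r, gamma)
err_vi = np.max(np.abs(value_iteration_pe(P, r, gamma, 10) - V_true))
err_os = np.max(np.abs(os_vi_pe(P, P_hat, r, gamma, 10) - V_true))
print(f"VI error after 10 iterations:        {err_vi:.3e}")
print(f"splitting error after 10 iterations: {err_os:.3e}")
```

In this sketch the fixed point of the splitting iteration is the true value function for any P̂, because the model error is corrected on the right-hand side rather than ignored. The speed-up depends on P̂ being close to P: here the error contracts by at most γ‖P - P̂‖∞ / (1 - γ) per step, compared with γ for plain value iteration. Each step costs a linear solve in the model, which stands in for planning with the approximate model.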

Authors (4)
  1. Amin Rakhsha (7 papers)
  2. Andrew Wang (42 papers)
  3. Mohammad Ghavamzadeh (97 papers)
  4. Amir-massoud Farahmand (31 papers)
Citations (6)
