Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning (2402.02017v2)

Published 3 Feb 2024 in cs.LG

Abstract: Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. By analyzing $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
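Below is a minimal, hypothetical sketch of how the adaptive Q-aid described in the abstract could be blended into an RCSL objective. The policy and Q-network interfaces, the loss forms, and the return-based weighting schedule are illustrative assumptions for exposition, not the paper's exact formulation.

    # Hypothetical QCS-style training objective (illustration only, not the authors' code).
    # Assumptions: policy(states, returns_to_go) -> predicted actions,
    # q_net(states, actions) -> Q-values, and a weight that shrinks the Q-aid
    # for trajectories whose return is already near the dataset maximum.
    import torch
    import torch.nn.functional as F

    def qcs_loss(policy, q_net, states, actions, returns_to_go, traj_return, max_return):
        # RCSL term: imitate dataset actions conditioned on return-to-go.
        pred_actions = policy(states, returns_to_go)
        rcsl = F.mse_loss(pred_actions, actions)

        # Q-aid term: push the conditioned policy toward actions the Q-function rates highly.
        q_aid = -q_net(states, pred_actions).mean()

        # Adaptive coefficient: rely less on the Q-function when the trajectory
        # already achieves a high return (placeholder schedule, an assumption here).
        w = torch.clamp(1.0 - traj_return / max_return, min=0.0)

        return rcsl + w * q_aid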

Authors (4)
  1. Jeonghye Kim (5 papers)
  2. Suyoung Lee (13 papers)
  3. Woojun Kim (20 papers)
  4. Youngchul Sung (48 papers)
Citations (1)
