Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning (2402.02017v2)
Published 3 Feb 2024 in cs.LG
Abstract: Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. By analyzing $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
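The abstract's core idea, blending the RCSL objective with a $Q$-aid term whose weight adapts to trajectory return, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear weight schedule `adaptive_q_weight`, the MSE behavior-cloning term, and the negative-mean-$Q$ aid term are all assumptions chosen to make the adaptive combination concrete.

```python
def adaptive_q_weight(traj_return, max_return):
    # Assumed schedule: Q-aid weight decays linearly with normalized return,
    # so low-return trajectories (which benefit most from stitching) receive
    # more Q-aid, while near-optimal trajectories rely on pure RCSL.
    return max(0.0, 1.0 - traj_return / max_return)

def qcs_loss(pred_actions, target_actions, q_values, traj_return, max_return):
    # RCSL term: mean squared error between predicted and dataset actions.
    rcsl = sum((p - t) ** 2 for p, t in zip(pred_actions, target_actions)) / len(pred_actions)
    # Q-aid term: negative mean Q-value of the predicted actions, so that
    # minimizing the loss pushes the policy toward high-Q actions.
    q_aid = -sum(q_values) / len(q_values)
    w = adaptive_q_weight(traj_return, max_return)
    return rcsl + w * q_aid
```

Under this sketch, a trajectory at the dataset's maximum return gets weight 0 (pure supervised learning), while a zero-return trajectory gets weight 1 (full $Q$-aid), capturing the adaptive integration the abstract describes.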
Authors: Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung