Elastic Decision Transformer (2307.02484v6)

Published 5 Jul 2023 in cs.LG and cs.AI

Abstract: This paper introduces the Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, the process of generating an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Specifically, EDT retains a longer history when the preceding trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" onto a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q-learning-based approaches. In particular, EDT outperforms Q-learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/
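The core mechanism the abstract describes, choosing how much history to condition on at each inference step, lends itself to a short sketch. The following is a minimal, self-contained illustration of EDT-style test-time history-length selection, assuming a DT-style model that exposes a return-estimation head and a return-conditioned action head. All names here (DummySequenceModel, estimate_max_return, predict_action) are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch of EDT-style variable-length history selection at test time.
# The model interface below is a hypothetical stand-in, not the paper's API.
import random


class DummySequenceModel:
    """Stand-in for a Decision-Transformer-style policy with a value head."""

    def estimate_max_return(self, context):
        # In EDT this would estimate the maximum return achievable from the
        # truncated context; here it is a random placeholder.
        return random.random() * len(context)

    def predict_action(self, context, target_return):
        # A real model would condition on (context, target_return);
        # here we return a dummy action.
        return 0


def edt_act(model, history, candidate_lengths):
    """Pick the history length whose suffix maximizes the estimated
    achievable return, then act conditioned on that truncated history.

    A short suffix lets the model 'stitch' onto a better trajectory when
    the recent history is sub-optimal; a long suffix preserves context
    when the history is already near-optimal.
    """
    best_len, best_ret = None, float("-inf")
    for k in candidate_lengths:
        suffix = history[-k:]                    # keep only the last k steps
        est = model.estimate_max_return(suffix)  # hypothetical value head
        if est > best_ret:
            best_len, best_ret = k, est
    return model.predict_action(history[-best_len:], best_ret)


if __name__ == "__main__":
    model = DummySequenceModel()
    history = [("state", "action", "reward")] * 20  # toy trajectory buffer
    action = edt_act(model, history, candidate_lengths=[1, 2, 4, 8, 16])
    print(action)
```

In the paper, the per-length return estimate is obtained by approximating the maximum achievable return (via an expectile-style estimator), and the search over candidate lengths need not be exhaustive; the sketch keeps a plain loop for clarity.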
