
Decision Transformer as a Foundation Model for Partially Observable Continuous Control (2404.02407v1)

Published 3 Apr 2024 in eess.SY, cs.AI, cs.LG, cs.RO, and cs.SY

Abstract: Closed-loop control of nonlinear dynamical systems with partial-state observability demands expert knowledge of a diverse, less standardized set of theoretical tools. Moreover, it requires a delicate integration of controller and estimator designs to achieve the desired system behavior. To establish a general controller synthesis framework, we explore the Decision Transformer (DT) architecture. Specifically, we first frame the control task as predicting the current optimal action based on past observations, actions, and rewards, eliminating the need for a separate estimator design. Then, we leverage the pre-trained LLMs, i.e., the Generative Pre-trained Transformer (GPT) series, to initialize DT and subsequently train it for control tasks using low-rank adaptation (LoRA). Our comprehensive experiments across five distinct control tasks, ranging from maneuvering aerospace systems to controlling partial differential equations (PDEs), demonstrate DT's capability to capture the parameter-agnostic structures intrinsic to control tasks. DT exhibits remarkable zero-shot generalization abilities for completely new tasks and rapidly surpasses expert performance levels with a minimal amount of demonstration data. These findings highlight the potential of DT as a foundational controller for general control applications.
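To make the sequence-modeling framing concrete, below is a minimal, hypothetical sketch (not the authors' code) of a Decision-Transformer-style policy for partially observable control: histories of rewards, observations, and actions are embedded as an interleaved token sequence and a causal Transformer predicts the current action. The architecture sizes, the use of reward tokens (the original DT uses returns-to-go), and the plain PyTorch backbone are all assumptions for illustration; the paper instead initializes the backbone from a pre-trained GPT and fine-tunes it with LoRA, which this sketch omits.

```python
# Hypothetical sketch: a Decision-Transformer-style policy that predicts the
# current action from a history of (reward, observation, action) tokens.
# Dimensions and backbone choice are illustrative assumptions, not the paper's.
import torch
import torch.nn as nn

class DecisionTransformerPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, d_model=128, n_layers=3, n_heads=4, max_len=64):
        super().__init__()
        # Separate linear embeddings per modality, as in Decision Transformer.
        self.embed_reward = nn.Linear(1, d_model)
        self.embed_obs = nn.Linear(obs_dim, d_model)
        self.embed_act = nn.Linear(act_dim, d_model)
        self.embed_time = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.predict_act = nn.Linear(d_model, act_dim)

    def forward(self, rewards, observations, actions, timesteps):
        # rewards: (B, T, 1), observations: (B, T, obs_dim),
        # actions: (B, T, act_dim), timesteps: (B, T) integer step indices.
        B, T, _ = observations.shape
        t_emb = self.embed_time(timesteps)
        r = self.embed_reward(rewards) + t_emb
        o = self.embed_obs(observations) + t_emb
        a = self.embed_act(actions) + t_emb
        # Interleave as (r_1, o_1, a_1, r_2, o_2, a_2, ...) -> length 3T.
        tokens = torch.stack([r, o, a], dim=2).reshape(B, 3 * T, -1)
        # Causal mask so each token attends only to the past.
        mask = nn.Transformer.generate_square_subsequent_mask(3 * T)
        h = self.backbone(tokens, mask=mask)
        # Predict action a_t from the hidden state of observation token o_t.
        return self.predict_act(h[:, 1::3])

# Example: predict actions for a batch of 2 trajectories of length 10.
policy = DecisionTransformerPolicy(obs_dim=8, act_dim=2)
B, T = 2, 10
acts = policy(torch.randn(B, T, 1), torch.randn(B, T, 8),
              torch.randn(B, T, 2), torch.arange(T).repeat(B, 1))
print(acts.shape)  # torch.Size([2, 10, 2])
```

Because the policy conditions on the full observation-action-reward history rather than a reconstructed state, no separate estimator is needed, which is the point the abstract makes about eliminating the estimator design step.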

Authors (4)
  1. Xiangyuan Zhang (10 papers)
  2. Weichao Mao (11 papers)
  3. Haoran Qiu (10 papers)
  4. Tamer Başar (200 papers)
Citations (4)