
Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills (2405.11380v3)

Published 18 May 2024 in cs.RO, cs.AI, cs.SY, and eess.SY

Abstract: The requirements for real-world manipulation tasks are diverse and often conflicting: some tasks require precise motion while others require force compliance; some tasks require avoidance of certain regions while others require convergence to certain states. Satisfying these varied requirements with a fixed state-action representation and control strategy is challenging, impeding the development of a universal robotic foundation model. In this work, we propose Meta-Control, the first LLM-enabled automatic control synthesis approach, which creates customized state representations and control strategies tailored to specific tasks. Our core insight is that a meta-control system can be built to automate the thought process that human experts use to design control systems. Specifically, human experts rely on a model-based, hierarchical (abstract-to-concrete) thought model and compose various dynamic models and controllers into a control system. Meta-Control mimics this thought model and harnesses LLMs' extensive control knowledge, guided by Socrates' "art of midwifery", to automate the design process. Meta-Control stands out for its fully model-based nature, which allows rigorous analysis, generalizability, robustness, efficient parameter tuning, and reliable real-time execution.
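
To make the abstract-to-concrete composition concrete, below is a minimal Python sketch of the kind of hierarchical synthesis loop the abstract describes: an LLM is queried step by step, first to pick a dynamic model for the task, then a controller compatible with that model's state representation. All names here (DynamicModel, Controller, the library entries, ask_llm) are hypothetical illustrations under assumed interfaces, not the paper's actual code or API.

```python
# Minimal sketch of a hierarchical, model-based synthesis loop (assumed design,
# not the paper's implementation). The LLM chooses from a fixed library of
# dynamic models and controllers, constrained at each step by compatibility.
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class DynamicModel:
    name: str
    state: str         # state representation this model assumes, e.g. "ee_pose"
    description: str   # text shown to the LLM when it reasons about the task


@dataclass
class Controller:
    name: str
    requires: str      # state representation the controller needs
    description: str


# Hypothetical design space; a real system would carry parameterized models.
MODELS: Dict[str, DynamicModel] = {
    "kinematic": DynamicModel("kinematic", "ee_pose", "precise position tracking"),
    "impedance": DynamicModel("impedance", "ee_pose+wrench", "force-compliant contact"),
}
CONTROLLERS: Dict[str, Controller] = {
    "pd": Controller("pd", "ee_pose", "stiff tracking of a reference trajectory"),
    "admittance": Controller("admittance", "ee_pose+wrench", "complies with contact forces"),
}


def synthesize(task: str, ask_llm: Callable[[str], str]) -> Tuple[DynamicModel, Controller]:
    """Abstract-to-concrete synthesis: pick a dynamic model for the task, then a
    controller whose required state representation matches the model's."""
    model_name = ask_llm(
        f"Task: {task}\nAvailable dynamic models: "
        + ", ".join(f"{m.name} ({m.description})" for m in MODELS.values())
        + "\nAnswer with one model name."
    ).strip()
    model = MODELS[model_name]
    # Constrain the next question to compatible controllers only: guided,
    # step-by-step questioning rather than asking for the whole design at once.
    compatible = [c for c in CONTROLLERS.values() if c.requires == model.state]
    ctrl_name = ask_llm(
        f"Task: {task}\nChosen model: {model.name}\nCompatible controllers: "
        + ", ".join(f"{c.name} ({c.description})" for c in compatible)
        + "\nAnswer with one controller name."
    ).strip()
    return model, CONTROLLERS[ctrl_name]


if __name__ == "__main__":
    # Stubbed LLM for a dry run; a real system would call an actual LLM here.
    canned = iter(["impedance", "admittance"])
    model, ctrl = synthesize(
        "wipe a whiteboard with steady contact force",
        ask_llm=lambda prompt: next(canned),
    )
    print(f"Synthesized control system: {model.name} model + {ctrl.name} controller")
```

Because every component in the library is an explicit model or controller rather than a learned black box, the composed system can be analyzed, tuned, and executed with standard model-based tools, which is the property the abstract emphasizes.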
