Meta-Control: Automatic Model-based Control Synthesis for Heterogeneous Robot Skills (2405.11380v3)
Abstract: The requirements for real-world manipulation tasks are diverse and often conflicting: some tasks require precise motion while others require force compliance; some tasks require avoidance of certain regions, while others require convergence to certain states. Satisfying these varied requirements with a fixed state-action representation and control strategy is challenging, impeding the development of a universal robotic foundation model. In this work, we propose Meta-Control, the first LLM-enabled automatic control synthesis approach that creates customized state representations and control strategies tailored to specific tasks. Our core insight is that a meta-control system can be built to automate the thought process that human experts use to design control systems. Specifically, human experts rely heavily on a model-based, hierarchical (abstract-to-concrete) thought model, composing various dynamic models and controllers into a complete control system. Meta-Control mimics this thought model and harnesses the LLM's extensive control knowledge, guided by Socrates' "art of midwifery," to automate the design process. Meta-Control stands out for its fully model-based nature, allowing rigorous analysis, generalizability, robustness, efficient parameter tuning, and reliable real-time execution.
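The abstract's abstract-to-concrete composition can be illustrated with a minimal sketch. All names below (the model/controller libraries, the requirement tags, `synthesize`) are hypothetical illustrations, not the paper's API: the idea is simply that a task's requirements first select a compatible controller, which in turn constrains the state representation and dynamic model, mirroring the hierarchical design process described above.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of hierarchical, model-based control synthesis:
# pick a controller that satisfies the task requirements, then a dynamic
# model defined over the state representation that controller expects.

@dataclass
class DynamicModel:
    name: str
    state: str  # state representation the model is defined over

@dataclass
class Controller:
    name: str
    requires: str            # state representation the controller expects
    handles: set = field(default_factory=set)  # requirements it satisfies

# Illustrative libraries; a real system would hold many more entries.
MODEL_LIBRARY = [
    DynamicModel("rigid_body_dynamics", state="joint_space"),
    DynamicModel("impedance_dynamics", state="task_space"),
]

CONTROLLER_LIBRARY = [
    Controller("inverse_dynamics", requires="joint_space", handles={"precision"}),
    Controller("impedance_control", requires="task_space", handles={"compliance"}),
    Controller("cbf_qp", requires="task_space", handles={"avoidance", "compliance"}),
]

def synthesize(task_requirements: set):
    """Compose a (model, controller) pair covering all task requirements."""
    for ctrl in CONTROLLER_LIBRARY:
        if task_requirements <= ctrl.handles:
            for model in MODEL_LIBRARY:
                if model.state == ctrl.requires:
                    return model, ctrl
    raise ValueError(f"no composition satisfies {task_requirements}")

model, ctrl = synthesize({"compliance"})
print(model.name, ctrl.name)  # -> impedance_dynamics impedance_control
```

In Meta-Control the selection step is performed by the LLM through guided questioning rather than the hard-coded subset check used here; the sketch only shows the compositional structure of the resulting control system.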