ASID: Active Exploration for System Identification in Robotic Manipulation (2404.12308v2)
Abstract: Model-free control strategies such as reinforcement learning can learn controllers without requiring an accurate model or simulator of the world. While this is appealing due to the lack of modeling requirements, such methods can be sample inefficient, making them impractical in many real-world domains. On the other hand, model-based control techniques leveraging accurate simulators can circumvent these challenges and use a large amount of cheap simulation data to learn controllers that can effectively transfer to the real world. The challenge with such model-based techniques is that they require an extremely accurate simulation, which in turn requires specifying both appropriate simulation assets and physical parameters. Designing these for every environment under consideration takes considerable human effort. In this work, we propose a learning system that can leverage a small amount of real-world data to autonomously refine a simulation model and then plan an accurate control strategy that can be deployed in the real world. Our approach critically relies on utilizing an initial (possibly inaccurate) simulator to design effective exploration policies that, when deployed in the real world, collect high-quality data. We demonstrate the efficacy of this paradigm in identifying articulation, mass, and other physical parameters in several challenging robotic manipulation tasks, and illustrate that only a small amount of real-world data can allow for effective sim-to-real transfer. Project website: https://weirdlabuw.github.io/asid
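The three stages described in the abstract (explore using an inaccurate simulator, refine the simulator from a little real data, then plan) can be illustrated with a toy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the one-dimensional frictionless block, the constants `DT` and `NOISE_STD`, all function names, and the use of Fisher information as a stand-in for "high-quality data" (the abstract does not state which criterion is used).

```python
import random

DT = 1.0          # push duration in seconds (assumed for this toy)
NOISE_STD = 0.01  # std of displacement measurement noise in metres (assumed)


def simulate_push(force, mass):
    """Toy simulator: displacement of a frictionless block pushed from rest
    with a constant force for DT seconds."""
    return 0.5 * (force / mass) * DT ** 2


def fisher_information(force, mass, eps=1e-4):
    """Scalar Fisher information about the mass under Gaussian noise:
    (d displacement / d mass)^2 / sigma^2, via a central finite difference."""
    grad = (simulate_push(force, mass + eps) - simulate_push(force, mass - eps)) / (2 * eps)
    return grad ** 2 / NOISE_STD ** 2


def choose_exploration_force(candidate_forces, mass_guess):
    """Stage 1: use the (possibly inaccurate) simulator to pick the push that
    is most informative about the unknown mass."""
    return max(candidate_forces, key=lambda f: fisher_information(f, mass_guess))


def real_world_push(force, true_mass, rng):
    """Stand-in for the real system: same physics, true mass, noisy sensing."""
    return simulate_push(force, true_mass) + rng.gauss(0.0, NOISE_STD)


def identify_mass(force, displacements):
    """Stage 2: refine the simulator parameter. For this model the
    least-squares fit reduces to inverting the mean displacement."""
    mean_d = sum(displacements) / len(displacements)
    return 0.5 * force * DT ** 2 / mean_d


def plan_force(target_displacement, mass):
    """Stage 3: plan in the refined simulator by inverting it for the push
    force that reaches the target displacement."""
    return target_displacement * mass / (0.5 * DT ** 2)


def run_pipeline(true_mass=2.0, mass_guess=1.0, n_real=5, seed=0):
    rng = random.Random(seed)
    candidates = [0.5, 1.0, 2.0, 4.0]
    explore_force = choose_exploration_force(candidates, mass_guess)
    data = [real_world_push(explore_force, true_mass, rng) for _ in range(n_real)]
    mass_hat = identify_mass(explore_force, data)
    task_force = plan_force(0.5, mass_hat)
    return explore_force, mass_hat, task_force


if __name__ == "__main__":
    explore_force, mass_hat, task_force = run_pipeline()
    print("exploration force:", explore_force)
    print("identified mass:", round(mass_hat, 3))
    print("planned task force:", round(task_force, 3))
```

In this toy model the sensitivity of the displacement to the mass grows with the applied force, so the exploration stage picks the strongest allowed push even though the initial mass guess is wrong. Five noisy real-world pushes are then enough to recover the mass closely and to plan a push that lands near the target.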