DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control (2306.09010v1)
Abstract: Delayed Markov decision processes satisfy the Markov property by augmenting the agent's state space with a finite time window of recently committed actions. Relying on these state augmentations, delay-resolved reinforcement learning algorithms train policies to interact optimally with environments that exhibit observation or action delays. Although such methods can be trained directly on real robots, sample inefficiency, limited resources, and safety constraints make it common to transfer models trained in simulation to the physical robot. However, robotic simulations rely on approximated models of the physical systems, which hinders sim2real transfer. In this work, we treat various uncertainties in the modelling of the robot's dynamics as unknown intrinsic disturbances applied to the system input. We introduce a disturbance-augmented Markov decision process in delayed settings as a novel representation that incorporates disturbance estimation into the training of on-policy reinforcement learning algorithms. The proposed method is validated across several metrics on a robotic reaching task and compared with disturbance-unaware baselines. The results show that the disturbance-augmented models achieve higher stabilization and robustness in the control response, which in turn improves the prospects of successful sim2real transfer.
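For illustration, below is a minimal sketch, assuming a Gymnasium-style environment with Box spaces, of the kind of state augmentation the abstract describes: the observation is extended with the window of committed-but-not-yet-applied actions (restoring the Markov property under an action delay of `delay` steps) and with an estimate of the intrinsic input disturbance. The wrapper name `DisturbanceAugmentedDelayWrapper` and the `estimate_disturbance` callable are hypothetical placeholders for exposition, not the paper's implementation.

```python
# Illustrative sketch only: augments observations with the pending action
# window (delayed MDP) and a disturbance estimate (disturbance augmentation).
from collections import deque

import numpy as np
import gymnasium as gym


class DisturbanceAugmentedDelayWrapper(gym.Wrapper):
    """Augment observations with the pending action window and a disturbance estimate."""

    def __init__(self, env, delay, estimate_disturbance):
        super().__init__(env)
        assert delay >= 1, "use delay >= 1; delay = 0 reduces to the ordinary MDP"
        self.delay = delay
        # Hypothetical estimator of the intrinsic input disturbance, e.g. an
        # observer or learned model mapping (obs, applied action) -> estimate.
        self.estimate_disturbance = estimate_disturbance
        self.action_dim = int(np.prod(env.action_space.shape))
        self.obs_dim = int(np.prod(env.observation_space.shape))
        self.action_buffer = deque(maxlen=delay)

        # Augmented observation: original obs + delay actions + disturbance estimate
        # (assumed to live in the input/action space).
        aug_dim = self.obs_dim + delay * self.action_dim + self.action_dim
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(aug_dim,), dtype=np.float32
        )

    def _augment(self, obs, disturbance):
        window = np.concatenate(list(self.action_buffer))
        return np.concatenate([np.ravel(obs), window, disturbance]).astype(np.float32)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.action_buffer.clear()
        for _ in range(self.delay):
            self.action_buffer.append(np.zeros(self.action_dim, dtype=np.float32))
        return self._augment(obs, np.zeros(self.action_dim, dtype=np.float32)), info

    def step(self, action):
        action = np.asarray(action, dtype=np.float32).ravel()
        # The oldest buffered action is the one actually applied, so a newly
        # committed action only reaches the system `delay` steps later.
        applied = self.action_buffer.popleft()
        self.action_buffer.append(action)
        obs, reward, terminated, truncated, info = self.env.step(applied)
        disturbance = np.asarray(self.estimate_disturbance(obs, applied), dtype=np.float32)
        return self._augment(obs, disturbance), reward, terminated, truncated, info
```

Under these assumptions, a simulated reaching environment would be wrapped before training with an on-policy algorithm (e.g. PPO), and the same wrapper applied on the physical robot so the policy sees the same augmented observation at deployment.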