DiAReL: Reinforcement Learning with Disturbance Awareness for Robust Sim2Real Policy Transfer in Robot Control (2306.09010v1)

Published 15 Jun 2023 in cs.RO, cs.LG, cs.SY, and eess.SY

Abstract: Delayed Markov decision processes fulfill the Markov property by augmenting the agent's state space with a finite time window of recently committed actions. Relying on these state augmentations, delay-resolved reinforcement learning algorithms train policies to interact optimally with environments that feature observation or action delays. Although such methods can be trained directly on real robots, sample inefficiency, limited resources, or safety constraints make it common to transfer models trained in simulation to the physical robot. However, robotic simulations rely on approximated models of the physical systems, which hinders sim2real transfer. In this work, we treat various uncertainties in the modelling of the robot's dynamics as unknown intrinsic disturbances applied to the system input. We introduce a disturbance-augmented Markov decision process in delayed settings as a novel representation that incorporates disturbance estimation into the training of on-policy reinforcement learning algorithms. The proposed method is validated across several metrics on a robotic reaching task and compared with disturbance-unaware baselines. The results show that the disturbance-augmented models achieve better stabilization and robustness in the control response, which in turn improves the prospects of successful sim2real transfer.
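
The state-augmentation idea behind a delayed Markov decision process can be sketched as a simple environment wrapper: the observation is extended with the window of committed-but-not-yet-executed actions, so the augmented state remains Markovian despite the action delay. The sketch below is an illustrative reconstruction, not the authors' code; the wrapper name, environment, and delay length are assumptions, and it is written against the Gymnasium API rather than the original OpenAI Gym.

```python
# Minimal sketch of delay-resolved state augmentation (illustrative, not the paper's code).
from collections import deque

import gymnasium as gym
import numpy as np


class ActionDelayAugmentationWrapper(gym.Wrapper):
    """Delays actions by `delay` steps and appends the pending actions to the
    observation, so the augmented state keeps the process Markovian."""

    def __init__(self, env: gym.Env, delay: int = 2):
        super().__init__(env)
        self.delay = delay
        low = np.concatenate(
            [env.observation_space.low, np.tile(env.action_space.low, delay)]
        )
        high = np.concatenate(
            [env.observation_space.high, np.tile(env.action_space.high, delay)]
        )
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)
        self._pending = deque(maxlen=delay)  # actions committed but not yet executed

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._pending.clear()
        for _ in range(self.delay):  # initialise the window with "no-op" actions
            self._pending.append(
                np.zeros(self.env.action_space.shape, dtype=np.float32)
            )
        return self._augment(obs), info

    def step(self, action):
        # The action chosen now reaches the plant only after `delay` steps.
        delayed_action = self._pending.popleft()
        self._pending.append(np.asarray(action, dtype=np.float32))
        obs, reward, terminated, truncated, info = self.env.step(delayed_action)
        return self._augment(obs), reward, terminated, truncated, info

    def _augment(self, obs):
        return np.concatenate([obs, *self._pending]).astype(np.float32)
```

With such a wrapper, an on-policy learner like PPO (as in the Stable-Baselines3 stack used in the paper) can be trained as usual, e.g. `PPO("MlpPolicy", ActionDelayAugmentationWrapper(gym.make("Pendulum-v1"), delay=2))`. The disturbance-augmented variant described in the abstract would further extend the observation with an estimate of the intrinsic input disturbance, which is not reproduced here.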
