
State Representations as Incentives for Reinforcement Learning Agents: A Sim2Real Analysis on Robotic Grasping (2309.11984v3)

Published 21 Sep 2023 in cs.RO and cs.AI

Abstract: Choosing an appropriate representation of the environment for the underlying decision-making process of the reinforcement learning agent is not always straightforward. The state representation should be inclusive enough to allow the agent to informatively decide on its actions and disentangled enough to simplify policy training and the corresponding sim2real transfer. Given this outlook, this work examines the effect of various representations in incentivizing the agent to solve a specific robotic task: antipodal and planar object grasping. A continuum of state representations is defined, starting from hand-crafted numerical states to encoded image-based representations, with decreasing levels of induced task-specific knowledge. The effects of each representation on the ability of the agent to solve the task in simulation and the transferability of the learned policy to the real robot are examined and compared against a model-based approach with complete system knowledge. The results show that reinforcement learning agents using numerical states can perform on par with non-learning baselines. Furthermore, we find that agents using image-based representations from pre-trained environment embedding vectors perform better than end-to-end trained agents, and hypothesize that separation of representation learning from reinforcement learning can benefit sim2real transfer. Finally, we conclude that incentivizing the state representation with task-specific knowledge facilitates faster convergence for agent training and increases success rates in sim2real robot control.
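
To make the abstract's "continuum of state representations" concrete, the sketch below contrasts its two endpoints: a hand-crafted numerical state and an embedding produced by a pre-trained, frozen image encoder, both feeding the same small policy head. This is an illustrative PyTorch sketch under assumed feature choices, dimensions, and names (numerical_state, FrozenImageEncoder, make_policy are hypothetical), not the authors' implementation.

```python
# Minimal sketch (not the paper's code) contrasting two points on the
# "continuum" of state representations: a hand-crafted numerical state
# versus an image embedding from a separately pre-trained, frozen encoder.
# All dimensions, feature choices, and network shapes are illustrative.
import torch
import torch.nn as nn

# (a) Hand-crafted numerical state: e.g. planar gripper pose, object pose,
#     and their relative offset, concatenated into a low-dimensional vector.
def numerical_state(gripper_pose, object_pose):
    # gripper_pose, object_pose: tensors of shape (3,) for planar x, y, yaw
    return torch.cat([gripper_pose, object_pose, object_pose - gripper_pose])

# (b) Encoded image-based state: a convolutional encoder is trained
#     beforehand on environment images and then frozen, so the RL agent
#     only ever sees its embedding vector (representation learning is kept
#     separate from policy training).
class FrozenImageEncoder(nn.Module):
    def __init__(self, embedding_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embedding_dim),
        )
        for p in self.parameters():
            p.requires_grad = False  # encoder weights stay fixed during RL

    def forward(self, image):  # image: (B, 3, H, W)
        return self.net(image)

# Either representation feeds the same policy head, so only the observation
# changes between experiments.
def make_policy(obs_dim, action_dim=4):
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                         nn.Linear(64, action_dim))

if __name__ == "__main__":
    s = numerical_state(torch.zeros(3), torch.tensor([0.3, 0.1, 0.5]))
    print(make_policy(s.numel())(s))        # numerical-state agent
    enc = FrozenImageEncoder()
    z = enc(torch.rand(1, 3, 64, 64))
    print(make_policy(z.shape[-1])(z))      # embedding-based agent
```

The design point the sketch highlights is the one the abstract draws: with a frozen, pre-trained encoder, representation learning is decoupled from reinforcement learning, whereas an end-to-end agent would instead backpropagate the policy loss through the encoder.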

