A comparison of RL-based and PID controllers for 6-DOF swimming robots: hybrid underwater object tracking (2401.16618v1)
Abstract: In this paper, we present an exploration and assessment of a centralized deep Q-network (DQN) controller as a substitute for the prevalent use of PID controllers in 6-DOF swimming robots, illustrated through the specific case of underwater object tracking. DQN offers advantages such as data efficiency and off-policy learning, while remaining simpler to implement than other reinforcement learning methods. Given the absence of a dynamic model for our robot, we propose an RL agent to control this multi-input multi-output (MIMO) system, where a centralized controller may offer more robust control than distinct PIDs. Our approach initially uses classical controllers for safe exploration, then gradually shifts control to the DQN until it takes over the robot entirely. We divide the underwater tracking task into vision and control modules: we use established methods for vision-based tracking and introduce a centralized DQN controller. By transmitting only bounding-box data from the vision module to the control module, we enable adaptation to various target objects and effortless replacement of the vision system. Furthermore, operating on this low-dimensional data makes online learning for the controller computationally cheap. Our experiments, conducted within a Unity-based simulator, validate the effectiveness of a centralized RL agent over separate PID controllers, demonstrating both the applicability of our framework for training the underwater RL agent and improved performance compared to traditional control methods. The code for both real and simulation implementations is at https://github.com/FARAZLOTFI/underwater-object-tracking.
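The staged handover from classical controllers to the DQN described in the abstract can be sketched as follows. This is a minimal illustration only: the linear decay schedule, the PID gains, and all function names here are assumptions for exposition, not the paper's actual implementation (the authors' code is in the linked repository).

```python
import random

class PID:
    """Minimal single-axis PID loop (illustrative gains, not the paper's)."""
    def __init__(self, kp, ki, kd, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err):
        # Standard PID update on a scalar tracking error.
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def handover_prob(step, total_steps=10_000):
    """Probability of deferring to the classical controller.

    Decays linearly from 1 (pure PID, safe exploration) to 0 (pure DQN).
    A linear schedule is an assumption; any monotone decay would fit the
    description in the abstract.
    """
    return max(0.0, 1.0 - step / total_steps)

def select_action(step, bbox_err, pid, dqn_action, rng=random):
    """Safe-exploration blend: early in training the PID acts on the
    bounding-box error; later, the DQN's proposed action is used."""
    if rng.random() < handover_prob(step):
        return pid.step(bbox_err)
    return dqn_action
```

Here `bbox_err` stands in for the low-dimensional signal the vision module provides (e.g., the offset of the detected bounding-box center from the image center), and `dqn_action` would come from the learned Q-network; both names are placeholders for this sketch.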