Diver Interest via Pointing in Three Dimensions: 3D Pointing Reconstruction for Diver-AUV Communication (2310.11536v1)
Abstract: This paper presents Diver Interest via Pointing in Three Dimensions (DIP-3D), a method that enables a scuba diver to indicate an object of interest to an autonomous underwater vehicle (AUV) by pointing, using three-dimensional distance information to discriminate between multiple objects in the AUV's camera image. Traditional dense stereo vision for underwater distance estimation is challenging because scene features lack saliency and lighting conditions are degraded. Yet distance information is necessary for robotic perception of diver pointing when multiple objects appear within the robot's image plane. We circumvent the challenges of underwater distance estimation by performing pose estimation on both the left and right images from the robot's stereo camera and sparsely triangulating the resulting keypoints. The triangulated pose keypoints, combined with a classical object detection method, enable DIP-3D to infer the location of the object of interest when multiple objects are in the AUV's field of view. By allowing the scuba diver to point at an arbitrary object of interest and enabling the AUV to autonomously decide which object the diver is pointing at, this method permits more natural interaction between AUVs and human scuba divers in underwater human-robot collaborative tasks.
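The geometric core described in the abstract can be illustrated with a minimal sketch: triangulate the diver's arm keypoints from the stereo pair, form a pointing ray, and pick the candidate object nearest that ray. This assumes a calibrated stereo pair with known 3x4 projection matrices (`P_left`, `P_right`), elbow and wrist keypoints detected independently in each image by a pose estimator, and 3D centroids for candidate objects from a detector; all function and variable names below are illustrative, not taken from the paper's implementation.

```python
# Hedged sketch of pointing-based object selection from stereo pose keypoints.
# Assumes OpenCV-style 3x4 projection matrices and rectified pixel coordinates.
import numpy as np
import cv2


def triangulate_keypoint(P_left, P_right, kp_left, kp_right):
    """Triangulate one (u, v) keypoint pair into a 3D point in the camera frame."""
    pts_l = np.array(kp_left, dtype=np.float64).reshape(2, 1)
    pts_r = np.array(kp_right, dtype=np.float64).reshape(2, 1)
    X_h = cv2.triangulatePoints(P_left, P_right, pts_l, pts_r)  # 4x1 homogeneous
    return (X_h[:3] / X_h[3]).ravel()


def point_to_ray_distance(point, ray_origin, ray_dir):
    """Perpendicular distance from a 3D point to the pointing ray (unit ray_dir)."""
    v = point - ray_origin
    t = max(np.dot(v, ray_dir), 0.0)  # clamp: objects behind the hand score poorly
    return np.linalg.norm(v - t * ray_dir)


def select_pointed_object(P_left, P_right, elbow_lr, wrist_lr, object_centroids_3d):
    """Pick the index of the 3D object centroid closest to the elbow->wrist ray.

    elbow_lr / wrist_lr are ((u, v) in left image, (u, v) in right image) pairs.
    """
    elbow = triangulate_keypoint(P_left, P_right, *elbow_lr)
    wrist = triangulate_keypoint(P_left, P_right, *wrist_lr)
    ray_dir = wrist - elbow
    ray_dir /= np.linalg.norm(ray_dir)
    dists = [point_to_ray_distance(c, wrist, ray_dir) for c in object_centroids_3d]
    return int(np.argmin(dists))


# Example usage (hypothetical values):
# idx = select_pointed_object(P_left, P_right,
#                             ((410, 250), (395, 250)),   # elbow in left/right images
#                             ((470, 240), (452, 240)),   # wrist in left/right images
#                             [np.array([0.5, 0.1, 2.0]),
#                              np.array([-0.3, 0.0, 3.1])])
```

Under these assumptions, disambiguation reduces to a nearest-ray test: sparse triangulation of a handful of pose keypoints sidesteps dense stereo matching, which the abstract notes is unreliable underwater, while still recovering the 3D pointing direction needed to separate objects that overlap in the image plane.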