A Multimodal Handover Failure Detection Dataset and Baselines

Published 28 Feb 2024 in cs.RO and cs.CV (arXiv:2402.18319v1)

Abstract: An object handover between a robot and a human is a coordinated action which is prone to failure for reasons such as miscommunication, incorrect actions and unexpected object properties. Existing works on handover failure detection and prevention focus on preventing failures due to object slip or external disturbances. However, there is a lack of datasets and evaluation methods that consider unpreventable failures caused by the human participant. To address this deficit, we present the multimodal Handover Failure Detection dataset, which consists of failures induced by the human participant, such as ignoring the robot or not releasing the object. We also present two baseline methods for handover failure detection: (i) a video classification method using 3D CNNs and (ii) a temporal action segmentation approach which jointly classifies the human action, robot action and overall outcome of the action. The results show that video is an important modality, but using force-torque data and gripper position help improve failure detection and action segmentation accuracy.


Summary

  • The paper introduces a novel multimodal dataset capturing human-induced handover failures using video, force-torque, and joint sensors.
  • The paper presents baseline methods, a modified I3D for video classification and MS-TCNs for temporal action segmentation, to enhance detection accuracy.
  • The paper demonstrates that integrating multimodal data, especially force-torque signals, significantly improves failure detection in robotic handovers.

Introduction

The paper "A Multimodal Handover Failure Detection Dataset and Baselines" presents a dataset and evaluation framework focused on the detection of failures in robot-to-human (R2H) and human-to-robot (H2R) handovers. Unlike previous works concentrating on preventable failures such as object slips or disturbances, this research addresses failures induced by human participants, providing essential data for realistic benchmarking in human-robot interaction (HRI) scenarios.

Dataset Overview

The Handover Failure Detection (HFD) dataset is a novel contribution that includes failure modes caused by the human participant, such as ignoring the robot or failing to release the object during a handover. The dataset comprises multimodal data: video captured from the robots, force-torque (F-T) readings, and robot joint states (Figure 1).

Figure 1: The dataset consists of video, force-torque readings and robot joint states, and contains annotations for the robot and human actions.

The data collection involved 17 participants interacting with two robotic platforms under controlled conditions to simulate diverse handover scenarios. Annotations include both successful and failed trials, categorized into different failure modes as depicted in Table 1 from the paper.
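To make the multimodal composition concrete, the sketch below shows how one annotated trial might be organized in code. All field names, dimensions, and label values here are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical structure for a single trial in a multimodal handover dataset.
# Field names and sizes are placeholders chosen for illustration only.
sample = {
    "video": [[0.0] * 224 for _ in range(90)],        # per-frame visual features
    "force_torque": [[0.0] * 6 for _ in range(300)],  # 6-axis F-T readings (Fx..Mz)
    "joint_states": [[0.0] * 7 for _ in range(300)],  # robot joint positions
    "labels": {
        "outcome": "failure",                    # overall trial outcome
        "failure_mode": "did_not_release",       # e.g. ignoring robot, not releasing object
        "human_actions": ["approach", "grasp", "hold"],      # per-segment annotations
        "robot_actions": ["reach", "open_gripper", "retreat"],
    },
}

# The annotations cover both the human's and the robot's actions, which is what
# lets a model reason about the joint outcome of the handover.
assert len(sample["force_torque"][0]) == 6
```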

Baseline Methods

The paper introduces two baseline methods for detecting handover failures:

  1. Video Classification: Leveraging 3D CNNs, specifically modified variants of the Inflated 3D ConvNet (I3D), to classify the outcome of handovers using different modalities such as RGB video, F-T, and gripper position.
  2. Temporal Action Segmentation: Utilizing Multi-Stage Temporal Convolutional Networks (MS-TCNs) to jointly classify the human action, robot action, and overall outcome frame by frame, so that unexpected or unperformed actions can be detected as indicators of failure (Figure 2).

    Figure 2: Illustration of network architectures for video classification and temporal action segmentation used in the study.
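The core mechanism behind MS-TCN-style segmentation is the dilated temporal convolution, which grows the receptive field without losing per-frame resolution. The pure-Python sketch below illustrates that mechanism with fixed placeholder weights; the paper's models are learned, multi-stage, and multi-channel.

```python
# Minimal sketch of the dilated temporal convolution at the heart of MS-TCN-style
# action segmentation. Weights are fixed placeholders, not learned parameters.

def dilated_conv1d(seq, weights, dilation):
    """1-D dilated convolution with zero padding so output length == input length."""
    k = len(weights)
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t + (i - k // 2) * dilation
            if 0 <= j < len(seq):
                acc += w * seq[j]
        out.append(acc)
    return out

# Stacking layers with doubling dilation (1, 2, 4, ...) widens the temporal
# receptive field while keeping one output per input frame, so per-frame
# action labels stay time-aligned with the sensor streams.
signal = [0.0] * 8 + [1.0] * 8                              # toy 1-channel sequence
layer1 = dilated_conv1d(signal, [0.25, 0.5, 0.25], dilation=1)
layer2 = dilated_conv1d(layer1, [0.25, 0.5, 0.25], dilation=2)
assert len(layer2) == len(signal)                           # frame resolution preserved
```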

Experimental Results

The experiments demonstrate that incorporating multimodal data improves failure detection accuracy, with F-T data proving particularly valuable. The I3D-D model, which performs intermediate fusion of the modalities, achieved the best classification performance, and the MSTCN-A model likewise performed best on action segmentation, underscoring the value of integrating visual and physical signals for accurate failure detection.
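The intermediate-fusion idea can be sketched as follows: each modality is encoded separately and the resulting feature vectors are concatenated before a shared classification head. The encoders below are stand-in mean-pools with made-up feature sizes, not the paper's networks.

```python
# Hedged sketch of intermediate fusion for multimodal failure classification.
# Encoders here are stand-in average pools; a real model (e.g. an I3D variant)
# would use learned networks per modality.

def encode(frames):
    """Stand-in encoder: mean-pool each feature dimension over time."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]

video_feats = encode([[0.2, 0.4]] * 30)      # pooled visual features (placeholder size)
ft_feats    = encode([[1.0] * 6] * 100)      # pooled 6-axis force-torque features
grip_feats  = encode([[0.05]] * 100)         # pooled gripper-position feature

# Intermediate fusion: concatenate per-modality features into one vector that a
# classifier head would map to success / failure-mode labels.
fused = video_feats + ft_feats + grip_feats
assert len(fused) == 2 + 6 + 1
```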

Implications and Future Work

This dataset and the accompanying baselines form a crucial foundation for advancing research in HRI safety and reliability. The inclusion of human-induced failure scenarios in benchmarking aligns closely with real-world applications, particularly in sensitive fields such as healthcare. Future research could explore real-time failure detection, enabling robots to react immediately and appropriately to failure conditions. The development of causal temporal prediction models would be an advantageous expansion, further increasing the applicability of these findings in dynamic, interactive environments.

Conclusion

The research provides a structured approach to handling the complexities of multimodal failure detection in robotic handovers. The HFD dataset enriches the resources available for evaluating robotic systems in realistic conditions, emphasizing the critical role of multimodal integration in failure detection. As robotics continues to evolve, methodologies developed through this work will likely inform safer and more effective human-robot collaborations.
