
Improving Visual Perception of a Social Robot for Controlled and In-the-wild Human-robot Interaction (2403.01766v2)

Published 4 Mar 2024 in cs.RO and cs.CV

Abstract: Social robots often rely on visual perception to understand their users and the environment. Recent advancements in data-driven approaches for computer vision have demonstrated great potential for applying deep-learning models to enhance a social robot's visual perception. However, the high computational demands of deep-learning methods, as opposed to the more resource-efficient shallow-learning models, raise important questions regarding their effects on real-world interaction and user experience. It is unclear how the objective interaction performance and subjective user experience will be influenced when a social robot adopts a deep-learning-based visual perception model. We employed state-of-the-art human perception and tracking models to improve the visual perception function of the Pepper robot and conducted a controlled lab study and an in-the-wild human-robot interaction study to evaluate this novel perception function for following a specific user with other people present in the scene.

Authors
  1. Wangjie Zhong
  2. Leimin Tian
  3. Duy Tho Le
  4. Hamid Rezatofighi
