Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors (2308.11020v2)

Published 21 Aug 2023 in cs.CL, cs.HC, and cs.RO

Abstract: This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors. Our main focus is on assessing the human-likeness of the robot as the primary evaluation metric. Whereas previous research often relied on subjective evaluations from users, our approach evaluates the robot's human-likeness indirectly, from observable user behaviors, thus enhancing objectivity and reproducibility. To begin, we created an annotated dataset of human-likeness scores, utilizing user behaviors found in an attentive listening dialogue corpus. We then analyzed the correlation between multimodal user behaviors and human-likeness scores, demonstrating the feasibility of our proposed behavior-based evaluation method.
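
The correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation measure or which behavior features the authors used, so this example assumes Pearson's r, and the feature names and all numeric values are hypothetical toy data, not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: one value per dialogue session.
# Neither the feature names nor the numbers come from the paper.
human_likeness_scores = [2.0, 3.5, 4.0, 5.5, 6.0]
user_behaviors = {
    "backchannel_count": [3, 5, 6, 9, 10],
    "mean_utterance_seconds": [4.1, 3.8, 5.0, 5.2, 6.3],
}

for name, values in user_behaviors.items():
    r = pearson(values, human_likeness_scores)
    print(f"{name}: r = {r:.3f}")
```

A behavior feature whose coefficient is far from zero on real annotated data would be a candidate signal for behavior-based evaluation; a rank-based measure such as Spearman's could be swapped in if the scores are treated as ordinal.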
