An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue Systems (2401.04867v2)

Published 10 Jan 2024 in cs.CL, cs.AI, and cs.HC

Abstract: Establishing evaluation schemes for spoken dialogue systems is important but challenging. While subjective evaluations are commonly used in user experiments, objective evaluations are necessary for research comparison and reproducibility. To address this issue, we propose a framework for indirectly but objectively evaluating systems based on users' behaviors. To this end, we investigate the relationship between user behaviors and subjective evaluation scores in three social dialogue tasks: attentive listening, job interview, and first-meeting conversation. The results reveal that in dialogue tasks where user utterances are primary, such as attentive listening and job interview, indicators like the number of utterances and words play a significant role in evaluation. Observing disfluency can also indicate the effectiveness of formal tasks, such as job interview. On the other hand, in dialogue tasks with high interactivity, such as first-meeting conversation, behaviors related to turn-taking, like average switch pause length, become more important. These findings suggest that selecting appropriate user behaviors can provide valuable insights for objective evaluation in each social dialogue task.
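
As a rough illustration of the behavioral indicators named in the abstract, the sketch below computes the number of user utterances, the number of user words, and an average switch pause length from a timestamped transcript. The `Utterance` record, the `user_behavior_features` function, the whitespace word count, and the definition of a switch pause as the gap between the end of a system utterance and the start of the next user utterance (with overlaps counted as zero) are all assumptions made for this example; the paper's own feature definitions may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Utterance:
    """Hypothetical transcript record; not a format defined by the paper."""
    speaker: str  # "user" or "system"
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def user_behavior_features(utterances: List[Utterance]) -> Dict[str, float]:
    """Compute three simple user-behavior indicators from one dialogue."""
    ordered = sorted(utterances, key=lambda u: u.start)
    user_utts = [u for u in ordered if u.speaker == "user"]

    # Word count by whitespace splitting (an assumption; real systems
    # would likely use a language-specific tokenizer).
    num_words = sum(len(u.text.split()) for u in user_utts)

    # Assumed definition: a switch pause is the silence between the end of
    # a system utterance and the start of the user utterance that follows.
    # Overlapping speech (negative gap) is counted as zero here.
    pauses = [
        max(0.0, cur.start - prev.end)
        for prev, cur in zip(ordered, ordered[1:])
        if prev.speaker == "system" and cur.speaker == "user"
    ]
    avg_pause = sum(pauses) / len(pauses) if pauses else 0.0

    return {
        "num_user_utterances": len(user_utts),
        "num_user_words": num_words,
        "avg_switch_pause": avg_pause,
    }


if __name__ == "__main__":
    dialogue = [
        Utterance("system", 0.0, 2.0, "How was your day?"),
        Utterance("user", 2.5, 4.0, "It was good thanks"),
        Utterance("system", 4.2, 5.0, "I see"),
        Utterance("user", 6.0, 7.0, "Yes really"),
    ]
    print(user_behavior_features(dialogue))
```

Features like these could then be compared against subjective evaluation scores per dialogue; which indicators matter would depend on the task, as the abstract notes.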
