
Learning from Implicit User Feedback, Emotions and Demographic Information in Task-Oriented and Document-Grounded Dialogues (2401.09248v2)

Published 17 Jan 2024 in cs.CL and cs.HC

Abstract: Implicit user feedback, user emotions and demographic information have been shown to be promising sources for improving the accuracy and user engagement of responses generated by dialogue systems. However, the influence of such information on task completion and factual consistency, which are important criteria for task-oriented and document-grounded dialogues, is not yet known. To address this, we introduce FEDI, the first English task-oriented and document-grounded dialogue dataset annotated with this information. Our experiments with Flan-T5, GPT-2 and Llama 2 show a particularly positive impact on task completion and factual consistency. Participants in our human evaluation reported that the responses generated by the feedback-trained models were more informative (Flan-T5 and GPT-2), relevant, and factually consistent (Llama 2).


Summary

  • The paper shows that combining implicit user feedback, user emotions, and demographic information improves task completion and factual consistency in dialogue systems.
  • It fine-tunes Flan-T5, GPT-2, and Llama 2 to validate these gains, confirmed by both automatic metrics and human evaluation.
  • A mixed methodology of synthetic generation (GPT-3.5-Turbo) and human-collected test dialogues underpins the dataset, paving the way for future multimodal dialogue research.

Introduction to FEDI

The development of dialogue systems often focuses on task completion and factual accuracy, yet success in human-computer interaction also hinges on user acceptance and enjoyment. FEDI (Feedback, Emotions, and Demographic Information) is a new dataset for enriching task-oriented, document-grounded dialogue systems. By integrating implicit user feedback with emotional and demographic annotations, FEDI aims to improve both user engagement and the reliability of system responses.

Innovation in Dialogue Systems

Current natural language processing research tends to treat demographic information, emotional states, and implicit feedback in isolation when training dialogue systems, overlooking how these aspects interact. FEDI brings them together: it is the first English task-oriented, document-grounded dialogue dataset annotated with all three. It comprises 8,000 dialogues, 6,000 of which carry annotations for implicit user feedback. Experiments with Flan-T5, GPT-2, and Llama 2 show that this combination of information improves task completion and the factual consistency of generated responses.
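As a concrete illustration, the annotation layers described above (implicit feedback, per-turn emotions, and user demographics) might be modeled roughly as follows. This is a minimal sketch: the class names, field names, and label values are hypothetical and do not reflect FEDI's actual schema or label set.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Turn:
    speaker: str                      # "user" or "system"
    text: str
    emotion: Optional[str] = None     # hypothetical emotion label, e.g. "anger"
    feedback: Optional[str] = None    # hypothetical implicit-feedback label, e.g. "correction"

@dataclass
class Dialogue:
    dialogue_id: str
    task: str                         # e.g. "parcel shipping"
    demographics: dict                # e.g. {"age_range": "25-34", "occupation": "nurse"}
    turns: List[Turn] = field(default_factory=list)

def feedback_dialogues(corpus: List[Dialogue]) -> List[Dialogue]:
    """Keep only dialogues with at least one implicit-feedback annotation,
    mirroring the 6,000-of-8,000 split described above."""
    return [d for d in corpus if any(t.feedback for t in d.turns)]
```

A filter like `feedback_dialogues` would let experiments compare models trained with and without the feedback-annotated subset.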

FEDI's Dataset Creation

FEDI's generation process uses GPT-3.5-Turbo to create the training and validation dialogues at scale. To address the limitations of synthetic data, human annotators assessed the quality of the generated dialogues and collected a separate set of test dialogues. This mixed methodology strengthens the robustness of the dataset. FEDI also covers feedback scenarios across a variety of task contexts, in contrast to the narrower focus of preceding work.
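A synthetic-generation pipeline of this kind typically assembles the task, persona, and emotion constraints into a prompt before calling the chat model. The sketch below shows one plausible way to compose such a prompt; the wording, field names, and constraint format are purely illustrative and are not the paper's actual prompts (the API call itself is omitted).

```python
from typing import Dict, List

def build_generation_prompt(task: str,
                            persona: Dict[str, str],
                            emotion_sequence: List[str]) -> str:
    """Compose a dialogue-generation prompt for a chat model such as GPT-3.5-Turbo.

    All wording here is a hypothetical reconstruction, not FEDI's real prompt.
    """
    lines = [
        f"Generate a task-oriented, document-grounded dialogue for the task: {task}.",
        "The user has the following demographic profile:",
    ]
    for key, value in persona.items():
        lines.append(f"- {key}: {value}")
    lines.append("The user's emotions over the dialogue should progress as: "
                 + ", ".join(emotion_sequence) + ".")
    lines.append("Annotate each user turn with its emotion, and mark any turn "
                 "that gives implicit feedback on a system error.")
    return "\n".join(lines)
```

The resulting string would be sent as the user message of a chat-completion request; varying the persona and emotion sequence across requests yields the diversity a synthetic corpus needs.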

Insights and Future Directions

Through its experiments, the paper shows that combining demographic information, user emotions, and implicit feedback can indeed improve task completion and the quality of generated responses. Human evaluations further confirm that the feedback-trained models produce more informative, relevant, and factually consistent responses, a testament to the potential of multi-faceted, human-centric datasets like FEDI. The authors also outline future directions: improving FEDI's annotation quality, generalizing across tasks and domains, and extending the dataset to multimodal signals such as vision and audio to support more empathetic, intuitive responses.

The incorporation of complex human signals such as emotions and implicit feedback into dialogue systems marks a notable step forward. As FEDI's results suggest, future dialogue systems that understand not just users' words but also their feelings and backgrounds can turn interactions from merely transactional into genuinely engaging experiences.
