
Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommender Systems (2404.11773v2)

Published 17 Apr 2024 in cs.IR and cs.AI

Abstract: Large language models (LLMs) have demonstrated great potential in Conversational Recommender Systems (CRS). However, applying LLMs to CRS has exposed a notable behavioral discrepancy between LLM-based CRS and human recommenders: LLMs often appear inflexible and passive, frequently rushing to complete the recommendation task without sufficient inquiry. This discrepancy can lead to less accurate recommendations and lower user satisfaction. Despite its importance, existing CRS studies lack a way to measure such behavior discrepancy. To fill this gap, we propose Behavior Alignment, a new evaluation metric that measures how consistent the recommendation strategies of an LLM-based CRS are with those of human recommenders. Our experimental results show that the new metric is better aligned with human preferences and differentiates system performance better than existing evaluation metrics. Because Behavior Alignment requires explicit and costly human annotations of the recommendation strategies, we also propose a classification-based method that implicitly measures Behavior Alignment from the system's responses. The evaluation results confirm the robustness of this method.
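The abstract does not state the metric's formula. As a rough illustration only, the sketch below assumes that each dialogue turn has been annotated with a recommendation-strategy label for both the LLM-based CRS and the human recommender, and scores alignment as the fraction of turns whose labels agree. The function name `behavior_alignment` and the strategy labels are hypothetical; the paper's actual metric may weight strategies or turns differently.

```python
from typing import Sequence


def behavior_alignment(system: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of turn-aligned strategy labels that agree.

    Hypothetical simplification for illustration; not the paper's definition.
    """
    if len(system) != len(human):
        raise ValueError("strategy sequences must be turn-aligned")
    if not system:
        raise ValueError("need at least one annotated turn")
    # Count turns where the system chose the same strategy as the human.
    matches = sum(s == h for s, h in zip(system, human))
    return matches / len(system)


# Hypothetical labels: the human asks about preferences first,
# while the LLM jumps straight to recommending.
human = ["ask_preference", "ask_preference", "recommend", "explain"]
llm = ["recommend", "recommend", "recommend", "explain"]
print(behavior_alignment(llm, human))  # 0.5
```

In this toy example the LLM recommends at turns where the human is still asking about preferences, which is the kind of premature recommendation the abstract describes, so only two of four turns agree.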

Authors (3)
  1. Dayu Yang (8 papers)
  2. Fumian Chen (5 papers)
  3. Hui Fang (48 papers)
Citations (3)