Large Language Models as Conversational Movie Recommenders: A User Study (2404.19093v1)

Published 29 Apr 2024 in cs.IR, cs.AI, and cs.HC

Abstract: This paper explores the effectiveness of using LLMs for personalized movie recommendations from users' perspectives in an online field experiment. Our study involves a combination of between-subject prompt and historic consumption assessments, along with within-subject recommendation scenario evaluations. By examining conversation and survey response data from 160 active users, we find that LLMs offer strong recommendation explainability but lack overall personalization, diversity, and user trust. Our results also indicate that different personalized prompting techniques do not significantly affect user-perceived recommendation quality, but the number of movies a user has watched plays a more significant role. Furthermore, LLMs show a greater ability to recommend lesser-known or niche movies. Through qualitative analysis, we identify key conversational patterns linked to positive and negative user interaction experiences and conclude that providing personal context and examples is crucial for obtaining high-quality recommendations from LLMs.
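The abstract's central practical takeaway is that personal context and examples are crucial for obtaining high-quality recommendations from LLMs. A minimal sketch of what that contrast might look like in practice is below; the prompt wording, function names, and example inputs are illustrative assumptions, not the prompts actually used in the study.

```python
# Hypothetical sketch contrasting a zero-shot recommendation request with one
# enriched by personal context and examples, in the spirit of the study's
# finding. The wording is an assumption, not the paper's actual prompts.

def zero_shot_prompt(n=5):
    """Baseline: ask for recommendations with no personalization."""
    return f"Recommend {n} movies I might enjoy, with a one-line reason for each."

def personalized_prompt(watched, liked_genres, n=5):
    """Personalized: fold viewing history and genre preferences into the prompt."""
    history = "; ".join(watched)
    genres = ", ".join(liked_genres)
    return (
        f"I have watched: {history}. "
        f"My favorite genres are: {genres}. "
        f"Recommend {n} movies I have not seen, with a one-line reason for each, "
        f"and prefer lesser-known titles over blockbusters."
    )

if __name__ == "__main__":
    prompt = personalized_prompt(
        watched=["Spirited Away", "Paprika"],
        liked_genres=["animation", "surreal drama"],
    )
    print(prompt)
```

Either string would then be sent to an LLM chat endpoint as the user message; the study's results suggest the second form, which supplies concrete watched titles as examples, is the one likely to yield relevant and explainable recommendations.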

Authors (4)
  1. Ruixuan Sun (7 papers)
  2. Xinyi Li (97 papers)
  3. Avinash Akella (3 papers)
  4. Joseph A. Konstan (11 papers)
Citations (3)