
What Are We Optimizing For? A Human-centric Evaluation of Deep Learning-based Movie Recommenders (2401.11632v2)

Published 21 Jan 2024 in cs.IR, cs.HC, and cs.LG

Abstract: In the past decade, deep learning (DL) models have gained prominence for their exceptional accuracy on benchmark datasets in recommender systems (RecSys). However, their evaluation has primarily relied on offline metrics, overlooking direct user perception and experience. To address this gap, we conduct a human-centric evaluation case study of four leading DL-RecSys models in the movie domain. We test how different DL-RecSys models perform in personalized recommendation generation by conducting a survey study with 445 real active users. We find some DL-RecSys models to be superior in recommending novel and unexpected items and weaker in diversity, trustworthiness, transparency, accuracy, and overall user satisfaction compared to classic collaborative filtering (CF) methods. To further explain the reasons behind the underperformance, we apply a comprehensive path analysis. We discover that the lack of diversity and too much serendipity from DL models can negatively impact the consequent perceived transparency and personalization of recommendations. Such a path ultimately leads to lower summative user satisfaction. Qualitatively, we confirm with real user quotes that accuracy plus at least one other attribute is necessary to ensure a good user experience, while their demands for transparency and trust cannot be neglected. Based on our findings, we discuss future human-centric DL-RecSys design and optimization strategies.

