Evaluation of Session-based Recommendation Algorithms (1803.09587v2)

Published 26 Mar 2018 in cs.IR, cs.AI, cs.LG, and cs.NE

Abstract: Recommender systems help users find relevant items of interest, for example on e-commerce or media streaming sites. Most academic research is concerned with approaches that personalize the recommendations according to long-term user profiles. In many real-world applications, however, such long-term profiles often do not exist and recommendations therefore have to be made solely based on the observed behavior of a user during an ongoing session. Given the high practical relevance of the problem, an increased interest in this problem can be observed in recent years, leading to a number of proposals for session-based recommendation algorithms that typically aim to predict the user's immediate next actions. In this work, we present the results of an in-depth performance comparison of a number of such algorithms, using a variety of datasets and evaluation measures. Our comparison includes the most recent approaches based on recurrent neural networks like GRU4REC, factorized Markov model approaches such as FISM or FOSSIL, as well as simpler methods based, e.g., on nearest neighbor schemes. Our experiments reveal that algorithms of this latter class, despite their sometimes almost trivial nature, often perform equally well or significantly better than today's more complex approaches based on deep neural networks. Our results therefore suggest that there is substantial room for improvement regarding the development of more sophisticated session-based recommendation algorithms.


Authors (2)

Malte Ludewig
Dietmar Jannach

Citations (280)


Summary

The paper "Evaluation of Session-based Recommendation Algorithms" provides a comprehensive comparative analysis of algorithms for session-based recommendation. It addresses the setting in which no long-term user profiles are available, so recommendations must be based solely on a user's actions in the ongoing session. The study compares multiple algorithmic techniques on a diverse set of datasets from domains such as e-commerce, music, and news.

Methodologies Compared

The paper evaluates several classes of algorithms:

Baseline Methods: Simple association rules (ar), Markov chains (mc), sequential rules (sr), and matrix factorization optimized with Bayesian Personalized Ranking (bpr-mf) serve as baselines. These methods have low computational complexity and can be trained quickly; the rule-based ones derive item-to-item patterns from co-occurrences in past sessions.
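Nearest Neighbors: Both item-based (iknn) and session-based (sknn, v-sknn, s-sknn, sf-sknn) nearest neighbor methods are considered. iknn recommends items that are similar to the last item of the current session, where similarity is derived from co-occurrence in past sessions. The session-based methods instead find past sessions that are similar to the current one and recommend items from these neighbor sessions; the variants v-sknn, s-sknn, and sf-sknn additionally take the order of items in the current session into account.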
Neural Networks: The authors investigate recurrent neural networks (RNNs), focusing on gru4rec, a model built on Gated Recurrent Units (GRUs) and trained with session-parallel mini-batches.
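Factorization-Based Models: These methods build on matrix factorization. They include Factorized Personalized Markov Chains (fpmc), which combine matrix factorization with factorized first-order Markov chains; Factored Item Similarity Models (fism), which model item-to-item similarity as the product of low-dimensional item embeddings; and fossil, which combines fism-style similarity with factorized Markov chains to capture sequential patterns. In addition, the paper introduces a new method, Session-based Matrix Factorization (smf), which adapts factorization to the session setting by embedding the preferences expressed in the current session.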
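The rule-based baselines can be illustrated with a small counting sketch. It is a hypothetical illustration, not the authors' code: ar counts co-occurrences of two items anywhere in a session, mc counts only directly consecutive pairs, and sr counts pairs in which one item follows the other, weighted by 1/gap. The 1/gap weight, the optional max_gap window, and the function names are assumptions made for this example. Recommendations are read off the rules triggered by the last item of the current session.

```python
from collections import defaultdict

def train_rules(sessions, mode="sr", max_gap=None):
    """Count item-to-item rules p -> q from training sessions (illustrative sketch)."""
    rules = defaultdict(lambda: defaultdict(float))
    for s in sessions:
        for i, p in enumerate(s):
            for j, q in enumerate(s):
                if i == j or p == q:
                    continue
                gap = j - i
                if mode == "ar":
                    # association rules: co-occurrence in either order
                    rules[p][q] += 1.0
                elif mode == "mc" and gap == 1:
                    # Markov chain: only directly consecutive items
                    rules[p][q] += 1.0
                elif mode == "sr" and gap > 0 and (max_gap is None or gap <= max_gap):
                    # sequential rules: q follows p, weighted by 1/gap (assumed weighting)
                    rules[p][q] += 1.0 / gap
    return rules

def recommend(rules, session, n=3):
    """Rank items by the rule weights triggered by the last item of the session."""
    last = session[-1]
    ranked = sorted(rules[last].items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

train = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"]]
sr_rules = train_rules(train, "sr")
print(recommend(sr_rules, ["x", "a"]))  # ['c', 'b']
```

The three baselines differ only in which item pairs they count and how they weight them, which is why they are cheap to train.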
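A minimal sketch of the basic session-based kNN idea, assuming binary session vectors and cosine similarity: the k past sessions most similar to the current one are selected, and every item they contain receives the sum of the similarities of the neighbor sessions in which it appears. The function name, the value of k, and the choice of cosine similarity are assumptions. The order-aware variants (v-sknn, s-sknn, sf-sknn) would change how the current session is weighted and are not shown; items already in the current session are also not filtered out here.

```python
import math
from collections import defaultdict

def sknn_scores(current, past_sessions, k=100):
    """Score candidate items for the current session with a basic session-kNN scheme."""
    cur = set(current)
    neighbors = []
    for idx, past in enumerate(past_sessions):
        other = set(past)
        overlap = len(cur & other)
        if overlap:
            # cosine similarity of the two binary item vectors
            neighbors.append((overlap / math.sqrt(len(cur) * len(other)), idx))
    # keep the k most similar past sessions (ties broken by session index)
    neighbors.sort(key=lambda t: (-t[0], t[1]))
    scores = defaultdict(float)
    for sim, idx in neighbors[:k]:
        # each item of a neighbor session collects that session's similarity
        for item in set(past_sessions[idx]):
            scores[item] += sim
    return dict(scores)

past = [["a", "b", "c"], ["a", "d"], ["e", "f"]]
scores = sknn_scores(["a", "b"], past, k=2)
```

Because no model is trained, all work happens when a recommendation is requested, which is why indexing and sampling matter for these methods.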
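The session-parallel mini-batch scheme used to train gru4rec can be sketched as a generator. This is a simplified illustration under assumptions (pure Python, item IDs instead of embeddings, slots dropped once all sessions are used up, hypothetical function name), not the gru4rec implementation. Each slot of the batch follows one session, emits the current item as input and the next item as target, and switches to a new session when its session ends; the reset flag marks where the GRU hidden state would have to be reset.

```python
def session_parallel_batches(sessions, batch_size):
    """Yield (inputs, targets, reset_mask) for one training step at a time."""
    # sessions with fewer than two events yield no (input, target) pair
    sessions = [s for s in sessions if len(s) >= 2]
    n_active = min(batch_size, len(sessions))
    slot_session = list(range(n_active))  # which session each slot is processing
    slot_pos = [0] * n_active             # position of the current input event
    next_session = n_active               # next session waiting for a free slot
    reset = [True] * n_active             # a new session starts with a fresh hidden state
    while slot_session:
        inputs = [sessions[s][p] for s, p in zip(slot_session, slot_pos)]
        targets = [sessions[s][p + 1] for s, p in zip(slot_session, slot_pos)]
        yield inputs, targets, reset
        new_session, new_pos, reset = [], [], []
        for s, p in zip(slot_session, slot_pos):
            if p + 2 < len(sessions[s]):
                # the session still has another (input, target) pair
                new_session.append(s)
                new_pos.append(p + 1)
                reset.append(False)
            elif next_session < len(sessions):
                # the session ended: the slot continues with the next unused session
                new_session.append(next_session)
                new_pos.append(0)
                reset.append(True)
                next_session += 1
            # otherwise no sessions are left and the slot is dropped
        slot_session, slot_pos = new_session, new_pos

batches = list(session_parallel_batches([["a", "b", "c"], ["d", "e"], ["f", "g", "h"]], 2))
```

Arranging sessions side by side like this lets sessions of very different lengths share one batch without padding.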
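To make the factorization idea concrete, here is a sketch of an item-similarity scoring rule of the kind fism is based on, applied to the items of the current session. The similarity between items j and i is the dot product of two low-dimensional embeddings P[j] and Q[i], and a candidate's score is an item bias plus the normalized sum of its similarities to the session items. Training of P, Q, and the biases b is not shown; the toy matrices, the exponent alpha, the normalization by |S|, and the function name are assumptions for illustration.

```python
import numpy as np

def fism_scores(session_items, P, Q, b, alpha=0.5):
    """FISM-style scores for all items given a non-empty current session.

    score(i) = b[i] + |S|^(-alpha) * sum_{j in S, j != i} P[j] . Q[i]
    """
    S = sorted(set(session_items))
    # Q @ (sum of P[j]) gives sum_j P[j] . Q[i] for every candidate item i
    raw = Q @ P[S].sum(axis=0)
    # drop the j == i term for candidates that are themselves in the session
    self_term = np.zeros(Q.shape[0])
    self_term[S] = np.einsum("ij,ij->i", P[S], Q[S])
    return b + len(S) ** (-alpha) * (raw - self_term)

# hypothetical toy embeddings for a catalog of three items
P = np.arange(6, dtype=float).reshape(3, 2)
Q = np.ones((3, 2))
b = np.zeros(3)
scores = fism_scores([0, 1], P, Q, b, alpha=0.0)
```

Summing the session's embeddings once and multiplying by Q scores the whole catalog in one matrix-vector product.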

Key Findings

Prediction Accuracy: While complex models like gru4rec are competitive on next-item prediction, simpler session-based nearest neighbor approaches (especially v-sknn) perform well across the different datasets. These simpler methods often perform as well as or significantly better than the deep learning approach.
Computation and Memory Efficiency: The study reports high computational and memory demands for neural network and factorization-based approaches such as gru4rec and smf. Session-based nearest neighbor methods, in contrast, need no model training and compute neighborhoods only at prediction time; with in-memory indexing and sampling of neighbor sessions, they are efficient enough for practical applications.
Diversity and Popularity Bias: Factorization-based methods typically achieve higher item coverage, meaning that a larger share of the catalog appears in their recommendation lists, which indicates greater diversity. In contrast, gru4rec often recommends from a narrower range of items. Some algorithms, for example bpr-mf, show a strong tendency to recommend popular items.
Temporal Effects: In e-commerce and media consumption, user behavior tends to relate more closely to recent actions than to older ones. The authors therefore suggest that focusing on recent session data can improve accuracy, which supports recency-biased strategies for sampling neighbor sessions.
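The accuracy comparison rests on next-item prediction measures. As a hedged sketch of one common protocol (assumed here, with a hypothetical function name), each test session is revealed one event at a time, the recommender returns a ranked list for the prefix, and the hit rate (HR@n) and mean reciprocal rank (MRR@n) check whether, and how high, the actual next item appears in the top n.

```python
def next_item_metrics(test_sessions, recommend, n=20):
    """Hit rate and mean reciprocal rank at n for next-item prediction."""
    hits, reciprocal_ranks, cases = 0, 0.0, 0
    for session in test_sessions:
        # reveal the session one event at a time and predict the following item
        for t in range(1, len(session)):
            prefix, target = session[:t], session[t]
            recs = list(recommend(prefix))[:n]
            cases += 1
            if target in recs:
                hits += 1
                reciprocal_ranks += 1.0 / (recs.index(target) + 1)
    if cases == 0:
        return 0.0, 0.0
    return hits / cases, reciprocal_ranks / cases

# a trivial recommender that always returns the same list, for illustration only
hr, mrr = next_item_metrics([["a", "b", "c"], ["c", "d"]], lambda prefix: ["a", "b", "c"])
```

HR only asks whether the next item is in the list, while MRR also rewards placing it near the top.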
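Coverage and popularity bias can be computed from the recommendation lists themselves. The sketch below assumes that coverage is the share of catalog items appearing in at least one top-n list and that popularity bias is the average popularity of the recommended items, scaled by the count of the most popular training item; the paper's exact normalization may differ, and the function name is hypothetical.

```python
from collections import Counter

def coverage_and_popularity(rec_lists, train_sessions, catalog_size):
    """Catalog coverage and average normalized popularity of top-n lists."""
    # popularity of an item = number of its occurrences in the training sessions
    counts = Counter(item for session in train_sessions for item in session)
    max_count = max(counts.values())
    # coverage: share of the catalog that appears in at least one list
    recommended = {item for recs in rec_lists for item in recs}
    coverage = len(recommended) / catalog_size
    # popularity bias: mean popularity (scaled to [0, 1]) of the items in each list
    per_list = [sum(counts[i] / max_count for i in recs) / len(recs)
                for recs in rec_lists if recs]
    popularity = sum(per_list) / len(per_list)
    return coverage, popularity

cov, pop = coverage_and_popularity(
    [["a", "b"], ["c"]], [["a", "b"], ["a", "c"], ["a"]], catalog_size=4)
```

A method can score well on accuracy while covering few items and favoring popular ones, which is why these measures are reported alongside HR and MRR.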
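Recency-biased neighbor sampling fits naturally with the in-memory indexing mentioned under Computation and Memory Efficiency. A minimal sketch, assuming an inverted index from items to past session IDs and one timestamp per past session: only sessions sharing at least one item with the current session are candidates, and of these only the m most recent are kept for the similarity computation. The function names and the value of m are illustrative assumptions.

```python
from collections import defaultdict

def build_index(past_sessions):
    """In-memory inverted index: item -> IDs of past sessions containing it."""
    index = defaultdict(set)
    for sid, session in enumerate(past_sessions):
        for item in session:
            index[item].add(sid)
    return index

def sample_recent_neighbors(current, index, timestamps, m=500):
    """Return the IDs of the m most recent past sessions sharing an item with current."""
    candidates = set()
    for item in current:
        # .get avoids inserting unknown items into the defaultdict
        candidates |= index.get(item, set())
    # newest sessions first; ties broken by session ID
    return sorted(candidates, key=lambda sid: (-timestamps[sid], sid))[:m]

past = [["a", "b"], ["b", "c"], ["c", "d"], ["a", "e"]]
timestamps = [10, 40, 30, 20]
index = build_index(past)
```

The index avoids scanning all past sessions, and the cap m bounds the work per recommendation while favoring recent behavior.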

Implications and Future Directions

The paper identifies substantial room for improvement in session-based recommendation research. Its findings suggest that simpler models should serve as strong baselines, and that researchers developing more complex models should also consider computational efficiency and coverage.

Future work might explore hybrid approaches that combine long-term user preferences with short-term session dynamics. Integrating external contextual data or user annotations is another promising direction for improving prediction and personalization. Furthermore, algorithms will need to handle continuously changing item catalogs, as in fast-moving domains like news recommendation.

Ultimately, the research stresses the importance of rigorous evaluation protocols and standard benchmark datasets for advancing session-based recommendation. It calls for a balance between innovative model architectures and practical applicability in real-world scenarios.
