A Comprehensive Review on Non-Neural Networks Collaborative Filtering Recommendation Systems
(2106.10679v2)
Published 20 Jun 2021 in cs.IR, cs.AI, and cs.LG
Abstract: Over the past two decades, recommender systems have attracted a lot of interest due to the explosion in the amount of data in online applications. Particular attention has been paid to collaborative filtering, which is the most widely used approach in applications that involve information recommendations. Collaborative filtering (CF) uses the known preferences of a group of users to make predictions and recommendations about the unknown preferences of other users (recommendations are made based on the past behavior of users). First introduced in the 1990s, a wide variety of increasingly successful models have been proposed. Due to the success of machine learning techniques in many areas, there has been a growing emphasis on the application of such algorithms in recommendation systems. In this article, we present an overview of the CF approaches for recommender systems, their two main categories, and their evaluation metrics. We focus on the application of classical Machine Learning algorithms to CF recommender systems by presenting their evolution from their first use-cases to advanced Machine Learning models. We attempt to provide a comprehensive and comparative overview of CF systems (with python implementations) that can serve as a guideline for research and practice in this area.
The paper presents a comprehensive review of non-neural collaborative filtering methods, detailing both memory-based and model-based approaches.
It demonstrates that techniques like ratings normalization and explainability significantly boost prediction accuracy, with EMF achieving the lowest MAE.
The review offers practical Python implementations and comparative experiments on benchmark datasets, guiding both practitioners and researchers.
This paper, "A Comprehensive Review on Non-Neural Networks Collaborative Filtering Recommendation Systems" (Wenga et al., 2021), provides an extensive overview of classical (non-neural network) collaborative filtering (CF) techniques, their evolution, practical implementations, and evaluation. It aims to serve as a guide for practitioners and researchers by detailing various algorithms, offering Python implementations, and comparing their performance on benchmark datasets.
The review begins by introducing collaborative filtering as a process where the preferences of a group of users are used to predict the unknown preferences of others. User preferences, often represented as ratings in a user-item matrix, form the basis for these predictions.
Memory-Based Collaborative Filtering
Memory-based CF algorithms directly use the entire user-item interaction matrix to make predictions. They are primarily categorized into:
User-based CF:
Concept: Identifies users similar to an active user based on their rating patterns. Recommendations are then generated from items liked by these similar users but not yet rated by the active user.
Similarity Computation: Common metrics include:
Pearson Correlation: Measures the linear relationship between the ratings of two users on co-rated items:

$$w_{u,v} = \frac{\sum_{i \in I_{uv}} (R_{u,i} - \bar{R}_u)(R_{v,i} - \bar{R}_v)}{\sqrt{\sum_{i \in I_{uv}} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (R_{v,i} - \bar{R}_v)^2}}$$

where $I_{uv}$ is the set of items rated by both users and $\bar{R}_u$ is the mean rating of user $u$.
Prediction: The predicted rating is the active user's mean plus a similarity-weighted average of the neighbors' deviations from their own means:

$$\hat{R}_{u,i} = \bar{R}_u + \frac{\sum_{v \in N_u} (R_{v,i} - \bar{R}_v)\, w_{u,v}}{\sum_{v \in N_u} |w_{u,v}|}$$

where $N_u$ is the set of neighbors of user $u$ who have rated item $i$.
Top-N Recommendation: Identifies the k most similar users and recommends the N most frequent/highly-rated items among them that the active user hasn't interacted with.
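A minimal NumPy sketch of these two steps, assuming a small dense rating matrix with 0 marking missing entries (the toy data, the neighborhood size k, and the zero-filled centering shortcut are illustrative assumptions, not the paper's reference code):

```python
import numpy as np

# Toy user-item matrix; 0 means "not rated". Purely illustrative.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)

def pearson_sim(R):
    """Approximate Pearson similarity between users: mean-center each user's
    known ratings, treat missing entries as zero deviation, then take cosine."""
    mask = R > 0
    means = R.sum(1) / np.maximum(mask.sum(1), 1)
    C = np.where(mask, R - means[:, None], 0.0)
    norms = np.linalg.norm(C, axis=1) + 1e-9
    sim = C @ C.T / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)          # a user is not their own neighbor
    return sim, means

def predict_user_based(R, u, i, k=2):
    """Weighted deviation-from-mean prediction over the k most similar raters of i."""
    sim, means = pearson_sim(R)
    raters = np.where(R[:, i] > 0)[0]
    nbrs = raters[np.argsort(-np.abs(sim[u, raters]))][:k]
    w = sim[u, nbrs]
    if np.abs(w).sum() == 0:
        return means[u]                 # fall back to the user's mean rating
    return means[u] + w @ (R[nbrs, i] - means[nbrs]) / np.abs(w).sum()

print(round(predict_user_based(R, u=1, i=1), 2))
```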
Item-based CF:
Concept: Identifies items similar to those an active user has liked in the past and recommends those similar items.
Similarity Computation: Similar metrics are used, but applied to item vectors (columns of the user-item matrix).
Adjusted Cosine Similarity: Addresses the issue that different users have different rating scales by subtracting each user's average rating from their ratings before computing cosine similarity between items:

$$W_{i,j} = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2}\,\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}$$

where $U$ is the set of users who rated both items $i$ and $j$.
Prediction: The predicted rating $\hat{R}_{u,i}$ for user $u$ on item $i$ is a weighted average of user $u$'s ratings on items similar to $i$:

$$\hat{R}_{u,i} = \frac{\sum_{j \in S(i)} R_{u,j} \cdot W_{i,j}}{\sum_{j \in S(i)} |W_{i,j}|}$$
where S(i) is the set of items similar to item i that user u has rated.
Top-N Recommendation: For items Iu purchased by user u, candidate items C are formed by taking the union of k most similar items for each item in Iu (excluding items already in Iu). Similarities are aggregated, and items are sorted to get the top-N.
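A corresponding sketch for the item-based variant, using the same toy-matrix conventions as above; the adjusted cosine matrix S can be precomputed once, offline (the helper names are illustrative):

```python
import numpy as np

def adjusted_cosine(R):
    """Item-item similarity: subtract each user's mean rating before the cosine,
    so users with different rating scales become comparable."""
    mask = R > 0
    user_means = R.sum(1) / np.maximum(mask.sum(1), 1)
    C = np.where(mask, R - user_means[:, None], 0.0)   # center by user means
    norms = np.linalg.norm(C, axis=0) + 1e-9
    S = C.T @ C / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)
    return S

def predict_item_based(R, S, u, i, k=2):
    """Weighted average of u's own ratings on the k items most similar to i."""
    rated = np.where(R[u] > 0)[0]                      # items u has already rated
    nbrs = rated[np.argsort(-np.abs(S[i, rated]))][:k]
    w = S[i, nbrs]
    return (R[u, nbrs] @ w) / (np.abs(w).sum() + 1e-9)

# S = adjusted_cosine(R)   # precompute once, offline
# predict_item_based(R, S, u=1, i=1)
```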
Implementation Considerations for Memory-based CF:
Sparsity: A major challenge, as similarity scores can be unreliable or undefined if there are few co-rated items/users. Imputation techniques can be used but might introduce bias.
Scalability: User-based CF can be computationally expensive for large datasets as neighborhood search happens at runtime. Item-based CF often scales better because item-item similarities can be pre-computed offline.
Cold Start: Difficulty in making recommendations for new users or new items with no interaction data.
The paper notes that the authors provide Python/Numpy/Pandas implementations for these models on GitHub.
Model-Based Collaborative Filtering
Model-based CF methods learn a model from the user-item interactions, which is then used for predictions. This review focuses on dimensionality reduction techniques.
Singular Value Decomposition (SVD):
Concept: Decomposes the $m \times n$ rating matrix $R$ as $R = P \Sigma Q^T$, where $P$ and $Q$ are orthogonal matrices representing user and item latent factors, and $\Sigma$ is a diagonal matrix of singular values. Dimensionality is reduced by keeping only the $k$ largest singular values ($\Sigma_k$).
Prediction: $\hat{R}_k = P_k \Sigma_k Q_k^T$. The predicted rating for user $u$ on item $i$ is $\hat{R}_{u,i} = (P_k \sqrt{\Sigma_k})_u \cdot (\sqrt{\Sigma_k}\, Q_k^T)_i$, with the user mean added back if ratings were normalized.
Implementation: Requires imputing missing values in R (e.g., with item means). Normalizing ratings (e.g., by subtracting user means) can improve accuracy.
Algorithm Steps:
1. Normalize rating matrix R→Rnorm.
2. Factor Rnorm to get P,Σ,Q.
3. Reduce Σ to Σk.
4. Compute $P_k \sqrt{\Sigma_k}$ and $\sqrt{\Sigma_k}\, Q_k^T$ for predictions.
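A short NumPy sketch of these four steps, assuming item-mean imputation and user-mean normalization as described above (the toy matrix and the choice k=2 are illustrative):

```python
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)
mask = R > 0

item_means = R.sum(0) / np.maximum(mask.sum(0), 1)
R_filled = np.where(mask, R, item_means[None, :])      # impute missing entries
user_means = R_filled.mean(axis=1, keepdims=True)
R_norm = R_filled - user_means                          # 1. normalize

P, s, Qt = np.linalg.svd(R_norm, full_matrices=False)   # 2. factor
k = 2
root_Sk = np.diag(np.sqrt(s[:k]))                       # 3. keep k largest values
U_k = P[:, :k] @ root_Sk                                # P_k * sqrt(Sigma_k)
V_k = root_Sk @ Qt[:k, :]                               # sqrt(Sigma_k) * Q_k^T

R_hat = user_means + U_k @ V_k                          # 4. predict (add mean back)
print(np.round(R_hat, 2))
```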
Matrix Factorization (MF) / Regularized SVD:
Concept: Directly learns latent factor vectors $P_u \in \mathbb{R}^k$ for each user $u$ and $Q_i \in \mathbb{R}^k$ for each item $i$. The predicted rating is $\hat{R}_{u,i} = Q_i^T P_u$.
Learning: Minimizes a regularized squared error cost function over the known ratings:

$$\min_{P, Q} \sum_{(u,i) \in K} \left(R_{u,i} - Q_i^T P_u\right)^2 + \lambda \left(\lVert P_u \rVert^2 + \lVert Q_i \rVert^2\right)$$

where $K$ is the set of known ratings and $\lambda$ is the regularization parameter.
Optimization: Typically uses Stochastic Gradient Descent (SGD) with update rules:
$$e_{u,i} = R_{u,i} - Q_i^T P_u$$
$$Q_i \leftarrow Q_i + \alpha \left(e_{u,i} P_u - \lambda Q_i\right)$$
$$P_u \leftarrow P_u + \alpha \left(e_{u,i} Q_i - \lambda P_u\right)$$
where α is the learning rate.
Advantage: Handles missing values directly without imputation, often more accurate than traditional SVD.
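The update rules above translate directly into a few lines of NumPy; this is a minimal sketch (hyperparameters and toy data are illustrative assumptions), iterating only over the known ratings:

```python
import numpy as np

def train_mf(R, k=10, alpha=0.01, lam=0.02, epochs=20, seed=0):
    """Regularized MF trained by SGD on observed entries only (no imputation)."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, k))   # user latent factors
    Q = rng.normal(scale=0.1, size=(n, k))   # item latent factors
    known = np.argwhere(R > 0)               # the set K of known ratings
    for _ in range(epochs):
        rng.shuffle(known)
        for u, i in known:
            e = R[u, i] - Q[i] @ P[u]        # e_{u,i}
            Pu, Qi = P[u].copy(), Q[i].copy()
            Q[i] += alpha * (e * Pu - lam * Qi)
            P[u] += alpha * (e * Qi - lam * Pu)
    return P, Q

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4]], dtype=float)
P, Q = train_mf(R, k=2)
print(round(Q[3] @ P[1], 2))   # predicted rating of user 1 for item 3
```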
Probabilistic Matrix Factorization (PMF):
Concept: A probabilistic approach where ratings are assumed to be drawn from a Gaussian distribution with mean $Q_i^T P_u$. Gaussian priors are placed on $P_u$ and $Q_i$.
Learning: Maximizes the log-posterior, which is equivalent to minimizing a sum-of-squared-errors objective similar to MF, but with regularization terms $\lambda_P, \lambda_Q$ derived from the prior variances:

$$E = \frac{1}{2} \sum_{(u,i) \in K} \left(R_{u,i} - Q_i^T P_u\right)^2 + \frac{\lambda_P}{2} \sum_u \lVert P_u \rVert^2 + \frac{\lambda_Q}{2} \sum_i \lVert Q_i \rVert^2$$

Non-negative Matrix Factorization (NMF):
Concept: Constrains the elements of the factor matrices $P$ and $Q$ to be non-negative ($P \geq 0$, $Q \geq 0$), which yields a parts-based representation and more interpretable latent factors.
Interpretation: $P_{u,l}$ can represent the probability that user $u$ belongs to group $l$, and $Q_{i,l}$ the probability that users in group $l$ like item $i$.
Learning: Uses multiplicative update rules that preserve non-negativity while minimizing an objective similar to that of PMF/MF. A common form is

$$P_{u,l} \leftarrow P_{u,l} \frac{\sum_{i \in I_u} Q_{i,l} R_{u,i}}{\sum_{i \in I_u} Q_{i,l} \hat{R}_{u,i}}, \qquad Q_{i,l} \leftarrow Q_{i,l} \frac{\sum_{u \in U_i} P_{u,l} R_{u,i}}{\sum_{u \in U_i} P_{u,l} \hat{R}_{u,i}}$$

where $\hat{R}_{u,i} = P_u Q_i^T$, $I_u$ is the set of items rated by user $u$, and $U_i$ is the set of users who rated item $i$. (The paper's equations 19 and 20 differ slightly from this common form, but likewise place $\hat{R}_{u,i}$ in the denominator.)
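A sketch of these multiplicative updates in vectorized form, restricted to observed entries via a mask (a common weighted-NMF formulation; as noted, the paper's exact update equations differ slightly):

```python
import numpy as np

def train_nmf(R, k=10, epochs=50, eps=1e-9, seed=0):
    """Multiplicative updates over observed ratings; P and Q stay non-negative
    because every factor in the update ratio is non-negative."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    mask = (R > 0).astype(float)             # restrict the loss to known ratings
    P = rng.uniform(0.1, 1.0, size=(m, k))
    Q = rng.uniform(0.1, 1.0, size=(n, k))
    for _ in range(epochs):
        R_hat = mask * (P @ Q.T)             # R^_{u,i} appears in the denominator
        P *= ((mask * R) @ Q) / (R_hat @ Q + eps)
        R_hat = mask * (P @ Q.T)
        Q *= ((mask * R).T @ P) / (R_hat.T @ P + eps)
    return P, Q
```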
Explainable Matrix Factorization (EMF):
Concept: Incorporates neighborhood-based explanations into the MF model to improve accuracy and provide justifications. An item i is explainable for user u if many of u's neighbors rated i.
The third term of the paper's EMF objective (Eq. 25) has the form $\frac{\lambda}{2} \lVert P_u - Q_i \rVert^2 W_{u,i}$, where $W_{u,i}$ is the explainability weight: it pulls the latent factors of user $u$ and item $i$ closer together precisely when item $i$ is highly explainable for user $u$.
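A sketch of both pieces under this interpretation: explainability weights W derived from a simple cosine neighborhood, and MF-style SGD extended with the pull term $\lambda (P_u - Q_i) W_{u,i}$. The neighborhood construction and hyperparameters here are illustrative assumptions, not the paper's exact Eq. 25 setup:

```python
import numpy as np

def explainability_weights(R, k_neighbors=2):
    """W[u, i] = fraction of u's k nearest neighbors who rated item i."""
    mask = R > 0
    C = np.where(mask, R, 0.0)
    norms = np.linalg.norm(C, axis=1) + 1e-9
    sim = C @ C.T / np.outer(norms, norms)
    np.fill_diagonal(sim, -np.inf)           # exclude the user themselves
    W = np.zeros_like(R)
    for u in range(R.shape[0]):
        nbrs = np.argsort(-sim[u])[:k_neighbors]
        W[u] = mask[nbrs].mean(axis=0)
    return W

def train_emf(R, W, k=10, alpha=0.01, beta=0.02, lam=0.1, epochs=20, seed=0):
    """MF-style SGD with an extra term pulling P_u toward Q_i when W[u, i] is high."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    for _ in range(epochs):
        for u, i in np.argwhere(R > 0):
            e = R[u, i] - Q[i] @ P[u]
            Pu, Qi = P[u].copy(), Q[i].copy()
            P[u] += alpha * (e * Qi - beta * Pu - lam * (Pu - Qi) * W[u, i])
            Q[i] += alpha * (e * Pu - beta * Qi + lam * (Pu - Qi) * W[u, i])
    return P, Q
```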
Limitations of MF techniques:
Large number of parameters, potentially leading to overfitting and slow training.
Making predictions for new users/items requires re-optimization.
Linear transformations may not capture complex non-linear patterns.
Evaluation Metrics
The paper categorizes evaluation metrics as:
Prediction Accuracy: For rating prediction tasks.
Mean Absolute Error (MAE): $\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left| R_{u,i} - \hat{R}_{u,i} \right|$
Root Mean Squared Error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} \left( R_{u,i} - \hat{R}_{u,i} \right)^2}$

where $T$ is the test set of user-item pairs.
Coverage: Percentage of items for which the system can provide predictions.
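Both accuracy metrics are one-liners over a held-out test set T of (true, predicted) rating pairs; a quick sketch with made-up numbers:

```python
import numpy as np

def mae(r_true, r_pred):
    return np.mean(np.abs(r_true - r_pred))

def rmse(r_true, r_pred):
    return np.sqrt(np.mean((r_true - r_pred) ** 2))

r_true = np.array([4.0, 3.0, 5.0, 2.0])   # illustrative test ratings
r_pred = np.array([3.5, 3.0, 4.0, 2.5])
print(mae(r_true, r_pred))                # 0.5
print(round(rmse(r_true, r_pred), 4))     # 0.6124
```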
Quality of Set of Recommendations: For evaluating relevance of a recommended set.
F1-score@N: Harmonic mean of Precision and Recall.
Quality of List of Recommendations: Considers the ranking of recommended items.
Mean Average Precision (MAP): Mean of average precision scores over all users.
Half-life Utility Rate: Assumes exponential decay in user interest down the list.
Discounted Cumulative Gain (DCG): Assigns higher value to relevant items at the top of the list, with logarithmic decay: $\mathrm{DCG}_k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}$. Normalized DCG (nDCG) is often preferred.
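A compact sketch of DCG and nDCG for a list of graded relevance scores (the example relevance values are made up):

```python
import numpy as np

def dcg_at_k(rel, k):
    """DCG_k = sum over positions i = 1..k of rel_i / log2(i + 1)."""
    rel = np.asarray(rel, dtype=float)[:k]
    return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))

def ndcg_at_k(rel, k):
    """Normalize by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(rel, reverse=True), k)
    return dcg_at_k(rel, k) / ideal if ideal > 0 else 0.0

print(round(ndcg_at_k([3, 2, 3, 0, 1, 2], k=6), 3))
```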
Novelty and Diversity:
Novelty: Measures how new or surprising recommended items are to the user.
$$\mathrm{novelty}_i = \frac{1}{|Z_u| - 1} \sum_{j \in Z_u,\, j \neq i} \left(1 - \mathrm{sim}(i, j)\right)$$

where $Z_u$ is the list of items recommended to user $u$.
Diversity: Measures how different the items within a recommendation list are from each other.
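Given a precomputed item-item similarity matrix sim (for instance, the adjusted cosine matrix from the item-based section), both measures reduce to averages of pairwise dissimilarities; a hypothetical sketch:

```python
def novelty(i, Z_u, sim):
    """novelty_i over the recommended list Z_u (assumes |Z_u| >= 2)."""
    return sum(1 - sim[i, j] for j in Z_u if j != i) / (len(Z_u) - 1)

def diversity(Z_u, sim):
    """Average pairwise dissimilarity of the items within the list."""
    pairs = [(i, j) for i in Z_u for j in Z_u if i < j]
    return sum(1 - sim[i, j] for i, j in pairs) / len(pairs)
```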
Experimental Results
Experiments were conducted on the MovieLens ML-100K and ML-1M datasets, with MAE as the evaluation metric.
User-based vs. Item-based CF:
Cosine similarity generally outperformed Euclidean distance for both.
Item-based CF showed lower MAE than User-based CF (e.g., on ML-1M with Cosine: 0.42 for Item-based vs. 0.73 for User-based). This supports the idea that item-item similarities are more stable.
Importance of Ratings Normalization:
For MF, normalizing ratings (e.g., by subtracting user mean) significantly reduced MAE (e.g., from ~1.48 to ~0.82 on ML-1M, a ~45% reduction).
NMF cannot be trained on mean-centered ratings, since normalization can produce negative values that violate the non-negativity constraint.
EMF showed less difference between raw and normalized ratings, suggesting its explainability component helps handle biases.
Performance of MF, NMF, EMF (on raw ratings, k=10, 10 epochs):
EMF achieved the lowest MAE (e.g., ~0.76 on ML-1M).
NMF performed better than MF (e.g., NMF ~0.9567 vs. MF ~1.482 on ML-1M).
The accuracy ranking was EMF, then NMF, then MF, attributed to the benefits of explainability (for EMF) and of interpretable non-negative factors (for NMF).
Conclusion and Future Work
The review concludes that memory-based methods are simple but struggle with sparsity and scalability. Model-based methods, particularly MF and its variants (NMF, EMF), address these issues by learning latent factors. Experimentally, EMF showed the best performance, followed by NMF, then MF. Normalization is crucial for MF.
The authors propose a future research direction: Non-negative Explainable Matrix Factorization (NEMF), hypothesizing that combining NMF's interpretable non-negative factors with EMF's explicit explainability mechanism could further improve performance and provide two-stage explanations.
Resources
The paper highlights the availability of Python implementations for all discussed models on GitHub (https://github.com/nzhinusoftcm/review-on-collaborative-filtering), including Jupyter notebooks that can be run on Google Colaboratory. This is a key practical contribution for developers looking to implement these CF techniques.
This review serves as a valuable practical guide by:
Clearly explaining various non-neural CF algorithms.
Discussing their mathematical foundations and update rules.
Providing insights into their strengths and weaknesses.
Detailing relevant evaluation metrics for different recommendation goals.
Presenting comparative experimental results on standard datasets.
Offering open-source code for hands-on implementation.