A Comprehensive Review on Non-Neural Networks Collaborative Filtering Recommendation Systems (2106.10679v2)

Published 20 Jun 2021 in cs.IR, cs.AI, and cs.LG

Abstract: Over the past two decades, recommender systems have attracted a lot of interest due to the explosion in the amount of data in online applications. A particular attention has been paid to collaborative filtering, which is the most widely used in applications that involve information recommendations. Collaborative filtering (CF) uses the known preference of a group of users to make predictions and recommendations about the unknown preferences of other users (recommendations are made based on the past behavior of users). First introduced in the 1990s, a wide variety of increasingly successful models have been proposed. Due to the success of machine learning techniques in many areas, there has been a growing emphasis on the application of such algorithms in recommendation systems. In this article, we present an overview of the CF approaches for recommender systems, their two main categories, and their evaluation metrics. We focus on the application of classical Machine Learning algorithms to CF recommender systems by presenting their evolution from their first use-cases to advanced Machine Learning models. We attempt to provide a comprehensive and comparative overview of CF systems (with python implementations) that can serve as a guideline for research and practice in this area.

Authors (5)
  1. Carmel Wenga (1 paper)
  2. Majirus Fansi (1 paper)
  3. Sébastien Chabrier (2 papers)
  4. Jean-Martial Mari (7 papers)
  5. Alban Gabillon (4 papers)
Citations (2)

Summary

  • The paper presents a comprehensive review of non-neural collaborative filtering methods, detailing both memory-based and model-based approaches.
  • It demonstrates that techniques like ratings normalization and explainability significantly boost prediction accuracy, with EMF achieving the lowest MAE.
  • The review offers practical Python implementations and comparative experiments on benchmark datasets, guiding both practitioners and researchers.

This paper, "A Comprehensive Review on Non-Neural Networks Collaborative Filtering Recommendation Systems" (Wenga et al., 2021), provides an extensive overview of classical (non-neural network) collaborative filtering (CF) techniques, their evolution, practical implementations, and evaluation. It aims to serve as a guide for practitioners and researchers by detailing various algorithms, offering Python implementations, and comparing their performance on benchmark datasets.

The review begins by introducing collaborative filtering as a process where the preferences of a group of users are used to predict the unknown preferences of others. User preferences, often represented as ratings in a user-item matrix, form the basis for these predictions.

Memory-Based Collaborative Filtering

Memory-based CF algorithms directly use the entire user-item interaction matrix to make predictions. They are primarily categorized into:

  1. User-based CF:
    • Concept: Identifies users similar to an active user based on their rating patterns. Recommendations are then generated from items liked by these similar users but not yet rated by the active user.
    • Similarity Computation: Common metrics include:

      • Pearson Correlation: Measures the linear relationship between the ratings of two users on co-rated items.

        $$W_{u,v} = \frac{\sum_{i \in I} (R_{u,i} - \bar{R}_u) (R_{v,i} - \bar{R}_v)}{\sqrt{\sum_{i \in I} (R_{u,i} - \bar{R}_u)^2} \sqrt{\sum_{i \in I} (R_{v,i} - \bar{R}_v)^2}}$$

        where $I$ is the set of co-rated items, $R_{u,i}$ is user $u$'s rating on item $i$, and $\bar{R}_u$ is user $u$'s average rating on co-rated items.

      • Cosine Similarity: Measures the cosine of the angle between two user rating vectors.

        $$W_{u,v} = \frac{\sum_{i \in I} R_{u,i} R_{v,i}}{\sqrt{\sum_{i \in I} (R_{u,i})^2} \sqrt{\sum_{i \in I} (R_{v,i})^2}}$$

    • Prediction: The predicted rating $\hat{R}_{u,i}$ for user $u$ on item $i$ is often a weighted average:

      $$\hat{R}_{u,i} = \bar{R}_u + \frac{\sum_{v \in N_u} (R_{v,i} - \bar{R}_v) \cdot W_{u,v}}{\sum_{v \in N_u} |W_{u,v}|}$$

      where $N_u$ is the set of neighbors of user $u$ who have rated item $i$.

    • Top-N Recommendation: Identifies the $k$ most similar users and recommends the $N$ most frequent or most highly rated items among them that the active user has not interacted with.

  2. Item-based CF:
    • Concept: Identifies items similar to those an active user has liked in the past and recommends those similar items.
    • Similarity Computation: Similar metrics are used, but applied to item vectors (columns of the user-item matrix).

      • Adjusted Cosine Similarity: Addresses the issue that different users have different rating scales by subtracting each user's average rating from their ratings before computing cosine similarity between items.

        $$W_{i,j} = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u) (R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2} \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}}$$

        where $U$ is the set of users who rated both items $i$ and $j$.

    • Prediction: The predicted rating $\hat{R}_{u,i}$ for user $u$ on item $i$ is a weighted average of user $u$'s ratings on items similar to $i$:

      $$\hat{R}_{u,i} = \frac{\sum_{j \in S(i)} R_{u,j} \cdot W_{i,j}}{\sum_{j \in S(i)} |W_{i,j}|}$$

      where $S(i)$ is the set of items similar to item $i$ that user $u$ has rated.

    • Top-N Recommendation: For the items $I_u$ purchased by user $u$, the candidate items $C$ are formed by taking the union of the $k$ most similar items for each item in $I_u$ (excluding items already in $I_u$). Similarities are aggregated, and items are sorted to get the top-N.
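The user-based similarity and prediction formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the toy `ratings` dictionary and the function names are hypothetical, and the sketch assumes that the mean in the prediction step is taken over all of a user's ratings, while the Pearson means are taken over co-rated items only.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 3, "d": 4},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
}

def pearson(ratings, u, v):
    """Pearson correlation between users u and v over their co-rated items."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0  # too few co-rated items to define a correlation
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den_u = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    den_v = sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    return num / (den_u * den_v) if den_u and den_v else 0.0

def predict(ratings, u, item):
    """Mean-centered weighted average over neighbors who rated `item`."""
    mean_u = sum(ratings[u].values()) / len(ratings[u])
    num = den = 0.0
    for v in ratings:
        if v == u or item not in ratings[v]:
            continue
        w = pearson(ratings, u, v)
        mean_v = sum(ratings[v].values()) / len(ratings[v])
        num += (ratings[v][item] - mean_v) * w
        den += abs(w)
    # Fall back to the user's own mean when no neighbor carries any weight.
    return mean_u + num / den if den else mean_u

print(predict(ratings, "alice", "d"))
```

In this toy data, bob rates like alice and carol rates in the opposite direction, so both neighbors push alice's predicted rating for item `d` above her own mean.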

Implementation Considerations for Memory-based CF:

  • Sparsity: A major challenge, as similarity scores can be unreliable or undefined if there are few co-rated items/users. Imputation techniques can be used but might introduce bias.
  • Scalability: User-based CF can be computationally expensive for large datasets as neighborhood search happens at runtime. Item-based CF often scales better because item-item similarities can be pre-computed offline.
  • Cold Start: Difficulty in making recommendations for new users or new items with no interaction data.
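The scalability point can be illustrated by splitting item-based CF into an offline step that precomputes the adjusted cosine table and a cheap online step that only looks similarities up. The sketch below is an assumption-laden illustration: the data and function names are hypothetical, user means are taken over all of each user's ratings, and $S(i)$ is restricted to items with positive similarity, which is one common reading of "similar" rather than something stated in the review.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
    "u4": {"a": 5, "c": 1},
}

def adjusted_cosine_table(ratings):
    """Offline step: adjusted cosine similarity for every pair of items."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    items = sorted({i for r in ratings.values() for i in r})
    sims = {}
    for i, j in combinations(items, 2):
        # Only users who rated both items contribute.
        users = [u for u, r in ratings.items() if i in r and j in r]
        num = sum((ratings[u][i] - means[u]) * (ratings[u][j] - means[u]) for u in users)
        den_i = sqrt(sum((ratings[u][i] - means[u]) ** 2 for u in users))
        den_j = sqrt(sum((ratings[u][j] - means[u]) ** 2 for u in users))
        w = num / (den_i * den_j) if den_i and den_j else 0.0
        sims[(i, j)] = sims[(j, i)] = w
    return sims

def predict(ratings, sims, u, item):
    """Online step: weighted average of u's ratings on positively similar items."""
    num = den = 0.0
    for j, r in ratings[u].items():
        w = sims.get((item, j), 0.0)
        if w <= 0:
            continue  # assumption: keep only positively similar items in S(i)
        num += r * w
        den += abs(w)
    return num / den if den else None

sims = adjusted_cosine_table(ratings)  # computed once, reused for every request
print(predict(ratings, sims, "u4", "b"))
```

Because `sims` depends only on the rating matrix, it can be rebuilt periodically while predictions are served from the cached table.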

The paper notes that the authors provide Python/Numpy/Pandas implementations for these models on GitHub.

Model-Based Collaborative Filtering

Model-based CF methods learn a model from the user-item interactions, which is then used for predictions. This review focuses on dimensionality reduction techniques.

  1. Singular Value Decomposition (SVD):
    • Concept: Decomposes the $m \times n$ rating matrix $R$ into $R = P \Sigma Q^T$, where $P$ and $Q$ are orthogonal matrices representing user and item latent factors, and $\Sigma$ is a diagonal matrix of singular values. Dimensionality is reduced by keeping only the $k$ largest singular values ($\Sigma_k$).
    • Prediction: $\hat{R}_k = P_k \Sigma_k Q_k^T$. The predicted rating for user $u$ on item $i$ is $\hat{R}_{u,i} = p_u^T \sqrt{\Sigma_k} \sqrt{\Sigma_k} q_i$.
    • Implementation: Requires imputing missing values in $R$ (e.g., with item means). Normalizing ratings (e.g., by subtracting user means) can improve accuracy.
    • Algorithm Steps:

      1. Normalize the rating matrix: $R \rightarrow R_{norm}$.
      2. Factor $R_{norm}$ to get $P, \Sigma, Q$.
      3. Reduce $\Sigma$ to $\Sigma_k$.
      4. Compute $P_k \sqrt{\Sigma_k}$ and $\sqrt{\Sigma_k} Q_k^T$ for predictions.

  2. Matrix Factorization (MF) / Regularized SVD:

    • Concept: Directly learns latent factor vectors $P_u \in \mathbb{R}^k$ for each user $u$ and $Q_i \in \mathbb{R}^k$ for each item $i$. The predicted rating is $\hat{R}_{u,i} = Q_i^T P_u$.
    • Learning: Minimizes a regularized squared error cost function over known ratings:

      $$J(P,Q) = \frac{1}{2} \sum_{(u,i) \in K} (R_{u,i} - Q_i^T P_u)^2 + \frac{\lambda}{2} (||P_u||^2 + ||Q_i||^2)$$

      where $K$ is the set of known ratings and $\lambda$ is the regularization parameter.

    • Optimization: Typically uses Stochastic Gradient Descent (SGD) with update rules:

      $$e_{u,i} = R_{u,i} - Q_i^T P_u$$

      $$Q_i \leftarrow Q_i + \alpha (e_{u,i} P_u - \lambda Q_i)$$

      $$P_u \leftarrow P_u + \alpha (e_{u,i} Q_i - \lambda P_u)$$

      where $\alpha$ is the learning rate.

    • Advantage: Handles missing values directly without imputation, often more accurate than traditional SVD.

  3. Probabilistic Matrix Factorization (PMF):
    • Concept: A probabilistic approach where ratings are assumed to be drawn from a Gaussian distribution with mean $Q_i^T P_u$. Gaussian priors are placed on $P_u$ and $Q_i$.
    • Likelihood: $Pr(R|P,Q,\sigma^2) = \prod_{u=1}^m \prod_{i=1}^n [\mathcal{N}(R_{u,i} | P_u^T Q_i, \sigma^2)]^{I_{u,i}}$, where $I_{u,i}$ is 1 if user $u$ rated item $i$ and 0 otherwise.
    • Learning: Maximizes the log-posterior, which is equivalent to minimizing a sum-of-squared-errors objective similar to MF but with potentially different regularization terms $\lambda_P, \lambda_Q$ derived from the prior variances:

      $$J(P,Q) = \frac{1}{2} \sum_{(u,i) \in K} (R_{u,i} - Q_i^T P_u)^2 + \frac{\lambda_P}{2} ||P_u||_{Frob}^2 + \frac{\lambda_Q}{2} ||Q_i||_{Frob}^2$$

  4. Non-negative Matrix Factorization (NMF):
    • Concept: Constrains the elements of factor matrices $P$ and $Q$ to be non-negative ($P \ge 0, Q \ge 0$). This allows for a parts-based representation and more interpretable latent factors.
    • Interpretation: $P_{u,l}$ can represent the probability that user $u$ belongs to group $l$, and $Q_{i,l}$ the probability that users in group $l$ like item $i$.
    • Learning: Uses multiplicative update rules to maintain non-negativity while minimizing a similar objective function to PMF/MF.

      $$P_{u,l} \leftarrow P_{u,l} \frac{\sum_{i \in I_u} Q_{i,l} R_{u,i}}{\sum_{i \in I_u} Q_{i,l} \hat{R}_{u,i} + \lambda_P |I_u| P_{u,l}}$$

      $$Q_{i,l} \leftarrow Q_{i,l} \frac{\sum_{u \in U_i} P_{u,l} R_{u,i}}{\sum_{u \in U_i} P_{u,l} \hat{R}_{u,i} + \lambda_Q |U_i| Q_{i,l}}$$

      (Note: The paper's equations 19 and 20 for the NMF updates are slightly different; this is a common form.) The paper uses $\hat{R}_{u,i}$ in the denominator, which is $P_u Q_i^T$.

  5. Explainable Matrix Factorization (EMF):
    • Concept: Incorporates neighborhood-based explanations into the MF model to improve accuracy and provide justifications. An item $i$ is explainable for user $u$ if many of $u$'s neighbors rated $i$.
    • Explainability Score (User-based): $Expl_{u,i} = E(R_{v,i} | N_u) = \sum_x x \cdot Pr(R_{v,i} = x | v \in N_u)$.
    • Explanation Weight: $W_{u,i}$ is derived from $Expl_{u,i}$, thresholded to indicate significant explainability.
    • Objective Function: Adds an explainability regularization term to the MF objective:

      $$J(P,Q) = \sum_{(u,i) \in K} (R_{u,i} - \hat{R}_{u,i})^2 + \beta (||P_u||^2 + ||Q_i||^2) + \lambda \sum_{(u,i) \in K} (P_u - Q_i)^2 W_{u,i}$$

      (Note: In the paper's EMF objective (Eq. 25), the third term is $(P_u - Q_i)^2 W_{u,i}$, which appears unusual; typically such a term would relate to how well the model aligns with the explainability score, or to how similar the latent factors of explainable items and users are.)
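The SGD update rules given for regularized matrix factorization can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the authors' code: the toy `known` ratings, the function names, the initialization range, and the hyperparameter values ($k = 2$, $\alpha = 0.02$, $\lambda = 0.02$, 500 epochs) are all hypothetical choices made for a tiny example.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_mf(triples, n_users, n_items, k=2, alpha=0.02, lam=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    data = list(triples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the known ratings in random order
        for u, i, r in data:
            e = r - dot(Q[i], P[u])  # prediction error e_{u,i}
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # keep old values so both updates use them
                Q[i][f] += alpha * (e * pu - lam * qi)
                P[u][f] += alpha * (e * qi - lam * pu)
    return P, Q

def mae(triples, P, Q):
    return sum(abs(r - dot(Q[i], P[u])) for u, i, r in triples) / len(triples)

# Hypothetical toy data: only known ratings are listed, so no imputation is needed.
known = [(0, 0, 5), (0, 1, 3), (0, 2, 4), (1, 0, 4), (1, 3, 1),
         (2, 1, 1), (2, 2, 2), (2, 3, 5), (3, 0, 1), (3, 3, 4)]

P0, Q0 = train_mf(known, n_users=4, n_items=4, epochs=0)  # untrained factors
P, Q = train_mf(known, n_users=4, n_items=4)
print("training MAE before:", mae(known, P0, Q0))
print("training MAE after: ", mae(known, P, Q))
```

The loop touches only the known ratings in $K$, which is why this family of models avoids the imputation step that plain SVD requires. The printed error is measured on the training ratings, so it says nothing about generalization.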

Limitations of MF techniques:

  • Large number of parameters, potentially leading to overfitting and slow training.
  • Making predictions for new users/items requires re-optimization.
  • Linear transformations may not capture complex non-linear patterns.

Evaluation Metrics

The paper categorizes evaluation metrics as:

  1. Prediction Accuracy: For rating prediction tasks.
    • Mean Absolute Error (MAE): $\frac{1}{|T|} \sum_{(u,i) \in T} |R_{u,i} - \hat{R}_{u,i}|$
    • Root Mean Squared Error (RMSE): $\sqrt{\frac{1}{|T|} \sum_{(u,i) \in T} (R_{u,i} - \hat{R}_{u,i})^2}$
    • Coverage: Percentage of items for which the system can provide predictions.
  2. Quality of Set of Recommendations: For evaluating relevance of a recommended set.
    • Precision@N: $\frac{|\text{Relevant Recommended Items}|}{N}$
    • Recall@N: $\frac{|\text{Relevant Recommended Items}|}{|\text{Total Relevant Items}|}$
    • F1-score@N: Harmonic mean of Precision and Recall.
  3. Quality of List of Recommendations: Considers the ranking of recommended items.
    • Mean Average Precision (MAP): Mean of average precision scores over all users.
    • Half-life Utility Rate: Assumes exponential decay in user interest down the list.
    • Discounted Cumulative Gain (DCG): Assigns higher value to relevant items at the top, with logarithmic decay: $DCG_k = \sum_{i=1}^k \frac{rel_i}{\log_2(i+1)}$. Normalized DCG (nDCG) is often preferred.
  4. Novelty and Diversity:
    • Novelty: Measures how new or surprising recommended items are to the user.

      $$novelty_i = \frac{1}{|Z_u|-1} \sum_{j \in Z_u, j \neq i} (1 - sim(i,j))$$

    • Diversity: Measures how different items within a recommendation list are from each other.

      $$diversity_{Z_u} = \frac{1}{|Z_u|(|Z_u|-1)} \sum_{i \in Z_u} \sum_{j \in Z_u, j \neq i} (1 - sim(i,j))$$
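Several of the metrics listed above translate directly into short functions. The sketch below is illustrative only: the function names and sample inputs are hypothetical, nDCG is computed by dividing by the DCG of the ideally sorted list, and `sim` is assumed to be any caller-supplied item similarity function.

```python
from math import log2, sqrt

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def precision_recall_at_n(recommended, relevant, n):
    """`recommended` is a ranked list; `relevant` is the set of relevant items."""
    hits = sum(1 for item in recommended[:n] if item in relevant)
    precision = hits / n
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def dcg_at_k(relevances, k):
    # Positions are 1-based in the formula, so the first item is divided by log2(2) = 1.
    return sum(rel / log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0

def diversity(items, sim):
    """Average pairwise dissimilarity (1 - sim) over ordered pairs of distinct items."""
    n = len(items)
    if n < 2:
        return 0.0
    total = sum(1 - sim(i, j) for i in items for j in items if i != j)
    return total / (n * (n - 1))

print(mae([4, 3, 5], [3.5, 3, 4]))
print(precision_recall_at_n(["a", "b", "c", "d"], {"a", "c", "e"}, 4))
print(ndcg_at_k([3, 2, 0, 1], 4))
```

MAE and RMSE score rating predictions, Precision@N and Recall@N score an unordered recommended set, and DCG and nDCG additionally reward placing relevant items near the top of the list.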

Comparative Experimentation

Experiments were conducted on the MovieLens ML-100K and ML-1M datasets, with MAE as the evaluation metric.

  • User-based vs. Item-based CF:
    • Cosine similarity generally outperformed Euclidean distance for both.
    • Item-based CF showed lower MAE than User-based CF (e.g., on ML-1M with Cosine: 0.42 for Item-based vs. 0.73 for User-based). This supports the idea that item-item similarities are more stable.
  • Importance of Ratings Normalization:
    • For MF, normalizing ratings (e.g., by subtracting user mean) significantly reduced MAE (e.g., from ~1.48 to ~0.82 on ML-1M, a ~45% reduction).
    • NMF cannot be trained on standard normalized ratings if they become negative, due to the non-negativity constraint.
    • EMF showed less difference between raw and normalized ratings, suggesting its explainability component helps handle biases.
  • Performance of MF, NMF, EMF (on raw ratings, k=10, 10 epochs):
    • EMF achieved the lowest MAE (e.g., ~0.76 on ML-1M).
    • NMF performed better than MF (e.g., NMF ~0.9567 vs. MF ~1.482 on ML-1M).
    • Ranked from lowest to highest MAE, the models were EMF, NMF, and MF; the review attributes this to the benefits of explainability (for EMF) and interpretable non-negative factors (for NMF).
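The normalization discussed above, and the reason it conflicts with NMF's non-negativity constraint, can be seen in a small sketch. The data and function name are hypothetical, and subtracting each user's mean is only one of the possible normalization schemes.

```python
def mean_center(ratings):
    """Subtract each user's mean rating; return centered ratings and the means."""
    means = {u: sum(r.values()) / len(r) for u, r in ratings.items()}
    centered = {u: {i: x - means[u] for i, x in r.items()} for u, r in ratings.items()}
    return centered, means

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 2, "b": 2, "c": 5},
}

centered, means = mean_center(ratings)
print(centered)

# Any rating below the user's mean becomes negative after centering, so a model
# that requires non-negative inputs cannot be trained on these values directly.
has_negative = any(x < 0 for r in centered.values() for x in r.values())
print(has_negative)

# A prediction made in the centered space is mapped back by adding the user's mean.
restored = {u: {i: x + means[u] for i, x in r.items()} for u, r in centered.items()}
```

A model trained on `centered` predicts deviations from each user's mean, so the mean is added back before predictions are compared with raw ratings.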

Conclusion and Future Work

The review concludes that memory-based methods are simple but struggle with sparsity and scalability. Model-based methods, particularly MF and its variants (NMF, EMF), address these issues by learning latent factors. Experimentally, EMF showed the best performance, followed by NMF, then MF. Normalization is crucial for MF.

The authors propose a future research direction: Non-negative Explainable Matrix Factorization (NEMF), hypothesizing that combining NMF's interpretable non-negative factors with EMF's explicit explainability mechanism could further improve performance and provide two-stage explanations.

Resources

The paper highlights the availability of Python implementations for all discussed models on GitHub (https://github.com/nzhinusoftcm/review-on-collaborative-filtering), including Jupyter notebooks that can be run on Google Colaboratory. This is a key practical contribution for developers looking to implement these CF techniques.

This review serves as a valuable practical guide by:

  • Clearly explaining various non-neural CF algorithms.
  • Discussing their mathematical foundations and update rules.
  • Providing insights into their strengths and weaknesses.
  • Detailing relevant evaluation metrics for different recommendation goals.
  • Presenting comparative experimental results on standard datasets.
  • Offering open-source code for hands-on implementation.