- The paper systematically reviews neural recommendation models categorized by collaborative filtering, content-enriched, and temporal approaches.
- The paper highlights innovative methodologies like attention mechanisms, GNNs, and autoencoders to improve user-item representation and interaction modeling.
- The paper discusses future directions including benchmarking, multi-objective optimization, and enhanced reproducibility for robust recommender systems.
This survey, "A Survey on Accuracy-oriented Neural Recommendation: From Collaborative Filtering to Information-rich Recommendation" (2104.13030), provides a systematic review of neural recommender models, focusing on how different types of data are used to improve recommendation accuracy. It categorizes these models based on their data usage into three main types: collaborative filtering, content-enriched recommendation, and temporal/sequential recommendation.
The core problem in recommendation is framed as learning a prediction function y^u,i,c=f(Du,Di,Dc), which estimates the likelihood of user u favoring item i under context c, given data Du (describing the user), Di (describing the item), and Dc (describing the context).
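As a purely illustrative reading of this abstraction, the sketch below implements f as a PyTorch module that embeds a user ID, an item ID, and a dense context feature vector and scores the triple with an MLP; all names, sizes, and the choice of an MLP are assumptions, not part of the survey.

```python
import torch
import torch.nn as nn

class ScoreModel(nn.Module):
    """Illustrative f(D_u, D_i, D_c): here D_u and D_i are reduced to IDs and
    D_c to a dense feature vector; richer models replace these inputs."""
    def __init__(self, n_users, n_items, n_context_feats, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.ctx_proj = nn.Linear(n_context_feats, dim)
        self.mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, user_ids, item_ids, ctx_feats):
        z = torch.cat([self.user_emb(user_ids),
                       self.item_emb(item_ids),
                       self.ctx_proj(ctx_feats)], dim=-1)
        return self.mlp(z).squeeze(-1)  # \hat{y}_{u,i,c}
```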
1. Collaborative Filtering (CF) Models
CF models primarily leverage user-item interaction data, effectively ignoring Dc and using only IDs or interaction history for Du and Di. The development in neural CF is divided into representation learning and interaction modeling.
A. Representation Learning
The goal is to learn user embeddings (P) and item embeddings (Q).
- History Behavior Attention Aggregation Models: These models improve upon classical latent factor models (which use free embeddings for user/item IDs) by incorporating a user's interaction history. Instead of simple pooling (like in FISM) or adding ID embeddings (like SVD++), attention mechanisms assign different weights to historical items.
Attentive Collaborative Filtering (ACF): Assigns user-aware attentive weights to historical items. The user representation is a sum of their ID embedding and an attention-weighted sum of their interacted item embeddings.
r^ui = (pu + ∑_{j∈Ru} α(u,j) qj)^T qi
where α(u,j) = exp(F(pu,qj)) / ∑_{j′∈Ru} exp(F(pu,qj′)). F(⋅,⋅) can be an MLP or an inner product.
Neural Attentive Item Similarity (NAIS): Makes the attention target item-aware, meaning the influence of a historical item depends on the item being predicted.
r^ui = (∑_{j∈Ru} α(i,j) qj)^T qi
where α(i,j) = exp(F(qi,qj)) / [∑_{k∈Ru} exp(F(qi,qk))]^β, with β smoothing the denominator for users with long histories. (A minimal sketch of this attention aggregation appears after this list.)
- Autoencoder based Representation Learning: These models use autoencoders to learn latent representations by reconstructing the input (e.g., a user's interaction vector). Variants include denoising autoencoders (CDAE) and variational autoencoders (Mult-VAE). Some models use parallel encoders for users and items.
- Graph based Representation Learning: User-item interactions are viewed as a bipartite graph. Graph Neural Networks (GNNs) are used to learn embeddings by propagating information from neighbors.
The (l+1)th order user embedding pu(l+1) is updated by aggregating its connected items' lth order embeddings qjl:
au(l+1)=Agg(qjl∣j∈Ru)
pu(l+1)=ρ(Wl[pul,au(l+1)])
Models like GC-MC and NGCF use graph convolutions. Simpler models like LightGCN remove non-linearities and transformations, often achieving strong performance by focusing on neighborhood aggregation.
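A minimal PyTorch sketch of the item-aware attention aggregation above (NAIS-style). The MLP attention function, embedding sizes, and batch handling are assumptions for illustration, not the authors' implementation; padding of short histories is ignored.

```python
import torch
import torch.nn as nn

class NAISLikeScorer(nn.Module):
    """Sketch: score r_ui by attending over the user's interacted items,
    with attention weights that depend on the target item (item-aware)."""
    def __init__(self, n_items, dim=64, beta=0.5):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.att_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.beta = beta  # smooths the softmax denominator for long histories

    def forward(self, history, target):
        # history: (B, H) ids of items in R_u; target: (B,) id of the item i being scored
        q_hist = self.item_emb(history)                    # (B, H, d)
        q_i = self.item_emb(target).unsqueeze(1)           # (B, 1, d)
        logits = self.att_mlp(q_i * q_hist).squeeze(-1)    # F(q_i, q_j) for each j, (B, H)
        weights = torch.exp(logits)
        alpha = weights / weights.sum(dim=1, keepdim=True).pow(self.beta)  # smoothed softmax
        p_u = (alpha.unsqueeze(-1) * q_hist).sum(dim=1)    # attention-weighted history, (B, d)
        return (p_u * q_i.squeeze(1)).sum(dim=-1)          # \hat{r}_{ui}, (B,)
```

And a sketch of LightGCN-style propagation on the user-item bipartite graph: only normalized neighborhood aggregation, no feature transforms or non-linearities, with per-layer embeddings averaged at the end. Building the normalized sparse adjacency is assumed given and omitted here.

```python
def lightgcn_propagate(adj_norm, user_emb, item_emb, n_layers=3):
    """adj_norm: symmetrically normalized sparse (|U|+|I|) x (|U|+|I|) adjacency of the
    user-item graph; user_emb/item_emb: layer-0 embedding matrices."""
    x = torch.cat([user_emb, item_emb], dim=0)
    layers = [x]
    for _ in range(n_layers):
        x = torch.sparse.mm(adj_norm, x)            # aggregate neighbor embeddings only
        layers.append(x)
    out = torch.stack(layers, dim=0).mean(dim=0)    # combine layers by simple averaging
    return out[: user_emb.size(0)], out[user_emb.size(0):]
```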
B. Interaction Modeling
This component estimates the preference score r^ui from the learned user (pu) and item (qi) embeddings.
- Inner Product: The most common method: r^ui=puTqi. It is efficient, but it can violate the triangle inequality and captures only linear interactions.
- Distance based Metrics: Address the triangle-inequality issue by measuring preference with distances in a metric space.
- CML: Minimizes Euclidean distance: dui=∣∣pu−qi∣∣22.
- TransRec: Uses a translation principle for sequential behavior: qj+pu≈qi, where j is the user's previously consumed item and i is the next (target) item, i.e., the user "translates" from one item to the next.
- LRML: Introduces relation vectors e learned via attention over a memory matrix: sui=∣∣pu+e−qi∣∣F2.
- Neural Network based Metrics: Capture complex, non-linear interactions.
- NCF: Uses an MLP on concatenated embeddings: r^ui=fMLP(pu∣∣qi). It is often combined with a generalized matrix factorization (GMF) branch based on the element-wise product of embeddings (a minimal NeuMF-style sketch appears after this list).
- CNN-based: Use outer product of embeddings to create an interaction map, then apply CNNs (e.g., ONCF).
- Autoencoder-based: The decoder part directly reconstructs the interaction matrix (e.g., AutoRec).
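A minimal sketch of the NCF/NeuMF-style fusion mentioned above, assuming separate embedding tables for the GMF and MLP branches and illustrative layer sizes; the training loss and negative sampling are omitted.

```python
import torch
import torch.nn as nn

class NeuMFLike(nn.Module):
    """Sketch: a GMF branch (element-wise product of embeddings) and an MLP branch
    (on concatenated embeddings), fused by a final linear layer."""
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_gmf = nn.Embedding(n_users, dim)
        self.item_gmf = nn.Embedding(n_items, dim)
        self.user_mlp = nn.Embedding(n_users, dim)
        self.item_mlp = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.out = nn.Linear(2 * dim, 1)

    def forward(self, u, i):
        gmf = self.user_gmf(u) * self.item_gmf(i)                       # GMF branch
        mlp = self.mlp(torch.cat([self.user_mlp(u), self.item_mlp(i)], dim=-1))
        return self.out(torch.cat([gmf, mlp], dim=-1)).squeeze(-1)      # \hat{r}_{ui}
```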
2. Content-enriched Recommendation
These models incorporate auxiliary information (side information) associated with users and items, such as profiles, social networks, item attributes (text, images), and knowledge graphs.
A. Modeling General Feature Interactions
These models focus on categorical or numerical features often found in CTR prediction.
- Factorization Machines (FM): A baseline that models second-order feature interactions efficiently: y^x = w0 + ∑_d wd xd + ∑_{d<d′} xd xd′ ⟨vd,vd′⟩ (a minimal sketch appears after this list).
- MLP based High Order Modeling: Embed features, then use MLPs to implicitly learn high-order interactions (e.g., NFM, DeepCrossing). Wide & Deep models combine these deep MLP paths with shallow, linear paths.
- Cross Network for K-th Order Modeling: Explicitly model feature interactions up to a defined order K (e.g., DCN, xDeepFM). DCN uses a cross layer: xk = x0 (xk−1^T wk) + bk + xk−1.
- Tree Enhanced Modeling: Use decision trees to extract explicit cross-features, then feed their embeddings into an attention model (e.g., TEM).
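To make the FM formula above concrete, here is a minimal sketch of a second-order FM over a dense feature vector. It uses the standard reformulation ∑_{d<d′} xd xd′ ⟨vd,vd′⟩ = ½ ∑_f [(∑_d v_{d,f} xd)² − ∑_d v_{d,f}² xd²], which makes the pairwise term linear in the number of features. Shapes and initialization are assumptions.

```python
import torch
import torch.nn as nn

class FactorizationMachine(nn.Module):
    """Sketch of a second-order FM: global bias + linear term + factorized pairwise term."""
    def __init__(self, n_feats, k=16):
        super().__init__()
        self.w0 = nn.Parameter(torch.zeros(1))                  # global bias w_0
        self.w = nn.Linear(n_feats, 1, bias=False)              # first-order weights w_d
        self.v = nn.Parameter(torch.randn(n_feats, k) * 0.01)   # factor vectors v_d

    def forward(self, x):                                       # x: (B, n_feats)
        linear = self.w0 + self.w(x).squeeze(-1)
        xv = x @ self.v                                         # (B, k): sum_d v_d x_d
        x2v2 = (x * x) @ (self.v * self.v)                      # (B, k): sum_d v_d^2 x_d^2
        pairwise = 0.5 * (xv.pow(2) - x2v2).sum(dim=-1)
        return linear + pairwise                                # \hat{y}(x)
```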
B. Modeling Textual Content
Leverages NLP techniques for item descriptions, user reviews, etc.
- Autoencoder based Models: Use autoencoders (e.g., Stacked Denoising Autoencoders in CDL) to learn item content representations. The item embedding qi can be a combination of content-derived representation and a free latent vector: qi=fe(xi)+θi.
- Leveraging Word Embeddings for Recommendation: Use pre-trained or jointly trained word embeddings with models like CNNs or RNNs.
- ConvMF: Integrates TextCNN into probabilistic matrix factorization to derive item embeddings from text. Item latent vector qi is drawn from a Gaussian centered around TextCNN(W,xi).
- DeepCoNN: Uses two parallel TextCNNs to model user reviews and item reviews, then a Factorization Machine for interaction: r^ui=FM(TextCNN(Du),TextCNN(Di)) (a TextCNN-based sketch appears after this list).
- Attention Models: Assign weights to different parts of text (words, sentences, aspects) to create more informative representations.
- Text Explanations for Recommendation:
- Extraction-based: Select important text pieces (e.g., via attention weights) as explanations.
- Generation-based: Generate natural language explanations using encoder-decoder architectures (e.g., NRT predicts ratings and generates reviews simultaneously).
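A minimal sketch of the TextCNN-style encoder behind ConvMF/DeepCoNN: word embeddings, 1-D convolutions with several window sizes, max-over-time pooling, and a projection. Window sizes, dimensions, and the dot-product interaction shown in the usage comment (standing in for DeepCoNN's Factorization Machine) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class TextCNNEncoder(nn.Module):
    """Sketch of a TextCNN document encoder for review/description text."""
    def __init__(self, vocab_size, emb_dim=128, n_filters=64, windows=(2, 3, 4), out_dim=32):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, kernel_size=w) for w in windows)
        self.proj = nn.Linear(n_filters * len(windows), out_dim)

    def forward(self, tokens):                        # tokens: (B, T) word ids
        x = self.word_emb(tokens).transpose(1, 2)     # (B, emb_dim, T)
        pooled = [torch.relu(c(x)).max(dim=-1).values for c in self.convs]  # max over time
        return self.proj(torch.cat(pooled, dim=-1))   # document representation

# DeepCoNN-style usage (simplified): two parallel towers, with a dot product
# replacing the FM interaction layer of the original model.
# user_enc, item_enc = TextCNNEncoder(V), TextCNNEncoder(V)
# r_hat = (user_enc(user_review_tokens) * item_enc(item_review_tokens)).sum(-1)
```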
C. Modeling Multimedia Content
Utilizes visual (images, videos) and audio information.
- Image Information:
- Content-based: Extract visual features using CNNs, then project users and items into this visual space.
- Hybrid Models: Combine CF signals with visual features.
- VBPR (Visual Bayesian Personalized Ranking): Extends BPR by incorporating visual features. The preference score is the sum of a collaborative term and a visual term: r^ui=puTqi+wuTf(CNN(xi)), where f(CNN(xi)) is the item's visual representation and wu is the user's visual preference vector. (A minimal sketch appears after this list.)
- GNNs: Model relationships in item-item graphs where nodes have visual features (e.g., PinSage).
- Video Recommendation: Often involves extracting frame-level features, then using attention or RNNs to aggregate them. Audio features can also be incorporated using fusion techniques. ACF uses attention with visual inputs for multimedia.
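A minimal sketch of VBPR-style scoring as described above, assuming pre-extracted CNN features (e.g., 4096-d) per item; the bias terms of the original model and the BPR training loss are omitted, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class VBPRLikeScorer(nn.Module):
    """Sketch: collaborative term p_u^T q_i plus a visual term w_u^T E(cnn_feats)."""
    def __init__(self, n_users, n_items, dim=32, visual_in=4096, visual_dim=32):
        super().__init__()
        self.p = nn.Embedding(n_users, dim)            # latent user factors
        self.q = nn.Embedding(n_items, dim)            # latent item factors
        self.w = nn.Embedding(n_users, visual_dim)     # visual preference vectors w_u
        self.E = nn.Linear(visual_in, visual_dim, bias=False)  # projects CNN features

    def forward(self, u, i, cnn_feats):                # cnn_feats: (B, visual_in)
        cf = (self.p(u) * self.q(i)).sum(-1)
        visual = (self.w(u) * self.E(cnn_feats)).sum(-1)
        return cf + visual                             # \hat{r}_{ui}
```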
D. Modeling Social Network
Exploits social connections (trust, friendship) assuming social influence affects preferences.
- Social Correlation Enhancement and Regularization: User embedding pu is a fusion of an item domain embedding eu and a social embedding hu derived from social connections: pu=f(eu,g(u,S)). Social structure can also act as a regularizer, encouraging connected users to have similar embeddings.
- GNN Based Approaches: Model the social diffusion process more explicitly.
DiffNet: Simulates recursive social influence. The user embedding hu^k at diffusion step k combines the previous embedding hu^(k−1) with the aggregated influence of the user's social neighbors, hSu^(k−1).
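A minimal sketch of DiffNet-style influence diffusion, assuming a pre-normalized sparse social adjacency matrix; the original model uses a learned fusion at each step and also fuses interest derived from item interactions, both of which are omitted here.

```python
import torch

def social_diffusion(h, social_adj_norm, n_steps=2):
    """h: (|U|, d) user embeddings; social_adj_norm: row-normalized sparse social adjacency.
    Each step combines a user's state with the averaged state of their social neighbors."""
    for _ in range(n_steps):
        neighbor_avg = torch.sparse.mm(social_adj_norm, h)  # aggregated social influence
        h = h + neighbor_avg                                # simple additive fusion (assumption)
    return h
```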
E. Modeling Knowledge Graph (KG)
Leverages structured knowledge about items and their attributes (e.g., movie -[has_director]-> director).
- Path Based Methods: Exploit paths (sequences of entities and relations) between users and items in the KG to infer preferences. Models like KPRN embed paths and pool them. RippleNet constructs "ripple sets" (multi-hop KG neighbors) for users.
- Regularization Based Methods: Use KG embedding (KGE) techniques (e.g., TransE, TransR) to learn entity representations. The KGE loss acts as a regularizer for the recommendation model.
- CKE (Collaborative Knowledge Base Embedding): Item embedding is a sum of its ID embedding and KGE-derived embedding: qi=fEmbed(i)+fKGE(i∣G).
- GNN Based Methods: Apply GNNs to a "collaborative knowledge graph" (user-item graph + KG).
- KGAT (Knowledge Graph Attention Network): Recursively propagates embeddings on this unified graph, using attention to weigh neighbor contributions. User embedding pu=fGNN(u,G).
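A minimal sketch of KGAT-style attentive propagation for a single head entity, assuming dense neighbor tensors and one shared relation projection (the original model uses per-relation projections and mini-batched sparse propagation over the whole graph).

```python
import torch
import torch.nn as nn

class KGATLikeAttention(nn.Module):
    """Sketch: score each (relation r, tail t) neighbor of head h with a relation-aware
    attention, then aggregate the neighborhood with softmax-normalized weights."""
    def __init__(self, dim=64):
        super().__init__()
        self.W_r = nn.Linear(dim, dim, bias=False)   # relation projection (shared here)

    def forward(self, e_h, e_r, e_t):
        # e_h: (d,) head entity; e_r: (N, d) relation embeddings; e_t: (N, d) tail entities
        scores = (self.W_r(e_t) * torch.tanh(self.W_r(e_h) + e_r)).sum(-1)  # attention logits
        alpha = torch.softmax(scores, dim=0)                                # weights over neighbors
        return (alpha.unsqueeze(-1) * e_t).sum(0)                           # aggregated neighborhood
```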
3. Temporal/Sequential Models
These models account for the dynamic nature of user preferences and the order of interactions.
- Temporal based recommendation: Focuses on the timestamp of interactions [u,i,rui,tui] to model evolving preferences.
RRN (Recurrent Recommender Networks): Use RNNs (e.g., LSTMs) to model the evolution of user (put) and item (qit) dynamic embeddings over time:
put=RNN(pu(t−1),Wxut)
qit=RNN(qi(t−1),Wxit)
where xut is the user's rating vector in the current time interval.
Memory Networks: Use external memory components to store and update user historical states, aiming to capture long-term dependencies better than standard RNNs.
- Session based recommendation: Models sequences of item interactions within a session [i1,i2,...,i∣S∣], often for anonymous users.
- GRU4Rec: Uses GRUs to predict the next item in a session based on the preceding items (a minimal sketch appears at the end of this section).
- Translation-based models (e.g., TransRec): Model transitions as qi+pu≈qj, where i is the current item and j is the next item; this is the same translation principle described for TransRec in the CF section, with i and j simply playing swapped roles there (i being the target item scored).
- Self-Attention Models (e.g., SASRec): Use self-attention to capture dependencies between all items in a sequence directly, without recurrence.
- GNNs (e.g., SR-GNN): Construct a graph from all session sequences (nodes are items, edges represent co-occurrence or transitions). GNNs then learn item embeddings from this graph structure.
- Temporal and session based recommendation: Combines user identity with temporal sequences of sessions [u,s,t].
- Hierarchical Models: Often use two levels of RNNs or attention: one to model item interactions within a session (short-term interest), and another to model session evolution for a user over time (long-term interest). SHAN uses hierarchical attention.
- CNN-based Models (e.g., Caser): Treat the sequence of recent items/sessions as an "image" and apply 2D convolutions to capture local sequential patterns.
- GNN-based Models: Construct dynamic graphs or hypergraphs evolving over time (e.g., HyperRec).
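A minimal sketch of a GRU4Rec-style session model (referenced earlier in this section): a GRU reads the session's item sequence and its final hidden state scores every candidate item via a dot product with the (tied) item embedding table. Padding, session-parallel mini-batches, and the pairwise ranking losses of the original work are omitted.

```python
import torch
import torch.nn as nn

class GRU4RecLike(nn.Module):
    """Sketch of next-item prediction within an anonymous session."""
    def __init__(self, n_items, dim=64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, session):                          # session: (B, T) item ids
        x = self.item_emb(session)                       # (B, T, d)
        _, h = self.gru(x)                               # h: (1, B, d) final hidden state
        return h.squeeze(0) @ self.item_emb.weight.T     # (B, n_items) next-item scores
```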
Discussion and Future Directions
- Recommendation Benchmarking: Need for standardized datasets and evaluation protocols to reliably track progress.
- Graph Reasoning & Self-supervised Learning: Leveraging GNNs for complex relational data and using self-supervised tasks to pre-train or augment representation learning for sparsity issues.
- Multi-Objective Goals for Social Good: Moving beyond accuracy to consider fairness, diversity, explainability, and multi-stakeholder satisfaction.
- Reproducibility: Acknowledges challenges in reproducing results due to sensitivity to hyperparameters, dataset splits, and evaluation metrics, and calls for transparency and robust evaluation protocols.
This survey provides a comprehensive roadmap of how neural networks have been applied to various recommendation scenarios, emphasizing the modeling of different data sources to enhance predictive accuracy. It highlights common techniques like attention mechanisms, GNNs, RNNs, and autoencoders, and their adaptations for specific recommendation tasks.