
Memory Augmented Graph Neural Networks for Sequential Recommendation (1912.11730v1)

Published 26 Dec 2019 in cs.IR

Abstract: The chronological order of user-item interactions can reveal time-evolving and sequential user behaviors in many recommender systems. The items that users will interact with may depend on the items accessed in the past. However, the substantial increase of users and items makes sequential recommender systems still face non-trivial challenges: (1) the hardness of modeling the short-term user interests; (2) the difficulty of capturing the long-term user interests; (3) the effective modeling of item co-occurrence patterns. To tackle these challenges, we propose a memory augmented graph neural network (MA-GNN) to capture both the long- and short-term user interests. Specifically, we apply a graph neural network to model the item contextual information within a short-term period and utilize a shared memory network to capture the long-range dependencies between items. In addition to the modeling of user interests, we employ a bilinear function to capture the co-occurrence patterns of related items. We extensively evaluate our model on five real-world datasets, comparing with several state-of-the-art methods and using a variety of performance metrics. The experimental results demonstrate the effectiveness of our model for the task of Top-K sequential recommendation.

Citations (196)

Summary

  • The paper introduces MA-GNN, a model that integrates a short-term GNN-based module and a long-term memory network module to capture diverse user interests.
  • It leverages a sliding window on item graphs and a shared multi-dimensional attention memory network to model recent and historical interaction patterns.
  • A gating mechanism fuses these interests with explicit item co-occurrence modeling, achieving state-of-the-art performance across five real-world datasets.

The paper "Memory Augmented Graph Neural Networks for Sequential Recommendation" (Memory Augmented Graph Neural Networks for Sequential Recommendation, 2019) addresses the challenges in sequential recommendation, specifically modeling short-term user interests, long-term user interests, and item co-occurrence patterns. To tackle these issues, the authors propose a Memory Augmented Graph Neural Network (MA-GNN) model.

The MA-GNN model incorporates five key components:

  1. General Interest Module: This module captures the static, inherent user preferences using a standard matrix factorization term, represented by the dot product of user and item embeddings, $\mathbf{p}_u^{\top} \cdot \mathbf{q}_j$. This term is independent of the item sequence dynamics.
  2. Short-term Interest Module: To model short-term interests based on recent interactions, the paper uses a sliding window strategy ($L_{u,l}$) on user sequences. Since user sequences are not naturally graph-structured, an item graph is constructed by adding edges between successive items across all user sequences, weighted by their frequency and row-normalized into an adjacency matrix $\mathbf{A}$. A two-layer graph neural network (GNN) then aggregates neighboring item information within the short-term window. For an item $i$ in the window, its representation $\mathbf{h}_i$ is computed by aggregating its neighbors $k$ weighted by $A_{i,k}$ and combining the result with its own embedding $\mathbf{e}_i$ (Eq. 1). The short-term user interest representation $\mathbf{p}^{S}_{u,l}$ is then derived by averaging the item representations $\mathbf{h}_i$ within the window and concatenating with the user embedding $\mathbf{p}_u$ (Eq. 2); a sketch of this module follows the list.
  3. Long-term Interest Modeling: To capture long-range dependencies from past interactions ($H_{u,l}$), the model utilizes a shared memory network. Instead of per-user memory, a global key-value memory ($\mathbf{K}, \mathbf{V}$) is used, where each unit represents a latent interest type. A query embedding $\mathbf{z}_{u,l}$ is generated from the historical item embeddings $\mathbf{H}_{u,l}$ using a multi-dimensional attention mechanism, incorporating positional encoding to account for item order (Eq. 3). This query interacts with the memory keys $\mathbf{K}$ to produce attention scores $s_i$, which are used to weight the memory values $\mathbf{V}$ and produce an output $\mathbf{o}_{u,l}$. The long-term user interest $\mathbf{p}^{H}_{u,l}$ is then the sum of the query and the memory output (Eq. 4). This shared memory approach reduces memory overhead compared to per-user memory.
  4. Interest Fusion: A gating mechanism, inspired by LSTMs, is introduced to dynamically combine the short-term interest (carried by the aggregated item representations of the current window) and the long-term interest $\mathbf{p}^{H}_{u,l}$. A learned gate $\mathbf{g}_{u,l}$ controls the contribution of each component to the final combined user representation $\mathbf{p}^{C}_{u,l}$ (Eq. 5); see the second sketch after this list.
  5. Item Co-occurrence Modeling: Explicitly modeling pairwise item relationships is important for capturing sequential patterns. A bilinear function $\mathbf{e}^{\top}_i \, \mathbf{W}_r \, \mathbf{q}_j$ captures correlations between items $i$ in the current short-term window and candidate next items $j$, where $\mathbf{W}_r$ is a learnable matrix.
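
As referenced in the short-term module above, the following is a minimal PyTorch sketch of the window-level GNN aggregation (Eqs. 1-2). The tensor shapes, layer names, and the single aggregation layer (the paper stacks two) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ShortTermGNN(nn.Module):
    """Sketch of the short-term interest module (Eqs. 1-2).

    Aggregates neighbor item embeddings using the row-normalized item-graph
    weights A, then combines the windowed item states with the user embedding.
    """
    def __init__(self, d: int = 50):
        super().__init__()
        self.W1 = nn.Linear(2 * d, d)   # per-item aggregation (Eq. 1)
        self.W2 = nn.Linear(2 * d, d)   # window + user combination (Eq. 2)

    def forward(self, e_window, a_window, p_u):
        # e_window: (B, |L|, d)   embeddings of items in the short-term window
        # a_window: (B, |L|, |L|) row-normalized adjacency restricted to the window
        # p_u:      (B, d)        user embedding
        neigh = torch.bmm(a_window, e_window)                            # sum_k A_{i,k} e_k
        h = torch.tanh(self.W1(torch.cat([neigh, e_window], dim=-1)))    # item states h_i
        window_mean = h.mean(dim=1)                                      # average over the window
        p_short = torch.tanh(self.W2(torch.cat([window_mean, p_u], dim=-1)))
        return p_short                                                   # short-term interest p^S_{u,l}
```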

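The long-term memory read and the gated fusion (Eqs. 3-5) can be sketched in the same style. The single-head attention used to build the query, the gate's input concatenation, and all dimensions are simplifications and assumptions rather than the paper's exact multi-dimensional formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LongTermMemoryWithGate(nn.Module):
    """Sketch of the shared key-value memory (Eqs. 3-4) and gated fusion (Eq. 5)."""
    def __init__(self, d: int = 50, m: int = 20):
        super().__init__()
        self.K = nn.Parameter(torch.randn(m, d))   # shared memory keys
        self.V = nn.Parameter(torch.randn(m, d))   # shared memory values
        self.attn = nn.Linear(d, 1)                # query construction over the history
        self.gate = nn.Linear(3 * d, d)            # fusion gate

    def forward(self, H_u, p_short, p_u):
        # H_u: (B, |H|, d) historical item embeddings (positional encoding assumed added)
        # p_short: (B, d)  short-term interest from the GNN module
        # p_u: (B, d)      user embedding
        a = F.softmax(self.attn(H_u), dim=1)           # attention over history items
        z = (a * H_u).sum(dim=1)                       # query embedding z_{u,l} (Eq. 3)
        s = F.softmax(z @ self.K.t(), dim=-1)          # scores against memory keys
        o = s @ self.V                                 # weighted memory read o_{u,l}
        p_long = z + o                                 # long-term interest p^H_{u,l} (Eq. 4)
        g = torch.sigmoid(self.gate(torch.cat([p_short, p_long, p_u], dim=-1)))
        p_fused = g * p_short + (1.0 - g) * p_long     # gated fusion p^C_{u,l} (Eq. 5)
        return p_fused
```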
Prediction and Training:

The final prediction score $\hat{r}_{u,j}$ for user $u$ and item $j$, given the short-term window $L_{u,l}$ and historical sequence $H_{u,l}$, is a combination of the general user interest, the fused short/long-term interest, and the average item co-occurrence score from the items in the short-term window to item $j$ (Eq. 6).
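
A hedged sketch of how these three terms might be combined (Eq. 6) is shown below; the plain sum of the terms and the tensor layout are assumptions based on the description above.

```python
import torch

def score(p_u, p_fused, e_window, W_r, q_j):
    """Sketch of the prediction score (Eq. 6).

    p_u:      (B, d)      user embedding (general interest)
    p_fused:  (B, d)      gated short/long-term interest
    e_window: (B, |L|, d) embeddings of items in the short-term window
    W_r:      (d, d)      learnable co-occurrence matrix
    q_j:      (B, d)      candidate item embedding
    """
    general = (p_u * q_j).sum(-1)                                      # p_u^T q_j
    fused = (p_fused * q_j).sum(-1)                                    # p^C_{u,l}^T q_j
    cooc = torch.einsum('bld,de,be->bl', e_window, W_r, q_j).mean(-1)  # average e_i^T W_r q_j
    return general + fused + cooc
```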

The model is trained using the Bayesian Personalized Ranking (BPR) objective, minimizing a pairwise ranking loss between positive (observed) items and randomly sampled negative (non-observed) items, combined with $L_2$ regularization on the model parameters (Eq. 7). Optimization is performed using gradient descent with back-propagation.
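
A minimal sketch of the BPR objective with $L_2$ regularization (Eq. 7) follows; the exact set of regularized parameters and the use of a batch mean are assumptions.

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores, params, lam=0.001):
    """Sketch of the BPR pairwise ranking objective with L2 regularization.

    pos_scores / neg_scores: (B,) predicted scores for observed items and
    randomly sampled negative items; `params` is an iterable of regularized
    model parameters.
    """
    ranking = -F.logsigmoid(pos_scores - neg_scores).mean()   # pairwise ranking loss
    reg = sum(p.pow(2).sum() for p in params)                 # L2 regularization
    return ranking + lam * reg
```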

Implementation Details:

  • Item Graph: Constructed by considering a fixed number of subsequent items (e.g., 3) for each item across all user sequences and counting their co-occurrences; the adjacency matrix $\mathbf{A}$ is row-normalized (sketched after this list).
  • Short-term Window: Sliding window size $|L| = 5$, predicting the next $|T| = 3$ items.
  • Multi-dimensional Attention: Parameter $h$ controls the number of attention dimensions; positional encoding is added to the item embeddings.
  • Memory Network: Parameter $m$ controls the number of memory units.
  • Embedding Size: $d = 50$ across all experiments.
  • Hyperparameter Tuning: $h$ and $m$ are selected from $\{5, 10, 15, 20\}$; learning rate $0.001$, $\lambda = 0.001$, batch size $4096$.
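
As noted in the Item Graph bullet above, the graph construction can be sketched as follows; the dense adjacency matrix and the `horizon` parameter name are illustrative assumptions (a sparse matrix would be used in practice).

```python
import numpy as np

def build_item_graph(sequences, num_items, horizon=3):
    """Sketch of the item-graph construction: for every item in every user
    sequence, add weighted edges to its next `horizon` items, then
    row-normalize the adjacency matrix."""
    A = np.zeros((num_items, num_items), dtype=np.float32)
    for seq in sequences:                        # one chronologically ordered sequence per user
        for t, i in enumerate(seq):
            for j in seq[t + 1 : t + 1 + horizon]:
                A[i, j] += 1.0                   # count co-occurrences within the horizon
    row_sums = A.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0.0] = 1.0              # avoid division by zero for isolated items
    return A / row_sums                          # row-normalized adjacency matrix
```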

Evaluation:

The model is evaluated on five real-world datasets: MovieLens-20M, Amazon-Books, Amazon-CDs, Goodreads-Children, and Goodreads-Comics. Data is preprocessed by filtering out users and items with fewer than 10 interactions and treating ratings $\ge 4$ as positive feedback. Each dataset is split chronologically into 70% training, 10% validation, and 20% testing. Performance is measured using Recall@10 and NDCG@10.
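
For reference, a common way to compute Recall@10 and NDCG@10 for a single user is sketched below; the exact normalization used in the paper (e.g., whether recall is capped at $k$) is an assumption.

```python
import numpy as np

def recall_ndcg_at_k(ranked_items, ground_truth, k=10):
    """Sketch of Recall@k and NDCG@k for one user.

    ranked_items: item ids ordered by predicted score (descending)
    ground_truth: set of held-out positive item ids for this user
    """
    top_k = ranked_items[:k]
    hits = [1.0 if item in ground_truth else 0.0 for item in top_k]
    recall = sum(hits) / min(len(ground_truth), k) if ground_truth else 0.0
    dcg = sum(h / np.log2(rank + 2) for rank, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(rank + 2) for rank in range(min(len(ground_truth), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return recall, ndcg
```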

Experimental Results:

  • MA-GNN significantly outperforms various state-of-the-art baselines (BPRMF, GRU4Rec, GRU4Rec+, GC-SAN, Caser, SASRec, MARank) on all five datasets and metrics.
  • Ablation studies demonstrate the effectiveness of each proposed module. Incorporating short-term interest (GNN) improves over BPRMF. Adding long-term interest (Memory Network) and the gating fusion further improves performance, showing the gating mechanism's superiority over simple concatenation or GRU for fusion. Finally, adding the item co-occurrence module yields the best performance, highlighting the importance of this pattern.
  • Hyperparameter analysis shows that both the attention dimension ($h$) and the number of memory units ($m$) influence performance, and the memory network contributes more on sparser datasets such as Amazon-CDs.
  • Memory visualization suggests that individual memory units learn to represent distinct types of user interests, as evidenced by different attention patterns for different movie genres.

Conclusion:

The paper successfully demonstrates that combining short-term context modeling via GNNs, long-term dependency modeling via a shared memory network, adaptive fusion of these interests using a gating mechanism, and explicit item co-occurrence modeling with a bilinear function leads to significant improvements in sequential recommendation performance across various datasets. The proposed MA-GNN effectively captures diverse aspects of user behavior sequences.