- The paper introduces an MA-GNN model that combines a GNN-based short-term interest module with a memory-network-based long-term interest module to capture diverse user interests.
- It applies a sliding window over user sequences, aggregates recent items over a global item graph, and queries a shared multi-dimensional-attention memory network to model recent and historical interaction patterns.
- A gating mechanism fuses the two interest signals, and explicit item co-occurrence modeling is added on top, achieving state-of-the-art performance across five real-world datasets.
The paper "Memory Augmented Graph Neural Networks for Sequential Recommendation" (1912.11730) addresses the challenges in sequential recommendation, specifically modeling short-term user interests, long-term user interests, and item co-occurrence patterns. To tackle these issues, the authors propose a Memory Augmented Graph Neural Network (MA-GNN) model.
The MA-GNN model incorporates the following key components (a combined code sketch follows this list):
- General Interest Module: This module captures the static, inherent user preferences using a standard matrix factorization term, represented by the dot product of user and item embeddings ($p_u^\top \cdot q_j$). This term is independent of the item sequence dynamics.
- Short-term Interest Module: To model short-term interests based on recent interactions, the paper applies a sliding window $L_{u,l}$ over each user sequence. Since sequences aren't naturally graphs, an item graph is constructed by adding edges between successive items across all user sequences, weighted by frequency and row-normalized into an adjacency matrix $A$. A two-layer Graph Neural Network (GNN) then aggregates neighboring item information within the short-term window. For an item $i$ in the window, its representation $h_i$ is computed by aggregating its neighbors $k$ weighted by $A_{i,k}$ and combining the result with its own embedding $e_i$ (Eq. 1). The short-term user interest $p_{u,l}^S$ is then derived by averaging the item representations $h_i$ within the window and concatenating with the user embedding $p_u$ (Eq. 2).
- Long-term Interest Module: To capture long-range dependencies from past interactions $H_{u,l}$, the model utilizes a shared memory network. Instead of per-user memory, a global key-value memory $(K, V)$ is used, where each memory unit represents a latent interest type. A query embedding $z_{u,l}$ is generated from the historical item embeddings $H_{u,l}$ using a multi-dimensional attention mechanism, incorporating positional encoding to account for item order (Eq. 3). This query interacts with the memory keys $K$ to produce attention scores $s_i$, which are used to weight the memory values $V$ into an output $o_{u,l}$. The long-term user interest $p_{u,l}^H$ is then the sum of the query and the memory output (Eq. 4). Sharing the memory across users avoids the storage overhead of per-user memory.
- Interest Fusion: A gating mechanism, inspired by LSTM gates, dynamically combines the short-term interest $p_{u,l}^S$ (entering via the aggregated item representations from the window) and the long-term interest $p_{u,l}^H$. A learned gate $g_{u,l}$ controls the contribution of each component to the final combined user representation $p_{u,l}^C$ (Eq. 5).
- Item Co-occurrence Modeling: Explicitly modeling pairwise item relationships is important for sequential patterns. A bilinear function $e_i^\top W_r q_j$ captures correlations between items $i$ in the current short-term window and a candidate next item $j$, where $W_r$ is a learnable matrix.
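To make the pipeline concrete, here is a minimal PyTorch sketch of the scoring path through Eqs. 1-6. Everything here (`MAGNNSketch`, `A_window`, the exact parameterization of each equation) is an illustrative assumption rather than the authors' implementation; in particular, the positional encoding of Eq. 3 and some per-equation weight matrices are simplified:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MAGNNSketch(nn.Module):
    def __init__(self, n_users, n_items, d=50, h=20, m=20):
        super().__init__()
        self.P = nn.Embedding(n_users, d)          # user embeddings p_u
        self.Q = nn.Embedding(n_items, d)          # output item embeddings q_j
        self.E = nn.Embedding(n_items, d)          # input item embeddings e_i
        self.W1 = nn.Linear(d, d, bias=False)      # GNN aggregation weight (Eq. 1)
        self.W2 = nn.Linear(2 * d, d, bias=False)  # short-term combine (Eq. 2)
        self.att1 = nn.Linear(d, d, bias=False)    # multi-dimensional attention (Eq. 3)
        self.att2 = nn.Linear(d, h, bias=False)
        self.K = nn.Parameter(torch.randn(m, d))   # shared memory keys
        self.V = nn.Parameter(torch.randn(m, d))   # shared memory values
        self.gate_s = nn.Linear(d, d, bias=False)  # gating fusion (Eq. 5)
        self.gate_h = nn.Linear(d, d, bias=False)
        self.Wr = nn.Parameter(torch.randn(d, d))  # bilinear co-occurrence matrix

    def forward(self, u, window, history, A_window, j):
        # u: (B,), window: (B, |L|), history: (B, |H|), j: (B,) candidate items;
        # A_window: (B, |L|, |L|) row-normalized adjacency restricted to the window.
        p_u = self.P(u)                                   # (B, d)
        e_w = self.E(window)                              # (B, |L|, d)
        # Eq. 1: two GNN layers aggregating item-graph neighbors in the window.
        h_w = torch.tanh(self.W1(A_window @ e_w) + e_w)
        h_w = torch.tanh(self.W1(A_window @ h_w) + h_w)
        # Eq. 2: short-term interest from averaged item states + user embedding.
        p_S = self.W2(torch.cat([h_w.mean(dim=1), p_u], dim=-1))
        # Eq. 3: multi-dimensional attention query over the history
        # (positional encoding omitted for brevity).
        e_h = self.E(history)                             # (B, |H|, d)
        a = F.softmax(self.att2(torch.tanh(self.att1(e_h))), dim=1)  # (B, |H|, h)
        z = (a.transpose(1, 2) @ e_h).mean(dim=1)         # (B, d) query z_{u,l}
        # Eq. 4: attend over shared memory keys, read out the values.
        s = F.softmax(z @ self.K.t(), dim=-1)             # (B, m) scores s_i
        p_H = z + s @ self.V                              # long-term interest p^H
        # Eq. 5: LSTM-like gate fusing short- and long-term interests.
        g = torch.sigmoid(self.gate_s(p_S) + self.gate_h(p_H))
        p_C = g * p_S + (1.0 - g) * p_H                   # combined interest p^C
        # Eq. 6: general interest + fused interest + mean bilinear co-occurrence.
        q_j = self.Q(j)                                   # (B, d)
        cooc = ((e_w @ self.Wr) * q_j.unsqueeze(1)).sum(-1).mean(1)
        return (p_u * q_j).sum(-1) + (p_C * q_j).sum(-1) + cooc
```

Note that the memory $(K, V)$ is a single global parameter pair, reflecting the paper's point that a shared memory avoids per-user storage.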
Prediction and Training:
The final prediction score $\hat{r}_{u,j}$ for user $u$ and item $j$, given the short-term window $L_{u,l}$ and historical sequence $H_{u,l}$, combines the general user interest, the fused short/long-term interest, and the average item co-occurrence score from the window items to item $j$ (Eq. 6).
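Written out in the notation above, the score plausibly takes the following form (reconstructed from the surrounding definitions, not copied verbatim from the paper):

$$
\hat{r}_{u,j} \;=\; p_u^{\top} q_j \;+\; \big(p_{u,l}^{C}\big)^{\top} q_j \;+\; \frac{1}{|L_{u,l}|} \sum_{i \in L_{u,l}} e_i^{\top} W_r\, q_j
$$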
The model is trained using the Bayesian Personalized Ranking (BPR) objective, minimizing a pairwise ranking loss between positive (observed) items and randomly sampled negative (non-observed) items, plus $L_2$ regularization on the model parameters (Eq. 7). Optimization is performed using gradient descent with back-propagation.
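As a hedged illustration, a BPR loss evaluation against the sketch above might look like this (the negative sampling and regularization details are assumptions, not the paper's exact recipe):

```python
import torch

def bpr_loss(model, u, window, history, A_window, pos, neg, lam=0.001):
    """One BPR loss evaluation; `pos`/`neg` are observed and sampled item ids."""
    r_pos = model(u, window, history, A_window, pos)
    r_neg = model(u, window, history, A_window, neg)
    # Pairwise ranking term: push positive scores above sampled negatives.
    rank_loss = -torch.nn.functional.logsigmoid(r_pos - r_neg).mean()
    # L2 regularization over all trainable parameters.
    reg = sum(p.pow(2).sum() for p in model.parameters())
    return rank_loss + lam * reg
```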
Implementation Details:
- Item Graph: Constructed by considering, for each item in every user sequence, its subsequent items (e.g., the next 3) and counting co-occurrences. The adjacency matrix $A$ is row-normalized.
- Short-term Window: Sliding window of size $|L| = 5$, predicting the next $|T| = 3$ items.
- Multi-dimensional Attention: The parameter $h$ controls the number of attention dimensions; positional encoding is added to the item embeddings.
- Memory Network: The parameter $m$ controls the number of memory units.
- Embedding Size: $d = 50$ across experiments.
- Hyperparameter Tuning: $h$ and $m$ selected from $\{5, 10, 15, 20\}$. Learning rate $0.001$, $\lambda = 0.001$, batch size $4096$.
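A small sketch of the item-graph construction and window slicing described above; treating the graph as undirected and the exact normalization are assumptions here:

```python
import numpy as np

def build_item_graph(sequences, n_items, lookahead=3):
    """Count, for each item, its next `lookahead` successors across all sequences."""
    A = np.zeros((n_items, n_items), dtype=np.float32)
    for seq in sequences:
        for t, i in enumerate(seq):
            for k in seq[t + 1 : t + 1 + lookahead]:
                A[i, k] += 1.0   # edge i -> k, weighted by frequency
                A[k, i] += 1.0   # symmetric edge (undirected graph: an assumption)
    row_sums = A.sum(axis=1, keepdims=True)
    return np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)

def sliding_windows(seq, L=5, T=3):
    """Yield (window, targets): |L| recent items predict the next |T| items."""
    for t in range(len(seq) - L - T + 1):
        yield seq[t : t + L], seq[t + L : t + L + T]
```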
Evaluation:
The model is evaluated on five real-world datasets: MovieLens-20M, Amazon-Books, Amazon-CDs, Goodreads-Children, and Goodreads-Comics. Data is preprocessed by filtering out users and items with fewer than 10 interactions and treating ratings $\geq 4$ as positive feedback. Each dataset is split chronologically into 70% training, 10% validation, and 20% testing. Performance is measured using Recall@10 and NDCG@10.
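For reference, the two reported metrics can be computed per user roughly as follows (one common definition; the paper's exact denominator conventions may differ):

```python
import numpy as np

def recall_at_k(ranked, relevant, k=10):
    """Fraction of a user's held-out items that appear in the top-k list."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG over the top-k list."""
    rel = set(relevant)
    dcg = sum(1.0 / np.log2(r + 2) for r, item in enumerate(ranked[:k]) if item in rel)
    idcg = sum(1.0 / np.log2(r + 2) for r in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0
```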
Experimental Results:
- MA-GNN significantly outperforms a range of state-of-the-art baselines (BPRMF, GRU4Rec, GRU4Rec+, GC-SAN, Caser, SASRec, MARank) on all five datasets under both metrics.
- Ablation studies demonstrate the effectiveness of each proposed module. Incorporating short-term interest (GNN) improves over BPRMF. Adding long-term interest (Memory Network) and the gating fusion further improves performance, showing the gating mechanism's superiority over simple concatenation or GRU for fusion. Finally, adding the item co-occurrence module yields the best performance, highlighting the importance of this pattern.
- Hyperparameter analysis shows that both the attention dimension $h$ and the number of memory units $m$ influence performance, and that the memory network contributes more on sparser datasets such as Amazon-CDs.
- Memory visualization suggests that individual memory units learn to represent distinct types of user interests, as evidenced by different attention patterns for different movie genres.
Conclusion:
The paper demonstrates that combining short-term context modeling via GNNs, long-term dependency modeling via a shared memory network, adaptive fusion of these interests through a gating mechanism, and explicit item co-occurrence modeling with a bilinear function yields significant improvements in sequential recommendation across diverse datasets. The proposed MA-GNN thus captures complementary aspects of user behavior sequences.