- The paper introduces MA-GNN, a model that combines a GNN-based short-term interest module with a memory-network-based long-term interest module to capture diverse user interests.
- It applies a sliding window to user sequences and runs a GNN over an item graph to model recent interactions, while a shared multi-dimensional attention memory network models historical interaction patterns.
- A gating mechanism fuses these two interest signals, and explicit item co-occurrence modeling is added at prediction time, achieving state-of-the-art performance across five real-world datasets.
The paper "Memory Augmented Graph Neural Networks for Sequential Recommendation" (Memory Augmented Graph Neural Networks for Sequential Recommendation, 2019) addresses the challenges in sequential recommendation, specifically modeling short-term user interests, long-term user interests, and item co-occurrence patterns. To tackle these issues, the authors propose a Memory Augmented Graph Neural Network (MA-GNN) model.
The MA-GNN model is built from the following key components:
- General Interest Module: This module captures static, inherent user preferences with a standard matrix factorization term, the dot product of the user and item embeddings $p_u^\top q_j$. This term is independent of the item sequence dynamics.
- Short-term Interest Module: To model short-term interests from recent interactions, the paper applies a sliding window $L_{u,l}$ over each user sequence. Since sequences are not naturally graphs, an item graph is first constructed: edges are added between successive items across all user sequences, weighted by frequency, and the adjacency matrix $A$ is row-normalized. A two-layer Graph Neural Network (GNN) then aggregates neighboring item information within the short-term window. For an item $i$ in the window, its representation $h_i$ is computed by aggregating its neighbors $k$ weighted by $A_{i,k}$ and combining the result with its own embedding $e_i$ (Eq. 1). The short-term user interest $p_{u,l}^S$ is then obtained by averaging the item representations $h_i$ in the window and combining the average with the user embedding $p_u$ (Eq. 2); see the first sketch after this list.
- Long-term Interest Module: To capture long-range dependencies from past interactions $H_{u,l}$, the model uses a shared memory network. Instead of a per-user memory, a global key-value memory $(K, V)$ is maintained, where each memory unit represents a latent interest type. A query embedding $z_{u,l}$ is generated from the historical item embeddings $H_{u,l}$ via a multi-dimensional attention mechanism, with positional encoding added to account for item order (Eq. 3). The query attends over the memory keys $K$ to produce attention scores $s_i$, which weight the memory values $V$ and yield an output $o_{u,l}$. The long-term user interest $p_{u,l}^H$ is the sum of the query and the memory output (Eq. 4). Sharing the memory across users avoids the storage overhead of per-user memories; see the second sketch after this list.
- Interest Fusion: A gating mechanism, inspired by LSTMs, dynamically combines the short-term signal (the averaged item representations $h_i$ from the window) with the long-term interest $p_{u,l}^H$. A learned gate $g_{u,l}$ controls the contribution of each component to the combined user representation $p_{u,l}^C$ (Eq. 5), as shown in the second sketch after this list.
- Item Co-occurrence Modeling: Pairwise item relationships are modeled explicitly, since they are a strong signal for sequential patterns. A bilinear function $e_i^\top W_r q_j$, with a learnable matrix $W_r$, scores the correlation between each item $i$ in the current short-term window and a candidate next item $j$.
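A minimal sketch of the short-term interest module described above, assuming tanh nonlinearities and concatenation-based aggregation; class and variable names (e.g., `ShortTermGNN`, `W1`) are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class ShortTermGNN(nn.Module):
    """Two-layer GNN over the items in the short-term window (Eqs. 1-2, sketched)."""

    def __init__(self, d=50):
        super().__init__()
        self.W1 = nn.Linear(2 * d, d)   # first aggregation layer
        self.W2 = nn.Linear(2 * d, d)   # second aggregation layer
        self.W3 = nn.Linear(2 * d, d)   # fuses the window average with the user embedding

    @staticmethod
    def _layer(W, h, A):
        # h: (|L|, d) item representations, A: (|L|, |L|) row-normalized adjacency
        neighbors = A @ h                                       # weighted neighbor aggregation
        return torch.tanh(W(torch.cat([neighbors, h], dim=-1)))

    def forward(self, e_items, A_window, p_user):
        # e_items: (|L|, d) embeddings of window items, p_user: (d,) user embedding
        h = self._layer(self.W1, e_items, A_window)
        h = self._layer(self.W2, h, A_window)
        p_S = torch.tanh(self.W3(torch.cat([h.mean(dim=0), p_user], dim=-1)))
        return h, p_S   # item representations h_i and short-term interest p^S_{u,l}
```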
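A companion sketch of the shared key-value memory and the gating fusion. The multi-dimensional attention is rendered here as an additive attention with $h$ independent score columns, and the gate takes the averaged window representation, the long-term interest, and the user embedding as inputs; these parameterization details are assumptions, not the paper's exact equations.

```python
import torch
import torch.nn as nn

class SharedMemoryLongTerm(nn.Module):
    """Multi-dimensional attention query plus shared key-value memory (Eqs. 3-4, sketched)."""

    def __init__(self, d=50, h=10, m=10):
        super().__init__()
        self.Wa1 = nn.Linear(d, d, bias=False)
        self.Wa2 = nn.Linear(d, h, bias=False)         # h attention dimensions
        self.K = nn.Parameter(torch.randn(m, d))       # shared memory keys
        self.V = nn.Parameter(torch.randn(m, d))       # shared memory values

    def forward(self, H_hist):
        # H_hist: (T, d) historical item embeddings, positional encoding already added
        scores = self.Wa2(torch.tanh(self.Wa1(H_hist)))     # (T, h)
        attn = torch.softmax(scores, dim=0)                 # attention over time, per dimension
        z = (attn.t() @ H_hist).mean(dim=0)                 # (d,) query embedding z_{u,l}
        s = torch.softmax(self.K @ z, dim=0)                # (m,) scores over memory units
        o = s @ self.V                                      # (d,) memory read-out o_{u,l}
        return z + o                                        # long-term interest p^H_{u,l}

class InterestGate(nn.Module):
    """LSTM-style gate fusing short- and long-term interests (Eq. 5, sketched)."""

    def __init__(self, d=50):
        super().__init__()
        self.Wg = nn.Linear(3 * d, d)

    def forward(self, short_avg, p_H, p_user):
        g = torch.sigmoid(self.Wg(torch.cat([short_avg, p_H, p_user], dim=-1)))
        return g * short_avg + (1.0 - g) * p_H              # fused interest p^C_{u,l}
```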
Prediction and Training:
The final prediction score $\hat{r}_{u,j}$ for user $u$ and item $j$, given the short-term window $L_{u,l}$ and the historical sequence $H_{u,l}$, combines the general user interest, the fused short/long-term interest, and the average item co-occurrence score from the items in the short-term window to item $j$ (Eq. 6).
The model is trained using the Bayesian Personalized Ranking (BPR) objective, minimizing a pairwise ranking loss between positive (observed) items and randomly sampled negative (non-observed) items, combined with L2 regularization on the model parameters (Eq. 7). Optimization is performed using gradient descent with back-propagation; a sketch of the scoring function and loss follows.
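A minimal sketch of the scoring function and BPR objective as described above; the exact argument layout (e.g., passing the window item embeddings `e_window` and the bilinear matrix `W_r` explicitly) is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def score(p_user, p_fused, e_window, W_r, q_item):
    """Prediction score r_hat_{u,j} (Eq. 6, sketched): general interest + fused
    interest + average bilinear co-occurrence from the window items to item j."""
    general = p_user @ q_item
    sequential = p_fused @ q_item
    cooccurrence = (e_window @ W_r @ q_item).mean()
    return general + sequential + cooccurrence

def bpr_loss(pos_scores, neg_scores, params, lam=0.001):
    """BPR objective (Eq. 7, sketched): pairwise ranking loss plus L2 regularization."""
    ranking = -F.logsigmoid(pos_scores - neg_scores).mean()
    regularization = lam * sum(p.pow(2).sum() for p in params)
    return ranking + regularization
```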
Implementation Details:
- Item Graph: Constructed by considering the next few items (e.g., 3) after each item across all user sequences and counting edge occurrences; the adjacency matrix $A$ is row-normalized (see the sketch after this list).
- Short-term Window: Sliding window size $|L| = 5$, predicting the next $|T| = 3$ items.
- Multi-dimensional Attention: Parameter $h$ controls the number of attention dimensions; positional encoding is added to the item embeddings.
- Memory Network: Parameter $m$ controls the number of memory units.
- Embedding Size: $d = 50$ across experiments.
- Hyperparameter Tuning: $h$ and $m$ are selected from $\{5, 10, 15, 20\}$; learning rate $0.001$, $\lambda = 0.001$, batch size $4096$.
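A minimal sketch of the item-graph construction described in the first bullet above. It only counts forward edges within the lookahead window and does not symmetrize the matrix, which is an assumption; variable names are illustrative.

```python
import numpy as np

def build_item_graph(sequences, num_items, lookahead=3):
    """Count edges from each item to its next `lookahead` items across all
    user sequences, then row-normalize the adjacency matrix."""
    A = np.zeros((num_items, num_items), dtype=np.float32)
    for seq in sequences:                               # one item-id sequence per user
        for t, i in enumerate(seq):
            for j in seq[t + 1 : t + 1 + lookahead]:
                A[i, j] += 1.0                          # edge weight = co-occurrence count
    row_sums = A.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0                       # avoid dividing isolated items by zero
    return A / row_sums
```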
Evaluation:
The model is evaluated on five real-world datasets: MovieLens-20M, Amazon-Books, Amazon-CDs, Goodreads-Children, and Goodreads-Comics. Data is preprocessed by filtering out users and items with fewer than 10 interactions and treating ratings $\geq 4$ as positive feedback. Each dataset is split chronologically into 70% training, 10% validation, and 20% testing. Performance is measured with Recall@10 and NDCG@10; a sketch of both metrics follows.
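A minimal sketch of the two evaluation metrics, assuming a ranked list of recommended item ids and a set of held-out test items per user.

```python
import numpy as np

def recall_at_k(ranked, ground_truth, k=10):
    """Fraction of a user's held-out items that appear in the top-k list."""
    hits = len(set(ranked[:k]) & set(ground_truth))
    return hits / len(ground_truth) if ground_truth else 0.0

def ndcg_at_k(ranked, ground_truth, k=10):
    """Discounted gain of hits in the top-k list, normalized by the ideal ranking."""
    relevant = set(ground_truth)
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 2)
               for rank in range(min(k, len(ground_truth))))
    return dcg / idcg if idcg > 0 else 0.0
```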
Experimental Results:
- MA-GNN significantly outperforms various state-of-the-art baselines (BPRMF, GRU4Rec, GRU4Rec+, GC-SAN, Caser, SASRec, MARank) on all five datasets and metrics.
- Ablation studies demonstrate the effectiveness of each proposed module. Incorporating short-term interest (GNN) improves over BPRMF. Adding long-term interest (Memory Network) and the gating fusion further improves performance, showing the gating mechanism's superiority over simple concatenation or GRU for fusion. Finally, adding the item co-occurrence module yields the best performance, highlighting the importance of this pattern.
- Hyperparameter analysis shows that both the attention dimension ($h$) and the number of memory units ($m$) influence performance, and that the memory network contributes more on sparser datasets such as Amazon-CDs.
- Memory visualization suggests that individual memory units learn to represent distinct types of user interests, as evidenced by different attention patterns for different movie genres.
Conclusion:
The paper successfully demonstrates that combining short-term context modeling via GNNs, long-term dependency modeling via a shared memory network, adaptive fusion of these interests using a gating mechanism, and explicit item co-occurrence modeling with a bilinear function leads to significant improvements in sequential recommendation performance across various datasets. The proposed MA-GNN effectively captures diverse aspects of user behavior sequences.