- The paper introduces MA-GNN, a model that combines a GNN-based short-term interest module with a memory-network-based long-term interest module to capture diverse user interests.
- It applies a sliding window to user sequences and runs a GNN over an item graph to model recent interactions, while a shared multi-dimensional attention memory network models historical interaction patterns.
- A gating mechanism fuses these two interest signals, and explicit item co-occurrence modeling is added at prediction time, achieving state-of-the-art performance across five real-world datasets.
The paper "Memory Augmented Graph Neural Networks for Sequential Recommendation" (Memory Augmented Graph Neural Networks for Sequential Recommendation, 2019) addresses the challenges in sequential recommendation, specifically modeling short-term user interests, long-term user interests, and item co-occurrence patterns. To tackle these issues, the authors propose a Memory Augmented Graph Neural Network (MA-GNN) model.
The MA-GNN model is built from the following key components:
- General Interest Module: This module captures static, inherent user preferences with a standard matrix factorization term, the dot product of the user and item embeddings $p_u^\top q_j$. This term is independent of the item sequence dynamics.
- Short-term Interest Module: To model short-term interests from recent interactions, the paper applies a sliding window $L_{u,l}$ over each user sequence. Since sequences are not naturally graphs, an item graph is first constructed: edges are added between successive items across all user sequences, weighted by frequency, and the adjacency matrix $A$ is row-normalized. A two-layer Graph Neural Network (GNN) then aggregates neighboring item information within the short-term window. For an item $i$ in the window, its representation $h_i$ is computed by aggregating its neighbors $k$ weighted by $A_{i,k}$ and combining the result with its own embedding $e_i$ (Eq. 1). The short-term user interest $p_{u,l}^S$ is then obtained by averaging the item representations $h_i$ in the window and combining the average with the user embedding $p_u$ (Eq. 2); see the first sketch after this list.
- Long-term Interest Module: To capture long-range dependencies from past interactions $H_{u,l}$, the model uses a shared memory network. Instead of a per-user memory, a global key-value memory $(K, V)$ is maintained, where each memory unit represents a latent interest type. A query embedding $z_{u,l}$ is generated from the historical item embeddings $H_{u,l}$ via a multi-dimensional attention mechanism, with positional encoding added to account for item order (Eq. 3). The query attends over the memory keys $K$ to produce attention scores $s_i$, which weight the memory values $V$ and yield an output $o_{u,l}$. The long-term user interest $p_{u,l}^H$ is the sum of the query and the memory output (Eq. 4). Sharing the memory across users avoids the storage overhead of per-user memories; see the second sketch after this list.
- Interest Fusion: A gating mechanism, inspired by LSTMs, dynamically combines the short-term signal (the averaged item representations $h_i$ from the window) with the long-term interest $p_{u,l}^H$. A learned gate $g_{u,l}$ controls the contribution of each component to the combined user representation $p_{u,l}^C$ (Eq. 5), as shown in the second sketch after this list.
- Item Co-occurrence Modeling: Pairwise item relationships are modeled explicitly, since they are a strong signal for sequential patterns. A bilinear function $e_i^\top W_r q_j$, with a learnable matrix $W_r$, scores the correlation between each item $i$ in the current short-term window and a candidate next item $j$.
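A minimal sketch of the short-term interest module described above, assuming tanh nonlinearities and concatenation-based aggregation; class and variable names (e.g., `ShortTermGNN`, `W1`) are illustrative, not taken from the authors' code.

```python
import torch
import torch.nn as nn

class ShortTermGNN(nn.Module):
    """Two-layer GNN over the items in the short-term window (Eqs. 1-2, sketched)."""

    def __init__(self, d=50):
        super().__init__()
        self.W1 = nn.Linear(2 * d, d)   # first aggregation layer
        self.W2 = nn.Linear(2 * d, d)   # second aggregation layer
        self.W3 = nn.Linear(2 * d, d)   # fuses the window average with the user embedding

    @staticmethod
    def _layer(W, h, A):
        # h: (|L|, d) item representations, A: (|L|, |L|) row-normalized adjacency
        neighbors = A @ h                                       # weighted neighbor aggregation
        return torch.tanh(W(torch.cat([neighbors, h], dim=-1)))

    def forward(self, e_items, A_window, p_user):
        # e_items: (|L|, d) embeddings of window items, p_user: (d,) user embedding
        h = self._layer(self.W1, e_items, A_window)
        h = self._layer(self.W2, h, A_window)
        p_S = torch.tanh(self.W3(torch.cat([h.mean(dim=0), p_user], dim=-1)))
        return h, p_S   # item representations h_i and short-term interest p^S_{u,l}
```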
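A companion sketch of the shared key-value memory and the gating fusion. The multi-dimensional attention is rendered here as an additive attention with $h$ independent score columns, and the gate takes the averaged window representation, the long-term interest, and the user embedding as inputs; these parameterization details are assumptions, not the paper's exact equations.

```python
import torch
import torch.nn as nn

class SharedMemoryLongTerm(nn.Module):
    """Multi-dimensional attention query plus shared key-value memory (Eqs. 3-4, sketched)."""

    def __init__(self, d=50, h=10, m=10):
        super().__init__()
        self.Wa1 = nn.Linear(d, d, bias=False)
        self.Wa2 = nn.Linear(d, h, bias=False)         # h attention dimensions
        self.K = nn.Parameter(torch.randn(m, d))       # shared memory keys
        self.V = nn.Parameter(torch.randn(m, d))       # shared memory values

    def forward(self, H_hist):
        # H_hist: (T, d) historical item embeddings, positional encoding already added
        scores = self.Wa2(torch.tanh(self.Wa1(H_hist)))     # (T, h)
        attn = torch.softmax(scores, dim=0)                 # attention over time, per dimension
        z = (attn.t() @ H_hist).mean(dim=0)                 # (d,) query embedding z_{u,l}
        s = torch.softmax(self.K @ z, dim=0)                # (m,) scores over memory units
        o = s @ self.V                                      # (d,) memory read-out o_{u,l}
        return z + o                                        # long-term interest p^H_{u,l}

class InterestGate(nn.Module):
    """LSTM-style gate fusing short- and long-term interests (Eq. 5, sketched)."""

    def __init__(self, d=50):
        super().__init__()
        self.Wg = nn.Linear(3 * d, d)

    def forward(self, short_avg, p_H, p_user):
        g = torch.sigmoid(self.Wg(torch.cat([short_avg, p_H, p_user], dim=-1)))
        return g * short_avg + (1.0 - g) * p_H              # fused interest p^C_{u,l}
```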
Prediction and Training:
The final prediction score $\hat{r}_{u,j}$ for user $u$ and item $j$, given the short-term window $L_{u,l}$ and the historical sequence $H_{u,l}$, combines the general user interest, the fused short/long-term interest, and the average item co-occurrence score from the items in the short-term window to item $j$ (Eq. 6).
The model is trained using the Bayesian Personalized Ranking (BPR) objective, minimizing a pairwise ranking loss between positive (observed) items and randomly sampled negative (non-observed) items, combined with L2 regularization on the model parameters (Eq. 7). Optimization is performed using gradient descent with back-propagation; a sketch of the scoring function and loss follows.
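A minimal sketch of the scoring function and BPR objective as described above; the exact argument layout (e.g., passing the window item embeddings `e_window` and the bilinear matrix `W_r` explicitly) is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def score(p_user, p_fused, e_window, W_r, q_item):
    """Prediction score r_hat_{u,j} (Eq. 6, sketched): general interest + fused
    interest + average bilinear co-occurrence from the window items to item j."""
    general = p_user @ q_item
    sequential = p_fused @ q_item
    cooccurrence = (e_window @ W_r @ q_item).mean()
    return general + sequential + cooccurrence

def bpr_loss(pos_scores, neg_scores, params, lam=0.001):
    """BPR objective (Eq. 7, sketched): pairwise ranking loss plus L2 regularization."""
    ranking = -F.logsigmoid(pos_scores - neg_scores).mean()
    regularization = lam * sum(p.pow(2).sum() for p in params)
    return ranking + regularization
```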
Implementation Details:
- Item Graph: Constructed by considering the next few items (e.g., 3) after each item across all user sequences and counting edge occurrences; the adjacency matrix $A$ is row-normalized (see the sketch after this list).
- Short-term Window: Sliding window size $|L| = 5$, predicting the next $|T| = 3$ items.
- Multi-dimensional Attention: Parameter $h$ controls the number of attention dimensions; positional encoding is added to the item embeddings.
- Memory Network: Parameter $m$ controls the number of memory units.
- Embedding Size: $d = 50$ across experiments.
- Hyperparameter Tuning: $h$ and $m$ are selected from $\{5, 10, 15, 20\}$; learning rate $0.001$, $\lambda = 0.001$, batch size $4096$.
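A minimal sketch of the item-graph construction described in the first bullet above. It only counts forward edges within the lookahead window and does not symmetrize the matrix, which is an assumption; variable names are illustrative.

```python
import numpy as np

def build_item_graph(sequences, num_items, lookahead=3):
    """Count edges from each item to its next `lookahead` items across all
    user sequences, then row-normalize the adjacency matrix."""
    A = np.zeros((num_items, num_items), dtype=np.float32)
    for seq in sequences:                               # one item-id sequence per user
        for t, i in enumerate(seq):
            for j in seq[t + 1 : t + 1 + lookahead]:
                A[i, j] += 1.0                          # edge weight = co-occurrence count
    row_sums = A.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0                       # avoid dividing isolated items by zero
    return A / row_sums
```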
Evaluation:
The model is evaluated on five real-world datasets: MovieLens-20M, Amazon-Books, Amazon-CDs, Goodreads-Children, and Goodreads-Comics. Data is preprocessed by filtering out users and items with fewer than 10 interactions and treating ratings $\geq 4$ as positive feedback. Each dataset is split chronologically into 70% training, 10% validation, and 20% testing. Performance is measured with Recall@10 and NDCG@10; a sketch of both metrics follows.
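A minimal sketch of the two evaluation metrics, assuming a ranked list of recommended item ids and a set of held-out test items per user.

```python
import numpy as np

def recall_at_k(ranked, ground_truth, k=10):
    """Fraction of a user's held-out items that appear in the top-k list."""
    hits = len(set(ranked[:k]) & set(ground_truth))
    return hits / len(ground_truth) if ground_truth else 0.0

def ndcg_at_k(ranked, ground_truth, k=10):
    """Discounted gain of hits in the top-k list, normalized by the ideal ranking."""
    relevant = set(ground_truth)
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(rank + 2)
               for rank in range(min(k, len(ground_truth))))
    return dcg / idcg if idcg > 0 else 0.0
```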
Experimental Results:
- MA-GNN significantly outperforms various state-of-the-art baselines (BPRMF, GRU4Rec, GRU4Rec+, GC-SAN, Caser, SASRec, MARank) on all five datasets and metrics.
- Ablation studies demonstrate the effectiveness of each proposed module. Incorporating short-term interest (GNN) improves over BPRMF. Adding long-term interest (Memory Network) and the gating fusion further improves performance, showing the gating mechanism's superiority over simple concatenation or GRU for fusion. Finally, adding the item co-occurrence module yields the best performance, highlighting the importance of this pattern.
- Hyperparameter analysis shows that both the attention dimension ($h$) and the number of memory units ($m$) influence performance, and that the memory network contributes more on sparser datasets such as Amazon-CDs.
- Memory visualization suggests that individual memory units learn to represent distinct types of user interests, as evidenced by different attention patterns for different movie genres.
Conclusion:
The paper successfully demonstrates that combining short-term context modeling via GNNs, long-term dependency modeling via a shared memory network, adaptive fusion of these interests using a gating mechanism, and explicit item co-occurrence modeling with a bilinear function leads to significant improvements in sequential recommendation performance across various datasets. The proposed MA-GNN effectively captures diverse aspects of user behavior sequences.