Graph Convolutional Matrix Completion (GC-MC)
- Graph Convolutional Matrix Completion is a method that recasts collaborative filtering as a link prediction task on bipartite graphs, enabling efficient handling of sparse data.
- It employs a graph convolutional encoder with edge-wise message passing and dense transformations to integrate user-item features and side information.
- Empirical evaluations on benchmarks like MovieLens demonstrate state-of-the-art or highly competitive RMSE, and side information markedly improves robustness in cold-start scenarios.
Graph Convolutional Matrix Completion (GC-MC) is a framework for matrix completion, formulated as a link prediction problem on bipartite graphs, particularly suited for collaborative filtering tasks in recommender systems. It models user-item interactions using a graph auto-encoder architecture that leverages differentiable message passing over a bipartite user-item graph, enabling seamless incorporation of side information and state-of-the-art prediction accuracy on benchmark datasets (Berg et al., 2017).
1. Problem Formulation: Matrix Completion as Graph Link Prediction
GC-MC addresses the matrix completion problem typically encountered in recommender systems, where the observed data is a sparse rating matrix $M \in \{0, 1, \dots, R\}^{N_u \times N_v}$ with $N_u$ users and $N_v$ items. Each entry $M_{ij}$ encodes an observed rating level or is missing (represented by $0$). The key innovation is to recast this as a link prediction task on an undirected bipartite graph $G = (\mathcal{W}, \mathcal{E}, \mathcal{R})$, with node set $\mathcal{W} = \mathcal{U} \cup \mathcal{V}$ (users and items) and labeled edges $(u_i, r, v_j) \in \mathcal{E}$ corresponding to observed ratings $r \in \mathcal{R} = \{1, \dots, R\}$.
For each rating level $r \in \{1, \dots, R\}$, a binary adjacency matrix $M_r \in \{0,1\}^{N_u \times N_v}$ is constructed with $(M_r)_{ij} = 1$ if $M_{ij} = r$. These may be assembled into a block adjacency tensor of size $N_u \times N_v \times R$. Nodes are initialized with feature vectors $x_i \in \mathbb{R}^D$, collected as rows of a feature matrix $X$; in the absence of features, a unique one-hot identity is used ($X = I$), while available user/item features are incorporated directly as rows of $X$.
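A minimal sketch of this preprocessing step, assuming ratings arrive as (user, item, rating) triples; the function and variable names are illustrative, not from the reference implementation:

```python
import numpy as np

def build_rating_adjacencies(ratings, num_users, num_items, rating_levels):
    """ratings: iterable of (user_idx, item_idx, rating) triples."""
    adjacencies = {r: np.zeros((num_users, num_items), dtype=np.float32)
                   for r in rating_levels}
    for i, j, r in ratings:
        adjacencies[r][i, j] = 1.0  # (M_r)_ij = 1 iff M_ij = r
    return adjacencies

# Toy example: 3 users, 2 items, ratings in {1, ..., 5}.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 1), (2, 1, 4)]
adj = build_rating_adjacencies(ratings, 3, 2, range(1, 6))
print(adj[5])  # the edge (u_0, v_0) appears in the rating-5 adjacency
```

At realistic scale these matrices would be stored in a sparse format; dense arrays are used here only for clarity.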
2. Graph-Convolutional Encoder Architecture
The encoder consists of a single graph convolutional layer followed by a dense transformation, producing user embeddings $u_i \in \mathbb{R}^d$ and item embeddings $v_j \in \mathbb{R}^d$.
Edge-wise Message Passing
For each edge type (rating level) $r$ and edge $(u_i, v_j)$ with $M_{ij} = r$, a message
$$\mu_{j \to i, r} = \frac{1}{c_{ij}} W_r x_j$$
is computed, where $W_r$ are learnable parameters and $c_{ij}$ is a normalization constant (e.g., left normalization $c_{ij} = |\mathcal{N}(u_i)|$, or symmetric $c_{ij} = \sqrt{|\mathcal{N}(u_i)|\,|\mathcal{N}(v_j)|}$). Messages are accumulated as
$$h_i = \sigma\!\left[\operatorname{accum}\!\left(\sum_{j \in \mathcal{N}_1(u_i)} \mu_{j \to i, 1}, \dots, \sum_{j \in \mathcal{N}_R(u_i)} \mu_{j \to i, R}\right)\right],$$
with "accum" as either concatenation or sum, and $\sigma$ an elementwise nonlinearity (typically ReLU). This is followed by a dense layer:
$$u_i = \sigma(W h_i),$$
and similarly for item embeddings $v_j$.
Vectorized Form
Denoting $\mathcal{M}_r = \begin{pmatrix} 0 & M_r \\ M_r^\top & 0 \end{pmatrix}$ and letting $D$ be the diagonal degree matrix of the graph, the entire operation for one layer is:
$$H = \sigma\!\left[\operatorname{accum}\!\left(D^{-1} \mathcal{M}_1 X W_1^\top, \dots, D^{-1} \mathcal{M}_R X W_R^\top\right)\right],$$
where $H$ stacks user and item representations row-wise; the dense layer is then applied row-wise as above.
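The following sketch implements one such vectorized layer in PyTorch under the notation above, assuming sum accumulation, left normalization, and ReLU; parameter names are illustrative, and initialization/training details are omitted:

```python
import torch
import torch.nn.functional as F

def gcmc_encoder(adjacencies, X, W_r, W_dense):
    """One GC-MC encoder layer (sum accumulation, left normalization).

    adjacencies: list of R symmetric block matrices M_r, each (N, N),
                 where N = num_users + num_items.
    X:           node feature matrix, shape (N, D); identity if featureless.
    W_r:         list of R per-rating weight matrices, each (D, H).
    W_dense:     dense-layer weights, shape (H, d).
    """
    # Node degrees counted across all rating levels; clamp avoids div-by-zero.
    deg = sum(A.sum(dim=1) for A in adjacencies).clamp(min=1.0)
    D_inv = torch.diag(1.0 / deg)  # left normalization D^{-1}
    # accum as a sum over rating levels (concatenation is the alternative).
    H = F.relu(sum(D_inv @ A @ X @ W for A, W in zip(adjacencies, W_r)))
    return F.relu(H @ W_dense)     # dense transform; rows stack [U; V]

# Toy usage: 2 users, 3 items, R = 2 rating levels, one-hot node features.
Nu, Nv, R, Hdim, d = 2, 3, 2, 8, 4
Ms = [torch.zeros(Nu, Nv) for _ in range(R)]
Ms[0][0, 1] = 1.0
Ms[1][1, 2] = 1.0
# Assemble the symmetric block matrix [[0, M_r], [M_r^T, 0]] per rating.
blocks = [torch.cat([torch.cat([torch.zeros(Nu, Nu), M], dim=1),
                     torch.cat([M.T, torch.zeros(Nv, Nv)], dim=1)], dim=0)
          for M in Ms]
X = torch.eye(Nu + Nv)
Z = gcmc_encoder(blocks, X,
                 [torch.randn(Nu + Nv, Hdim) for _ in range(R)],
                 torch.randn(Hdim, d))
print(Z.shape)  # torch.Size([5, 4]): first Nu rows are U, the rest V
```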
3. Bilinear Decoder and Rating Prediction
Predicted distributions over ratings are computed using a bilinear softmax decoder. Given user embedding $u_i$ and item embedding $v_j$, the distribution over rating classes is:
$$p(\check{M}_{ij} = r) = \frac{\exp\!\left(u_i^\top Q_r v_j\right)}{\sum_{s=1}^{R} \exp\!\left(u_i^\top Q_s v_j\right)},$$
where $Q_r \in \mathbb{R}^{d \times d}$ are learnable, often parameterized with a shared basis, $Q_r = \sum_{s=1}^{n_b} a_{rs} P_s$, to reduce the parameter count and share statistical strength across rating levels. The real-valued rating prediction is taken as the expected value:
$$\check{M}_{ij} = \mathbb{E}\left[r\right] = \sum_{r=1}^{R} r \, p(\check{M}_{ij} = r).$$
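A compact sketch of this decoder in PyTorch, assuming $n_b$ basis matrices; the function and argument names are illustrative:

```python
import torch

def decode_rating(u, v, P_basis, a_coeffs, rating_values):
    """Bilinear softmax decoder with basis weight sharing.

    u, v:          user/item embeddings, shape (d,).
    P_basis:       shared basis matrices P_s, shape (n_b, d, d).
    a_coeffs:      per-rating coefficients a_rs, shape (R, n_b).
    rating_values: the R rating levels, e.g. tensor([1., 2., 3., 4., 5.]).
    """
    Q = torch.einsum('rs,sij->rij', a_coeffs, P_basis)  # Q_r = sum_s a_rs P_s
    logits = torch.einsum('i,rij,j->r', u, Q, v)        # u^T Q_r v, one per r
    probs = torch.softmax(logits, dim=0)                # p(M_ij = r)
    return (probs * rating_values).sum()                # expected rating E[r]

d, n_b, R = 4, 2, 5
pred = decode_rating(torch.randn(d), torch.randn(d),
                     torch.randn(n_b, d, d), torch.randn(R, n_b),
                     torch.arange(1.0, 6.0))
print(float(pred))  # a real-valued prediction in [1, 5]
```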
4. Training Objective and Regularization
GC-MC is trained by minimizing the negative log-likelihood over observed entries. With $\Omega \in \{0,1\}^{N_u \times N_v}$ as a masking matrix for observed ratings,
$$\mathcal{L} = -\sum_{i,j:\,\Omega_{ij} = 1} \; \sum_{r=1}^{R} \mathbb{1}\!\left[M_{ij} = r\right] \log p(\check{M}_{ij} = r) \;+\; \lambda \lVert \Theta \rVert_2^2,$$
where $\Theta$ denotes all learnable parameters and $\lambda$ is a coefficient for regularization. Generalization is further improved by:
- Node dropout: dropping all outgoing messages from each node independently with probability $p_{\text{dropout}}$, with rescaling of the remaining messages.
- Standard dropout applied to the dense layer.
Mini-batch training is achieved by subsampling observed user-item pairs and restricting computation to the subgraph induced by the corresponding nodes.
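A minimal sketch of the observed-entry loss and node dropout, assuming per-pair logits have already been computed; note that softmax cross-entropy is exactly the negative log-likelihood above:

```python
import torch
import torch.nn.functional as F

def gcmc_loss(logits, labels):
    """NLL over observed entries only: `logits` has shape (num_observed, R)
    for exactly the (i, j) pairs with Omega_ij = 1, and `labels` holds the
    corresponding class indices r - 1 in {0, ..., R-1}."""
    return F.cross_entropy(logits, labels)

def node_dropout(X, p):
    """Drop all outgoing messages of each node with probability p: zeroing
    a node's feature row zeroes every message W_r x_j it would send.
    Survivors are rescaled by 1 / (1 - p), as in standard dropout."""
    keep = (torch.rand(X.shape[0], 1) > p).float()
    return X * keep / (1.0 - p)
```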
5. Incorporation of Side Information
GC-MC flexibly incorporates side information. When nodes have associated feature vectors $x_i^f$ (e.g., demographics, item content, auxiliary adjacency vectors), they are embedded via a separate dense layer
$$f_i = \sigma\!\left(W_1^f x_i^f + b\right),$$
and the final representation is
$$u_i = \sigma\!\left(W h_i + W_2^f f_i\right),$$
with separate parameters for users and items. In this scenario, the convolutional input is typically set to $X = I$ (one-hot identity), routing graph structure through the bipartite convolution and content through the side-information channel $x_i^f$.
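A sketch of this side-information pathway as a PyTorch module, assuming $\sigma$ is ReLU and using parameter names that mirror the equations above (the class and attribute names are illustrative); a separate instance would be kept for users and for items:

```python
import torch
import torch.nn as nn

class SideInfoDense(nn.Module):
    """Dense side-information pathway: f_i = sigma(W1_f x_i^f + b), then
    u_i = sigma(W h_i + W2_f f_i). One instance per node type."""

    def __init__(self, hidden_dim, feat_dim, side_dim, out_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, side_dim)               # W1_f, b
        self.side_proj = nn.Linear(side_dim, out_dim, bias=False)    # W2_f
        self.main_proj = nn.Linear(hidden_dim, out_dim, bias=False)  # W

    def forward(self, h, x_feat):
        f = torch.relu(self.feat_proj(x_feat))                    # f_i
        return torch.relu(self.main_proj(h) + self.side_proj(f))  # u_i
```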
6. Empirical Evaluation and Results
Datasets and Metrics
GC-MC is evaluated chiefly on:
- MovieLens-100K, -1M, -10M: ratings in $\{1, \dots, 5\}$ (half-star increments for ML-10M), with matrix density ranging from roughly $1\%$ (ML-10M) to $6\%$ (ML-100K).
- Flixster, Douban, YahooMusic: preprocessed subsets of $3000$ users $\times$ $3000$ items, with user and/or item side-information graphs available.
The performance metric is root mean squared error (RMSE) on a held-out test set.
Training Protocol
- Optimizer: Adam with learning rate $0.01$.
- Decoder weight-sharing: two basis matrices.
- Encoder layer sizes: graph convolution $500$, dense $75$; no activation in the final layer.
- Dropout rates (node and regular) tuned per dataset, e.g. $0.7$ on ML-100K; choice of normalization (left/symmetric) by validation.
- Full-batch training for MovieLens-100K and -1M; mini-batch training for ML-10M.
- Side-information layers: dimension $10$ (ML-100K), $64$ (others).
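For reference, a sketch collecting the protocol above as a plain configuration dictionary; the field names are illustrative (not the reference implementation's flags), and per-dataset values follow the list above:

```python
ML_100K_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 0.01,
    "gcn_hidden": 500,            # graph-convolution layer size
    "dense_hidden": 75,           # dense/embedding layer size
    "decoder_basis_matrices": 2,  # weight-sharing basis
    "dropout": 0.7,               # tuned per dataset by validation
    "normalization": "left",      # or "symmetric", chosen by validation
    "batching": "full",           # mini-batch training for ML-10M
    "side_info_dim": 10,          # 64 for Flixster/Douban/YahooMusic
}
```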
Comparative Results
| Dataset / Setting | Method | RMSE |
|---|---|---|
| ML-100K + side info | MC (nuclear-norm) | 0.973 |
| | IMC (inductive MF) | 1.653 |
| | GMC | 0.996 |
| | GRALS | 0.945 |
| | sRGCNN | 0.929 |
| | GC-MC | 0.910 |
| | GC-MC + Feat | 0.905 |
| ML-1M / ML-10M | PMF | 0.883 / – |
| | I-RBM | 0.854 / 0.825 |
| | BiasedMF | 0.845 / 0.803 |
| | LLORMA-Local | 0.833 / 0.782 |
| | AutoRec | 0.831 / 0.782 |
| | CF-NADE | 0.829 / 0.771 |
| | GC-MC | 0.832 / 0.777 |
| Flixster | GRALS | 1.313 |
| | sRGCNN | 1.179 |
| | GC-MC | 0.941 |
| Douban | GRALS | 0.833 |
| | sRGCNN | 0.801 |
| | GC-MC | 0.734 |
| YahooMusic | GRALS | 38.0 |
| | sRGCNN | 22.4 |
| | GC-MC | 20.5 |
GC-MC attains state-of-the-art or near state-of-the-art RMSE on these standard collaborative filtering benchmarks.
Cold-Start Performance
Experiments on artificially constructed cold-start users in ML-100K, where each such user retains only a small number $N_r$ of ratings (e.g., $N_r \in \{1, 5, 10\}$), demonstrate that side information substantially improves the recovery of good user embeddings as $N_r \to 1$, underscoring the efficacy of content features in data-scarce regimes.
7. Summary and Significance
GC-MC frames collaborative filtering as bipartite graph link prediction, employing a single-layer graph convolutional encoder (message passing plus dense transform) with a bilinear softmax decoder. Its architecture enables the direct integration of arbitrary node features or auxiliary graphs and is computationally efficient through mini-batch training. Empirical results on MovieLens and other graph-structured datasets establish GC-MC as delivering competitive or leading accuracy under standard RMSE evaluation, especially notable in settings enriched with side information (Berg et al., 2017).