
Graph Convolutional Matrix Completion (GC-MC)

  • Graph Convolutional Matrix Completion is a method that recasts collaborative filtering as a link prediction task on bipartite graphs, enabling efficient handling of sparse data.
  • It employs a graph convolutional encoder with edge-wise message passing and dense transformations to integrate user-item features and side information.
  • Empirical evaluations on benchmarks like MovieLens demonstrate GC-MC's state-of-the-art RMSE performance and robustness, especially in cold-start scenarios.

Graph Convolutional Matrix Completion (GC-MC) is a framework for matrix completion, formulated as link prediction on bipartite graphs and particularly suited to collaborative filtering in recommender systems. It models user-item interactions with a graph auto-encoder that performs differentiable message passing over the bipartite user-item graph, enabling seamless incorporation of side information and achieving state-of-the-art prediction accuracy on benchmark datasets (Berg et al., 2017).

1. Problem Formulation as Link Prediction

GC-MC addresses the matrix completion problem typically encountered in recommender systems, where the observed data is a sparse rating matrix $M \in \mathbb{R}^{N_u \times N_v}$ with $N_u$ users and $N_v$ items. Each entry $M_{ij}$ encodes an observed rating $r \in \{1, \dots, R\}$ or is missing (represented by $0$). The key innovation is to recast this as a link prediction task on an undirected bipartite graph $G = (\mathcal{W}, \mathcal{E}, \mathcal{R})$, with node set $\mathcal{W} = \mathcal{U} \cup \mathcal{V}$ (users and items) and labeled edges $(u_i, r, v_j) \in \mathcal{E}$ corresponding to observed ratings.

For each rating level $r$, an adjacency matrix $M_r \in \{0,1\}^{N_u \times N_v}$ is constructed, where $(M_r)_{ij} = 1$ if $M_{ij} = r$. Each $M_r$ may be assembled into a symmetric block adjacency matrix

$$\mathcal{M}_r = \begin{bmatrix} 0 & M_r \\ M_r^T & 0 \end{bmatrix}$$

of size $(N_u + N_v) \times (N_u + N_v)$. Nodes are initialized with feature vectors $x_i \in \mathbb{R}^D$; in the absence of features, a unique one-hot identity is used ($X = I_{N_u + N_v}$), while available user/item features are incorporated directly as rows of $X \in \mathbb{R}^{(N_u + N_v) \times D}$.
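As a concrete illustration, the following NumPy sketch builds the per-rating adjacency matrices $M_r$ and the symmetric block form from a small rating matrix (variable and function names are illustrative, not from the reference implementation):

```python
import numpy as np

def rating_adjacencies(M, R):
    """Return [M_1, ..., M_R] with (M_r)[i, j] = 1 iff M[i, j] == r."""
    return [(M == r).astype(np.float64) for r in range(1, R + 1)]

def block_adjacency(M_r):
    """Symmetric (N_u+N_v) x (N_u+N_v) block matrix [[0, M_r], [M_r^T, 0]]."""
    n_u, n_v = M_r.shape
    top = np.hstack([np.zeros((n_u, n_u)), M_r])
    bottom = np.hstack([M_r.T, np.zeros((n_v, n_v))])
    return np.vstack([top, bottom])

# Toy example: 3 users x 4 items, ratings in {1, ..., 5}, 0 = missing.
M = np.array([[5, 0, 3, 0],
              [0, 1, 0, 4],
              [2, 0, 0, 5]])
adjacencies = rating_adjacencies(M, R=5)
X = np.eye(M.shape[0] + M.shape[1])  # one-hot identity features (no side info)
```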

2. Graph-Convolutional Encoder Architecture

The encoder consists of a single graph convolutional layer followed by a dense transformation, producing user embeddings $U \in \mathbb{R}^{N_u \times E}$ and item embeddings $V \in \mathbb{R}^{N_v \times E}$.

Edge-wise Message Passing

For each edge type (rating) $r$ and edge $j \to i$, a message

$$\mu_{j \to i, r} = \frac{1}{c_{ij}} W_r x_j$$

is computed, where $W_r \in \mathbb{R}^{E \times D}$ is a learnable weight matrix and $c_{ij}$ is a normalization constant (e.g., left normalization $c_{ij} = |\mathcal{N}_i|$, or symmetric normalization $c_{ij} = \sqrt{|\mathcal{N}_i|\,|\mathcal{N}_j|}$). Messages are accumulated as

$$h_i = \sigma\Bigl(\operatorname{accum}\Bigl(\sum_{j \in \mathcal{N}_{i,1}} \mu_{j \to i, 1}, \dots, \sum_{j \in \mathcal{N}_{i,R}} \mu_{j \to i, R}\Bigr)\Bigr),$$

with $\operatorname{accum}$ either concatenation or summation, and $\sigma$ an elementwise nonlinearity (typically ReLU). This is followed by a dense layer,

$$u_i = \sigma(W h_i),$$

and similarly for item embeddings $v_j$.
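The per-node computation can be sketched as follows. This is a minimal NumPy illustration, assuming left normalization, an item feature matrix `X_items`, and a list `W` of per-rating weight matrices (all names are hypothetical):

```python
import numpy as np

def hidden_user(i, adjacencies, X_items, W, accum="stack"):
    """Hidden representation h_i of user i via edge-wise message passing."""
    # c_ij = |N_i|: total number of observed ratings for user i (left normalization).
    deg_i = max(sum(M_r[i].sum() for M_r in adjacencies), 1.0)
    per_rating = []
    for M_r, W_r in zip(adjacencies, W):
        msg = np.zeros(W_r.shape[0])
        for j in np.nonzero(M_r[i])[0]:          # items j rated r by user i
            msg += (W_r @ X_items[j]) / deg_i    # mu_{j->i,r} = W_r x_j / c_ij
        per_rating.append(msg)
    h = np.concatenate(per_rating) if accum == "stack" else np.sum(per_rating, axis=0)
    return np.maximum(h, 0.0)                    # sigma = ReLU
```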

Vectorized Form

With $\mathcal{M}_r$ defined as above and $D$ the diagonal degree matrix of $\sum_r \mathcal{M}_r$, the entire operation for one layer is:

$$H = \sigma\Bigl(\sum_{r=1}^{R} D^{-1} \mathcal{M}_r X W_r^T\Bigr),$$

$$[U; V] = \sigma(H W^T),$$

where $[U; V]$ stacks user and item embeddings.
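A minimal NumPy sketch of this vectorized layer, assuming sum-accumulation and ReLU activations (function and argument names are illustrative):

```python
import numpy as np

def encoder(Mr_blocks, X, Ws, W_dense):
    """One GC-MC encoder layer: H = relu(sum_r D^-1 M_r X W_r^T), then dense."""
    A = sum(Mr_blocks)                            # sum_r M_r in block form
    deg = np.maximum(A.sum(axis=1), 1.0)          # node degrees (avoid div by 0)
    H = sum((M / deg[:, None]) @ X @ W.T          # D^-1 M_r X W_r^T per rating
            for M, W in zip(Mr_blocks, Ws))
    H = np.maximum(H, 0.0)                        # sigma = ReLU (sum-accumulation)
    UV = np.maximum(H @ W_dense.T, 0.0)           # dense layer: [U; V] = sigma(H W^T)
    return UV                                     # first N_u rows are U, rest are V
```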

3. Bilinear Decoder and Rating Prediction

Predicted distributions over ratings are computed using a bilinear softmax decoder. Given $u_i \in \mathbb{R}^E$ and $v_j \in \mathbb{R}^E$, the distribution over rating classes $r$ is:

$$p(\hat{M}_{ij} = r) = \frac{\exp\left(u_i^T Q_r v_j\right)}{\sum_{s=1}^{R} \exp\left(u_i^T Q_s v_j\right)},$$

where $Q_r \in \mathbb{R}^{E \times E}$ are learnable matrices, often parameterized with a shared basis. The real-valued rating prediction is taken as the expected value:

$$\hat{M}_{ij} = \sum_{r=1}^{R} r \, p(\hat{M}_{ij} = r).$$
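A short sketch of the decoder for a single (user, item) pair; the shared-basis parameterization of the $Q_r$ matrices is omitted for brevity:

```python
import numpy as np

def rating_distribution(u_i, v_j, Q):
    """Softmax over rating classes; Q is a list of R matrices Q_r (E x E)."""
    logits = np.array([u_i @ Q_r @ v_j for Q_r in Q])
    logits -= logits.max()                        # numerical stability
    p = np.exp(logits)
    return p / p.sum()                            # p[r-1] = P(M_ij = r)

def predicted_rating(u_i, v_j, Q):
    """Expected rating: sum_r r * p(M_ij = r)."""
    p = rating_distribution(u_i, v_j, Q)
    return float(np.arange(1, len(Q) + 1) @ p)
```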

4. Training Objective and Regularization

GC-MC is trained by minimizing the negative log-likelihood over observed entries. With $\Omega \in \{0,1\}^{N_u \times N_v}$ as a masking matrix for observed ratings,

$$\mathcal{L} = -\sum_{i,j:\,\Omega_{ij}=1} \sum_{r=1}^{R} \mathbb{1}[r = M_{ij}] \log p(\hat{M}_{ij} = r) + \lambda \|\Theta\|_2^2,$$

where $\Theta$ denotes all learnable parameters and $\lambda$ is a coefficient for $L_2$ regularization. Generalization is further improved by:

  • Node dropout: dropping all outgoing messages from each node independently with probability $p_\text{dropout}$, with rescaling of the remaining messages.
  • Standard dropout applied to the dense layer.

Mini-batch training is achieved by subsampling observed (i,j)(i,j) pairs and restricting computations to the subgraph induced by these nodes.
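Assuming a tensor `P` of predicted rating distributions, this objective can be sketched as:

```python
import numpy as np

def gcmc_loss(P, M, params, lam=0.0):
    """Negative log-likelihood over observed entries plus L2 regularization.

    P: (N_u, N_v, R) array with P[i, j, r-1] = p(M_ij = r).
    M: integer rating matrix, 0 marks unobserved entries (Omega = M > 0).
    params: list of weight arrays included in the L2 penalty.
    """
    users, items = np.nonzero(M)                  # observed (i, j) pairs
    nll = -np.sum(np.log(P[users, items, M[users, items] - 1] + 1e-12))
    l2 = sum(np.sum(w ** 2) for w in params)
    return nll + lam * l2
```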

5. Incorporation of Side Information

GC-MC flexibly incorporates side information. When nodes have associated feature vectors $x_i^f \in \mathbb{R}^{D_f}$ (e.g., demographics, item content, auxiliary adjacency vectors), they are embedded via

$$f_i = \sigma(W_1^f x_i^f + b),$$

and the final representation is

$$u_i = \sigma(W h_i + W_2^f f_i),$$

with separate parameters $W_1^f$, $W_2^f$, $b$ for users and items. In this scenario, the convolutional input is typically set to $X = I$, routing graph structure through the bipartite convolution and content through $f_i$.
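A sketch of this feature pathway for one user node, under assumed parameter shapes:

```python
import numpy as np

def user_embedding_with_features(h_i, x_f, W_dense, W1_f, W2_f, b):
    """Inject side information x_f at the dense layer (shapes assumed):

    h_i: (H,) hidden vector from the graph convolution.
    x_f: (D_f,) raw feature vector; W1_f: (F, D_f); b: (F,);
    W_dense: (E, H); W2_f: (E, F).
    """
    f_i = np.maximum(W1_f @ x_f + b, 0.0)                # f_i = sigma(W1_f x_f + b)
    return np.maximum(W_dense @ h_i + W2_f @ f_i, 0.0)   # u_i = sigma(W h_i + W2_f f_i)
```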

6. Empirical Evaluation and Results

Datasets and Metrics

GC-MC is evaluated chiefly on:

  • MovieLens-100K, -1M, -10M: ratings in $\{1, \ldots, 5\}$, matrix density ranging from about 1.3% (ML-10M) to 6.3% (ML-100K).
  • Flixster, Douban, YahooMusic: subsets of 3K users × 3K items, with auxiliary side graphs available.

The performance metric is root mean squared error (RMSE) on a held-out test set.

Training Protocol

  • Optimizer: Adam with learning rate 0.01.
  • Decoder weight sharing: two basis matrices.
  • Encoder layer sizes: graph convolution 500 with ReLU, then dense 75; no activation in the final layer.
  • Dropout rates: $p_\text{node} \approx 0.7$, $p_\text{dense} \approx 0.7$; choice of normalization (left/symmetric) by validation.
  • Full-batch training for MovieLens-100K and -1M; mini-batch size 10,000 for ML-10M.
  • Side-information layers: dimension 10 (ML-100K), 64 (others).
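For reference, these settings can be collected into a single configuration sketch (key names are illustrative, not from the reference implementation):

```python
# Hyperparameters as reported above; dataset-specific values noted inline.
GCMC_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 0.01,
    "decoder_basis_matrices": 2,
    "gcn_hidden": 500,                 # graph-convolution layer size (ReLU)
    "dense_hidden": 75,                # dense layer size (no output activation)
    "node_dropout": 0.7,
    "dense_dropout": 0.7,
    "normalization": "left or symmetric, chosen by validation",
    "minibatch_size_ml10m": 10_000,    # full-batch for ML-100K and ML-1M
    "side_info_dim": {"ml-100k": 10, "others": 64},
}
```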

Comparative Results

Dataset/Setting        Method               RMSE
ML-100K + side info    MC (nuclear norm)    0.973
                       IMC (inductive MF)   1.653
                       GMC                  0.996
                       GRALS                0.945
                       sRGCNN               0.929
                       GC-MC                0.910
                       GC-MC + Feat         0.905
ML-1M / ML-10M         PMF                  0.883 / –
                       I-RBM                0.854 / 0.825
                       BiasedMF             0.845 / 0.803
                       LLORMA-Local         0.833 / 0.782
                       AutoRec              0.831 / 0.782
                       CF-NADE              0.829 / 0.771
                       GC-MC                0.832 / 0.777
Flixster               GRALS                1.313
                       sRGCNN               1.179
                       GC-MC                0.941
Douban                 GRALS                0.833
                       sRGCNN               0.801
                       GC-MC                0.734
YahooMusic             GRALS                38.0
                       sRGCNN               22.4
                       GC-MC                20.5

GC-MC attains state-of-the-art or near state-of-the-art RMSE on these standard collaborative filtering benchmarks.

Cold-Start Performance

Experiments on artificially constructed cold-start users in ML-100K, where each user retains only $N_r$ ratings, show that side information $f_i$ substantially improves the recovery of good embeddings as $N_r \to 1$, underscoring the efficacy of content features in data-scarce regimes.

7. Summary and Significance

GC-MC frames collaborative filtering as bipartite graph link prediction, employing a single-layer graph convolutional encoder (message passing plus dense transform) with a bilinear softmax decoder. Its architecture enables the direct integration of arbitrary node features or auxiliary graphs and is computationally efficient through mini-batch training. Empirical results on MovieLens and other graph-structured datasets establish GC-MC as delivering competitive or leading accuracy under standard RMSE evaluation, especially notable in settings enriched with side information (Berg et al., 2017).

References

Berg, R. van den, Kipf, T. N., & Welling, M. (2017). Graph Convolutional Matrix Completion. arXiv preprint arXiv:1706.02263.