Graph Convolutional Matrix Completion (GC-MC)
- Graph Convolutional Matrix Completion is a method that recasts collaborative filtering as a link prediction task on bipartite graphs, enabling efficient handling of sparse data.
- It employs a graph convolutional encoder with edge-wise message passing and dense transformations to integrate user-item features and side information.
- Empirical evaluations on benchmarks like MovieLens demonstrate state-of-the-art or highly competitive RMSE, and side information markedly improves robustness in cold-start scenarios.
Graph Convolutional Matrix Completion (GC-MC) is a framework for matrix completion, formulated as a link prediction problem on bipartite graphs, particularly suited for collaborative filtering tasks in recommender systems. It models user-item interactions using a graph auto-encoder architecture that leverages differentiable message passing over a bipartite user-item graph, enabling seamless incorporation of side information and state-of-the-art prediction accuracy on benchmark datasets (Berg et al., 2017).
1. Problem Formulation: Matrix Completion as Graph Link Prediction
GC-MC addresses the matrix completion problem typically encountered in recommender systems, where the observed data is a sparse rating matrix $M \in \{0, 1, \dots, R\}^{N_u \times N_v}$ with $N_u$ users and $N_v$ items. Each entry $M_{ij}$ encodes an observed rating level or is missing (represented by $0$). The key innovation is to recast this as a link prediction task on an undirected bipartite graph $G = (\mathcal{W}, \mathcal{E}, \mathcal{R})$, with node set $\mathcal{W} = \mathcal{U} \cup \mathcal{V}$ (users and items) and labeled edges $(u_i, r, v_j) \in \mathcal{E}$ corresponding to observed ratings $r \in \mathcal{R} = \{1, \dots, R\}$.
For each rating level $r \in \{1, \dots, R\}$, a binary adjacency matrix $M_r \in \{0,1\}^{N_u \times N_v}$ is constructed with $(M_r)_{ij} = 1$ if $M_{ij} = r$. These may be assembled into a block adjacency tensor of size $N_u \times N_v \times R$. Nodes are initialized with feature vectors $x_i \in \mathbb{R}^D$, collected as rows of a feature matrix $X$; in the absence of features, a unique one-hot identity is used ($X = I$), while available user/item features are incorporated directly as rows of $X$.
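A minimal sketch of this preprocessing step, assuming ratings arrive as (user, item, rating) triples; the function and variable names are illustrative, not from the reference implementation:

```python
import numpy as np

def build_rating_adjacencies(ratings, num_users, num_items, rating_levels):
    """ratings: iterable of (user_idx, item_idx, rating) triples."""
    adjacencies = {r: np.zeros((num_users, num_items), dtype=np.float32)
                   for r in rating_levels}
    for i, j, r in ratings:
        adjacencies[r][i, j] = 1.0  # (M_r)_ij = 1 iff M_ij = r
    return adjacencies

# Toy example: 3 users, 2 items, ratings in {1, ..., 5}.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 1), (2, 1, 4)]
adj = build_rating_adjacencies(ratings, 3, 2, range(1, 6))
print(adj[5])  # the edge (u_0, v_0) appears in the rating-5 adjacency
```

At realistic scale these matrices would be stored in a sparse format; dense arrays are used here only for clarity.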
2. Graph-Convolutional Encoder Architecture
The encoder consists of a single graph convolutional layer followed by a dense transformation, producing user embeddings $u_i \in \mathbb{R}^d$ and item embeddings $v_j \in \mathbb{R}^d$.
Edge-wise Message Passing
For each edge type (rating level) $r$ and edge $(u_i, v_j)$ with $M_{ij} = r$, a message
$$\mu_{j \to i, r} = \frac{1}{c_{ij}} W_r x_j$$
is computed, where $W_r$ are learnable parameters and $c_{ij}$ is a normalization constant (e.g., left normalization $c_{ij} = |\mathcal{N}(u_i)|$, or symmetric $c_{ij} = \sqrt{|\mathcal{N}(u_i)|\,|\mathcal{N}(v_j)|}$). Messages are accumulated as
$$h_i = \sigma\!\left[\operatorname{accum}\!\left(\sum_{j \in \mathcal{N}_1(u_i)} \mu_{j \to i, 1}, \dots, \sum_{j \in \mathcal{N}_R(u_i)} \mu_{j \to i, R}\right)\right],$$
with "accum" as either concatenation or sum, and $\sigma$ an elementwise nonlinearity (typically ReLU). This is followed by a dense layer:
$$u_i = \sigma(W h_i),$$
and similarly for item embeddings $v_j$.
Vectorized Form
Denoting $\mathcal{M}_r = \begin{pmatrix} 0 & M_r \\ M_r^\top & 0 \end{pmatrix}$ and letting $D$ be the diagonal degree matrix of the graph, the entire operation for one layer is:
$$H = \sigma\!\left[\operatorname{accum}\!\left(D^{-1} \mathcal{M}_1 X W_1^\top, \dots, D^{-1} \mathcal{M}_R X W_R^\top\right)\right],$$
where $H$ stacks user and item representations row-wise; the dense layer is then applied row-wise as above.
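The following sketch implements one such vectorized layer in PyTorch under the notation above, assuming sum accumulation, left normalization, and ReLU; parameter names are illustrative, and initialization/training details are omitted:

```python
import torch
import torch.nn.functional as F

def gcmc_encoder(adjacencies, X, W_r, W_dense):
    """One GC-MC encoder layer (sum accumulation, left normalization).

    adjacencies: list of R symmetric block matrices M_r, each (N, N),
                 where N = num_users + num_items.
    X:           node feature matrix, shape (N, D); identity if featureless.
    W_r:         list of R per-rating weight matrices, each (D, H).
    W_dense:     dense-layer weights, shape (H, d).
    """
    # Node degrees counted across all rating levels; clamp avoids div-by-zero.
    deg = sum(A.sum(dim=1) for A in adjacencies).clamp(min=1.0)
    D_inv = torch.diag(1.0 / deg)  # left normalization D^{-1}
    # accum as a sum over rating levels (concatenation is the alternative).
    H = F.relu(sum(D_inv @ A @ X @ W for A, W in zip(adjacencies, W_r)))
    return F.relu(H @ W_dense)     # dense transform; rows stack [U; V]

# Toy usage: 2 users, 3 items, R = 2 rating levels, one-hot node features.
Nu, Nv, R, Hdim, d = 2, 3, 2, 8, 4
Ms = [torch.zeros(Nu, Nv) for _ in range(R)]
Ms[0][0, 1] = 1.0
Ms[1][1, 2] = 1.0
# Assemble the symmetric block matrix [[0, M_r], [M_r^T, 0]] per rating.
blocks = [torch.cat([torch.cat([torch.zeros(Nu, Nu), M], dim=1),
                     torch.cat([M.T, torch.zeros(Nv, Nv)], dim=1)], dim=0)
          for M in Ms]
X = torch.eye(Nu + Nv)
Z = gcmc_encoder(blocks, X,
                 [torch.randn(Nu + Nv, Hdim) for _ in range(R)],
                 torch.randn(Hdim, d))
print(Z.shape)  # torch.Size([5, 4]): first Nu rows are U, the rest V
```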
3. Bilinear Decoder and Rating Prediction
Predicted distributions over ratings are computed using a bilinear softmax decoder. Given user embedding $u_i$ and item embedding $v_j$, the distribution over rating classes is:
$$p(\check{M}_{ij} = r) = \frac{\exp\!\left(u_i^\top Q_r v_j\right)}{\sum_{s=1}^{R} \exp\!\left(u_i^\top Q_s v_j\right)},$$
where $Q_r \in \mathbb{R}^{d \times d}$ are learnable, often parameterized with a shared basis, $Q_r = \sum_{s=1}^{n_b} a_{rs} P_s$, to reduce the parameter count and share statistical strength across rating levels. The real-valued rating prediction is taken as the expected value:
$$\check{M}_{ij} = \mathbb{E}\left[r\right] = \sum_{r=1}^{R} r \, p(\check{M}_{ij} = r).$$
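A compact sketch of this decoder in PyTorch, assuming $n_b$ basis matrices; the function and argument names are illustrative:

```python
import torch

def decode_rating(u, v, P_basis, a_coeffs, rating_values):
    """Bilinear softmax decoder with basis weight sharing.

    u, v:          user/item embeddings, shape (d,).
    P_basis:       shared basis matrices P_s, shape (n_b, d, d).
    a_coeffs:      per-rating coefficients a_rs, shape (R, n_b).
    rating_values: the R rating levels, e.g. tensor([1., 2., 3., 4., 5.]).
    """
    Q = torch.einsum('rs,sij->rij', a_coeffs, P_basis)  # Q_r = sum_s a_rs P_s
    logits = torch.einsum('i,rij,j->r', u, Q, v)        # u^T Q_r v, one per r
    probs = torch.softmax(logits, dim=0)                # p(M_ij = r)
    return (probs * rating_values).sum()                # expected rating E[r]

d, n_b, R = 4, 2, 5
pred = decode_rating(torch.randn(d), torch.randn(d),
                     torch.randn(n_b, d, d), torch.randn(R, n_b),
                     torch.arange(1.0, 6.0))
print(float(pred))  # a real-valued prediction in [1, 5]
```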
4. Training Objective and Regularization
GC-MC is trained by minimizing the negative log-likelihood over observed entries. With $\Omega \in \{0,1\}^{N_u \times N_v}$ as a masking matrix for observed ratings,
$$\mathcal{L} = -\sum_{i,j:\,\Omega_{ij} = 1} \; \sum_{r=1}^{R} \mathbb{1}\!\left[M_{ij} = r\right] \log p(\check{M}_{ij} = r) \;+\; \lambda \lVert \Theta \rVert_2^2,$$
where $\Theta$ denotes all learnable parameters and $\lambda$ is a coefficient for regularization. Generalization is further improved by:
- Node dropout: dropping all outgoing messages from each node independently with probability $p_{\text{dropout}}$, with rescaling of the remaining messages.
- Standard dropout applied to the dense layer.
Mini-batch training is achieved by subsampling observed user-item pairs and restricting computation to the subgraph induced by the corresponding nodes.
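A minimal sketch of the observed-entry loss and node dropout, assuming per-pair logits have already been computed; note that softmax cross-entropy is exactly the negative log-likelihood above:

```python
import torch
import torch.nn.functional as F

def gcmc_loss(logits, labels):
    """NLL over observed entries only: `logits` has shape (num_observed, R)
    for exactly the (i, j) pairs with Omega_ij = 1, and `labels` holds the
    corresponding class indices r - 1 in {0, ..., R-1}."""
    return F.cross_entropy(logits, labels)

def node_dropout(X, p):
    """Drop all outgoing messages of each node with probability p: zeroing
    a node's feature row zeroes every message W_r x_j it would send.
    Survivors are rescaled by 1 / (1 - p), as in standard dropout."""
    keep = (torch.rand(X.shape[0], 1) > p).float()
    return X * keep / (1.0 - p)
```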
5. Incorporation of Side Information
GC-MC flexibly incorporates side information. When nodes have associated feature vectors $x_i^f$ (e.g., demographics, item content, auxiliary adjacency vectors), they are embedded via a separate dense layer
$$f_i = \sigma\!\left(W_1^f x_i^f + b\right),$$
and the final representation is
$$u_i = \sigma\!\left(W h_i + W_2^f f_i\right),$$
with separate parameters for users and items. In this scenario, the convolutional input is typically set to $X = I$ (one-hot identity), routing graph structure through the bipartite convolution and content through the side-information channel $x_i^f$.
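A sketch of this side-information pathway as a PyTorch module, assuming $\sigma$ is ReLU and using parameter names that mirror the equations above (the class and attribute names are illustrative); a separate instance would be kept for users and for items:

```python
import torch
import torch.nn as nn

class SideInfoDense(nn.Module):
    """Dense side-information pathway: f_i = sigma(W1_f x_i^f + b), then
    u_i = sigma(W h_i + W2_f f_i). One instance per node type."""

    def __init__(self, hidden_dim, feat_dim, side_dim, out_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, side_dim)               # W1_f, b
        self.side_proj = nn.Linear(side_dim, out_dim, bias=False)    # W2_f
        self.main_proj = nn.Linear(hidden_dim, out_dim, bias=False)  # W

    def forward(self, h, x_feat):
        f = torch.relu(self.feat_proj(x_feat))                    # f_i
        return torch.relu(self.main_proj(h) + self.side_proj(f))  # u_i
```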
6. Empirical Evaluation and Results
Datasets and Metrics
GC-MC is evaluated chiefly on:
- MovieLens-100K, -1M, -10M: ratings in $\{1, \dots, 5\}$ (half-star increments for ML-10M), with matrix density ranging from roughly $1\%$ (ML-10M) to $6\%$ (ML-100K).
- Flixster, Douban, YahooMusic: preprocessed subsets of $3000$ users $\times$ $3000$ items, with user and/or item side-information graphs available.
The performance metric is root mean squared error (RMSE) on a held-out test set.
Training Protocol
- Optimizer: Adam with learning rate $0.01$.
- Decoder weight-sharing: two basis matrices.
- Encoder layer sizes: graph convolution $500$, dense $75$; no activation in the final layer.
- Dropout rates (node and regular) tuned per dataset, e.g. $0.7$ on ML-100K; choice of normalization (left/symmetric) by validation.
- Full-batch training for MovieLens-100K and -1M; mini-batch training for ML-10M.
- Side-information layers: dimension $10$ (ML-100K), $64$ (others).
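For reference, a sketch collecting the protocol above as a plain configuration dictionary; the field names are illustrative (not the reference implementation's flags), and per-dataset values follow the list above:

```python
ML_100K_CONFIG = {
    "optimizer": "adam",
    "learning_rate": 0.01,
    "gcn_hidden": 500,            # graph-convolution layer size
    "dense_hidden": 75,           # dense/embedding layer size
    "decoder_basis_matrices": 2,  # weight-sharing basis
    "dropout": 0.7,               # tuned per dataset by validation
    "normalization": "left",      # or "symmetric", chosen by validation
    "batching": "full",           # mini-batch training for ML-10M
    "side_info_dim": 10,          # 64 for Flixster/Douban/YahooMusic
}
```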
Comparative Results
| Dataset / Setting | Method | RMSE |
|---|---|---|
| ML-100K + side info | MC (nuclear-norm) | 0.973 |
| | IMC (inductive MF) | 1.653 |
| | GMC | 0.996 |
| | GRALS | 0.945 |
| | sRGCNN | 0.929 |
| | GC-MC | 0.910 |
| | GC-MC + Feat | 0.905 |
| ML-1M / ML-10M | PMF | 0.883 / – |
| | I-RBM | 0.854 / 0.825 |
| | BiasedMF | 0.845 / 0.803 |
| | LLORMA-Local | 0.833 / 0.782 |
| | AutoRec | 0.831 / 0.782 |
| | CF-NADE | 0.829 / 0.771 |
| | GC-MC | 0.832 / 0.777 |
| Flixster | GRALS | 1.313 |
| | sRGCNN | 1.179 |
| | GC-MC | 0.941 |
| Douban | GRALS | 0.833 |
| | sRGCNN | 0.801 |
| | GC-MC | 0.734 |
| YahooMusic | GRALS | 38.0 |
| | sRGCNN | 22.4 |
| | GC-MC | 20.5 |
GC-MC attains state-of-the-art or near state-of-the-art RMSE on these standard collaborative filtering benchmarks.
Cold-Start Performance
Experiments on artificially constructed cold-start users in ML-100K, where each such user retains only a small number $N_r$ of ratings (e.g., $N_r \in \{1, 5, 10\}$), demonstrate that side information substantially improves the recovery of good user embeddings as $N_r \to 1$, underscoring the efficacy of content features in data-scarce regimes.
7. Summary and Significance
GC-MC frames collaborative filtering as bipartite graph link prediction, employing a single-layer graph convolutional encoder (message passing plus dense transform) with a bilinear softmax decoder. Its architecture enables the direct integration of arbitrary node features or auxiliary graphs and is computationally efficient through mini-batch training. Empirical results on MovieLens and other graph-structured datasets establish GC-MC as delivering competitive or leading accuracy under standard RMSE evaluation, especially notable in settings enriched with side information (Berg et al., 2017).