UltraGCN: Simplified Graph Collaborative Filtering

Updated 3 September 2025

UltraGCN is a graph-based collaborative filtering model that simplifies GCN architecture by approximating infinite-layer propagation through direct constraint enforcement.
It employs degree-based edge weighting and auxiliary losses to balance user–item and item–item interactions, reducing over-smoothing effects.
The model achieves up to 10x computational efficiency and significant performance gains on benchmarks compared to traditional GCN methods.

UltraGCN is a graph-based collaborative filtering model that radically simplifies the architecture of graph convolutional networks (GCNs) for recommendation tasks by eliminating explicit, iterative message passing. Instead, UltraGCN approximates the closed-form solution of infinite-layer graph convolution through direct constraint enforcement on user and item embeddings. By optimizing embedding alignment to neighbor aggregates under well-designed edge weighting schemes, UltraGCN achieves high recommendation accuracy with an order-of-magnitude improvement in computational efficiency over prevailing GCN approaches, such as LightGCN. Its design also offers strong adaptability for large-scale deployments and integration with contemporary recommender system workflows.

1. Theoretical Motivation and Model Formulation

UltraGCN is founded on the observation that iterative message passing in traditional GCNs and their simplified variants, such as LightGCN, incurs unneeded computational overhead and can cause undesirable effects like over-smoothing. In the limit of infinite propagation layers, node embeddings converge to a fixed point characterized by a weighted aggregation over their neighbors. UltraGCN forgoes explicit propagation layers and instead enforces the following convergence relationship directly in its optimization:

$e_u = \sum_{i \in \mathcal{N}(u)} \beta_{u,i} e_i$

The constraint coefficient $\beta_{u,i}$ for each user–item edge is defined as:

$\beta_{u,i} = \frac{1}{d_u} \sqrt{\frac{d_u + 1}{d_i + 1}}$

where $d_u$ and $d_i$ are the degrees of user $u$ and item $i$ , respectively. This setup reinterprets graph convolution as a special weighted matrix factorization with coefficients reflecting graph topology, eliminating the need for message passing.

The key loss function for enforcing this constraint is (using sigmoid $\sigma$ and negative sampling):

$\mathcal{L}_C = -\sum_{(u, i) \in N^+} \beta_{u,i} \log \sigma(e_u^\top e_i) - \sum_{(u, j) \in N^-} \beta_{u,j} \log \sigma(-e_u^\top e_j)$

This objective compels the representations to match their “infinitely propagated” states, while negative sampling counters the over-smoothing of the latent space.

2. Edge Weighting and Structural Constraints

Unlike previous GCN-based recommenders where normalization may be symmetric or heuristically defined, UltraGCN’s weighting scheme is grounded in the graph’s local structure and degree statistics. The coefficient $\beta_{u,i}$ ensures that interactions from highly active users or popular items are relatively down-weighted, which empirically helps mitigate dominance by high-degree nodes and fosters representation diversity.

UltraGCN further extends this approach by modeling item–item and, optionally, user–user relations through auxiliary projected graphs. The item–item graph is derived from co-occurrence (e.g., $A^\top A$ ), and edges are weighted with:

$\omega_{i, j} \propto \frac{G_{i,j}}{g_i \cdot g_j}$

where $G_{i,j}$ counts co-occurrences and $g_i = \sum_k G_{i,k}$ .

An auxiliary item-level constraint loss is introduced:

$\mathcal{L}_I = -\sum_{(u,i)\in N^+} \sum_{j \in S(i)} \omega_{i,j} \log \sigma(e_u^\top e_j)$

This dual-loss design enables UltraGCN to flexibly modulate the influence of user–item versus item–item signals, typically balanced by separate hyperparameters.

3. Optimization, Complexity, and Practical Scalability

UltraGCN optimizes the embeddings of users and items as the only learnable parameters, analogous to straightforward matrix factorization models. The structure constraint and auxiliary losses are jointly minimized using negative sampling and mini-batch stochastic gradient descent.

The per-epoch time complexity is: $O((K + R + 1) |A^+| (d^2 + 1))$ where $K$ is the top- $K$ neighbors in the auxiliary graphs, $R$ is the number of negative samples, $|A^+|$ is the number of positive interactions, and $d$ is the embedding dimensionality.

UltraGCN achieves more than $10\times$ speedup over LightGCN in both per-epoch runtime and convergence speed, with the empirical result that while LightGCN may require over 11 hours for convergence on a benchmark dataset, UltraGCN completes training in approximately 45 minutes.

Such efficiency is primarily attributable to the avoidance of layer-wise embedding propagation and the closed-form alignment to the infinite-propagation limit.

4. Empirical Evaluation and Performance

On four standard benchmarks—Amazon-Book, Yelp2018, Gowalla, and MovieLens-1M—UltraGCN consistently outperforms contemporary GCN-based models. Notably, on Amazon-Book, UltraGCN provides improvements up to $76.6\%$ in NDCG@20 against the strongest baseline. The improvements are statistically significant, with reported $p$ -values in the range $10^{-8}$ to $10^{-5}$ .

Additionally, UltraGCN is highly reproducible, with replications deviating by at most $10^{-3}$ in recall from the original reports across datasets, indicating robustness to stochastic variations or implementation differences (Anelli et al., 2023).

The model demonstrates adaptability to different dataset topologies, with empirical analyses showing that structural characteristics, such as average degree and assortativity, bear strong and statistically significant linear relationships to accuracy (Malitesta et al., 2023). This sensitivity to topological nuances enables UltraGCN to generalize robustly across diverse regimes, from sparse to dense user–item graphs.

5. Addressing Popularity Bias and Tail Recommendations

Despite improved global metrics, GCN-based recommenders—including UltraGCN—are prone to popularity bias: propensity to over-recommend popular items at the expense of long-tail diversity. The DAP (“Debias the Amplification of Popularity”) method intervenes in this bias by estimating the influence of high-degree nodes on embedding geometry and post-hoc removing this effect (Chen et al., 2023).

Within the UltraGCN framework, cluster-based debiasing is performed at inference by separating higher- and lower-degree neighbors, pooling their embeddings, measuring their similarity to each node’s own embedding, and subtracting a weighted projection to yield a de-biased representation:

$\hat{e}_v^{(l)} = e_v^{(l)} - \left [ \alpha \mathcal{M}(e_v^{(l)}, \hat{H}_v^{(l)}) \hat{H}_v^{(l)} + \beta \mathcal{M}(e_v^{(l)}, \hat{L}_v^{(l)}) \hat{L}_v^{(l)} \right ]$

Deploying DAP leads to increased recall and NDCG for tail items, as measured by the TR@20 metric, without significant sacrifices in head performance.

6. Linear Folding-in and Warm-Start Updates

UltraGCN's latest extensions further improve practicality for large-scale, dynamic scenarios by enabling efficient “warm start” recommendations (Yusupov et al., 1 Sep 2025). For newly active users, a closed-form linear folding-in update is computed:

$\min_{e_u} \lVert a_u^\top - \beta_u e_u^\top V^\top B_I \rVert_2^2$

where $a_u$ is the user’s interaction vector, $V$ the item embedding matrix, $B_I$ the diagonal item weight matrix, and $\beta_u$ a user-specific normalization.

The solution is given by:

$e_u^\top = \frac{1}{\beta_u} (a_u^\top B_I^{-1}) V^{+}$

where $V^{+}$ is the (pseudo-)inverse of $V$ .

This approach yields up to $30\times$ speedup in updating recommendations for active users, with resource requirements scaling linearly in item catalogue size and maintaining recommendation quality competitive with full retraining.

7. Limitations, Extensions, and Position within the Literature

While UltraGCN addresses computational limitations and over-smoothing of deep GCNs, it does not fully resolve the challenge of capturing heterophilic (non-homophilic) user–item interactions or local topological structures beyond the degree-normalized aggregate. State-of-the-art hypergraph methods such as WaveHDNN have been shown to outperform UltraGCN on datasets with high category diversity or where group-wise locality is prominent, by integrating explicit heterophily-aware encoders and multi-scale structural modeling (Sakong et al., 28 Jan 2025).

Additionally, studies establishing the theoretical equivalence between item recommendation and link prediction (e.g., with scoring via dot products) reaffirm that UltraGCN’s architecture can be viewed as an advanced instantiation of matrix factorization regularized by graph constraints (Malitesta et al., 11 Sep 2024).

UltraGCN’s empirical robustness, closed-form simplicity, and tunable relations weighting position it as a defensible baseline and practical system in both research and production-scale recommender contexts. Its ability to leverage graph topology via structural constraints, rather than deep propagation or explicit regularization, exemplifies an important trend in the evolution of scalable graph collaborative filtering.