FourierKAN-GCF: Graph Collaborative Filtering
- FourierKAN-GCF is a graph-based recommendation architecture that employs Fourier-parameterized nonlinear transformations to model user–item interactions during message passing.
- It balances the expressive power of neural models with the simplicity of aggregation, leveraging the Kolmogorov–Arnold theorem for universal approximation.
- Empirical results show state-of-the-art performance on benchmarks like MOOC and Amazon Games, with improved Recall@20 and NDCG@20 metrics.
FourierKAN-GCF denotes a class of graph-based recommendation architectures that integrate nonlinear feature transformations based on the Kolmogorov–Arnold theorem, instantiated via a Fourier parameterization, into the message-passing process of graph collaborative filtering models. This design balances the representational capacity of earlier neural graph collaborative filtering approaches with the stability and simplicity of aggregation-centric models. FourierKAN-GCF achieves state-of-the-art accuracy and robustness on collaborative filtering tasks for large-scale, sparse, user–item implicit-feedback graphs (Xu et al., 2024).
1. Foundations and Theoretical Motivation
Graph collaborative filtering (GCF) models exploit the topology of the user–item bipartite interaction graph to propagate user and item embeddings across local neighborhoods. Early approaches such as NGCF (Neural Graph Collaborative Filtering) conduct layer-wise message passing, where each layer's update includes both a self-feature transformation and an explicit interaction term (the element-wise product of user and item embeddings passed through a learned transformation). LightGCN removes both the feature transformation (weight matrix) and the nonlinear activation, retaining only normalized neighbor aggregation. Empirical ablation studies have demonstrated that while removing the self-feature transform (weight matrix $\mathbf{W}_1$) does not degrade accuracy, eliminating the interaction transform (weight matrix $\mathbf{W}_2$) or the interaction term itself diminishes performance, underscoring the value of nonlinear interaction modeling (Xu et al., 2024).
Kolmogorov–Arnold Networks (KANs) offer a theoretical foundation for learning general nonlinear transformations. The Kolmogorov–Arnold theorem guarantees that any continuous multivariate function can be represented as a finite composition of sums of continuous univariate functions, affording a universal approximation mechanism. Embedding a lightweight, expressive interaction function into GCF presents a principled way to reintroduce nonlinearity without the overparameterization or optimization instability associated with standard feedforward MLPs.
2. Fourier KAN Interaction Features
The distinctive component of FourierKAN-GCF is the use of a Fourier-parameterized KAN ("Fourier KAN") to realize the nonlinear interaction transform in message passing. In contrast to spline-parameterized KANs (e.g., employing B-splines), the Fourier KAN implements each univariate function as a finite Fourier series:

$$\mathrm{FourierKAN}(\mathbf{x})_j = \sum_{i=1}^{d} \sum_{k=1}^{G} \big( a_{jik} \cos(k x_i) + b_{jik} \sin(k x_i) \big),$$

where $\mathbf{x} = \mathbf{e}_u \odot \mathbf{e}_i$ is the element-wise product of the user and item embeddings, $G$ is the grid size (number of harmonics), and $a_{jik}, b_{jik}$ are trainable parameters. This parameterization achieves universal approximation through trigonometric series while maintaining parameter efficiency and stable optimization behavior. The original MLP-based interaction transform is thus replaced by the Fourier KAN (Xu et al., 2024).
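The Fourier series transform above can be sketched as follows; this is a minimal illustrative implementation, not the authors' code, and the function name `fourier_kan` and the small-scale coefficient initialization are assumptions:

```python
import numpy as np

def fourier_kan(x, a, b):
    """Single Fourier-parameterized KAN layer (sketch).

    x    : (d_in,) input vector (here, the element-wise product e_u * e_i)
    a, b : (d_out, d_in, G) trainable cosine/sine coefficients
    Output coordinate j is a sum of finite Fourier series, one per input
    coordinate: y_j = sum_i sum_{k=1..G} a[j,i,k] cos(k x_i) + b[j,i,k] sin(k x_i)
    """
    G = a.shape[-1]
    k = np.arange(1, G + 1)                  # harmonics 1..G
    kx = np.outer(x, k)                      # (d_in, G) grid of k * x_i
    cos_kx, sin_kx = np.cos(kx), np.sin(kx)
    return np.einsum("jik,ik->j", a, cos_kx) + np.einsum("jik,ik->j", b, sin_kx)

rng = np.random.default_rng(0)
d, G = 8, 4
e_u, e_i = rng.normal(size=d), rng.normal(size=d)
# small initial coefficients keep high-frequency terms from dominating early
a = rng.normal(scale=0.1, size=(d, d, G))
b = rng.normal(scale=0.1, size=(d, d, G))
y = fourier_kan(e_u * e_i, a, b)            # transformed interaction feature
```

Note that the parameter count is $2 G d_{\text{in}} d_{\text{out}}$, so the grid size $G$ directly trades expressivity against model size.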
3. Layer-wise Propagation and Architecture
The message passing procedure in FourierKAN-GCF modifies the NGCF/LightGCN paradigm by:
- Discarding the self-feature kernel ($\mathbf{W}_1$).
- Retaining only the interaction transform, instantiated as a single-layer Fourier KAN.
- Aggregating neighbor and interaction information with symmetric normalization.
- Applying a single nonlinearity per layer (commonly ReLU).
Formally, for user $u$ and item $i$ at layer $l$, the user-side update is

$$\mathbf{e}_u^{(l+1)} = \sigma\!\Big( \sum_{i \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|\,|\mathcal{N}_i|}} \big( \mathbf{e}_i^{(l)} + \mathrm{FourierKAN}(\mathbf{e}_u^{(l)} \odot \mathbf{e}_i^{(l)}) \big) \Big),$$

with the symmetric expression for $\mathbf{e}_i^{(l+1)}$, where $\sigma$ denotes the activation function. After $L$ layers, the model concatenates the respective embeddings from each layer to generate the final user and item representations.
4. Regularization Strategies and Optimization
Robustness and generalization are enhanced through message dropout and node dropout:
- Message dropout: Each message in the neighbor aggregation is zeroed with probability $p_m$, introducing stochasticity and preventing co-adaptation of edge-wise signals.
- Node dropout: The embedding vector of a node is masked entirely with probability $p_n$ prior to propagation, which fortifies the model against overfitting and adversarial adjacency perturbations.
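The two dropout schemes can be sketched as follows; the inverted-dropout rescaling by $1/(1-p)$ is a common training-time convention assumed here, not necessarily the paper's exact formulation:

```python
import numpy as np

def message_dropout(messages, p, rng):
    """Zero each edge-wise message independently with probability p.

    messages : (n_edges, d) stacked per-edge messages
    Survivors are rescaled by 1/(1-p) (inverted dropout) so the expected
    aggregate is unchanged at training time.
    """
    mask = rng.random(messages.shape[0]) >= p
    return messages * mask[:, None] / (1.0 - p)

def node_dropout(E, p, rng):
    """Mask entire node embedding vectors with probability p before propagation."""
    keep = rng.random(E.shape[0]) >= p
    return E * keep[:, None]

rng = np.random.default_rng(2)
M = rng.normal(size=(6, 4))
M_dropped = message_dropout(M, 0.5, rng)   # roughly half the messages survive
```

Both masks are resampled every training step and disabled at inference.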
Training utilizes the Bayesian Personalized Ranking (BPR) loss typical of implicit-feedback settings:

$$\mathcal{L}_{\mathrm{BPR}} = -\sum_{(u,i,j)} \ln \sigma\big(\hat{y}_{ui} - \hat{y}_{uj}\big) + \lambda \lVert \Theta \rVert_2^2,$$

where $\hat{y}_{ui}$ is the inner product of the layer-concatenated user and item embeddings and $\Theta$ comprises all embedding and Fourier parameters. Optimization is performed with Adam, and regularization of the Fourier coefficients (via an $L_2$ penalty or normalization) is critical for preventing the dominance of high-frequency components.
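The BPR objective for a single $(u, i, j)$ triple can be written out as below; `bpr_loss`, the choice of `lam`, and the regularized parameter list are illustrative assumptions:

```python
import numpy as np

def bpr_loss(e_user, e_pos, e_neg, params, lam=1e-4):
    """BPR loss for one (user, positive item, negative item) triple (sketch).

    Scores are inner products of the final (layer-concatenated) embeddings;
    the L2 term regularizes whichever parameters are passed in `params`.
    """
    y_ui = e_user @ e_pos                    # score for observed interaction
    y_uj = e_user @ e_neg                    # score for sampled negative
    # -ln sigma(y_ui - y_uj), written via log1p(exp(-x)) for stability
    loss = np.log1p(np.exp(-(y_ui - y_uj)))
    reg = lam * sum(np.sum(p ** 2) for p in params)
    return loss + reg

e_u = np.array([1.0, 0.0])
loss = bpr_loss(e_u, np.array([5.0, 0.0]), np.array([-5.0, 0.0]),
                params=[e_u], lam=0.0)      # well-ranked pair: loss near zero
```

Minimizing this pushes each observed item's score above the sampled negative's, which is exactly the pairwise ranking criterion used by the Recall@K and NDCG@K evaluations.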
The per-layer computational cost scales linearly with the number of interaction edges and the grid size $G$, and is typically below that of MLP-based variants when $G$ is small; memory for the Fourier coefficients likewise grows linearly in $G$.
5. Empirical Evaluation and Comparative Results
Empirical studies on real-world interaction datasets with high sparsity (MOOC, Amazon Video Games), evaluated with standard metrics (Recall@K, NDCG@K), demonstrate that FourierKAN-GCF outperforms strong baselines, including BPR-MF, NGCF, LightGCN, UltraGCN, and KAN-GCF. On both MOOC and Amazon Games, FourierKAN-GCF achieves the highest Recall@20 and NDCG@20. Ablation experiments confirm the benefit of both message and node dropout (+1–2% Recall@20) and indicate a consistent advantage for the Fourier-based parameterization over splines (Xu et al., 2024).
| Model | MOOC R@20 | MOOC N@20 | Games R@20 | Games N@20 |
|---|---|---|---|---|
| BPR-MF | 0.3353 | 0.1898 | 0.0369 | 0.0183 |
| NGCF | 0.3361 | 0.1894 | 0.0379 | 0.0196 |
| LightGCN | 0.3307 | 0.1811 | 0.0447 | 0.0227 |
| UltraGCN | 0.3194 | 0.1962 | 0.0459 | 0.0230 |
| FourierKAN-GCF | 0.3564 | 0.2147 | 0.0473 | 0.0252 |
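The Recall@20 and NDCG@20 values in the table above are computed per user and averaged; a minimal per-user sketch (the function name and argument layout are assumptions):

```python
import numpy as np

def recall_ndcg_at_k(ranked_items, relevant, k=20):
    """Recall@K and NDCG@K for one user (sketch).

    ranked_items : item ids sorted by predicted score, descending
    relevant     : set of held-out ground-truth items for this user
    """
    topk = ranked_items[:k]
    hits = [1.0 if it in relevant else 0.0 for it in topk]
    recall = sum(hits) / max(len(relevant), 1)
    # DCG discounts each hit by log2(rank + 2); IDCG is the ideal ordering
    dcg = sum(h / np.log2(r + 2) for r, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(r + 2) for r in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return recall, ndcg

recall, ndcg = recall_ndcg_at_k([0, 1, 2, 3], {0, 1})   # perfect top ranking
```

Averaging these two quantities over all test users yields the R@20 and N@20 columns reported above.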
Performance varies smoothly and robustly with the model's hyperparameters, and the grid size $G$ controls expressivity (with a risk of overfitting if it is oversized on sparse graphs).
6. Connections to Spectral and Filter-Based GCF
From a graph signal processing perspective, collaborative filtering on bipartite user–item graphs is interpreted as spectral filtering, where various methods correspond to different spectral kernels: diffusion transforms, low-pass filters, or polynomial approximations. Linear, closed-form baselines such as GF-CF and SpectralCF use spectral convolution, typically employing scalar filters of the Laplacian eigenvalues without spatial localization or nonlinear interaction modeling (Shen et al., 2021, Alshareet et al., 2023).
FourierKAN-GCF, in contrast, reintroduces nonlinearity at the interaction level via a universal Fourier-based feature transform, while the propagation scheme remains linear. A plausible implication is that FourierKAN-GCF bridges the divide between the interpretability/efficiency of spectral convolutional CF models and the interaction expressivity of neural message passing architectures (Xu et al., 2024).
7. Extensions and Practical Implications
Practical recommendations include initializing Fourier coefficients with small random perturbations, applying normalization or regularization to avoid high-frequency explosion, and preferentially reducing the grid size $G$ on extremely sparse graphs. The current instantiation uses a single Fourier layer per message; hierarchical or multi-layered Fourier KANs, as well as adaptive frequency selection strategies, remain promising future directions (Xu et al., 2024).
The FourierKAN-GCF paradigm can be extended to multi-channel or anisotropic filtering by stacking or summing feature transforms across various Laplacians (e.g., user–item, knowledge graphs, metadata co-occurrence), each with a domain-specific spectral kernel. Optimization of spectral coefficients can be conducted end-to-end or in a modular fashion, providing a pathway to interpretable, parameter-efficient hybrid recommendation systems (Shen et al., 2021).