Continuous-Filter Convolution

Updated 3 March 2026

Continuous-filter convolution is a neural network design that uses continuous, parameterized filters instead of fixed, grid-based kernels, providing adaptable receptive fields.
It enables operations on unstructured data, such as point clouds and molecular graphs, by employing MLPs, basis expansions, or Gaussian derivatives for filter generation.
The approach can improve efficiency and interpolation smoothness and reduce parameter counts, with demonstrated success in quantum chemistry, image processing, and scientific simulations.

A continuous-filter convolutional architecture generalizes classical convolutional neural network (CNN) design by replacing or augmenting the standard discrete convolutional kernel, which has fixed, grid-based support and a finite set of scalar weights, with a parameterized, spatially continuous filter. This paradigm lets convolutional operators natively process signals on unstructured domains, operate at sub-grid spatial precision, treat filter support or receptive field size as learnable parameters, and interpolate network behavior smoothly across tasks or levels. Continuous-filter architectures have driven progress in quantum chemistry, scientific machine learning, image processing, and model efficiency, and they subsume classical convolution as a special case.

1. Mathematical Foundations of Continuous-Filter Convolution

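Continuous-filter convolution is defined by replacing the discrete kernel in standard convolution with a continuous function. Consider an input signal $X = (X^1, \dots, X^{C_{\text{in}}})$ with $C_{\text{in}}$ channels, defined on discrete or continuous points, and a continuous filter function $\mathcal{K}_\theta: \mathbb{R}^n \rightarrow \mathbb{R}^{C_{\text{out}} \times C_{\text{in}}}$ whose $i$-th column $\mathcal{K}_\theta^{i}(u) \in \mathbb{R}^{C_{\text{out}}}$ weights input channel $i$. The output $Y(s) \in \mathbb{R}^{C_{\text{out}}}$ at location $s$ is given by

$Y(s) = \sum_{i=1}^{C_{\text{in}}} \int_{\mathbb{R}^n} \mathcal{K}_\theta^{i}(s-t) \, X^i(t) \, dt.$

When $X$ is known only at sample points $t_j$, the integral is approximated by the weighted sum $\sum_{i=1}^{C_{\text{in}}} \sum_j \mathcal{K}_\theta^{i}(s-t_j) \, X^i(t_j) \, \Delta_j$, where $\Delta_j$ is a quadrature weight such as the local sample spacing.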

Depending on the architecture, the continuous filter may be realized analytically (e.g., as a sum of Gaussian derivatives (Tomen et al., 2024)), parameterized as a basis expansion (e.g., cosine series (Costain et al., 2022)), represented as an MLP over spatial coordinates (Coscia et al., 2022), or constructed via a filter-generating network conditioned on inter-entity distances (e.g., in graphs or point clouds (Schütt et al., 2017)). In grid-based scenarios, the continuous formulation reduces to the classical convolution by restricting the kernel and evaluation domain to discrete locations.
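
To make the reduction to classical convolution concrete, here is a minimal 1-D, single-channel sketch. It assumes a hand-picked triangular ("hat") function as a stand-in for a learned $\mathcal{K}_\theta$ and approximates the integral by a quadrature sum over samples; the names hat_kernel, continuous_conv, and grid_conv are illustrative and not taken from any cited work.

```python
import math

def hat_kernel(u, radius=2.0):
    """Illustrative compactly supported filter K(u) = max(0, 1 - |u| / radius).

    It stands in for a learned K_theta; the choice of function is an assumption.
    """
    return max(0.0, 1.0 - abs(u) / radius)

def continuous_conv(s, points, values, weights, kernel=hat_kernel):
    """Quadrature approximation of Y(s) = integral of K(s - t) X(t) dt from samples X(t_j).

    weights[j] is the quadrature weight of sample j (e.g. its local spacing).
    """
    return sum(kernel(s - t) * x * w for t, x, w in zip(points, values, weights))

def grid_conv(x, kernel=hat_kernel, support=1):
    """Classical discrete convolution y[n] = sum_m K(m) x[n - m] with zero padding."""
    taps = {m: kernel(float(m)) for m in range(-support, support + 1)}
    return [
        sum(taps[m] * x[n - m] for m in taps if 0 <= n - m < len(x))
        for n in range(len(x))
    ]

x = [0.0, 1.0, 3.0, 2.0, 0.0]
grid = [float(j) for j in range(len(x))]  # unit-spaced sample locations
ones = [1.0] * len(x)                     # unit quadrature weights

# On a unit grid with unit weights, the continuous layer reproduces the discrete one.
cont = [continuous_conv(float(n), grid, x, ones) for n in range(len(x))]
disc = grid_conv(x)
assert all(math.isclose(a, b) for a, b in zip(cont, disc))

# The continuous layer can also be queried between grid points (sub-grid precision).
print(continuous_conv(2.5, grid, x, ones))  # → 4.0
```

Sampling the hat function at the integer offsets -1, 0, 1 gives the taps 0.5, 1.0, 0.5, so both paths agree on the grid, while only the continuous path can be evaluated at s = 2.5.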

The following table summarizes representative parameterizations:

Approach	Filter Representation	Key Formula/Parameterization
MLP-based (Coscia et al., 2022)	$\mathcal{K}_\theta(u) \approx f_\theta(u)$ (MLP)	$h^{(i+1)} = \sigma(W^{(i)} h^{(i)} + b^{(i)})$
Basis expansion (Costain et al., 2022)	Cosine/Chebyshev: $\widehat{w}(x,y) = \sum a_{ij}\,\varphi_i(x)\varphi_j(y)$	$a_{ij}$ coefficients, $\varphi$ basis (cos, Chebyshev)
Gaussian N-jet (Tomen et al., 2024)	$F_{\alpha}^{ji}(x, y; \sigma) = \sum_{l+k\leq N} \alpha_{l,k}^{ji} G^{(l,k)}(x,y;\sigma)$	$\sigma$ is learned, $G^{(l,k)}$ are derivatives of Gaussian
Filter network (Schütt et al., 2017)	Conditioned on $r_i - r_j$ or $d_{ij} = \lVert r_i - r_j \rVert$	$W^l(r_i - r_j)$ via MLP and RBF expansion over distance
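
As an illustration of the basis-expansion row, the sketch below assumes the cosine basis $\varphi_i(x) = \cos(i\pi(x+1)/2)$ on $[-1,1]$; this basis, its domain, and the coefficient values are assumptions, and the cited work may normalize or place the basis differently. Four coefficients $a_{ij}$ define one continuous filter that can be sampled at any kernel size or evaluated at sub-pixel offsets.

```python
import math

def cosine_basis(i, x):
    """phi_i(x) = cos(i * pi * (x + 1) / 2) on [-1, 1]; this basis choice is an assumption."""
    return math.cos(i * math.pi * (x + 1.0) / 2.0)

def cosine_filter(coeffs, x, y):
    """Evaluate w(x, y) = sum_ij a_ij phi_i(x) phi_j(y) at any continuous offset (x, y)."""
    return sum(
        a * cosine_basis(i, x) * cosine_basis(j, y)
        for i, row in enumerate(coeffs)
        for j, a in enumerate(row)
    )

def sample_kernel(coeffs, size):
    """Sample the continuous filter on a size x size grid spanning [-1, 1]^2 (size >= 2)."""
    pts = [-1.0 + 2.0 * n / (size - 1) for n in range(size)]
    # Rows index y and columns index x.
    return [[cosine_filter(coeffs, x, y) for x in pts] for y in pts]

# Hypothetical coefficients a_ij of a 2 x 2 expansion: 4 parameters in total.
coeffs = [[0.5, 0.2], [0.1, 0.0]]
n_params = sum(len(row) for row in coeffs)
k3 = sample_kernel(coeffs, 3)                # a 3x3 kernel
k7 = sample_kernel(coeffs, 7)                # the same filter at 7x7 resolution
off_grid = cosine_filter(coeffs, 0.3, -0.7)  # a sub-pixel offset, no interpolation needed
```

The parameter count is fixed by the number of basis terms rather than by the kernel size, which is the source of the parameter savings discussed later.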

2. Filter Learning, Adaptation, and Interpolation

Continuous-filter architectures also support filter adaptation and interpolation. The Filter Transition Network (FTN) (Lee et al., 2020, Lee et al., 2020) enables continuous-level learning: FTN morphs filters between discrete task-specific endpoints (e.g., from denoising at noise level $\sigma=20$ to denoising at $\sigma=80$) via a parameter $\lambda \in [0,1]$ using

$W(\lambda) = (1-\lambda) W_b + \lambda \, \text{FTN}(W_b),$

where $W_b$ is the baseline kernel and $\text{FTN}(W_b)$ is the transformed kernel for the target level, so $\lambda = 0$ recovers the baseline and $\lambda = 1$ the target. This supports smooth, interpretable network steering; empirically, it yields better interpolation and adaptation than direct weight interpolation or feature-space augmentation.
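
A minimal sketch of the interpolation rule $W(\lambda) = (1-\lambda) W_b + \lambda \, \text{FTN}(W_b)$, with kernels flattened to Python lists. The toy_ftn function is a hypothetical placeholder for a trained FTN; only the blending formula comes from the text.

```python
def interpolate_kernel(w_base, ftn, lam):
    """W(lambda) = (1 - lambda) * W_b + lambda * FTN(W_b), applied entrywise to a flat kernel."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    w_target = ftn(w_base)  # transformed kernel for the target level
    return [(1.0 - lam) * b + lam * t for b, t in zip(w_base, w_target)]

def toy_ftn(w):
    """Hypothetical stand-in for a trained FTN: any kernel-to-kernel map of the same shape."""
    return [2.0 * v + 0.1 for v in w]

w_b = [0.2, -0.5, 1.0]  # hypothetical baseline kernel, flattened
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, interpolate_kernel(w_b, toy_ftn, lam))
```

Because $\text{FTN}(W_b)$ does not depend on the input, it can be computed once and reused, so sweeping $\lambda$ only costs an elementwise blend of two kernels.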

FTN architectures typically use identity-initialized, grouped 1x1 convolutions and pointwise nonlinearities, which strongly regularize filter transitions so that interpolations stay faithful and largely free of artifacts at negligible compute cost. A two-stage training protocol, in which (1) the baseline is trained and (2) the baseline is frozen while the FTN is trained, keeps training stable, and the degree of grouping controls the trade-off between adaptation capacity and filter similarity (Lee et al., 2020, Lee et al., 2020).
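
The sketch below illustrates identity initialization and grouping. It assumes a kernel stored as kernel[channel][tap] and a single grouped 1x1 linear layer that mixes channels identically at every spatial tap; it omits the nonlinearities and both training stages, and this layout is a hypothetical simplification rather than the published FTN architecture.

```python
def identity_grouped_1x1(channels, groups):
    """One channel-mixing matrix per group, each initialized to the identity."""
    if channels % groups:
        raise ValueError("channels must be divisible by groups")
    g = channels // groups
    return [[[1.0 if r == c else 0.0 for c in range(g)] for r in range(g)]
            for _ in range(groups)]

def apply_grouped_1x1(kernel, mats):
    """Apply a grouped 1x1 'convolution' to a kernel laid out as kernel[channel][tap].

    Each group mixes only its own channels, with the same matrix at every spatial tap.
    """
    g = len(mats[0])
    taps = len(kernel[0])
    out = []
    for grp, m in enumerate(mats):
        base = grp * g
        for r in range(g):
            out.append([sum(m[r][c] * kernel[base + c][p] for c in range(g))
                        for p in range(taps)])
    return out

def n_params(mats):
    """Number of trainable entries across all group matrices."""
    return sum(len(m) ** 2 for m in mats)

# Hypothetical 4-channel kernel with 2 spatial taps per channel.
kernel = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6], [0.7, -0.8]]
for groups in (1, 2, 4):
    mats = identity_grouped_1x1(4, groups)
    assert apply_grouped_1x1(kernel, mats) == kernel  # identity at initialization
    print(groups, "groups ->", n_params(mats), "parameters")
```

At initialization the module returns the baseline kernel unchanged for any grouping. With more groups, each matrix mixes fewer channels and the module has fewer parameters (16, 8, and 4 here), which is one way to read the trade-off between adaptation capacity and similarity to the baseline filter.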

3. Architectures for Unstructured and Continuous Domains

Continuous-filter convolutions extend convolutional operations to unstructured domains such as point clouds, molecular graphs, and finite-element (FEM) meshes. In SchNet (Schütt et al., 2017), continuous filters generated by MLPs from interatomic distances replace discrete grid-based filters, enabling rotation-, translation-, and permutation-invariant operators to be built directly on molecular systems. The forward pass updates each atom-wise representation by aggregating over all neighboring atoms with a filter $W^l(r_i - r_j)$ evaluated at continuous spatial offsets, $x_i^{l+1} = \sum_j x_j^l \circ W^l(r_i - r_j)$, where $\circ$ denotes elementwise multiplication.
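
A minimal sketch of this continuous-filter convolution for a toy molecule. It assumes a filter generated from the scalar distance $\lVert r_i - r_j \rVert$ through a Gaussian radial basis expansion followed by a single untrained linear layer; SchNet's actual filter-generating network has more layers and nonlinearities, and the values of gamma, centers, and dense here are illustrative. Excluding the self term $j = i$ is also a choice made for this sketch.

```python
import math

def rbf_expand(d, centers, gamma=10.0):
    """Gaussian radial basis expansion e_k(d) = exp(-gamma * (d - mu_k)^2) of a distance d."""
    return [math.exp(-gamma * (d - mu) ** 2) for mu in centers]

def filter_net(d, centers, dense):
    """Hypothetical single linear layer mapping the RBF expansion of d to a filter vector."""
    e = rbf_expand(d, centers)
    return [sum(w * ek for w, ek in zip(row, e)) for row in dense]

def cfconv(positions, features, centers, dense):
    """x_i' = sum over j != i of x_j * W(|r_i - r_j|), with * taken elementwise."""
    out = []
    for i, ri in enumerate(positions):
        acc = [0.0] * len(features[0])
        for j, rj in enumerate(positions):
            if j == i:
                continue  # this sketch excludes self-interaction
            w = filter_net(math.dist(ri, rj), centers, dense)
            acc = [a + xf * wf for a, xf, wf in zip(acc, features[j], w)]
        out.append(acc)
    return out

# Hypothetical 3-atom system with 2 features per atom and untrained filter weights.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.2, 0.3)]
features = [[1.0, 0.5], [0.2, -1.0], [0.7, 0.3]]
centers = [0.5, 1.0, 1.5]
dense = [[0.3, -0.2, 0.5], [0.1, 0.4, -0.3]]
out = cfconv(positions, features, centers, dense)

# Rotate by 90 degrees about z, then translate: all distances, hence all outputs, are unchanged.
moved = [(-y + 1.0, x + 2.0, z - 3.0) for x, y, z in positions]
out_moved = cfconv(moved, features, centers, dense)
```

Because only interatomic distances reach the filter, rotating and translating the molecule leaves the output unchanged, and reordering the atoms simply reorders the per-atom outputs.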

Similarly, the CCNN model (Coscia et al., 2022) applies a learned MLP filter to data sampled at arbitrary point locations and produces outputs via sums over local neighborhoods, directly supporting tasks defined on irregular spatial domains (e.g., scientific simulation snapshots, point clouds). No gridding or interpolation is required; integration and neighbor search respect the mesh structure.
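
A minimal sketch of a neighborhood-sum convolution on scattered 2-D points, in the spirit of the paragraph above. The small tanh MLP with seeded random weights stands in for a trained filter, and the plain unnormalized sum, the radius cutoff, and all names are assumptions of this sketch rather than details of CCNN.

```python
import math
import random

def make_mlp(hidden=8, seed=0):
    """Hypothetical 2 -> hidden -> 1 tanh MLP; seeded random weights stand in for trained ones."""
    rng = random.Random(seed)
    w1 = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(hidden)]
    b1 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

    def mlp(dx, dy):
        h = [math.tanh(a * dx + b * dy + c) for (a, b), c in zip(w1, b1)]
        return sum(w * hk for w, hk in zip(w2, h))

    return mlp

def point_conv(query, points, values, mlp, radius):
    """Y(s) = sum of mlp(s - t_j) * X(t_j) over samples t_j within `radius` of s."""
    sx, sy = query
    total = 0.0
    for (tx, ty), v in zip(points, values):
        if math.hypot(sx - tx, sy - ty) <= radius:
            total += mlp(sx - tx, sy - ty) * v
    return total

mlp = make_mlp()
points = [(0.0, 0.0), (0.4, 0.1), (1.3, -0.2), (5.0, 5.0)]    # irregular sample locations
values = [1.0, -0.5, 2.0, 3.0]                                # one scalar channel X(t_j)
y = point_conv((0.2, 0.05), points, values, mlp, radius=1.5)  # off-sample query point
```

The query location need not coincide with any input sample, and the result does not depend on the order in which samples are stored.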

CGNN (Alberti et al., 2022) further generalizes this paradigm to operator learning in function spaces: its layers are defined on wavelet decomposition spaces, and its convolutions act on functions with continuous support, which enables injective, stable generative models for inverse problems in infinite-dimensional spaces.
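
CGNN's layers are not reproduced here. As a hedged illustration of the two properties the paragraph relies on, injectivity and stability, the sketch below implements one level of the orthonormal Haar wavelet transform, which is exactly invertible and preserves the Euclidean norm of the signal. Haar is chosen only for simplicity; the wavelets used in CGNN may differ.

```python
import math

def haar_analysis(signal):
    """One level of the orthonormal Haar transform: returns (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [s * (a + b) for a, b in pairs]
    detail = [s * (a - b) for a, b in pairs]
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse of haar_analysis; exact up to rounding because the transform is orthonormal."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out

x = [4.0, 2.0, -1.0, 3.0, 0.5, 0.5]
approx, detail = haar_analysis(x)
recovered = haar_synthesis(approx, detail)        # injective: x is recovered from its coefficients
energy_in = sum(v * v for v in x)
energy_out = sum(v * v for v in approx + detail)  # stable: the norm is preserved
```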

4. Practical Implementation and Computational Considerations

Implementing continuous-filter convolutional layers introduces trade-offs:

Kernel evaluation: The filter is generated per sub-pixel offset, often via an MLP or basis expansion; evaluating it costs more than reading weights from a fixed grid, but the evaluation can be vectorized efficiently (Shocher et al., 2020, Coscia et al., 2022).
Integration/domain truncation: Integrations over space are approximated by summing over local neighborhoods within the filter's compact support or by projection onto a scaling basis (e.g., wavelets (Alberti et al., 2022)).
Parameter efficiency: Basis expansions (e.g., cosine, N-jet) yield significant parameter reductions. ApproxConv achieves $\approx 50\%$ parameter savings at $<2\%$ accuracy loss by regressing filters into a low-order basis and fine-tuning (Costain et al., 2022). Deep Continuous Networks (DCN) achieve accuracy comparable to ResNet/ODENet with 40–50% fewer parameters, using 7 trainable parameters per filter (Tomen et al., 2024).
Adaptive receptive field: DCNs make spatial scale learnable (e.g., by learning $\sigma$ in the Gaussian N-jet expansion), and the learned scales have been shown to match distributional properties observed in biological vision (Tomen et al., 2024).
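
As an illustration of the N-jet parameterization and the learnable scale, the sketch below builds $F(x, y) = \sum_{l+k \leq 2} \alpha_{l,k} G^{(l,k)}(x, y; \sigma)$ from separable Gaussian derivatives. With $N = 2$ there are six coefficients $\alpha_{l,k}$ plus $\sigma$, i.e. seven numbers per filter. The truncation radius $\lceil 2\sigma \rceil$ and the coefficient values are assumptions of this sketch, chosen so that the sampled kernel grows with $\sigma$.

```python
import math

def gauss_1d_derivs(x, sigma):
    """Return (g, g', g'') for the 1-D Gaussian g(x; sigma) evaluated at x."""
    g = math.exp(-x * x / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)
    return (g, -x / sigma ** 2 * g, (x * x / sigma ** 4 - 1.0 / sigma ** 2) * g)

# All derivative orders (l, k) with l + k <= N for N = 2: six terms.
ORDERS = [(l, k) for l in range(3) for k in range(3) if l + k <= 2]

def njet_filter(alphas, sigma, truncate=2.0):
    """Sample F(x, y) = sum alpha_lk * G^(l,k)(x, y; sigma) on an integer grid.

    G^(l,k) = g^(l)(x) g^(k)(y) because the 2-D Gaussian is separable. The grid radius
    ceil(truncate * sigma) is an assumption that lets the support follow sigma.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    r = max(1, math.ceil(truncate * sigma))
    kernel = []
    for y in range(-r, r + 1):
        gy = gauss_1d_derivs(y, sigma)
        row = []
        for x in range(-r, r + 1):
            gx = gauss_1d_derivs(x, sigma)
            row.append(sum(a * gx[l] * gy[k] for a, (l, k) in zip(alphas, ORDERS)))
        kernel.append(row)
    return kernel

alphas = [0.8, -0.3, 0.1, 0.5, 0.0, 0.2]  # hypothetical alpha_lk, in ORDERS order
params_per_filter = len(alphas) + 1       # six alphas plus sigma
small = njet_filter(alphas, sigma=1.0)    # 5x5 support
large = njet_filter(alphas, sigma=2.0)    # 9x9 support, same seven parameters
```

Doubling $\sigma$ from 1 to 2 grows the sampled kernel from 5x5 to 9x9 while the parameter count stays at seven, which is how a learned $\sigma$ translates into a learned receptive field.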

Efficiency gains, compatibility with quantization and pruning, and the possibility of end-to-end differentiability across variable sampling grids make these approaches attractive for scalable models and deployment (Costain et al., 2022, Shocher et al., 2020).

5. Applications and Empirical Outcomes

Continuous-filter architectures have achieved state-of-the-art results or significant efficiency gains in multiple domains:

Quantum chemistry/physics: SchNet delivers accurate, physically consistent modeling of molecular energies and forces, with an energy MAE of ≈0.31 kcal/mol on QM9 and <0.12 kcal/mol on MD17 (Schütt et al., 2017). CGNN ensures injectivity and stability for continuum inverse problems (Alberti et al., 2022).
Image processing and computer vision: FTN-based continuous-level learning (CLL) networks achieve smoother interpolation and greater flexibility than feature-space-tuned or discretely interpolated models, with <0.1% GFLOP overhead and a negligible accuracy drop (Lee et al., 2020, Lee et al., 2020). Continuous convolution (CC) layers support sub-pixel, anisotropic, and fractional scaling at inference, improving alignment and equivariance (Shocher et al., 2020).
Compression and efficiency: ApproxConv demonstrates that regressing discrete filters into a low-order basis with subsequent fine-tuning preserves accuracy while halving parameter count (Costain et al., 2022).
Scientific machine learning: CCNN and CGNN architectures support learning from data on unstructured grids, outperforming pure MLP models on PDE solution snapshots and inpainting with high accuracy (Coscia et al., 2022, Alberti et al., 2022).
Biologically inspired modeling: DCN's learned filter scales match distributions observed in biological vision, and DCN outperforms conventional architectures on pattern completion and in low-data settings (Tomen et al., 2024).
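
The compression result for ApproxConv rests on regressing trained filters into a low-order basis. The sketch below assumes the orthonormal 2-D DCT-II basis on a k x k grid, for which the least-squares fit onto the terms with $i + j \leq$ order reduces to a projection (one inner product per coefficient). ApproxConv's exact basis and its fine-tuning step are not reproduced, and the example kernel is hypothetical.

```python
import math

def dct_vec(i, k):
    """Orthonormal DCT-II basis vector of index i on k samples."""
    scale = math.sqrt((1.0 if i == 0 else 2.0) / k)
    return [scale * math.cos(math.pi * (2 * n + 1) * i / (2 * k)) for n in range(k)]

def fit_low_order(kernel, order):
    """Least-squares fit of a k x k kernel using only cosine terms with i + j <= order.

    The basis is orthonormal, so each coefficient is a single inner product.
    """
    k = len(kernel)
    basis = [dct_vec(i, k) for i in range(k)]
    coeffs = {
        (i, j): sum(kernel[y][x] * basis[i][x] * basis[j][y]
                    for y in range(k) for x in range(k))
        for i in range(k) for j in range(k) if i + j <= order
    }
    approx = [[sum(a * basis[i][x] * basis[j][y] for (i, j), a in coeffs.items())
               for x in range(k)] for y in range(k)]
    return coeffs, approx

def sq_error(a, b):
    """Sum of squared entrywise differences between two k x k kernels."""
    return sum((u - v) ** 2 for ra, rb in zip(a, b) for u, v in zip(ra, rb))

kernel = [[0.1, 0.4, -0.2], [0.3, 1.0, 0.5], [-0.1, 0.2, 0.0]]  # hypothetical 3x3 filter
for order in range(5):
    coeffs, approx = fit_low_order(kernel, order)
    print(order, len(coeffs), "coefficients, squared error", round(sq_error(approx, kernel), 6))
```

The fit error cannot increase with the order, because each order projects onto a larger nested subspace; order 2(k-1) keeps all k² coefficients and reproduces the kernel.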

6. Comparisons, Regularization, and Architectural Choices

Empirical ablations highlight the importance of regularization and parameterization:

FTN grouping: Increased grouping yields smoother interpolation but less flexibility at task extremes (Lee et al., 2020, Lee et al., 2020).
Weight regularity: Basis expansions enforce smoothness and enable global regression of filters (Costain et al., 2022, Tomen et al., 2024).
Identity initialization: Ensures stable filter transitions in FTN and continuous-level networks, avoiding catastrophic changes at the endpoints (Lee et al., 2020, Lee et al., 2020).
Data-independence: Data-agnostic FTN modules provide efficiency and modularity, as overhead depends on filter dimension, not input size (Lee et al., 2020).
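
A back-of-the-envelope sketch of why a data-independent FTN is cheap. It assumes a hypothetical 3x3, 64-to-64-channel convolution and an FTN consisting of one grouped 1x1 layer with group size 16 applied to the kernel tensor; these sizes are illustrative and do not reproduce any reported overhead figure.

```python
def conv_macs(c_in, c_out, k, h, w):
    """Multiply-accumulates of a dense k x k convolution producing an h x w output map."""
    return c_in * c_out * k * k * h * w

def ftn_macs(c_in, c_out, k, group_size):
    """MACs of one hypothetical grouped 1x1 layer applied to the kernel tensor itself.

    Each of the c_out * c_in * k * k kernel entries receives group_size multiply-adds;
    no input-size term appears, reflecting the data-independence of the module.
    """
    return c_out * c_in * k * k * group_size

layer = dict(c_in=64, c_out=64, k=3)  # hypothetical layer sizes
for side in (64, 256, 1024):
    ratio = ftn_macs(**layer, group_size=16) / conv_macs(**layer, h=side, w=side)
    print(f"{side}x{side} output: FTN adds {100 * ratio:.4f}% of the convolution's MACs")
```

The ratio simplifies to group_size / (h·w), so the relative overhead shrinks as the input grows, and the transformed kernel can additionally be cached and reused across inputs.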

Side effects observed in naive filter or feature interpolation (e.g., color drift, artifacts) are substantially mitigated in methods that interpolate in the filter parameter space with strong architectural constraints.

7. Limitations and Open Directions

Computationally, continuous-filter convolutions can impose overheads in kernel evaluation and neighbor search, especially on large unstructured domains. Efficient implementation often requires batch evaluation, strided or grouped processing, and caching strategies (Coscia et al., 2022). Hyperparameter tuning for MLP size or basis order is necessary to balance expressiveness and regularization. For CCNNs, per-stride neighbor search can be prohibitive without spatial data structures.
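
One common spatial data structure for this purpose is a uniform cell list. The sketch below, an illustration rather than the method of any cited work, buckets 2-D points into square cells whose side equals the search radius, so each query inspects only the 3x3 block of cells around it instead of every point.

```python
import math
import random
from collections import defaultdict

def build_cells(points, radius):
    """Bucket 2-D points into square cells of side `radius` (a uniform cell list)."""
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(math.floor(x / radius), math.floor(y / radius))].append(idx)
    return cells

def neighbors(query, points, cells, radius):
    """Indices of points within `radius` of `query`, scanning only the 3x3 surrounding cells."""
    qx, qy = query
    cx, cy = math.floor(qx / radius), math.floor(qy / radius)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for idx in cells.get((cx + dx, cy + dy), ()):
                px, py = points[idx]
                if math.hypot(px - qx, py - qy) <= radius:
                    found.append(idx)
    return sorted(found)

rng = random.Random(1)
pts = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(300)]
radius = 0.8
cells = build_cells(pts, radius)  # built once, reused for every query
near = neighbors((5.0, 5.0), pts, cells, radius)
```

Any point within the radius of the query lies in a cell whose index differs from the query's cell by at most one along each axis, so the 3x3 scan returns the same neighbors as a brute-force search over all points.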

Extending these models to higher dimensions, increasing their interpretability for scientific domains, and coupling with novel architectures (e.g., continuous-depth ODE solvers, functional priors) remain active research areas. Further study into the learning dynamics of scale parameters, especially in meta-parameterized or biological analogs, is also ongoing (Tomen et al., 2024).

References are to (Schütt et al., 2017, Lee et al., 2020, Lee et al., 2020, Costain et al., 2022, Shocher et al., 2020, Tomen et al., 2024, Coscia et al., 2022), and (Alberti et al., 2022).