
Learnable Spacing & Interpolation (DCLS)

Updated 15 April 2026
  • Learnable Spacing and Interpolation (DCLS) is a neural technique that parameterizes non-zero convolutional tap positions as continuous, learnable variables.
  • It employs differentiable interpolation methods, like bilinear and Gaussian kernels, to optimize tap positions via gradient descent for adaptive receptive fields.
  • Empirical results show that DCLS boosts performance and interpretability in vision, audio, and spiking neural networks, while modestly increasing computational cost.

Learnable Spacing and Interpolation, best known through DCLS (Dilated Convolution with Learnable Spacings), refers to a family of neural architectural techniques in which the positions of the non-zero elements (“taps”) in a convolutional or delay kernel are parameterized as continuous, learnable variables rather than fixed integer positions on a regular grid. Because a differentiable interpolation operator maps these positions onto the grid, they can be optimized by gradient descent, allowing the network to adapt receptive fields, temporal alignments, or context aggregation to the task and data distribution. DCLS was first introduced for vision and later extended to audio, spiking neural networks, and other deep learning modalities; across a range of supervised learning benchmarks it has shown consistent improvements over standard and classical dilated convolutions (Khalfaoui-Hassani et al., 2021, Khalfaoui-Hassani et al., 2023, Khalfaoui-Hassani, 2024, Khalfaoui-Hassani et al., 2023, Chamas et al., 2024, Hammouamri et al., 2023).

1. Mathematical Formulation and Core Principles

For a standard $d$-dimensional dilated convolution, the kernel samples the input at fixed, integer-offset positions determined by a dilation factor. DCLS generalizes this by introducing a set of learnable, real-valued offsets for each kernel tap. In 2D, a DCLS kernel with $m$ nonzero elements is parameterized by learnable weights $\{w_k\}$ and real-valued positions $\{(p^x_k, p^y_k)\}$:

$$Y(x, y) = \sum_{k=1}^{m} w_k \cdot X\big(x + p^x_k,\; y + p^y_k\big)$$

For non-integer positions $(p^x_k, p^y_k)$, DCLS samples $X$ through a differentiable interpolation scheme (e.g., bilinear/triangle or Gaussian kernels). These interpolants make the gradients with respect to both the kernel weights and the positional parameters well-defined (almost everywhere, in the case of the piecewise-linear triangle kernel), enabling effective end-to-end training (Khalfaoui-Hassani et al., 2021, Khalfaoui-Hassani et al., 2023, Khalfaoui-Hassani, 2024).
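
To make the formula concrete, here is a minimal 1D sketch in plain Python. It assumes zero padding outside the signal and linear (triangle) interpolation for fractional positions; the function names `sample_linear` and `dcls_1d_direct` are illustrative, not part of any DCLS library. With integer offsets it reduces to an ordinary dilated convolution.

```python
import math


def sample_linear(x, t):
    """Read signal x at a real-valued index t via linear (triangle) interpolation.

    Indices outside [0, len(x) - 1] read as zero (zero padding).
    """
    i0 = math.floor(t)
    frac = t - i0
    value = 0.0
    # The two nearest grid points get weights (1 - frac) and frac.
    for i, weight in ((i0, 1.0 - frac), (i0 + 1, frac)):
        if 0 <= i < len(x) and weight != 0.0:
            value += weight * x[i]
    return value


def dcls_1d_direct(x, weights, positions):
    """Y(n) = sum_k w_k * X(n + p_k) with real-valued, learnable offsets p_k."""
    return [
        sum(w * sample_linear(x, n + p) for w, p in zip(weights, positions))
        for n in range(len(x))
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Integer offsets 0, 2, 4 reproduce a 3-tap convolution with dilation 2.
print(dcls_1d_direct(x, [1.0, 1.0, 1.0], [0, 2, 4]))  # [9.0, 12.0, 8.0, 10.0, 5.0, 6.0]
# A fractional offset of 0.5 averages neighbouring samples.
print(dcls_1d_direct(x, [1.0], [0.5]))  # [1.5, 2.5, 3.5, 4.5, 5.5, 3.0]
```

Moving an offset from 0.5 towards 1.0 smoothly shifts the weight from one sample to the next, which is exactly what makes the position differentiable.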

The $d$-dimensional generalization is immediate: each kernel tap is parameterized by a $d$-vector of continuous offsets, and task-adaptive sampling is performed with an appropriate, typically separable, interpolation kernel.
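
For the separable triangle interpolant, a tap in $d$ dimensions touches at most $2^d$ grid points, each weighted by the product of per-axis triangle weights. The sketch below (hypothetical helper `separable_linear_weights`, plain Python) builds that stencil for an arbitrary real-valued point and shows that the weights sum to one.

```python
import itertools
import math


def separable_linear_weights(point):
    """Multilinear interpolation stencil for a real-valued point in d dimensions.

    Returns {grid_index_tuple: weight}. Each weight is the product of the 1D
    triangle weights Lambda(t) = max(0, 1 - |t|) along every axis, so a
    d-dimensional tap touches at most 2**d grid points.
    """
    floors = [math.floor(c) for c in point]
    stencil = {}
    for corner in itertools.product((0, 1), repeat=len(point)):
        idx = tuple(f + c for f, c in zip(floors, corner))
        weight = 1.0
        for i, c in zip(idx, point):
            weight *= max(0.0, 1.0 - abs(c - i))
        if weight > 0.0:  # drop corners that receive no mass
            stencil[idx] = weight
    return stencil


w = separable_linear_weights((1.25, 2.5, 0.0))
# The integer third coordinate collapses that axis, leaving 4 of the 8 corners.
print(len(w), sum(w.values()))  # 4 1.0
```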

2. Interpolation Schemes and Learnability

DCLS achieves differentiability with respect to the position parameters by using interpolation kernels that vary continuously with the tap position and are differentiable almost everywhere. Two primary classes are used:

  • Bilinear (triangle) interpolation: each tap contributes mass to the four nearest grid points, weighted by the area of overlap. The 1D interpolation function is $\Lambda(x) = \max(0,\, 1 - |x|)$, and the separable product $\Lambda(x)\,\Lambda(y)$ yields the classic bilinear $2 \times 2$ stencil.
  • Gaussian interpolation: each tap spreads its mass over an entire neighborhood according to a Gaussian kernel $G_\sigma(x) = \exp\!\big(-x^2 / (2\sigma^2)\big)$, normalized so that its values on the grid sum to one. The spread $\sigma$ can itself be a learnable parameter or be scheduled during training.
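
Both interpolants can be written down with their derivatives in a few lines. A tap at position $p$ contributes $\phi(i - p)$ to grid point $i$, so its positional gradient is $-\phi'(i - p)$. The plain-Python sketch below is illustrative only: it picks 0 as the (sub)gradient at the triangle's kinks, which is an assumption, and checks each analytic derivative against a central finite difference.

```python
import math


def triangle(t):
    """Lambda(t) = max(0, 1 - |t|): the 1D bilinear (triangle) interpolant."""
    return max(0.0, 1.0 - abs(t))


def triangle_grad(t):
    """dLambda/dt. Piecewise constant; we pick 0 at the kinks t = -1, 0, 1."""
    if t == 0.0 or abs(t) >= 1.0:
        return 0.0
    return -1.0 if t > 0.0 else 1.0


def gaussian(t, sigma):
    """Unnormalized G_sigma(t) = exp(-t^2 / (2 sigma^2))."""
    return math.exp(-t * t / (2.0 * sigma * sigma))


def gaussian_grad_t(t, sigma):
    """dG/dt = -(t / sigma^2) * G(t); smooth everywhere."""
    return -t / (sigma * sigma) * gaussian(t, sigma)


def gaussian_grad_sigma(t, sigma):
    """dG/dsigma = (t^2 / sigma^3) * G(t); this is what lets sigma be learned."""
    return t * t / sigma ** 3 * gaussian(t, sigma)


def central_diff(f, u, h=1e-6):
    """Numerical derivative used to sanity-check the analytic gradients."""
    return (f(u + h) - f(u - h)) / (2.0 * h)


t, sigma = 0.3, 0.8
print(triangle_grad(t), central_diff(triangle, t))
print(gaussian_grad_t(t, sigma), central_diff(lambda u: gaussian(u, sigma), t))
print(gaussian_grad_sigma(t, sigma), central_diff(lambda s: gaussian(t, s), sigma))
```

The triangle's gradient is constant within a cell and only "sees" the two neighbouring grid points, whereas the Gaussian's gradient is smooth and also provides a derivative with respect to $\sigma$.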

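The forward map for a single “impulse” of weight $w_k$ at position $(p^x_k, p^y_k)$ with scale $\sigma_k$ is given by

$$K_k[i, j] = w_k \,\frac{\phi_{\sigma_k}(i - p^x_k)\,\phi_{\sigma_k}(j - p^y_k)}{\sum_{i', j'} \phi_{\sigma_k}(i' - p^x_k)\,\phi_{\sigma_k}(j' - p^y_k)},$$

where $\phi_\sigma$ is either the triangle or the Gaussian kernel, and the normalization ensures that the total mass equals $w_k$. The full kernel is the sum $K = \sum_{k=1}^{m} K_k$ (Khalfaoui-Hassani et al., 2023).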
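
A minimal sketch of the single-impulse splat, assuming a separable kernel $\phi(i - p^x)\,\phi(j - p^y)$ on an $s \times s$ grid represented as nested Python lists. The name `splat_impulse` and the spread $\sigma = 0.7$ are hypothetical choices for illustration. For a triangle tap lying inside the support the normalizer is already 1, so normalization mainly matters for the Gaussian, whose tails are truncated by the grid.

```python
import math


def splat_impulse(w, px, py, size, phi):
    """Spread one weighted tap at real position (px, py) onto a size x size grid.

    K[i][j] = w * phi(i - px) * phi(j - py) / Z, where Z sums the unnormalized
    values over the grid, so the kernel's total mass equals w.
    """
    raw = [[phi(i - px) * phi(j - py) for j in range(size)] for i in range(size)]
    z = sum(sum(row) for row in raw)
    if z == 0.0:
        raise ValueError("tap lies entirely outside the kernel support")
    return [[w * v / z for v in row] for row in raw]


def triangle(t):
    return max(0.0, 1.0 - abs(t))


def gaussian(t, sigma=0.7):
    # sigma = 0.7 is an arbitrary illustrative spread.
    return math.exp(-t * t / (2.0 * sigma * sigma))


for phi in (triangle, gaussian):
    K = splat_impulse(2.5, 3.4, 1.8, size=7, phi=phi)
    total = sum(sum(row) for row in K)
    print(phi.__name__, round(total, 9))  # both totals equal w = 2.5
```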

This flexible interpolation provides informative gradients for both weights and positions, allowing kernel taps to be placed anywhere within the kernel support and, with the Gaussian kernel, allowing each tap's local receptive-field extent to be modulated.

3. Implementation and Optimization

The canonical DCLS implementation is a two-step process:

  1. Kernel Construction: For each convolutional group or channel, synthesize a large spatial kernel of size $s \times s$ (sparse when the triangle interpolant is used) by “splat-and-sum” of the learned weighted impulses via the chosen interpolant. This step can be vectorized over channels and kernel taps for memory and compute efficiency.
  2. Convolution: Apply any standard convolution routine to perform the actual filtering with the synthesized kernel over the input feature map.
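
The two steps can be checked against the direct sampling formula in 1D. This plain-Python sketch assumes the triangle interpolant, positions inside $[0, s-1]$, and zero padding on the right only (real layers usually pad symmetrically); all function names are hypothetical. Building the dense kernel first and then running an ordinary cross-correlation, which is what deep learning "convolution" layers compute, gives the same output as interpolated sampling.

```python
import math


def triangle(t):
    return max(0.0, 1.0 - abs(t))


def build_kernel_1d(weights, positions, size):
    """Step 1: splat-and-sum every weighted tap onto a dense length-`size` kernel."""
    kernel = [0.0] * size
    for w, p in zip(weights, positions):
        for i in range(size):
            kernel[i] += w * triangle(i - p)
    return kernel


def correlate_1d(x, kernel):
    """Step 2: plain cross-correlation with zero padding past the end of x."""
    n = len(x)
    return [sum(k * x[t + i] for i, k in enumerate(kernel) if t + i < n) for t in range(n)]


def direct_1d(x, weights, positions):
    """Reference: Y(t) = sum_k w_k * X(t + p_k) with linear interpolation."""
    def sample(u):
        i0 = math.floor(u)
        f = u - i0
        return sum(wt * x[i] for i, wt in ((i0, 1.0 - f), (i0 + 1, f)) if 0 <= i < len(x))
    return [sum(w * sample(t + p) for w, p in zip(weights, positions)) for t in range(len(x))]


x = [0.5, -1.0, 2.0, 3.0, 0.0, 1.5, -2.0, 4.0]
weights, positions = [0.7, -1.2, 0.4], [0.3, 2.75, 4.5]
kernel = build_kernel_1d(weights, positions, size=6)
two_step = correlate_1d(x, kernel)
direct = direct_1d(x, weights, positions)
print(max(abs(a - b) for a, b in zip(two_step, direct)))  # below 1e-12
```

The design benefit of the two-step form is that step 2 can reuse any highly optimized convolution routine; only the cheap kernel-synthesis step is DCLS-specific.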

Gradient propagation is automatic in standard deep learning frameworks because the interpolation operators are differentiable. Careful learning-rate scheduling, typically raising the learning rate of the position variables by a factor of about 5 relative to the weights, yields stable convergence. Weight decay is applied only to the weights, never to the position or $\sigma$ parameters (Khalfaoui-Hassani, 2024, Khalfaoui-Hassani et al., 2021).
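
The optimizer policy amounts to per-parameter-group hyperparameters; in PyTorch this is usually expressed by passing a list of parameter-group dicts, each with its own `lr` and `weight_decay`, to the optimizer. The framework-free sketch below mimics that idea with plain SGD. `BASE_LR`, `WEIGHT_DECAY`, and the $\sigma$ learning rate are hypothetical values; only the "positions ×5, no decay on positions or $\sigma$" rule comes from the text.

```python
BASE_LR = 1e-3
POSITION_LR_SCALE = 5.0  # positions use ~5x the weight learning rate, as described above
WEIGHT_DECAY = 0.05      # hypothetical value; applied to weights only


def make_param_groups(weights, positions, sigmas):
    """Hypothetical grouping: decay only on weights, boosted LR on positions.

    The sigma learning rate is an assumption (the text does not specify it).
    """
    return [
        {"params": weights, "lr": BASE_LR, "weight_decay": WEIGHT_DECAY},
        {"params": positions, "lr": BASE_LR * POSITION_LR_SCALE, "weight_decay": 0.0},
        {"params": sigmas, "lr": BASE_LR, "weight_decay": 0.0},
    ]


def sgd_step(groups, grads):
    """In-place SGD with L2 weight decay: p <- p - lr * (g + wd * p)."""
    for group, group_grads in zip(groups, grads):
        params = group["params"]
        for i, g in enumerate(group_grads):
            params[i] -= group["lr"] * (g + group["weight_decay"] * params[i])


weights, positions, sigmas = [0.5, -0.2], [1.0, 3.0], [0.5]
groups = make_param_groups(weights, positions, sigmas)
sgd_step(groups, [[0.1, 0.1], [0.1, 0.1], [0.0]])
print(weights, positions, sigmas)
```

With identical gradients, the positions move five times as far as the weights, and only the weights are pulled towards zero by decay.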

Position and scale parameters are initialized uniformly or near a regular grid, with optional clamping at each step to respect valid kernel support. In practice, position sharing across layers or repulsive regularization to avoid tap overlap can further stabilize training (Khalfaoui-Hassani et al., 2023, Khalfaoui-Hassani et al., 2021).
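
Initialization and clamping are easy to sketch along a single axis (a 2D kernel would apply the same logic to each coordinate). The helpers `init_positions_grid`, `init_positions_uniform`, and `clamp` below are hypothetical, as are the jitter amount and the fixed random seed.

```python
import random


def clamp(p, size):
    """Keep a tap coordinate inside the valid kernel support [0, size - 1]."""
    return min(max(p, 0.0), float(size - 1))


def init_positions_grid(m, size, jitter=0.0, rng=None):
    """Place m taps evenly on [0, size - 1] along one axis, with optional jitter."""
    rng = rng or random.Random(0)
    if m == 1:
        base = [(size - 1) / 2.0]
    else:
        step = (size - 1) / (m - 1)
        base = [k * step for k in range(m)]
    return [clamp(p + rng.uniform(-jitter, jitter), size) for p in base]


def init_positions_uniform(m, size, rng=None):
    """Alternative: draw m tap coordinates uniformly over the support."""
    rng = rng or random.Random(0)
    return [rng.uniform(0.0, size - 1) for _ in range(m)]


print(init_positions_grid(5, 17))              # [0.0, 4.0, 8.0, 12.0, 16.0]
print(init_positions_grid(5, 17, jitter=0.5))  # near the grid, still within [0, 16]
# After an optimizer step, out-of-range taps are clamped back into the support.
print([clamp(p, 17) for p in (-0.3, 7.25, 17.2)])  # [0.0, 7.25, 16.0]
```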

For the 1D case, used in temporal processing and SNNs, each kernel tap is a Gaussian bump with a learnable delay; in the limit $\sigma \to 0$, the normalized bump approaches a delta function at the nearest integer delay (Hammouamri et al., 2023).
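
The collapse of the Gaussian delay kernel can be observed numerically. This sketch assumes integer delay bins `0..max_delay` and a hand-picked sequence of $\sigma$ values; the helper `gaussian_delay_kernel` is hypothetical, and the settings are not those of Hammouamri et al.

```python
import math


def gaussian_delay_kernel(delay, sigma, max_delay):
    """Normalized Gaussian bump over integer delays 0..max_delay, centred at a real delay."""
    raw = [math.exp(-((t - delay) ** 2) / (2.0 * sigma ** 2)) for t in range(max_delay + 1)]
    z = sum(raw)
    if z == 0.0:
        raise ValueError("sigma too small for this delay: all weights underflowed")
    return [r / z for r in raw]


# Shrinking sigma concentrates the mass on the nearest integer delay (6 for 6.3).
for sigma in (2.0, 1.0, 0.5, 0.1):
    k = gaussian_delay_kernel(delay=6.3, sigma=sigma, max_delay=12)
    peak = max(k)
    print(sigma, k.index(peak), round(peak, 3))
```

A large $\sigma$ early in training lets each delay receive gradient from a wide temporal window; shrinking it afterwards leaves an effectively single-tap, integer-delay kernel that is cheap to deploy.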

4. Integration Across Domains

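DCLS is a drop-in replacement for classical 1D/2D convolutions and dilated convolutions in deep architectures. It has been integrated into vision backbones (e.g., ConvNeXt and ResNet variants) for classification and segmentation, into audio-tagging models, and into spiking neural networks, where 1D DCLS kernels act as learnable synaptic delays.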

5. Empirical Evaluation and Performance

Experiments across vision, audio, and spiking benchmarks consistently show that DCLS-based models outperform or match their fixed-grid counterparts at equal parameter count, at the cost of modest throughput reductions because the synthesized kernels are larger. Key results:

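| Task / Model | Baseline | DCLS | Δ |
|---|---|---|---|
| ConvNeXt-T, ImageNet top-1 | 82.1% | 82.5% ($m=34$, $s=17$) | +0.4 |
| ConvNeXt-B, ImageNet top-1 | 83.8% | 84.1% ($m=34$, $s=17$) | +0.3 |
| ADE20K segmentation, mIoU | 46.0 – 49.1 | 47.1 – 49.3 | +1.1 |
| SHD (SNN, 20-class) | 94.62% | 95.07% ± 0.24 | +0.45 |
| AudioSet mAP (ConvNeXt-T) | 44.83% | 45.52% | +0.7 |
| Spiking Speech Commands (SNN) | 77.4% | 80.7% | +3.3 |

Δ is given in percentage points (mIoU points for ADE20K); $m$ is the number of taps per kernel and $s$ the kernel size.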

Throughput reductions are minor in depthwise-separable settings (e.g., a 6% throughput drop for ConvNeXt-DCLS), and the parameter overhead is negligible (per tap: two position coordinates in 2D and, optionally, a scale $\sigma$). Ablations confirm that DCLS’s gain is not replicated by simply adding layers or by isolated learnable delays; the key ingredient is interpolated, flexible context adaptation (Hammouamri et al., 2023, Khalfaoui-Hassani et al., 2023, Khalfaoui-Hassani et al., 2023, Chamas et al., 2024, Khalfaoui-Hassani, 2024).

6. Interpretability and Alignment

Recent Grad-CAM studies show that models equipped with DCLS not only outperform in accuracy but also exhibit increased interpretability as measured by alignment with human attention heatmaps (ClickMe): Spearman correlations for DCLS-augmented ConvNeXt models are 4–5 points higher; ResNet50 increases from 0.6135 to 0.6252 (standard Grad-CAM) and 0.7125 to 0.7261 (Threshold-Grad-CAM). Seven of eight tested architectures saw improvements, with only specialized kernel reparametrizations (FastViT_sa24) showing slight degradation (Chamas et al., 2024). This suggests DCLS adaptively concentrates model attention on task-relevant spatial regions.
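
The alignment metric itself is simple: flatten the model's Grad-CAM map and the human (ClickMe) heatmap over the same pixels and compute Spearman's rank correlation. The plain-Python sketch below assumes both maps are already aligned and flattened into equal-length lists; it does not reproduce the thresholding of Threshold-Grad-CAM or any other detail of Chamas et al. (2024), and the sample values are made up.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho = Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either map is constant (zero rank variance).
    """
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


saliency = [0.9, 0.1, 0.4, 0.7, 0.3, 0.0]  # flattened model map (illustrative)
human = [0.8, 0.0, 0.5, 0.6, 0.5, 0.1]     # flattened human heatmap (illustrative)
print(round(spearman(saliency, human), 3))
```

Because only ranks matter, the score is insensitive to any monotone rescaling of either map, which is convenient when comparing Grad-CAM outputs across architectures.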

7. Limitations, Extensions, and Future Work

The main limitations stem from increased FLOPs and memory when the dilated-kernel size $s$ is large, and from the use of separable interpolation (triangle or Gaussian), which may not fully exploit non-axis-aligned patterns. Gains typically saturate at moderate kernel sizes, which already achieve ≥95% of the attainable improvement; much larger $s$ yields diminishing returns (Khalfaoui-Hassani, 2024).
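
The cost of a large $s$ can be made concrete with simple arithmetic. Assuming a depthwise 2D layer in which the synthesized kernel is convolved densely, each output pixel per channel costs $s^2$ multiply-accumulates, while a bilinear tap touches at most four grid points, so at most $4m$ kernel entries are nonzero. The $m = 34$, $s = 17$ setting comes from the results table; $s = 31$ is an arbitrary larger size for comparison, and `kernel_cost` is a hypothetical helper.

```python
def kernel_cost(m, s, points_per_tap=4):
    """Per output pixel and channel: MACs of the dense s x s kernel vs. its nonzeros.

    points_per_tap = 4 is the bilinear upper bound in 2D; a Gaussian tap is
    nonzero (before truncation) over the whole s x s grid.
    """
    dense_macs = s * s
    max_nonzero = min(dense_macs, points_per_tap * m)
    return dense_macs, max_nonzero, max_nonzero / dense_macs


for m, s in [(34, 17), (34, 31)]:
    dense, nnz, frac = kernel_cost(m, s)
    print(f"m={m}, s={s}: {dense} dense MACs, <= {nnz} nonzeros ({frac:.0%})")
print("7x7 baseline:", 7 * 7, "MACs")
```

As $s$ grows with $m$ fixed, the dense work grows quadratically while the useful (nonzero) work stays constant, which is why sparse or custom kernels are attractive for very large DCLS kernels.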

Current implementations lack sparse-matrix optimizations and efficient custom CUDA kernels for extremely large or sparse DCLS kernels, and DCLS integration in higher-dimensional (3D, video) settings is underexplored. Further directions include learning $\sigma$ adaptively versus decaying it on a schedule, dynamic kernel counts, integration into local-attention modules, quantization for neuromorphic deployment, and hardware-aware DCLS variants (Khalfaoui-Hassani et al., 2023, Khalfaoui-Hassani, 2024, Hammouamri et al., 2023, Khalfaoui-Hassani et al., 2023).


