Learnable Shifting Filters

Updated 3 December 2025
  • Learnable shifting filters are adaptive linear operators that adjust spatial supports and coefficients through end-to-end or meta-optimization.
  • They reduce complex convolutional filter spaces to a compact universal kernel set, retaining over 99% of the ImageNet accuracy of fully trainable models with far fewer spatial parameters.
  • This approach unifies CNN and GNN methodologies, enhancing model interpretability, transferability, and efficiency across image and graph domains.

Learnable shifting filters are parameterized linear operators in neural architectures whose spatial supports and coefficients are not fixed a priori. Instead, they are adapted either directly through the primary loss function (as in end-to-end training) or through an explicit meta-optimization, often as affine, shifted, or metric-based modifications of a small universal set of core kernels. This class unifies advances in graph neural networks, where metric filters are learned via data-dependent diffusion distances, and in depthwise separable convolutional networks (DS-CNNs), where thousands of spatial filters are reduced to parameterized shifts of a handful of universal templates. Learnable shifting filters thus enable highly adaptive receptive fields and efficient parameter sharing, and facilitate greater interpretability and transferability across domains.

1. Formal Definitions and Canonical Instances

In DS-CNNs, learnable shifting filters take the form:

$$f_i(x, y) \approx U_{k(i)}(x - \Delta x_i, y - \Delta y_i) + b_i,$$

where $f_i$ is a learned $7 \times 7$ spatial filter, $U_{k(i)}$ is one of eight master key filters, $(\Delta x_i, \Delta y_i)$ are learnable (possibly non-integer) spatial shifts, and $b_i$ is a per-filter bias term. The master key filters $U_j$ (for $j = 1,\dots,8$) are closed-form spatial primitives: four central-difference operators (horizontal, vertical, diagonal, anti-diagonal), two first derivatives of Gaussians (in $x$ and $y$), one Difference-of-Gaussians (DoG), and one isotropic Gaussian (Babaiee et al., 15 Sep 2025):

| Kernel Type | Mathematical Formulation | Representative $U_j$ |
|---|---|---|
| Central Difference | $\frac{1}{2}[\delta(x \pm 1, y)]$, etc. | $U_1$ to $U_4$ |
| Gaussian Derivative | $-\frac{x}{\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}}$ | $U_5$, $U_6$ |
| DoG | $G(x, y; \sigma_1) - G(x, y; \sigma_2)$ | $U_7$ |
| Gaussian | $e^{-\frac{x^2+y^2}{2\sigma^2}}$ | $U_8$ |
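
As an illustration of the shift formula and the closed-form primitives above, the following NumPy sketch samples three of the templates on a $7 \times 7$ grid at shifted coordinates. The function names, the values of $\sigma$, and the choice to evaluate the continuous formulas directly at $(x - \Delta x, y - \Delta y)$ are assumptions made for this example; they are not taken from Babaiee et al.

```python
import numpy as np

def grid(size=7):
    """Integer pixel coordinates centred on zero, e.g. -3..3 for size 7."""
    r = np.arange(size) - size // 2
    # indexing="xy": x varies along columns, y varies along rows
    return np.meshgrid(r, r, indexing="xy")

def gaussian(x, y, sigma):
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2))

def gaussian_dx(x, y, sigma):
    # First derivative of the (unnormalised) Gaussian with respect to x
    return -(x / sigma**2) * gaussian(x, y, sigma)

def dog(x, y, sigma1, sigma2):
    # Difference of two Gaussians with different widths
    return gaussian(x, y, sigma1) - gaussian(x, y, sigma2)

def shifted_filter(template, dx, dy, bias, size=7, **params):
    """f(x, y) = U(x - dx, y - dy) + b, sampled on a size x size grid."""
    x, y = grid(size)
    return template(x - dx, y - dy, **params) + bias

# Hypothetical shift values: move the Gaussian one pixel right and two pixels up.
f = shifted_filter(gaussian, dx=1.0, dy=-2.0, bias=0.0, sigma=1.0)
peak_row, peak_col = np.unravel_index(np.argmax(f), f.shape)
```

Because the templates are closed-form, non-integer shifts need no interpolation in this sketch: the shift simply moves the point at which the formula is evaluated. The discrete central-difference kernels would need a separate treatment (for example interpolation), which is not shown here.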

In graph neural networks, the analogous objects are the structural “jumps” $J^k$ constructed from learned diffusion distances $D_{ij}$ and adaptive supports $S_k(i)$ (Begga et al., 2023):

$$J^k_{ij} = \begin{cases} \exp(-D_{ij}) & \text{if } j \in S_k(i) \\ 0 & \text{otherwise} \end{cases}$$

with $S_k(i) = \{j: r_{k-1} < d(i,j) \leq r_k\}$ defined by sorted diffusion distances and data-driven radii $r_k$.
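
The jump definition can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: the distance $d(i,j)$ in the support set is taken to be the same diffusion distance $D_{ij}$, the radii are fixed by hand rather than learned, and a hard threshold stands in for any differentiable selection. The function name `jump_filters` and the toy data are hypothetical.

```python
import numpy as np

def jump_filters(D, radii):
    """Build one matrix J^k per band (r_{k-1}, r_k] of the distance matrix D.

    D     : (n, n) array of pairwise distances (assumed to be already learned).
    radii : increasing sequence r_0 < r_1 < ... < r_K (hand-picked here).
    """
    D = np.asarray(D, dtype=float)
    weights = np.exp(-D)
    jumps = []
    for r_lo, r_hi in zip(radii[:-1], radii[1:]):
        support = (D > r_lo) & (D <= r_hi)        # j in S_k(i)
        jumps.append(np.where(support, weights, 0.0))
    return jumps

# Toy example: four nodes embedded on a line at positions 0, 1, 2, 4.
pos = np.array([0.0, 1.0, 2.0, 4.0])
D = np.abs(pos[:, None] - pos[None, :])
J1, J2 = jump_filters(D, radii=[0.0, 1.0, 2.0])
```

In this toy case `J1` connects nodes at distance at most 1 and `J2` connects nodes at distance in $(1, 2]$, so the two supports are disjoint by construction.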

2. Principle of Shift-Based Parameterization

Learnable shifting filters exploit the observation that the trained spatial filters in certain architectures, especially those with extreme parameter sharing such as depthwise separable CNNs, can be effectively described as affine (linear plus bias) transforms of spatially shifted versions of a small universal kernel bank. This is formalized in (Babaiee et al., 15 Sep 2025) as the “master key filters hypothesis,” and operationalized through an unsupervised codebook discovery algorithm. For any filter $f_i$, its approximation requires only:

  • An index $k(i)$ into the universal filter bank;
  • Real-valued shifts $(\Delta x_i, \Delta y_i)$;
  • An additive bias $b_i$;
  • (Optionally) a global scale $a_i$ (absorbed by pointwise weights in practice).

This structure enables a drastic reduction in the number of learned spatial filters without sacrificing accuracy: DS-CNNs with only eight frozen spatial kernels plus learnable shifts and biases retain over 99% of the ImageNet accuracy of fully trainable models (Babaiee et al., 15 Sep 2025).
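
To make the size of this reduction concrete, the short sketch below counts real-valued spatial parameters for dense $7 \times 7$ kernels versus the shift-based encoding (two shifts and one bias per filter, plus an optional scale). The filter count of 1000 is a hypothetical figure chosen for the arithmetic, not a number from the paper, and the discrete index $k(i)$ is not counted as a real-valued parameter.

```python
def spatial_param_counts(num_filters, kernel_size=7, use_scale=False):
    """Real-valued spatial parameters: dense kernels vs. shift-based encoding."""
    dense = num_filters * kernel_size * kernel_size
    # Per filter: two real shifts and one bias, plus an optional global scale.
    per_filter = 3 + (1 if use_scale else 0)
    shifted = num_filters * per_filter
    return dense, shifted

# Hypothetical layer stack with 1000 depthwise filters in total.
dense, shifted = spatial_param_counts(num_filters=1000)
print(dense, shifted)  # 49000 3000
```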

In the context of graphs, shifting occurs over the metric structure implicitly learned by the data-driven diffusion embedding $U$, mapping nodes to $\mathbb{R}^p$ and defining filter supports and weights via distances in the learned space (Begga et al., 2023).

3. Learning Methodologies

In DS-CNNs

The construction and validation of learnable shifting filters in DS-CNNs employs a three-stage process (Babaiee et al., 15 Sep 2025):

  1. Filter Matrix Collection and Normalization: Flatten trained $7 \times 7$ depthwise kernels, zero-center, and normalize.
  2. Codebook Learning: Train a 1D autoencoder to represent all filters in a compressed code; decode a large set of candidate filters.
  3. Linear Approximation and Backward Elimination: For each filter, solve for optimal scaling and bias to fit each candidate, assigning the best match. Iteratively prune the candidate set while tracking validation accuracy, revealing an “elbow” at eight essential kernels.
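
The linear-approximation step of stage 3 can be sketched as an ordinary least-squares fit of a scale and bias per candidate, followed by picking the candidate with the smallest residual. This is a minimal illustration only: random vectors stand in for the decoded candidate filters, the function names are hypothetical, and the autoencoder, the spatial shifts, and the accuracy-driven backward elimination are all omitted.

```python
import numpy as np

def fit_scale_bias(f, u):
    """Least-squares a, b minimising ||a * u + b - f||^2 for flattened kernels."""
    A = np.stack([u, np.ones_like(u)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, f, rcond=None)
    err = float(np.sum((a * u + b - f) ** 2))
    return float(a), float(b), err

def assign_filters(filters, candidates):
    """For each filter, pick the candidate with the smallest fitting error."""
    out = []
    for f in filters:
        fits = [fit_scale_bias(f, u) for u in candidates]
        k = int(np.argmin([err for _, _, err in fits]))
        a, b, err = fits[k]
        out.append((k, a, b, err))
    return out

rng = np.random.default_rng(0)
candidates = rng.normal(size=(5, 49))            # stand-ins for decoded candidates
filters = np.stack([2.0 * candidates[3] + 0.5,   # exact affine copy of candidate 3
                    -1.0 * candidates[0] + 0.1]) # exact affine copy of candidate 0
assignments = assign_filters(filters, candidates)
```

Each entry of `assignments` holds the chosen candidate index together with the fitted scale, bias, and squared error, which is the information a pruning loop would need when deciding which candidates to drop.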

In Diffusion-Jump GNNs

The learnable metric filter bank is established via the following pipeline (Begga et al., 2023):

  1. Fiedler Environment Learning: Learn $U = f_\theta(A)$, satisfying $U^\top U = I$, by minimizing the trace-ratio Dirichlet loss $L_D(U) = \operatorname{Tr}(U^\top \Delta U)/\operatorname{Tr}(U^\top D U)$.
  2. Diffusion Distance Computation: Build $D_{ij} = \|U_{i:} - U_{j:}\|_2$.
  3. Jump Definition: For each scale $k$, compute support sets $S_k(i)$ via sorted diffusion distances and projection matrices $\Pi^k$ via differentiable top-$k$ selection.
  4. Structural Filter Application: Form $J^k = \Pi^k \circ \exp(-D)$, apply to features, and aggregate via nonlinear transformations.
  5. End-to-End Joint Optimization: Backpropagate joint Dirichlet and classification loss through all parameters, so both metric structure $U$ and the supports/weights of each $J^k$ are updated.
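
Steps 1 and 2 of this pipeline can be illustrated numerically. The sketch below assumes that $\Delta$ in the loss is the combinatorial graph Laplacian and that $D$ in the denominator is the degree matrix (named `deg` in the code to keep it apart from the distance matrix); neither reading is spelled out in the text above. Laplacian eigenvectors stand in for the learned embedding $f_\theta(A)$, and all function names are hypothetical.

```python
import numpy as np

def dirichlet_trace_ratio(U, A):
    """L_D(U) = Tr(U^T Lap U) / Tr(U^T Deg U) for adjacency matrix A."""
    deg = np.diag(A.sum(axis=1))
    lap = deg - A                   # combinatorial Laplacian (assumed form of Delta)
    return np.trace(U.T @ lap @ U) / np.trace(U.T @ deg @ U)

def pairwise_distances(U):
    """dist[i, j] = ||U_i: - U_j:||_2 between rows of the embedding U."""
    diff = U[:, None, :] - U[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy graph: a path on five nodes, 0 - 1 - 2 - 3 - 4.
A = np.diag(np.ones(4), 1)
A = A + A.T

# Stand-in for f_theta(A): the two lowest non-trivial Laplacian eigenvectors.
lap = np.diag(A.sum(axis=1)) - A
_, vecs = np.linalg.eigh(lap)
U = vecs[:, 1:3]                    # orthonormal columns, so U^T U = I

loss = dirichlet_trace_ratio(U, A)
dist = pairwise_distances(U)
```

In an actual model the embedding would come from a trainable network and the loss would be minimized by gradient descent; the eigenvector stand-in is used here only so that the example runs without a training loop.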

4. Empirical Results and Applications

Empirical evidence demonstrates the sufficiency of a small, universal set of master key filters augmented with shifts and biases for high-accuracy image classification in DS-CNNs (Babaiee et al., 15 Sep 2025). On ImageNet, models trained with only eight frozen universal kernels retain over 99% of the accuracy of fully trainable architectures with thousands of spatial filter parameters. These filters generalize across architecture variants (ConvNeXtv2, HorNet) and transfer without modification to diverse datasets, outperforming conventional fine-tuned transfers in low-data regimes.

In graph domains, learnable metric filters in Diffusion-Jump GNNs can dynamically discover long-range structural dependencies, which makes them well suited to the heterophilic regime, where classical hop-based aggregation fails. Through end-to-end optimization, the architecture reshapes both the learned metric and the filtering “jumps” to restore homophily in latent space, improving classification quality in graphs where class boundaries cut across local structures (Begga et al., 2023).

5. Interpretability, Model Compression, and Theoretical Implications

The eight universal master key filters in DS-CNNs (central differences, Gaussian derivatives, a DoG, and an isotropic Gaussian) correspond closely to classical image processing operators and to receptive field structures measured in biological vision systems. This suggests that even unconstrained modern architectures naturally converge to the same efficient basis as found in scale-space theory and biological evolution.

Learnable shifting filters support large-scale parameter compression and architectural interpretability: thousands of convolutional kernels can be replaced with eight universal primitives plus spatial shifts, substantially reducing the model's parameter count and offering clearer semantic grounding for each learned operation. Similar parameter reduction and transparency can be achieved in graph settings, as each “jump” filter has explicitly defined supports and metric-induced weights.

A plausible implication is that enforcing a shift-based parameterization or pre-baking master key filters into architecture design can yield highly efficient, robust, and interpretable models without performance degradation—facilitating deployment in resource-constrained or explainable machine learning settings.

6. Connections to Transfer Learning and Biological Vision

The eight universal spatial operators identified via the learnable shifting filter approach are directly transferable across architectures and visual recognition tasks, supporting strong out-of-distribution generalization and transfer learning. The close match between DS-CNN-learned master key filters and known retinal/V1 receptive fields (Hubel & Wiesel 1968; Young 1987; Jones & Palmer 1987) underlines a convergence between artificial and biological visual computation principles (Babaiee et al., 15 Sep 2025).

In graphs, the learnable metric filters enable semi-supervised models to reconstruct piecewise smooth class manifolds in latent space even under severe label–structure misalignment, reflecting the power of end-to-end adaptive metric learning (Begga et al., 2023). This highlights the unifying principle: constraining filter learning to shifted, parameterized transformations of a universal set not only streamlines model design but also ties deep learning systems more tightly to efficient, interpretable, and biologically plausible signal processing foundations.

References (2)
