Learnable Shifting Filters
- Learnable shifting filters are adaptive linear operators that adjust spatial supports and coefficients through end-to-end or meta-optimization.
- They reduce complex convolutional filter spaces into a compact universal kernel set, retaining over 99% ImageNet accuracy with fewer parameters.
- This approach unifies CNN and GNN methodologies, enhancing model interpretability, transferability, and efficiency across image and graph domains.
Learnable shifting filters are a parameterized class of linear operators in neural architectures whose spatial supports and coefficients are not fixed a priori. Instead, they are adapted either directly through the primary loss function (as in end-to-end training) or through an explicit meta-optimization, often as affine, shifted, or metric-based modifications of a small universal set of core kernels. This class unifies advances in graph neural networks, where metric filters are learned via data-dependent diffusion distances, and in depthwise separable convolutional networks (DS-CNNs), where thousands of spatial filters are reduced to parameterized shifts of a handful of universal templates. Learnable shifting filters thus enable highly adaptive receptive fields and efficient parameter sharing, and facilitate greater interpretability and transferability across domains.
1. Formal Definitions and Canonical Instances
In DS-CNNs, learnable shifting filters take the form:
$$F_i(x, y) \approx K_{k_i}(x - \Delta x_i,\; y - \Delta y_i) + b_i,$$
where $F_i$ is a learned spatial filter, $K_{k_i}$ is one of eight master key filters, $(\Delta x_i, \Delta y_i)$ are learnable (possibly non-integer) spatial shifts, and $b_i$ is a per-filter bias term. The master key filters $K_k$ (for $k = 1, \dots, 8$) are closed-form spatial primitives: four central-difference operators (horizontal, vertical, diagonal, anti-diagonal), two first derivatives of Gaussians (in $x$ and $y$), one Difference-of-Gaussians (DoG), and one isotropic Gaussian (Babaiee et al., 15 Sep 2025):
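| Kernel Type | Mathematical Formulation | Representative |
|---|---|---|
| Central Difference | $\tfrac{1}{2}\,[f(x+1, y) - f(x-1, y)]$ (horizontal), etc. | $K_1$ to $K_4$ |
| Gaussian Derivative | $\partial_x G_\sigma$, $\partial_y G_\sigma$ | $K_5$, $K_6$ |
| DoG | $G_{\sigma_1} - G_{\sigma_2}$ | $K_7$ |
| Gaussian | $G_\sigma(x, y) \propto \exp\!\big(-(x^2 + y^2)/(2\sigma^2)\big)$ | $K_8$ |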
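The shift-plus-bias form can be illustrated by sampling a closed-form kernel on a pixel grid at a shifted position. The following is a minimal sketch, not the authors' implementation: the Gaussian and derivative-of-Gaussian formulas are the standard ones, while the grid size, `sigma`, and all function names are illustrative assumptions.

```python
import math

def gaussian(x, y, sigma=1.0):
    """Isotropic 2D Gaussian (unnormalized)."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))

def gaussian_dx(x, y, sigma=1.0):
    """First derivative of the Gaussian along x."""
    return -x / (sigma * sigma) * gaussian(x, y, sigma)

def sample_shifted(kernel, size=7, dx=0.0, dy=0.0, bias=0.0):
    """Sample kernel(x - dx, y - dy) + bias on a size x size grid centred at 0.

    dx and dy may be non-integer, because the kernel is a continuous function.
    """
    half = size // 2
    return [
        [kernel(col - half - dx, row - half - dy) + bias for col in range(size)]
        for row in range(size)
    ]

# A Gaussian whose peak sits 1.5 pixels right of the grid centre, plus a bias.
filt = sample_shifted(gaussian, size=7, dx=1.5, dy=0.0, bias=0.1)
```

Because the kernel is evaluated analytically, a fractional shift needs no interpolation; in a trainable version the shifts and bias would be the only per-filter spatial parameters.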
In graph neural networks, the analogous objects are the structural “jumps” constructed from learned diffusion distances and adaptive supports (Begga et al., 2023), with the support of each jump defined by sorted diffusion distances and data-driven radii.
2. Principle of Shift-Based Parameterization
Learnable shifting filters exploit the observation that the trained spatial filters in certain architectures, especially those with extreme parameter sharing such as depthwise separable CNNs, can be effectively described as affine (linear plus bias) transforms of spatially shifted versions of a small universal kernel bank. This is formalized in (Babaiee et al., 15 Sep 2025) as the “master key filters hypothesis,” and operationalized through an unsupervised codebook discovery algorithm. For any filter $F_i$, its approximation requires only:
- An index $k_i \in \{1, \dots, 8\}$ into the universal filter bank;
- Real-valued shifts $(\Delta x_i, \Delta y_i)$;
- An additive bias $b_i$;
- (Optionally) a global scale $\alpha_i$ (absorbed by pointwise weights in practice).
This structure enables a drastic reduction in the number of learned spatial filters without sacrificing accuracy: DS-CNNs with only eight frozen spatial kernels plus learnable shifts and biases retain over 99% of ImageNet accuracy compared to fully trainable models (Babaiee et al., 15 Sep 2025).
In the context of graphs, shifting occurs over the metric structure implicitly learned by the data-driven diffusion embedding, which maps nodes to points in a latent space and defines filter supports and weights via distances in that learned space (Begga et al., 2023).
3. Learning Methodologies
In DS-CNNs
The construction and validation of learnable shifting filters in DS-CNNs employs a three-stage process (Babaiee et al., 15 Sep 2025):
- Filter Matrix Collection and Normalization: Flatten trained depthwise kernels, zero-center, and normalize.
- Codebook Learning: Train a 1D autoencoder to represent all filters in a compressed code; decode a large set of candidate filters.
- Linear Approximation and Backward Elimination: For each filter, solve for optimal scaling and bias to fit each candidate, assigning the best match. Iteratively prune the candidate set while tracking validation accuracy, revealing an “elbow” at eight essential kernels.
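The linear-approximation and pruning stages can be sketched as follows. This is a simplified illustration under stated assumptions, not the published procedure: filters and candidates are flattened lists of equal length, the scale and bias come from ordinary least squares, and pruning is scored by total reconstruction error as a stand-in for the validation accuracy tracked in the source. All function names are hypothetical.

```python
def fit_scale_bias(target, candidate):
    """Least-squares a, b minimizing sum((a * candidate + b - target) ** 2)."""
    n = len(target)
    mean_c = sum(candidate) / n
    mean_t = sum(target) / n
    var_c = sum((c - mean_c) ** 2 for c in candidate)
    if var_c == 0.0:
        a = 0.0  # constant candidate: only the bias can help
    else:
        cov = sum((c - mean_c) * (t - mean_t) for c, t in zip(candidate, target))
        a = cov / var_c
    b = mean_t - a * mean_c
    err = sum((a * c + b - t) ** 2 for c, t in zip(candidate, target))
    return a, b, err

def assign_best(target, codebook):
    """Return (index, scale, bias, error) of the best-fitting candidate."""
    fits = [(i, *fit_scale_bias(target, cand)) for i, cand in enumerate(codebook)]
    return min(fits, key=lambda f: f[3])

def total_error(filters, codebook):
    """Sum of best-match reconstruction errors over all filters."""
    return sum(assign_best(f, codebook)[3] for f in filters)

def backward_eliminate(filters, codebook, keep):
    """Greedily drop candidates until `keep` remain.

    Assumption: the pruning score is reconstruction error, not validation accuracy.
    """
    codebook = list(codebook)
    while len(codebook) > keep:
        # Drop the candidate whose removal hurts the total fit the least.
        drop = min(
            range(len(codebook)),
            key=lambda j: total_error(filters, codebook[:j] + codebook[j + 1:]),
        )
        del codebook[drop]
    return codebook

filters = [[-1.5, 0.5, 2.5], [-3.0, 0.0, 3.0]]          # toy "trained" filters
candidates = [[-1.0, 0.0, 1.0], [1.0, 2.0, 1.0], [0.0, 1.0, 0.0]]
index, scale, bias, err = assign_best(filters[0], candidates)
survivors = backward_eliminate(filters, candidates, keep=1)
```

In this toy case both filters are exact affine transforms of the first candidate, so it is the one that survives pruning. Plotting the score against the number of remaining candidates is what would expose an elbow such as the one reported at eight kernels.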
In Diffusion-Jump GNNs
The learnable metric filter bank is established via the following pipeline (Begga et al., 2023):
- Fiedler Environment Learning: Learn a node embedding, subject to the constraint given in (Begga et al., 2023), by minimizing a trace-ratio Dirichlet loss.
- Diffusion Distance Computation: Build the matrix of pairwise diffusion distances from the learned embedding.
- Jump Definition: For each scale (jump), compute support sets via sorted diffusion distances and projection matrices via differentiable top-$k$ selection.
- Structural Filter Application: Form the structural filter of each jump, apply it to the node features, and aggregate via nonlinear transformations.
- End-to-End Joint Optimization: Backpropagate the joint Dirichlet and classification loss through all parameters, so that both the metric structure and the supports and weights of each jump filter are updated.
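The forward pass of such a pipeline can be sketched on a toy graph. This is a hedged illustration with several assumptions: the trace-ratio is written in one common form, $\mathrm{Tr}(U^\top L U) / \mathrm{Tr}(U^\top D U)$, which may differ from the exact loss in (Begga et al., 2023); distances are plain squared Euclidean distances in the embedding; supports are chosen by hard rank windows rather than differentiable top-$k$ selection; and the exponential weights and all function names are hypothetical.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def dirichlet_trace_ratio(adj, emb):
    """Tr(U^T L U) / Tr(U^T D U) for a symmetric adjacency matrix."""
    n = len(adj)
    # Tr(U^T L U) = 1/2 * sum_ij A_ij * ||u_i - u_j||^2
    num = 0.5 * sum(adj[i][j] * sq_dist(emb[i], emb[j])
                    for i in range(n) for j in range(n))
    # Tr(U^T D U) = sum_i deg_i * ||u_i||^2
    den = sum(sum(adj[i]) * sum(x * x for x in emb[i]) for i in range(n))
    return num / den

def jump_supports(emb, lo, hi):
    """Per node, the other nodes ranked lo..hi-1 by embedding distance."""
    n = len(emb)
    supports = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: sq_dist(emb[i], emb[j]))
        supports.append(order[lo:hi])
    return supports

def apply_jump(emb, feats, supports):
    """Distance-weighted average of features over each node's (non-empty) support."""
    out = []
    for i, nbrs in enumerate(supports):
        weights = [math.exp(-sq_dist(emb[i], emb[j])) for j in nbrs]
        total = sum(weights)
        out.append([
            sum(w * feats[j][c] for w, j in zip(weights, nbrs)) / total
            for c in range(len(feats[i]))
        ])
    return out

emb = [[0.0], [1.0], [3.0], [7.0]]     # toy 1-D embedding of 4 nodes
feats = [[1.0], [2.0], [3.0], [4.0]]   # one feature per node
near = jump_supports(emb, 0, 1)        # nearest node in the learned space
far = jump_supports(emb, 1, 2)         # second-nearest node
smoothed = apply_jump(emb, feats, far)
```

Note that supports depend only on distances in the embedding, not on graph edges, which is what lets a jump connect nodes that are far apart in hop count. A trainable version would replace the hard rank window with a differentiable selection so that gradients reach the embedding.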
4. Empirical Results and Applications
Empirical evidence demonstrates the sufficiency of a small, universal set of master key filters augmented with shifts and biases for high-accuracy image classification in DS-CNNs (Babaiee et al., 15 Sep 2025). On ImageNet, models trained with only eight frozen universal kernels retain over 99% of the accuracy of fully trainable architectures with thousands of spatial filter parameters. These filters generalize across architecture variants (ConvNeXt V2, HorNet) and transfer without modification to diverse datasets, outperforming conventional fine-tuned transfers in low-data regimes.
In graph domains, learnable metric filters in Diffusion-Jump GNNs are capable of dynamically discovering long-range structural dependencies, ideal for the heterophilic regime where classical hop-based aggregation fails. The architecture, via end-to-end optimization, reshapes both the learned metric and the filtering “jumps” to restore homophily in latent space, improving classification quality in graphs where class boundaries cut across local structures (Begga et al., 2023).
5. Interpretability, Model Compression, and Theoretical Implications
The eight universal master key filters in DS-CNNs (central differences, Gaussian derivatives, a DoG, and an isotropic Gaussian) correspond closely to classical image processing operators and to receptive field structures measured in biological vision systems. This suggests that even unconstrained modern architectures naturally converge to the same efficient basis found in scale-space theory and shaped by biological evolution.
Learnable shifting filters support large-scale parameter compression and architectural interpretability: thousands of convolutional kernels can be replaced with eight universal primitives plus spatial shifts, substantially reducing the model's parameter count and offering clearer semantic grounding for each learned operation. Similar parameter reduction and transparency can be achieved in graph settings, as each “jump” filter has explicitly defined supports and metric-induced weights.
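A rough count makes the scale of the spatial-parameter saving concrete. The numbers below are illustrative assumptions (a single depthwise layer with 96 channels and 7×7 kernels), not figures from the source, and the helper names are hypothetical.

```python
def depthwise_spatial_params(channels, kernel_size):
    """Trainable weights in a standard depthwise layer (one kernel per channel)."""
    return channels * kernel_size * kernel_size

def shifted_filter_params(channels, per_filter=3):
    """Trainable values when each channel keeps only two shifts and one bias.

    The kernel index is a discrete choice and the eight kernels are frozen,
    so neither adds trainable weights in this count.
    """
    return channels * per_filter

full = depthwise_spatial_params(96, 7)   # 96 * 49 = 4704
compact = shifted_filter_params(96)      # 96 * 3  = 288
ratio = full / compact                   # roughly a 16x reduction
```

This count covers depthwise spatial weights only. Pointwise (1×1) weights are unchanged by the parameterization and are excluded here, so the reduction in total model size is smaller than this ratio.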
A plausible implication is that enforcing a shift-based parameterization or pre-baking master key filters into architecture design can yield highly efficient, robust, and interpretable models without performance degradation—facilitating deployment in resource-constrained or explainable machine learning settings.
6. Connections to Transfer Learning and Biological Vision
The eight universal spatial operators identified via the learnable shifting filter approach are directly transferable across architectures and visual recognition tasks, supporting strong out-of-distribution generalization and transfer learning. The close match between DS-CNN-learned master key filters and known retinal/V1 receptive fields (Hubel & Wiesel 1968; Young 1987; Jones & Palmer 1987) underlines a convergence between artificial and biological visual computation principles (Babaiee et al., 15 Sep 2025).
In graphs, the learnable metric filters enable semi-supervised models to reconstruct piecewise smooth class manifolds in latent space even under severe label–structure misalignment, reflecting the power of end-to-end adaptive metric learning (Begga et al., 2023). This highlights the unifying principle: constraining filter learning to shifted, parameterized transformations of a universal set not only streamlines model design but also ties deep learning systems more tightly to efficient, interpretable, and biologically plausible signal processing foundations.