Frequency-Adaptive Modulation (AdaFM)
- AdaFM is a family of modulation methods that adaptively balance low- and high-frequency processing in neural networks according to the input or task requirements.
- It employs learnable, interpolated filters across CNNs, graphs, and implicit neural representations to adjust restoration and feature extraction dynamically.
- Empirical evaluations demonstrate improved PSNR, SSIM, and efficiency while overcoming rigid fixed-filter limitations in various image and signal tasks.
Frequency-Adaptive Modulation (AdaFM) is a class of learnable mechanisms that dynamically adjust the spectral processing characteristics within neural architectures, allowing models to modulate representations, restoration strengths, or reconstruction spectra in direct response to input content or task requirements. AdaFM methods have found application across image restoration, visual recognition, frequency-domain modeling, and implicit neural representations, offering a principled remedy for the rigidity of fixed filters and fixed restoration levels. Contemporary implementations span channel-wise feature recalibration, adaptive graph attention, position-dependent spectral gains, and locally parameterized band-pass filters.
1. Core Principles and Mathematical Frameworks
At its foundation, Frequency-Adaptive Modulation introduces learnable, input- or position-dependent filters or gains that govern the relative contributions of low- and high-frequency components throughout the network pipeline, either in feature space, the Fourier domain, or over graph representations.
Classic AdaFM in CNNs
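The original AdaFM layer, introduced for image restoration, is a channel-wise feature modulation layer. It is typically realized as a depthwise convolution plus bias, placed immediately after a convolution and before the nonlinearity. For the input feature map $x_i$ of channel $i$, the AdaFM operation is:
$$\mathrm{AdaFM}(x_i) = g_i \ast x_i + b_i$$
where $g_i$ is a small $f \times f$ depthwise kernel (the original work compares several small filter sizes), $\ast$ denotes 2D convolution, and $b_i$ is the bias. For task modulation, the AdaFM parameters for the "start" ($L_{\text{start}}$) and "end" ($L_{\text{end}}$) degradation levels are denoted $(g_i^{\text{start}}, b_i^{\text{start}})$ and $(g_i^{\text{end}}, b_i^{\text{end}})$. Adaptive modulation interpolates between them with a coefficient $\lambda \in [0, 1]$:
$$g_i^{\lambda} = g_i^{\text{start}} + \lambda\,\big(g_i^{\text{end}} - g_i^{\text{start}}\big), \qquad b_i^{\lambda} = b_i^{\text{start}} + \lambda\,\big(b_i^{\text{end}} - b_i^{\text{start}}\big)$$
The base network is trained without AdaFM, so the start-level layer is the identity ($g_i^{\text{start}}$ is a unit impulse and $b_i^{\text{start}} = 0$). Setting $\lambda = 0$ therefore recovers the start-level network, and $\lambda = 1$ recovers the end-level one. This gives continuous, artifact-free control over restoration levels (He et al., 2019).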
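The following is a minimal pure-Python sketch of the interpolated AdaFM layer. It assumes the start-level AdaFM layer is the identity (a unit-impulse kernel with zero bias), so only the end-level kernel and bias are stored. Channels are plain nested lists, and the helper names (`depthwise_conv2d`, `lerp_kernel`, `adafm`) are hypothetical rather than taken from the authors' code. In a real network this would be a grouped (depthwise) convolution in a deep-learning framework.

```python
from typing import List

Matrix = List[List[float]]


def depthwise_conv2d(x: Matrix, kernel: Matrix) -> Matrix:
    """Same-size 2D cross-correlation of one channel (zero padding, odd kernel size)."""
    h, w, k = len(x), len(x[0]), len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def identity_kernel(k: int) -> Matrix:
    """Unit impulse: convolving with it leaves the channel unchanged."""
    ker = [[0.0] * k for _ in range(k)]
    ker[k // 2][k // 2] = 1.0
    return ker


def lerp_kernel(start: Matrix, end: Matrix, lam: float) -> Matrix:
    """Element-wise g_start + lam * (g_end - g_start)."""
    return [[s + lam * (e - s) for s, e in zip(row_s, row_e)]
            for row_s, row_e in zip(start, end)]


def adafm(features: List[Matrix], end_kernels: List[Matrix],
          end_biases: List[float], lam: float) -> List[Matrix]:
    """Apply a lambda-interpolated AdaFM layer channel by channel.

    Assumes the start-level layer is the identity (unit-impulse kernel, zero bias).
    """
    out = []
    for x, g_end, b_end in zip(features, end_kernels, end_biases):
        g = lerp_kernel(identity_kernel(len(g_end)), g_end, lam)
        b = lam * b_end  # b_start = 0, so b_start + lam * (b_end - b_start) = lam * b_end
        y = depthwise_conv2d(x, g)
        out.append([[v + b for v in row] for row in y])
    return out


channel = [[1.0, 2.0], [3.0, 4.0]]
g_end = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]  # end level doubles the signal
print(adafm([channel], [g_end], [1.0], 0.5))  # [[[2.0, 3.5], [5.0, 6.5]]]
```

Because both the kernel and the bias are interpolated linearly, the output moves smoothly from the start-level response at $\lambda = 0$ to the end-level response at $\lambda = 1$.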
Spatially Adaptive Frequency Filtering
In implicit neural representations (INRs), AdaFM takes the form of a spatially varying modulation field $\alpha(\mathbf{x})$ that parameterizes a learnable band-pass filter for Fourier-encoded features:
$$\tilde{\gamma}_k(\mathbf{x}) = B\big(\omega_k;\ \alpha(\mathbf{x}),\ \beta\big)\,\gamma_k(\mathbf{x})$$
Here $\gamma_k(\mathbf{x})$ is Fourier feature channel $k$ with frequency $\omega_k$. $B(\cdot;\ \alpha(\mathbf{x}),\ \beta)$ is a differentiable band-pass function (e.g., a difference of sigmoids) centered on $\alpha(\mathbf{x})$ with bandwidth $\beta$. The result is a per-position, continuous transition between low-pass, band-pass, and high-pass behavior, which lets the model adapt to local signal structure (Shi et al., 3 Apr 2026).
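The sketch below illustrates the idea for a 1D input under several assumptions:
- The band-pass gain is a difference of sigmoids with a fixed bandwidth.
- The learnable α grid is sampled by linear interpolation.
- Each Fourier channel is a (sin, cos) pair.

The function names, the `sharpness` parameter, and the 1D setting are illustrative choices, not the exact method of Shi et al.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def band_pass(omega: float, center: float, bandwidth: float,
              sharpness: float = 10.0) -> float:
    """Difference-of-sigmoids gain: ~1 inside [center - bw/2, center + bw/2], ~0 outside."""
    lo, hi = center - bandwidth / 2, center + bandwidth / 2
    return sigmoid(sharpness * (omega - lo)) - sigmoid(sharpness * (omega - hi))


def interp_alpha(grid: List[float], x: float) -> float:
    """Linearly interpolate a 1D alpha grid (>= 2 nodes spanning [0, 1]) at x in [0, 1]."""
    pos = x * (len(grid) - 1)
    i = min(int(pos), len(grid) - 2)
    t = pos - i
    return (1 - t) * grid[i] + t * grid[i + 1]


def filtered_fourier_features(x: float, freqs: List[float],
                              alpha_grid: List[float], bandwidth: float) -> List[float]:
    """Fourier features of x; each (sin, cos) pair is scaled by a position-dependent gain."""
    alpha = interp_alpha(alpha_grid, x)
    feats = []
    for w in freqs:
        gain = band_pass(w, alpha, bandwidth)
        feats.append(gain * math.sin(2 * math.pi * w * x))
        feats.append(gain * math.cos(2 * math.pi * w * x))
    return feats


# alpha rises from 1 at x = 0 to 4 at x = 1, so the passband moves toward higher frequencies.
print(filtered_fourier_features(0.0, [1.0, 4.0], [1.0, 4.0], 2.0))
print(filtered_fourier_features(1.0, [1.0, 4.0], [1.0, 4.0], 2.0))
```

At $x = 0$ the ω = 1 features pass and the ω = 4 features are suppressed; at $x = 1$ the situation reverses. This is the per-position switch between low- and high-frequency emphasis described above.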
2. Implementation Strategies Across Domains
AdaFM principles have been implemented in several architectural forms, each tuned to its domain.
| Domain / Task | AdaFM Mechanism | Key Implementation Details |
|---|---|---|
| Image Restoration (CNN) | Depthwise conv + bias, interpolated | Layer inserted after every conv in residual blocks; only AdaFM params finetuned for task transition |
| Visual Recognition (Graph) | Channel-wise low/high graph attention | Per-feature-channel gating blends low- and high-frequency graph attention logits, normalized and aggregated |
| Raw Image Deblurring | Local position-adaptive frequency kernels | Patch-based kernels/biases generated from normalized 2D distance, applied to frequency subbands (AFPM) |
| All-in-one Restoration | Mask-guided frequency band separation | Adaptive binary masks split the spectrum; band features are mined and bidirectionally modulated via attention |
| INRs | Spatial α-grid modulates Fourier input | Learnable scalar grid α, interpolated per location, filters Fourier features before the MLP |
AdaFM-Net for Continual Image Restoration
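The AdaFM-Net workflow starts with a convolutional restoration network for a single level, inserts AdaFM layers into its residual blocks, and trains in two phases. First, the base model is trained on the start level $L_{\text{start}}$. Then the convolutions are frozen and only the AdaFM layers are finetuned on the end level $L_{\text{end}}$. At test time, modulation is controlled by the interpolation coefficient $\lambda$, with the mapping from $\lambda$ to intermediate degradation levels calibrated to maximize PSNR at each level (He et al., 2019).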
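The test-time calibration step can be sketched in two parts:
- A grid search picks the best λ for each degradation level.
- Piecewise-linear interpolation fills in the levels between calibrated points.

`toy_val_psnr` is a hypothetical stand-in for running AdaFM-Net on validation images at a given level. The search granularity and the interpolation scheme are also assumptions.

```python
from typing import Callable, Dict, List


def calibrate_lambda(levels: List[float],
                     evaluate: Callable[[float, float], float],
                     steps: int = 100) -> Dict[float, float]:
    """For each degradation level, grid-search the lambda in [0, 1] with the best score."""
    candidates = [i / steps for i in range(steps + 1)]
    return {lvl: max(candidates, key=lambda lam: evaluate(lvl, lam)) for lvl in levels}


def lambda_for_level(mapping: Dict[float, float], level: float) -> float:
    """Piecewise-linear interpolation between calibrated (level, lambda) pairs."""
    pts = sorted(mapping.items())
    if level <= pts[0][0]:
        return pts[0][1]
    if level >= pts[-1][0]:
        return pts[-1][1]
    for (l0, y0), (l1, y1) in zip(pts, pts[1:]):
        if level <= l1:
            return y0 + (level - l0) / (l1 - l0) * (y1 - y0)
    return pts[-1][1]


def toy_val_psnr(sigma: float, lam: float) -> float:
    """Hypothetical validation score for denoising; peaks at lam = (sigma - 15) / 60."""
    return 30.0 - (lam - (sigma - 15.0) / 60.0) ** 2


mapping = calibrate_lambda([15.0, 30.0, 45.0, 60.0, 75.0], toy_val_psnr)
print(mapping)                          # {15.0: 0.0, 30.0: 0.25, 45.0: 0.5, 60.0: 0.75, 75.0: 1.0}
print(lambda_for_level(mapping, 37.5))  # 0.375
```

With a real network, `evaluate` would return measured PSNR, and the calibrated λ-to-level curve need not be linear.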
Adaptive Frequency Modulation in Graphs (AFM)
Within hierarchical graph architectures, AdaFM is realized as Adaptive Frequency Modulation (AFM). A channel-wise gate, produced by a small MLP, blends two sets of graph attention logits: one biased toward low-frequency propagation and one biased toward high-frequency propagation. Features are aggregated with the resulting combined attention matrix. AFM is applied both within windows (local) and across windows (global) to enrich the representation (Zhao et al., 15 Aug 2025).
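The following is a minimal sketch of the channel-wise blending. It assumes the low- and high-frequency-biased logit matrices are already given; how HGFE constructs them is not reproduced here. A single linear layer with a sigmoid stands in for the small MLP, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softmax_rows(logits: Matrix) -> Matrix:
    """Row-wise softmax, subtracting each row's max for numerical stability."""
    out = []
    for row in logits:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def channel_gates(context: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """One gate in (0, 1) per channel; a single linear layer stands in for the small MLP."""
    return [sigmoid(sum(w * c for w, c in zip(row, context)) + b)
            for row, b in zip(weights, biases)]


def afm_aggregate(features: Matrix, low_logits: Matrix, high_logits: Matrix,
                  gates: List[float]) -> Matrix:
    """features: N nodes x C channels; logits: N x N. Returns N x C aggregated features."""
    n, c = len(features), len(features[0])
    out = [[0.0] * c for _ in range(n)]
    for ch in range(c):
        g = gates[ch]
        # Blend the two logit sets with this channel's gate, then normalize per row.
        mixed = [[g * lo + (1.0 - g) * hi for lo, hi in zip(row_lo, row_hi)]
                 for row_lo, row_hi in zip(low_logits, high_logits)]
        attn = softmax_rows(mixed)
        for i in range(n):
            out[i][ch] = sum(attn[i][j] * features[j][ch] for j in range(n))
    return out


feats = [[1.0, 0.0], [0.0, 1.0]]
low = [[0.0, 0.0], [0.0, 0.0]]         # toy logits: uniform attention (averages neighbours)
high = [[0.0, -100.0], [-100.0, 0.0]]  # toy logits: each node attends almost only to itself
print(afm_aggregate(feats, low, high, [1.0, 0.0]))  # approximately [[0.5, 0.0], [0.5, 1.0]]
```

With gate 1, channel 0 follows the averaging logits. With gate 0, channel 1 follows the self-focused logits. Intermediate gate values interpolate the logits before the softmax.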
Adaptive Frequency Positional Modulation (AFPM)
In RAW image deblurring, AFPM modules generate per-patch, position-dependent modulation kernels and biases as direct functions of each patch's normalized spectral position, using a compact MLP (the "Kernel-Bias Generator"). These are applied to frequency-domain features for fine-grained, spectrum-aware gain control. The result is then merged with a global context branch and fused before returning to the spatial domain. This patch-wise adaptivity is crucial for handling heterogeneous blur (Jiao et al., 30 May 2025).
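The kernel-bias idea can be sketched under simplifying assumptions:
- The spectrum is square, with its origin at the top-left bin.
- Each patch gets a scalar gain and bias instead of a full kernel.
- The generator is a hypothetical one-hidden-layer MLP that maps normalized patch distance to `(gain, bias)`.

The global context branch and the fusion step are omitted, and all names are illustrative.

```python
import math
from typing import Callable, List, Tuple

Spectrum = List[List[complex]]
KernelBiasFn = Callable[[float], Tuple[float, float]]


def patch_distance(pi: int, pj: int, n_patches: int) -> float:
    """Normalized distance of patch (pi, pj) from the spectrum origin, scaled to [0, 1]."""
    if n_patches == 1:
        return 0.0
    u, v = pi / (n_patches - 1), pj / (n_patches - 1)
    return math.sqrt(u * u + v * v) / math.sqrt(2.0)


def make_generator(w_in: List[float], b_in: List[float],
                   w_gain: List[float], w_bias: List[float]) -> KernelBiasFn:
    """Hypothetical tiny MLP: distance -> tanh hidden layer -> (gain, bias)."""
    def generate(r: float) -> Tuple[float, float]:
        hidden = [math.tanh(a * r + c) for a, c in zip(w_in, b_in)]
        gain = 1.0 + sum(g * h for g, h in zip(w_gain, hidden))  # identity when weights are 0
        bias = sum(g * h for g, h in zip(w_bias, hidden))
        return gain, bias
    return generate


def afpm_modulate(spec: Spectrum, patch: int, generate: KernelBiasFn) -> Spectrum:
    """Scale and shift each patch of a square spectrum by its position-dependent (gain, bias)."""
    size = len(spec)
    if any(len(row) != size for row in spec) or size % patch:
        raise ValueError("expected a square spectrum whose side is divisible by patch")
    n = size // patch
    out = [row[:] for row in spec]
    for pi in range(n):
        for pj in range(n):
            gain, bias = generate(patch_distance(pi, pj, n))
            for i in range(pi * patch, (pi + 1) * patch):
                for j in range(pj * patch, (pj + 1) * patch):
                    out[i][j] = gain * spec[i][j] + bias
    return out
```

In a trained model the generator weights would be learned, so patches at different spectral distances receive different gains. This is the source of the position-dependent, spectrum-aware control described above.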
3. Restoration, Recognition, and Reconstruction Impact
AdaFM enables models to adapt restoration and representation capacity to the frequency characteristics of input data and task demands. The impact is empirically verified across numerous benchmarks:
Restoration Quality—AdaFM-Net
- Super-resolution (×3→×4 on Set5): Baseline (×4) PSNR 32.13 dB; AdaFM-Net 32.00 dB (gap 0.13 dB).
- Denoising (σ=15→75 on CBSD68): Baseline 26.49 dB; AdaFM-Net 26.35 dB (gap 0.14 dB).
- DeJPEG (q80→q10 on LIVE1): Baseline 29.55 dB; AdaFM-Net 29.35 dB (gap 0.20 dB).
- Intermediate levels: AdaFM-Net stays within 0.2 dB of networks trained separately for each level, i.e., close to oracle quality. Competing techniques such as AdaBN and Conditional IN produce large artifacts and lower PSNR (He et al., 2019).
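The dB gaps above are easier to interpret as MSE ratios. Since $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, a PSNR gap of $\Delta$ dB corresponds to an MSE that is $10^{\Delta/10}$ times larger. The small helpers below make this concrete; their names are illustrative.

```python
import math


def psnr(mse: float, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_ratio_from_gap(gap_db: float) -> float:
    """MSE(worse) / MSE(better) for two results that are gap_db apart in PSNR."""
    return 10.0 ** (gap_db / 10.0)


for task, gap in [("SR x4", 0.13), ("denoise sigma=75", 0.14), ("DeJPEG q10", 0.20)]:
    print(f"{task}: {gap:.2f} dB gap -> {100 * (mse_ratio_from_gap(gap) - 1):.1f}% higher MSE")
```

For the three end levels listed above, this gives roughly 3.0%, 3.3%, and 4.7% higher MSE than the corresponding baselines.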
Recognition and Segmentation—HGFE (AFM)
- CIFAR-100 (YOLOv12 baseline vs. HGFE+AFM): Top-1 rises from 57.1% to 58.2%.
- PASCAL VOC detection (mAP@0.5): 84.5→85.2 (+0.7).
- VisDrone detection (mAP@0.5): 49.5→50.7 (+1.2).
- CrackSeg segmentation (mAP@0.5): 67.5→68.4 (+0.9).
- Ablation studies show full AFM (dual graph + frequency-adaptive) gives the highest gains (Zhao et al., 15 Aug 2025).
RAW Deblurring—FrENet (AFPM)
- Deblur-RAW: FrENet achieves 44.73 dB PSNR / 0.993 SSIM at 2.22 G MACs. It outperforms LoFormer-L (44.04 dB at 8.98 G MACs) by 0.69 dB while using about 4× fewer MACs.
- Ablation: Removing AFPM or using global-only or local-only modulation notably degrades performance (Jiao et al., 30 May 2025).
All-in-One Image Restoration—AdaIR
- Multi-degradation (denoise/dehaze/derain): AdaIR reaches an average PSNR of 32.69 dB vs. 32.06 dB for PromptIR. On deraining, AdaIR reaches 38.64 dB, 2.27 dB above PromptIR.
- Combination of adaptive spectra splitting and bidirectional band modulation yields consistent, sizeable gains. Mask-guided splitting outperforms spatial pooling and Gaussian filtering (Cui et al., 2024).
Implicit Neural Representations—Local Frequency Filtering
- 2D image fitting: the "Ours-Sine" variant of the proposed method achieves PSNR 46.27 dB, SSIM 0.9938, and LPIPS 0.0024, well above DINER, SIREN, and PE-MLP.
- 3D SDF (IoU): 0.983 for the proposed method versus 0.980 for SIREN and 0.976 for BACON (Shi et al., 3 Apr 2026).
4. Theoretical Analysis and Design Insights
Frequency-Adaptive Modulation alters the underlying kernel spectrum of neural architectures, especially evident in signal representation pipelines.
Kernel Perspective—Neural Tangent Kernels
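Fourier-encoded MLPs classically induce stationary neural tangent kernels (NTKs) whose response is dominated by low frequencies. The adaptive filter, parameterized by $\alpha(\mathbf{x})$, reshapes the local effective eigenvalue at frequency scale $\omega_k$:
$$\mu_k^{\mathrm{eff}}(\mathbf{x}) \approx B\big(\omega_k;\ \alpha(\mathbf{x}),\ \beta\big)^2\,\mu_k$$
Here $\mu_k$ is the eigenvalue of the unfiltered stationary kernel at $\omega_k$, and $B$ is the band-pass gain centered on $\alpha(\mathbf{x})$ with bandwidth $\beta$. This directly shapes the learning dynamics. Regions with high $\alpha(\mathbf{x})$ allocate more capacity to fine-scale detail, while smooth regions retain a low-frequency bias, improving both convergence and fidelity (Shi et al., 3 Apr 2026).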
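One way to see where a squared gain enters is to take the inner product of two filtered embeddings. This is an illustrative derivation for a 1D input, not the analysis of Shi et al. It assumes each Fourier channel is a $(\sin, \cos)$ pair scaled by the band-pass gain $B_k(x) = B(\omega_k;\ \alpha(x),\ \beta)$, centered on $\alpha(x)$ with bandwidth $\beta$:

$$
\begin{aligned}
\tilde{\gamma}(x)^\top \tilde{\gamma}(x')
&= \sum_k B_k(x)\,B_k(x')\,\big[\sin(2\pi\omega_k x)\sin(2\pi\omega_k x') + \cos(2\pi\omega_k x)\cos(2\pi\omega_k x')\big] \\
&= \sum_k B_k(x)\,B_k(x')\,\cos\big(2\pi\omega_k (x - x')\big)
\end{aligned}
$$

With $B_k \equiv 1$ the kernel depends only on $x - x'$, i.e., it is stationary. With filtering, the weight on frequency $\omega_k$ tends to $B_k(x)^2$ for nearby points ($x' \to x$). Each position therefore reweights the frequencies available to the embedding-induced kernel.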
Band Decoupling and Bidirectional Modulation
In AdaIR, image-specific learnable spectral masks are used to separate features into low- and high-frequency bands, which are then mined and modulated via cross-attention and bidirectional (high→low, low→high) attention mechanisms. Empirical study confirms that combining both directions outperforms single-path attention, and that mask-guided partitioning of the spectrum is superior to handcrafted or pooling-based splits (Cui et al., 2024).
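The band-separation step can be sketched with a 1D discrete Fourier transform. AdaIR learns image-specific masks; here a fixed binary low-pass mask with a hypothetical `cutoff` parameter stands in for the learned one. The bidirectional attention that follows the split is not shown. Because the two masked spectra sum to the original spectrum, the two bands always sum back to the input.

```python
import cmath
import math
from typing import List, Tuple


def dft(x: List[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: List[complex]) -> List[complex]:
    """Naive inverse DFT (includes the 1/n normalization)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def low_pass_mask(n: int, cutoff: int) -> List[float]:
    """Binary mask keeping DFT bins whose wrapped frequency min(k, n - k) <= cutoff."""
    return [1.0 if min(k, n - k) <= cutoff else 0.0 for k in range(n)]


def split_bands(signal: List[float], mask: List[float]) -> Tuple[List[float], List[float]]:
    """Split a real 1D signal into low/high bands with a binary spectral mask."""
    spec = dft([complex(v) for v in signal])
    low = idft([m * s for m, s in zip(mask, spec)])
    high = idft([(1.0 - m) * s for m, s in zip(mask, spec)])
    return [v.real for v in low], [v.real for v in high]


low, high = split_bands([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], low_pass_mask(8, 2))
print([round(l + h, 9) for l, h in zip(low, high)])  # recovers the input signal
```

The mask is symmetric in $k$ and $n - k$, so for a real input both bands are real up to rounding error, which is why only the real parts are kept.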
Feature Control and Modulation Direction
- Restoration adaptation: the best results are consistently obtained when AdaFM modules adapt from "easy" to "hard" levels, e.g., from lower to higher noise levels or from smaller to larger super-resolution factors. For large restoration ranges, sub-interval modulation is recommended (He et al., 2019).
- Spatial/frequency selectivity: Position-dependent and graph-aware modulation enables simultaneous global context capture and local edge preservation (Zhao et al., 15 Aug 2025, Jiao et al., 30 May 2025).
5. Practical Considerations and Computational Overhead
AdaFM approaches are designed for efficiency:
- CNN AdaFM layers: 0.1–3.7% extra parameters (e.g., 5×5 kernels in 16-block ResNet).
- HGFE AFM modules: intra-window attention is confined to each window, while inter-window attention is quadratic in the number of windows (mitigated by using larger windows).
- FrENet AFPM: Efficient due to patch-based MLP modulation and frequency domain processing.
- AdaIR: 28.8 M parameters, 147.5 G FLOPs—favorable compared to alternatives at similar or better PSNR.
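To see why window size matters, the sketch below counts pairwise interactions for a token grid split into equal windows. It assumes intra-window attention is dense within each window and inter-window attention treats each window as a single node. This is a simplified cost model, not HGFE's exact complexity, and `attention_pairs` is a hypothetical helper.

```python
from typing import Tuple


def attention_pairs(num_tokens: int, window: int) -> Tuple[int, int]:
    """Count (intra-window, inter-window) pairwise interactions under a simple cost model."""
    if num_tokens % window:
        raise ValueError("num_tokens must be divisible by window")
    num_windows = num_tokens // window
    intra = num_windows * window * window  # equals num_tokens * window
    inter = num_windows * num_windows      # quadratic in the number of windows
    return intra, inter


for w in (16, 64, 256):
    print(w, attention_pairs(64 * 64, w))
```

Consider a 64×64 grid. Moving from 16-token to 256-token windows cuts inter-window pairs from 65,536 to 256, while raising intra-window pairs from 65,536 to 1,048,576. This is the trade-off behind relaxing window granularity at high resolution.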
Key deployment advice includes tuning the AdaFM filter size per task (the preferred size for SR differs from that for denoising/DeJPEG) and aligning the modulation direction and range with the expected usage scenarios (He et al., 2019, Zhao et al., 15 Aug 2025, Jiao et al., 30 May 2025).
6. Limitations, Variants, and Prospective Extensions
While AdaFM and its variants exhibit robust performance and broad applicability, several limitations and open directions are identified:
- Spectral granularity vs. memory: for spatially varying filters (INRs), fine α grids increase memory use and run-time interpolation cost, while coarse grids risk over-smoothing (Shi et al., 3 Apr 2026).
- Computational scaling: Inter-window graph AFM can be costly at high spatial resolutions unless the window granularity is relaxed (Zhao et al., 15 Aug 2025).
- Hyperparameter sensitivity: Filter bandwidth, grid size, and kernel sizes typically require tuning to balance denoising/super-resolution power and computational resources.
- Extendability: Incorporation into transformer self-attention, multi-band/multistage design, and applications to irregular domains (e.g., meshes, point clouds) are promising directions (Zhao et al., 15 Aug 2025, Shi et al., 3 Apr 2026).
Potential improvements include adaptive grid allocation, dynamic hyperparameter search, multiband (beyond binary low/high) filtering, and explicit temporal extensions for video or spatiotemporal signals. Visualizing learned α maps or adaptive spectral masks also gives insight into how models adapt to nonstationary signals, and may inform further research into explainability and robust frequency-selective representations (Shi et al., 3 Apr 2026, Cui et al., 2024).
References:
- "Modulating Image Restoration with Continual Levels via Adaptive Feature Modification Layers" (He et al., 2019)
- "Hierarchical Graph Feature Enhancement with Adaptive Frequency Modulation for Visual Recognition" (Zhao et al., 15 Aug 2025)
- "Efficient RAW Image Deblurring with Adaptive Frequency Modulation" (Jiao et al., 30 May 2025)
- "AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation" (Cui et al., 2024)
- "Adaptive Local Frequency Filtering for Fourier-Encoded Implicit Neural Representations" (Shi et al., 3 Apr 2026)