FAMamba: Frequency-Aware Mamba Architecture
- Frequency-Aware Mamba (FAMamba) is an architectural enhancement that integrates frequency-domain transforms with state-space models to balance global and local feature extraction.
- FAMamba employs dual-path designs with dedicated modules for low-frequency and high-frequency processing using Fourier, wavelet, and learnable gating techniques.
- By mitigating spectral bias, FAMamba achieves state-of-the-art or highly competitive performance across applications such as time series classification, image/video restoration, and speech enhancement.
Frequency-Aware Mamba (FAMamba) refers to a class of architectural enhancements and design principles that augment the state-space sequence modeling capabilities of the Mamba backbone by integrating explicit frequency-domain or time-frequency-aware feature representations. Across the literature, FAMamba architectures are constructed to mitigate Mamba’s spectral bias toward low-frequency content and to explicitly fuse information from spatial, frequency, and sometimes semantic modalities. These models leverage frequency-aware modules at varying levels: continuous/discrete Fourier or wavelet transforms, learnable frequency selection gates, dual-path or hybrid processing, and frequency-tuned scanning mechanisms. FAMamba has demonstrated state-of-the-art or highly competitive results across multivariate time series classification, multi-modal fusion, image/video restoration, medical imaging, super-resolution, 3D generative modeling, speech enhancement, and sequential recommendation.
1. Principles and Motivation for Frequency-Aware Mamba
Canonical Mamba architectures implement a selective state-space model (SSM) obtained by discretizing a continuous-time system, providing efficient, linear-complexity global sequence modeling. In vision and sequential domains, however, this approach introduces an inherent low-pass filtering effect due to the underlying convolutional kernel structure and the serialization of spatial features, which suppresses high-frequency detail and attenuates local structure. Empirical and spectral analyses (e.g., TinyViM (Ma et al., 26 Nov 2024), FaRMamba (Rong et al., 26 Jul 2025)) have established that vanilla Mamba blocks primarily capture low-frequency or global information, leading to suboptimal performance on tasks requiring fine texture or structural detail.
Frequency-Aware Mamba designs counter this limitation by decoupling feature processing into frequency bands via explicit spectral transformations—Continuous Wavelet Transform (CWT), Discrete Wavelet Transform (DWT), Laplacian pyramids, and Fourier or cosine transforms—and routing low- and high-frequency components through specialized branches (e.g., Mamba SSMs for low-frequency content, CNNs for high-frequency detail). This paradigm systematically enhances representational expressivity, enables joint modeling of global and local (or periodic and aperiodic) structure, and allows more parameter-efficient, task-tuned feature fusion (Ahamed et al., 6 Jun 2024, Ma et al., 26 Nov 2024, Sun et al., 10 Nov 2025, Rong et al., 26 Jul 2025).
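As a minimal sketch of this dual-path routing pattern (assuming a simple average-pool/residual split rather than any specific published transform, and with a 1×1-convolution mixer standing in as a placeholder for the Mamba SSM), such a block might look as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPathFrequencyBlock(nn.Module):
    """Illustrative low/high-frequency split in the spirit of Laplacian-style mixers.

    Low-frequency content (downsampled) goes to a global mixer standing in for a
    Mamba SSM; the high-frequency residual is refined by a local convolution.
    """
    def __init__(self, channels: int, ratio: int = 2):
        super().__init__()
        self.ratio = ratio
        # Placeholder for a Mamba/SSM block: any global token mixer would slot in here.
        self.global_mixer = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, kernel_size=1),
        )
        # Local branch for high-frequency (edge/texture) residuals.
        self.local_refine = nn.Conv2d(channels, channels, kernel_size=3,
                                      padding=1, groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # Low-frequency approximation: average-pool downsampling.
        low = F.avg_pool2d(x, kernel_size=self.ratio)
        # High-frequency residual: input minus upsampled low-frequency content.
        high = x - F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)
        low = self.global_mixer(low)        # global / low-frequency path
        high = self.local_refine(high)      # local / high-frequency path
        low_up = F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)
        return low_up + high                # simple additive fusion

x = torch.randn(1, 32, 64, 64)
print(DualPathFrequencyBlock(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```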
2. Core Methodological Patterns
FAMamba architectures instantiate frequency awareness via the following core patterns, traced across domains:
- Explicit Frequency Transform Modules
- Continuous Wavelet Transform (Morlet, Daubechies) for time-frequency analysis in time series and medical imaging (Ahamed et al., 6 Jun 2024, Rong et al., 26 Jul 2025).
- DWT, Laplacian pyramid, FFT, DCT for multiresolution image decompositions in restoration, segmentation, fusion, and classification (Zhen et al., 15 Apr 2024, Wang et al., 1 Jul 2025, Ma et al., 26 Nov 2024).
- Learnable frequency selection modules using depthwise or 1×1 convolutions on FFT spectra, enabling data-driven spectral gating (see the gating sketch after this list) (Xiao et al., 8 May 2024, Sun et al., 10 Nov 2025, Xu et al., 17 Jun 2025).
- Low- and High-Frequency Pathways
- Parallel Mamba (SSM-based) modules on downsampled low-frequency tokens for efficient global modeling, with CNN- or attention-based refinement modules on high-frequency details (Wang et al., 1 Jul 2025, Ma et al., 26 Nov 2024, Xu et al., 17 Jun 2025).
- Dual-branch or dual-path designs, often with fusion via learnable gates, affine transforms, or attention (Zhang et al., 7 May 2025, Pan et al., 3 Dec 2025, Xiao et al., 8 May 2024).
- Hybrid or Modular Fusion
- Elementwise, attention-based, or channel-wise gating to dynamically balance paired feature streams: spectral vs. spatial, global vs. local, and semantic vs. frequency-driven (Ahamed et al., 6 Jun 2024, Sun et al., 10 Nov 2025, Zhang et al., 7 May 2025, Xu et al., 17 Jun 2025).
- Multi-scale or multi-band decomposition, where subbands are processed or scanned with domain-adaptive SSMs (Zhen et al., 15 Apr 2024, Pan et al., 3 Dec 2025, Liu et al., 17 Mar 2025).
- Spectral-Aware Scanning and Attention
- Frequency-adaptive scanning mechanisms that tailor the SSM scan topology to frequency sub-bands, e.g., horizontal/vertical scans on low-frequency bands and diagonal scans on high-frequency details (see the serialization sketch after this list) (Pan et al., 3 Dec 2025).
- Hybrid Mamba-attention modules (time-frequency multi-head attention) for learning joint dependencies in signal enhancement (Kühne et al., 1 Jul 2025).
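To make the scanning idea concrete, the snippet below sketches two serialization orders of the kind a frequency-adaptive scanner could assign to different sub-bands before feeding tokens to an SSM; the function names and the specific anti-diagonal ordering are illustrative assumptions, not the exact scan topology of the cited work:

```python
import torch

def horizontal_scan(feat: torch.Tensor) -> torch.Tensor:
    """Row-major (raster) serialization of a (C, H, W) feature map into (H*W, C) tokens."""
    c, h, w = feat.shape
    return feat.permute(1, 2, 0).reshape(h * w, c)

def diagonal_scan(feat: torch.Tensor) -> torch.Tensor:
    """Anti-diagonal serialization: positions are visited along lines of constant i + j."""
    c, h, w = feat.shape
    order = sorted(((i, j) for i in range(h) for j in range(w)),
                   key=lambda ij: (ij[0] + ij[1], ij[0]))
    idx = torch.tensor([i * w + j for i, j in order])
    return feat.permute(1, 2, 0).reshape(h * w, c)[idx]

# A frequency-adaptive scanner could feed raster-ordered tokens from low-frequency
# sub-bands and diagonally ordered tokens from high-frequency sub-bands to the SSM.
feat = torch.arange(16.0).reshape(1, 4, 4)
print(horizontal_scan(feat).flatten().tolist())
print(diagonal_scan(feat).flatten().tolist())
```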
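The learnable frequency-selection gating noted earlier in this list can likewise be sketched as a small module that modulates the FFT spectrum of a feature map with a learned mask; the module name and the choice of a 1×1 convolution acting on stacked real/imaginary parts are assumptions made for illustration, not the exact design of any cited model:

```python
import torch
import torch.nn as nn

class LearnableFrequencyGate(nn.Module):
    """Data-driven spectral gating: modulate the 2D FFT of a feature map
    with a learned, channel-wise mask produced by a 1x1 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        # Operates on concatenated real/imaginary parts of the spectrum.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        spec = torch.fft.rfft2(x, norm="ortho")            # complex spectrum (B, C, H, W//2+1)
        parts = torch.cat([spec.real, spec.imag], dim=1)   # real-valued view of the spectrum
        gated = parts * self.gate(parts)                   # learned multiplicative spectral mask
        re, im = gated.chunk(2, dim=1)
        spec_gated = torch.complex(re, im)
        return torch.fft.irfft2(spec_gated, s=(h, w), norm="ortho")

x = torch.randn(2, 16, 32, 32)
print(LearnableFrequencyGate(16)(x).shape)  # torch.Size([2, 16, 32, 32])
```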
3. Detailed Architectural Instantiations
Below is a comparative table of representative FAMamba instantiations across domains:
| Domain/task | Frequency-aware module(s) | Low/high-freq routing & fusion |
|---|---|---|
| Time series classification | CWT (Morlet), fusion MLPs, SSM | CWT spectral, local/global temporal, Mamba, gating, concatenation (Ahamed et al., 6 Jun 2024) |
| Image restoration (dehazing, deraining, weather) | Laplacian, DWT, FFT, frequency-prior blocks | Mamba SSM on low-freq, CNN/attn on high-freq, explicit fusion (Wang et al., 1 Jul 2025, Pan et al., 3 Dec 2025, Zhen et al., 15 Apr 2024) |
| Vision backbone (TinyViM) | Laplacian mixer | Mamba on downsampled low-freq, Conv3x3 on high-freq, frequency ramp (Ma et al., 26 Nov 2024) |
| Medical segmentation | DWT/FFT/DCT multi-scale, region-guided SSRAE | Shared encoder, frequency-enhanced, self-supervised spatial decoder (Rong et al., 26 Jul 2025) |
| Super-resolution | Learnable frequency selection (FFT-based) | Parallel VSSM (spatial SSM), frequency MLP, hybrid gating (Xiao et al., 8 May 2024, Xu et al., 17 Jun 2025) |
| Speech enhancement | Shared time–frequency multi-head attn | Interleaved with bidirectional Mamba layers (Kühne et al., 1 Jul 2025) |
| 3D Point cloud diffusion | Time-variant freq-encoder (graph Laplacian HPF) | Dual latent Mamba on serializations, affine fusion (Liu et al., 17 Mar 2025) |
| Sequential recommendation | FFT filter, frequency band block | Bandpass-specific Mamba per frequency band, adaptive gate with LLM embeddings (Zhang et al., 7 May 2025) |
| Multi-modal image fusion | Fourier-based freq block in Mamba | SFMB with spatial, channel, and Fourier-domain paths; dynamic fusion (Sun et al., 10 Nov 2025) |
| Video demoireing | FFT compressor block in Mamba | Spatial and temporal Mamba + Adaptive Frequency Block, Channel Attn Block (Xu et al., 20 Aug 2024) |
Methods such as Laplace-Mamba (Wang et al., 1 Jul 2025), TinyViM (Ma et al., 26 Nov 2024), and FaRMamba (Rong et al., 26 Jul 2025) all route only the downsampled (“compressed”) low-frequency content into the SSM/Mamba branch for global context modeling, while dedicating convolutional or attention-based modules to high-frequency cues, thereby capitalizing on Mamba’s strengths and mitigating its spectral bias.
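For concreteness, a single-level Haar DWT of the kind used for the multi-band decompositions above can be implemented with fixed strided convolutions, as in the sketch below; the routing of the resulting sub-bands to global versus local branches is indicated only in comments, since the per-band modules differ across the cited works:

```python
import torch
import torch.nn.functional as F

def haar_dwt(x: torch.Tensor):
    """Single-level Haar DWT of a (B, C, H, W) tensor (H, W even).
    Returns the low-frequency LL band and three detail bands."""
    b, c, _, _ = x.shape
    k = 0.5 * torch.tensor([
        [[1.,  1.], [ 1.,  1.]],   # LL: local average (low-frequency)
        [[1.,  1.], [-1., -1.]],   # detail: differences across rows
        [[1., -1.], [ 1., -1.]],   # detail: differences across columns
        [[1., -1.], [-1.,  1.]],   # detail: diagonal differences
    ], device=x.device).unsqueeze(1)                       # (4, 1, 2, 2)
    kernels = k.repeat(c, 1, 1, 1)                         # one filter set per channel
    out = F.conv2d(x, kernels, stride=2, groups=c)         # (B, 4*C, H/2, W/2)
    out = out.view(b, c, 4, *out.shape[-2:])
    ll, lh, hl, hh = out.unbind(dim=2)
    # In a FAMamba-style block, `ll` would feed the global SSM branch and the
    # detail bands would feed convolutional/attention refinement branches.
    return ll, lh, hl, hh

x = torch.randn(1, 3, 64, 64)
print([t.shape for t in haar_dwt(x)])  # four tensors of shape (1, 3, 32, 32)
```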
4. Fusion, Training Losses, and Optimization Regimes
FAMamba models generally employ explicit fusion strategies together with spectral-, amplitude-, or phase-aware loss functions:
- Fusion mechanisms: Elementwise or channel-wise soft-gating, learned affine or convex combinations, concatenation after gating, or frequency-adaptive reweighting ensure robust information integration.
- Losses: Composite objectives combining spatial L₁/L₂ terms, frequency-domain discrepancies (e.g., amplitude or phase differences in Fourier/wavelet space), perceptual (VGG-based) losses, and application-specific adversarial or SSIM terms are standard (Zhen et al., 15 Apr 2024, Pan et al., 3 Dec 2025, Wang et al., 1 Jul 2025, Sun et al., 10 Nov 2025, Kühne et al., 1 Jul 2025).
- Training/optimization: FAMamba models retain the linear (or near-linear) computational complexity of Mamba in sequence length or token count, even with multi-path or frequency-augmented branches. Hyperparameter tuning typically covers layer/block expansion, the number of frequency bands, and fusion weights.
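A minimal sketch of such a composite objective, combining a spatial L₁ term with Fourier amplitude and phase discrepancies, is shown below; the weighting coefficients are illustrative placeholders rather than values taken from any cited paper:

```python
import torch
import torch.nn.functional as F

def spatial_frequency_loss(pred: torch.Tensor, target: torch.Tensor,
                           w_spatial: float = 1.0, w_amp: float = 0.1,
                           w_phase: float = 0.05) -> torch.Tensor:
    """Composite loss: spatial L1 plus amplitude and phase discrepancies of the
    2D Fourier spectra. Weights are illustrative placeholders."""
    spatial = F.l1_loss(pred, target)
    pred_f = torch.fft.rfft2(pred, norm="ortho")
    tgt_f = torch.fft.rfft2(target, norm="ortho")
    amp = F.l1_loss(pred_f.abs(), tgt_f.abs())                   # amplitude spectrum match
    phase = F.l1_loss(torch.angle(pred_f), torch.angle(tgt_f))   # phase spectrum match (no wrap handling)
    return w_spatial * spatial + w_amp * amp + w_phase * phase

pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
print(spatial_frequency_loss(pred, target))
```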
5. Empirical Results and Performance Analysis
FAMamba-based architectures report consistently strong or state-of-the-art performance across a range of domains:
- Multivariate time series: TSCMamba surpasses TimesNet and TSLANet by 4.01–7.93% absolute accuracy, with an 80.05% mean accuracy on UEA datasets (Ahamed et al., 6 Jun 2024).
- Vision (classification, segmentation, super-resolution): TinyViM achieves higher ImageNet top-1 accuracy and throughput than comparable transformer/convolution baselines, with up to 2–3× efficiency gains (Ma et al., 26 Nov 2024). FMSR improves PSNR by +0.11 dB while using only 19–28% of the memory/FLOPs of prior methods (Xiao et al., 8 May 2024).
- Medical and video restoration: FaRMamba elevates Dice scores by up to 2% over UMamba and beats nnU-Net and UKAN in segmentation (Rong et al., 26 Jul 2025); DemMamba improves raw video demoireing PSNR by +1.3 dB and halves runtime vs. prior art (Xu et al., 20 Aug 2024).
- Speech enhancement: MambAttention outperforms Conformer, xLSTM-MHA, and pure Mamba across out-of-domain benchmarks (DNS 2020, EARS-WHAM); weight-shared time–frequency modules improve generalization (Kühne et al., 1 Jul 2025).
- 3D point generation: TFDM achieves superior COV/EMD metrics at up to 9–10× efficiency over DiT-3D (Liu et al., 17 Mar 2025).
- Sequential recommendation: FAMamba (M²Rec) yields +3.2% HR@10 vs. Mamba4Rec and 20% faster inference than Transformer baselines (Zhang et al., 7 May 2025).
- Ablation studies consistently show that removing the frequency-aware modules measurably degrades performance, high-frequency recovery, and boundary accuracy.
6. Extensions, Open Challenges, and Future Directions
Current FAMamba research identifies several open directions and limitations:
- Resolution trade-offs: Discrete spectral splits (e.g., DWT downsampling) can restrict spatial resolution and ultra-fine detail unless multi-level or learnable band decompositions are adopted (Pan et al., 3 Dec 2025, Wang et al., 1 Jul 2025).
- Generalization: Context-dependent adaptations, such as multi-modality fusion (LLMs, semantic text, multi-modal medical imaging) and adaptive fusion/gating mechanisms, remain an active area of research.
- Learnable frequency transforms: Many models employ static, handcrafted filters (Laplacian, wavelet, graph Laplacian) rather than jointly optimized or data-driven spectral decompositions (Liu et al., 17 Mar 2025, Rong et al., 26 Jul 2025).
- Dynamic frequency routing: Adaptive schemes, such as frequency ramping or time-variant encoding, are effective yet typically follow fixed schedules; the cited works point to learned or input-conditioned routing as future work (Ma et al., 26 Nov 2024, Liu et al., 17 Mar 2025).
- Computational efficiency: Most models match or exceed transformer/CNN baselines in FLOPs/throughput, but real-time hardware deployment and quantization for embedded systems, especially in multi-branch settings, remain underexplored (Pan et al., 3 Dec 2025).
Application domains expected to see further FAMamba impact include remote sensing, video restoration, multi-modal biomedical imaging, and high-throughput large-scale generative modeling. Extension to video, spatiotemporal data, and dynamic frequency hierarchies (multi-level bands, learnable splits) is a suggested trajectory (Pan et al., 3 Dec 2025, Rong et al., 26 Jul 2025, Xiao et al., 8 May 2024).
7. Representative Advantages and Theoretical Insights
The architectural design choices in FAMamba produce several key advantages, which are supported by the cited works:
- Spectrally non-redundant processing: By assigning global, low-frequency cues to Mamba and local, high-frequency (edge, texture) cues to convolution or attention branches, redundancy is minimized and the computational budget is focused on the most relevant subspace (Ma et al., 26 Nov 2024, Wang et al., 1 Jul 2025, Xu et al., 17 Jun 2025).
- Enhanced generalization and robustness: Frequency-aware fusion (via explicit spectral priors, per-band SSMs, or wavelet/FFT-informed fusion) empirically yields improvements in robustness to shift, inversion, noise, and distributional variation, supporting claims of superior generalization in time series and speech domains (Ahamed et al., 6 Jun 2024, Kühne et al., 1 Jul 2025).
- Linear scaling in model complexity: FAMamba modules preserve Mamba's linear complexity by downsampling low-frequency tokens and processing sub-bands separably, matching or exceeding state-of-the-art efficiency (Ma et al., 26 Nov 2024, Xiao et al., 8 May 2024, Sun et al., 10 Nov 2025).
Overall, Frequency-Aware Mamba models represent a principled integration of multi-resolution spectral analysis with linear-time state-space sequence modeling, which underpins their strong empirical results across diverse signal, vision, sequential, and generative modeling tasks.