Low-High Frequency Interaction Block

Updated 11 June 2026

LHFIB is an architectural module that decouples, processes, and reintegrates low- and high-frequency information to improve image restoration and super-resolution.
It employs frequency decomposition methods such as Haar DWT, average pooling, and Laplacian pyramids for specialized feature extraction.
Fusion techniques like cross-attention, residual addition, and adaptive gating enable precise recovery of semantic structure and fine details.

The Low-High Frequency Interaction Block (LHFIB) is an architectural module designed to explicitly decouple, process, and re-integrate low- and high-frequency information in deep neural networks for image restoration and super-resolution tasks. LHFIBs enable networks to leverage the distinct properties of frequency components, which is crucial for accurate recovery of both semantic structure and fine detail in degraded or low-resolution inputs. Originating in state-of-the-art models such as ML-CrAIST and HLNet, LHFIBs are characterized by explicit frequency decomposition, specialized processing branches for each frequency, and interaction modules—often involving cross-attention—to fuse multi-scale representations for enhanced performance in challenging visual tasks (Pramanick et al., 2024, Chen et al., 2024).

1. Frequency Decomposition and Representation

LHFIBs begin by decomposing the input, typically a tensor $I \in \mathbb{R}^{H \times W \times C}$ (where $C = 3$ for RGB images), into low-frequency and high-frequency components. Distinct decomposition strategies have been employed:

Discrete Wavelet Transform (DWT): In ML-CrAIST, a single-level 2D Haar-DWT yields four sub-bands: $LL$ (low-low), $LH$ (low-high), $HL$ (high-low), and $HH$ (high-high), each of size $(H/2) \times (W/2) \times 1$ . The $LL$ sub-band encodes coarse, background information (low frequency), while the $LH$ , $HL$ , and $HH$ bands capture increasingly localized and directional high-frequency details (Pramanick et al., 2024).
Average Pooling and Subtraction: HLNet/HLFDB applies average pooling with stride and kernel size $k$, then upsamples the pooled tensor and subtracts it from the input to isolate the high-frequency residue. Mathematically:

$I_{\text{low}} = \mathrm{AvgPool}_{k}(I), \qquad I_{\text{high}} = I - \mathrm{Up}_{k}(I_{\text{low}})$

This scheme efficiently separates structure from detail with minimal artifacts (Chen et al., 2024).

Laplacian Pyramid: Other works, such as LFINet, employ Gaussian blurring and downsampling to obtain a global semantic base, then recover detail by “blur-and-subtract” at multiple scales, enhancing the focus on frequency-specific content in deeper stages (Chen et al., 4 May 2026).
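As a rough illustration of the first two strategies, the sketch below implements a single-level Haar split and an average-pooling split in NumPy. It is a minimal sketch under stated assumptions, not code from ML-CrAIST or HLNet: the function names `haar_dwt2` and `avgpool_split`, the single-channel 2D input, the orthonormal Haar scaling, the default `k=2`, and the nearest-neighbour upsampling are all choices made here for illustration. Sub-band naming conventions for $LH$ and $HL$ also differ between libraries.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT of an (H, W) array with even H and W.

    Orthonormal scaling is assumed; LH/HL naming follows one common convention.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # smooth, coarse content
    lh = (a + b - c - d) / 2.0  # change between the two rows
    hl = (a - b + c - d) / 2.0  # change between the two columns
    hh = (a - b - c + d) / 2.0  # diagonal change
    return ll, lh, hl, hh

def avgpool_split(x, k=2):
    """Low/high split by k x k average pooling, upsampling, and subtraction.

    Nearest-neighbour upsampling is an assumption; H and W must be divisible by k.
    """
    h, w = x.shape
    low = x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))  # pooled tensor
    up = np.repeat(np.repeat(low, k, axis=0), k, axis=1)     # back to (H, W)
    high = x - up                                            # detail residue
    return low, high

x = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(x)
low, high = avgpool_split(x, k=2)
```

With this scaling the four Haar sub-bands together carry the same energy as the input, and the high-frequency residue from the pooling split has zero mean inside every pooling window, which is one way to see that it holds only detail.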

2. Specialized Branches for Frequency-specific Processing

After decomposition, LHFIBs process each frequency component via dedicated sub-networks tailored to their content distribution:

Low-frequency Branches implement deep modules or stacked transformer blocks (e.g., SCATB, channel-wise transformers) aimed at modeling global context, smooth backgrounds, and semantic coherence. In ML-CrAIST, the $LL$ band is forwarded through multiple SCATB units, which preserve spatial resolution and enhance semantic consistency (Pramanick et al., 2024). HLNet employs three-level downsampling and transformer encoding to capture long-range dependencies (Chen et al., 2024).
High-frequency Branches apply shallow CNNs, attention-based fusion, or multi-scale convolutions to refine textures, enhance edges, and restore high-frequency signal lost in subsampling or degradation. ML-CrAIST uses a sequence of convolutions followed by depth-wise convolutions, whereas HLNet constructs a densely connected CNN block, and LFINet applies a high-frequency block with explicit grouping and channel weighting (Chen et al., 4 May 2026).
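To make the convolutional building blocks of a high-frequency branch concrete, here is a minimal NumPy sketch of a pointwise convolution and a depth-wise convolution. The $1 \times 1$ and $3 \times 3$ kernel sizes, the zero padding, and the function names `pointwise_conv` and `depthwise_conv3x3` are assumptions made for illustration; the layer configuration of any of the cited models may differ.

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 convolution: mixes channels independently at each pixel.

    x: (C_in, H, W) feature map, w: (C_out, C_in) weight matrix.
    """
    return np.einsum("oc,chw->ohw", w, x)

def depthwise_conv3x3(x, k):
    """Depth-wise 3x3 convolution with zero padding (one kernel per channel).

    x: (C, H, W) feature map, k: (C, 3, 3) kernels. Channels are never mixed.
    """
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # pad only the spatial axes
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            # Shifted copy of the input, weighted per channel by k[:, i, j].
            out += k[:, i, j, None, None] * xp[:, i:i + h, j:j + w]
    return out

x = np.ones((2, 4, 4))
mixed = pointwise_conv(x, np.array([[1.0, 1.0]]))  # (1, 4, 4), channel sum
local = depthwise_conv3x3(x, np.ones((2, 3, 3)))   # per-channel 3x3 box sum
```

The split of labour is the point of the pairing: the pointwise step handles cross-channel mixing cheaply, while the depth-wise step supplies local spatial context such as edges and texture.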

3. Frequency Interaction and Fusion Mechanisms

A central feature is the mechanism by which low- and high-frequency information interact and are fused:

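Cross-Attention Blocks (CAB): ML-CrAIST computes cross-attention between spatially and channel-aligned representations of low- and high-frequency features. The CAB structure uses a pointwise ( $1 \times 1$ ) convolution followed by depth-wise convolutions for projection, forms queries, keys, and values, and then applies channel-wise attention before projecting back to the original feature space. Writing $X$ and $Y$ for the two aligned frequency features, $W_p$ for pointwise and $W_d$ for depth-wise convolutions, and $\alpha$ for a scaling factor:

$Q = W_d^{Q} W_p^{Q} X, \quad K = W_d^{K} W_p^{K} Y, \quad V = W_d^{V} W_p^{V} Y$

$A = \mathrm{softmax}\left( \hat{Q} \hat{K}^{\top} / \alpha \right), \quad \hat{Q}, \hat{K}, \hat{V} \in \mathbb{R}^{C \times HW}$

$\mathrm{CAB}(X, Y) = W_p^{O} \left( A \hat{V} \right)$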

This process models inter-frequency correspondence at each location and channel (Pramanick et al., 2024).

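Element-wise Residual Fusion: In HLNet, after processing, the outputs of both branches are summed and residually added to the input. With $X$ the block input and $F_{\text{low}}$, $F_{\text{high}}$ the branch outputs:

$Y = X + F_{\text{low}} + F_{\text{high}}$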

This ensures that each specialization learns only refined corrections to the base features (Chen et al., 2024).

Adaptive Fusion via Gating: LFINet incorporates frequency gated modulation, where outputs of the high-frequency branch and a spatial transformer-encoded low-frequency feature are adaptively weighted via global average pooling, softmax normalization, and channel-aware importance weights to yield the final fused feature for reconstruction (Chen et al., 4 May 2026).
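Two of the fusion styles above can be sketched in a few lines of NumPy. This is a simplified sketch under stated assumptions rather than the published blocks: `channel_cross_attention` takes queries from one feature and keys and values from the other (which branch plays which role is an assumption), uses plain matrices in place of the convolutional projections, and omits the output projection; `gated_fusion` is a parameter-free stand-in for gated modulation that keeps only the global-average-pooling and softmax steps. All function and argument names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_cross_attention(x, y, wq, wk, wv, alpha=1.0):
    """Channel-wise cross-attention between two (C, H, W) features.

    Queries come from x; keys and values come from y (assumed assignment).
    wq, wk, wv are (C, C) matrices standing in for the learned projections.
    """
    c, h, w = x.shape
    q = wq @ x.reshape(c, -1)                 # (C, HW)
    k = wk @ y.reshape(c, -1)                 # (C, HW)
    v = wv @ y.reshape(c, -1)                 # (C, HW)
    attn = softmax(q @ k.T / alpha, axis=-1)  # (C, C): channel-to-channel weights
    return (attn @ v).reshape(c, h, w)

def gated_fusion(f_low, f_high):
    """Per-channel convex combination of two (C, H, W) features.

    Gate logits are the global average of each channel; softmax runs over
    the two branches, so the two weights for a channel sum to one.
    """
    logits = np.stack([f_low.mean(axis=(1, 2)), f_high.mean(axis=(1, 2))])  # (2, C)
    gate = softmax(logits, axis=0)
    return gate[0, :, None, None] * f_low + gate[1, :, None, None] * f_high

rng = np.random.default_rng(0)
low = rng.normal(size=(3, 4, 4))
high = rng.normal(size=(3, 4, 4))
eye = np.eye(3)
attended = channel_cross_attention(low, high, eye, eye, eye)
fused = gated_fusion(low, high)
```

Because the attention map here is $C \times C$ rather than $HW \times HW$, its cost grows linearly with the number of pixels, which is one common motivation for channel-wise attention on high-resolution features.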

4. Multi-Scale and Cascaded Design

LHFIBs are typically instantiated at multiple scales and cascaded to enable hierarchical integration of frequency information:

Multi-scale Cascading: In ML-CrAIST, the output low-frequency sub-band $LL$ from the first LHFIB is recursively decomposed and processed by a second LHFIB at half-resolution, yielding multi-scale frequency features. These are upsampled (e.g., via bicubic interpolation) and fused at higher resolutions through cross-attention modules for scale-consistent recovery (Pramanick et al., 2024).
Progressive Decoder Integration: LFINet propagates refined features across scales, feeding outputs from each CFIB into a progressive reconstruction decoder, thus maintaining continuity and coherence from global semantic content to fine local detail at full resolution (Chen et al., 4 May 2026).
Wavelet-based Multi-scale Fusion: HLNet utilizes wavelet fusion at coarser scales to reintegrate global features with detailed subbands, leveraging both spatial and frequency hierarchies (Chen et al., 2024).
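The cascading idea can be sketched as repeated decomposition of the low-frequency band, followed by upsampling the coarser result back to the finer grid. The code below is a minimal sketch that assumes a single-channel input whose sides are divisible by $2^{\text{levels}}$; the helper names `cascade` and `upsample_nearest` are hypothetical, and nearest-neighbour upsampling is used as a simple stand-in for the bicubic interpolation mentioned above. No learned processing is applied between levels.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT (orthonormal scaling) of an (H, W) array."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2.0, (a + b - c - d) / 2.0,
            (a - b + c - d) / 2.0, (a - b - c + d) / 2.0)

def cascade(x, levels=2):
    """Recursively decompose the LL band.

    Returns the coarsest LL band and a list of (LH, HL, HH) tuples,
    ordered from the finest level to the coarsest.
    """
    highs = []
    ll = x
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        highs.append((lh, hl, hh))
    return ll, highs

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling (stand-in for bicubic interpolation)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

x = np.ones((8, 8))
ll2, highs = cascade(x, levels=2)  # ll2 is 2x2; highs[0] is 4x4, highs[1] is 2x2
ll2_up = upsample_nearest(ll2)     # 4x4, aligned with the first-level bands
```

After upsampling, the second-level features share a grid with the first-level sub-bands, which is the alignment a cross-scale fusion step needs.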

Model	Frequency Decomposition	Multi-scale Processing	Fusion Mechanism
ML-CrAIST	Haar DWT	Cascaded LHFIBs	Cross-attention (CAB)
HLNet	AvgPool+Subtraction	Multi-level Downscale	Residual + Addition
LFINet	Laplacian Pyramid	Multi-scale CFIBs	Gated Adaptive Fusion

5. Empirical Evaluation and Architectural Impact

Ablation studies and benchmark evaluations demonstrate that LHFIBs substantially improve restoration and reconstruction quality:

In ML-CrAIST, excluding LHFIBs lowers PSNR on canonical super-resolution datasets (Set5, Set14, B100, Urban100, Manga109), indicating that explicit frequency interaction is critical to its performance. The full ML-CrAIST pipeline (including LHFIBs) surpasses OmniSR on Manga109 at both ×3 and ×4 (Pramanick et al., 2024).
HLNet reports that removing HLFDB (or processing both frequency bands through a single shared branch) leads to PSNR losses of 0.69–1 dB on the NTIRE bracketed imaging restoration track. The explicit average-pooling-based split also outperforms pure wavelet separation, supporting this architectural choice (Chen et al., 2024).
In LFINet, CFIB-enabled fusion achieves state-of-the-art F1-score and IoU in thematic rural road extraction, highlighting the broad applicability of this paradigm for detail-sensitive visual reasoning (Chen et al., 4 May 2026).
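For readers less familiar with the metrics quoted above, the following sketch shows the standard definitions of PSNR (in dB) and of IoU and F1 for binary masks. The function names and the default peak value of 1.0 are assumptions; published super-resolution results are often computed on a luminance channel with border cropping, which this sketch does not do, so numbers from it are not directly comparable to those reported in the cited papers.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

def iou_f1(pred, target):
    """IoU and F1 for binary masks; assumes at least one positive pixel overall."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.sum(pred & target)   # predicted positive, truly positive
    fp = np.sum(pred & ~target)  # predicted positive, truly negative
    fn = np.sum(~pred & target)  # missed positives
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return iou, f1

score_db = psnr(np.zeros((4, 4)), np.full((4, 4), 0.1))
iou, f1 = iou_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

Because PSNR is logarithmic in the mean squared error, a gain of 1 dB corresponds to roughly a 21% reduction in MSE, which helps put sub-dB ablation differences in perspective.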

6. Applications and Generalizations

The LHFIB principle has been validated across several computer vision domains:

Image Super-Resolution: Direct modeling of frequency interactions, especially where high-frequency regions require semantically-aware guidance for reconstruction, as in ML-CrAIST (Pramanick et al., 2024).
Image Restoration and Enhancement: Context-sensitive recovery from diverse degradations by handling each frequency range with a tailored module, as in HLNet (Chen et al., 2024).
Structured Feature Extraction in Remote Sensing and Trajectory Analysis: Multi-scale CFIB modules in LFINet demonstrate improvements in spatial topology extraction and noise resilience (Chen et al., 4 May 2026).

A plausible implication is that the LHFIB concept forms a unifying framework for frequency-domain-aware neural architectures, complementing and often outperforming traditional purely spatial pipelines when detailed spectral reasoning is vital.

7. Implementation and Algorithmic Summaries

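The typical workflow for a single LHFIB forward pass as constructed in ML-CrAIST is as follows: the input is decomposed by a single-level 2D Haar DWT into $LL$, $LH$, $HL$, and $HH$; the $LL$ sub-band passes through the low-frequency branch of stacked SCATB units; the high-frequency sub-bands pass through the convolutional high-frequency branch; and a cross-attention block fuses the two branch outputs (Pramanick et al., 2024).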

The general workflow is decomposed as: frequency separation $\rightarrow$ frequency-specialized feature extraction $\rightarrow$ explicit interaction $\rightarrow$ multi-scale integration and output. This modular design enables instantiation in arbitrary architectures, provided the frequency split and fusion mechanisms are appropriately adapted to the domain. The effectiveness of LHFIBs is empirically confirmed across restoration, super-resolution, and structural extraction tasks.
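The modularity described here can be illustrated with a small skeleton in which the split, the two branches, and the fusion step are all passed in as callables. This is a hypothetical sketch, not the interface of any cited model: `lhfib_forward`, `avgpool_split`, and the toy branches are names and choices introduced for illustration, and the split keeps the low band at full resolution so that the two bands add back to the input.

```python
import numpy as np

def avgpool_split(x, k=2):
    """Return (low, high) at full resolution so that low + high == x."""
    h, w = x.shape
    pooled = x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    low = np.repeat(np.repeat(pooled, k, axis=0), k, axis=1)
    return low, x - low

def lhfib_forward(x, split, low_branch, high_branch, fuse):
    """Frequency separation, per-band processing, then interaction/fusion."""
    low, high = split(x)
    return fuse(low_branch(low), high_branch(high))

# Toy instantiation: identity low branch, a high branch that doubles detail,
# and additive fusion. Real blocks would use learned modules in each slot.
x = np.arange(16, dtype=float).reshape(4, 4)
y = lhfib_forward(
    x,
    split=avgpool_split,
    low_branch=lambda f: f,
    high_branch=lambda f: 2.0 * f,
    fuse=lambda a, b: a + b,
)
```

Swapping any one callable (for example, a wavelet split in place of the pooling split, or an attention-based `fuse`) changes the instantiation without touching the surrounding pipeline.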