HBIS: Heatmap-driven Information Synergy Module
- The paper introduces HBIS, a module that refines class embeddings and spatial features via bidirectional updates using class-specific heatmaps.
- HBIS employs dynamic feature-to-class and class-to-feature interactions with top-K selection and adaptive gating to improve boundary precision and semantic discrimination.
- Empirical results show that HBIS boosts segmentation quality by reducing inter-class confusion and enhancing interpretability on high-resolution remote sensing datasets.
The Heatmap-driven Bidirectional Information Synergy Module (HBIS) is a neural network architectural component designed for high-resolution remote sensing semantic segmentation, specifically introduced in the BiCoR-Seg framework. HBIS establishes a dynamic two-way interaction between class embeddings and spatial feature maps, leveraging class-specific spatial heatmaps to enhance semantic discrimination, sharpen boundaries, and improve the interpretability of deep models operating in scenarios marked by high inter-class similarity and substantial intra-class variability (Shi et al., 23 Dec 2025).
1. Core Mechanisms and Structure
HBIS operates at each decoder stage within the segmentation network. Its main inputs consist of: (a) a feature map $F \in \mathbb{R}^{H \times W \times d}$ from the previous layer, and (b) a set of $N$ class embeddings $\{e_c\}_{c=1}^{N}$ with each $e_c \in \mathbb{R}^{d}$. Each HBIS module executes the following sequence:
- Class-Heatmap Generation: Each class embedding $e_c$ is projected into the feature space to create a query vector $q_c = W_q e_c$. The similarity between $q_c$ and each spatial feature vector $F(i,j)$ is measured via dot product, then passed through a sigmoid to form class confidence heatmaps $H_c \in [0,1]^{H \times W}$.
- F2CE (Feature-to-Class Embedding Update): For each class, the $K$ locations with the strongest responses in $H_c$ are selected, forming the index set $\Omega_c$. The features at these locations are projected with $W_f$ into the class embedding space and aggregated via a weighted sum (using normalized heatmap values) into a context vector $\hat{e}_c$. This is then blended with the previous embedding by a gate $g_c$ computed as a sigmoid over their concatenation, yielding the updated embedding $e_c'$.
- CE2F (Class Embedding-to-Feature Modulation): Each refined class embedding $e_c'$ generates affine modulation parameters $\gamma_c$ and $\beta_c$, applied channel-wise to the feature map. These class-specific modulated features are integrated back into a single feature map via softmax-normalized heatmaps and a learnable residual coefficient $\alpha$.
This process is iterated, with bidirectional updates ensuring mutual refinement of spatial and semantic representations.
2. Detailed Computational Pipeline
The computational steps in a single HBIS layer are as follows:
- Query Construction: $q_c = W_q e_c$
- Heatmap Computation: $H_c(p) = \sigma\big(q_c^\top F(p)\big)$ for each spatial location $p$
- Top-K Feature Selection: the top-$K$ indices of $H_c$ are chosen, forming the set $\Omega_c$
- Heatmap Normalization: $\tilde{H}_c(p) = H_c(p) \big/ \sum_{p' \in \Omega_c} H_c(p')$ for $p \in \Omega_c$
- Aggregated Context Vector: $\hat{e}_c = \sum_{p \in \Omega_c} \tilde{H}_c(p)\, W_f F(p)$
- Gated Fusion: $g_c = \sigma\big(W_g [e_c ; \hat{e}_c]\big)$, $\quad e_c' = g_c \odot \hat{e}_c + (1 - g_c) \odot e_c$
- Semantic Modulation: $\gamma_c = \tanh(W_\gamma e_c')$, $\beta_c = \tanh(W_\beta e_c')$, $\quad F_c = \gamma_c \odot F + \beta_c$
- Heatmap Softmax-normalization: $A_c(p) = \exp\big(H_c(p)\big) \big/ \sum_{c'} \exp\big(H_{c'}(p)\big)$
- Final Feature Update: $F' = F + \alpha \sum_{c} A_c \odot F_c$
All projection weights ($W_q$, $W_f$, $W_g$, $W_\gamma$, $W_\beta$) are $1 \times 1$ convolutions or linear layers without bias. The default implementation uses $d = 512$, two HBIS layers, and $K$ equal to the top 2% of spatial locations per class.
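The F2CE gated fusion and the CE2F modulation steps can be sketched in NumPy as follows (shapes and weight names are illustrative assumptions drawn from the prose, not the released code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f2ce_update(F_flat, h_flat, idx, e, W_f, W_g):
    """Gated feature-to-class-embedding update for one class.
    F_flat: (HW, d) features; h_flat: (HW,) heatmap; idx: top-K indices;
    e: (d,) previous embedding; W_f: (d, d); W_g: (d, 2d)."""
    w = h_flat[idx] / h_flat[idx].sum()                      # normalized heatmap weights
    e_ctx = (w[:, None] * (F_flat[idx] @ W_f)).sum(axis=0)   # aggregated context vector
    g = sigmoid(W_g @ np.concatenate([e, e_ctx]))            # adaptive gate
    return g * e_ctx + (1.0 - g) * e                         # gated fusion

def ce2f_modulate(F, gammas, betas, A, alpha=1.0):
    """Class-conditional affine modulation folded back via softmax heatmaps A.
    F: (H, W, d); gammas, betas: (N, d); A: (N, H, W), summing to 1 over classes."""
    mod = np.einsum('nhw,nd->hwd', A, gammas) * F + np.einsum('nhw,nd->hwd', A, betas)
    return F + alpha * mod                                   # residual update
```

Note how setting `alpha=0.0` in `ce2f_modulate` recovers the unmodified feature map, which is what makes $\alpha$ a safe residual knob to learn.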
3. Bidirectional Information Flow and Stabilization
The synergy in HBIS arises from explicit bidirectional links:
- F2CE (feature to class embedding): Pools spatially localized features with strong class evidence, producing context-aware class embeddings.
- CE2F (class embedding to feature): Class-specific semantic knowledge is injected into the pixel-level feature map via affine channel-wise modulation, influencing spatial features based on the refined, context-dependent class semantics.
Stabilization mechanisms include the residual weight $\alpha$ (learnable, initialized as 1) and adaptive gating via $g_c$. These regularize the updates, preventing abrupt overwriting of either features or semantic representations and supporting convergence.
4. Hierarchical Supervision and Losses
HBIS employs multi-scale hierarchical supervision to encourage discriminative capability even in early network stages:
- Heatmap Deep Supervision: At each decoder stage, the intermediate heatmap (after upsampling to the label resolution) is compared to the ground-truth segmentation $Y$ via pixel-wise cross-entropy and Dice loss: $\mathcal{L}_{\text{heat}} = \sum_{\ell} \big[ \mathcal{L}_{\text{CE}}\big(\mathrm{up}(H^{(\ell)}), Y\big) + \mathcal{L}_{\text{Dice}}\big(\mathrm{up}(H^{(\ell)}), Y\big) \big]$
- Main Segmentation Loss: The final feature map is combined with the final class embeddings to produce pixel-level prediction logits, supervised by the same cross-entropy and Dice losses.
- Fisher Discriminative Loss: To maximize intra-class compactness and inter-class separation of class embeddings, a Fisher discriminative loss is applied across layers: $\mathcal{L}_{\text{fisher}} = \dfrac{\mathrm{tr}(S_W)}{\mathrm{tr}(S_B) + \epsilon}$, where $S_W$ is the within-class scatter, $S_B$ the between-class scatter, and $\epsilon$ a small constant for numerical stability.
The final objective is a weighted sum: $\mathcal{L} = \mathcal{L}_{\text{seg}} + \lambda_{\text{heat}} \mathcal{L}_{\text{heat}} + \lambda_{\text{fisher}} \mathcal{L}_{\text{fisher}}$, with $\lambda_{\text{heat}} = \lambda_{\text{fisher}} = 0.1$.
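A scatter-ratio loss of this form can be sketched directly in NumPy (this is a generic within-over-between scatter ratio; the paper's exact formulation may differ in normalization details):

```python
import numpy as np

def fisher_loss(E, labels, eps=1e-6):
    """Ratio of within-class to between-class scatter over embedding vectors.
    E: (M, d) embeddings; labels: (M,) integer class ids.
    Smaller values mean tighter classes that are further apart."""
    mu = E.mean(axis=0)                               # global mean
    s_w, s_b = 0.0, 0.0
    for c in np.unique(labels):
        Ec = E[labels == c]
        mu_c = Ec.mean(axis=0)
        s_w += ((Ec - mu_c) ** 2).sum()               # within-class scatter tr(S_W)
        s_b += len(Ec) * ((mu_c - mu) ** 2).sum()     # between-class scatter tr(S_B)
    return s_w / (s_b + eps)
```

Minimizing this value simultaneously rewards compact clusters (small numerator) and well-separated class means (large denominator), which is exactly the separability property Section 6 attributes to this loss.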
5. Implementation Hyperparameters and Training Strategy
Key elements for practical deployment include:
- Backbone: ConvNeXt-B pretrained on ImageNet.
- Feature/embedding dimension: $d = 512$.
- Number of HBIS layers: 2.
- F2CE sampling: Top 2% of pixels per class, per layer.
- Residual weight: initialized as 1, learnable.
- Optimization: Adam with a cosine-annealed learning rate schedule, batch size 8, and zero weight decay.
- Activation functions: sigmoid for gating and heatmaps, tanh for the modulation parameters $\gamma_c$ and $\beta_c$, softmax for class assignment.
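A standard cosine-annealing schedule of the kind named above looks like this (a generic formulation decaying to zero; the paper's exact variant and base learning rate are not specified here):

```python
import math

def cosine_lr(step, total_steps, base_lr):
    """Cosine-annealed learning rate: base_lr at step 0, decaying to 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```

The schedule starts at `base_lr`, passes through half of it at the training midpoint, and reaches zero at the final step.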
A summary of the HBIS configuration appears below:
| Component | Value/Setting | Notes |
|---|---|---|
| Backbone | ConvNeXt-B (ImageNet pretrained) | Main encoder |
| Feature/embedding dim ($d$) | 512 | |
| Number of HBIS layers | 2 | |
| F2CE pooling fraction | 2% | Per class, per layer |
| Optimizer | Adam | |
| Loss weights ($\lambda_{\text{heat}}$, $\lambda_{\text{fisher}}$) | 0.1 | For $\mathcal{L}_{\text{heat}}$, $\mathcal{L}_{\text{fisher}}$ |
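For reference, the table above can be collected into a single configuration dictionary (the key names are illustrative, not from the released code):

```python
# Hypothetical configuration mirroring the summary table; key names are illustrative.
HBIS_CONFIG = {
    "backbone": "convnext_base",   # ImageNet-pretrained encoder
    "embed_dim": 512,              # feature/embedding dimension d
    "num_hbis_layers": 2,          # HBIS layers
    "topk_fraction": 0.02,         # F2CE pooling fraction, per class per layer
    "optimizer": "adam",
    "batch_size": 8,
    "weight_decay": 0.0,
    "lambda_heat": 0.1,            # weight on heatmap deep-supervision loss
    "lambda_fisher": 0.1,          # weight on Fisher discriminative loss
}
```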
6. Contributions to Segmentation Quality and Interpretability
Empirical and architectural analysis demonstrates multiple benefits of HBIS:
- Boundary Delineation: Hierarchical heatmap supervision enforces class localization from early stages, improving the network's ability to resolve fine boundaries, a significant challenge in high-resolution remote sensing.
- Discriminative Representation: The F2CE–CE2F bidirectional co-refinement loop enables class embeddings to absorb instance/contextual variations, then imprint back class-wise attention, leading to lower inter-class confusion and more cohesive intra-class predictions.
- Training Stability: The design's adaptive gating and residual structure avoid destabilizing semantic updates and reduce susceptibility to over-smoothing or vanishing gradients.
- Interpretability: The heatmaps generated at each stage act as semantically meaningful attention maps, providing insight into spatial focus for each class and supporting visual diagnosis of model behavior.
- Semantic Separability: Fisher Discriminative Loss further structures the class embedding space, producing more compact intra-class clusters and enhancing the network's ability to differentiate between visually similar categories.
These characteristics enable BiCoR-Seg with HBIS modules to outperform prior approaches in segmentation tasks on datasets such as LoveDA, Vaihingen, and Potsdam, with particular strengths in interpretability and boundary precision (Shi et al., 23 Dec 2025).
7. Context and Research Significance
The HBIS module in BiCoR-Seg represents a methodological advance in addressing the persistent issues of inter-class similarity and intra-class variability in high-resolution remote sensing segmentation (HRSS). Its tightly coupled bidirectional architecture, explicit spatial-semantic heatmaps, and direct embedding supervision align with current research trajectories emphasizing explainable, robust segmentation in Earth observation and other dense-prediction contexts. The release of source code for BiCoR-Seg further facilitates reproducibility and adaptation in related remote sensing and semantic segmentation studies (Shi et al., 23 Dec 2025).