HBIS: Heatmap-driven Information Synergy Module
- The paper introduces HBIS, a module that refines class embeddings and spatial features via bidirectional updates using class-specific heatmaps.
- HBIS employs dynamic feature-to-class and class-to-feature interactions with top-K selection and adaptive gating to improve boundary precision and semantic discrimination.
- Empirical results show that HBIS boosts segmentation quality by reducing inter-class confusion and enhancing interpretability on high-resolution remote sensing datasets.
The Heatmap-driven Bidirectional Information Synergy Module (HBIS) is a neural network architectural component designed for high-resolution remote sensing semantic segmentation, specifically introduced in the BiCoR-Seg framework. HBIS establishes a dynamic two-way interaction between class embeddings and spatial feature maps, leveraging class-specific spatial heatmaps to enhance semantic discrimination, sharpen boundaries, and improve the interpretability of deep models operating in scenarios marked by high inter-class similarity and substantial intra-class variability (Shi et al., 23 Dec 2025).
1. Core Mechanisms and Structure
HBIS operates at each decoder stage within the segmentation network. Its main inputs consist of: (a) a feature map $F \in \mathbb{R}^{H \times W \times d}$ from the previous layer, and (b) a set of $N$ class embeddings $\{e_c\}_{c=1}^{N}$ with each $e_c \in \mathbb{R}^{d}$. Each HBIS module executes the following sequence:
- Class-Heatmap Generation: Each class embedding $e_c$ is projected into the feature space to create a query vector $q_c = W_q e_c$. The similarity between $q_c$ and each spatial feature vector $F(i,j)$ is measured via dot product, then passed through a sigmoid to form class confidence heatmaps $H_c \in [0,1]^{H \times W}$.
- F2CE (Feature-to-Class Embedding Update): For each class, the $K$ locations with the strongest responses in $H_c$ are selected, forming the index set $\Omega_c$. The features at these locations are projected with $W_f$ into the class embedding space and aggregated via a weighted sum (using normalized heatmap values) into a context vector $\hat{e}_c$. This is then blended with the previous embedding by a gate $g_c$ computed as a sigmoid over their concatenation, yielding the updated embedding $e_c'$.
- CE2F (Class Embedding-to-Feature Modulation): Each refined class embedding $e_c'$ generates affine modulation parameters $\gamma_c$ and $\beta_c$, applied channel-wise to the feature map. These class-specific modulated features are integrated back into a single feature map via softmax-normalized heatmaps and a learnable residual coefficient $\alpha$.
This process is iterated, with bidirectional updates ensuring mutual refinement of spatial and semantic representations.
2. Detailed Computational Pipeline
The computational steps in a single HBIS layer are as follows:
- Query Construction: $q_c = W_q e_c$
- Heatmap Computation: $H_c(p) = \sigma\big(q_c^\top F(p)\big)$ for each spatial location $p$
- Top-K Feature Selection: the top-$K$ indices of $H_c$ are chosen, forming the set $\Omega_c$
- Heatmap Normalization: $\tilde{H}_c(p) = H_c(p) \big/ \sum_{p' \in \Omega_c} H_c(p')$ for $p \in \Omega_c$
- Aggregated Context Vector: $\hat{e}_c = \sum_{p \in \Omega_c} \tilde{H}_c(p)\, W_f F(p)$
- Gated Fusion: $g_c = \sigma\big(W_g [e_c ; \hat{e}_c]\big)$, $\quad e_c' = g_c \odot \hat{e}_c + (1 - g_c) \odot e_c$
- Semantic Modulation: $\gamma_c = \tanh(W_\gamma e_c')$, $\beta_c = \tanh(W_\beta e_c')$, $\quad F_c = \gamma_c \odot F + \beta_c$
- Heatmap Softmax-normalization: $A_c(p) = \exp\big(H_c(p)\big) \big/ \sum_{c'} \exp\big(H_{c'}(p)\big)$
- Final Feature Update: $F' = F + \alpha \sum_{c} A_c \odot F_c$
All projection weights ($W_q$, $W_f$, $W_g$, $W_\gamma$, $W_\beta$) are $1 \times 1$ convolutions or linear layers without bias. The default implementation uses $d = 512$, two HBIS layers, and $K$ equal to the top 2% of spatial locations per class.
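The F2CE gated fusion and the CE2F modulation steps can be sketched in NumPy as follows (shapes and weight names are illustrative assumptions drawn from the prose, not the released code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f2ce_update(F_flat, h_flat, idx, e, W_f, W_g):
    """Gated feature-to-class-embedding update for one class.
    F_flat: (HW, d) features; h_flat: (HW,) heatmap; idx: top-K indices;
    e: (d,) previous embedding; W_f: (d, d); W_g: (d, 2d)."""
    w = h_flat[idx] / h_flat[idx].sum()                      # normalized heatmap weights
    e_ctx = (w[:, None] * (F_flat[idx] @ W_f)).sum(axis=0)   # aggregated context vector
    g = sigmoid(W_g @ np.concatenate([e, e_ctx]))            # adaptive gate
    return g * e_ctx + (1.0 - g) * e                         # gated fusion

def ce2f_modulate(F, gammas, betas, A, alpha=1.0):
    """Class-conditional affine modulation folded back via softmax heatmaps A.
    F: (H, W, d); gammas, betas: (N, d); A: (N, H, W), summing to 1 over classes."""
    mod = np.einsum('nhw,nd->hwd', A, gammas) * F + np.einsum('nhw,nd->hwd', A, betas)
    return F + alpha * mod                                   # residual update
```

Note how setting `alpha=0.0` in `ce2f_modulate` recovers the unmodified feature map, which is what makes $\alpha$ a safe residual knob to learn.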
3. Bidirectional Information Flow and Stabilization
The synergy in HBIS arises from explicit bidirectional links:
- F2CE (feature to class embedding): Pools spatially localized features with strong class evidence, producing context-aware class embeddings.
- CE2F (class embedding to feature): Class-specific semantic knowledge is injected into the pixel-level feature map via affine channel-wise modulation, influencing spatial features based on the refined, context-dependent class semantics.
Stabilization mechanisms include the residual weight $\alpha$ (learnable, initialized as 1) and adaptive gating via $g_c$. These regularize the updates, preventing abrupt overwriting of either features or semantic representations and supporting convergence.
4. Hierarchical Supervision and Losses
HBIS employs multi-scale hierarchical supervision to encourage discriminative capability even in early network stages:
- Heatmap Deep Supervision: At each decoder stage, the intermediate heatmap (after upsampling to the label resolution) is compared to the ground-truth segmentation $Y$ via pixel-wise cross-entropy and Dice loss: $\mathcal{L}_{\text{heat}} = \sum_{\ell} \big[ \mathcal{L}_{\text{CE}}\big(\mathrm{up}(H^{(\ell)}), Y\big) + \mathcal{L}_{\text{Dice}}\big(\mathrm{up}(H^{(\ell)}), Y\big) \big]$
- Main Segmentation Loss: The final feature map is combined with the final class embeddings to produce pixel-level prediction logits, supervised by the same cross-entropy and Dice losses.
- Fisher Discriminative Loss: To maximize intra-class compactness and inter-class separation of class embeddings, a Fisher discriminative loss is applied across layers: $\mathcal{L}_{\text{fisher}} = \dfrac{\mathrm{tr}(S_W)}{\mathrm{tr}(S_B) + \epsilon}$, where $S_W$ is the within-class scatter, $S_B$ the between-class scatter, and $\epsilon$ a small constant for numerical stability.
The final objective is a weighted sum: $\mathcal{L} = \mathcal{L}_{\text{seg}} + \lambda_{\text{heat}} \mathcal{L}_{\text{heat}} + \lambda_{\text{fisher}} \mathcal{L}_{\text{fisher}}$, with $\lambda_{\text{heat}} = \lambda_{\text{fisher}} = 0.1$.
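A scatter-ratio loss of this form can be sketched directly in NumPy (this is a generic within-over-between scatter ratio; the paper's exact formulation may differ in normalization details):

```python
import numpy as np

def fisher_loss(E, labels, eps=1e-6):
    """Ratio of within-class to between-class scatter over embedding vectors.
    E: (M, d) embeddings; labels: (M,) integer class ids.
    Smaller values mean tighter classes that are further apart."""
    mu = E.mean(axis=0)                               # global mean
    s_w, s_b = 0.0, 0.0
    for c in np.unique(labels):
        Ec = E[labels == c]
        mu_c = Ec.mean(axis=0)
        s_w += ((Ec - mu_c) ** 2).sum()               # within-class scatter tr(S_W)
        s_b += len(Ec) * ((mu_c - mu) ** 2).sum()     # between-class scatter tr(S_B)
    return s_w / (s_b + eps)
```

Minimizing this value simultaneously rewards compact clusters (small numerator) and well-separated class means (large denominator), which is exactly the separability property Section 6 attributes to this loss.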
5. Implementation Hyperparameters and Training Strategy
Key elements for practical deployment include:
- Backbone: ConvNeXt-B pretrained on ImageNet.
- Feature/embedding dimension: $d = 512$.
- Number of HBIS layers: 2.
- F2CE sampling: Top 2% of pixels per class, per layer.
- Residual weight: initialized as 1, learnable.
- Optimization: Adam with a cosine-annealed learning rate schedule, batch size 8, and zero weight decay.
- Activation functions: sigmoid for gating and heatmaps, tanh for the modulation parameters $\gamma_c$ and $\beta_c$, softmax for class assignment.
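A standard cosine-annealing schedule of the kind named above looks like this (a generic formulation decaying to zero; the paper's exact variant and base learning rate are not specified here):

```python
import math

def cosine_lr(step, total_steps, base_lr):
    """Cosine-annealed learning rate: base_lr at step 0, decaying to 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))
```

The schedule starts at `base_lr`, passes through half of it at the training midpoint, and reaches zero at the final step.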
A summary of the HBIS configuration appears below:
| Component | Value/Setting | Notes |
|---|---|---|
| Backbone | ConvNeXt-B (ImageNet pretrained) | Main encoder |
| Feature/embedding dim ($d$) | 512 | |
| Number of HBIS layers | 2 | |
| F2CE pooling fraction | 2% | Per class, per layer |
| Optimizer | Adam | |
| Loss weights ($\lambda_{\text{heat}}$, $\lambda_{\text{fisher}}$) | 0.1 | For $\mathcal{L}_{\text{heat}}$, $\mathcal{L}_{\text{fisher}}$ |
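For reference, the table above can be collected into a single configuration dictionary (the key names are illustrative, not from the released code):

```python
# Hypothetical configuration mirroring the summary table; key names are illustrative.
HBIS_CONFIG = {
    "backbone": "convnext_base",   # ImageNet-pretrained encoder
    "embed_dim": 512,              # feature/embedding dimension d
    "num_hbis_layers": 2,          # HBIS layers
    "topk_fraction": 0.02,         # F2CE pooling fraction, per class per layer
    "optimizer": "adam",
    "batch_size": 8,
    "weight_decay": 0.0,
    "lambda_heat": 0.1,            # weight on heatmap deep-supervision loss
    "lambda_fisher": 0.1,          # weight on Fisher discriminative loss
}
```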
6. Contributions to Segmentation Quality and Interpretability
Empirical and architectural analysis demonstrates multiple benefits of HBIS:
- Boundary Delineation: Hierarchical heatmap supervision enforces class localization from early stages, improving the network's ability to resolve fine boundaries, a significant challenge in high-resolution remote sensing.
- Discriminative Representation: The F2CE–CE2F bidirectional co-refinement loop enables class embeddings to absorb instance/contextual variations, then imprint back class-wise attention, leading to lower inter-class confusion and more cohesive intra-class predictions.
- Training Stability: The design's adaptive gating and residual structure avoid destabilizing semantic updates and reduce susceptibility to over-smoothing or vanishing gradients.
- Interpretability: The heatmaps generated at each stage act as semantically meaningful attention maps, providing insight into spatial focus for each class and supporting visual diagnosis of model behavior.
- Semantic Separability: Fisher Discriminative Loss further structures the class embedding space, producing more compact intra-class clusters and enhancing the network's ability to differentiate between visually similar categories.
These characteristics enable BiCoR-Seg with HBIS modules to outperform prior approaches in segmentation tasks on datasets such as LoveDA, Vaihingen, and Potsdam, with particular strengths in interpretability and boundary precision (Shi et al., 23 Dec 2025).
7. Context and Research Significance
The HBIS module in BiCoR-Seg represents a methodological advance in addressing the persistent issues of inter-class similarity and intra-class variability in high-resolution remote sensing segmentation (HRSS). Its tightly coupled bidirectional architecture, explicit spatial-semantic heatmaps, and direct embedding supervision align with current research trajectories emphasizing explainable, robust segmentation in Earth observation and other dense-prediction contexts. The release of source code for BiCoR-Seg further facilitates reproducibility and adaptation in related remote sensing and semantic segmentation studies (Shi et al., 23 Dec 2025).