FD Loss for Cross-layer Embedding
- The paper demonstrates FD loss's role in compacting intra-class embeddings and increasing inter-class separation across decoder layers.
- It employs HBIS modules and rigorous scatter computations to iteratively refine class prototypes, yielding a measurable ~0.2% mIoU gain.
- The approach showcases FD loss’s potential to improve semantic segmentation in remote sensing by aligning multi-layer embedding representations.
The cross-layer class embedding Fisher Discriminative Loss (FD loss) is a regularization objective introduced within the BiCoR-Seg architecture for high-resolution remote sensing image semantic segmentation, directly targeting issues of high inter-class similarity and large intra-class variability. By enforcing compactness of embeddings within each class and maximizing separation between distinct classes, FD loss promotes the emergence of discriminative class prototypes at multiple decoder depths. Its aggregation across all hierarchical layers underpins measurable gains in segmentation performance and interpretability for BiCoR-Seg (Shi et al., 23 Dec 2025).
1. Formal Definition and Mathematical Formulation
FD loss operationalizes discriminative supervision on batches of class embeddings at each decoder layer $l$, designated $CE^{(l)} \in \mathbb{R}^{B \times N \times D}$, where $B$ is the batch size, $N$ the number of classes, and $D$ the embedding dimensionality. Key components at each layer include:
- Class center: $\mu_n^{(l)} = \frac{1}{B} \sum_{b=1}^{B} CE_{b,n}^{(l)}$
- Overall center: $\mu^{(l)} = \frac{1}{N} \sum_{n=1}^{N} \mu_n^{(l)}$
- Within-class scatter: $S_w^{(l)} = \frac{1}{BN} \sum_{b=1}^{B} \sum_{n=1}^{N} \left\| CE_{b,n}^{(l)} - \mu_n^{(l)} \right\|_2^2$
- Between-class scatter: $S_b^{(l)} = \frac{1}{N} \sum_{n=1}^{N} \left\| \mu_n^{(l)} - \mu^{(l)} \right\|_2^2$
- Layerwise Fisher ratio loss: $\mathcal{L}_{FD}^{(l)} = \frac{S_w^{(l)}}{S_b^{(l)} + \epsilon}$ (with $\epsilon > 0$ for stability)
- Aggregated across all $L$ decoder layers: $\mathcal{L}_{FD} = \sum_{l=1}^{L} \mathcal{L}_{FD}^{(l)}$

Finally, the total loss for training is constructed as:

$$\mathcal{L}_{total} = \mathcal{L}_{main} + \lambda_1 \mathcal{L}_{HM} + \lambda_2 \mathcal{L}_{FD}$$

with recommended hyperparameters $\lambda_1$ and $\lambda_2$, the latter set to $0.1$ (Shi et al., 23 Dec 2025).
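Under these definitions, the per-layer Fisher ratio can be sketched in a few lines of NumPy (the function name and the default $\epsilon$ value here are illustrative, not taken from the paper):

```python
import numpy as np

def fd_loss_layer(ce, eps=1e-6):
    """Per-layer Fisher Discriminative loss (illustrative sketch).

    ce : array of shape (B, N, D) -- B images, N classes, D-dim prototypes,
         following the paper's notation. eps is an assumed default.
    """
    mu_n = ce.mean(axis=0)                 # (N, D): per-class centers
    mu = mu_n.mean(axis=0)                 # (D,):  overall center
    # Within-class scatter: spread of embeddings around their class centers.
    s_w = np.sum((ce - mu_n) ** 2) / (ce.shape[0] * ce.shape[1])
    # Between-class scatter: spread of class centers around the overall center.
    s_b = np.sum((mu_n - mu) ** 2) / ce.shape[1]
    return s_w / (s_b + eps)
```

Minimizing this ratio simultaneously shrinks $S_w^{(l)}$ (compactness) and rewards larger $S_b^{(l)}$ (separation), which is the intended dual effect.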
2. Class Embedding Construction Across Decoder Layers
Class embeddings are iteratively refined at each hierarchical decoder depth via the cascaded Heatmap-driven Bidirectional Information Synergy (HBIS) modules. The process initiates with an initial set $CE^{(0)}$ derived from a post-encoder MLP. At each HBIS layer $l$, the "feature-to-class" (F2CE) pathway consists of:
- Linearly projecting the previous class prototypes $CE^{(l-1)}$ to query vectors.
- Generating class heatmaps by applying a sigmoid to the dot product of pixel features and the projected class prototypes.
- Selecting the top-K pixels per class and computing a weighted sum (projected back to dimension $D$) to extract per-class context vectors.
- Fusing the context vectors with $CE^{(l-1)}$ via a learned gating function to form the updated $CE^{(l)}$.

After F2CE refinement, each $CE^{(l)}$ is used for FD loss computation, embodying image-adapted semantic prototypes.
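A toy NumPy sketch of one F2CE step can make the pathway concrete. This is a simplification under assumed shapes: a single image, plain matrices standing in for learned projection layers, and a scalar gate per class; all names here are hypothetical rather than the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f2ce_step(feat, ce_prev, W_q, W_c, top_k=16):
    """One illustrative F2CE update (hypothetical, simplified).

    feat    : (P, D) pixel features for one image (P pixels)
    ce_prev : (N, D) class prototypes from the previous layer
    W_q,W_c : (D, D) stand-ins for learned projection layers
    """
    q = ce_prev @ W_q                        # project prototypes to queries
    heat = sigmoid(feat @ q.T)               # (P, N) class heatmaps
    ce_new = np.empty_like(ce_prev)
    for n in range(ce_prev.shape[0]):
        idx = np.argsort(heat[:, n])[-top_k:]         # top-K pixels for class n
        w = heat[idx, n] / heat[idx, n].sum()         # normalized heatmap weights
        context = (w[:, None] * feat[idx]).sum(0) @ W_c  # weighted sum, projected
        gate = sigmoid(np.dot(context, ce_prev[n]))   # simplified scalar gate
        ce_new[n] = gate * context + (1.0 - gate) * ce_prev[n]
    return ce_new, heat
```

The gating convexly blends image-specific context into the running prototype, which is one plausible reading of "learned gating function"; the paper's actual gate may be vector-valued.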
3. Layerwise Supervision and Aggregation Mechanism
FD loss operates independently at each layer, without inter-layer fusion or alignment, and aggregates the discriminative effect by summation. At every depth, the scatter computations are centered using the per-class means $\mu_n^{(l)}$ and the overall mean $\mu^{(l)}$. The gradient of FD loss propagates through the F2CE modules, ultimately influencing the feature extraction and heatmap branches.
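Because layers are supervised independently, the aggregation reduces to a plain sum of per-layer ratios. A minimal sketch, with illustrative helper names:

```python
import numpy as np

def fisher_ratio(ce, eps=1e-6):
    # Per-layer within/between-class scatter ratio (eps value assumed).
    mu_n = ce.mean(axis=0)                 # (N, D) per-class centers
    mu = mu_n.mean(axis=0)                 # (D,)  overall center
    s_w = np.sum((ce - mu_n) ** 2) / (ce.shape[0] * ce.shape[1])
    s_b = np.sum((mu_n - mu) ** 2) / ce.shape[1]
    return s_w / (s_b + eps)

def fd_loss_total(saved_ce):
    """Aggregate FD loss: per-layer losses are summed with no cross-layer
    fusion or alignment, so each layer's gradient flows only through its
    own F2CE refinement."""
    return sum(fisher_ratio(ce) for ce in saved_ce)
```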
4. Hyperparameters, Training Integration, and Implementation
Key implementation details and hyperparameters are:
- $\lambda_2$: weighting for $\mathcal{L}_{FD}$, typically set to $0.1$.
- $\epsilon$: numerical stability constant in the denominator of the Fisher ratio.
- Batch/embedding parameters $B$, $N$, $D$: determine capacity (set by model design).
Pseudocode integration is as follows:
```
for mini_batch in dataloader:
    images, labels = mini_batch
    F = F_0                                # encoder output
    CE = CE_0                              # initial class prototypes
    saved_CE = []
    for l in 1...L:
        H = compute_heatmaps(F, CE)
        CE = F2CE_update(F, CE, H)         # feature-to-class refinement
        F = CE2F_enhance(F, CE, H)         # class-to-feature enhancement
        saved_CE.append(CE)
    logits = segmentation_head(F, CE)
    L_main = CE_loss(logits, labels) + Dice_loss(logits, labels)
    L_HM = sum(CE_loss(upsample(H_l), labels) + Dice_loss(upsample(H_l), labels)
               for H_l in heatmaps_from_each_layer)
    L_FD = 0
    for CE_l in saved_CE:
        mu_n = mean_over_batch(CE_l)       # per-class centers, shape (N, D)
        mu = mean_over_classes(mu_n)       # overall center, shape (D,)
        S_w = sum_over_b_n((CE_l[b, n] - mu_n[n])**2) / (B * N)
        S_b = sum_over_n((mu_n[n] - mu)**2) / N
        L_FD += S_w / (S_b + epsilon)
    L_total = L_main + lambda_1 * L_HM + lambda_2 * L_FD
    L_total.backward()
    optimizer.step()
```
5. Empirical Contributions and Comparative Analysis
The FD loss demonstrates additive and complementary gains across segmentation benchmarks. Experimental ablation on LoveDA reports:
| Configuration | mIoU (%) |
|---|---|
| Baseline (cross-attn+) | 54.20 |
| +HBIS (no FD, no HM) | 55.15 |
| +FD only (no HBIS) | 55.21 |
| +HBIS+HM (no FD) | 55.36 |
| +HBIS+HM+FD | 55.49 |
FD loss alone yields a +1.01 mIoU gain over the baseline; combined with HBIS and hierarchical heatmap supervision, the maximal segmentation accuracy (55.49 mIoU) is achieved. This suggests FD loss improves both intra-class compactness and inter-class separation, leveraging multi-layer adaptation for robust semantic representations.
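The incremental contributions can be read directly off the ablation table above; a small illustrative computation (the dictionary keys are ad hoc labels, not the paper's configuration names):

```python
# mIoU (%) values from the LoveDA ablation table.
miou = {
    "baseline":   54.20,  # cross-attention baseline
    "fd_only":    55.21,  # +FD only (no HBIS)
    "hbis_hm":    55.36,  # +HBIS+HM (no FD)
    "hbis_hm_fd": 55.49,  # full configuration
}

# Gain from FD loss alone, and from adding FD on top of HBIS + heatmap supervision.
fd_alone_gain = round(miou["fd_only"] - miou["baseline"], 2)
fd_on_top_gain = round(miou["hbis_hm_fd"] - miou["hbis_hm"], 2)
```

The second delta shows that FD loss still contributes measurably even when the other components are already in place, i.e., its effect is complementary rather than redundant.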
6. Context and Implications for Semantic Segmentation
The cross-layer class embedding Fisher Discriminative Loss serves as a discriminative regularizer for dynamic prototype refinement, particularly relevant in domains with large intra-class variability and high inter-class similarity, such as remote sensing. It integrates seamlessly into hierarchical decoders, encourages interpretable heatmap generation, and complements the synergistic feature-class interactions instantiated in BiCoR-Seg. A plausible implication is the broader applicability of FD-type objectives to other tasks requiring hierarchical embedding discrimination. The observed improvements in mIoU and prototype coherence reinforce its efficacy within the proposed segmentation paradigm (Shi et al., 23 Dec 2025).