FD Loss for Cross-layer Embedding
- The paper demonstrates FD loss's role in compacting intra-class embeddings and increasing inter-class separation across decoder layers.
- It employs HBIS modules and rigorous scatter computations to iteratively refine class prototypes, yielding a measurable ~0.2% mIoU gain.
- The approach showcases FD loss’s potential to improve semantic segmentation in remote sensing by aligning multi-layer embedding representations.
The cross-layer class embedding Fisher Discriminative Loss (FD loss) is a regularization objective introduced within the BiCoR-Seg architecture for high-resolution remote sensing image semantic segmentation, directly targeting issues of high inter-class similarity and large intra-class variability. By enforcing compactness of embeddings within each class and maximizing separation between distinct classes, FD loss promotes the emergence of discriminative class prototypes at multiple decoder depths. Its aggregation across all hierarchical layers underpins measurable gains in segmentation performance and interpretability for BiCoR-Seg (Shi et al., 23 Dec 2025).
1. Formal Definition and Mathematical Formulation
FD loss operationalizes discriminative supervision on batches of class embeddings at each decoder layer $l$, designated $CE^{(l)} \in \mathbb{R}^{B \times N \times D}$, where $B$ is the batch size, $N$ the number of classes, and $D$ the embedding dimensionality. Key components at each layer include:
- Class center: $\mu_n^{(l)} = \frac{1}{B} \sum_{b=1}^{B} CE_{b,n}^{(l)}$
- Overall center: $\mu^{(l)} = \frac{1}{N} \sum_{n=1}^{N} \mu_n^{(l)}$
- Within-class scatter: $S_w^{(l)} = \frac{1}{BN} \sum_{b=1}^{B} \sum_{n=1}^{N} \left\| CE_{b,n}^{(l)} - \mu_n^{(l)} \right\|_2^2$
- Between-class scatter: $S_b^{(l)} = \frac{1}{N} \sum_{n=1}^{N} \left\| \mu_n^{(l)} - \mu^{(l)} \right\|_2^2$
- Layerwise Fisher ratio loss: $\mathcal{L}_{FD}^{(l)} = \frac{S_w^{(l)}}{S_b^{(l)} + \epsilon}$ (with $\epsilon > 0$ for stability)
- Aggregated across all $L$ decoder layers: $\mathcal{L}_{FD} = \sum_{l=1}^{L} \mathcal{L}_{FD}^{(l)}$

Finally, the total loss for training is constructed as:

$$\mathcal{L}_{total} = \mathcal{L}_{main} + \lambda_1 \mathcal{L}_{HM} + \lambda_2 \mathcal{L}_{FD}$$

with recommended hyperparameters $\lambda_1$ and $\lambda_2$, the latter set to $0.1$ (Shi et al., 23 Dec 2025).
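Under these definitions, the per-layer Fisher ratio can be sketched in a few lines of NumPy (the function name and the default $\epsilon$ value here are illustrative, not taken from the paper):

```python
import numpy as np

def fd_loss_layer(ce, eps=1e-6):
    """Per-layer Fisher Discriminative loss (illustrative sketch).

    ce : array of shape (B, N, D) -- B images, N classes, D-dim prototypes,
         following the paper's notation. eps is an assumed default.
    """
    mu_n = ce.mean(axis=0)                 # (N, D): per-class centers
    mu = mu_n.mean(axis=0)                 # (D,):  overall center
    # Within-class scatter: spread of embeddings around their class centers.
    s_w = np.sum((ce - mu_n) ** 2) / (ce.shape[0] * ce.shape[1])
    # Between-class scatter: spread of class centers around the overall center.
    s_b = np.sum((mu_n - mu) ** 2) / ce.shape[1]
    return s_w / (s_b + eps)
```

Minimizing this ratio simultaneously shrinks $S_w^{(l)}$ (compactness) and rewards larger $S_b^{(l)}$ (separation), which is the intended dual effect.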
2. Class Embedding Construction Across Decoder Layers
Class embeddings are iteratively refined at each hierarchical decoder depth via the cascaded Heatmap-driven Bidirectional Information Synergy (HBIS) modules. The process initiates with an initial set $CE^{(0)}$ derived from a post-encoder MLP. At each HBIS layer $l$, the "feature-to-class" (F2CE) pathway consists of:
- Linearly projecting the previous class prototypes $CE^{(l-1)}$ to query vectors.
- Generating class heatmaps by applying a sigmoid to the dot product of pixel features and the projected class prototypes.
- Selecting the top-K pixels per class and computing a weighted sum (projected back to dimension $D$) to extract per-class context vectors.
- Fusing the context vectors with $CE^{(l-1)}$ via a learned gating function to form the updated $CE^{(l)}$.

After F2CE refinement, each $CE^{(l)}$ is used for FD loss computation, embodying image-adapted semantic prototypes.
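A toy NumPy sketch of one F2CE step can make the pathway concrete. This is a simplification under assumed shapes: a single image, plain matrices standing in for learned projection layers, and a scalar gate per class; all names here are hypothetical rather than the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def f2ce_step(feat, ce_prev, W_q, W_c, top_k=16):
    """One illustrative F2CE update (hypothetical, simplified).

    feat    : (P, D) pixel features for one image (P pixels)
    ce_prev : (N, D) class prototypes from the previous layer
    W_q,W_c : (D, D) stand-ins for learned projection layers
    """
    q = ce_prev @ W_q                        # project prototypes to queries
    heat = sigmoid(feat @ q.T)               # (P, N) class heatmaps
    ce_new = np.empty_like(ce_prev)
    for n in range(ce_prev.shape[0]):
        idx = np.argsort(heat[:, n])[-top_k:]         # top-K pixels for class n
        w = heat[idx, n] / heat[idx, n].sum()         # normalized heatmap weights
        context = (w[:, None] * feat[idx]).sum(0) @ W_c  # weighted sum, projected
        gate = sigmoid(np.dot(context, ce_prev[n]))   # simplified scalar gate
        ce_new[n] = gate * context + (1.0 - gate) * ce_prev[n]
    return ce_new, heat
```

The gating convexly blends image-specific context into the running prototype, which is one plausible reading of "learned gating function"; the paper's actual gate may be vector-valued.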
3. Layerwise Supervision and Aggregation Mechanism
FD loss operates independently at each layer, without inter-layer fusion or alignment, and aggregates the discriminative effect by summation. At every depth, the scatter computations are centered using the per-class means $\mu_n^{(l)}$ and the overall mean $\mu^{(l)}$. The gradient of FD loss propagates through the F2CE modules, ultimately influencing the feature extraction and heatmap branches.
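Because layers are supervised independently, the aggregation reduces to a plain sum of per-layer ratios. A minimal sketch, with illustrative helper names:

```python
import numpy as np

def fisher_ratio(ce, eps=1e-6):
    # Per-layer within/between-class scatter ratio (eps value assumed).
    mu_n = ce.mean(axis=0)                 # (N, D) per-class centers
    mu = mu_n.mean(axis=0)                 # (D,)  overall center
    s_w = np.sum((ce - mu_n) ** 2) / (ce.shape[0] * ce.shape[1])
    s_b = np.sum((mu_n - mu) ** 2) / ce.shape[1]
    return s_w / (s_b + eps)

def fd_loss_total(saved_ce):
    """Aggregate FD loss: per-layer losses are summed with no cross-layer
    fusion or alignment, so each layer's gradient flows only through its
    own F2CE refinement."""
    return sum(fisher_ratio(ce) for ce in saved_ce)
```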
4. Hyperparameters, Training Integration, and Implementation
Key implementation details and hyperparameters are:
- $\lambda_2$: weighting for $\mathcal{L}_{FD}$, typically set to $0.1$.
- $\epsilon$: numerical stability constant in the denominator of the Fisher ratio.
- Batch/embedding parameters $B$, $N$, $D$: determine capacity (set by model design).
Pseudocode integration is as follows:
```
for mini_batch in dataloader:
    images, labels = mini_batch
    F = F_0                                # encoder output
    CE = CE_0                              # initial class prototypes
    saved_CE = []
    for l in 1...L:
        H = compute_heatmaps(F, CE)
        CE = F2CE_update(F, CE, H)         # feature-to-class refinement
        F = CE2F_enhance(F, CE, H)         # class-to-feature enhancement
        saved_CE.append(CE)
    logits = segmentation_head(F, CE)
    L_main = CE_loss(logits, labels) + Dice_loss(logits, labels)
    L_HM = sum(CE_loss(upsample(H_l), labels) + Dice_loss(upsample(H_l), labels)
               for H_l in heatmaps_from_each_layer)
    L_FD = 0
    for CE_l in saved_CE:
        mu_n = mean_over_batch(CE_l)       # per-class centers, shape (N, D)
        mu = mean_over_classes(mu_n)       # overall center, shape (D,)
        S_w = sum_over_b_n((CE_l[b, n] - mu_n[n])**2) / (B * N)
        S_b = sum_over_n((mu_n[n] - mu)**2) / N
        L_FD += S_w / (S_b + epsilon)
    L_total = L_main + lambda_1 * L_HM + lambda_2 * L_FD
    L_total.backward()
    optimizer.step()
```
5. Empirical Contributions and Comparative Analysis
The FD loss demonstrates additive and complementary gains across segmentation benchmarks. Experimental ablation on LoveDA reports:
| Configuration | mIoU (%) |
|---|---|
| Baseline (cross-attn+) | 54.20 |
| +HBIS (no FD, no HM) | 55.15 |
| +FD only (no HBIS) | 55.21 |
| +HBIS+HM (no FD) | 55.36 |
| +HBIS+HM+FD | 55.49 |
FD loss alone yields a +1.01 mIoU gain over the baseline; combined with HBIS and hierarchical heatmap supervision, the maximal segmentation accuracy (55.49 mIoU) is achieved. This suggests FD loss improves both intra-class compactness and inter-class separation, leveraging multi-layer adaptation for robust semantic representations.
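The incremental contributions can be read directly off the ablation table above; a small illustrative computation (the dictionary keys are ad hoc labels, not the paper's configuration names):

```python
# mIoU (%) values from the LoveDA ablation table.
miou = {
    "baseline":   54.20,  # cross-attention baseline
    "fd_only":    55.21,  # +FD only (no HBIS)
    "hbis_hm":    55.36,  # +HBIS+HM (no FD)
    "hbis_hm_fd": 55.49,  # full configuration
}

# Gain from FD loss alone, and from adding FD on top of HBIS + heatmap supervision.
fd_alone_gain = round(miou["fd_only"] - miou["baseline"], 2)
fd_on_top_gain = round(miou["hbis_hm_fd"] - miou["hbis_hm"], 2)
```

The second delta shows that FD loss still contributes measurably even when the other components are already in place, i.e., its effect is complementary rather than redundant.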
6. Context and Implications for Semantic Segmentation
The cross-layer class embedding Fisher Discriminative Loss serves as a discriminative regularizer for dynamic prototype refinement, particularly relevant in domains with large intra-class variability and high inter-class similarity, such as remote sensing. It integrates seamlessly into hierarchical decoders, encourages interpretable heatmap generation, and complements the synergistic feature-class interactions instantiated in BiCoR-Seg. A plausible implication is the broader applicability of FD-type objectives to other tasks requiring hierarchical embedding discrimination. The observed improvements in mIoU and prototype coherence reinforce its efficacy within the proposed segmentation paradigm (Shi et al., 23 Dec 2025).