HiSCE: Hierarchy-Sibling Smoothed Cross-Entropy
- HiSCE is a hierarchy-aware loss that applies sibling-sensitive smoothing to promote taxonomic consistency in vision-language models.
- It integrates seamlessly with fine-tuning frameworks using low-rank adaptations, enabling robust predictions across multi-level classifications.
- Empirical results show improved Full-Path Accuracy and reduced Tree-based Inconsistency Error on benchmarks like CUB-200-2011 and FGVC-Aircraft.
Hierarchy-Sibling Smoothed Cross-Entropy (HiSCE) is a hierarchy-aware objective formulated for fine-tuning vision-language models (VLMs) under structured taxonomies, particularly in scenarios where class labels possess multi-level granularity such as order, family, and species. The HiSCE loss, proposed by the authors of “Hierarchy-Aware Fine-Tuning of Vision-LLMs” (Li et al., 25 Dec 2025), adds sibling-sensitive smoothing to cross-entropy, encouraging robust and taxonomically consistent predictions while remaining compatible with efficient parameter-adaptation schemes.
1. Formal Construction and Notation
Let $L$ denote the depth of the taxonomy and $(y^{(1)}, \ldots, y^{(L)})$ the path of ground-truth labels per example, where $y^{(\ell)}$ indicates the correct class at taxonomy level $\ell$. For each class $c$ at level $\ell$, its sibling set is defined as $\mathrm{Sib}(c) = \{\, c' \neq c : \mathrm{parent}(c') = \mathrm{parent}(c) \,\}$. Model predictions at level $\ell$ are softmax distributions $p^{(\ell)}$ over class scores (image–text cosine similarities in multimodal VLMs).
HiSCE operationalizes a categorical smoothing matrix $S^{(\ell)}$ for each level $\ell$:

$$S^{(\ell)}_{y,c} = \begin{cases} 1 - \varepsilon_\ell & c = y,\\ \varepsilon_\ell / |\mathrm{Sib}(y)| & c \in \mathrm{Sib}(y),\\ 0 & \text{otherwise.} \end{cases}$$

Given a label $y^{(\ell)}$, the smoothed target distribution $\tilde{q}^{(\ell)} = S^{(\ell)}_{y^{(\ell)},\,\cdot}$ is used for cross-entropy at level $\ell$:

$$\mathcal{L}^{(\ell)}_{\mathrm{HiSCE}} = -\sum_{c} \tilde{q}^{(\ell)}_{c} \log p^{(\ell)}_{c},$$

and the total HiSCE loss is the sum over taxonomy levels:

$$\mathcal{L}_{\mathrm{HiSCE}} = \sum_{\ell=1}^{L} \mathcal{L}^{(\ell)}_{\mathrm{HiSCE}}.$$

Normalization is explicit: $\sum_{c} \tilde{q}^{(\ell)}_{c} = 1$ per distribution.
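As a concrete illustration, the following PyTorch sketch implements the per-level smoothing and the summed loss. The tensor layout, the `sibling_mask` encoding, and the function names are illustrative assumptions, not the authors’ reference implementation.

```python
import torch
import torch.nn.functional as F

def hisce_level_loss(logits, targets, sibling_mask, eps=0.1):
    """Sibling-smoothed cross-entropy for one taxonomy level (a sketch).

    logits:       (B, C) class scores at this level, e.g. image-text
                  cosine similarities scaled by a temperature.
    targets:      (B,)   ground-truth class indices at this level.
    sibling_mask: (C, C) boolean; sibling_mask[y, c] is True iff c is a
                  sibling of y (same parent, c != y), precomputed from
                  the taxonomy.
    eps:          smoothing mass redistributed uniformly over siblings.
    """
    sib = sibling_mask[targets].float()                  # (B, C) sibling rows
    n_sib = sib.sum(dim=1, keepdim=True).clamp(min=1.0)  # sibling counts
    q = sib * (eps / n_sib)                              # eps spread over siblings
    q.scatter_(1, targets.unsqueeze(1), 1.0 - eps)       # 1 - eps on true class
    # If a class has no siblings, the eps mass is unassigned; explicit
    # renormalization restores a valid distribution, matching the
    # per-distribution normalization stated above.
    q = q / q.sum(dim=1, keepdim=True)
    return -(q * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

def hisce_loss(level_logits, level_targets, level_masks, eps_per_level):
    """Total HiSCE loss: sum of sibling-smoothed CE over taxonomy levels."""
    return sum(
        hisce_level_loss(lg, tg, mk, eps)
        for lg, tg, mk, eps in zip(level_logits, level_targets,
                                   level_masks, eps_per_level)
    )
```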
2. Motivation and Rationale
Standard cross-entropy penalizes any deviation from the ground-truth class equally, driving the model to concentrate probability mass exclusively on that class. In hierarchical classification, this leads to excessive overconfidence, particularly within fine-grained sibling categories, and frequently results in taxonomically inconsistent predictions such as selecting a leaf whose parent node is mismatched. By redistributing a small probability mass uniformly to siblings, HiSCE softens intra-level decision boundaries (“horizontal” smoothing), operationalizing the semantic and visual proximity of siblings. The effect is to penalize “close” sibling misclassifications less severely than predictions outside the correct branch, directly reducing tree-structure inconsistency metrics while also enabling more robust uncertainty modeling for ambiguous visual data.
3. Integration with Hierarchy-Aware Fine-Tuning Frameworks
HiSCE is integrated into a multi-term fine-tuning objective, working synergistically with both standard cross-entropy at the leaf level ($\mathcal{L}_{\mathrm{CE}}$) and a Tree-Path KL divergence ($\mathcal{L}_{\mathrm{TP\text{-}KL}}$), which enforces pathwise vertical coherence:

$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda_{\mathrm{HiSCE}}\,\mathcal{L}_{\mathrm{HiSCE}} + \lambda_{\mathrm{TP\text{-}KL}}\,\mathcal{L}_{\mathrm{TP\text{-}KL}}.$$
All computations occur in the shared embedding space of frozen CLIP backbones adapted with LoRA modules. Only low-rank LoRA matrices (4.4M parameters) and layer norm weights are updated, achieving computational efficiency and low resource requirements for adaptation. HiSCE’s design ensures compatibility with lightweight parameter update protocols prevalent in scalable model fine-tuning.
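A minimal sketch of this wiring is shown below, assuming the Hugging Face `peft` and `transformers` APIs; the LoRA rank, target modules, and the `tree_path_kl` helper are illustrative stand-ins, and `hisce_loss` is the sketch from Section 1.

```python
import torch.nn.functional as F
from peft import LoraConfig, get_peft_model
from transformers import CLIPModel

# Wrap a CLIP backbone with LoRA adapters; only the low-rank matrices
# (and, per the paper, layer-norm weights) receive gradient updates.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
lora_cfg = LoraConfig(r=8, lora_alpha=16,
                      target_modules=["q_proj", "v_proj"])  # illustrative choice
model = get_peft_model(clip, lora_cfg)

lambda_hisce, lambda_tpkl = 1.0, 1.0  # 1:1 ratio per the reported ablations

def total_loss(level_logits, level_targets, level_masks, eps_per_level):
    # Standard cross-entropy at the leaf (finest) level.
    leaf_ce = F.cross_entropy(level_logits[-1], level_targets[-1])
    # Horizontal smoothing term (hisce_loss from the sketch above).
    hisce = hisce_loss(level_logits, level_targets, level_masks, eps_per_level)
    # Vertical coherence term; tree_path_kl is a hypothetical helper
    # standing in for the paper's Tree-Path KL divergence.
    tpkl = tree_path_kl(level_logits)
    return leaf_ce + lambda_hisce * hisce + lambda_tpkl * tpkl
```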
4. Empirical Evaluation and Observed Gains
Empirical studies across four benchmark datasets—CUB-200-2011, FGVC-Aircraft, Butterfly-200, and ChestX-ray14—demonstrate that replacing or augmenting standard cross-entropy with HiSCE yields substantial improvements in Full-Path Accuracy (FPA) and reductions in Tree-based Inconsistency Error (TICE). For example, on CUB-200-2011, FPA improves from 50.2 to 63.1 and TICE decreases from 21.9 to 10.8. Comparable gains are observed across all reported benchmarks. Joint ablation with the TP-KL loss shows that the combination offers the best overall hierarchy-aware performance (Li et al., 25 Dec 2025), suggesting that both vertical and horizontal consistency are essential for taxonomy adaptation.
| Dataset | Metric | CE Baseline | HiSCE | CE+HiSCE+TP-KL (best) |
|---|---|---|---|---|
| CUB-200-2011 | FPA | 50.2 | 63.1 | [see article Table 9] |
| CUB-200-2011 | TICE | 21.9 | 10.8 | [see article Table 9] |
| FGVC-Aircraft | FPA | 38.3 | 57.0 | [see article Table 9] |
| FGVC-Aircraft | TICE | 17.9 | 11.7 | [see article Table 9] |
A plausible implication is that HiSCE directly mitigates taxonomic path inconsistencies associated with naïve cross-entropy objectives.
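The two metrics can be operationalized as in the sketch below. These definitions are assumed from the metric names—every level of the predicted path correct for FPA, and any parent–child violation along the predicted path for TICE—and may differ in detail from the article’s exact formulas.

```python
import numpy as np

def full_path_accuracy(pred_paths, true_paths):
    """FPA (assumed definition): fraction of examples whose predicted
    labels are correct at *every* taxonomy level simultaneously."""
    pred = np.asarray(pred_paths)  # (N, L) predicted class ids per level
    true = np.asarray(true_paths)  # (N, L) ground-truth class ids
    return float((pred == true).all(axis=1).mean())

def tree_inconsistency_error(pred_paths, parent_of):
    """TICE (assumed operationalization): fraction of examples whose
    predicted path violates the taxonomy, i.e. the predicted class at
    level l+1 is not a child of the predicted class at level l.
    parent_of[l] maps a level-(l+1) class id to its level-l parent id."""
    pred = np.asarray(pred_paths)
    N, L = pred.shape
    bad = np.zeros(N, dtype=bool)
    for l in range(L - 1):
        bad |= parent_of[l][pred[:, l + 1]] != pred[:, l]
    return float(bad.mean())
```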
5. Hyperparameterization and Practical Considerations
HiSCE requires selection of a smoothing factor $\varepsilon_\ell$ per taxonomy level, kept small so that most probability mass remains on the ground-truth class. The loss weight $\lambda_{\mathrm{HiSCE}}$ should be set to 1.0 or tuned on a small validation split for optimal performance. Balancing with $\lambda_{\mathrm{TP\text{-}KL}}$ (the TP-KL weight) can be conducted via a few Optuna trials; empirical observations indicate that a 1:1 ratio facilitates stable improvements across datasets. The methodology is robust to typical hyperparameter choices, suggesting broad applicability in hierarchical domains.
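A minimal Optuna sketch of such a tuning loop follows; the search ranges and the `fine_tune_and_eval` wrapper are hypothetical, not the paper’s exact grid.

```python
import optuna

def objective(trial):
    # Hyperparameters searched per the practical guidance above; the
    # ranges are illustrative assumptions.
    eps = trial.suggest_float("eps", 0.05, 0.2)
    lambda_hisce = trial.suggest_float("lambda_hisce", 0.5, 2.0)
    lambda_tpkl = trial.suggest_float("lambda_tpkl", 0.5, 2.0)
    # fine_tune_and_eval is a hypothetical wrapper: it trains the LoRA
    # adapters with the combined objective and returns Full-Path
    # Accuracy on a small validation split.
    return fine_tune_and_eval(eps, lambda_hisce, lambda_tpkl)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)  # "a few Optuna trials"
print(study.best_params)
```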
6. Taxonomy-Aware Label Smoothing: Implications and Significance
HiSCE fundamentally generalizes conventional label smoothing by structurally redistributing probability mass only among sibling classes at each level, instead of uniformly across all classes. This taxonomy-aware formulation ensures the fine-tuning process respects the target taxonomy, yielding more robust and consistent hierarchical predictions while pairing naturally with vertical coherence objectives (e.g., TP-KL). The approach is compatible with large-scale vision-language model fine-tuning using LoRA, offering substantial scalability and efficiency. These characteristics position HiSCE as a practical choice for structured prediction in hierarchical classification tasks where path-consistent output and efficient adaptation are critical requirements.
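A toy numeric comparison makes the contrast concrete: six classes in two sibling groups, with an illustrative $\varepsilon = 0.1$ (both the class layout and the smoothing value are chosen for exposition).

```python
import numpy as np

# Six classes at one level: classes 0-2 share parent A, classes 3-5 parent B.
C, y, eps = 6, 1, 0.1
siblings_of_y = [0, 2]  # same parent as class 1, excluding itself

# Conventional label smoothing: eps spread over *all* other classes.
uniform = np.full(C, eps / (C - 1))
uniform[y] = 1 - eps

# HiSCE: eps spread only over the true class's siblings.
hisce = np.zeros(C)
hisce[siblings_of_y] = eps / len(siblings_of_y)
hisce[y] = 1 - eps

print(uniform)  # [0.02 0.9  0.02 0.02 0.02 0.02]
print(hisce)    # [0.05 0.9  0.05 0.   0.   0.  ]
```

Under the uniform scheme, smoothing mass leaks into the unrelated branch B; HiSCE confines it to the true class’s siblings, which is precisely what softens only the intra-level (horizontal) decision boundaries.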