HyperPath: Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis (2506.16398v3)

Published 19 Jun 2025 in cs.CV

Abstract: Pathology is essential for cancer diagnosis, with multiple instance learning (MIL) widely used for whole slide image (WSI) analysis. WSIs exhibit a natural hierarchy -- patches, regions, and slides -- with distinct semantic associations. While some methods attempt to leverage this hierarchy for improved representation, they predominantly rely on Euclidean embeddings, which struggle to fully capture semantic hierarchies. To address this limitation, we propose HyperPath, a novel method that integrates knowledge from textual descriptions to guide the modeling of semantic hierarchies of WSIs in hyperbolic space, thereby enhancing WSI classification. Our approach adapts both visual and textual features extracted by pathology vision-language foundation models to the hyperbolic space. We design an Angular Modality Alignment Loss to ensure robust cross-modal alignment, while a Semantic Hierarchy Consistency Loss further refines feature hierarchies through entailment and contradiction relationships and thus enhances semantic coherence. The classification is performed with geodesic distance, which measures the similarity between entities in the hyperbolic semantic hierarchy. This eliminates the need for linear classifiers and enables a geometry-aware approach to WSI analysis. Extensive experiments show that our method achieves superior performance across tasks compared to existing methods, highlighting the potential of hyperbolic embeddings for WSI analysis.

Summary

The paper introduces a novel method that integrates hyperbolic geometry with vision-language models to capture multi-level semantic hierarchies in whole slide images.
It employs state-of-the-art feature extraction, hyperbolic embedding via the Lorentz model, and attention-based hierarchical aggregation to retain both local details and global context.
Experimental results show AUC and F1 improvements over existing MIL methods, and latent space visualizations reveal an interpretable hierarchy of text, slide, region, and patch embeddings.

Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis

The paper "HyperPath: Knowledge-Guided Hyperbolic Semantic Hierarchy Modeling for WSI Analysis" (2506.16398) introduces a method for computational pathology that models whole slide images (WSIs) hierarchically, using hyperbolic geometry and semantic knowledge transferred from pathology vision-language models. HyperPath is motivated by the inherent multi-level structure of WSIs (patches, regions, slides) and by the limitations of Euclidean embeddings in capturing semantic hierarchies.

Methodological Contributions

HyperPath systematically integrates visual and textual modalities into a hyperbolic space, aligning features across hierarchical levels and semantics. The key components are:

Vision-Language Feature Extraction: Utilizing state-of-the-art foundation models (such as CONCH), both visual (from image patches/regions) and textual (from pathological concept descriptions) features are extracted.
Hyperbolic Embedding via the Lorentz Model: Both modalities are mapped via trainable adaptors to a common tangent space and then into the Lorentz model of hyperbolic space. Because volume in hyperbolic space grows exponentially with radius, it can accommodate tree-like hierarchies, which helps preserve semantic and hierarchical relationships.
Hierarchical Aggregation: Features are aggregated from patch to region to slide level using an attention-based mechanism before being embedded in hyperbolic space, allowing for both local detail and global context to be retained.
Angular Modality Alignment Loss (AMA): To address the modality gap in latent space, AMA aligns embeddings by angular distance rather than geodesic distance in hyperbolic space, so that visual and textual features of the same class are drawn together despite their different distributions across the hierarchy. Its contrastive formulation pulls positive pairs together and pushes negative pairs apart, which relaxes the alignment constraint and mitigates scale-induced bias between modalities.
Semantic Hierarchy Consistency Loss (SHC): To explicitly model entailment and contradiction relationships, entailment cones are defined in hyperbolic space. General concepts (close to the origin with wide cones) entail more specific ones. Entailment and contradiction losses enforce that the hierarchical dependencies (intra- and inter-modal) are respected, ensuring that the outputs not only align but are structurally consistent.
Classifier-Free Geodesic Decision: Final slide classification is performed directly in hyperbolic space via geodesic distance to textual class embeddings, dispensing with a linear classifier. This enforces geometry-aware, semantically structured decision boundaries.
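The Lorentz embedding, entailment cone, and geodesic decision steps described above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the curvature parameter c, the cone constant k = 0.1, the toy two-dimensional vectors, and the class labels "class_a" and "class_b" are all assumptions made for illustration. The cone formulas follow a formulation common in the hyperbolic vision-language literature, which the summary does not confirm HyperPath uses unchanged. The trainable adaptors, attention-based aggregation, and the AMA loss are omitted.

```python
import math

def exp_map_origin(v, c=1.0):
    """Lift a tangent vector at the origin onto the hyperboloid of curvature -c."""
    norm = math.sqrt(sum(a * a for a in v))
    rc = math.sqrt(c)
    # sinh(t)/t tends to 1 as t -> 0, so the zero vector maps to (1/sqrt(c), 0, ..., 0).
    scale = math.sinh(rc * norm) / (rc * norm) if norm > 1e-12 else 1.0
    space = [scale * a for a in v]
    time = math.sqrt(1.0 / c + sum(a * a for a in space))
    return [time] + space

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + <x_space, y_space>."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def geodesic_distance(x, y, c=1.0):
    # Clamp at 1 because rounding can push the arccosh argument slightly below it.
    return math.acosh(max(1.0, -c * lorentz_inner(x, y))) / math.sqrt(c)

def half_aperture(x, c=1.0, k=0.1):
    """Cone half-aperture: wide near the origin, narrow far away (k is assumed)."""
    space_norm = max(math.sqrt(sum(a * a for a in x[1:])), 1e-8)
    return math.asin(min(1.0, 2.0 * k / (math.sqrt(c) * space_norm)))

def exterior_angle(x, y, c=1.0):
    """Angle at x between the cone axis (the ray from the origin) and y."""
    ci = c * lorentz_inner(x, y)
    space_norm = max(math.sqrt(sum(a * a for a in x[1:])), 1e-8)
    denom = space_norm * math.sqrt(max(ci * ci - 1.0, 1e-12))
    ratio = (y[0] + x[0] * ci) / denom
    return math.acos(max(-1.0, min(1.0, ratio)))

def entailment_penalty(general, specific, c=1.0):
    """Zero when `specific` lies inside the cone of `general`, positive otherwise."""
    return max(0.0, exterior_angle(general, specific, c) - half_aperture(general, c))

def classify(slide_vec, class_text_vecs, c=1.0):
    """Pick the class whose text embedding is geodesically nearest to the slide."""
    slide = exp_map_origin(slide_vec, c)
    dists = {
        name: geodesic_distance(slide, exp_map_origin(vec, c), c)
        for name, vec in class_text_vecs.items()
    }
    return min(dists, key=dists.get), dists

# Toy usage with made-up 2-D tangent vectors (hypothetical values).
label, dists = classify([0.9, 0.1], {"class_a": [1.0, 0.0], "class_b": [0.0, 1.0]})
print(label, dists)
```

In this sketch the prediction needs no learned classification head: the class text embeddings act as prototypes and the decision rule is the arg-min over geodesic distances. The entailment penalty is zero when the more specific embedding lies on the same ray as the general one and farther from the origin, and grows as it moves outside the cone.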

Experimental Results

The method is benchmarked on four TCGA tasks: BRCA/NSCLC subtyping and molecular status (HER2, EGFR) prediction. Key findings include:

Strong Quantitative Improvements: In out-of-distribution (OOD) settings, HyperPath surpasses prior approaches (ABMIL, CLAM, TransMIL, DTFD-MIL, ACMIL, HIPT, HIT) by margins of 1.9–9.2% in AUC and 2.6–8.8% in F1 score. In in-domain (IND) settings, improvements of 0.7–5.1% (AUC) and 3.1–12.2% (F1) are observed. Together these results indicate robustness to distribution shift and data source variation.
Hierarchical Feature Structure: Visualizations of the hyperbolic embeddings using CO-SNE and HoroPCA reveal a clear hierarchy: textual class embeddings cluster near the origin, with slide, region, and patch embeddings arranged around them in that order. This supports the model's ability to learn interpretable, hierarchical latent spaces.
Ablation Studies: Removing either the AMA or the SHC loss degrades performance. The variant trained with SHC alone (without AMA) performs worst because visual and textual features are not properly aligned, highlighting the necessity of both cross-modal alignment and explicit hierarchy modeling.

Theoretical and Practical Implications

HyperPath provides evidence that hyperbolic space, guided by knowledge from large vision-language models, is effective for representing and reasoning over semantically hierarchical data such as WSIs. The geometry supports both scalable hierarchy capture and more faithful alignment of multimodal features.

The practical implications are significant:

Open-set Generalization: Geometry-driven alignment reduces the risk of overfitting and mode collapse, improving transferability and reliability of WSI classifiers, especially on OOD samples.
Classifier Independence: Dispensing with a post-hoc linear classifier simplifies deployment and may reduce overfitting to dataset-specific idiosyncrasies.
Interpretability: The structured latent space provides interpretable embeddings potentially useful for downstream analysis, visualization, and error diagnosis.

Limitations and Future Directions

Several avenues remain for exploration:

Scalability: While the Lorentz model offers numerical stability, computational demands grow with the number of hierarchical levels and the volume of patches, requiring efficient batching and potentially approximation techniques for larger scale deployments.
Foundation Model Dependence: Performance relies on the quality and domain specificity of the vision-language models employed, and further work could explore adaptation strategies for underspecified or low-data domains.
Beyond Pathology: HyperPath's general methodology is applicable to other domains featuring inherent hierarchies, such as remote sensing, document analysis, or social network modeling.

Future work may focus on dynamic hierarchy learning, active and continual learning in hyperbolic space, and the integration of more nuanced domain knowledge sources to further enhance generalization and interpretability.