BioHiCL: Hierarchical Multi-Label Contrastive Learning for Biomedical Retrieval with MeSH Labels

Published 17 Apr 2026 in cs.IR and cs.AI | (2604.15591v1)

Abstract: Effective biomedical information retrieval requires modeling domain semantics and hierarchical relationships among biomedical texts. Existing biomedical dense retrievers rely on coarse binary relevance signals, limiting their ability to capture semantic overlap. We propose BioHiCL (Biomedical Retrieval with Hierarchical Multi-Label Contrastive Learning), which leverages hierarchical MeSH annotations to provide structured supervision for multi-label contrastive learning. Our models, BioHiCL-Base (0.1B) and BioHiCL-Large (0.3B), achieve promising performance on biomedical retrieval, sentence similarity, and question answering tasks, while remaining computationally efficient for deployment.


Authors (3)

Summary

The paper introduces BioHiCL, a framework aligning embedding similarity with MeSH-based label hierarchies to enhance retrieval precision.
It combines a regression alignment objective and a hierarchy-aware contrastive loss with LoRA-based fine-tuning to learn robust, efficient biomedical text representations.
Empirical results demonstrate improved IR, sentence similarity, and QA performance with low latency and memory requirements.

Hierarchical Multi-Label Contrastive Learning for Biomedical Retrieval with MeSH

Motivation and Background

Biomedical information retrieval (IR) presents unique challenges due to its reliance on highly specialized terminology and the prevalence of complex, hierarchically structured semantic relationships between concepts. Most dense retrievers in the biomedical domain model semantic similarity through coarse binary relevance signals, in which a document is either “relevant” or not, and therefore ignore the graded or partially overlapping meanings that are prevalent in clinical and scientific texts.

Existing biomedical IR models frequently rely on domain-pretrained LLMs and contrastive learning, yet their supervision signals are too coarse to capture nuanced semantic overlap. MeSH (Medical Subject Headings) provides a curated, hierarchical ontology that encodes latent semantic relationships far beyond binary categorization. Leveraging the depth and structure of MeSH allows for more granular modeling of semantic similarity.

Figure 1: Examples of sentence pairs labeled neutral in MedNLI whose MeSH annotations share a common parent in the disease hierarchy, exposing relatedness that binary labels do not capture.

The BioHiCL Framework

Hierarchical Multi-Label Supervision

BioHiCL (Biomedical Retrieval with Hierarchical Multi-Label Contrastive Learning) proposes aligning embedding similarity directly with MeSH-based label similarity. For any pair of biomedical abstracts, the degree of similarity is computed by weighting MeSH label matches according to their depth in the ontology hierarchy, favoring more specific, deeper nodes to emphasize semantically precise overlap. This multi-label, hierarchy-aware structure provides a fine-grained supervisory signal for embedding learning, in contrast to traditional contrastive approaches using binary relevance.
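The summary does not spell out the exact formula. One plausible instantiation, offered as an assumption rather than the authors' definition, expands each abstract's MeSH set M(d) with its ancestors, weights every label by its depth, and compares the resulting vectors by cosine similarity:

```latex
\begin{aligned}
\hat{M}(d) &= M(d) \cup \bigcup_{\ell \in M(d)} \operatorname{anc}(\ell), \\
v_d[\ell] &= \begin{cases} w(\ell) = \operatorname{depth}(\ell) & \text{if } \ell \in \hat{M}(d), \\ 0 & \text{otherwise,} \end{cases} \\
s_{\mathrm{MeSH}}(a, b) &= \frac{\langle v_a, v_b \rangle}{\lVert v_a \rVert \, \lVert v_b \rVert}
= \frac{\sum_{\ell \in \hat{M}(a) \cap \hat{M}(b)} w(\ell)^2}
       {\sqrt{\sum_{\ell \in \hat{M}(a)} w(\ell)^2}\;\sqrt{\sum_{\ell \in \hat{M}(b)} w(\ell)^2}}
\end{aligned}
```

Here anc(ℓ) is the set of ancestors of ℓ in the MeSH tree and depth(ℓ) its level. Using the depth itself as the weight is an illustrative choice; any weight that grows with depth preserves the preference for specific, deeper matches, since a shared deep label contributes more to the numerator than a shared shallow one.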

Modeling and Objectives

Each biomedical sentence or abstract is mapped through a dense encoder (based on BGE) to an embedding space. MeSH annotations, supplemented by their ancestors in the hierarchy, are encoded as multi-hot vectors with depth-based weighting, resulting in a specificity-sensitive, high-dimensional label representation.
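A minimal sketch of the label side, assuming labels are given as MeSH tree numbers (dot-separated codes such as C14.280.647, whose prefixes are their ancestors). It builds sparse depth-weighted multi-hot vectors and compares them by cosine similarity. The function names and the choice of depth as the weight are hypothetical, not taken from the paper.

```python
import math
from collections.abc import Iterable


def ancestors(tree_number: str) -> list[str]:
    """Proper ancestors of a MeSH tree number, e.g. 'C14.280.647' -> ['C14', 'C14.280']."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def depth(tree_number: str) -> int:
    """Depth = number of dot-separated segments; top-level codes such as 'C14' have depth 1."""
    return len(tree_number.split("."))


def weighted_label_vector(tree_numbers: Iterable[str], expand: bool = True) -> dict[str, float]:
    """Sparse multi-hot vector mapping label -> depth weight (assumed weighting scheme).

    With expand=True every ancestor of each annotated label is added as well.
    """
    labels: set[str] = set()
    for tn in tree_numbers:
        labels.add(tn)
        if expand:
            labels.update(ancestors(tn))
    return {label: float(depth(label)) for label in labels}


def mesh_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse weighted label vectors."""
    dot = sum(w * b[label] for label, w in a.items() if label in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With two illustrative sibling codes, C14.280.647 and C14.280.434, the expanded vectors share C14 (weight 1) and C14.280 (weight 2), giving a similarity of 5/14 ≈ 0.357; with `expand=False` the same pair scores 0.0. This is the kind of relatedness that ancestor expansion recovers and that a binary label would miss.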

Two main objectives are employed:

Regression Alignment: A mean squared error loss aligns embedding similarity (cosine similarity in the learned space) with MeSH-label similarity. Only pairs with meaningful MeSH overlap (above a similarity threshold) are used, which avoids supervision collapse from the many unrelated pairs.
Hierarchy-Aware Contrastive Loss: Positive pairs are weighted in proportion to their MeSH similarity, while negatives are drawn from pairs with no MeSH overlap. This setup encourages separation in the embedding space according to hierarchical, multi-label overlap.
Figure 2: Schematic of the BioHiCL training loop: LoRA-based parameter-efficient fine-tuning aligns dense embeddings with MeSH-based label similarity, guiding the model to represent pairs with greater MeSH overlap as closer in embedding space.

Efficient Adaptation with LoRA

BioHiCL adapts a general-domain retriever to the biomedical domain via LoRA-based parameter-efficient fine-tuning, injecting low-rank adapters into the backbone weights while freezing the vast majority of parameters. This yields rapid adaptation without the memory footprint or compute requirements of full fine-tuning.
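To illustrate why so few parameters are trained, here is a NumPy sketch of a single LoRA-adapted linear layer. The rank r = 8, scaling alpha = 16, and initialization scale are assumed values; the summary does not report BioHiCL's LoRA hyperparameters or which backbone matrices receive adapters.

```python
import numpy as np


class LoRALinear:
    """Frozen dense weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                            # frozen pretrained weight, never updated
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection, small random init
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero init: no change at start
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """x has shape [batch, d_in]; the result has shape [batch, d_out]."""
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

    def merged_weight(self) -> np.ndarray:
        """Fold the adapter into W so deployment needs a single matrix multiply."""
        return self.weight + self.scale * self.B @ self.A
```

For a 768-by-768 projection (the hidden size of BERT-base-sized encoders such as a 0.1B BGE backbone), the adapter adds 2 × 8 × 768 = 12,288 trainable parameters next to 589,824 frozen ones, about 2.1% of that layer, and every layer without an adapter stays entirely frozen. Hugging Face PEFT provides a production version of this pattern through `LoraConfig` and `get_peft_model`.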

Empirical Results

BioHiCL was evaluated on multiple biomedical benchmarks, including information retrieval (NFCorpus, TREC-COVID, SciFact, SCIDOCS), sentence similarity (BIOSSES, SciFact sentences), and question answering (PubMedQA). Both the BioHiCL-Base (0.1B) and BioHiCL-Large (0.3B) variants were extensively benchmarked against leading general and biomedical-domain dense retrievers, covering a range of parameter sizes and training paradigms.

Key findings:

Retrieval Effectiveness: BioHiCL-Base achieves the highest IR average (0.543) despite its compact size, outperforming larger models such as BMRetriever-1B (1B parameters). BioHiCL-Large further improves or matches performance on select benchmarks.
Sentence Similarity and QA: BioHiCL-Base achieves the highest BIOSSES Spearman correlation (0.896), while BioHiCL-Large reaches the best Recall@1 on PubMedQA (0.898). Both models perform robustly across tasks without relying on task-specific prompts.
Computational Efficiency: Both variants exhibit low latency and modest memory consumption (e.g., for BioHiCL-Base, 3.5 ms/doc for corpus encoding and 0.63 ms/query for query encoding), making them suitable for real-time, large-scale deployments on standard hardware.
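As a back-of-the-envelope reading of those latency figures, the snippet below converts them into corpus-indexing time and query throughput. It assumes the quoted per-item latencies (3.5 ms per document, 0.63 ms per query) stay constant and scale linearly on a single device; the summary does not state the hardware or batch size behind them, so the outputs are order-of-magnitude estimates only.

```python
# Latencies quoted in the summary; linear scaling on one device is an assumption.
DOC_MS = 3.5     # corpus encoding, milliseconds per document
QUERY_MS = 0.63  # query encoding, milliseconds per query


def corpus_encoding_hours(num_docs: int, ms_per_doc: float = DOC_MS) -> float:
    """Estimated wall-clock hours to encode `num_docs` documents sequentially."""
    return num_docs * ms_per_doc / 1000 / 3600


def queries_per_second(ms_per_query: float = QUERY_MS) -> float:
    """Estimated query-encoding throughput of a single encoder instance."""
    return 1000 / ms_per_query
```

Under these assumptions, encoding one million documents takes about 3,500 s (just under an hour), and a single encoder handles roughly 1,600 queries per second.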

Ablation studies confirm that all four design choices (inclusion of ancestor labels, depth-based weighting, regression alignment, and contrastive loss) are essential for peak performance. In particular, omitting the contrastive objective or hierarchical label expansion consistently degrades retrieval quality.

Implications and Future Directions

The use of hierarchical, multi-label supervision based on MeSH provides a substantially richer training signal for biomedical dense retrieval. This moves beyond binary or instance-level contrastive objectives toward models with graded, specificity-aware semantic matching. As a result, BioHiCL’s learned embeddings capture nuanced partial overlaps and hierarchical semantic ties critical for biomedical knowledge work.

Practically, the demonstrated efficiency and effectiveness of BioHiCL suggest the viability of wide deployment in clinical and scientific information systems, with resource demands compatible with existing GPU infrastructure. The superior performance of BioHiCL at small and medium model scales also underscores the utility of parameter-efficient fine-tuning for domain adaptation.

Theoretically, the framework generalizes to other domains where expert-curated, hierarchical multi-label ontologies exist, such as e-commerce product taxonomies or Wikipedia categories. Extending this approach could drive advances in dense retrieval across diverse semi-structured, hierarchically labeled corpora.

Limitations

BioHiCL depends on high-quality, domain-wide hierarchical annotation resources. Domains lacking such curated ontologies, or those in which the hierarchy does not appropriately reflect semantic specificity, may not benefit from this method. Additionally, the fixed depth-based weighting scheme may not always correspond to task- or context-specific relevance.

Conclusion

BioHiCL demonstrates that incorporating expert-curated hierarchical multi-label structures into contrastive learning yields dense biomedical retrievers that are both efficient and highly effective. By explicitly aligning embedding geometry with the graded, hierarchy-aware semantics of MeSH, BioHiCL substantially improves representational fidelity for biomedical text, supporting more accurate and nuanced information access. Extensions to other hierarchically annotated domains may facilitate similar improvements in retrieval and representation learning.
