Content-Based Medical Image Retrieval
- CBMIR is a specialized domain that retrieves medical images based on intrinsic visual features like texture, intensity, and shape rather than textual metadata.
- It employs diverse methodologies—including handcrafted features, statistical radiomics, and deep learning models—to extract and encode medically significant image characteristics.
- Advanced indexing and similarity metrics in CBMIR support efficient, large-scale retrieval, enhancing diagnostic decision-making and research in complex, multi-modal imaging environments.
Content-Based Medical Image Retrieval (CBMIR) is a specialized domain within medical informatics that focuses on retrieving relevant medical images from repositories based on image content rather than metadata or textual annotations. CBMIR systems leverage intrinsic visual or structural features—such as intensity, texture, shape, anatomy, or pathology—extracted algorithmically from images to index, query, and rank results, enabling diagnostic support, cohort mining, and comparative analysis across multiple modalities and diseases.
1. Problem Definition, Significance, and Core Workflow
CBMIR systems address the limitations of classical text-based search by directly exploiting the granularity and complexity of medical image data. The standard pipeline comprises several phases:
- Feature Extraction: Input images are processed by handcrafted operators (Radon projections, LBP, SIFT/SURF), statistical radiomics, or deep learning (CNNs, autoencoders, transformers, or segmentation-guided encoders) to produce high-dimensional feature vectors or embeddings intended to encode medically significant content (Zhu et al., 2016, Na et al., 11 Jul 2025, Qayyum et al., 2017, Denner et al., 11 Mar 2024, Chung et al., 2017, Xing et al., 2022).
- Indexing: Features for all database images are computed offline and stored—often with additional structures for scalability/incremental search (e.g., approximate nearest-neighbor indices such as FAISS, HNSW, or inverted files).
- Querying: At search time, the query image (or composite query: partial image, sketch, semantic vector) is encoded to the same feature space; similarity to all indexed images is computed with a metric such as Euclidean or cosine distance, Hamming distance, learned Mahalanobis, or application-specific measures (Zhu et al., 2016, 0811.4717, Tabatabaei et al., 17 Nov 2025, Qayyum et al., 2017).
- Ranking & Retrieval: The system sorts candidate images by similarity, presenting clinicians with a ranked list of cases for diagnostic, research, or decision-support functions.
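The querying and ranking phases above can be sketched in a few lines, assuming dense embeddings have already been extracted and indexed offline; the function and variable names are illustrative, not from any cited system:

```python
import numpy as np

def cosine_topk(query_vec, index_matrix, k=5):
    """Rank indexed images by cosine similarity to the query embedding."""
    # Normalize so that dot products equal cosine similarities.
    q = query_vec / np.linalg.norm(query_vec)
    db = index_matrix / np.linalg.norm(index_matrix, axis=1, keepdims=True)
    sims = db @ q                      # one similarity score per indexed image
    order = np.argsort(-sims)[:k]      # indices of the k most similar images
    return order, sims[order]

# Offline indexing: precomputed embeddings for 1,000 database images (toy data).
rng = np.random.default_rng(0)
index = rng.normal(size=(1000, 128))

# Query time: encode the query with the same extractor, then rank against the index.
query = index[42] + 0.01 * rng.normal(size=128)   # near-duplicate of image 42
top_ids, top_sims = cosine_topk(query, index, k=5)
print(top_ids[0])   # image 42 ranks first
```

Real deployments replace the brute-force scoring with an approximate index, but the encode-then-rank contract is the same.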
CBMIR is essential in clinical settings for finding relevant prior cases, supporting rare disease recognition, serving as a second opinion, and enabling image-based search in databases where metadata is incomplete or non-standardized (Tizhoosh, 2 Aug 2024, Tabatabaei et al., 2023).
2. Representation Learning: Handcrafted, Statistical, and Deep Embeddings
A central theme in CBMIR research is the selection or learning of feature representations that are semantically meaningful, scalable, and robust to the variability in medical image data.
Handcrafted/Statistical Features:
- Early systems used Radon projections and Local Binary Patterns with SVM classifiers and binary barcodes to encode shape, texture, and structure, facilitating efficient retrieval and classification while maintaining low storage costs and real-time search capability (Zhu et al., 2016, Camlica et al., 2015).
- Histogram-based, Gabor-filter, or Bag-of-Visual-Words (BoVW) models (e.g., SIFT/SURF clustering, k-means, tf-idf weighting) capture local keypoint statistics and aggregate them into robust global signatures (S et al., 2020, Tizhoosh, 2 Aug 2024).
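As a concrete instance of the handcrafted descriptors above, a minimal 8-neighbour LBP histogram can be computed as follows (a simplified sketch; production systems typically use uniform or rotation-invariant LBP variants):

```python
import numpy as np

def lbp_histogram(img):
    """Basic 8-neighbour Local Binary Pattern histogram of a 2-D image."""
    c = img[1:-1, 1:-1]                       # interior (center) pixels
    # Eight neighbour offsets, clockwise from the top-left.
    shifts = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        # Shifted view of the image aligned with the center pixels.
        neigh = img[1+dy:img.shape[0]-1+dy, 1+dx:img.shape[1]-1+dx]
        codes |= ((neigh >= c).astype(np.uint8) << bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()                  # normalized 256-bin texture signature

img = np.arange(64, dtype=float).reshape(8, 8)   # toy gradient "image"
h = lbp_histogram(img)
print(h.shape)   # (256,)
```

The resulting 256-bin signature serves directly as a retrieval feature vector, comparable with any of the distance metrics discussed later.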
Radiomics-Based Embeddings:
- Higher-order radiomics features (GLCM, shape descriptors) coupled with anatomical context embeddings allow fine-grained, tumor-level search across 3D volumes, supporting flexible queries based on partial feature sets (volume, sphericity, intensity texture) or anatomical localization (Na et al., 11 Jul 2025).
Deep Learning Approaches:
- Convolutional Neural Networks (CNNs) trained for classification (ResNet, DenseNet, ViT, SwinTransformer) provide discriminative embeddings for content-based search. Embeddings are often derived from penultimate or intermediate layers and may be fine-tuned on medical images or adopted zero-shot from large-scale supervised or weakly/self-supervised vision models (Denner et al., 11 Mar 2024, Mahbod et al., 14 Sep 2024).
- Stacked Autoencoders and variational schemes (SAE, VAE) learn compressed codes without supervision, optionally binarized for memory efficiency and rapid Hamming-based retrieval (Sharma et al., 2016, Camlica et al., 2015).
- Recent advances include Siamese CNNs with contrastive loss, where representations are trained to minimize the distance between similar disease states and maximize it for dissimilar ones, requiring only binary “same/different” supervision and significantly reducing annotation overhead (Chung et al., 2017).
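The pairwise contrastive objective described above, which requires only binary "same/different" labels, can be written compactly (a numpy sketch of the Hadsell-style loss; the margin value is illustrative):

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Pairwise contrastive loss on embedding pairs.

    same=1 pulls a pair together; same=0 pushes it apart up to `margin`.
    """
    d = np.linalg.norm(emb_a - emb_b, axis=1)            # Euclidean pair distances
    pos = same * d**2                                    # penalize far "same" pairs
    neg = (1 - same) * np.maximum(margin - d, 0.0)**2    # penalize close "different" pairs
    return 0.5 * np.mean(pos + neg)

a = np.array([[0.0, 0.0], [0.0, 0.0]])
b = np.array([[0.1, 0.0], [2.0, 0.0]])
y = np.array([1, 0])            # first pair "same", second "different"
loss = contrastive_loss(a, b, y)
print(loss)   # 0.0025
```

In training, this loss is backpropagated through the shared (Siamese) encoder so that disease-similar images cluster in the embedding space.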
Specialized/Hybrid Representations:
- Disentangled and compositional approaches factor embeddings into normal and abnormal (disease-specific) codes, enabling queries that target either anatomy or pathology, or their linear combination, thus supporting comparative diagnostic reading (Kobayashi et al., 2021, Kobayashi et al., 2020, Kobayashi et al., 2023).
- Proxy-based metric learning generalizes classical nearest centroid approaches to multi-label, multi-morbidity datasets, assigning samples to multiple learned “disease proxies,” which allows robust multi-pathology search with explicit cluster centers representing each disease entity (Xing et al., 2022).
- Topological data analysis (TDA) maps images into fixed-length Betti number curves via persistent homology, encoding structural information that is independent of pixel intensities and yielding unsupervised, rotation/scaling-invariant descriptors that outperform many learned and hand-crafted baselines, especially in histopathology (Tabatabaei et al., 17 Nov 2025).
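The cited work computes Betti curves via persistent homology; as a rough intuition-builder only, the Betti-0 component (connected-component counts across intensity thresholds) can be approximated with simple labeling, yielding a fixed-length, intensity-robust descriptor:

```python
import numpy as np
from scipy import ndimage

def betti0_curve(img, thresholds):
    """Count connected components (Betti-0) of the binarized image at each
    intensity threshold, yielding a fixed-length topological descriptor."""
    curve = []
    for t in thresholds:
        mask = img >= t
        _, n_components = ndimage.label(mask)   # 4-connected components
        curve.append(n_components)
    return np.array(curve)

# Toy image: two bright blobs on a dark background.
img = np.zeros((16, 16))
img[2:5, 2:5] = 1.0
img[10:13, 10:13] = 1.0
curve = betti0_curve(img, thresholds=np.linspace(0.1, 0.9, 9))
print(curve)   # two components at every threshold below 1.0
```

Because the curve depends only on the sublevel/superlevel structure, it is invariant to rotations and robust to monotone intensity changes, which is what makes such descriptors attractive for histopathology.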
3. Query Formalisms, Modes, and User Interactions
Contemporary CBMIR supports a variety of query types beyond “search by example image”:
- Semantic Decomposition: Systems that explicitly separate normal and abnormal components enable queries for isolated pathologies regardless of anatomical variation, or for specific anatomical patterns irrespective of pathology (Kobayashi et al., 2021, Kobayashi et al., 2020, Kobayashi et al., 2023).
- Sketch-Based Retrieval: Novel systems let users specify a normal template and then sketch an abnormality (e.g., tumor outline), integrating features from both input modalities to form a query that can target fine-grained, rare, or previously unseen variants (Kobayashi et al., 2023).
- Attribute-Level Query: Radiomics-driven systems allow querying by sets of feature values (e.g., “find high-sphericity, low-entropy tumors in anterior brain”), moving beyond pixel-level matching toward physiologically meaningful cohort selection (Na et al., 11 Jul 2025).
- Multi-modal and Multi-label Search: Multi-morbidity-aware models enable simultaneous retrieval based on combinations of co-occurring diseases, supporting the retrieval of similar “disease spectra” rather than single-label nearest neighbor (Xing et al., 2022).
- Interpretable, Slice-Aggregated Retrieval: Slice-wise VAE-based encoders aggregated over volumes support interpretable CBMIR in 3D datasets (e.g., iCBIR-Sli), with saliency mapping back to anatomical regions driving query/result similarity (Tomoshige et al., 3 Jan 2025).
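The attribute-level query style above amounts to range filtering over a precomputed radiomics table. A minimal sketch, with entirely illustrative feature names and values:

```python
import numpy as np

# Toy radiomics table: one row per tumour, hypothetical feature names.
features = {
    "volume_mm3":   np.array([1200.0, 450.0, 3100.0, 800.0]),
    "sphericity":   np.array([0.91,   0.55,  0.88,   0.30]),
    "glcm_entropy": np.array([2.1,    4.8,   1.9,    5.2]),
}

def attribute_query(features, ranges):
    """Return indices of tumours whose features fall inside every given range."""
    n = len(next(iter(features.values())))
    mask = np.ones(n, dtype=bool)
    for name, (lo, hi) in ranges.items():
        mask &= (features[name] >= lo) & (features[name] <= hi)
    return np.flatnonzero(mask)

# "Find high-sphericity, low-entropy tumours", in the query style quoted above.
hits = attribute_query(features, {"sphericity": (0.8, 1.0),
                                  "glcm_entropy": (0.0, 3.0)})
print(hits)   # tumours 0 and 2 match
```

Because queries operate on named feature ranges rather than pixels, partial queries (only some features specified) come for free, matching the flexible-query behavior the radiomics systems describe.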
4. Similarity Metrics, Indexing, and Large-Scale Retrieval
Similarity computation in CBMIR is tightly coupled to the nature of the feature space:
- Distance Metrics: l₂ (Euclidean), cosine, or Hamming (for binarized codes) predominate (Sharma et al., 2016, Chung et al., 2017, Zhu et al., 2016). For topological descriptors or radiomics, variants such as Wasserstein may be considered, but empirical results show Euclidean/cosine suffice in most settings (Tabatabaei et al., 17 Nov 2025, Na et al., 11 Jul 2025).
- Ranking Functions: Systems returning k-nearest neighbors are typically evaluated with mean reciprocal rank (MRR), mean average precision (mAP), precision@k, recall@k, normalized discounted cumulative gain (nDCG), and F1-score, tailored to the intended application (diagnosis, differential diagnosis, cohort retrieval) (Chung et al., 2017, Xing et al., 2022, Denner et al., 11 Mar 2024).
- Indexing and Scalability: For practical search on millions of images, efficient indices such as FAISS, HNSW or partial-sort heuristics are used (Tabatabaei et al., 17 Nov 2025, Denner et al., 11 Mar 2024, Jush et al., 23 Jul 2025), with storage footprints and query times monitored in realistic deployment studies (Tizhoosh, 2 Aug 2024).
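The binarized-code path mentioned above trades a little accuracy for compact storage and fast Hamming comparison. A brute-force numpy sketch (scaled deployments would swap this for FAISS/HNSW; the median-threshold binarization here is one simple choice, not the cited systems' exact scheme):

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(500, 64))          # dense embeddings for 500 images

# Binarize against per-dimension medians: one bit per embedding dimension.
thresholds = np.median(db, axis=0)
codes = (db > thresholds).astype(np.uint8)

def hamming_topk(query_code, db_codes, k=5):
    """Rank binary codes by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]

query = (db[7] > thresholds).astype(np.uint8)   # encode the query the same way
top_ids, top_d = hamming_topk(query, codes, k=3)
print(top_d[0])   # 0: the query's own code is an exact Hamming match
```

With bit-packed codes, Hamming distance reduces to XOR plus popcount, which is why binarized retrieval stays fast even at large index sizes.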
5. Domain-Specific Extensions and Validation
CBMIR has been rigorously validated in multiple medical imaging domains:
- Histopathology: Both supervised and unsupervised methods (CNN-AEs, TDA/Betti curves) consistently achieve high accuracy and generalize across different magnifications; federated learning extends CBMIR across privacy-constrained, multi-institutional datasets without data sharing, maintaining accuracy while reducing training time (Tabatabaei et al., 2023, Tabatabaei et al., 17 Nov 2025).
- Radiology: Vision foundation models (BiomedCLIP, UNI, CONCH, CLIP) pre-trained on massive biomedical or natural datasets perform robustly as zero-shot feature extractors, with BiomedCLIP reporting micro-average P@1 up to 0.594 across 161 pathologies/24 anatomy classes (Denner et al., 11 Mar 2024, Mahbod et al., 14 Sep 2024). Index size ablation suggests that indexing ~1,000 examples per class is sufficient for maximal retrieval performance.
- 3D Volumetric Retrieval: ColBERT-inspired late-interaction architectures (C-MIR) enable segmentation-free, region-localized retrieval and improve tumor flagging and staging precision, with no need for pre-segmented ROIs and demonstrated sub-minute computation on ~100,000+ slices (Jush et al., 23 Jul 2025, Tomoshige et al., 3 Jan 2025).
- Multi-label/Multimorbidity: Embedding spaces with multi-proxy assignment demonstrate consistent improvements in multi-pathology retrieval performance, particularly under data scarcity or label imbalance (Xing et al., 2022).
- Validation Protocols: Beyond accuracy, studies systematically evaluate index memory overhead, search latency, storage per image, and robustness across domains and institutions, offering a comprehensive ranking of retrieval engines by clinical relevance (Tizhoosh, 2 Aug 2024).
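The ColBERT-inspired late-interaction scoring behind volumetric retrieval can be sketched as a MaxSim sum: each query slice embedding matches its best-scoring candidate slice, and per-slice maxima are summed. This is only the scoring idea, not C-MIR's full architecture:

```python
import numpy as np

def maxsim_score(query_slices, cand_slices):
    """Late-interaction (MaxSim) score between two stacks of slice embeddings."""
    qn = query_slices / np.linalg.norm(query_slices, axis=1, keepdims=True)
    cn = cand_slices / np.linalg.norm(cand_slices, axis=1, keepdims=True)
    sims = qn @ cn.T                  # (n_query_slices, n_cand_slices) cosines
    return sims.max(axis=1).sum()     # best candidate match per query slice, summed

rng = np.random.default_rng(2)
q = rng.normal(size=(20, 32))         # 20 slice embeddings for the query volume
good = np.vstack([q[::2], rng.normal(size=(30, 32))])  # shares half the query's slices
bad = rng.normal(size=(40, 32))       # unrelated volume
print(maxsim_score(q, good) > maxsim_score(q, bad))   # True
```

Because matching is per-slice, a volume that contains the queried region anywhere scores highly, which is what removes the need for pre-segmented ROIs.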
6. Interpretability, Disentanglement, and Explainability
Interpretability is increasingly emphasized:
- Disentangled Latent Codes: Dual-branch or vector-quantized architectures explicitly separate normal and abnormal components. Retrieval can target anatomic or pathological similarity independently, supporting clinically meaningful use cases such as comparative reading or disease spectrum analysis (Kobayashi et al., 2021, Kobayashi et al., 2020, Kobayashi et al., 2023).
- Proxy-Based and Prototype Supervision: Proxy centroids or class prototypes serve as interpretable anchors in embedding space, allowing visualization of disease clusters and reporting of per-disease similarity scores (Xing et al., 2022, Tomoshige et al., 3 Jan 2025).
- Voxel-level Probability Mapping: Block- or slice-based retrieval frameworks provide per-voxel or per-region contribution scores, facilitating the mapping of retrieval outcomes to medically significant brain regions (Tomoshige et al., 3 Jan 2025).
- Topological Summarization: Persistent homology features are inherently interpretable as counts of topological structures (loops, voids) within specific intensity ranges, providing domain-invariant fingerprints (Tabatabaei et al., 17 Nov 2025).
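The per-disease similarity reporting enabled by proxy-based supervision reduces, at inference time, to scoring one embedding against each learned proxy. A minimal sketch with toy proxies (the proxies themselves would be learned during training):

```python
import numpy as np

def per_disease_scores(embedding, proxies):
    """Cosine similarity of one image embedding to each learned disease proxy,
    giving an interpretable per-disease score vector."""
    e = embedding / np.linalg.norm(embedding)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    return p @ e

rng = np.random.default_rng(3)
proxies = rng.normal(size=(5, 16))              # 5 disease proxies in a 16-d space
img = proxies[2] + 0.05 * rng.normal(size=16)   # image embedding near proxy 2
scores = per_disease_scores(img, proxies)
print(int(scores.argmax()))   # 2
```

Reporting the full score vector, rather than a single nearest neighbor, is what supports disease-spectrum views in the multi-morbidity setting.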
7. Limitations, Open Challenges, and Emerging Directions
Despite significant progress, CBMIR faces ongoing technical and translational challenges:
- Annotation and Supervision: Obtaining granular multiclass or multilabel annotations remains costly; contrastive or semi-supervised objectives, pairwise training, and flexible input modes (sketch, radiomics) aim to address this (Chung et al., 2017, Kobayashi et al., 2023, Na et al., 11 Jul 2025).
- Domain Shift and Harmonization: Heterogeneity in scanners, protocols, and institutions necessitates harmonization techniques (e.g., CycleGAN-style pseudo-scanner standardization, adversarial training) to avoid degraded matching performance across sites (Arai et al., 2021).
- Semantic Gap: Bridging low-level visual similarity and high-level clinical semantics remains nontrivial; UMLS-concept fusion and ontology-based methods provide one pathway, while multimodal models and large language–vision pretraining offer alternatives (0811.4717, Denner et al., 11 Mar 2024).
- Scalability: Ensuring sub-second retrieval on billion-scale repositories with high-dimensional medical embeddings requires ongoing optimization in index design and feature compression (Tabatabaei et al., 17 Nov 2025, Camlica et al., 2015).
- Generalization and Fine-Grained Phenotypes: Most CBMIR benchmarks focus on binary or small-multiclass settings; future work must advance retrieval for rare phenotypes, multi-pathology cases, and subtypes within disease classes, possibly leveraging few-shot, multi-modal, or generative pretraining (Mahbod et al., 14 Sep 2024, Xing et al., 2022).
- User-Centric Interfaces: Real-world adoption will require integration of interpretable, flexible querying with efficient backend search, supporting both clinician-driven and population-level use cases (Kobayashi et al., 2023, Tizhoosh, 2 Aug 2024).
CBMIR thus represents a continuously evolving intersection of computer vision, radiology, pathology, machine learning, and semantics, with current best practices emphasizing adaptable feature extraction (deep or statistical), rigorous benchmarking, domain and pathology-level diversity, and pipeline validation for computational and clinical feasibility.