
Zero-Shot Disease Detection

Updated 23 December 2025
  • Zero-shot disease detection is a computational approach that identifies novel diseases by leveraging auxiliary textual and imaging data without requiring annotated examples.
  • Architectural strategies integrate vision–language alignment, prompt-tuning, contrastive learning, and anomaly detection to facilitate scalable and adaptive diagnostics.
  • Applications span clinical imaging, text phenotyping, and agriculture, offering rapid insights for rare and emerging pathologies through cross-modal integration.

Zero-shot disease detection is a paradigm in computational medicine in which models identify and characterize diseases for which no annotated examples were available during training or development. Leveraging advances in vision–language models (VLMs), contrastive learning, prompt engineering, self-supervised representation learning, and cross-modal architectures, zero-shot approaches are increasingly feasible for rare or emerging pathologies in clinical imaging, text phenotyping, and agriculture. These models circumvent the need for costly and labor-intensive annotation, instead generalizing to novel disease categories from rich auxiliary information such as textual descriptions, clinical attributes, or domain knowledge.

1. Architectural Strategies for Zero-Shot Disease Detection

Zero-shot disease detection systems often hinge on progress in large vision–language models and synergistic modules tailored for medical or scientific domains. Distinct architectural patterns have emerged:

  • Vision–Language Alignment: Most frameworks pair an image encoder (CNN, ViT, or MAE) with a text encoder (BERT, BioBERT, or CLIP text tower). Input images are mapped to a latent feature space, while disease classes are represented by semantic embeddings—either canonical names, multi-line textual descriptions, clinical findings, or attribute vectors (Mishra et al., 2023, Wang et al., 13 Jun 2024, Hayat et al., 2021).
  • Prompt-Tuning and Knowledge Banks: Rather than fixed disease names, frameworks inject rich, disease-centric prompts, sometimes dynamically generated via LLMs (ChatGPT, GPT-4, BLIP). These prompts surface visual features, radiological patterns, or attribute sets and are typically encoded with CLIP or another transformer (Liu et al., 2023, Yang et al., 22 Feb 2025).
  • Structural and Layer-wise Fusion: In advanced designs, prompt features are mapped layer by layer into hierarchical knowledge banks, with mutual selection schemes retrieving highly relevant prompt tokens (structural representation) at each fusion stage (Yang et al., 22 Feb 2025). These are then incorporated via cross-modal multi-head attention, cross-attention blocks, or FiLM modulation.
  • Contrastive and Meta-Learning Modules: Contrastive learning (InfoNCE, NT-Xent) aligns image–text pairs, enforcing semantic coherence in latent space. Meta-learning adapters (e.g., MAML, LoRA) provide rapid transfer and adaptation when the language or modality shifts (Sar et al., 24 Sep 2025, Shaukat et al., 2 Jul 2024).
  • Unsupervised and Few-Shot Extensions: In unsupervised settings, normality models are trained exclusively on healthy samples. Anomaly maps or outlier scoring in embedding space flag deviations, as used for rare tumors or congenital heart disease when annotation is sparse (Chin et al., 23 May 2025, Saha et al., 10 Mar 2025).
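At inference, the vision–language alignment pattern above reduces to comparing one image embedding against a bank of class-prompt embeddings in a shared latent space. A minimal numpy sketch, with random vectors standing in for real image and text encoders and an illustrative CLIP-style temperature of 0.07:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def zero_shot_scores(image_emb, class_embs, temperature=0.07):
    """Score an image against disease-class text embeddings.

    image_emb : (d,) embedding from the image encoder
    class_embs: (C, d) embeddings of per-class prompts from the text encoder
    Returns softmax probabilities over the C candidate diseases.
    """
    sims = l2_normalize(class_embs) @ l2_normalize(image_emb)  # (C,) cosines
    logits = sims / temperature
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# toy demo with random "encoder outputs" (hypothetical, no real model)
rng = np.random.default_rng(0)
img = rng.normal(size=64)
classes = rng.normal(size=(5, 64))
classes[2] = img + 0.1 * rng.normal(size=64)  # class 2 is semantically close
probs = zero_shot_scores(img, classes)        # class 2 should dominate
```

The same scoring loop generalizes to any prompt bank; only the text embeddings change when new disease classes are added.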

2. Prompt Engineering and Semantic Representation

Prompt engineering is central to zero-shot accuracy and interpretability:

  • Template Design: Prompts generally include disease names plus specific, evidence-based visual features, attributes, or radiological signs. Templates emphasize published literature, concise symptom phrases, or the context of imaging modalities (Liu et al., 2023, Yang et al., 22 Feb 2025).
  • LLM Augmentation: ChatGPT or domain-tuned LLMs generate attribute descriptions, symptom lists, or full-sentence disease narratives, intended to mirror clinical diagnostic logic (Liu et al., 2023, Wang et al., 13 Jun 2024). Prompt banks may be curated per class (category-level) or per instance (instance-level).
  • Pairwise or Multi-label Expansion: Datasets can be configured for multi-label GZSL (predicting both seen and unseen findings per image), with attribute-driven prompt sets spanning hundreds of rare diseases (Wang et al., 13 Jun 2024, Hayat et al., 2021).
  • Concept Bottleneck and Interpretability: Extracted concept banks, validated by clinicians and produced via GPT models, enable scoring of rare diseases by aggregating probability across meaningful findings—providing traceable, interpretable rationales for each prediction (Mehta et al., 4 Mar 2025).
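Category- or instance-level prompt banks of this kind can be generated mechanically once attributes are available. A minimal sketch, with hypothetical template strings and hand-written attributes standing in for LLM-generated or clinician-validated concept banks:

```python
def build_prompts(disease, attributes, modality="chest X-ray"):
    """Expand a disease name into attribute-rich prompts for a text encoder.

    The templates are illustrative only; production systems draw attributes
    from LLM-generated or clinician-validated concept banks, per class
    (category-level) or per instance (instance-level).
    """
    prompts = [f"a {modality} showing {disease}"]
    prompts += [f"a {modality} with {attr}, consistent with {disease}"
                for attr in attributes]
    return prompts

prompts = build_prompts(
    "pneumothorax",
    ["absent peripheral lung markings", "a visible visceral pleural line"],
)
# prompts[0] == "a chest X-ray showing pneumothorax"
```

Each generated string is then encoded by the text tower, and per-prompt scores can be averaged per disease to stabilize the class embedding.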

3. Training Protocols and Loss Functions

Zero-shot frameworks employ training routines that align image and text spaces or model normal–anomaly separation:

  • Contrastive Alignment (InfoNCE, NT-Xent): For every image–prompt pair, embeddings are projected into a joint latent space. Contrastive loss maximizes similarity for matched pairs and minimizes it for unmatched ones. Options include symmetric (bi-directional) losses and temperature scaling (Shaukat et al., 2 Jul 2024, Mishra et al., 2023).
  • Prefix-Tuning and Adapter Models: Only prompt prefixes, lightweight adapters, or domain-specific projection layers are tuned, minimizing overfitting and facilitating modular transfer without touching the backbone (Shaukat et al., 2 Jul 2024).
  • Loss Composition: Combined objectives typically sum segmentation (BCE, Dice), alignment, semantic-consistency, and/or classification losses, with hyperparameters controlling the weight of each term (Su et al., 26 Feb 2025, Sar et al., 24 Sep 2025).
  • Meta-Learning (MAML, LoRA): Models are trained across meta-tasks (languages, disease variants), with inner- and outer-loop updates to guarantee rapid adaptation at inference (Sar et al., 24 Sep 2025).
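The symmetric (bi-directional) contrastive objective from the first bullet can be sketched in a few lines of numpy; the temperature and toy embeddings below are illustrative placeholders, not taken from any cited system:

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def symmetric_info_nce(img_embs, txt_embs, temperature=0.07):
    """Bi-directional InfoNCE over a batch of matched image-prompt pairs.

    Row i of each (N, d) matrix is one matched pair; every off-diagonal
    entry of the similarity matrix acts as a negative.
    """
    img = img_embs / np.linalg.norm(img_embs, axis=1, keepdims=True)
    txt = txt_embs / np.linalg.norm(txt_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (N, N) scaled cosine similarities
    diag = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

# sanity check on toy data: matched pairs should score a much lower loss
rng = np.random.default_rng(1)
txt = rng.normal(size=(8, 32))
aligned = symmetric_info_nce(txt + 0.05 * rng.normal(size=(8, 32)), txt)
mismatched = symmetric_info_nce(np.roll(txt, 1, axis=0), txt)
```

In prefix-tuning and adapter setups, only the parameters feeding these embeddings are updated while the loss itself is unchanged.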

4. Inference Methods and Decision Rules

Inference in zero-shot disease detection is fully automatic, with no manual bounding boxes or click annotations required:

  • Similarity Scoring: For each test image, cosine similarity is computed between image and disease embeddings (from prompts or attribute vectors). Decisions are made by ranking, thresholding, or majority voting among k-nearest gallery prototypes (Shaukat et al., 2 Jul 2024, Hayat et al., 2021).
  • Multi-label Handling: Both seen and unseen classes can be assigned per test sample, with either ranking or threshold-based selection, as in generalized multi-label ZSL (Hayat et al., 2021).
  • Anomaly Detection: Unsupervised approaches score deviation from learned healthy embedding banks. Samples exceeding a decision threshold are flagged pathological (Chin et al., 23 May 2025, Saha et al., 10 Mar 2025).
  • Clinical Phenotyping Pipelines: In text-based EHR phenotyping, retrieval-augmented LLMs process relevant snippets and aggregate snippet-level predictions into patient-level diagnosis using MapReduce or max-vote strategies (Thompson et al., 2023).
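The similarity-scoring and anomaly-detection rules above can be sketched together. The threshold, bank size, and k below are illustrative placeholders, and orthonormal basis vectors stand in for real encoder embeddings:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def multilabel_predict(image_emb, class_embs, threshold=0.5):
    """Assign every finding whose prompt similarity clears the threshold."""
    sims = np.array([cosine(image_emb, c) for c in class_embs])
    return np.flatnonzero(sims >= threshold), sims

def anomaly_score(image_emb, healthy_bank, k=3):
    """Mean cosine distance to the k nearest healthy prototypes;
    higher scores flag likely pathology."""
    dists = np.sort([1.0 - cosine(image_emb, h) for h in healthy_bank])
    return float(np.mean(dists[:k]))

# toy embeddings: orthonormal basis vectors in place of encoder outputs
e = np.eye(16)
classes = e[:4]                     # prompt embeddings for 4 findings
img = e[0] + e[3]                   # image expressing findings 0 and 3
labels, sims = multilabel_predict(img, classes)   # -> findings 0 and 3

healthy = np.stack([e[4], 0.95 * e[4] + 0.05 * e[5], 0.9 * e[4] + 0.1 * e[6]])
probe = 0.97 * e[4] + 0.03 * e[7]   # near-healthy sample, low score
healthy_score = anomaly_score(probe, healthy, k=2)
disease_score = anomaly_score(img, healthy, k=2)  # far from healthy bank
```

Thresholding supports the generalized multi-label setting (seen and unseen findings per image), while the anomaly score needs no disease prompts at all.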

5. Performance, Benchmarks, and Limitations

Zero-shot frameworks achieve notable—though not always clinically adequate—accuracy:

  • Imaging Benchmarks: Lung-CADex reached a sensitivity of 0.86, mean IoU ≈ 0.72, and Dice ≈ 0.80 for nodule segmentation, within 5% of specialized models (Shaukat et al., 2 Jul 2024). Zero-shot nodule classification attained AUC 0.69 and accuracy 0.71 on LIDC, and AUC 0.656 and accuracy 0.706 on LUNGx.
  • Attribute and Rare Disease Detection: RetiZero achieved Top-5 zero-shot accuracy of 0.843 (15 diseases) and 0.756 (52 diseases); performance on rare diseases matched or exceeded clinical experts in Top-3/Top-5 recall (Wang et al., 13 Jun 2024).
  • Textual Cross-Lingual Models: SwasthLLM reached zero-shot accuracy of 92.78% (Hindi) and 73.33% (Bengali), surpassing mBERT, IndicBERT, and XLM-R baselines by up to 10 points in F1 (Sar et al., 24 Sep 2025).
  • Unsupervised Detection: PathoSCOPE delivered image-level AUROC 89.19%, pixel-level AUROC 97.87%, and Dice 49.21% for tumor detection with only two healthy shots (Chin et al., 23 May 2025).
  • Large Benchmarks: In CXR-LT 2024, zero-shot mAP across five unseen CXR findings was 0.129 for the top team, with mean AUROC and F1 lagging behind performance on known labels (Lin et al., 9 Jun 2025). XDT-CXR cross-disease transfer accuracy reached up to 80.12% on COVID after training on pediatric pneumonia (Rahman et al., 21 Aug 2024).
  • Limitations: All zero-shot architectures report a performance gap to fully supervised or fine-tuned models in real-world settings, especially for subtle or ambiguous findings, low-resolution images, or domain-shifted cohorts (Roumeliotis et al., 29 Apr 2025, Marzullo et al., 14 Nov 2024).

6. Applications and Clinical Impact

Zero-shot disease detection is applicable in diverse settings, spanning clinical imaging (radiograph, CT, MRI, and fundus analysis), text-based EHR phenotyping, and agricultural plant-disease monitoring. Across these domains it offers rapid triage of rare and emerging pathologies without annotated training data, though the performance gaps noted above mean current systems are best positioned as decision-support rather than autonomous diagnosis.

7. Future Directions and Research Opportunities

Research continues to address critical limitations and extend zero-shot capabilities:

  • Improved Prompt Engineering and Knowledge Extraction: Dynamic, disease-specific prompt banks, automated extraction from literature, and structured knowledge graphs hold promise for scaling to hundreds of diseases (Yang et al., 22 Feb 2025, Liu et al., 2023).
  • Hybrid and Ensemble Models: Combining general and domain-tuned text encoders, feature synthesis via GANs/VAEs, and integrating structured lab data or ontologies may balance performance across common and rare classes (Mishra et al., 2023, Wang et al., 13 Jun 2024).
  • 3D and Multi-Slice Extensions: Native 3D modeling and multi-view aggregation merit further study for modalities such as MRI, CT, and fetal ultrasound (Marzullo et al., 14 Nov 2024, Saha et al., 10 Mar 2025).
  • Few-Shot and Continual Learning Hybrids: Sophisticated adaptation from small numbers of examples, coupled with continual domain updating, aims to close the zero-shot vs. supervised gap (Roumeliotis et al., 29 Apr 2025, Chin et al., 23 May 2025).
  • Algorithmic Fairness and Robustness: Performance under domain shift, noisy labels, label imbalance, and privacy constraints remain active research areas, with federated and privacy-preserving training paradigms (Saha et al., 10 Mar 2025, Sar et al., 24 Sep 2025).
  • Explainable Decision-Making: Concept bottleneck architectures, attention visualizations, and interpretable aggregation mechanisms are essential for clinical trust and regulatory acceptance (Mehta et al., 4 Mar 2025).

Zero-shot disease detection represents a versatile, generalizable frontier in computational diagnostics, promising scalable solutions to annotation scarcity, emerging disease response, and multimodal integration—subject to sustained innovation in prompt engineering, cross-modal alignment, explicit knowledge representation, and robust benchmarking.

