Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 59 tok/s

Gemini 2.5 Pro 52 tok/s Pro

GPT-5 Medium 40 tok/s Pro

GPT-5 High 27 tok/s Pro

GPT-4o 104 tok/s Pro

Kimi K2 195 tok/s Pro

GPT OSS 120B 467 tok/s Pro

Claude Sonnet 4 37 tok/s Pro

2000 character limit reached

Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records (2505.09435v1)

Published 14 May 2025 in cs.CV and cs.AI

Abstract: Pre-training on image-text colonoscopy records offers substantial potential for improving endoscopic image analysis, but faces challenges including non-informative background images, complex medical terminology, and ambiguous multi-lesion descriptions. We introduce Endo-CLIP, a novel self-supervised framework that enhances Contrastive Language-Image Pre-training (CLIP) for this domain. Endo-CLIP's three-stage framework--cleansing, attunement, and unification--addresses these challenges by (1) removing background frames, (2) leveraging LLMs to extract clinical attributes for fine-grained contrastive learning, and (3) employing patient-level cross-attention to resolve multi-polyp ambiguities. Extensive experiments demonstrate that Endo-CLIP significantly outperforms state-of-the-art pre-training methods in zero-shot and few-shot polyp detection and classification, paving the way for more accurate and clinically relevant endoscopic analysis.