
Automatic dense-correspondence-to-SAM prompt conversion without fine-tuning

Develop a training-free method that automatically converts dense image correspondences (for example, patch-level matches produced by self-supervised Vision Transformers) into reliable foreground and background point prompts for the Segment Anything Model (SAM), without any model fine-tuning.


Background

Segment Anything (SAM) and SAM2 are powerful foundation models for segmentation, but their performance is highly sensitive to prompt type and placement, which limits full automation without human clicks or an external prompt supplier. Retrieval-augmented paradigms leverage exemplar memories and dense correspondences from self-supervised ViTs to aid few-shot segmentation, suggesting a path toward automated prompting.

The paper highlights a specific gap: while dense correspondences can be obtained, turning them into effective foreground/background point prompts for SAM without any model fine-tuning is not yet solved. This open problem motivates the proposed Memory-SAM approach, which attempts a retrieval-to-prompt pipeline that uses mask-constrained matching and dense negative cues.
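To make the retrieval-to-prompt idea concrete, the following is a minimal NumPy sketch of one plausible conversion scheme, not the paper's actual method: query patches are scored by their best cosine similarity to exemplar foreground patches (mask-constrained matching) and to exemplar background patches (dense negative cues), and the top-scoring patch centers become SAM-style point prompts with labels 1 (foreground) and 0 (background). All function and parameter names here are illustrative assumptions.

```python
import numpy as np

def correspondences_to_prompts(ex_feats, ex_mask, q_feats, grid, k_fg=5, k_bg=5):
    """Convert dense patch correspondences into SAM-style point prompts (sketch).

    ex_feats: (N, D) exemplar patch features (e.g., from a self-supervised ViT)
    ex_mask:  (N,) boolean, True where the exemplar patch lies in the foreground mask
    q_feats:  (M, D) query patch features
    grid:     (M, 2) pixel-space (x, y) centers of the query patches
    Returns (points, labels) in the format SAM's point prompts expect:
    labels 1 = foreground click, 0 = background click.
    """
    # L2-normalize so dot products are cosine similarities.
    ex = ex_feats / np.linalg.norm(ex_feats, axis=1, keepdims=True)
    q = q_feats / np.linalg.norm(q_feats, axis=1, keepdims=True)
    sim = q @ ex.T  # (M, N) query-to-exemplar patch similarity

    # Mask-constrained matching: each query patch is scored only against
    # exemplar patches inside the foreground mask.
    fg_score = sim[:, ex_mask].max(axis=1)
    # Dense negative cues: similarity to exemplar background patches.
    bg_score = sim[:, ~ex_mask].max(axis=1)

    # Most confidently foreground / background query patches become prompts.
    fg_idx = np.argsort(fg_score - bg_score)[::-1][:k_fg]
    bg_idx = np.argsort(bg_score - fg_score)[::-1][:k_bg]
    points = np.concatenate([grid[fg_idx], grid[bg_idx]], axis=0)
    labels = np.concatenate([np.ones(k_fg, dtype=int), np.zeros(k_bg, dtype=int)])
    return points, labels
```

The resulting `(points, labels)` pair could then be passed to a SAM predictor as `point_coords` and `point_labels`; no model weights are updated anywhere, so the pipeline stays fine-tuning-free.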

References

However, automatically converting dense correspondences into reliable foreground/background point prompts for SAM, while requiring no model fine-tuning, remains open.

Memory-SAM: Human-Prompt-Free Tongue Segmentation via Retrieval-to-Prompt (2510.15849 - Chae et al., 17 Oct 2025) in Section 1 (Introduction)