Do anatomical priors in vision-language models cause systematic errors on atypical anatomy?

Determine whether anatomical priors learned by vision-language models from typical anatomy cause systematic errors when these models encounter atypical anatomical presentations in medical imaging.

Background

The paper examines the reliability of vision-language models (VLMs) in medical settings. It emphasizes that their training data are heavily skewed toward typical anatomy (e.g., five fingers, two separate kidneys, standard organ positioning). This skew can instill strong statistical priors that may override visual evidence when a model encounters a rare anatomical variant.

Existing evaluations rarely compare performance on typical versus atypical anatomy, leaving a gap in understanding whether learned anatomical priors lead to systematic errors on rare presentations. The authors explicitly identify this uncertainty and design AdversarialAnatomyBench to test it.
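The Python sketch below shows one way to make the typical-versus-atypical comparison concrete. It assumes that each benchmark item is a question with a single normalized answer. It also assumes that, for each atypical image, one can record the answer that would be correct for typical anatomy. The `Item` record, the `prior_gap_report` function, and the "prior-consistent error share" are hypothetical illustrations. They are not AdversarialAnatomyBench's actual data format or metrics, and the toy data are made up.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class Item:
    """One hypothetical benchmark question about a single image (assumed format)."""
    is_atypical: bool    # True if the image shows a rare anatomical variant
    ground_truth: str    # correct (normalized) answer for this image
    typical_answer: str  # answer that would be correct for typical anatomy
    prediction: str      # the model's (normalized) answer


def accuracy(items: Iterable[Item]) -> float:
    """Exact-match accuracy; NaN for an empty subset."""
    items = list(items)
    if not items:
        return float("nan")
    return sum(it.prediction == it.ground_truth for it in items) / len(items)


def prior_gap_report(items: Iterable[Item]) -> dict[str, float]:
    """Compare typical vs. atypical accuracy and measure prior-consistent errors."""
    items = list(items)
    typical = [it for it in items if not it.is_atypical]
    atypical = [it for it in items if it.is_atypical]
    # Errors made on atypical images...
    errors = [it for it in atypical if it.prediction != it.ground_truth]
    # ...that reproduce the typical-anatomy answer (e.g. "5" for a six-fingered hand).
    prior_errors = [it for it in errors if it.prediction == it.typical_answer]
    acc_typ, acc_atyp = accuracy(typical), accuracy(atypical)
    return {
        "acc_typical": acc_typ,
        "acc_atypical": acc_atyp,
        "gap": acc_typ - acc_atyp,
        "prior_consistent_error_share": (
            len(prior_errors) / len(errors) if errors else 0.0
        ),
    }


# Toy, made-up data for illustration only (not results from the paper).
example = [
    Item(False, "5", "5", "5"),  # typical hand, answered correctly
    Item(False, "2", "2", "2"),  # two kidneys, answered correctly
    Item(True, "6", "5", "5"),   # six fingers, model answers the prior "5"
    Item(True, "1", "2", "2"),   # solitary kidney, model answers the prior "2"
    Item(True, "6", "5", "6"),   # six fingers, answered correctly
    Item(True, "1", "2", "3"),   # wrong, but not the typical-anatomy answer
]
report = prior_gap_report(example)
print(report)
```

An accuracy gap by itself does not show that learned priors cause the errors. The share of atypical-image errors that reproduce the typical-anatomy answer is a more targeted signal that priors are overriding visual evidence, though it still does not establish causation on its own.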

References

It remains unknown whether anatomical priors learned from typical anatomy cause systematic errors when models encounter atypical presentations.

6 Fingers, 1 Kidney: Natural Adversarial Medical Images Reveal Critical Weaknesses of Vision-Language Models (2512.04238 - Mayer et al., 3 Dec 2025) in Section 1 (Introduction)