Effect of embedding and prompt conditioning choices on Vendi Score-based autoevaluation
Investigate whether alternative image and multimodal embedding models, and alternative conditioning prompts, used to compute Vendi Score diversity lead to better agreement with human-annotated rankings of attribute-specific diversity across text-to-image models, and identify which embeddings and prompts outperform those already evaluated (Inception, ViT, DINOv2, CLIP, and PALI variants with attribute/object conditioning).
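As background for the question, the Vendi Score of a sample set is the exponential of the Shannon entropy of the eigenvalues of a normalized similarity kernel built from the samples' embeddings; the choice of embedding model determines that kernel. The sketch below is a minimal illustration, assuming a cosine-similarity kernel over precomputed embedding vectors (the specific kernel and embedding backbone are exactly the design choices the question asks about):

```python
import numpy as np

def vendi_score(embeddings: np.ndarray) -> float:
    """Vendi Score of a set of samples given their embedding vectors.

    Computed as exp(H(lambda)), where lambda are the eigenvalues of the
    normalized cosine-similarity kernel K / n and H is Shannon entropy.
    """
    # Normalize rows so K is a cosine-similarity kernel with K_ii = 1.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    n = x.shape[0]
    k = (x @ x.T) / n                   # normalized kernel, trace = 1
    eigvals = np.linalg.eigvalsh(k)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerical zeros
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))

# Sanity checks: n identical samples score 1.0; n mutually
# orthogonal samples score n (maximal "effective" diversity).
identical = np.ones((4, 8))
orthogonal = np.eye(4)
```

The score can be read as an effective number of distinct samples, ranging from 1 (all embeddings identical) to n (all embeddings orthogonal), which is why swapping the embedding model (Inception, DINOv2, CLIP, etc.) can change the induced diversity ranking.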
Sponsor
References
It is possible that better choices of models and conditioning prompts can lead to better results, but we leave this question open for future investigation.
— Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation
(2511.10547 - Albuquerque et al., 13 Nov 2025) in Section 3, Subsection “Ranking models with autoevaluation approaches” (label: sec:autoeval-ranking)