Automating VLM question design for adaptive negative prompting

Develop an automated method for selecting the question formulations posed to the Vision-Language Model within the VLM-Guided Adaptive Negative-Prompting framework, tailored to different semantic categories, to achieve optimal performance in creative image generation while preserving category validity.

Background

The paper proposes VLM-Guided Adaptive Negative-Prompting, an inference-time method that queries a Vision-Language Model (VLM) during diffusion denoising to identify the dominant concepts in intermediate outputs and convert them into negative prompts that steer generation away from conventional modes. The effectiveness of this guidance depends on the specific questions asked of the VLM, because the questions determine which visual attributes (e.g., object type, shape, design, material) are detected and subsequently discouraged.
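To make the role of the question concrete, the loop described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: `denoise_step` and `ask_vlm` are hypothetical callables standing in for a diffusion sampler step and a VLM query, and both the fixed query interval and the accumulation of answers into a running negative prompt are assumptions made for the example.

```python
from typing import Callable, List, Tuple


def adaptive_negative_prompts(
    num_steps: int,
    query_every: int,
    question: str,
    denoise_step: Callable[[object, int, str], object],  # hypothetical sampler step
    ask_vlm: Callable[[object, str], str],                # hypothetical VLM query
    initial_latent: object,
) -> Tuple[object, List[str]]:
    """Run a denoising loop that periodically asks a VLM what it sees.

    Each VLM answer is normalized and, if new, added to the list of
    negative concepts passed to the following denoising steps
    (assumption: answers accumulate rather than replace each other).
    """
    negatives: List[str] = []
    latent = initial_latent
    for t in range(num_steps):
        if t % query_every == 0:
            # The question decides which attribute the VLM reports
            # (object type, shape, material, ...).
            answer = ask_vlm(latent, question).strip().lower()
            if answer and answer not in negatives:
                negatives.append(answer)
        # Steer away from every concept detected so far.
        latent = denoise_step(latent, t, ", ".join(negatives))
    return latent, negatives
```

In this sketch the only thing that changes between an object-focused and a style-focused run is the `question` string, which is why its choice has such a direct effect on what gets discouraged.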

Empirical analyses in the paper show that different question formulations produce distinct creative outcomes and that object-focused questions tend to work better for “new type of” prompts, whereas style- or attribute-focused questions better drive aesthetic novelty. However, selecting these questions currently requires manual design choices, and the authors identify the need to automate this selection across semantic categories to improve robustness and practicality.
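The manual choice described above amounts to a hand-written rule like the one below. This is a hypothetical baseline for illustration, not a method from the paper: the question strings, the `QUESTION_BANK` name, and the keyword test on "new type of" are all assumptions. An automated method would replace this rule with something learned or searched per semantic category.

```python
# Hypothetical question formulations; the paper's exact wording may differ.
QUESTION_BANK = {
    "object": "What is the main object in this image?",
    "style": "What is the dominant style or material in this image?",
}


def select_question(prompt: str) -> str:
    """Pick a VLM question with a fixed keyword rule.

    Object-focused questions are used for "new type of" prompts and
    style/attribute-focused questions otherwise, mirroring the tendency
    reported in the text. The rule itself is an illustrative assumption.
    """
    key = "object" if "new type of" in prompt.lower() else "style"
    return QUESTION_BANK[key]
```

A rule of this kind is brittle across categories, which is the gap the open problem targets.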

References

Third, our approach requires careful question design for optimal performance; different question formulations work better for different semantic categories, and automating this selection remains an open challenge.

— VLM-Guided Adaptive Negative Prompting for Creative Generation (2510.10715 - Golan et al., 12 Oct 2025) in Conclusions
