Implicitly guiding LVLMs to accept harmful intent
Determine effective techniques for implicitly guiding Large Vision-Language Models (LVLMs) to accept the premise of answering questions related to harmful intent, thereby eliciting an initial response aligned with that intent rather than an overt refusal.
References
However, the question of how to implicitly guide LVLMs to accept the premise of answering harmful intent-related questions remains unresolved.
— Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
(Cui et al., 18 Nov 2024, arXiv:2411.11496), Section 3 (Our Approach: Safety Snowball Agent, SSA)