Papers

Topics

Authors

Recent

View all

Detailed Answer

Quick Answer

Concise responses based on abstracts only

Detailed Answer

Well-researched responses based on abstracts and relevant paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses

Gemini 2.5 Flash

Gemini 2.5 Flash 52 tok/s

Gemini 2.5 Pro 47 tok/s Pro

GPT-5 Medium 18 tok/s Pro

GPT-5 High 13 tok/s Pro

GPT-4o 100 tok/s Pro

Kimi K2 192 tok/s Pro

GPT OSS 120B 454 tok/s Pro

Claude Sonnet 4 37 tok/s Pro

2000 character limit reached

SafeCFG: Controlling Harmful Features with Dynamic Safe Guidance for Safe Generation (2412.16039v2)

Published 20 Dec 2024 in cs.CV

Abstract: Diffusion models (DMs) have demonstrated exceptional performance in text-to-image tasks, leading to their widespread use. With the introduction of classifier-free guidance (CFG), the quality of images generated by DMs is significantly improved. However, one can use DMs to generate more harmful images by maliciously guiding the image generation process through CFG. Existing safe alignment methods aim to mitigate the risk of generating harmful images but often reduce the quality of clean image generation. To address this issue, we propose SafeCFG to adaptively control harmful features with dynamic safe guidance by modulating the CFG generation process. It dynamically guides the CFG generation process based on the harmfulness of the prompts, inducing significant deviations only in harmful CFG generations, achieving high quality and safety generation. SafeCFG can simultaneously modulate different harmful CFG generation processes, so it could eliminate harmful elements while preserving high-quality generation. Additionally, SafeCFG provides the ability to detect image harmfulness, allowing unsupervised safe alignment on DMs without pre-defined clean or harmful labels. Experimental results show that images generated by SafeCFG achieve both high quality and safety, and safe DMs trained in our unsupervised manner also exhibit good safety performance.