
A General Framework for Inference-time Scaling and Steering of Diffusion Models

Published 12 Jan 2025 in cs.LG, cs.CL, and cs.CV | arXiv:2501.06848v3

Abstract: Diffusion models produce impressive results in modalities ranging from images and video to protein design and text. However, generating samples with user-specified properties remains a challenge. Recent research proposes fine-tuning models to maximize rewards that capture desired properties, but these methods require expensive training and are prone to mode collapse. In this work, we propose Feynman Kac (FK) steering, an inference-time framework for steering diffusion models with reward functions. FK steering works by sampling a system of multiple interacting diffusion processes, called particles, and resampling particles at intermediate steps based on scores computed using functions called potentials. Potentials are defined using rewards for intermediate states and are selected such that a high value indicates that the particle will yield a high-reward sample. We explore various choices of potentials, intermediate rewards, and samplers. We evaluate FK steering on text-to-image and text diffusion models. For steering text-to-image models with a human preference reward, we find that FK steering a 0.8B parameter model outperforms a 2.6B parameter fine-tuned model on prompt fidelity, with faster sampling and no training. For steering text diffusion models with rewards for text quality and specific text attributes, we find that FK steering generates lower perplexity, more linguistically acceptable outputs and enables gradient-free control of attributes like toxicity. Our results demonstrate that inference-time scaling and steering of diffusion models, even with off-the-shelf rewards, can provide significant sample quality gains and controllability benefits. Code is available at https://github.com/zacharyhorvitz/Fk-Diffusion-Steering .

Summary

  • The paper introduces the Feynman-Kac (FK) diffusion steering framework, a general inference-time method to enhance diffusion model controllability and sample quality without requiring fine-tuning.
  • This framework guides the diffusion process by sampling multiple particle trajectories, scoring them using potentials derived from intermediate rewards, and resampling to favor desired outcomes.
  • Empirical results demonstrate FK steering's effectiveness, enabling smaller models to outperform larger fine-tuned ones in prompt fidelity and quality, and facilitating control with non-differentiable rewards.

The paper introduces Feynman-Kac (FK) diffusion steering, a novel inference-time framework designed to enhance the controllability and sample quality of diffusion models across various modalities. The core challenge addressed is generating samples with specific user-defined properties without incurring the high computational costs and inflexibility associated with fine-tuning.

The FK steering framework leverages the concept of interacting particle systems to approximate sampling from a tilted distribution, effectively guiding the diffusion process towards desired outcomes by incorporating reward functions. The method involves:

  • Sampling multiple diffusion processes, referred to as particles.
  • Scoring these particles at intermediate steps using potential functions derived from intermediate rewards.
  • Resampling particles based on their potential scores to amplify promising trajectories and eliminate less favorable ones.
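The three steps above can be sketched as a simple particle loop. This is a toy illustration, not the paper's implementation: `denoise_step` and `reward` are hypothetical stand-ins for a diffusion model's reverse step and an intermediate reward function, and the difference potential is used for scoring.

```python
import numpy as np

def fk_steer(denoise_step, reward, x_init, num_steps, lam=1.0, rng=None):
    """Minimal sketch of FK steering with a difference potential.

    `denoise_step(x, t)` and `reward(x)` are hypothetical stand-ins for the
    model's reverse-diffusion step and an intermediate reward function.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    particles = np.array(x_init, dtype=float)  # shape (k, dim)
    prev_r = reward(particles)
    for t in range(num_steps, 0, -1):
        # 1. advance every particle one reverse-diffusion step
        particles = denoise_step(particles, t)
        r = reward(particles)
        # 2. score with the difference potential G_t = exp(lam * (r_t - r_{t+1}))
        log_w = lam * (r - prev_r)
        w = np.exp(log_w - log_w.max())
        w = w / w.sum()
        # 3. resample particles in proportion to their potentials
        idx = rng.choice(len(particles), size=len(particles), p=w)
        particles, prev_r = particles[idx], r[idx]
    return particles
```

On a toy random-walk "model" with a reward that prefers samples near a target value, the resampling step steadily concentrates particles on high-reward trajectories.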

A key aspect of FK steering lies in the strategic selection of potential functions, intermediate rewards, and samplers to optimize performance for specific tasks. The framework accommodates both continuous and discrete state-space models and is compatible with generic reward functions, whether differentiable or not.

The authors explore several instantiations of intermediate rewards and potentials, demonstrating empirically that these choices can significantly impact performance. These include:

  • Difference Potential: $G_t(x_t, x_{t+1}, c) = \exp(\lambda\,(r_\phi(x_t, c) - r_\phi(x_{t+1}, c)))$, where $r_\phi$ denotes the intermediate reward.
  • Max Potential: $G_t(x_T, \dots, x_t, c) = \exp\big(\lambda \max_{s=t}^{T} r_\phi(x_s, c)\big)$.
  • Sum Potential: $G_t(x_T, \dots, x_t, c) = \exp\big(\lambda \sum_{s=t}^{T} r_\phi(x_s, c)\big)$.
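As a quick numerical sketch of the three potentials above (variable names are illustrative; `lam` is the temperature $\lambda$ and the arrays hold $r_\phi$ evaluated along one particle's trajectory):

```python
import numpy as np

def difference_potential(r_t, r_prev, lam):
    # G_t = exp(lam * (r(x_t, c) - r(x_{t+1}, c)))
    return np.exp(lam * (r_t - r_prev))

def max_potential(rewards_so_far, lam):
    # G_t = exp(lam * max_{s >= t} r(x_s, c)) over rewards seen so far
    return np.exp(lam * np.max(rewards_so_far))

def sum_potential(rewards_so_far, lam):
    # G_t = exp(lam * sum_{s >= t} r(x_s, c))
    return np.exp(lam * np.sum(rewards_so_far))
```

Note that difference potentials telescope: their product over a trajectory depends only on the net change in reward, not on the intermediate values.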

The paper highlights that existing techniques like Twisted Diffusion Sampling (TDS) and Soft Value-Based Decoding in Diffusion Models (SVDD) can be viewed as specific instances of FK interacting particle systems, further emphasizing the generality of the proposed framework.

The authors conduct experiments on text-to-image and text diffusion models to validate the efficacy of FK steering. A notable finding is that a smaller (0.8B parameter) model with FK steering outperforms a larger (2.6B parameter) fine-tuned model on prompt fidelity, while requiring less computation. Additionally, FK steering demonstrates superior control of text attributes: using toxicity as a stress test of controllability, it achieves higher target-attribute rates than gradient-based guidance and best-of-n sampling.

Key empirical results from the paper include:

  • In text-to-image generation, FK steering with just $k=4$ particles outperforms fine-tuning on prompt fidelity and aesthetic quality, as measured by the GenEval benchmark and human preference scores.
  • FK steering enables smaller models (0.8B parameters) to surpass larger models (2.6B parameters) on prompt fidelity while using fewer floating point operations (FLOPs).
  • In text diffusion models, FK steering generates lower-perplexity, more linguistically acceptable outputs and facilitates gradient-free control of attributes like toxicity.
  • Without gradient guidance, FK steering can increase the toxicity rate of a text diffusion model from 0.3% to 69.7% with $k=8$ particles.

The methodology introduces the concept of interval resampling, where the potentials $G_t$ are chosen so that resampling occurs only at a few steps, encouraging exploration and reducing computational demands. The paper also discusses several choices of intermediate reward $r_\phi(x_t, c)$, including the reward at the expected $x_0$, many-sample $r_\phi$, and learned $r_\phi$. The learned $r_\phi$ are trained with the objective $\arg\min_{\phi} \mathbb{E}_{t \sim U[0,T]}\, \mathbb{E}_{(x_0, c),\, q(x_t \mid x_0)} \left\| a_\phi(x_t, c) - \exp(r(x_0, c)) \right\|_2^2$.
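Interval resampling can be sketched as a small helper that only triggers a resampling step at fixed intervals, accumulating log-potentials in between. The function name and signature are illustrative assumptions, not the paper's code:

```python
import numpy as np

def maybe_resample(particles, log_w, t, interval, rng):
    """Interval-resampling sketch: resample only every `interval` steps.

    Between resampling steps, per-particle log-potentials accumulate in
    `log_w`; after a resampling step the weights are reset.
    """
    if t % interval != 0:
        return particles, log_w  # no resampling yet; keep accumulating
    w = np.exp(log_w - log_w.max())
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], np.zeros_like(log_w)
```

Skipping most resampling steps lets low-scoring particles survive long enough to explore, at a fraction of the resampling cost.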

The framework is applicable to both discrete-time and continuous-time diffusion models, with the latter involving the use of numerical methods like Euler-Maruyama for sampling.
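A generic Euler-Maruyama step, as referenced for continuous-time sampling, can be sketched as follows; the actual reverse-time SDE drift and diffusion terms come from the trained model, so the `drift` and `diffusion` callables here are placeholders:

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, t0, t1, n_steps, rng):
    """Euler-Maruyama integration of dx = drift(x, t) dt + diffusion(t) dW."""
    dt = (t1 - t0) / n_steps
    x = np.array(x0, dtype=float)
    t = t0
    for _ in range(n_steps):
        # Brownian increment with standard deviation sqrt(|dt|)
        dw = rng.normal(0.0, np.sqrt(abs(dt)), size=x.shape)
        x = x + drift(x, t) * dt + diffusion(t) * dw
        t += dt
    return x
```

With the diffusion term set to zero, the scheme reduces to plain Euler integration of the drift, which gives a quick sanity check against a known ODE solution.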

The paper positions FK steering within the context of existing literature on controllable generation, contrasting it with fine-tuning approaches and inference-time steering methods like universal guidance. A key advantage of FK steering is its ability to handle non-differentiable rewards and discrete state spaces, overcoming limitations of gradient-based techniques.

The authors acknowledge that FK steering relies on the availability of strong reward functions, underscoring the importance of continued research in automated evaluation and reward modeling. They also note the potential for varying the number of particles dynamically during inference to optimize performance in applications like protein design.
