
Path Planning for Masked Diffusion Model Sampling (2502.03540v4)

Published 5 Feb 2025 in cs.LG and cs.AI

Abstract: Any order generation of discrete data using masked diffusion models (MDMs) offers a compelling alternative to traditional autoregressive models, especially in domains that lack a natural causal ordering of data. However, current popular MDMs depart from their successful continuous diffusion model counterparts with simplified masked inference wherein unmasked tokens cannot be iteratively refined -- even if there is a mistake. In this paper, we extract the full power of MDMs by introducing a novel inference sampling strategy termed Path Planning (P2) that decomposes each generation step into two sub-stages: planning and denoising. Under P2, the planner at every step selects appropriate tokens that are marked to be updated, which can then be sampled using the denoiser. We demonstrate that P2 generalizes all existing sampling strategies for MDMs and critically enhances generative quality through the new capability of refining and updating existing unmasked tokens. We theoretically prove that P2 establishes a (new) expanded evidence lower bound (ELBO) on the log marginal likelihood of data. We instantiate P2 with a family of planners including: 1.) Self-Planning, 2.) BERT-Planning, and 3.) Trained-Planning with a learned planner leading to SOTA generative performance for MDMs on a suite of domains. Specifically, solely using P2 inference, we observe relative improvements of 22% in protein sequence foldability, 8% in RNA sequence pLDDT, 4% in math reasoning, 68% in story generation (ROUGE score), and 33% in code generation for the challenging pass@1 metric.

Summary

  • The paper introduces Path Planning (P2), a framework that uses an expanded ELBO to strategically plan token unmasking order in Masked Diffusion Models for improved generative quality.
  • Experiments show Path Planning (P2) improves protein and language generation tasks, outperforming existing methods and competing with larger autoregressive models.
  • The research challenges the uniform unmasking assumption, showing planning is essential for practical MDMs and offering a more resource-effective alternative to large autoregressive models.

Path Planning for Masked Diffusion Model Sampling

The paper "Path Planning for Masked Diffusion Model Sampling" investigates the impact of token unmasking order in masked diffusion models (MDMs) on generative quality. Through the introduction of an expanded evidence lower bound (ELBO), the authors propose a planner responsible for the token unmasking sequence during MDM inference. This expanded ELBO forms the foundation for Path Planning (P2), a novel sampling framework that aims to enhance generative performance by strategically selecting unmasking orders. P2 is designed to integrate pre-trained models like BERT or the denoiser itself to guide unmasking decisions across a wide range of domains including language generation and biological sequence generation.

The authors begin by revisiting the standard MDM framework, highlighting recent advances in discrete diffusion models that use absorbing-state diffusion and simplified training objectives. While prior work predominantly focuses on optimizing training algorithms, this paper shifts attention to inference, particularly the order in which tokens are revealed during the reverse process. The authors argue that this order, typically chosen uniformly at random, is suboptimal when the denoiser is imperfect, a limitation inherent to any trained MDM owing to estimation errors on real-world data distributions.

The core contribution of the paper is the theoretical expansion of the ELBO to include two new terms involving the planner. The planner acts as a mechanism for selecting which tokens to unmask or remask during inference. Notably, the paper establishes that for an optimal denoiser, a uniform unmasking order is sufficient; for practical implementations with imperfect denoisers, however, a planner dictating a non-uniform strategy can yield superior generative outcomes. Grounded in this theoretical framework, the proposed Path Planning (P2) constructs a planner from either a pre-trained model like BERT or the denoiser itself and validates its efficacy across diverse experiments.
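
As a rough illustration of how such planners could be scored, the sketch below shows two plausible instantiations compatible with the loop above: self-planning, which reuses the denoiser's own confidence, and BERT-planning, which scores committed tokens with an external masked language model. The confidence-based scoring rule is an assumption made for illustration; the paper defines the exact criteria.

```python
import torch

def self_planning_score(x, denoiser_probs, mask_id):
    """Self-planning (sketch): positions where the denoiser assigns low
    probability to the current token are candidates for (re)sampling."""
    conf = denoiser_probs.gather(-1, x.unsqueeze(-1)).squeeze(-1)  # (B, L)
    scores = 1.0 - conf
    # Masked positions carry no committed token yet, so always keep them eligible.
    return torch.where(x == mask_id, torch.ones_like(scores), scores)

def bert_planning_score(x, bert, mask_id):
    """BERT-planning (sketch): an external masked LM flags committed tokens
    that look unlikely in context for remasking and refinement."""
    with torch.no_grad():
        bert_probs = bert(x).softmax(dim=-1)                       # (B, L, V)
    conf = bert_probs.gather(-1, x.unsqueeze(-1)).squeeze(-1)
    scores = 1.0 - conf
    return torch.where(x == mask_id, torch.ones_like(scores), scores)
```

In practice, bert_planning_score would be wrapped to match the planner interface assumed in the loop above, for example as lambda x, probs, mask_id: bert_planning_score(x, bert, mask_id).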

Experimental results provide compelling evidence for the effectiveness of P2. In protein sequence generation, P2 paired with a 650M-parameter DPLM outperforms existing state-of-the-art models on several folding metrics, demonstrating that P2 maintains high structural quality while achieving high token diversity. The paper also reports gains in language generation: P2-equipped MDMs surpass much larger autoregressive models on math reasoning and achieve notable improvements over prevailing benchmarks on complex tasks such as story infilling and code generation.

The implications of this research are significant, both theoretically and practically. On the theoretical front, it challenges the assumption that uniform unmasking is an optimal strategy and emphasizes the necessity of planning in diffusion-based generative models. Practically, by demonstrating that MDMs equipped with P2 can compete with or surpass larger autoregressive models, it offers a more resource-efficient approach to tasks traditionally dominated by AR models.

The paper also outlines potential future developments. It highlights the potential of even lighter planners and encourages further exploration of trained specialized planners, or holistic planner-denoiser models, for broader generative tasks, positioning P2 as a principled direction for advancing MDM generative quality.

Overall, the paper marks a significant step forward in understanding inference mechanisms within MDMs, introducing the planner not merely as an accessory but as a fundamental element guiding generative pathways, and promising a new frontier for discrete diffusion models in sequence generation.