Non-Markovian Discrete Diffusion with Causal Language Models (2502.09767v2)

Published 13 Feb 2025 in cs.LG, cs.AI, and cs.CL

Abstract: Discrete diffusion models offer a flexible, controllable approach to structured sequence generation, yet they still lag behind causal LLMs in expressive power. A key limitation lies in their reliance on the Markovian assumption, which restricts each step to condition only on the current state, leading to potential uncorrectable error accumulation. In this paper, we introduce CaDDi, a discrete diffusion model that conditions on the entire generative trajectory, thereby lifting the Markov constraint and allowing the model to revisit and improve past states. By unifying sequential (causal) and temporal (diffusion) reasoning in a single non-Markovian transformer, CaDDi also treats standard causal LLMs as a special case and permits the direct reuse of pretrained LLM weights with no architectural changes. Empirically, CaDDi outperforms state-of-the-art discrete diffusion baselines on natural-language benchmarks, substantially narrowing the remaining gap to large autoregressive transformers.

Authors (9)
  1. Yangtian Zhang
  2. Sizhuang He
  3. Daniel Levine
  4. Lawrence Zhao
  5. David Zhang
  6. Syed A Rizvi
  7. Emanuele Zappala
  8. Rex Ying
  9. David van Dijk

Summary

  • The paper proposes CaDDi, a model that combines non-Markovian discrete diffusion with causal language models for expressive and controllable sequence generation.
  • It utilizes a hybrid diffusion kernel and 2D rotary positional encoding to efficiently capture dependencies and mitigate error accumulation.
  • Empirical results demonstrate superior performance in both biological and text generation tasks, achieving higher pLDDT scores and improved coherence.

Non-Markovian Discrete Diffusion with Causal Language Models

The paper introduces CaDDi, a causal discrete diffusion model that integrates non-Markovian diffusion processes with causal language models. By combining the controllability of discrete diffusion with the expressive power of autoregressive modeling, the approach improves both the quality of and the control over generated sequences.

Background and Motivation

Autoregressive transformers have long been the gold standard in sequence modeling, achieving impressive results across domains from NLP to biological sequence prediction. However, their left-to-right decoding limits flexibility in tasks requiring bidirectional context or partially specified generation, such as text infilling. Discrete diffusion models natively support such scenarios through iterative, order-agnostic denoising, but they typically trail autoregressive methods in generation quality.

This paper aims to merge these two paradigms, retaining the strengths of both. The proposed model, CaDDi, operates within a non-Markovian diffusion framework, allowing for more expressive and controllable generation by leveraging information from the entire generative trajectory.

Methodology

CaDDi extends the non-Markovian diffusion process to discrete domains, allowing each denoising step to condition on information from all previous states rather than only the current one. This mitigates error accumulation and aligns the backward process naturally with causal language modeling.
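
To make the idea concrete, the sketch below frames the reverse (denoising) process as causal sequence modeling: starting from pure noise, each cleaner state is predicted conditioned on the entire concatenated trajectory generated so far. The `toy_causal_lm` stand-in, the greedy decoding, and all sizes are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
L, V, T = 16, 101, 8  # toy sequence length, vocab size, diffusion steps

def toy_causal_lm(tokens):
    """Stand-in for a decoder-only transformer: returns per-position
    logits over the vocabulary (here just random numbers)."""
    return rng.normal(size=(len(tokens), V))

# Reverse process: start from pure noise x_T, then repeatedly predict a
# cleaner state conditioned on the ENTIRE trajectory generated so far,
# concatenated into one causal context (the non-Markovian step).
states = [rng.integers(0, V, size=L)]            # x_T
for _ in range(T):                               # T denoising steps
    context = np.concatenate(states)             # x_T, ..., x_t
    logits = toy_causal_lm(context)
    states.append(logits[-L:].argmax(axis=-1))   # greedy x_{t-1}
x0_hat = states[-1]                              # approximate clean sample
```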

The authors design a non-Markovian forward trajectory by independently corrupting the original data at each timestep, rather than relying on a single latent state. This introduces a more informative sequence of noisy states, which better retains intermediate information across timesteps. Additionally, a hybrid diffusion kernel is used, mixing absorbing and uniform kernels to enhance the diversity and utility of the noise trajectory.
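
A minimal sketch of such a forward process is given below, assuming a simple linear keep-probability schedule, a fixed mixing weight between the absorbing and uniform kernels, and an extra `[MASK]` token; these are assumptions for illustration rather than the paper's exact schedule.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 100
MASK_ID = VOCAB_SIZE          # extra [MASK] token for the absorbing kernel

def corrupt(x0, t, T, mix=0.8):
    """Corrupt the clean sequence x0 independently at timestep t.

    Each token survives with probability alpha_t (a linear schedule
    here); otherwise it becomes [MASK] with probability `mix`
    (absorbing kernel) or a uniformly random token with probability
    1 - mix (uniform kernel)."""
    alpha_t = 1.0 - t / T                          # keep-probability decays in t
    keep = rng.random(x0.shape) < alpha_t
    absorb = rng.random(x0.shape) < mix
    noise = np.where(absorb, MASK_ID, rng.integers(0, VOCAB_SIZE, x0.shape))
    return np.where(keep, x0, noise)

x0 = rng.integers(0, VOCAB_SIZE, size=16)          # toy clean sequence
T = 8
# Non-Markovian trajectory: every x_t is drawn from x0 directly rather
# than from x_{t-1}, so each state keeps independent evidence about x0.
trajectory = [corrupt(x0, t, T) for t in range(1, T + 1)]
```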

The CaDDi model unifies sequential and temporal modeling by constructing training trajectories that account for both token positions and diffusion timesteps. By incorporating a 2D rotary positional encoding, the model efficiently captures dependencies across these dimensions, maintaining backward compatibility with traditional causal LLM architectures.
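
One plausible way to realize such a 2D rotary encoding, sketched below, is to split each feature vector in half and rotate one half by token position and the other by diffusion timestep. This construction is an assumption for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Standard 1D rotary encoding: rotate consecutive feature pairs
    of x by angles proportional to `pos`. x: (..., dim), dim even."""
    dim = x.shape[-1]
    freqs = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    ang = pos[..., None] * freqs                    # (..., dim/2)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(x, token_pos, timestep):
    """2D variant: rotate one half of the features by token position
    and the other half by diffusion timestep."""
    half = x.shape[-1] // 2
    return np.concatenate(
        [rope(x[..., :half], token_pos), rope(x[..., half:], timestep)],
        axis=-1,
    )

# Toy query tensor for T=4 timesteps of an L=6-token sequence, dim=16.
T, L, dim = 4, 6, 16
q = np.random.default_rng(1).normal(size=(T, L, dim))
tok = np.broadcast_to(np.arange(L), (T, L))          # position within sequence
ts = np.broadcast_to(np.arange(T)[:, None], (T, L))  # diffusion timestep
q_rot = rope_2d(q, tok, ts)                          # same shape as q
```

Rotating disjoint feature halves keeps each half a standard 1D rotary encoding, which is one way to expose the extra timestep axis to a decoder-only transformer without architectural changes.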

Empirical Evaluation

The empirical results demonstrate CaDDi's advantage over existing discrete diffusion models in generating high-quality sequences. In biological sequence generation tasks, CaDDi achieves higher pLDDT scores, indicating more structurally plausible proteins, and outperforms baselines on TM-score, RMSD, and H-prob, showing strong homology to known protein structures.

In text generation, CaDDi surpasses other models in guided generative perplexity across multiple LLMs, indicating improved coherence in generated text. Additionally, CaDDi achieves comparable self-BLEU scores, reflecting the diversity of its outputs.

Implications and Future Directions

CaDDi represents a significant step forward in the integration of diffusion processes with causal LLMs. By enabling a non-Markovian approach to discrete diffusion, the model achieves robust and flexible sequence generation, applicable to both language and biological data. The ability to adapt pretrained LLMs for discrete diffusion without architectural changes further underscores CaDDi's versatility and potential for broad adoption.

This paper opens several avenues for future research. Further exploration into optimizing semi-speculative decoding techniques could enhance inference efficiency, and expanding the framework to other modalities or incorporating additional conditioning signals may yield even richer generative capabilities. The integration of such frameworks into real-world applications, especially in fields like protein engineering and complex real-time systems, presents promising opportunities for advancement.