- The paper proposes CaDDi, a model that combines non-Markovian discrete diffusion with causal language models for expressive and controllable sequence generation.
- It utilizes a hybrid diffusion kernel and 2D rotary positional encoding to efficiently capture dependencies and mitigate error accumulation.
- Empirical results demonstrate superior performance in both biological and text generation tasks, achieving higher pLDDT scores and improved coherence.
Non-Markovian Discrete Diffusion with Causal LLMs
The paper introduces CaDDi, a causal discrete diffusion model that integrates a non-Markovian diffusion process with causal language models (LLMs). The authors combine the advantages of discrete diffusion models with traditional causal LLMs to improve expressiveness and control over generated sequences.
Background and Motivation
Autoregressive transformers have long been the gold standard in sequence modeling, achieving impressive results across domains from NLP to biological sequence prediction. However, their left-to-right decoding limits flexibility in tasks that require bidirectional or partially specified generation, such as text infilling. Discrete diffusion models, on the other hand, naturally support such scenarios, but often fall short of autoregressive methods in generation quality.
This paper aims to merge these two paradigms, retaining the strengths of both. The proposed model, CaDDi, operates within a non-Markovian diffusion framework, allowing for more expressive and controllable generation by leveraging information from the entire generative trajectory.
Methodology
CaDDi extends the non-Markovian diffusion process to discrete domains, allowing each denoising step to condition on information from all previous states rather than only the most recent one. This mitigates error accumulation and aligns the backward process naturally with causal language modeling.
The authors design a non-Markovian forward trajectory by independently corrupting the original data at each timestep, rather than relying on a single latent state. This introduces a more informative sequence of noisy states, which better retains intermediate information across timesteps. Additionally, a hybrid diffusion kernel is used, mixing absorbing and uniform kernels to enhance the diversity and utility of the noise trajectory.
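The forward process described above can be sketched in a few lines. In this illustrative snippet (function names, the linear noise schedule, and the mixing probability `p_uniform` are assumptions, not taken from the paper), each noisy state is corrupted independently from the clean sequence `x0`, and corrupted positions are resolved with a hybrid of the absorbing (mask) and uniform kernels:

```python
import numpy as np

def corrupt(x0, t, num_steps, vocab_size, mask_id, p_uniform=0.1, rng=None):
    """Draw x_t by corrupting the clean sequence x0 directly (non-Markovian:
    x_t does not depend on x_{t-1}). Hybrid kernel: corrupted positions are
    either replaced by the mask token (absorbing) or resampled uniformly."""
    rng = rng or np.random.default_rng()
    corrupt_prob = t / num_steps  # illustrative linear noise schedule
    xt = x0.copy()
    hit = rng.random(len(x0)) < corrupt_prob          # positions to corrupt
    uniform = hit & (rng.random(len(x0)) < p_uniform) # uniform-kernel subset
    absorb = hit & ~uniform                           # absorbing-kernel subset
    xt[absorb] = mask_id
    xt[uniform] = rng.integers(0, vocab_size, size=len(x0))[uniform]
    return xt

# A non-Markovian trajectory: every x_t is an independent corruption of x0,
# so intermediate states retain information that a single chain would lose.
x0 = np.array([3, 7, 1, 4, 2])
trajectory = [corrupt(x0, t, num_steps=10, vocab_size=20, mask_id=20)
              for t in range(1, 11)]
```

Because each `x_t` is drawn from `x0` rather than from `x_{t-1}`, the denoiser can be trained on the full trajectory, which is what lets the backward pass consume multiple past states.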
The CaDDi model unifies sequential and temporal modeling by constructing training trajectories that account for both token positions and diffusion timesteps. By incorporating a 2D rotary positional encoding, the model efficiently captures dependencies across these dimensions, maintaining backward compatibility with traditional causal LLM architectures.
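One simple way to realize such a 2D rotary encoding is to split each feature vector in half and rotate one half by the token position and the other by the diffusion timestep. The sketch below illustrates this idea only; the paper's exact parameterization (axis split, frequency base) may differ:

```python
import numpy as np

def rope_angles(pos, dim, base=10000.0):
    """Standard rotary-embedding angles for a scalar position (dim even)."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return pos * inv_freq  # shape (dim // 2,)

def apply_rope(x, angles):
    """Rotate consecutive feature pairs of x by the given angles."""
    x1, x2 = x[0::2], x[1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(x, token_pos, timestep):
    """2D rotary encoding sketch: first half of the features encodes the
    token position, second half the diffusion timestep."""
    d = len(x) // 2
    out = x.copy()
    out[:d] = apply_rope(x[:d], rope_angles(token_pos, d))
    out[d:] = apply_rope(x[d:], rope_angles(timestep, d))
    return out
```

Since rotations are applied per axis, attention scores become sensitive to relative offsets along both the sequence dimension and the diffusion-time dimension, while any standard causal transformer can consume the result unchanged.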
Empirical Evaluation
The empirical results demonstrate CaDDi's superiority over existing discrete diffusion models in generating high-quality sequences. In biological sequence generation, CaDDi achieves higher pLDDT scores, indicating better structural feasibility, and outperforms baselines on TM-score, RMSD, and H-prob, indicating strong homology to known protein structures.
In text generation, CaDDi achieves better guided generative perplexity than competing models, as measured by multiple evaluator LLMs, indicating improved coherence of the generated text. It also achieves comparable self-BLEU scores, reflecting that this coherence does not come at the cost of output diversity.
Implications and Future Directions
CaDDi represents a significant step forward in the integration of diffusion processes with causal LLMs. By enabling a non-Markovian approach to discrete diffusion, the model achieves robust and flexible sequence generation, applicable to both language and biological data. The ability to adapt pretrained LLMs for discrete diffusion without architectural changes further underscores CaDDi's versatility and potential for broad adoption.
This paper opens several avenues for future research. Further exploration into optimizing semi-speculative decoding techniques could enhance inference efficiency, and expanding the framework to other modalities or incorporating additional conditioning signals may yield even richer generative capabilities. The integration of such frameworks into real-world applications, especially in fields like protein engineering and complex real-time systems, presents promising opportunities for advancement.