- The paper presents a novel planner-executor framework that integrates DDLMs for planning with ARMs for execution.
- It demonstrates that latent-space collaboration significantly boosts accuracy and reduces token usage on reasoning benchmarks.
- Diagnostic analyses isolate planner and executor errors, guiding future improvements in hybrid AI architectures.
Summary of "Planner and Executor: Collaboration between Discrete Diffusion and Autoregressive Models in Reasoning"
The paper "Planner and Executor: Collaboration between Discrete Diffusion and Autoregressive Models in Reasoning" explores the potential of coupling Discrete Diffusion LLMs (DDLMs) with Autoregressive Models (ARMs) to enhance reasoning tasks. The investigation centers on how these models can complement each other's weaknesses through a planner-executor framework, and compares text-space and latent-space collaboration channels.
Introduction to Hybrid Architectures
Recent advances in reasoning have been driven largely by autoregressive models (ARMs), which excel at producing coherent, human-readable outputs. However, ARMs are computationally expensive because they generate long chains of thought one token at a time. Discrete diffusion LLMs (DDLMs), which produce output in a fixed number of parallel denoising steps, offer a compelling alternative, especially for complex reasoning and planning tasks. This paper proposes integrating the two paradigms, leveraging DDLMs' planning capabilities while relying on ARMs' execution strength.
Two primary modes of collaboration are analyzed: text-space collaboration, where the planner generates an explicit textual plan for the executor, and latent-space collaboration, where a learned projection passes the plan to the executor in latent form. Initial findings show that latent-space collaboration substantially improves accuracy while reducing token consumption compared with text-space collaboration and standalone models.
Methodology
Planner-Executor Framework
In this framework, the planner (typically a DDLM) structures the intermediary reasoning steps, while the executor (an ARM) produces the final answer. The interaction between planner and executor can occur via two channels (see the sketch after this list):
- Text-space collaboration: the planner emits an explicit textual plan that is prepended to the executor's prompt.
- Latent-space collaboration: a learned projection maps the planner's plan representation into the executor's input space, so no intermediate text is generated.
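To make the two channels concrete, the sketch below illustrates how a plan might be handed from planner to executor in each mode. It is a minimal PyTorch-style illustration, not the authors' implementation: `LatentBridge`, `generate_plan`, `encode_plan`, and `generate_from_embeddings` are hypothetical wrappers standing in for the actual DDLM and ARM interfaces.

```python
import torch
import torch.nn as nn

class LatentBridge(nn.Module):
    """Learned projection from planner hidden states to executor embedding space.

    Illustrative only: the paper describes a learned projection for latent-space
    collaboration, but the exact architecture is not reproduced here.
    """
    def __init__(self, d_plan: int, d_exec: int):
        super().__init__()
        self.proj = nn.Linear(d_plan, d_exec)

    def forward(self, planner_hidden: torch.Tensor) -> torch.Tensor:
        # (batch, n_plan_tokens, d_plan) -> (batch, n_plan_tokens, d_exec)
        return self.proj(planner_hidden)


def text_space_step(planner, executor, question: str) -> str:
    """Text-space channel: the plan is explicit, human-readable text."""
    # `generate_plan` / `generate` are hypothetical wrappers around the DDLM and ARM.
    plan_text = planner.generate_plan(question)
    prompt = f"Plan:\n{plan_text}\n\nQuestion: {question}\nAnswer:"
    return executor.generate(prompt)


def latent_space_step(planner, executor, bridge: LatentBridge, question: str) -> str:
    """Latent-space channel: the plan is passed as projected hidden states."""
    # `encode_plan` / `generate_from_embeddings` are likewise hypothetical wrappers;
    # the projected plan acts as a soft prefix for the executor.
    plan_hidden = planner.encode_plan(question)   # (1, n_plan_tokens, d_plan)
    soft_plan = bridge(plan_hidden)
    return executor.generate_from_embeddings(soft_plan, question)
```

In the latent variant, the executor never sees a textual plan; it conditions on a short sequence of projected vectors, which is where the token savings reported in the paper come from.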
Experimental Setup
Models and Benchmarks
The paper employs two diffusion models, LLaDA-8B-Instruct and Dream-v0-Instruct-7B, pairing them with ARMs such as Qwen2.5-7B-Instruct and Llama-3.1-8B-Instruct. The combinations are evaluated on reasoning benchmarks including ARC, MMLU, AIME 2024, and DART.
Diagnostic Analysis
The diagnostic component isolates failure modes in planner-executor interactions. Setup X and Setup Y determine whether a failure originates from a planning error or from an execution limitation, respectively (Figure 2). This analysis clarifies the relative effectiveness of text-space versus latent-space collaboration, with latent setups showing fewer planner-related errors.
Figure 2: Diagnostic configurations for attributing errors to planner or executor. Setup X tests whether failures stem from the planner: if replacing the diffusion planner (DDLM) with an autoregressive planner (ARM) fixes the output, the error is attributed to the DDLM. Setup Y tests executor reliability: if a diffusion executor succeeds where an ARM executor fails, the limitation lies in the executor.
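One way to read the figure is as a simple attribution rule over the three configurations. The sketch below is an interpretation of that diagnostic logic, with flag names that are illustrative rather than taken from the paper.

```python
def attribute_failure(original_ok: bool, setup_x_ok: bool, setup_y_ok: bool) -> str:
    """Attribute a failed run to the planner or the executor, per Figure 2.

    Assumed flags (illustrative, not the paper's exact bookkeeping):
      original_ok - DDLM planner + ARM executor solved the problem
      setup_x_ok  - the run succeeds once the DDLM planner is replaced by an ARM planner
      setup_y_ok  - the run succeeds once the ARM executor is replaced by a DDLM executor
    """
    if original_ok:
        return "no error"
    if setup_x_ok:
        # Swapping in an ARM planner fixes the output -> the DDLM plan was at fault.
        return "planner error"
    if setup_y_ok:
        # A diffusion executor succeeds where the ARM executor failed -> execution limitation.
        return "executor error"
    return "unattributed (both components implicated)"
```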
Results
Text-Space vs Latent-Space Collaboration
The paper reveals substantial performance gains with latent-space collaboration. On benchmarks like DART and AIME, latent-space interaction achieves higher accuracy while using substantially fewer tokens, highlighting both the computational efficiency and effectiveness of latent exchange.
Figure 3: Benchmark comparison of text-space vs. latent-space collaboration. Accuracy of isolated models (LLaMA-3.2-3B ARM, LLaDA-8B DDLM) and collaborative configurations. In the latent setting, the DDLM (64-token planner) combined with the ARM executor consistently outperforms text-space collaboration on DART and AIME, while maintaining comparable performance on ARC and MMLU.
Latent-space setups surpass even strong reasoning models such as Qwen3 while using far fewer tokens. Configurations with 64 planner tokens offer the best balance of accuracy and token economy, avoiding the redundant reasoning steps observed with longer plans.
Discussion
The paper establishes discrete diffusion and autoregressive models as complementary agents in reasoning tasks. The successful implementation of latent-space collaboration shows promise for more efficient and accurate AI architectures. However, the trade-off between interpretability and latent efficiency presents an avenue for future exploration, particularly in enhancing interpretability without sacrificing performance. Furthermore, aligning planner and executor representations through joint training offers another potential advancement path.
Conclusion
Positioning DDLMs as planners and ARMs as executors, with plans exchanged in latent space, offers an efficient framework for complex reasoning tasks. The paper's insights lay the groundwork for future research on hybrid architectures that balance computational savings with robust performance, expanding the scope of reasoning in large models.