Dynamic-Size Reasoning Blocks

Updated 11 May 2026

Dynamic-size reasoning blocks are reasoning units whose number, size, and composition adapt to the input and to task demands.
They combine methods such as integer block prediction, depth-specialized routing, and semantic boundary detection to optimize resource usage and coherence.
Empirical evaluations report gains in throughput and interpretability, along with fine-grained control over inference, across several modular AI systems.

Dynamic-size reasoning blocks are a class of neural and symbolic reasoning architectures in which the number, size, and composition of intermediate reasoning units—termed "blocks"—are dynamically selected during inference or training based on task or input characteristics. Unlike rigid, fixed-depth computational pipelines or static chain-of-thought frameworks, dynamic-size block approaches enable adaptive allocation of reasoning steps, facilitating efficiency-accuracy trade-offs, coherence preservation, resource-aware inference, and interpretability. This paradigm has been instantiated across autoregressive LLMs, diffusion-based generation, modular Mixture-of-Experts transformers, and compositional message-passing systems.

1. Formal Definitions and Paradigm Variants

A reasoning block is formally defined in various settings as a coherent computational or cognitive unit, typically associated with a single step of intermediate reasoning, a tool invocation, or a semantically distinct sub-process. In autoregressive LLMs, as in "Think in Blocks" (Zhu et al., 21 Aug 2025), a reasoning block comprises a text segment demarcated syntactically (e.g., separated by special tokens), and the overall response is partitioned as:

$\hat{y} = (B_1, B_2, \dots, B_B, \text{answer}),$

where $B_j$ is the $j$-th reasoning block, and $B$ is an integer block count predicted dynamically per instance.
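
As a minimal sketch of this partition, the snippet below splits a response into its reasoning blocks and final answer. The delimiter strings `<block>`, `</block>`, and `<answer>` are hypothetical; the text only says that blocks are separated by special tokens, not which ones.

```python
import re
from typing import List, Tuple

# Hypothetical delimiters: the actual special tokens are not given in the text.
BLOCK_RE = re.compile(r"<block>(.*?)</block>", re.DOTALL)
ANSWER_TAG = "<answer>"


def split_response(response: str) -> Tuple[List[str], str]:
    """Return (blocks, answer) for a response shaped like (B_1, ..., B_B, answer)."""
    blocks = [b.strip() for b in BLOCK_RE.findall(response)]
    # Everything after the answer tag is treated as the final answer.
    _, _, answer = response.partition(ANSWER_TAG)
    return blocks, answer.strip()


example = "<block>Factor 12 as 3*4.</block><block>Add 3+4.</block><answer>7"
blocks, answer = split_response(example)
print(len(blocks), answer)
```

A response with no blocks, such as `"<answer>42"`, yields an empty block list, which corresponds to the direct-response case.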

In modular transformer architectures such as DS-MoE ("Dynamic Reasoning Chains through Depth-Specialized Mixture-of-Experts") (Roy et al., 24 Sep 2025), a reasoning block is an ordered sequence:

$C(E_{i_1}, E_{i_2}, \dots, E_{i_k}),$

with $E_{i_t}$ denoting expert modules specialized for different reasoning depths or types, allowing $k$ (the number of active blocks or depth) to vary dynamically per input.

Diffusion-based LLMs (dLLMs), as in "Break the Block" (Jiang et al., 4 May 2026), conventionally structure generation into fixed-size blocks $b_k = \{x_{(k-1)c+1}, \dots, x_{\min(kc,L)}\}$, where $c$ is the block size and $L$ the sequence length. The cited work instead introduces mechanisms that dynamically detect block boundaries at semantic thought units via monotonic entropy descent.
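
The fixed partition in the formula above can be written out directly. This sketch only enumerates the token positions in each block $b_k$, reading $c$ as the block size and $L$ as the sequence length; it does not implement the learned semantic boundaries, and the function name is illustrative.

```python
from typing import List


def fixed_blocks(seq_len: int, block_size: int) -> List[List[int]]:
    """Partition 1-based positions 1..L into blocks b_k of size c (last may be shorter)."""
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    num_blocks = -(-seq_len // block_size)  # ceil(L / c) using integer arithmetic
    return [
        list(range((k - 1) * block_size + 1, min(k * block_size, seq_len) + 1))
        for k in range(1, num_blocks + 1)
    ]


print(fixed_blocks(10, 4))
```

With $L = 10$ and $c = 4$, the last block holds only positions 9 and 10, which is the role of the $\min(kc, L)$ term.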

In the Flows framework (Josifoski et al., 2023), the block abstraction is generalized as an actor-like module:

$F = \left(S,\; M_\text{in},\; M_\text{out},\; \delta,\; s_0\right),$

where composition and nesting allow dynamic construction of reasoning pipelines, and the depth/size of block sequences adapt at runtime according to message routing and state updates.
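
One way to read the tuple is as a stateful message handler: $\delta$ maps the current state and an incoming message to a new state and an outgoing message, starting from $s_0$. The sketch below follows that assumed reading. The class and function names are illustrative and are not the API of the Flows library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

State = Dict[str, int]
Message = Dict[str, int]


@dataclass
class Flow:
    """Assumed reading of F = (S, M_in, M_out, delta, s_0)."""
    delta: Callable[[State, Message], Tuple[State, Message]]
    state: State  # initialised to s_0

    def send(self, msg: Message) -> Message:
        self.state, out = self.delta(self.state, msg)
        return out


def counter_delta(state: State, msg: Message) -> Tuple[State, Message]:
    # Toy transition: accumulate the incoming value and report the running total.
    total = state["total"] + msg["value"]
    return {"total": total}, {"total": total}


def run_until(flow: Flow, msg: Message, target: int, max_steps: int = 10) -> int:
    """Composite behaviour: call the inner flow until its output reaches `target`."""
    steps = 0
    while steps < max_steps:
        steps += 1
        if flow.send(msg)["total"] >= target:
            break
    return steps


steps = run_until(Flow(counter_delta, {"total": 0}), {"value": 3}, target=10)
print(steps)
```

The number of inner calls is not fixed in advance: it depends on the messages and the evolving state, which is the sense in which the chain length adapts at runtime.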

2. Mechanisms for Dynamic Block Control

Approaches to dynamic block sizing differ across model architectures:

Integer Reasoning Budget Prediction (Zhu et al., 21 Aug 2025): The model emits an explicit block count $B$ via a classification head, conditioned on the prompt. $B = 0$ corresponds to a direct response (no intermediate reasoning), while $B > 0$ structures the response as $B$ block segments, each demarcated by special tokens.
Depth-Specialized Routing (Roy et al., 24 Sep 2025): A router computes a gate value for each expert block as a function of input features and selects a top-$k$ subset of experts for the input. Meta-cognitive modules can terminate computation early when sufficient confidence is reached, supporting both dynamic depth and early exit.
Semantic Block Boundary Learning (Jiang et al., 4 May 2026): A learned "block-end" token demarcates semantic steps within block-wise generation; the model adapts the number and length of blocks by inserting these tokens dynamically during diffusion.
Recursive Message-Passing Flows (Josifoski et al., 2023): Each Flow can instantiate arbitrary numbers of sub-Flows, or iterate until task-specific conditions are met (e.g., until tests pass), producing variable-length, dynamically structured reasoning chains.
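
The routing-with-early-exit idea can be illustrated with a small, framework-free sketch. It assumes softmax gates over router scores, top-$k$ selection, execution of the selected experts in ascending index order, and a scalar confidence function with a threshold. These are illustrative choices, not details confirmed for DS-MoE.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def select_experts(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest gates, returned in ascending index order."""
    gates = softmax(scores)
    top = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:k]
    return sorted(top)


def run_chain(
    x: float,
    experts: Sequence[Callable[[float], float]],
    scores: Sequence[float],
    k: int,
    confidence: Callable[[float], float],
    threshold: float,
) -> Tuple[float, List[int]]:
    """Apply the selected experts in order, stopping once confidence is high enough."""
    used: List[int] = []
    for i in select_experts(scores, k):
        x = experts[i](x)
        used.append(i)
        if confidence(x) >= threshold:
            break  # early exit: remaining selected experts are skipped
    return x, used


# Toy experts and a toy confidence rule (confident once the value is negative).
experts = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * 10]
scores = [2.0, 0.1, 1.5, 3.0]
result, used = run_chain(1.0, experts, scores, k=3,
                         confidence=lambda v: 1.0 if v < 0 else 0.0, threshold=0.5)
print(result, used)
```

Here three experts are selected but only two run, so the effective chain length depends on both the router scores and the confidence signal.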

Across these systems, mechanisms for block control are reinforced by auxiliary reward functions (favoring minimal reasoning, correct outputs, or monotonic entropy descent), gating and load-balancing regularization, and explicit user overrides or caps during inference.

3. Training Objectives and Optimization

Dynamic-size block models necessitate specialized training and optimization regimes:

Think in Blocks (Zhu et al., 21 Aug 2025): Employs a three-stage pipeline. Supervised fine-tuning with prompt engineering teaches explicit block demarcation. Direct Preference Optimization (DPO) is then used with reward functions that favor both answer accuracy and brevity (token budget). Finally, reinforcement learning with a constrained Lagrangian loss jointly optimizes for increased no-thinking responses, fewer total blocks, block-length minimization, consistency between predicted and realized block counts, and accuracy.
DS-MoE (Roy et al., 24 Sep 2025): Combines standard task cross-entropy, routing loss (encouraging accurate depth selection when annotated), and balance loss (preventing collapse to a single expert or depth). The joint loss promotes both expert specialization and dynamic route selection.
Diffusion LLMs with b₁ Objective (Jiang et al., 4 May 2026): Adds monotonic entropy descent (MED) and block number (R_ind) rewards to standard correctness signals, integrated into Group Relative Policy Optimization (GRPO). The MED enforces that block-wise entropy decreases monotonically throughout the reasoning chain, using both pairwise (local) and global (Spearman) rank correlation objectives.
Flows (Josifoski et al., 2023): Optimization occurs at the level of the underlying LLMs and the composition logic. The depth and structure of flows are emergent from state-machine transitions and recursion in the composite system.
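
To make the monotonic-entropy-descent idea concrete, the sketch below computes two simple diagnostics on a list of per-block entropies: the fraction of adjacent block pairs whose entropy decreases (a local, pairwise view) and the Spearman rank correlation between block index and entropy (a global view, equal to -1 for a strictly decreasing profile). These are illustrative measures under the assumption of distinct entropy values; they are not the exact reward terms used in b₁.

```python
from typing import List, Sequence


def adjacent_descent_fraction(entropies: Sequence[float]) -> float:
    """Share of consecutive block pairs (H_t, H_{t+1}) with H_{t+1} < H_t."""
    pairs = list(zip(entropies, entropies[1:]))
    if not pairs:
        raise ValueError("need at least two blocks")
    return sum(1 for a, b in pairs if b < a) / len(pairs)


def spearman_with_index(entropies: Sequence[float]) -> float:
    """Spearman correlation between block position and entropy (no ties allowed)."""
    n = len(entropies)
    if n < 2 or len(set(entropies)) != n:
        raise ValueError("need at least two distinct entropy values")
    order = sorted(range(n), key=lambda i: entropies[i])
    rank: List[int] = [0] * n
    for r, i in enumerate(order):
        rank[i] = r
    d2 = sum((i - rank[i]) ** 2 for i in range(n))
    return 1 - 6 * d2 / (n * (n * n - 1))


coherent = [2.0, 1.5, 1.0, 0.4]   # entropy falls at every block
fractured = [2.0, 1.0, 1.5, 0.4]  # entropy rises once mid-chain
print(adjacent_descent_fraction(coherent), spearman_with_index(coherent))
print(adjacent_descent_fraction(fractured), spearman_with_index(fractured))
```

The local measure pinpoints where a rise occurs, while the rank correlation summarizes the overall trend of the chain.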

The following table summarizes principal training ingredients by approach:

Approach	SFT	RL with Custom Rewards	Routing/Specialization
Think in Blocks	✓	PPO (+ DPO)	Integer block head
DS-MoE	✓	Loss + sparsity penalty	Gated router
b₁ for dLLMs	—	GRPO with MED	End-of-block tokens
Flows	✗ (LLM-level)	Task-dependent	Recursive messages

4. Empirical Evaluations and Observed Benefits

Dynamic-size reasoning block paradigms deliver improvements in throughput, accuracy, and transparency across a variety of reasoning benchmarks and domains:

Think in Blocks (Zhu et al., 21 Aug 2025): On DeepMath (1k test questions, difficulty 2.0–9.0), the dynamic block pipeline (SFT → DPO → GRPO) achieved a 25.1% reduction in answer length (7735→5791 tokens) with a negligible drop in overall accuracy (85.5%→85.3%). Accuracy on easy questions increased (88.8%→92.0%), while accuracy on hard questions remained stable (74.9%). Forcing a low block cap exposed the trade-off: it yielded a further 16.1% length saving at the cost of a 7.4% drop in accuracy.
DS-MoE (Roy et al., 24 Sep 2025): On The Pile, DS-MoE improved accuracy on complex benchmarks by 2.8 percentage points, with up to 16% computational savings and 35% faster inference compared to uniform-depth transformers. FLOP reductions of up to 65–70% and 1.8–2.2× faster inference are also reported.
Flows (Josifoski et al., 2023): On competitive coding, augmenting LLM-only flows with additional nested blocks or human-in-the-loop blocks increased Pass@1 solve rate by up to +54 absolute points over baseline LLM calls. Iteratively composing Flows enabled richer, deeper chains and facilitated dynamic routing based on problem difficulty and verification status.
b₁ for dLLMs (Jiang et al., 4 May 2026): Across GSM8K, MATH, Sudoku, and Countdown, integrating b₁ raised accuracy: for example, on Countdown, wd1 improved from 39.5% to 58.9%, and on GSM8K, Diffu-GRPO improved from 19.9% to 28.9%. Monotonic entropy descent correlated strongly with accurate completions, supporting entropy as a coherence proxy.

Model	Countdown (Base)	Countdown (+b₁)	GSM8K (Base)	GSM8K (+b₁)
wd1	39.5%	58.9%	78.9%	80.8%
Diffu-GRPO	24.2%	32.0%	19.9%	28.9%
d1	26.2%	34.4%	25.4%	30.5%
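
The headline figures in this section can be re-derived with simple arithmetic. The sketch below recomputes the absolute gains (in percentage points) from the table above and the relative length reduction quoted for Think in Blocks; all numbers are copied from the text.

```python
# (base, +b1) accuracies in percent, copied from the table above.
results = {
    "wd1": {"Countdown": (39.5, 58.9), "GSM8K": (78.9, 80.8)},
    "Diffu-GRPO": {"Countdown": (24.2, 32.0), "GSM8K": (19.9, 28.9)},
    "d1": {"Countdown": (26.2, 34.4), "GSM8K": (25.4, 30.5)},
}

# Absolute gain in percentage points for each model and task.
gains = {
    model: {task: round(after - base, 1) for task, (base, after) in tasks.items()}
    for model, tasks in results.items()
}

# Relative answer-length reduction for Think in Blocks (7735 -> 5791 tokens).
length_reduction_pct = round(100 * (7735 - 5791) / 7735, 1)

for model, per_task in gains.items():
    print(model, per_task)
print("length reduction (%):", length_reduction_pct)
```

The largest absolute gain in the table is wd1 on Countdown (19.4 points), and the length reduction matches the 25.1% quoted in the text.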

5. Interpretability, Inference Control, and Applications

Dynamic block approaches offer several interpretability and deployment benefits:

Explicit Reasoning Chains: DS-MoE enables direct tracing of expert block sequences per input, with human annotators assigning interpretability scores to DS-MoE and to baseline models for comparison (Roy et al., 24 Sep 2025).
Fine-Grained Inference Control: Think in Blocks exposes integer-level control: users can override the predicted $B$ to enforce reasoning depth constraints, trading efficiency for accuracy on the fly (Zhu et al., 21 Aug 2025).
Coherence Diagnostics: In b₁, block-level entropy profiles offer a real-time lens on reasoning coherence, elucidating when and how errors arise (e.g., entropy spikes at fractured semantic units) (Jiang et al., 4 May 2026).
Concurrency and Modularity: The Flows framework natively accommodates parallel and sequential composition, supporting dynamic expansion/contraction of reasoning chains for collaborative or distributed computation (Josifoski et al., 2023).
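
The integer-level control described above can be sketched as a cap applied to the model's predicted block count. The function name and the choice to treat the override as an upper bound are assumptions made for illustration; a budget of 0 corresponds to a direct response with no reasoning blocks.

```python
from typing import Optional


def effective_block_budget(predicted: int, user_cap: Optional[int] = None) -> int:
    """Return the block budget after applying an optional user-specified cap."""
    if predicted < 0:
        raise ValueError("predicted block count must be non-negative")
    if user_cap is None:
        return predicted  # no override: trust the model's own prediction
    if user_cap < 0:
        raise ValueError("user cap must be non-negative")
    return min(predicted, user_cap)


print(effective_block_budget(5, user_cap=2))  # capped below the prediction
print(effective_block_budget(1, user_cap=4))  # cap is not binding
print(effective_block_budget(3))              # no cap supplied
```

Using a cap rather than a forced value keeps easy inputs cheap: the model can still choose fewer blocks than the user allows.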

Applications include real-time or low-resource environments, accuracy-critical pipelines requiring deep reasoning, settings where explicit attribution of reasoning steps is essential, and systems benefiting from concurrent and modular sub-task decomposition.

6. Limitations and Future Directions

Known limitations and open challenges include:

Block Boundary Consistency: Ensuring alignment between declared and produced block counts remains imperfect, with some divergence even after RL regularization. Architectural innovations—such as specialized block-counter heads or improved loss designs—may further reduce these mismatches (Zhu et al., 21 Aug 2025).
Block-Level Coherence Metrics: Current entropy-based metrics are mean-field and insensitive to inter-token dependencies within a block; future work may explore finer-grained or structured coherence objectives (Jiang et al., 4 May 2026).
Task and Domain Generality: While most dynamic block models show gains in math and structured coding, broad generalization to open-ended domains or multi-modal generative tasks awaits further validation.
Hyperparameter Sensitivity: Heuristic choices in b₁, such as the placement of the block-end token and associated hyperparameters, can impact training dynamics; adaptive calibration or learned termination may improve robustness (Jiang et al., 4 May 2026).
Scalability and Efficiency: Although significant compute savings are demonstrated, further reductions without accuracy loss may be realized via more aggressive routing, pruning, or architectural modularization.

7. Cross-Model Synthesis and Theoretical Implications

Dynamic-size reasoning blocks instantiate a broader computational principle: the division of cognitive computation into adaptive, composable units, with block structure learned or inferred on demand. The convergence of block-based thinking across LLMs, modular transformers, diffusion models, and actor-based systems signals an architectural trend toward hierarchical, flexible reasoning pipelines. This supports the hypothesis that adaptive allocation of intermediate computation is critical not only for efficiency, but also for achieving alignment between model confidence, problem complexity, and semantic coherence. This emerging design space motivates further exploration of curriculum learning, meta-reasoning controllers, and task-adaptive block specialization, as well as rigorous theoretical analysis of dynamic computation graphs for reasoning systems (Zhu et al., 21 Aug 2025, Roy et al., 24 Sep 2025, Josifoski et al., 2023, Jiang et al., 4 May 2026).