Query-Adaptive DAG Skeletons: AutoMR

Updated 3 December 2025
  • The paper demonstrates that AutoMR achieves superior accuracy (up to 69.6% on MATH-500 and 91.5% on GSM8K) by dynamically sampling query-adaptive DAG skeletons.
  • AutoMR is a framework that represents meta-reasoning skeletons as single-source directed acyclic graphs, enabling flexible strategy composition and query-specific adaptation.
  • Dynamic skeleton sampling exploits predecessor context and token budget constraints to efficiently guide LLM reasoning while incurring minimal computational overhead.

Query-Adaptive DAG Skeletons (AutoMR) refer to a family of automatically searched meta-reasoning skeletons expressed as directed acyclic graphs (DAGs), used to guide LLM reasoning with query-specific adaptation and improved modeling of logical dependencies. The AutoMR framework implements dynamic skeleton sampling, integrating query context and intermediate states that evolve during inference to efficiently construct skeletons that unify, and empirically surpass, previous hand-crafted sequential, parallel, and tree-structured approaches (Zhang et al., 5 Oct 2025).

1. Formal Definition of Meta-Reasoning Skeletons as DAGs

A meta-reasoning skeleton for a given query $q$ is formalized as a single-source, edge-heterogeneous DAG $\alpha = (V, E, \tau, c)$, where:

  • $V = \{n_0, n_1, \dots, n_{N-1}\}$ is the set of nodes, each $n_i = (i, c_i)$ with topological index $i \in \{0, \dots, N-1\}$ and textual content $c_i$ (a token sequence). The unique source $n_0$ is assigned $c_0 = q$.
  • $E \subseteq V \times V$ is the set of directed edges. For $n_j \to n_i$, the edge encodes the meta-strategy $\tau(n_j \to n_i)$ guiding the generation of $c_i$ from $c_j$.
  • $\tau: E \to S \cup \{\mathtt{zero}\}$ assigns each edge a meta-reasoning strategy from the finite set $S$ (e.g., $\{\mathrm{Next}, \mathrm{Reflect}, \mathrm{Explore}, \dots\}$) or $\mathtt{zero}$ (denoting the absence of an edge).

All prior sequential, parallel, and tree-structured skeleton designs conform to this DAG formalism, enabling universal representation and compositional flexibility.
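
To make the formalism concrete, the following is a minimal Python sketch of such a skeleton data structure. All names (`Strategy`, `Node`, `Skeleton`) and the crude whitespace token count are illustrative assumptions of this sketch, not the paper's implementation:

```python
from dataclasses import dataclass, field
from enum import Enum


class Strategy(Enum):
    """Meta-reasoning strategies from the finite set S; ZERO marks a non-edge."""
    ZERO = "zero"
    NEXT = "Next"
    REFLECT = "Reflect"
    EXPLORE = "Explore"


@dataclass
class Node:
    """Skeleton node n_i = (i, c_i): a topological index and its text content."""
    index: int
    content: str  # token sequence c_i; c_0 holds the query q


@dataclass
class Skeleton:
    """Single-source, edge-heterogeneous DAG alpha = (V, E, tau, c)."""
    nodes: list[Node] = field(default_factory=list)
    # tau: maps an edge (j, i) with j < i to its strategy; absent keys mean ZERO.
    edges: dict[tuple[int, int], Strategy] = field(default_factory=dict)

    def predecessors(self, i: int) -> list[tuple[int, Strategy]]:
        """All (j, strategy) pairs with a nonzero edge n_j -> n_i."""
        return [(j, s) for (j, k), s in self.edges.items()
                if k == i and s is not Strategy.ZERO]

    def used_tokens(self) -> int:
        """Budget accounting over generated content, excluding the query c_0
        (whitespace split is a stand-in for a real tokenizer)."""
        return sum(len(n.content.split()) for n in self.nodes[1:])
```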

2. Mathematical Formulation of AutoMR Skeleton Search Space

Given a global token budget $T$ for total skeleton-generated content, the search space is defined as:

$$\mathcal{A}(S, T) = \left\{ \alpha = (V, E, \tau, c) \;\middle|\; \alpha \text{ is a single-source DAG},\ \tau: E \to S \cup \{\mathtt{zero}\},\ \sum_{n_i \in V \setminus \{n_0\}} |c_i| \leq T \right\}$$

Parameterization is realized by discrete edge-strategy variables $\{s_{(j,i)}\}_{0 \leq j < i < N}$ with $s_{(j,i)} \in S \cup \{\mathtt{zero}\}$, subject to acyclicity and budget constraints.
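
As a small worked example (ours, not the paper's): with $N = 3$ and $S = \{\mathrm{Next}, \mathrm{Reflect}\}$, a skeleton is fully specified by the three variables $s_{(0,1)}, s_{(0,2)}, s_{(1,2)}$, each taking a value in $S \cup \{\mathtt{zero}\}$, for at most $3^3 = 27$ configurations; the single-source and budget constraints then prune invalid ones, such as any assignment in which $n_1$ receives no incoming edge yet $s_{(1,2)} \neq \mathtt{zero}$.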

The skeleton search is governed by a policy parameter $\theta$ for the architecture-sampling model $P_\theta(\alpha \mid q)$. The learning objective is:

$$\arg\max_{\theta}\ \mathbb{E}_{(q,a)\sim\mathcal{D}}\ \mathbb{E}_{\alpha \sim P_\theta(\cdot \mid q)}\left[ r\bigl(a, \mathrm{LLM}(q; \alpha)\bigr) \right]$$

where $r(a, \mathrm{LLM}(q; \alpha)) = 1$ if the LLM output matches the ground truth $a$, and $-1$ otherwise.
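
A minimal version of this reward, assuming a plain exact-match checker (the paper's actual answer matching may differ):

```python
def reward(answer: str, llm_output: str) -> int:
    """r(a, LLM(q; alpha)): +1 on an exact match with the ground truth, else -1."""
    return 1 if llm_output.strip() == answer.strip() else -1
```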

3. Dynamic Skeleton Sampling and Optimization

AutoMR introduces a dynamic skeleton sampling algorithm that interleaves skeleton construction and stepwise text generation, exploiting current reasoning context to select edge strategies at each node expansion.

A lightweight MLP predicts sampling distributions for each candidate edge based on:

  • Predecessor content embedding
  • Already sampled strategies
  • Mean embeddings of previously generated skeleton steps

The full sampling process (see Algorithm 1 in (Zhang et al., 5 Oct 2025)) executes as follows; a minimal Python sketch appears after the list:

  • Given $q$ and $T$, initialize $\alpha$ with the single source node $n_0$ carrying $c_0 = q$.
  • For each node index $i$, iteratively sample $s_{(j,i)}$ for all $j < i$ using $p_\theta$.
  • If no nonzero edges are sampled for node $n_i$, terminate and output the final answer via the LLM.
  • Otherwise, prompt the LLM with predecessor contents and chosen strategies, generate $c_i$, and expand $\alpha$.
  • Repeat until the token budget is exhausted.
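
The following sketch of this loop reuses the `Skeleton`, `Node`, and `Strategy` classes from the earlier sketch; the `policy` and `llm` interfaces are hypothetical stand-ins for the MLP head $p_\theta$ and the backbone LLM, not the paper's API:

```python
def dynamic_skeleton_sampling(query: str, budget: int, policy, llm) -> str:
    """Sketch of the dynamic sampling loop (cf. Algorithm 1), under assumed interfaces."""
    skel = Skeleton(nodes=[Node(0, query)])  # unique source n_0 with c_0 = q
    i = 1
    while skel.used_tokens() < budget:
        # Sample an edge strategy s_{(j,i)} for every candidate predecessor j < i.
        sampled = {(j, i): policy.sample_edge(skel, j, i) for j in range(i)}
        nonzero = {e: s for e, s in sampled.items() if s is not Strategy.ZERO}
        if not nonzero:
            return llm.answer(skel)  # no new edges: emit the final answer
        # Prompt the LLM with predecessor contents and their chosen strategies.
        preds = [(skel.nodes[j].content, s) for (j, _), s in nonzero.items()]
        skel.nodes.append(Node(i, llm.generate(preds)))  # generate c_i
        skel.edges.update(nonzero)
        i += 1
    return llm.answer(skel)  # budget exhausted: answer from the skeleton so far
```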

Optimization is performed with a REINFORCE policy-gradient update:

$$\theta \leftarrow \theta + \frac{\eta}{MN} \sum_{i=1}^{N} \sum_{j=1}^{M} r\bigl(a_i, \mathrm{LLM}(q_i; \alpha_i^j)\bigr)\, \nabla_\theta \log P_\theta(\alpha_i^j \mid q_i)$$

where $\eta$ is the learning rate, the outer sum runs over $N$ training queries, and the inner sum runs over $M$ skeletons sampled per query.
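
A PyTorch-style sketch of one such update, where `optimizer` wraps the sampling MLP's parameters; `sample_skeleton` and `reward_fn` are assumed interfaces (the former must return the summed $\log p_\theta$ of its edge decisions as a differentiable scalar), not the paper's code:

```python
import torch

def reinforce_step(optimizer, batch, sample_skeleton, reward_fn, M=4):
    """One REINFORCE update over N (query, answer) pairs, M skeletons each."""
    losses = []
    for q, a in batch:                      # N training queries
        for _ in range(M):                  # M sampled skeletons per query
            skeleton, log_prob = sample_skeleton(q)   # log P_theta(alpha | q)
            r = reward_fn(a, skeleton)                # +1 on match, else -1
            losses.append(-r * log_prob)              # minimize -r * log P_theta
    optimizer.zero_grad()
    torch.stack(losses).mean().backward()   # the 1/(MN) factor comes from mean()
    optimizer.step()                        # eta is absorbed into the optimizer lr
```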

The dynamic sampling process realizes any allowable DAG structure with at most $O(N^2)$ MLP calls: an $N$-node skeleton has at most $N(N-1)/2$ candidate edges, and each edge decision requires a single MLP call. Completeness is guaranteed, and the added compute is negligible relative to LLM inference.

4. Integration of Query Context and Reasoning State

The skeleton sampling MLP is explicitly conditioned on:

  1. Text embedding $e(c_j)$ of predecessor node $n_j$
  2. Mean embedding of the strategies already selected for the current node
  3. Mean embedding of all preceding node contents $c_0, \dots, c_{i-1}$

This design ensures that each strategy choice $s_{(j,i)}$ is adaptively gated by both the original query and the cumulatively generated reasoning steps. Mathematically,

$$h_{j,i} = \mathrm{MLP}_\theta\Bigl( e(c_j) \,\Vert\, \frac{1}{|\{k : k > j\}|}\sum_{k>j} e\bigl(s_{(k,i)}\bigr) \,\Vert\, \frac{1}{i}\sum_{\ell<i} e\bigl(c_\ell\bigr) \Bigr), \qquad p_\theta\bigl(s_{(j,i)}\bigr) = \mathrm{Softmax}(h_{j,i})$$

where $\Vert$ denotes concatenation and the middle term is the mean embedding of the strategies already sampled for node $n_i$ (those with predecessor index $k > j$).

This mechanism supports context-sensitive adaptation, where the skeleton topology and strategies reflect both query semantics and evolving intermediate inferences.
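
A minimal sketch of this conditioning step, assuming hypothetical `embed_text` and `embed_strategy` functions that return same-sized 1-D tensors, and with `sampled` holding the strategies already drawn for node $n_i$ (all names are illustrative):

```python
import torch

def edge_distribution(mlp, embed_text, embed_strategy, skel, sampled, j, i):
    """Compute p_theta(s_{(j,i)}) from the three conditioning signals above."""
    pred = embed_text(skel.nodes[j].content)              # e(c_j)
    strats = [embed_strategy(s) for (k, _), s in sampled.items() if k > j]
    strat_mean = (torch.stack(strats).mean(dim=0)         # mean strategy embedding
                  if strats else torch.zeros_like(pred))  # none sampled yet
    ctx = torch.stack([embed_text(n.content)
                       for n in skel.nodes[:i]]).mean(dim=0)  # mean of c_0..c_{i-1}
    h = mlp(torch.cat([pred, strat_mean, ctx]))           # h_{j,i}
    return torch.softmax(h, dim=-1)                       # p_theta(s_{(j,i)})
```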

5. Experimental Setup and Comparative Results

AutoMR is evaluated on Math QA benchmarks (GSM8K, MATH-500, AMC, Olympiad) and the general multiple-choice MMLU-Pro benchmark. The framework is instantiated with both LLaMA-3B-Inst and Qwen2.5-3B-Inst backbones.

Compared baselines include Direct I/O, Chain-of-Thought (CoT), MRP, the sequential Meta-Reasoner, tree-structured rStar, and MaAS. Main results demonstrate superior accuracy for AutoMR across all datasets:

Accuracy (%) on Math QA benchmarks:

Method          MATH-500   GSM8K   AMC    Olympiad
Direct I/O      16.8       15.8    8.4    5.5
CoT             61.6       85.3    34.9   26.2
MRP             63.8       88.2    33.7   26.6
Meta-Reasoner   65.4       87.0    36.1   27.4
rStar           67.0       88.7    32.5   25.4
MaAS            63.6       86.4    33.7   27.7
AutoMR          69.6       91.5    38.6   30.4

Accuracy (%) on MMLU-Pro categories:

Method          Science   Humanities   Social   Other
Direct I/O      32.7      25.1         39.0     29.1
CoT             41.6      28.3         51.5     39.8
MRP             42.8      30.1         53.5     41.6
Meta-Reasoner   45.4      31.9         55.0     42.2
rStar           43.6      30.8         55.4     36.0
MaAS            45.5      31.0         56.0     41.7
AutoMR          49.4      33.7         57.4     45.6

Ablation studies show that performance improves sharply as the token budget scales, and that only query-aware, context-adaptive sampling (the full AutoMR) reaches peak accuracy. Training is substantially more efficient than full LLM fine-tuning, with minimal inference overhead.

6. Theoretical Properties and Empirical Analyses

Two key propositions are established:

  • Proposition 1: All sequential, parallel, and tree-structured meta-reasoning skeletons from prior work are representable as single-source, edge-heterogeneous DAGs.
  • Proposition 2: The dynamic sampling algorithm is complete with respect to the defined search space, requiring at most $O(N^2)$ MLP calls and negligible additional compute relative to LLM inference.

Empirical analyses indicate that AutoMR skeletons adapt to both query domain and difficulty, e.g., increasing use of Recall strategies for knowledge-intensive queries and expanding parallel branches for more complex problems. The observed performance and theoretical bounds confirm the dominance of DAG search spaces over topologically restricted alternatives in budget-limited regimes.

7. Significance and Research Context

AutoMR introduces a principled, unified approach to meta-reasoning skeleton search in LLM systems. By leveraging the expressivity of single-source DAGs and combining them with context-aware, query-adaptive dynamic sampling, it enables efficient automatic discovery of reasoning skeletons tailored to specific queries and evolving intermediate states. This approach outperforms manually designed sequential and tree-based methods, achieving consistent gains with modest compute requirements (Zhang et al., 5 Oct 2025). The framework provides a robust substrate for future research into advanced LLM reasoning architectures and meta-reasoning optimization.
