Query-Adaptive DAG Skeletons: AutoMR
- The paper demonstrates that AutoMR achieves superior accuracy (up to 69.6% on MATH-500 and 91.5% on GSM8K) by dynamically sampling query-adaptive DAG skeletons.
- AutoMR is a framework that represents meta-reasoning skeletons as single-source directed acyclic graphs, enabling flexible strategy composition and query-specific adaptation.
- Dynamic skeleton sampling exploits predecessor context and token budget constraints to efficiently guide LLM reasoning while incurring minimal computational overhead.
Query-Adaptive DAG Skeletons (AutoMR) refer to a family of automatically searched meta-reasoning skeletons expressed as directed acyclic graphs (DAGs), used to guide LLM reasoning toward query-specific adaptation and improved modeling of logical dependencies. The AutoMR framework implements dynamic skeleton sampling, integrating query context and intermediate states that evolve during inference to efficiently construct skeletons that unify and surpass previous hand-crafted sequential, parallel, and tree-structured approaches (Zhang et al., 5 Oct 2025).
1. Formal Definition of Meta-Reasoning Skeletons as DAGs
A meta-reasoning skeleton for a given query $q$ is formalized as a single-source, edge-heterogeneous DAG $G = (V, E)$ with a strategy-labeling function $s$, where:
- $V = \{v_0, v_1, \dots, v_n\}$ is the set of nodes, each with a topological index $i$ and textual content $c_i$ (a token sequence). The unique source node $v_0$ is assigned the query, $c_0 = q$.
- $E \subseteq \{(v_i, v_j) : i < j\}$ is the set of directed edges. For $(v_i, v_j) \in E$, the edge encodes the meta-strategy guiding $v_j$'s generation from $v_i$.
- $s : V \times V \to \mathcal{S} \cup \{0\}$ assigns each node pair a meta-reasoning strategy from the finite strategy set $\mathcal{S}$ (e.g., Recall) or zero (denoting the absence of an edge).
All prior sequential, parallel, and tree-structured skeleton designs conform to this DAG formalism, enabling universal representation and compositional flexibility.
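This formalism can be sketched as a small data structure. The class, field, and strategy names below are illustrative stand-ins, not the paper's implementation:

```python
from dataclasses import dataclass, field

# Hypothetical strategy labels; the paper's full strategy set is not
# reproduced here, so these names are illustrative only.
STRATEGIES = ("recall", "decompose", "reflect")
NO_EDGE = None  # the "zero" label: absence of an edge


@dataclass
class Node:
    index: int    # topological index i
    content: str  # textual content c_i (a token sequence)


@dataclass
class Skeleton:
    """Single-source, edge-heterogeneous DAG over reasoning steps."""
    nodes: list = field(default_factory=list)
    # edges[(i, j)] = strategy label guiding node j's generation from node i
    edges: dict = field(default_factory=dict)

    def add_node(self, content, predecessors):
        """Append node v_j; predecessors maps index i -> strategy for edge (i, j)."""
        j = len(self.nodes)
        # Edges only point from lower to higher topological index: acyclic by construction.
        assert all(i < j for i in predecessors), "edges must respect topological order"
        self.nodes.append(Node(j, content))
        for i, strat in predecessors.items():
            if strat is not NO_EDGE:
                self.edges[(i, j)] = strat
        return j


# The unique source node v_0 holds the query itself.
g = Skeleton()
g.add_node("What is 17 * 24?", {})                        # v_0 = query
g.add_node("Split into 17*20 + 17*4.", {0: "decompose"})  # v_1, one labeled edge
```

Because edges may only point forward in topological order, acyclicity holds by construction, which is what makes the formalism a strict superset of chains (one edge per node), parallel branches (many nodes sharing one predecessor), and trees (one predecessor per node).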
2. Mathematical Formulation of AutoMR Skeleton Search Space
Given a global token budget $B$ for total skeleton-generated content, the search space is defined as

$$\mathcal{A} = \Big\{\, G = (V, E) \;:\; G \text{ is a single-source DAG}, \;\; \textstyle\sum_{v_i \in V} |c_i| \le B \,\Big\}.$$

Parameterization is realized by discrete edge-strategy variables $s_{ij} \in \mathcal{S} \cup \{0\}$ for $i < j$, subject to acyclicity and budget constraints.

The skeleton search is governed by a policy parameter $\theta$ of the architecture-sampling model $\pi_\theta$. The learning objective is

$$\max_\theta \; \mathbb{E}_{G \sim \pi_\theta(\cdot \mid q)}\big[\, r(G) \,\big],$$

where $r(G) = 1$ if the LLM output matches the ground-truth answer, and $r(G) = 0$ otherwise.
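The binary reward and a Monte Carlo estimate of the expected-reward objective can be written directly. `sample_skeleton` and `run_llm` below are hypothetical stand-ins for the sampling policy and the LLM call:

```python
def reward(answer: str, ground_truth: str) -> int:
    """r(G) = 1 iff the LLM's final answer matches the ground truth, else 0."""
    return int(answer.strip() == ground_truth.strip())


def estimate_objective(sample_skeleton, run_llm, query, truth, n_samples=8):
    """Monte Carlo estimate of E_{G ~ pi_theta(. | q)}[r(G)] for one query.

    sample_skeleton(query) -> a skeleton G sampled from the policy (stand-in)
    run_llm(G)             -> the LLM's final answer string (stand-in)
    """
    total = 0
    for _ in range(n_samples):
        g = sample_skeleton(query)          # G ~ pi_theta(. | query)
        total += reward(run_llm(g), truth)
    return total / n_samples
```

The binary, outcome-only reward is what makes REINFORCE (Section 3) the natural optimizer: no differentiable path through the LLM is required.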
3. Dynamic Skeleton Sampling and Optimization
AutoMR introduces a dynamic skeleton sampling algorithm that interleaves skeleton construction and stepwise text generation, exploiting current reasoning context to select edge strategies at each node expansion.
A lightweight MLP predicts sampling distributions for each candidate edge based on:
- Predecessor content embedding
- Already sampled strategies
- Mean embeddings of previously generated skeleton steps
The full sampling process (see Algorithm 1 in (Zhang et al., 5 Oct 2025)) executes as follows:
- Given query $q$ and budget $B$, initialize the skeleton with the single source node $v_0$ holding $q$.
- For each new node index $j$, iteratively sample $s_{ij}$ for all predecessors $i < j$ using the MLP.
- If no nonzero edges are selected, terminate and output the final answer via the LLM.
- Otherwise, prompt the LLM with the predecessor contents and chosen strategies, generate the new node content $c_j$, and expand the skeleton.
- Repeat until the token budget $B$ is exhausted.
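The loop above can be sketched as follows. The MLP, the step generator, and the final-answer call are stand-ins passed in as callables, and the token budget is approximated by whitespace tokens (assumptions, not the paper's exact accounting):

```python
import random

STRATEGIES = ["recall", "decompose", "reflect"]  # illustrative labels
NO_EDGE = "none"


def sample_edge(mlp, pred_content, chosen, history):
    """Stand-in for the paper's MLP: returns a strategy label or NO_EDGE.
    mlp(...) is assumed to return a distribution over STRATEGIES + [NO_EDGE]."""
    probs = mlp(pred_content, chosen, history)
    labels = STRATEGIES + [NO_EDGE]
    return random.choices(labels, weights=probs, k=1)[0]


def dynamic_sampling(query, budget, mlp, generate_step, final_answer):
    """Interleave skeleton construction and stepwise text generation."""
    nodes = [query]                  # v_0 holds the query
    used = 0
    while used < budget:
        chosen = {}                  # predecessor index -> sampled strategy
        for i, content in enumerate(nodes):
            s = sample_edge(mlp, content, chosen, nodes)
            if s != NO_EDGE:
                chosen[i] = s
        if not chosen:               # no nonzero edges: emit the final answer
            return final_answer(nodes)
        step = generate_step(nodes, chosen)   # LLM generates content c_j
        nodes.append(step)
        used += len(step.split())    # crude token count for the sketch
    return final_answer(nodes)
```

Because each node's edges are sampled only after all predecessor contents exist, the skeleton can react to what the LLM actually wrote, rather than being fixed up front.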
Optimization is performed with a REINFORCE policy-gradient update:

$$\theta \;\leftarrow\; \theta + \eta \, r(G) \, \nabla_\theta \log \pi_\theta(G \mid q),$$

where $\eta$ is the learning rate.
The dynamic sampling process realizes any allowable DAG structure with at most $O(|V|^2)$ MLP calls (one per candidate predecessor edge of each node), with theoretical guarantees of completeness and negligible computational overhead relative to LLM inference.
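A minimal tabular version of the REINFORCE step looks like the following. The paper's policy is an MLP over embeddings, not a logits table, so this is purely a sketch of the gradient update:

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_update(theta, chosen_actions, r, lr=0.1):
    """theta <- theta + lr * r * grad log pi_theta(G | q).

    log pi_theta(G | q) factorizes over the edge-strategy choices that
    produced skeleton G, so we sum the per-decision softmax gradients.
    theta: dict mapping a context key to a list of logits (illustrative).
    chosen_actions: list of (context_key, action_index) pairs.
    """
    for ctx, action in chosen_actions:
        probs = softmax(theta[ctx])
        for a in range(len(theta[ctx])):
            # d log softmax(action) / d logit_a = 1{a == action} - probs[a]
            grad = (1.0 if a == action else 0.0) - probs[a]
            theta[ctx][a] += lr * r * grad
    return theta
```

With the binary reward, skeletons that yield a correct answer have every contributing edge decision reinforced, while incorrect skeletons leave the policy unchanged.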
4. Integration of Query Context and Reasoning State
The skeleton sampling MLP is explicitly conditioned on:
- Text embedding of predecessor node
- Mean embedding of selected strategies for the current node
- Mean embedding of all preceding node contents
This design ensures that each strategy choice is adaptively gated by both the original query and the cumulatively generated reasoning steps. Mathematically, the sampling distribution for the candidate edge $(v_i, v_j)$ takes the form

$$p(s_{ij} \mid \cdot) \;=\; \mathrm{MLP}_\theta\Big(\big[\, e(c_i);\; \bar{e}(\{s_{kj}\}_{k<i});\; \bar{e}(\{c_k\}_{k<j}) \,\big]\Big),$$

where $e(\cdot)$ denotes a text embedding and $\bar{e}(\cdot)$ a mean embedding.
This mechanism supports context-sensitive adaptation, where the skeleton topology and strategies reflect both query semantics and evolving intermediate inferences.
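Assembling the MLP input amounts to concatenating the three conditioning signals. Plain Python lists stand in for real embedding vectors in this sketch:

```python
def mean_embed(vectors, dim):
    """Mean of a list of embedding vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def mlp_input(pred_emb, strategy_embs, step_embs, dim):
    """Concatenate the three signals described above:
    1. the predecessor node's content embedding,
    2. the mean embedding of strategies already sampled for the current node,
    3. the mean embedding of all previously generated node contents."""
    return (list(pred_emb)
            + mean_embed(strategy_embs, dim)
            + mean_embed(step_embs, dim))
```

The zero vector for empty inputs handles the first edge decision of each node, when no strategies have been sampled yet.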
5. Experimental Setup and Comparative Results
AutoMR is evaluated on math QA benchmarks (GSM8K, MATH-500, AMC, Olympiad) and the general-domain multiple-choice MMLU-Pro benchmark. The framework is instantiated with both LLaMA-3B-Inst and Qwen2.5-3B-Inst backbones.
Compared baselines include Direct I/O, Chain-of-Thought (CoT), MRP, the sequential Meta-Reasoner, tree-structured rStar, and MaAS. Main results demonstrate superior accuracy for AutoMR across all datasets (accuracy, %):
| Method | MATH-500 | GSM8K | AMC | Olympiad |
|---|---|---|---|---|
| Direct I/O | 16.8 | 15.8 | 8.4 | 5.5 |
| CoT | 61.6 | 85.3 | 34.9 | 26.2 |
| MRP | 63.8 | 88.2 | 33.7 | 26.6 |
| Meta-Reasoner | 65.4 | 87.0 | 36.1 | 27.4 |
| rStar | 67.0 | 88.7 | 32.5 | 25.4 |
| MaAS | 63.6 | 86.4 | 33.7 | 27.7 |
| AutoMR | 69.6 | 91.5 | 38.6 | 30.4 |
On the MMLU-Pro categories (accuracy, %):

| Method | Science | Humanities | Social | Other |
|---|---|---|---|---|
| Direct I/O | 32.7 | 25.1 | 39.0 | 29.1 |
| CoT | 41.6 | 28.3 | 51.5 | 39.8 |
| MRP | 42.8 | 30.1 | 53.5 | 41.6 |
| Meta-Reasoner | 45.4 | 31.9 | 55.0 | 42.2 |
| rStar | 43.6 | 30.8 | 55.4 | 36.0 |
| MaAS | 45.5 | 31.0 | 56.0 | 41.7 |
| AutoMR | 49.4 | 33.7 | 57.4 | 45.6 |
Ablation studies show that performance improves steadily as the token budget scales, and that only query-aware, context-adaptive sampling (the full AutoMR design) attains the best accuracy. Training is substantially cheaper than full LLM fine-tuning, and inference overhead is minimal.
6. Theoretical Properties and Empirical Analyses
Two key propositions are established:
- Proposition 1: All sequential, parallel, and tree-structured meta-reasoning skeletons from prior work are representable as single-source, edge-heterogeneous DAGs.
- Proposition 2: The dynamic sampling algorithm is complete with respect to the defined search space, requiring at most $O(|V|^2)$ MLP calls and negligible additional compute relative to LLM inference.
Empirical analyses indicate AutoMR skeletons adapt both to query domain and difficulty, e.g., increasing use of Recall strategies for knowledge queries and expanding parallel branches for higher complexity. The observed performance and theoretical bounds support the advantage of DAG search spaces over topologically restricted alternatives in budget-limited regimes.
7. Significance and Research Context
AutoMR introduces a principled, unified approach to meta-reasoning skeleton search in LLM systems. By leveraging the expressivity of single-source DAGs and combining them with context-aware, query-adaptive dynamic sampling, it enables efficient automatic discovery of reasoning skeletons tailored to specific queries and evolving intermediate states. This approach outperforms manually-designed sequential and tree-based methods, achieving consistent gains with modest compute requirements (Zhang et al., 5 Oct 2025). The framework provides a robust substrate for future research into advanced LLM reasoning architectures and meta-reasoning optimization.