Query-Adaptive DAG Skeletons: AutoMR
- The paper demonstrates that AutoMR achieves superior accuracy (up to 69.6% on MATH-500 and 91.5% on GSM8K) by dynamically sampling query-adaptive DAG skeletons.
- AutoMR is a framework that represents meta-reasoning skeletons as single-source directed acyclic graphs, enabling flexible strategy composition and query-specific adaptation.
- Dynamic skeleton sampling exploits predecessor context and token budget constraints to efficiently guide LLM reasoning while incurring minimal computational overhead.
Query-Adaptive DAG Skeletons (AutoMR) refer to a family of automatically searched meta-reasoning skeletons expressed as directed acyclic graphs (DAGs), used to guide LLM reasoning toward query-specific adaptation and improved modeling of logical dependencies. The AutoMR framework implements dynamic skeleton sampling, integrating query context and intermediate states that evolve during inference to efficiently construct skeletons that unify and surpass previous hand-crafted sequential, parallel, and tree-structured approaches (Zhang et al., 5 Oct 2025).
1. Formal Definition of Meta-Reasoning Skeletons as DAGs
A meta-reasoning skeleton for a given query $q$ is formalized as a single-source, edge-heterogeneous DAG $G = (V, E)$ with a strategy-labeling function $s$, where:
- $V = \{v_0, v_1, \dots, v_n\}$ is the set of nodes, each with a topological index $i$ and textual content $c_i$ (a token sequence). The unique source node $v_0$ is assigned the query, $c_0 = q$.
- $E \subseteq \{(v_i, v_j) : i < j\}$ is the set of directed edges. For $(v_i, v_j) \in E$, the edge encodes the meta-strategy guiding $v_j$'s generation from $v_i$.
- $s : V \times V \to \mathcal{S} \cup \{0\}$ assigns each node pair a meta-reasoning strategy from the finite strategy set $\mathcal{S}$ (e.g., Recall) or zero (denoting the absence of an edge).
All prior sequential, parallel, and tree-structured skeleton designs conform to this DAG formalism, enabling universal representation and compositional flexibility.
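This formalism can be sketched as a small data structure. The class, field, and strategy names below are illustrative stand-ins, not the paper's implementation:

```python
from dataclasses import dataclass, field

# Hypothetical strategy labels; the paper's full strategy set is not
# reproduced here, so these names are illustrative only.
STRATEGIES = ("recall", "decompose", "reflect")
NO_EDGE = None  # the "zero" label: absence of an edge


@dataclass
class Node:
    index: int    # topological index i
    content: str  # textual content c_i (a token sequence)


@dataclass
class Skeleton:
    """Single-source, edge-heterogeneous DAG over reasoning steps."""
    nodes: list = field(default_factory=list)
    # edges[(i, j)] = strategy label guiding node j's generation from node i
    edges: dict = field(default_factory=dict)

    def add_node(self, content, predecessors):
        """Append node v_j; predecessors maps index i -> strategy for edge (i, j)."""
        j = len(self.nodes)
        # Edges only point from lower to higher topological index: acyclic by construction.
        assert all(i < j for i in predecessors), "edges must respect topological order"
        self.nodes.append(Node(j, content))
        for i, strat in predecessors.items():
            if strat is not NO_EDGE:
                self.edges[(i, j)] = strat
        return j


# The unique source node v_0 holds the query itself.
g = Skeleton()
g.add_node("What is 17 * 24?", {})                        # v_0 = query
g.add_node("Split into 17*20 + 17*4.", {0: "decompose"})  # v_1, one labeled edge
```

Because edges may only point forward in topological order, acyclicity holds by construction, which is what makes the formalism a strict superset of chains (one edge per node), parallel branches (many nodes sharing one predecessor), and trees (one predecessor per node).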
2. Mathematical Formulation of AutoMR Skeleton Search Space
Given a global token budget $B$ for total skeleton-generated content, the search space is defined as

$$\mathcal{A} = \Big\{\, G = (V, E) \;:\; G \text{ is a single-source DAG}, \;\; \textstyle\sum_{v_i \in V} |c_i| \le B \,\Big\}.$$

Parameterization is realized by discrete edge-strategy variables $s_{ij} \in \mathcal{S} \cup \{0\}$ for $i < j$, subject to acyclicity and budget constraints.

The skeleton search is governed by a policy parameter $\theta$ of the architecture-sampling model $\pi_\theta$. The learning objective is

$$\max_\theta \; \mathbb{E}_{G \sim \pi_\theta(\cdot \mid q)}\big[\, r(G) \,\big],$$

where $r(G) = 1$ if the LLM output matches the ground-truth answer, and $r(G) = 0$ otherwise.
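The binary reward and a Monte Carlo estimate of the expected-reward objective can be written directly. `sample_skeleton` and `run_llm` below are hypothetical stand-ins for the sampling policy and the LLM call:

```python
def reward(answer: str, ground_truth: str) -> int:
    """r(G) = 1 iff the LLM's final answer matches the ground truth, else 0."""
    return int(answer.strip() == ground_truth.strip())


def estimate_objective(sample_skeleton, run_llm, query, truth, n_samples=8):
    """Monte Carlo estimate of E_{G ~ pi_theta(. | q)}[r(G)] for one query.

    sample_skeleton(query) -> a skeleton G sampled from the policy (stand-in)
    run_llm(G)             -> the LLM's final answer string (stand-in)
    """
    total = 0
    for _ in range(n_samples):
        g = sample_skeleton(query)          # G ~ pi_theta(. | query)
        total += reward(run_llm(g), truth)
    return total / n_samples
```

The binary, outcome-only reward is what makes REINFORCE (Section 3) the natural optimizer: no differentiable path through the LLM is required.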
3. Dynamic Skeleton Sampling and Optimization
AutoMR introduces a dynamic skeleton sampling algorithm that interleaves skeleton construction and stepwise text generation, exploiting current reasoning context to select edge strategies at each node expansion.
A lightweight MLP predicts sampling distributions for each candidate edge based on:
- Predecessor content embedding
- Already sampled strategies
- Mean embeddings of previously generated skeleton steps
The full sampling process (see Algorithm 1 in (Zhang et al., 5 Oct 2025)) executes as follows:
- Given query $q$ and budget $B$, initialize the skeleton with the single source node $v_0$ holding $q$.
- For each new node index $j$, iteratively sample $s_{ij}$ for all predecessors $i < j$ using the MLP.
- If no nonzero edges are selected, terminate and output the final answer via the LLM.
- Otherwise, prompt the LLM with the predecessor contents and chosen strategies, generate the new node content $c_j$, and expand the skeleton.
- Repeat until the token budget $B$ is exhausted.
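The loop above can be sketched as follows. The MLP, the step generator, and the final-answer call are stand-ins passed in as callables, and the token budget is approximated by whitespace tokens (assumptions, not the paper's exact accounting):

```python
import random

STRATEGIES = ["recall", "decompose", "reflect"]  # illustrative labels
NO_EDGE = "none"


def sample_edge(mlp, pred_content, chosen, history):
    """Stand-in for the paper's MLP: returns a strategy label or NO_EDGE.
    mlp(...) is assumed to return a distribution over STRATEGIES + [NO_EDGE]."""
    probs = mlp(pred_content, chosen, history)
    labels = STRATEGIES + [NO_EDGE]
    return random.choices(labels, weights=probs, k=1)[0]


def dynamic_sampling(query, budget, mlp, generate_step, final_answer):
    """Interleave skeleton construction and stepwise text generation."""
    nodes = [query]                  # v_0 holds the query
    used = 0
    while used < budget:
        chosen = {}                  # predecessor index -> sampled strategy
        for i, content in enumerate(nodes):
            s = sample_edge(mlp, content, chosen, nodes)
            if s != NO_EDGE:
                chosen[i] = s
        if not chosen:               # no nonzero edges: emit the final answer
            return final_answer(nodes)
        step = generate_step(nodes, chosen)   # LLM generates content c_j
        nodes.append(step)
        used += len(step.split())    # crude token count for the sketch
    return final_answer(nodes)
```

Because each node's edges are sampled only after all predecessor contents exist, the skeleton can react to what the LLM actually wrote, rather than being fixed up front.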
Optimization is performed with a REINFORCE policy-gradient update:

$$\theta \;\leftarrow\; \theta + \eta \, r(G) \, \nabla_\theta \log \pi_\theta(G \mid q),$$

where $\eta$ is the learning rate.
The dynamic sampling process realizes any allowable DAG structure with at most $O(|V|^2)$ MLP calls (one per candidate predecessor edge of each node), with theoretical guarantees of completeness and negligible computational overhead relative to LLM inference.
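A minimal tabular version of the REINFORCE step looks like the following. The paper's policy is an MLP over embeddings, not a logits table, so this is purely a sketch of the gradient update:

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_update(theta, chosen_actions, r, lr=0.1):
    """theta <- theta + lr * r * grad log pi_theta(G | q).

    log pi_theta(G | q) factorizes over the edge-strategy choices that
    produced skeleton G, so we sum the per-decision softmax gradients.
    theta: dict mapping a context key to a list of logits (illustrative).
    chosen_actions: list of (context_key, action_index) pairs.
    """
    for ctx, action in chosen_actions:
        probs = softmax(theta[ctx])
        for a in range(len(theta[ctx])):
            # d log softmax(action) / d logit_a = 1{a == action} - probs[a]
            grad = (1.0 if a == action else 0.0) - probs[a]
            theta[ctx][a] += lr * r * grad
    return theta
```

With the binary reward, skeletons that yield a correct answer have every contributing edge decision reinforced, while incorrect skeletons leave the policy unchanged.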
4. Integration of Query Context and Reasoning State
The skeleton sampling MLP is explicitly conditioned on:
- Text embedding of predecessor node
- Mean embedding of selected strategies for the current node
- Mean embedding of all preceding node contents
This design ensures that each strategy choice is adaptively gated by both the original query and the cumulatively generated reasoning steps. Mathematically, the sampling distribution for the candidate edge $(v_i, v_j)$ takes the form

$$p(s_{ij} \mid \cdot) \;=\; \mathrm{MLP}_\theta\Big(\big[\, e(c_i);\; \bar{e}(\{s_{kj}\}_{k<i});\; \bar{e}(\{c_k\}_{k<j}) \,\big]\Big),$$

where $e(\cdot)$ denotes a text embedding and $\bar{e}(\cdot)$ a mean embedding.
This mechanism supports context-sensitive adaptation, where the skeleton topology and strategies reflect both query semantics and evolving intermediate inferences.
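Assembling the MLP input amounts to concatenating the three conditioning signals. Plain Python lists stand in for real embedding vectors in this sketch:

```python
def mean_embed(vectors, dim):
    """Mean of a list of embedding vectors; zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def mlp_input(pred_emb, strategy_embs, step_embs, dim):
    """Concatenate the three signals described above:
    1. the predecessor node's content embedding,
    2. the mean embedding of strategies already sampled for the current node,
    3. the mean embedding of all previously generated node contents."""
    return (list(pred_emb)
            + mean_embed(strategy_embs, dim)
            + mean_embed(step_embs, dim))
```

The zero vector for empty inputs handles the first edge decision of each node, when no strategies have been sampled yet.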
5. Experimental Setup and Comparative Results
AutoMR is evaluated on math QA benchmarks (GSM8K, MATH-500, AMC, Olympiad) and the general-domain multiple-choice MMLU-Pro benchmark. The framework is instantiated with both LLaMA-3B-Inst and Qwen2.5-3B-Inst backbones.
Compared baselines include Direct I/O, Chain-of-Thought (CoT), MRP, the sequential Meta-Reasoner, tree-structured rStar, and MaAS. Main results demonstrate superior accuracy for AutoMR across all datasets (accuracy, %):
| Method | MATH-500 | GSM8K | AMC | Olympiad |
|---|---|---|---|---|
| Direct I/O | 16.8 | 15.8 | 8.4 | 5.5 |
| CoT | 61.6 | 85.3 | 34.9 | 26.2 |
| MRP | 63.8 | 88.2 | 33.7 | 26.6 |
| Meta-Reasoner | 65.4 | 87.0 | 36.1 | 27.4 |
| rStar | 67.0 | 88.7 | 32.5 | 25.4 |
| MaAS | 63.6 | 86.4 | 33.7 | 27.7 |
| AutoMR | 69.6 | 91.5 | 38.6 | 30.4 |
On the MMLU-Pro categories (accuracy, %):

| Method | Science | Humanities | Social | Other |
|---|---|---|---|---|
| Direct I/O | 32.7 | 25.1 | 39.0 | 29.1 |
| CoT | 41.6 | 28.3 | 51.5 | 39.8 |
| MRP | 42.8 | 30.1 | 53.5 | 41.6 |
| Meta-Reasoner | 45.4 | 31.9 | 55.0 | 42.2 |
| rStar | 43.6 | 30.8 | 55.4 | 36.0 |
| MaAS | 45.5 | 31.0 | 56.0 | 41.7 |
| AutoMR | 49.4 | 33.7 | 57.4 | 45.6 |
Ablation studies show that performance improves steadily as the token budget scales, and that only query-aware, context-adaptive sampling (the full AutoMR design) attains the best accuracy. Training is substantially cheaper than full LLM fine-tuning, and inference overhead is minimal.
6. Theoretical Properties and Empirical Analyses
Two key propositions are established:
- Proposition 1: All sequential, parallel, and tree-structured meta-reasoning skeletons from prior work are representable as single-source, edge-heterogeneous DAGs.
- Proposition 2: The dynamic sampling algorithm is complete with respect to the defined search space, requiring at most $O(|V|^2)$ MLP calls and negligible additional compute relative to LLM inference.
Empirical analyses indicate AutoMR skeletons adapt both to query domain and difficulty, e.g., increasing use of Recall strategies for knowledge queries and expanding parallel branches for higher complexity. The observed performance and theoretical bounds support the advantage of DAG search spaces over topologically restricted alternatives in budget-limited regimes.
7. Significance and Research Context
AutoMR introduces a principled, unified approach to meta-reasoning skeleton search in LLM systems. By leveraging the expressivity of single-source DAGs and combining them with context-aware, query-adaptive dynamic sampling, it enables efficient automatic discovery of reasoning skeletons tailored to specific queries and evolving intermediate states. This approach outperforms manually-designed sequential and tree-based methods, achieving consistent gains with modest compute requirements (Zhang et al., 5 Oct 2025). The framework provides a robust substrate for future research into advanced LLM reasoning architectures and meta-reasoning optimization.