Searching Meta Reasoning Skeleton to Guide LLM Reasoning (2510.04116v1)

Published 5 Oct 2025 in cs.AI

Abstract: Meta reasoning behaviors work as a skeleton to guide LLM reasoning, thus help to improve reasoning performance. However, prior researches implement meta reasoning skeleton with manually designed structure, limiting ability to adapt to query-specific requirement and capture intricate logical dependency among reasoning steps. To deal with the challenges, we represent meta reasoning skeleton with directed acyclic graph (DAG) to unify skeletons proposed in prior works and model intricate logical dependency. Then we propose AutoMR, a framework that searches for query-aware meta reasoning skeleton automatically inspired by automated machine learning (AutoML). Specifically, we construct search space based on DAG representation of skeleton and then formulate the search problem. We design a dynamic skeleton sampling algorithm by expanding meta reasoning skeleton along with reasoning context at inference time. This algorithm can derive any meta reasoning skeleton in search space efficiently and adapt skeleton to evolving base reasoning context, thus enable efficient query-aware skeleton search. We conduct experiments on extensive benchmark datasets. Experimental results show that AutoMR achieves better reasoning performance than previous works broadly.

Summary

The paper introduces AutoMR, which uses a DAG-based meta reasoning skeleton to dynamically guide LLM reasoning.
It employs a dynamic skeleton sampling algorithm that adapts reasoning paths to query-specific contexts.
Experimental results show improved performance on datasets like GSM8K and MMLU-Pro compared to static reasoning methods.

Searching Meta Reasoning Skeleton to Guide LLM Reasoning

The paper "Searching Meta Reasoning Skeleton to Guide LLM Reasoning" (2510.04116) proposes improving the reasoning capabilities of LLMs with meta-reasoning skeletons that are searched automatically rather than designed by hand. Each skeleton is represented as a directed acyclic graph (DAG) that adapts to query-specific requirements. The proposed AutoMR framework addresses the limitations of manually designed reasoning structures by automating the search for adaptable reasoning skeletons. This essay covers the implementation details, the advantages of using DAGs, and the implications of this research.

Introduction and Motivation

LLMs have shown competence in handling complex reasoning tasks, such as mathematical problem-solving, through structured reasoning processes. Prior methods typically rely on static, manually designed meta-reasoning skeletons. These predefined structures, such as sequential or tree-based skeletons, cannot adapt to the requirements of a specific query and often fail to model the intricate logical dependencies among reasoning steps, which leads to suboptimal reasoning performance.

This paper addresses these limitations with an automated approach inspired by AutoML. The AutoMR framework searches for query-aware meta-reasoning skeletons represented as single-source, edge-heterogeneous DAGs. This representation unifies the skeletons proposed in prior work and captures the intricate logical dependencies among reasoning steps.

Methodology

DAG-Based Representation

The paper introduces the use of DAGs to represent meta-reasoning skeletons, allowing for a flexible and more nuanced structure that reflects the logical dependencies present in complex reasoning tasks. The DAG-based representation subsumes traditional sequential, parallel, and tree-based skeletons, offering a comprehensive and adaptable search space for meta-reasoning schemas.
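To make the subsumption claim concrete, the following is a minimal sketch of a single-source DAG whose edges carry a strategy label. The class name `Skeleton`, the helper functions, and the strategy labels are all hypothetical and chosen for illustration; the paper's actual strategy set and data structures may differ. Acyclicity is enforced here by the simple assumption that edges always point from a lower node index to a higher one.

```python
from dataclasses import dataclass, field

# Hypothetical strategy labels; the paper's actual strategy set may differ.
STRATEGIES = {"next_step", "reflect", "explore", "decompose", "summarize"}


@dataclass
class Skeleton:
    """Single-source DAG: nodes are step indices, edges carry a strategy label."""
    num_nodes: int = 1                          # node 0 is the single source (the query)
    edges: dict = field(default_factory=dict)   # (src, dst) -> strategy label

    def add_node(self) -> int:
        self.num_nodes += 1
        return self.num_nodes - 1

    def add_edge(self, src: int, dst: int, strategy: str) -> None:
        if strategy not in STRATEGIES:
            raise ValueError(f"unknown strategy: {strategy}")
        # Requiring src < dst keeps the graph acyclic by construction.
        if not (0 <= src < dst < self.num_nodes):
            raise ValueError("edges must point from a lower to a higher node index")
        self.edges[(src, dst)] = strategy

    def parents(self, node: int) -> list:
        return sorted(s for (s, d) in self.edges if d == node)

    def is_single_source(self) -> bool:
        # Every node except node 0 must have at least one incoming edge.
        return all(self.parents(n) for n in range(1, self.num_nodes))


def chain(length: int) -> Skeleton:
    """Sequential skeleton: 0 -> 1 -> ... -> length."""
    sk = Skeleton()
    for _ in range(length):
        node = sk.add_node()
        sk.add_edge(node - 1, node, "next_step")
    return sk


def branching_tree() -> Skeleton:
    """Tree skeleton: two branches from the source, one of them extended."""
    sk = Skeleton()
    a, b = sk.add_node(), sk.add_node()
    sk.add_edge(0, a, "explore")
    sk.add_edge(0, b, "explore")
    c = sk.add_node()
    sk.add_edge(a, c, "next_step")
    return sk


def merged_dag() -> Skeleton:
    """General DAG: node 3 depends on both branches, which a tree cannot express."""
    sk = Skeleton()
    a, b = sk.add_node(), sk.add_node()
    sk.add_edge(0, a, "decompose")
    sk.add_edge(0, b, "decompose")
    c = sk.add_node()
    sk.add_edge(a, c, "summarize")
    sk.add_edge(b, c, "summarize")
    return sk
```

In this sketch a chain and a tree are just DAGs in which every non-source node has exactly one parent, while `merged_dag` shows a step with two parents, the kind of dependency that sequential and tree skeletons cannot represent.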

Dynamic Skeleton Sampling

The AutoMR framework introduces a dynamic skeleton sampling algorithm that builds the skeleton on the fly at inference time. The algorithm constructs the skeleton incrementally in topological order, expanding it alongside the base reasoning context so that each new step can adapt to what has already been reasoned. Because each potential step is evaluated in its current context, the sampled skeleton is specific to the query rather than fixed in advance.
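The incremental construction described above can be sketched as a loop that visits node indices in increasing (topological) order and, for each new node, decides which earlier nodes feed into it. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the `choose` and `generate` callables stand in for the learned policy and the LLM call, the `"none"` label meaning "no edge" and the rule "stop when no incoming edge is selected" are assumptions, and `max_steps` is a stand-in for a token budget.

```python
import random

NO_EDGE = "none"
# Hypothetical strategy labels; NO_EDGE means "do not connect these two steps".
STRATEGIES = ["next_step", "reflect", "decompose", NO_EDGE]


def sample_skeleton(query, choose, generate, max_steps=6):
    """Grow a skeleton node by node, interleaved with step generation.

    choose(contents, src, dst) -> strategy label for the candidate edge src -> dst
    generate(contents, incoming) -> text of the new step, given its incoming edges
    """
    contents = [query]   # contents[i] is the reasoning text at node i; node 0 is the query
    edges = {}           # (src, dst) -> strategy label
    for dst in range(1, max_steps + 1):
        incoming = {}
        for src in range(dst):                 # only earlier nodes can be parents
            strategy = choose(contents, src, dst)
            if strategy != NO_EDGE:
                incoming[src] = strategy
        if not incoming:
            break                              # assumption: no parent selected means stop
        contents.append(generate(contents, incoming))
        for src, strategy in incoming.items():
            edges[(src, dst)] = strategy
    return contents, edges


def random_policy(rng):
    """Stub policy: picks a label uniformly at random, ignoring the context."""
    def choose(contents, src, dst):
        return rng.choice(STRATEGIES)
    return choose


def echo_llm(contents, incoming):
    """Stub for the LLM call: records which strategies produced this step."""
    parts = [f"{strategy}({src})" for src, strategy in sorted(incoming.items())]
    return "step from " + ", ".join(parts)


def chain_policy(contents, src, dst):
    """Deterministic policy that reproduces a purely sequential skeleton."""
    return "next_step" if src == dst - 1 else NO_EDGE


if __name__ == "__main__":
    contents, edges = sample_skeleton(
        "What is 12 * 7?", random_policy(random.Random(0)), echo_llm
    )
    for (src, dst), strategy in sorted(edges.items()):
        print(f"{src} -> {dst}: {strategy}")
```

Because the policy sees `contents` at every decision, a learned policy could in principle condition each edge on the reasoning produced so far, which is the property the section attributes to dynamic sampling. The random stub above ignores that context and is there only to make the loop runnable.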

Search Strategy

The search problem is framed as finding the policy that maximizes reasoning performance by guiding the LLM through the reasoning process. This policy is governed by a Meta-Level Policy Network, implemented as a multi-layer perceptron (MLP). The network selects edges and their strategies dynamically as reasoning progresses, so that each meta-reasoning decision reflects the current context.
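As a rough illustration of how an MLP can act as a policy over strategies, the sketch below maps a feature vector to a probability distribution over strategy labels and samples from it. Everything here is an assumption made for illustration: the layer sizes, the `tanh` activation, the strategy labels, and the idea that the context has already been encoded as a fixed-length feature vector. The paper's network architecture, inputs, and training procedure are not reproduced, and training is not shown at all.

```python
import math
import random

# Hypothetical strategy labels; "none" stands for "no edge".
STRATEGIES = ["next_step", "reflect", "decompose", "none"]


def init_mlp(in_dim, hidden_dim, out_dim, rng):
    """Randomly initialised one-hidden-layer MLP, stored as plain lists."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W1": matrix(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
        "W2": matrix(out_dim, hidden_dim), "b2": [0.0] * out_dim,
    }


def affine(weights, bias, x):
    """Compute weights @ x + bias for list-based weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def strategy_probs(params, features):
    """Forward pass: features -> hidden layer (tanh) -> distribution over strategies."""
    hidden = [math.tanh(v) for v in affine(params["W1"], params["b1"], features)]
    return softmax(affine(params["W2"], params["b2"], hidden))


def sample_strategy(params, features, rng):
    """Sample one strategy label for a candidate edge."""
    probs = strategy_probs(params, features)
    return rng.choices(STRATEGIES, weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_mlp(in_dim=8, hidden_dim=16, out_dim=len(STRATEGIES), rng=rng)
    features = [rng.uniform(-1.0, 1.0) for _ in range(8)]   # placeholder context features
    print(sample_strategy(params, features, rng))
```

A function like `sample_strategy` could play the role of the per-edge decision in a sampling loop; searching for a good skeleton then amounts to adjusting `params` so that the sampled strategies lead to better reasoning outcomes.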

Experimental Results

The paper presents extensive evaluations across multiple datasets, focusing on both math-based Q&A tasks and general multiple-choice queries. AutoMR demonstrates clear improvements over baseline meta-reasoning strategies, including MRP and rStar, highlighting its capacity for efficient adaptation and optimal reasoning guidance.

Results on challenging datasets such as GSM8K and MMLU-Pro indicate that AutoMR improves accuracy and also scales more efficiently with increased token budgets than sequential and tree-structured methods.

Implications and Future Directions

The AutoMR framework opens up new possibilities for augmenting LLM reasoning by using meta-reasoning strategies that are both dynamic and context-aware. By exploring the DAG-based search space, the framework identifies reasoning paths that optimize accuracy and efficiency for various types of tasks.

Future research could expand upon this by integrating additional elements of human cognition, such as uncertainty quantification and adaptive recalibration, into the reasoning process. Moreover, further exploration of how these dynamic skeletons can be applied to other AI challenges, such as real-time decision-making or multi-modal reasoning, could offer significant advancements.

Conclusion

The "Searching Meta Reasoning Skeleton to Guide LLM Reasoning" paper provides a substantive contribution to enhancing the flexibility and effectiveness of LLM reasoning. By leveraging DAGs for dynamic skeleton composition, it overcomes previous limitations of static reasoning frameworks. The results indicate a promising direction for future research into adaptive and context-sensitive reasoning systems, potentially extending the applicability of LLMs in complex cognitive tasks.