AutoTIR: Autonomous Tool-Integrated Reasoning

Updated 18 June 2026
  • AutoTIR names two distinct frameworks that share a theme of autonomous, context-aware interaction with external resources: one for adaptive tool use in LLMs, and one for real-time vehicle system identification.
  • In LLMs, AutoTIR formulates tool use as a Markov decision process and trains a policy with reinforcement learning. The policy decides dynamically when to invoke external tools and which ones to use, balancing language fluency with computational precision.
  • For autonomous racing, it combines vision-based friction mapping, S4 temporal modeling, and Nelder-Mead optimization to substantially reduce friction estimation error and convergence time.

AutoTIR refers to two distinct frameworks: (1) Autonomous Tools-Integrated Reasoning in LLMs via reinforcement learning, and (2) vision-augmented iterative system identification for autonomous racing vehicles. Both center on autonomous, context-aware interaction with external resources. In LLMs these resources are reasoning tools; in racing they are sensors and model components used for dynamic system estimation.

1. Autonomous Tool-Integrated Reasoning in LLMs

AutoTIR, introduced in "AutoTIR: Autonomous Tools Integrated Reasoning via Reinforcement Learning" (Wei et al., 29 Jul 2025), formalizes tool-augmented problem solving in LLMs as a sequential decision process. Traditional Tool-Integrated Reasoning (TIR) pipelines use hand-crafted, static tool-invocation strategies, such as a fixed retrieval→code-execution order. These strategies limit adaptability across heterogeneous tasks and risk eroding the base model's instruction-following capabilities. AutoTIR addresses this limitation by letting the LLM decide autonomously, at each reasoning step, whether to invoke an external tool and, if so, which tool is most appropriate. This balances core linguistic fluency against the precision gained from external computation.

AutoTIR formulates the TIR workflow as a Markov decision process (MDP). The state $s_k$ at step $k$ contains the question and the accumulated reasoning trace. The action $a_k$ consists of a tool-invocation choice and, potentially, a free-form thinking step in natural language:

  • State: $s_k = (Q, \tau_{k-1})$, where $\tau_{k-1}$ aggregates the prior (state, tool, output) tuples.
  • Action: $a_k = (t_k, \lambda_k)$, where $t_k \in \{\texttt{search}, \texttt{code}, \emptyset\}$ and $\lambda_k$ is a textual continuation.
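The state/action formulation can be sketched as a simple episode loop. This is a minimal illustration under stated assumptions, not the authors' implementation. `policy` and the entries of `tools` are hypothetical callables, and `None` plays the role of $\emptyset$ (no tool). Each trace entry stores $(\lambda_k, t_k, \text{output})$, and the `"ANSWER:"` prefix is an invented stopping convention.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Each trace entry stores (lambda_k, t_k, tool_output); t_k is "search", "code", or None (the empty action).
Step = Tuple[str, Optional[str], Optional[str]]


@dataclass
class State:
    question: str                                    # Q
    trace: List[Step] = field(default_factory=list)  # tau_{k-1}


def rollout(question: str,
            policy: Callable[[State], Tuple[Optional[str], str]],
            tools: Dict[str, Callable[[str], str]],
            max_steps: int = 8) -> State:
    """Run one episode of the tool-use MDP (hypothetical interface)."""
    state = State(question)
    for _ in range(max_steps):
        tool, text = policy(state)                                 # a_k = (t_k, lambda_k)
        output = tools[tool](text) if tool is not None else None   # environment executes the call
        state.trace.append((text, tool, output))                   # next state: (Q, tau_k)
        if tool is None and text.startswith("ANSWER:"):            # hypothetical stop rule
            break
    return state


# Toy usage: call the "code" tool once, then answer with its output.
def toy_policy(state: State) -> Tuple[Optional[str], str]:
    if not state.trace:
        return "code", "2**10"
    return None, f"ANSWER: {state.trace[-1][2]}"


# Stand-in for a sandboxed interpreter; never eval untrusted input like this.
toy_tools = {"code": lambda src: str(eval(src, {"__builtins__": {}}))}

episode = rollout("What is 2 to the 10th power?", toy_policy, toy_tools)
print(episode.trace)
```

Keeping the environment (`tools`) separate from the policy mirrors the MDP split. The policy only chooses actions, and the environment alone produces tool outputs.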

The environment executes tool calls and appends their outputs, which updates the state for the next step. AutoTIR's RL agent learns an adaptive tool-use policy $\pi_\theta$ from a hybrid reward with three components:

  • task-specific answer correctness ($R_\mathrm{correct}$);
  • adherence to the structured output format;
  • explicit penalties for inappropriate tool use.
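One way to picture the hybrid reward is as a weighted sum of its three components. The source does not specify these details, so all of the following are assumptions for illustration: the additive form, the weights `w_fmt` and `w_pen`, the `<answer>` tag, and the specific format and penalty rules.

```python
import re

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.S)   # hypothetical answer tag
TOOL_CALL_RE = re.compile(r"<(?:search|code)>")           # opening tags of tool calls


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace before exact-match comparison."""
    return " ".join(text.lower().split())


def hybrid_reward(response: str, gold: str, tools_allowed: bool,
                  w_fmt: float = 0.1, w_pen: float = 0.5) -> float:
    """Hypothetical additive reward: correctness + w_fmt * format - w_pen * penalty."""
    answers = ANSWER_RE.findall(response)
    # (1) correctness: exact match on the single extracted answer (assumption)
    r_correct = 1.0 if len(answers) == 1 and _normalize(answers[0]) == _normalize(gold) else 0.0
    # (2) format: exactly one <answer> block (assumed format rule)
    r_format = 1.0 if len(answers) == 1 else 0.0
    # (3) penalty: any tool call on a task where tools should not be used (assumed rule)
    r_penalty = 1.0 if TOOL_CALL_RE.search(response) and not tools_allowed else 0.0
    return r_correct + w_fmt * r_format - w_pen * r_penalty
```

For example, a response containing a `<code>` call and `<answer>2</answer>` with gold answer `"2"` scores 1.1 when tools are allowed. The same response scores 0.6 when tools are disallowed.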

2. Reinforcement Learning Framework and Training Algorithm

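Training uses Group Relative Policy Optimization (GRPO), a PPO variant with two key features. First, it computes advantages relative to a group of rollouts sampled for the same input, which removes the need for a learned value function. Second, it stabilizes updates with a KL penalty toward a reference policy $\pi_\mathrm{ref}$, the pretrained base model. Key steps include: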

  • Generating multiple rollouts per input under the current policy and recording full action and reward trajectories. Tool-execution outputs are masked so that they do not contribute to the policy gradient.
  • Computing each rollout's total reward from the three reward components, then normalizing rewards within the group to stabilize variance.
  • Updating policy parameters by maximizing the clipped objective:

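$$
\mathcal{J}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}\min\Big(\rho_{i,t}(\theta)\,\hat{A}_{i},\ \operatorname{clip}\big(\rho_{i,t}(\theta),1-\epsilon,1+\epsilon\big)\,\hat{A}_{i}\Big) - \beta\,\mathbb{D}_\mathrm{KL}\big(\pi_\theta \,\|\, \pi_\mathrm{ref}\big)\right]
$$

where:

  • $\rho_{i,t}(\theta) = \pi_\theta(o_{i,t} \mid Q, o_{i,<t}) / \pi_{\theta_\mathrm{old}}(o_{i,t} \mid Q, o_{i,<t})$ is the per-token probability ratio for the $i$-th of $G$ rollouts $o_i$;
  • $\pi_\mathrm{ref}$ is the reference policy;
  • $\hat{A}_i = \big(R_i - \operatorname{mean}(\{R_j\}_{j=1}^{G})\big) / \operatorname{std}(\{R_j\}_{j=1}^{G})$ is the normalized advantage.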
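The group-normalized advantage and the clipped objective can be sketched with NumPy. This is a minimal, non-authoritative illustration. It assumes per-token log-probabilities have already been computed for each of the $G$ rollouts, and it returns the objective to be maximized rather than taking a gradient step. The values of `clip_eps` and `beta` are assumptions. The KL term uses the per-token estimator $\pi_\mathrm{ref}/\pi_\theta - \log(\pi_\mathrm{ref}/\pi_\theta) - 1$ that is commonly paired with GRPO.

```python
import numpy as np


def group_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """A_i = (R_i - mean(R)) / std(R), computed within one group of G rollouts."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def grpo_objective(logp_new, logp_old, logp_ref, rewards, mask,
                   clip_eps: float = 0.2, beta: float = 0.01) -> float:
    """Clipped GRPO surrogate minus a KL penalty, to be maximized.

    logp_*: (G, T) per-token log-probs under pi_theta, pi_theta_old, pi_ref.
    mask:   (G, T) 1 for model-generated tokens, 0 for padding or tool outputs.
    """
    adv = group_advantages(rewards)[:, None]           # broadcast A_i over tokens
    ratio = np.exp(logp_new - logp_old)                # rho_{i,t}
    surrogate = np.minimum(ratio * adv,
                           np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
    log_r = logp_ref - logp_new
    kl = np.exp(log_r) - log_r - 1                     # per-token KL estimate (>= 0)
    per_token = (surrogate - beta * kl) * mask
    per_seq = per_token.sum(axis=1) / np.maximum(mask.sum(axis=1), 1)  # (1/|o_i|) sum over t
    return float(per_seq.mean())                       # (1/G) sum over i
```

If every rollout in a group receives the same reward, all advantages are zero and the group contributes no policy-gradient signal. This is a known property of group-relative normalization.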

Key hyperparameters are the learning rate, the batch size (256), and the number of rollouts per input. Training mixes math, retrieval, and instruction-following datasets in a curriculum so that the model retains its base reasoning capabilities. The process is warm-started from an instruction-tuned LLM.
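The tool-output masking step in GRPO training can be made concrete by building a per-token loss mask from a segmented trace. In this sketch, the whitespace tokenizer, the segment labels `"model"` and `"tool"`, and the `<result>` tag for tool outputs are all hypothetical. A real system would use the LLM's own tokenizer and delimiters.

```python
from typing import Callable, List, Tuple

# (text, source): source is "model" for policy-generated text, "tool" for returned tool output.
Segment = Tuple[str, str]


def build_loss_mask(segments: List[Segment],
                    tokenize: Callable[[str], List[str]] = str.split) -> Tuple[List[str], List[int]]:
    """Return tokens and a 0/1 mask; tool-output tokens get 0 so they add no policy gradient."""
    tokens: List[str] = []
    mask: List[int] = []
    for text, source in segments:
        toks = tokenize(text)
        tokens.extend(toks)
        mask.extend([1 if source == "model" else 0] * len(toks))
    return tokens, mask


trace = [
    ("<code> 2**10 </code>", "model"),
    ("<result> 1024 </result>", "tool"),    # hypothetical tag for tool output
    ("<answer> 1024 </answer>", "model"),
]
tokens, mask = build_loss_mask(trace)
print(list(zip(tokens, mask)))
```

The resulting mask can be multiplied into per-token loss terms. Text written by the search engine or interpreter then shapes the context the model conditions on, but it is never treated as the model's own action.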

3. System Architecture: Inference and Tool Integration

At inference, AutoTIR operates in either tool-assisted or standalone mode, controlled by the system prompt:

  • Tool-assisted mode: the prompt allows interleaved <think>, <search>, and <code> blocks. Queries to the retrieval engine or code interpreter are wrapped in these XML-style tags, and the results are returned and appended to the reasoning trace. Tool-invocation decisions are sampled directly from the policy $\pi_\theta$.
  • Standalone mode: the prompt prohibits tool invocations, so the model must rely solely on native textual reasoning.

During training, all tool outputs are masked out of the loss. Gradient updates are therefore computed only on model-generated tokens, never on text returned by the environment.

4. Experimental Protocols and Benchmarking

AutoTIR is evaluated on a suite of ten benchmarks spanning:

  • Knowledge-intensive QA: HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle (exact match).
  • Mathematical reasoning: AIME2024, AIME2025, MATH500, and GSM8K (exact match, accuracy).
  • Instruction following: LogiQA and IFEval (accuracy, soft accuracy).

Baselines include text-only RL reasoning models, code-enhanced solvers, and retrieval-based models. Two auxiliary metrics measure tool efficiency:

  • tool-selection accuracy (TS);
  • tool productivity (TP), the number of correct answers per tool invocation.

Ablation studies show:

  • Removing tool access causes an average performance drop of roughly 20 points.
  • Excluding the instruction-following (IF) training data causes IFEval scores to fall from 51.0 to 13.1.
  • Omitting the penalty terms causes small but consistent drops in tool efficiency due to over-invocation.
  • Rule-based tool orchestration from prior work underperforms the RL-trained policy, especially on instruction-following and mathematical tasks.

5. Results and Tool-Efficiency Analysis

AutoTIR substantially improves overall accuracy and generalization:

  • Average performance across the 10 benchmarks: 46.01%, versus 29.42% for the state-of-the-art baseline and 21.84% for the base model.
  • Tool-selection accuracy: TS ≈ 92–100%.
  • Tool productivity: the highest TP in every evaluated domain.

Gains are task-dependent. The largest improvements occur on high-difficulty tasks such as AIME mathematics, which is consistent with AutoTIR invoking tools selectively where they yield the greatest incremental benefit. RL fine-tuning alone, even without tools, outperforms purely supervised instruction-tuned baselines.

Scaling analysis shows that as the number of RL steps grows, both the average reward and the chain-of-thought length increase. This indicates progressive acquisition of more elaborate, tool-aware reasoning skills.

6. Vision-Augmented System Identification for Autonomous Vehicles

In autonomous racing, AutoTIR refers to a vision-augmented, iterative system identification architecture (Wu et al., 10 Mar 2026). The system integrates three main components:

  • MobileNetV3-based probabilistic friction mapper: visual textures in on-track images are mapped to a friction prior, which initializes the peak friction in the Pacejka tire model. This module is trained on RSCD with a cross-entropy loss plus a regularization term. It uses squeeze-and-excitation (SE) modules for channel recalibration.
  • S4 temporal residual model: a Structured State Space Sequence (S4) model learns, as temporal residuals, the high-frequency, nonlinear dynamics that the nominal model does not capture. HiPPO-initialized state matrices $A$ retain long-range memory and capture oscillatory behavior. The sequence output is computed as a global convolution, evaluated efficiently with the FFT.
  • Nelder-Mead simplex optimization: a derivative-free routine iteratively extracts physically interpretable parameters of the Pacejka "magic formula" by minimizing RMSE across simulated lateral-force trajectories.
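The Nelder-Mead step can be illustrated by fitting the standard Pacejka lateral-force curve $F_y = D \sin\big(C \arctan(B\alpha - E(B\alpha - \arctan(B\alpha)))\big)$ with SciPy's derivative-free simplex method. This is a simplified sketch, not the authors' pipeline, and it makes several assumptions. It fits a static $F_y(\alpha)$ curve to synthetic data rather than simulated trajectories. The friction prior enters only through the initial peak force $D_0 = \mu_\mathrm{prior} F_z$. The starting values `B0`, `C0`, `E0` and the tolerances are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def pacejka_fy(alpha, B, C, D, E):
    """Pacejka 'magic formula' lateral force F_y for slip angle alpha [rad].

    B: stiffness factor, C: shape factor, D: peak force (mu * Fz), E: curvature factor.
    """
    x = B * alpha
    return D * np.sin(C * np.arctan(x - E * (x - np.arctan(x))))


def fit_pacejka(alpha, fy_meas, mu_prior, fz, B0=10.0, C0=1.3, E0=0.0):
    """Derivative-free fit of (B, C, D, E) by Nelder-Mead, minimizing RMSE.

    D is initialized from a friction prior (e.g., a vision-based estimate) as
    D0 = mu_prior * Fz. Returns (fitted params, final RMSE, initial RMSE).
    """
    x0 = np.array([B0, C0, mu_prior * fz, E0], dtype=float)

    def rmse(theta):
        residual = pacejka_fy(alpha, *theta) - fy_meas
        return float(np.sqrt(np.mean(residual ** 2)))

    result = minimize(rmse, x0, method="Nelder-Mead",
                      options={"xatol": 1e-6, "fatol": 1e-6, "maxiter": 20000})
    return result.x, float(result.fun), rmse(x0)


# Synthetic example: "true" tire with mu = 0.9 and Fz = 4000 N; the prior guesses mu = 0.8.
alpha = np.linspace(-0.2, 0.2, 81)
fz = 4000.0
fy_true = pacejka_fy(alpha, 12.0, 1.4, 0.9 * fz, 0.1)
params, rmse_fit, rmse_init = fit_pacejka(alpha, fy_true, mu_prior=0.8, fz=fz)
print(params, rmse_fit, rmse_init)
```

A better friction prior places the starting simplex closer to the optimum. That is the mechanism by which vision-based initialization can shorten cold-start convergence.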
This architecture enables robust, real-time identification of tire dynamics. Relative to prior neural architectures, it reduces:

  • friction estimation error by 76.1%;
  • cold-start convergence iterations by 71.4%;
  • lateral-force RMSE by more than 60%.

It also has lower FLOP requirements and competitive iteration time.

7. Implications, Limitations, and Future Directions

Both AutoTIR instantiations show that autonomous, context-adaptive interaction with external modules (tools or sensor-informed models) delivers better performance and flexibility than static rule-based systems.

Limitations and promising directions for the language-based AutoTIR include:

  • High inference overhead on simple tasks, which tool-switch predictors that bypass unnecessary tool invocations are expected to reduce.
  • Scaling to broader tool libraries (e.g., API calls, simulators) and to multi-agent orchestration.
  • Richer reward schemes that capture stepwise reasoning quality and verification.

In autonomous racing, future work may further reduce cold-start latency, integrate broader perceptual priors, and extend the approach to more complex maneuvers and multi-modal data fusion.

AutoTIR thus marks a shift toward models and controllers that autonomously decide when and how to draw on external augmentation, whether in cognitive reasoning or real-time control. This lets advanced systems balance native competence with situational tool use (Wei et al., 29 Jul 2025; Wu et al., 10 Mar 2026).
