
Execution-Aware Tool-Jumping

Updated 30 December 2025
  • Execution-aware tool-jumping is a mechanism where LLM agents dynamically select and chain external tool invocations based on real-time feedback and execution traces.
  • It leverages reinforcement learning, formal planning languages, and cost/latency considerations to enhance security, efficiency, and multi-step task performance.
  • The approach integrates sequential decision-making and multi-modal signals, enabling adaptive tool chaining in robotics, code navigation, and dynamic service scheduling.

Execution-aware tool-jumping is the mechanism by which LLM agents or multi-modal policies coordinate, manage, and select external tool invocations during multi-step task execution, using feedback from the environment or execution traces. Rather than statically invoking specific tools or relying solely on prompt context, execution-aware tool-jumping leverages ongoing state, partial outputs, real-time signals, and cost or safety constraints to adaptively choose, chain, or switch between tools with high precision and responsiveness. This concept encompasses security-motivated exploit chains, reinforcement-learning-driven navigation, latency-optimized serving systems, and reasoning-driven cost minimization. The field covers agent robustness, end-to-end RL integration, formal planning languages supporting tool branching and dependencies, and practical LLM/plugin orchestration.

1. Conceptual Foundations and Formal Definitions

Execution-aware tool-jumping generalizes single-step agent actions to sequences where tool selection and chaining are guided by environment states, dynamic goals, or side-channel feedback. In the STAC framework, adversarial tool-jumping enables attacks by constructing sequences $(a_1, \ldots, a_T)$ where each $a_t$ is benign in isolation, but the cumulative sequence manipulates the environment state $s_t$ such that the final action $a_T$ achieves a harmful effect undetectable by superficial prompt analysis (Li et al., 30 Sep 2025). Formally, with policy $\pi$ and deterministic transition $T(s_t, a_t)$, the adversary seeks chains satisfying:

  • $\forall i < T$: $\pi(h_{i-1}, s_{i-1}) \implies a_i$ (each step passes safety checks)
  • $T(s_{i-1}, a_i) = s_i$ (each transition sets up preconditions)
  • $a_T$ (on the full micro-context) effects the malicious goal

Execution traces, history embeddings, and joint reasoning over agent state and tool outcomes underlie policy learning for robust tool-jumping and sequence adaptation.
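
These three conditions can be checked mechanically against a deterministic simulator. Below is a minimal sketch; the helper signatures (`transition`, `passes_safety`, `is_harmful`) are illustrative assumptions, not STAC's published interface.

```python
from typing import Callable, List

State = dict
Action = dict

def is_stac_chain(
    actions: List[Action],
    s0: State,
    transition: Callable[[State, Action], State],    # deterministic T(s, a)
    passes_safety: Callable[[State, Action], bool],  # per-step benignness check
    is_harmful: Callable[[State], bool],             # predicate on the final state
) -> bool:
    """Check the three chain conditions: every step looks benign in
    isolation, each transition sets up preconditions, and only the
    composed sequence reaches the harmful target state."""
    s = s0
    for a in actions:
        if not passes_safety(s, a):  # condition 1: a_i passes safety checks
            return False
        s = transition(s, a)         # condition 2: T(s_{i-1}, a_i) = s_i
    return is_harmful(s)             # condition 3: a_T effects the goal
```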

2. Sequential Tool-Chaining and Security Analysis

STAC demonstrates execution-aware tool-jumping as a multi-turn, closed-loop pipeline for attack generation, verification, and stealth induction (Li et al., 30 Sep 2025):

  • Generator $G$: Synthesizes candidate tool chains using planner-style prompting, targeting specific failure modes (Agent-SafetyBench taxonomy: harmful content, incorrect parameters, blind trust, etc.).
  • Verifier $V$: Executes each chain in a sandbox, refines tool arguments, and ensures state transitions meet subgoal criteria.
  • Prompt Writer $W$: Reverse-engineers benign-seeming prompts that reliably elicit the intended verified sequence under the agent's learned policy.
  • Planner $P$: Interacts adaptively with the real agent, feeding constructed prompts, monitoring state, and recording whether final malicious actions are triggered.
  • Judge $J$: Scores each turn for harmlessness, goal progress, and helpfulness.
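
Read as a closed loop, the five roles compose into a single pass over candidate chains. The sketch below is schematic: the roles are passed in as callables because the paper does not publish a concrete interface for them.

```python
from typing import Callable, List, Tuple

def stac_pipeline(
    task: str,
    generate: Callable[[str], List[list]],        # G: propose candidate tool chains
    verify: Callable[[list], Tuple[bool, list]],  # V: sandbox-execute, refine arguments
    write_prompt: Callable[[list], str],          # W: benign-seeming elicitation prompt
    run_agent: Callable[[str], list],             # P: drive the real agent, collect trace
    judge: Callable[[dict], float],               # J: per-turn harmlessness/progress score
) -> List[tuple]:
    """One pass of the closed attack loop over the five roles (schematic)."""
    results = []
    for chain in generate(task):
        ok, refined = verify(chain)        # keep only chains that verify in the sandbox
        if not ok:
            continue
        prompt = write_prompt(refined)
        trace = run_agent(prompt)          # did the agent trigger the final action?
        results.append((refined, trace, [judge(turn) for turn in trace]))
    return results
```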

Attack Success Rate (ASR) over $N$ verified candidate chains is defined as

$$\mathrm{ASR} = \frac{\left|\{\, C_i : (a_1 \circ \cdots \circ a_T)(s_0) \text{ reaches the harmful target state} \,\}\right|}{N}.$$
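
Reusing the hypothetical `transition` and `is_harmful` helpers from the Section 1 sketch, the metric reduces to a fold over each chain:

```python
def attack_success_rate(chains, s0, transition, is_harmful):
    """ASR = |{C_i : composed chain from s_0 reaches the harmful state}| / N."""
    hits = 0
    for actions in chains:
        s = s0
        for a in actions:         # fold (a_1 ∘ ... ∘ a_T) over the start state
            s = transition(s, a)
        hits += int(is_harmful(s))
    return hits / len(chains) if chains else 0.0
```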

Taxonomic failures (premature execution, omission, misparameterization, constraint violation) occur frequently in tool-jumping sequences, revealing critical vulnerabilities in current agent architectures.

3. Policy Learning and Reinforcement Feedback for Tool-Jumping

Execution-aware tool-jumping requires agents to not only select tools but also decide when to invoke them and how to parameterize calls based on trial-and-error feedback. The TRICE framework instantiates this through a two-stage pipeline (Qiao et al., 2023):

  • Stage I: Behavior cloning, teaching the agent to imitate weakly supervised tool-usage labels (from ChatGPT) for both self-solved and tool-requiring queries.
  • Stage II: Reinforcement Learning with Execution Feedback (RLEF), ranking alternative completions by final post-execution accuracy. The loss combines a ranking hinge (higher for execution-accurate candidates) and a supervised term enforcing syntactic tool-call correctness.

Mathematically, the policy $\pi_\theta$ is optimized with:

$$L_{\mathrm{RLEF}} = \alpha\, C_{\mathrm{rank}} + L_{\mathrm{sft}},$$

where $C_{\mathrm{rank}}$ orders completions by execution outcome and $\alpha$ tunes ranking versus format sanity.
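
A minimal PyTorch sketch of this objective follows, assuming $k$ sampled completions per query with sequence log-likelihoods and binary execution-accuracy labels; the exact hinge formulation and `margin` value in TRICE may differ.

```python
import torch
import torch.nn.functional as F

def rlef_loss(logprobs, exec_correct, sft_logits, sft_targets,
              alpha=1.0, margin=0.1):
    """L_RLEF = alpha * C_rank + L_sft (schematic).

    logprobs:     (k,) sequence log-likelihoods of k sampled completions
    exec_correct: (k,) 1.0 if the completion was execution-accurate, else 0.0
    sft_logits:   (T, V) token logits for the reference (gold-format) completion
    sft_targets:  (T,)   reference token ids enforcing tool-call syntax
    """
    # Ranking hinge: every execution-accurate completion should score
    # at least `margin` higher than every inaccurate one.
    pos = logprobs[exec_correct > 0.5]
    neg = logprobs[exec_correct <= 0.5]
    if len(pos) and len(neg):
        c_rank = F.relu(margin - (pos.unsqueeze(1) - neg.unsqueeze(0))).mean()
    else:
        c_rank = logprobs.new_zeros(())
    # Supervised term keeping the tool-call format syntactically correct.
    l_sft = F.cross_entropy(sft_logits, sft_targets)
    return alpha * c_rank + l_sft
```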

Empirical results show that execution feedback dramatically improves selective tool usage and accuracy while eliminating unnecessary or erroneous calls, supporting both "self-solve" and adaptive tool invocation.

4. Execution-Aware Tool-Jumping in Robotics and Multi-modal Agents

SwitchVLA introduces execution-aware tool-jumping to robotic Vision-Language-Action models by segmenting demonstration trajectories into temporally grounded contact phases and dynamically modulating behavior mode—forward, rollback, or advance—conditioned on execution state and changing instructions (Li et al., 4 Jun 2025). The policy embeds multi-modal signals:

  • $s_t = (o_t, q_t, c^{\mathrm{pre}})$ for observation, proprioception, and contact feedback
  • $I = (l^{\mathrm{pre}}, l^{\mathrm{cur}})$ for instruction context

A transformer-based conditional decoder predicts the next action chunk $A_t$ and behavior mode $b_t$, supporting seamless mid-execution tool switches in response to user intent or environmental changes. Real-time switching and interpolation yield robust multi-tool chaining, with ablations showing success rates up to $96\%$ on mid-phase task switches compared to prior baselines ($8$–$11\%$). This supports both intentional and recovery-driven tool jumps, enabling unified control for long-horizon and multi-step tasks.
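
The mode-conditioned control flow can be pictured as below. This is a schematic only: the actual SwitchVLA decoder is a trained transformer, and the rollback handling shown here is an illustrative placeholder, not the paper's interpolation rule.

```python
from enum import Enum

class Mode(Enum):
    FORWARD = 0   # continue the current contact phase
    ROLLBACK = 1  # undo recent steps before switching tasks
    ADVANCE = 2   # skip ahead when the new instruction is already satisfied

def control_step(policy, obs, proprio, contact, prev_instr, curr_instr):
    """One execution-aware control step (schematic).

    `policy` stands in for the transformer decoder: it consumes the state
    s_t = (o_t, q_t, c_pre) and instruction pair I = (l_pre, l_cur) and
    returns an action chunk A_t plus a behavior mode b_t.
    """
    action_chunk, mode = policy(obs, proprio, contact, prev_instr, curr_instr)
    if mode is Mode.ROLLBACK:
        # Illustrative: retrace the current phase before the tool switch.
        action_chunk = list(reversed(action_chunk))
    return action_chunk, mode
```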

5. RL-Driven, Execution-Structured Tool-Jumping for Code Navigation

RepoNavigator streamlines tool-jumping for repository-level LLM agents by formalizing a single, execution-aware jump operation: at each agent turn $t$, the policy $\pi_\theta$ chooses either a reasoning action or a JSON-formatted tool call to jump to a symbol's definition within the codebase (Zhang et al., 24 Dec 2025). The jump tool invokes precise static analysis (e.g., Pyright's AST parsing, LEGB resolution), returning relevant code context as the observation stream.
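
The single jump action can be pictured as a JSON tool call dispatched to a static-analysis backend. The schema and field names below are assumptions for illustration, not RepoNavigator's published interface.

```python
import json

# Hypothetical JSON form of the single "jump" tool call.
example_call = '{"tool": "jump_to_definition", "symbol": "parse_config", "file": "app/loader.py"}'

def dispatch_jump(raw_call: str, resolve_definition) -> str:
    """Parse a JSON tool call and return the resolved code context.

    `resolve_definition` stands in for the static-analysis backend
    (e.g., Pyright-style AST parsing with LEGB scope resolution).
    """
    call = json.loads(raw_call)
    assert call["tool"] == "jump_to_definition"
    return resolve_definition(call["symbol"], call["file"])
```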

The RL setup optimizes sequence localization by maximizing expected reward:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[R(\hat{Y}, Y^{*}, \tau)\right],$$

where $R$ includes the DICE score for the ground-truth location and $S(\tau)$ for successful tool calls.
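
As a rough sketch, the reward can be read as a weighted sum of localization overlap and tool-call success; the DICE-over-location-sets reading and the `beta` weight are assumptions, since the paper's exact weighting is not reproduced here.

```python
def reward(pred_locs, gold_locs, tool_calls_ok, n_tool_calls, beta=0.1):
    """Schematic R = DICE(pred, gold) + beta * S(tau).

    pred_locs / gold_locs: sets of (file, line) locations
    tool_calls_ok:         number of jump calls that returned valid context
    """
    inter = len(pred_locs & gold_locs)
    dice = 2 * inter / (len(pred_locs) + len(gold_locs) or 1)
    s_tau = tool_calls_ok / n_tool_calls if n_tool_calls else 0.0
    return dice + beta * s_tau
```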

This execution-structured approach reduces error propagation and search scope, yielding higher success rates and matching real invocation flows compared to multi-tool or retrieval-based agents.

6. Cost and Latency-Aware Tool Planning in LLM Agents

CATP-LLM advances execution-aware tool-jumping via cost-aware planning, using a formal Tool Planning Language (TPL) to describe arbitrary branched (DAG-structured) tool chains and a cost-augmented offline RL algorithm (Decision Transformer-based) to optimize final plan quality (Wu et al., 2024). Each plan $\tau = (t_1, d_1, \ldots, t_n)$ yields task performance $P(\tau)$ and execution cost $C(\tau)$, with policy learning maximizing the expected net utility

$$\max_\pi \; \mathbb{E}_{x,\tau}\left[\, P(\tau) - \lambda\, C(\tau) \,\right]$$

Context embedding incorporates cost vectors, and training penalizes expensive intermediate expansions and rewards high-performance, low-cost completion. CATP-LLM achieves up to $30.2\%$ higher plan performance and $45.8\%$ lower cost than GPT-4 prompting, while guaranteeing $100\%$ valid plans.
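
The objective itself is simple to illustrate. Note that CATP-LLM trains a policy offline rather than enumerating plans at inference time, so the selection loop below (and the `lam` value) is only a sketch of what the learned policy optimizes.

```python
def plan_utility(plan, perf_fn, cost_fn, lam=0.5):
    """Net utility P(tau) - lambda * C(tau) for one TPL plan (schematic)."""
    return perf_fn(plan) - lam * cost_fn(plan)

def select_plan(candidate_plans, perf_fn, cost_fn, lam=0.5):
    """Pick the DAG-structured tool plan with the highest net utility."""
    return max(candidate_plans,
               key=lambda p: plan_utility(p, perf_fn, cost_fn, lam))
```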

7. Efficient Serving and Scheduling for Execution-Aware Tool-Jumping

Conveyor implements execution-aware tool-jumping with partial execution alongside ongoing LLM decoding (Xu et al., 2024). The system overlaps tool execution and GPU-based token generation, reducing end-to-end latency for multi-stage workloads. The central parser/plugin API enables streaming detection of tool calls as soon as enough LLM output is available; the scheduler non-blockingly polls tool processes, injecting outputs into the next LLM decoding pass.
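
The key mechanism is detecting a completed tool call in the token stream before decoding finishes. The sketch below uses an invented `<tool>...</tool>` call syntax purely for illustration; Conveyor's actual parser/plugin interface differs.

```python
import re

# Hypothetical call syntax: fire a tool as soon as its call is fully emitted.
CALL = re.compile(r"<tool>(\w+)\((.*?)\)</tool>")

class StreamingToolParser:
    """Incrementally scan decoded tokens and surface tool calls early (schematic)."""
    def __init__(self):
        self.buf = ""
        self.dispatched = 0
    def feed(self, token: str):
        self.buf += token
        calls = CALL.findall(self.buf)
        new = calls[self.dispatched:]  # only calls not yet launched
        self.dispatched = len(calls)
        return new                     # launch these while decoding continues
```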

Formally, Conveyor's model bounds latency as:

$$L_{\mathrm{new}} = \sum_{i=1}^{n} \max\{g_i, t_i\} + g_{n+1}$$

where $g_i$ are LLM decoding times and $t_i$ tool execution times.
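
A tiny worked comparison against the sequential baseline $L_{\mathrm{old}} = \sum_i (g_i + t_i) + g_{n+1}$ makes the overlap benefit concrete:

```python
def latency_bounds(g, t):
    """Conveyor-style overlapped bound vs. sequential execution.

    g: decoding times g_1..g_{n+1}; t: tool times t_1..t_n (len(g) == len(t) + 1)
    """
    assert len(g) == len(t) + 1
    new = sum(max(gi, ti) for gi, ti in zip(g, t)) + g[-1]  # overlapped
    old = sum(g) + sum(t)                                   # sequential
    return new, old

# Equal decode/tool times give the best overlap: (3.0, 5.0).
print(latency_bounds([1.0, 1.0, 1.0], [1.0, 1.0]))
```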

Empirical speedups reach $38.8\%$ for planning workloads, with overhead under $1\%$ and plugin adaptation requiring minimal code. The system's efficiency depends on matching tool and decode latency; future work will involve richer dependency tracking and predictive scheduling.


Execution-aware tool-jumping constitutes a broad and rapidly evolving methodological substrate for reliable, efficient, and secure LLM agent operation across planning, coding, robotics, and serving. Advances in sequential chain reasoning, joint policy adaptation, cost/performance tradeoff optimization, and system-level scheduling have enabled agents to reason over extended execution traces, adapt to real-time context, and orchestrate multi-tool pipelines with robust safety and task fidelity. Continued research targets scalability, cross-language generalization, dynamic scheduling, and defense against emergent chain vulnerabilities.
