AgentReuse: Reusing Agent Artifacts in AI Systems

Updated 4 July 2026

AgentReuse is the systematic reuse of prior agent artifacts—such as plans, trajectories, and skills—to streamline repeated reasoning and improve task efficiency.
It employs key algorithmic patterns including behavioral validation, abstraction with parameterization, and selective replay to ensure reusable components are accurate and robust.
Reported results include a 93% effective plan reuse rate for plan reuse in LLM-driven personal assistants and roughly 80% lower retrieval latency for per-agent RAG caches.

AgentReuse denotes the systematic reuse of prior agent artifacts—plans, trajectories, workflows, cached knowledge, KV states, reusable skills, user preferences, and sub-agent configurations—to reduce repeated reasoning, amortize expensive inference, and improve reliability or generalization on recurring or structurally related tasks. In the cited literature, the term names both a concrete plan-reuse mechanism for LLM-driven assistants and a broader design pattern spanning exploit reproduction, GUI and web automation, Retrieval-Augmented Generation, agentic reinforcement learning, long-lived personalization, and orchestration (Li et al., 24 Dec 2025, Chen et al., 2 Jul 2026, Li, 16 Jun 2026).

1. Conceptual scope and research lineage

A recurring premise across the literature is that agent execution is rarely a one-shot process. ReAct-style systems re-enter the model across many turns, tool-using systems solve families of similar tasks, and multi-agent systems repeatedly process overlapping contexts. AgentReuse addresses this redundancy by preserving some representation of prior successful or partially successful work and reapplying it under controlled conditions, rather than recomputing from scratch. In personal assistants, this appears as plan reuse conditioned on intent classification and template similarity; in exploit reproduction, it appears as trajectory repair; in GUI automation, as reusable state machines, RPA functions, and guarded workflows; in RAG systems, as compact per-agent caches; and in agentic RL, as reusable skill dictionaries (Li et al., 24 Dec 2025, Chen et al., 2 Jul 2026, Chen et al., 20 May 2026, Lin et al., 4 Nov 2025, Xu et al., 29 May 2026).

The idea predates contemporary LLM agents. “Automatic Reuse, Adaption, and Execution of Simulation Experiments via Provenance Patterns” formalized automatic reuse through provenance graphs, provenance patterns, and rule-driven graph transformation in the RASE framework, allowing prior simulation experiments to be identified, adapted, executed, and re-recorded across studies (Wilsdorf et al., 2021). Later work shifts the reuse substrate from explicit experiment specifications to agent-generated artifacts such as traces, plans, and caches, but preserves the same core intuition: prior computation should be treated as a reusable asset rather than disposable intermediate state.

A further unifying development is the move from static roles to compositional agent specifications. AOrchestra models an agent as the tuple $A := (I, C, T, M)$, where instruction, context, tools, and model can each be re-instantiated on demand, making reuse a problem of dynamic concretization rather than fixed agent design (Ruan et al., 3 Feb 2026). This suggests that AgentReuse is not a single algorithmic family, but an architectural principle for separating reusable structure from task-specific variation.
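The tuple view can be illustrated with a minimal sketch. The class name `AgentSpec`, the helper `reinstantiate`, and all field values below are hypothetical and are not taken from AOrchestra; the sketch only shows how a retry can reuse most of a specification while swapping individual components.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical container mirroring the tuple (I, C, T, M)."""
    instruction: str          # I
    context: Tuple[str, ...]  # C
    tools: Tuple[str, ...]    # T
    model: str                # M


def reinstantiate(base: AgentSpec, **overrides) -> AgentSpec:
    """Return a new spec that reuses every field of `base` not overridden."""
    return replace(base, **overrides)


# Illustrative values only.
base = AgentSpec(
    instruction="Summarize the failing test log",
    context=("repo: demo", "branch: main"),
    tools=("read_file", "run_tests"),
    model="small-model",
)

# A second attempt keeps instruction and context, narrows the tool subset,
# and switches the model.
retry = reinstantiate(base, tools=("read_file",), model="large-model")
```

Making the specification immutable means each attempt is a distinct, inspectable value, which is one plausible way to keep the reusable structure separate from per-attempt variation.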

2. Reuse targets and formal abstractions

The literature reuses several distinct object types. In plan-centric systems, AgentReuse defines requests $R$, intents $I$, and plans $P$, with reuse permitted when two requests share an intent and the cosine similarity between their parameter-stripped templates exceeds a threshold $\gamma$; the default threshold is $\gamma = 0.75$ (Li et al., 24 Dec 2025). ReUseIt formalizes a reusable web workflow as $W = (S, A, T, G, \Pi)$, a guarded, parameterized execution graph whose steps carry preconditions, postconditions, and fallback repairs (Liu et al., 16 Oct 2025). PreAct compiles a successful computer-use run into a state-machine program whose states check the screen and whose transitions act directly, enabling guarded replay without per-step language-model calls (Li, 16 Jun 2026). AutoRPA distills ReAct trajectories into reusable RPA functions through a translator–builder pipeline and hybrid repair loop (Chen et al., 20 May 2026).
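The intent-gated similarity test for plan reuse can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: a bag-of-words cosine stands in for whatever embedding model the system uses, and the store layout, function names, and example plans are hypothetical. Only the shared-intent condition and the default threshold of 0.75 come from the text.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

GAMMA = 0.75  # default similarity threshold reported in the text

# Hypothetical store layout: intent -> list of (template, plan) pairs.
PlanStore = Dict[str, List[Tuple[str, List[str]]]]


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity over token counts (a stand-in for embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def find_reusable_plan(intent: str, template: str, store: PlanStore,
                       gamma: float = GAMMA) -> Optional[List[str]]:
    """Return the most similar cached plan with the same intent, or None.

    Reuse requires (1) an identical intent and (2) similarity strictly
    above gamma between the parameter-stripped templates.
    """
    best_plan, best_sim = None, gamma
    for cached_template, plan in store.get(intent, []):
        sim = bow_cosine(template, cached_template)
        if sim > best_sim:
            best_plan, best_sim = plan, sim
    return best_plan


store: PlanStore = {
    "book_flight": [
        ("book a flight from <city> to <city> on <date>",
         ["search_flights", "select_fare", "pay"]),
    ],
}

hit = find_reusable_plan(
    "book_flight", "book me a flight from <city> to <city> on <date>", store)
miss = find_reusable_plan(
    "book_flight", "find the cheapest flight next week", store)
```

In this sketch `hit` is the cached plan, while `miss` is `None` because the template similarity falls below the threshold even though the intent matches.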

Trajectory-centric reuse is explicit in Refploit. An initial exploit-generation run is represented as $T_{\text{init}} = (S_1, S_2, \ldots, S_n)$, and reuse occurs through trajectory segmentation, progress analysis, and constraint-oriented recovery. Failed trajectories are not discarded; instead, they are mined for completed subtasks, missing requirements, and misleading directions to avoid (Chen et al., 2 Jul 2026). In ReuseRL, the reused object is not a literal trace but a compressed skill dictionary $\mathcal{C}$ over a projected skill alphabet $\Sigma$, with per-trajectory segmentation cost acting as the reuse-sensitive regularizer (Xu et al., 29 May 2026).
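One way to make a per-trajectory segmentation cost concrete is to count the minimum number of dictionary phrases needed to cover a trajectory. The sketch below is an assumption-laden illustration: the cost definition, the fallback to single symbols, and the example dictionary are hypothetical and do not reproduce ReuseRL's actual objective.

```python
from typing import Sequence, Set, Tuple

Phrase = Tuple[str, ...]


def segmentation_cost(trajectory: Sequence[str],
                      dictionary: Set[Phrase]) -> int:
    """Minimum number of segments needed to cover `trajectory`.

    A segment is either a multi-symbol phrase from `dictionary` or a single
    symbol (always allowed, so every trajectory can be segmented).
    Solved by dynamic programming over prefixes.
    """
    n = len(trajectory)
    max_len = max((len(p) for p in dictionary), default=1)
    best = [0] + [n + 1] * n  # best[i]: min segments covering trajectory[:i]
    for i in range(1, n + 1):
        best[i] = best[i - 1] + 1  # fall back to a single-symbol segment
        for length in range(2, min(max_len, i) + 1):
            if tuple(trajectory[i - length:i]) in dictionary:
                best[i] = min(best[i], best[i - length] + 1)
    return best[n]


# Hypothetical skill phrases over a small action alphabet.
skills = {("goto", "open"), ("take", "goto", "put")}

compressible = ["goto", "open", "take", "goto", "put"]
idiosyncratic = ["open", "goto", "put", "take", "take"]

cost_a = segmentation_cost(compressible, skills)   # two phrases cover it
cost_b = segmentation_cost(idiosyncratic, skills)  # no phrase applies
```

Under this definition the first trajectory costs 2 and the second costs 5, so a regularizer built on the cost would favor the trajectory that decomposes into known skills.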

Knowledge and state reuse take a different form. ARC defines a per-agent RAG cache under a capacity constraint and prioritizes items via demand- and geometry-aware scoring, combining Distance–Rank Frequency, hubness, and a size penalty (Lin et al., 4 Nov 2025). LRAgent decomposes KV cache reuse in multi-LoRA systems into a shared base component from pretrained weights and a low-rank adapter-dependent component, with BaseShared and BaseLRShared corresponding to different degrees of reusable cache sharing (Jeon et al., 1 Feb 2026). ATRBench frames reusable user information as latent standing rules, acquired during learning sessions and later applied in offline test sessions through Ask-to-Remember decisions (Wu et al., 27 May 2026).
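The base-plus-low-rank split follows from ordinary LoRA algebra. The derivation below is a sketch in generic notation, not LRAgent's own: $X$ is the input to a single key projection, $W_K$ the pretrained weight, and $A_i$, $B_i$ the down- and up-projection of adapter $i$ with rank $r \ll d$. All of these symbols are assumptions introduced for illustration.

```latex
% Assumed shapes: X \in R^{n \times d}, W_K \in R^{d \times d_k},
%                 A_i \in R^{d \times r}, B_i \in R^{r \times d_k}.
\begin{align*}
K_i &= X \,(W_K + A_i B_i) \\
    &= \underbrace{X W_K}_{\text{base, adapter-independent}}
     + \underbrace{(X A_i)\, B_i}_{\text{low-rank, adapter-dependent}}
\end{align*}
% For the same X, the base term X W_K (n \times d_k) is identical for every
% adapter, while the adapter-specific part is determined by X A_i (n \times r).
```

For a fixed $X$, the base term can therefore be computed once and shared, while each adapter contributes only an $n \times r$ intermediate. The sketch covers one projection in one layer and ignores the fact that inputs to deeper layers may themselves depend on the adapter.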

A concise synthesis of these abstractions is given below.

Reuse target	Representative formalization	Representative papers
Plans and templates	Intent-gated plan retrieval with threshold $\gamma$	(Li et al., 24 Dec 2025)
Workflows and programs	Guarded workflow $W = (S, A, T, G, \Pi)$; compiled state machine	(Liu et al., 16 Oct 2025, Li, 16 Jun 2026)
Trajectories and traces	$T_{\text{init}} = (S_1, S_2, \ldots, S_n)$ with repair constraints	(Chen et al., 2 Jul 2026)
Knowledge and KV state	Capacity-constrained per-agent cache; base/LR KV decomposition	(Lin et al., 4 Nov 2025, Jeon et al., 1 Feb 2026)
Skills and sub-agents	Skill dictionary $\mathcal{C}$; tuple $(I, C, T, M)$	(Xu et al., 29 May 2026, Ruan et al., 3 Feb 2026)
User preferences	Hidden standing-rule pool in cross-session evaluation	(Wu et al., 27 May 2026)

3. Core algorithmic patterns

A first common pattern is behavioral validation before reuse. Refploit validates an exploit by differential execution on vulnerable and patched versions and only accepts a trajectory when the intended behavior is observed on the vulnerable version and blocked on the patched version (Chen et al., 2 Jul 2026). PreAct applies a verify-before-store gate: a freshly compiled state machine enters the store only if, when replayed from a clean state, it both reaches a terminal state and passes an independent evaluator (Li, 16 Jun 2026). ReUseIt attaches pre/post execution guards to every critical workflow step and escalates only after repairs are exhausted (Liu et al., 16 Oct 2025). ReuseDroid uses action-level feedback, test-level reflection, and an explicit stop-condition check to validate migrated GUI tests against observed UI state changes (Li et al., 3 Apr 2025). Across these systems, reuse is grounded in external behavior rather than surface syntactic similarity.
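Both validation gates can be sketched in a few lines. The names `differential_check`, `VerifiedStore`, and `ReplayResult`, as well as the stub runners and evaluator, are hypothetical; real systems would execute exploits in sandboxes or replay programs against a live environment rather than call simple functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def differential_check(run_on_vulnerable: Callable[[str], bool],
                       run_on_patched: Callable[[str], bool],
                       artifact: str) -> bool:
    """Accept only if the behavior appears on the vulnerable version
    and is blocked on the patched version."""
    return run_on_vulnerable(artifact) and not run_on_patched(artifact)


@dataclass
class ReplayResult:
    reached_terminal: bool
    final_state: str


@dataclass
class VerifiedStore:
    """Store that admits an artifact only after a successful replay."""
    items: Dict[str, str] = field(default_factory=dict)

    def add_if_verified(self, key: str, artifact: str,
                        replay: Callable[[str], ReplayResult],
                        evaluator: Callable[[str], bool]) -> bool:
        # `replay` is assumed to start from a clean state.
        result = replay(artifact)
        if result.reached_terminal and evaluator(result.final_state):
            self.items[key] = artifact
            return True
        return False


# Stub environment: both programs finish, but only one ends in the
# state the independent evaluator accepts.
def stub_replay(program: str) -> ReplayResult:
    state = "file_saved" if program == "good_program" else "wrong_dialog"
    return ReplayResult(reached_terminal=True, final_state=state)


def stub_evaluator(final_state: str) -> bool:
    return final_state == "file_saved"


store = VerifiedStore()
stored_good = store.add_if_verified("save", "good_program",
                                    stub_replay, stub_evaluator)
stored_bad = store.add_if_verified("save_v2", "bad_program",
                                   stub_replay, stub_evaluator)

# An exploit that also "works" on the patched version is rejected.
genuine = differential_check(lambda e: True, lambda e: False, "poc")
spurious = differential_check(lambda e: True, lambda e: True, "poc")
```

The second program illustrates why reaching a terminal state is not enough on its own: it completes the replay but is kept out of the store because the evaluator rejects its final state.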

A second pattern is abstraction and parameterization. AgentReuse strips slots from requests to produce reusable plan templates before embedding and retrieval (Li et al., 24 Dec 2025). ReuseDroid distills source tests into a reusable “test skeleton” containing core logic rather than full operational logic, then adapts the target exploration to differing UI flows (Li et al., 3 Apr 2025). AutoRPA converts hard-coded selectors into soft-coded procedures anchored by semantic attributes such as text, content description, editable state, and target description, while lifting task constants into parameters (Chen et al., 20 May 2026). ReUseIt similarly introduces parameter slots so that guarded workflows can be rebound across websites, categories, and attribute variations (Liu et al., 16 Oct 2025). These mechanisms all separate invariant task structure from instance-specific arguments.
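Slot stripping can be illustrated with a small regex-based sketch. The slot types, patterns, and the function name `strip_slots` are assumptions chosen for the example; the cited systems may extract parameters with a language model or a trained tagger instead.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical slot patterns, applied in order (most specific first so that
# digits inside dates and times are not captured as plain numbers).
SLOT_PATTERNS = [
    ("date", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("time", re.compile(r"\b\d{1,2}:\d{2}\b")),
    ("number", re.compile(r"\b\d+\b")),
]


def strip_slots(request: str) -> Tuple[str, Dict[str, List[str]]]:
    """Replace instance-specific values with placeholders.

    Returns the parameter-stripped template and the extracted values,
    grouped by slot type.
    """
    params: Dict[str, List[str]] = {}
    template = request
    for name, pattern in SLOT_PATTERNS:
        found = pattern.findall(template)
        if found:
            params[name] = found
            template = pattern.sub(f"<{name}>", template)
    return template, params


template_a, params_a = strip_slots("Book a table for 4 on 2026-07-04 at 19:30")
template_b, params_b = strip_slots("Book a table for 2 on 2026-08-15 at 20:00")
```

Both requests reduce to the template `Book a table for <number> on <date> at <time>`, so a plan cached for one can be retrieved for the other and rebound with the second request's parameters.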

A third pattern is selective replay combined with repair or fallback. Refploit replays reliable trajectory prefixes, preserves correct subtasks, repairs incomplete ones, and imposes avoidance constraints on misleading directions (Chen et al., 2 Jul 2026). AutoRPA resumes execution from a breakpoint with ReAct-based fallback, concatenates code and recovery traces into a hybrid trajectory, and rebuilds the reusable function (Chen et al., 20 May 2026). PreAct verifies each state before acting and hands control back to the full computer-using agent at the first mismatch (Li, 16 Jun 2026). ARC first serves queries from a compact cache and escalates to the full corpus only when mean in-cache distance exceeds a threshold (Lin et al., 4 Nov 2025). AOrchestra similarly delegates subtasks to sub-agents and can alter instruction, context, tool subset, or model choice on subsequent attempts (Ruan et al., 3 Feb 2026).
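The verify-then-act loop with hand-off at the first mismatch can be sketched as below. The step representation, the `guarded_replay` function, and the toy screen states are hypothetical; the sketch only mirrors the control flow described above, not any system's actual program format.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
# Each step pairs a guard (does the observed state match?) with an action.
Step = Tuple[Callable[[State], bool], Callable[[State], None]]


def guarded_replay(steps: List[Step], state: State,
                   fallback: Callable[[State, int], None]) -> int:
    """Replay steps while guards hold; hand off at the first mismatch.

    Returns the number of steps replayed. On a mismatch, `fallback` receives
    the current state and the index of the step that could not be replayed.
    """
    for index, (guard, action) in enumerate(steps):
        if not guard(state):
            fallback(state, index)
            return index
        action(state)
    return len(steps)


# Toy program: home -> settings -> wifi.
steps: List[Step] = [
    (lambda s: s["screen"] == "home", lambda s: s.update(screen="settings")),
    (lambda s: s["screen"] == "settings", lambda s: s.update(screen="wifi")),
]

handoffs: List[int] = []


def record_handoff(state: State, index: int) -> None:
    """Stand-in for returning control to the full agent."""
    handoffs.append(index)


clean = {"screen": "home"}
replayed_clean = guarded_replay(steps, clean, record_handoff)

drifted = {"screen": "login"}  # unexpected screen, e.g. a session timeout
replayed_drifted = guarded_replay(steps, drifted, record_handoff)
```

Checking the guard before every action keeps a stale program from acting on a screen it was not compiled for, at the cost of one state check per step.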

A fourth pattern is prioritization under resource constraints. ARC optimizes cache composition through a has-answer objective under finite storage (Lin et al., 4 Nov 2025). AccMER reuses a top-weighted subset of transitions for a fixed window of steps to improve cache locality while mixing in fresh transitions for diversity (Gogineni et al., 2023). ReuseRL penalizes successful trajectories that require many phrase segments under a learned dictionary, thereby favoring structurally compressible behaviors over idiosyncratic shortcuts (Xu et al., 29 May 2026). LRAgent shares the semantically invariant portion of the KV cache and keeps adapter-specific effects in low-rank form, reducing both memory and compute overhead in multi-LoRA multi-agent systems (Jeon et al., 1 Feb 2026). The common principle is that reuse must be selective and budget-aware; indiscriminate retention is not the objective.
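Budget-aware selection can be illustrated with a greedy sketch that ranks items by a demand score minus a size penalty and admits them while capacity remains. The scoring formula, the `Item` fields, and the numbers are assumptions made for the example; this is not ARC's scoring function, which the text describes only at a high level.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    key: str
    demand: float  # hypothetical usefulness score (higher is better)
    size: int      # storage cost in arbitrary units


def select_cache(items: List[Item], capacity: int,
                 size_penalty: float = 0.0) -> List[str]:
    """Greedily fill a cache of `capacity` units.

    Items are ranked by `demand - size_penalty * size`; an item is admitted
    if it still fits. This is a heuristic, not an optimal knapsack solution.
    """
    ranked = sorted(items,
                    key=lambda it: it.demand - size_penalty * it.size,
                    reverse=True)
    chosen, used = [], 0
    for item in ranked:
        if used + item.size <= capacity:
            chosen.append(item.key)
            used += item.size
    return chosen


items = [
    Item("a", demand=9.0, size=7),
    Item("b", demand=6.0, size=2),
    Item("c", demand=5.0, size=2),
    Item("d", demand=2.0, size=1),
]

without_penalty = select_cache(items, capacity=7)
with_penalty = select_cache(items, capacity=7, size_penalty=1.0)
```

Without the penalty the single large item fills the whole budget; with it, three smaller items are kept instead, which shows how a size term changes cache composition under a fixed capacity.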

4. Empirical performance across domains

The empirical record shows that AgentReuse can improve latency, throughput, success rate, or generalization, depending on the substrate being reused. In LLM-driven personal assistants, AgentReuse achieves a 93% effective plan reuse rate, an $F_1$ score of 0.9718, and an accuracy of 0.9459 in request-similarity evaluation, reducing latency by 93.12% relative to baselines without reuse (Li et al., 24 Dec 2025). ARC reduces storage to 0.015% of the original corpus, reaches up to 79.80% has-answer rate, and lowers retrieval latency by about 80% on large-scale Wikipedia-backed QA (Lin et al., 4 Nov 2025). In multi-LoRA systems, BaseLRShared in LRAgent attains throughput and time-to-first-token close to FullShared while preserving accuracy near the non-shared baseline; at total length 66.4k on LLaMA-3.1-8B, FullShared achieves 1826.5 toks/s and BaseLRShared 1790.6 toks/s, whereas Non-Shared runs out of memory (Jeon et al., 1 Feb 2026).

In automation and interaction-heavy settings, the gains are similarly pronounced. ReuseDroid successfully migrates 90.3% of 578 GUI test-migration tasks and outperforms the best mapping-based and LLM-based baselines by 318.1% and 109.1%, respectively (Li et al., 3 Apr 2025). ReUseIt increases success across fifteen web tasks from 24.2% to 70.1%, and in follow-up runs reaches 86.5% with reusable workflows and 88.6% with user edits (Liu et al., 16 Oct 2025). AutoRPA generates reusable RPA functions that reduce token usage by 82% to 96%; on AndroidWorld with GPT-5, success rises from 74.1% for ReAct to 75.9% for AutoRPA, while tokens drop from 142.5k to 30.6k (Chen et al., 20 May 2026). PreAct replays stored state-machine programs 8.5–13× faster than solving from scratch, and its store-time verification gate contributes a marginal gain of 2.6 tasks on AndroidWorld, 2.6 on OSWorld, and 1.75 on WebArena relative to gate-off comparisons (Li, 16 Jun 2026).

Security, learning, and orchestration results extend the same pattern. Refploit reproduces 138 of 172 exploits, corresponding to an 80.2% reproduction rate, and improves 64.3% over initially generated exploit trajectories under DeepSeek V4-Flash (Chen et al., 2 Jul 2026). ReuseRL improves both in-distribution and out-of-distribution results, for example reaching 97.14 and 93.28 on ALFWorld IID and OOD, compared with 84.29 and 79.85 for GRPO (Xu et al., 29 May 2026). AOrchestra achieves a 16.28% relative improvement over the strongest baseline when paired with Gemini-3-Flash, and its context-sharing ablation shows curated context outperforming both no-context and full-context variants (Ruan et al., 3 Feb 2026). These results suggest that the benefits of reuse are not confined to inference efficiency; they also extend to structured recovery, decomposition quality, and policy generalization.

5. Failure modes, limits, and disputed assumptions

The most persistent misconception is that any apparent reuse success is genuine task success. Refploit shows that generated exploits may “appear” successful while never traversing the vulnerable library path, for example by replacing the vulnerable API with a self-implemented function or demonstrating the effect with unrelated primitives (Chen et al., 2 Jul 2026). PreAct documents an analogous failure mode in computer use: a replayed program can achieve 100% replay coverage yet still fail the evaluator because the final state is wrong, a case the paper characterizes as “lossy replay” (Li, 16 Jun 2026). ReUseIt and AutoRPA similarly note brittleness under dynamic UI changes, pop-ups, selector drift, and timing variance (Liu et al., 16 Oct 2025, Chen et al., 20 May 2026). In all of these cases, naive reuse without behavioral validation accumulates incorrect artifacts.

A second major limit concerns cache sharing and judge-centric inference. LRAgent shows that sharing the full cache across multi-LoRA agents erases adapter-specific information and degrades accuracy more than BaseShared or BaseLRShared (Jeon et al., 1 Feb 2026). An even stronger warning appears in “When KV Cache Reuse Fails in Multi-Agent Systems,” which finds that reuse strategies effective for execution agents can severely perturb LLM judges. End-task accuracy may remain stable, yet Judge Consistency Rate drops sharply, especially under candidate shuffling, because reuse weakens cross-candidate attention and later candidate blocks become under-attended (Liang et al., 13 Jan 2026). This establishes that judge-side reuse is a distinct regime: preserving answer accuracy is not sufficient if candidate selection itself becomes non-invariant.

A third limitation is that reuse depends on the quality and completeness of the reusable substrate. ARC notes non-stationarity, embedding updates, and per-agent specialization as sources of cache staleness or interference (Lin et al., 4 Nov 2025). ATRBench shows that long-lived agents usually fail not at recall or application, but at proactive acquisition: defaults across eight frontier agents remain at least 62 points below an oracle given the relevant preference, and prompting closes little of that gap (Wu et al., 27 May 2026). ReuseDroid reports difficulties with ambiguous UI semantics and non-intuitive gestures such as swipe-to-delete (Li et al., 3 Apr 2025). RASE depends on complete and semantically meaningful provenance; missing or coarse-grained provenance weakens pattern matching and adaptation (Wilsdorf et al., 2021). These results constrain the scope of AgentReuse: it is only as reliable as the artifact being reused and the mechanism used to validate or reinterpret it.

6. Broader implications and future directions

Taken together, these works suggest that AgentReuse is evolving from ad hoc caching toward explicit, typed, and behavior-validated reusable artifacts. The reusable object may be a plan template, workflow, trajectory segment, skill phrase, standing preference, sub-agent tuple, or low-rank cache component, but in each case the trend is toward making reuse first-class and inspectable rather than implicit in opaque model state. This is most explicit in AOrchestra’s tuple $(I, C, T, M)$, ReUseIt’s guarded workflow units, PreAct’s executable state machines, and Refploit’s preservation/repair/avoidance constraints (Ruan et al., 3 Feb 2026, Liu et al., 16 Oct 2025, Li, 16 Jun 2026, Chen et al., 2 Jul 2026).

Several future directions are already articulated in the papers. ARC proposes adaptive cache size, online learning of its hyperparameters, session-aware extensions, and hierarchical shared-plus-private caches (Lin et al., 4 Nov 2025). ReuseRL supplies a PAC-Bayes generalization bound for MDL-grounded skill reuse and points toward extensions beyond text-only agentic benchmarks (Xu et al., 29 May 2026). ATRBench motivates value-of-information-like policies for proactive preference acquisition and typed memory schemas aligned to tool-action surfaces (Wu et al., 27 May 2026). ReUseIt proposes cross-site workflow-unit libraries, temporal-logic or finite-state verification of guards, and multi-signal verification combining screenshots, DOM diffs, and network logs (Liu et al., 16 Oct 2025). LRAgent points to richer KV sharing under architectural constraints such as multi-LoRA variants in which one component is shared across adapters (Jeon et al., 1 Feb 2026).

The domain boundary is also widening. Refploit explicitly argues that trajectory repair can transfer beyond exploit reproduction to program synthesis, software build, data wrangling, ETL, and robotics, where failed attempts often encode partial progress that should be preserved rather than discarded (Chen et al., 2 Jul 2026). ReuseDroid extends core-logic migration ideas beyond Android to web and desktop UIs (Li et al., 3 Apr 2025). RASE demonstrates that reuse can be mediated through provenance rather than direct agent traces, suggesting a path for integrating agentic systems with scientific workflow infrastructure (Wilsdorf et al., 2021). A plausible implication is that AgentReuse is becoming a unifying systems concept for long-horizon automation: agents improve not only by reasoning better at each step, but by converting prior execution into reusable, validated structure that survives across turns, tasks, users, and agent configurations.