Frozen Agent Executor Architecture

Updated 8 May 2026

A Frozen Agent Executor is a fixed-parameter LLM or tool-execution module that deterministically processes reasoning traces and skill sequences without internal adaptation.
It enables externalized learning via trainable planners, curators, and adapters, ensuring secure, modular, and reproducible operations in multi-agent systems.
Empirical evaluations in frameworks like SkillOS and TraceLift show significant performance gains and robust, deterministic behavior in complex agentic pipelines.

A Frozen Agent Executor is a fixed-parameter agentic module, most commonly an LLM or tool-execution agent, deployed in multi-component AI systems to consume reasoning traces, skills, or action sequences and deterministically produce outputs, without any on-policy parameter adaptation or fine-tuning. This architecture underpins a suite of frameworks that aim for modularity, controlled learning, and reliable evaluation in agentic pipelines, spanning skill curation, autonomous tool invocation, planner-executor reasoning systems, and governance-enforced execution control. All learning or adaptation is externalized: the executor is paired with trainable planners, curators, adapters, or authorization governors. During training and deployment, the executor’s parameters remain fully frozen, which supports precise credit assignment and robust measurement of downstream component performance.

1. Formal Definitions and Core Architectures

A typical frozen agent executor can be formalized in three variants:

Frozen LLM Policy Over Markov Tasks

Let $\mathcal{E}_\phi$ denote a pre-trained LLM executor, with parameters $\phi$ frozen for all training and test-time operations. Agents work in episodic, partially observable Markov decision processes (POMDPs) with task $x$, observation $o_t$, history $\mathcal{H}_t$, and a skill repository $\mathcal{S}$ (textual, code, or tool definitions). At each step, the executor:

Retrieves a subset $\tilde{\mathcal{S}} = \mathrm{Retrieve}(x, \mathcal{S})$ (e.g., via BM25)
Assembles a prompt with $x$, $o_t$, $\mathcal{H}_t$, and $\tilde{\mathcal{S}}$
Generates textual action $a_t \sim \mathcal{E}_\phi(\cdot \mid x, o_t, \mathcal{H}_t, \tilde{\mathcal{S}})$

No updates to $\phi$ are permitted; only external modules (skill curators, adapters) evolve over time. The executor outputs state-action trajectories and correctness signals for downstream learning (Ouyang et al., 7 May 2026).
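The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SkillOS implementation: the `FrozenExecutor` class, its `generate` callable, and the prompt layout are hypothetical, the word-overlap retriever is a toy stand-in for BM25, and the frozen dataclass only mirrors at the object level the idea that nothing inside the executor changes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)  # attributes cannot be reassigned after construction
class FrozenExecutor:
    """Illustrative stand-in for a frozen executor (hypothetical interface)."""

    generate: Callable[[str], str]  # assumed handle to the fixed-parameter LLM
    top_k: int = 2

    def retrieve(self, task: str, skills: Sequence[str]) -> List[str]:
        # Toy lexical retriever: rank skills by word overlap with the task.
        # This is a simplification standing in for BM25.
        task_words = set(task.lower().split())
        ranked = sorted(
            skills,
            key=lambda skill: len(task_words & set(skill.lower().split())),
            reverse=True,
        )
        return ranked[: self.top_k]

    def step(self, task: str, observation: str,
             history: Sequence[str], skills: Sequence[str]) -> str:
        # 1) retrieve skills, 2) assemble the prompt, 3) generate an action.
        selected = self.retrieve(task, skills)
        prompt = "\n".join([
            "Task: " + task,
            "Skills:\n" + "\n".join(selected),
            "History:\n" + "\n".join(history),
            "Observation: " + observation,
            "Action:",
        ])
        return self.generate(prompt)


# Usage with a canned "LLM" that always returns the same action.
executor = FrozenExecutor(generate=lambda prompt: "go to microwave 1")
skills = [
    "open the drawer then take item",
    "heat food with microwave",
    "clean object in sink",
]
action = executor.step(
    "heat the potato with the microwave", "you are in the kitchen", [], skills
)
```

Only the `skills` argument changes between calls; the executor object itself carries no mutable state, which is the property the external curator relies on.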

Planner–Frozen Executor Chains

In planner-executor architectures, e.g., TraceLift:

A trainable planner $\pi_\theta$ emits explicit reasoning traces $z$
The frozen executor $\mathcal{E}_\phi$ consumes $(x, z)$ (problem plus trace) and outputs a code artifact or answer
A verifier $V$ scores the result; only $\theta$ is trainable; $\phi$ is strictly immutable (Han et al., 5 May 2026)

This enforces a clear separation between improvable planning and reproducible execution.
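One way to read an executor-grounded reward is as the change in verifier score when the frozen executor is given the planner's trace, compared with running it on the problem alone. The sketch below illustrates that reading only; the function name, the callable signatures, and the no-trace baseline are assumptions made for illustration and are not taken from TraceLift.

```python
from typing import Callable, Optional

# Hypothetical interfaces: the executor maps (problem, trace) to an artifact,
# and the verifier maps (problem, artifact) to a score in [0, 1].
Executor = Callable[[str, Optional[str]], str]
Verifier = Callable[[str, str], float]


def executor_grounded_reward(problem: str, trace: str,
                             executor: Executor, verifier: Verifier) -> float:
    """Score improvement attributable to the trace, with the executor fixed."""
    with_trace = verifier(problem, executor(problem, trace))
    without_trace = verifier(problem, executor(problem, None))
    return with_trace - without_trace


# Toy components: the "executor" only answers correctly when the trace
# tells it to add; the "verifier" checks the answer to 2+2.
def toy_executor(problem: str, trace: Optional[str]) -> str:
    return "4" if trace and "add" in trace else "5"


def toy_verifier(problem: str, artifact: str) -> float:
    return 1.0 if artifact == "4" else 0.0


helpful = executor_grounded_reward("2+2", "add the two numbers",
                                   toy_executor, toy_verifier)
unhelpful = executor_grounded_reward("2+2", "guess",
                                     toy_executor, toy_verifier)
```

Because the executor is fixed, any difference between `helpful` and `unhelpful` can be credited to the trace, and therefore to the planner that produced it.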

Skill and Action-Centric Frozen Executors

In skill-based and workflow execution domains:

The executor wraps a fixed LLM (e.g., Claude Code Opus), parameterized by $\phi$, and a repository of skills (code, prompts, or Markdown files)
Skill invocation is triggered by matching or explicit tool calls; the executor runs generated code or scripts and integrates outputs into the agent's trajectory (Alzubi et al., 3 Mar 2026)

The executor is never fine-tuned; all adaptation arises from skill curation or external artifacts.

2. Executor Integration in Modular and Multi-Agent Systems

Frozen executors serve as the backbone in compositional agent stacks:

SkillOS: The executor is paired with an RL-trained skill curator. Executor outputs—task trajectories and success flags—furnish the environment for the curator’s learning. The curator updates the skill repository, indirectly dictating executor performance on future related tasks, but never updating executor weights (Ouyang et al., 7 May 2026).
TraceLift: The executor is the fixed interpreter of planner-emitted traces; the planner is trained using executor-grounded rewards which measure the improvement in executor output when consuming better reasoning traces (Han et al., 5 May 2026).
SEAM: A lightweight, plug-in, structured experience adapter generates guidance for a frozen LLM executor, trained purely to maximize utility for the underlying immutable LLM (Li et al., 30 Jan 2026).
Faramesh: The executor is placed behind an Action Authorization Boundary (AAB), accepting only authorized actions; it is “frozen” in operational semantics, blocking or deferring functions according to cryptographically-signed policy decisions, and embedding enforcement and audit guarantees into external infrastructure (Fatmi, 25 Jan 2026).
MoRAgent: The executor role is implemented as a mixture-of-experts LoRA network, with trainable per-role low-rank adapters and strictly frozen base model weights. The executor is solely responsible for generating function/tool call actions, and only the LoRA parameters for this role are updated (Han et al., 25 Dec 2025).

Table: Representative Executor Pairings

Framework	Executor Component	External Component/Learned Module
SkillOS	Frozen LLM (Qwen3-8B, etc.)	RL-trained skill curator
TraceLift	Frozen LLM executor	RL-trained planner
SEAM	Frozen LLM executor	Trained structured adapter
Faramesh	Tool executor (frozen)	Authorization governor/controller
MoRAgent	Frozen backbone + LoRA exec	Role-specific LoRA adapters
EvoSkill	Frozen code LLM	Skill folder repository

3. Training and Inference Protocols

In all settings, the executor's base parameters are never updated; adaptation is realized via:

Repository Evolution: SkillOS, EvoSkill, and similar frameworks evolve the skill repository or supporting files (Markdown, code) via skill curator or builder agents, but the executor remains unchanged. New tasks are solved by retrieving and applying evolved skills.
Trace or Guidance Consumption: Executors in TraceLift and SEAM consume explicit outputs (reasoning traces or structured experience) produced by external modules, which are trained on reward signals derived from executor performance rather than on gradients through the executor itself.
Token Routing and Adapters: In MoRAgent, executor-phase tokens are routed to low-rank LoRA adapters with all other model weights frozen, localizing adaptation to the executor’s code pathways.
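The adapter-based protocol can be illustrated with a small, dependency-free sketch of a linear layer whose base weight is immutable and whose low-rank update is selected per role. This follows the standard LoRA formulation (output = W x + scale * B A x); the class name, the role names, and the routing-by-string interface are hypothetical and do not describe MoRAgent's actual routing.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def matvec(matrix: Sequence[Sequence[float]], vector: Sequence[float]) -> Vector:
    # Plain matrix-vector product, kept dependency-free for illustration.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RoleRoutedLoRALinear:
    """Linear layer with an immutable base weight and per-role LoRA adapters."""

    def __init__(self, base_weight: Sequence[Sequence[float]],
                 roles: Sequence[str], rank: int = 1, scale: float = 1.0):
        d_out, d_in = len(base_weight), len(base_weight[0])
        # Tuples make the base weight read-only, mirroring the frozen backbone.
        self.base_weight = tuple(tuple(row) for row in base_weight)
        self.scale = scale
        # One trainable (A, B) pair per role; B starts at zero so every
        # adapter is initially a no-op, as in standard LoRA initialisation.
        self.adapters: Dict[str, Dict[str, List[List[float]]]] = {
            role: {
                "A": [[0.01] * d_in for _ in range(rank)],
                "B": [[0.0] * rank for _ in range(d_out)],
            }
            for role in roles
        }

    def forward(self, x: Sequence[float], role: str) -> Vector:
        adapter = self.adapters[role]  # route to the role's adapter
        base_out = matvec(self.base_weight, x)
        delta = matvec(adapter["B"], matvec(adapter["A"], x))
        return [b + self.scale * d for b, d in zip(base_out, delta)]


layer = RoleRoutedLoRALinear([[1.0, 0.0], [0.0, 1.0]],
                             roles=["planner", "executor"])
before = layer.forward([2.0, 3.0], role="executor")
layer.adapters["executor"]["B"][0][0] = 1.0  # stand-in for a training update
after = layer.forward([2.0, 3.0], role="executor")
planner_out = layer.forward([2.0, 3.0], role="planner")
```

Updating the executor's adapter changes only executor-routed outputs; the planner path and the base weight are unaffected.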

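At test time, the SkillOS executor applies the three steps defined in Section 1 at every step of an episode: it retrieves relevant skills from the repository for the task $x$ (e.g., via BM25), assembles a prompt from $x$, the current observation $o_t$, the history $\mathcal{H}_t$, and the retrieved skills, and generates a textual action. The parameters $\phi$ are not updated at any point, and the resulting trajectory and success flag are passed to the curator (Ouyang et al., 7 May 2026).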
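A test-time episode loop consistent with the description in this article might look like the following sketch. It is an illustration rather than the SkillOS pseudocode: the environment interface (`reset`, `step`), the `act` callable standing in for the frozen executor, and the toy environment are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, str]  # (observation, action)


class ToyEnv:
    """Hypothetical two-step environment used only to exercise the loop."""

    def reset(self, task: str) -> str:
        return "start"

    def step(self, action: str) -> Tuple[str, bool, bool]:
        # Returns (next_observation, done, success).
        if action == "finish":
            return "end", True, True
        return "saw key", False, False


def run_episode(task: str, skills: Sequence[str], env: ToyEnv,
                act: Callable[[str, str, List[Step], Tuple[str, ...]], str],
                max_steps: int = 10) -> Tuple[List[Step], bool]:
    """Roll out one episode; return the trajectory and a success flag."""
    frozen_skills = tuple(skills)  # read-only snapshot: the executor never edits it
    observation = env.reset(task)
    trajectory: List[Step] = []
    for _ in range(max_steps):
        action = act(task, observation, trajectory, frozen_skills)
        next_observation, done, success = env.step(action)
        trajectory.append((observation, action))
        observation = next_observation
        if done:
            return trajectory, success
    return trajectory, False  # step budget exhausted counts as failure


def scripted_policy(task, observation, history, skills) -> str:
    # Stand-in for the frozen executor: look once, then finish.
    return "look" if not history else "finish"


trajectory, success = run_episode(
    "find the key", ["look before acting"], ToyEnv(), scripted_policy
)
```

The returned pair is what an external curator would consume; nothing in the loop writes to the skill repository or to the policy.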

4. Experimental Impact and Limitations

Frozen agent executors consistently enable robust measurement and reliable improvement attribution across varied agent domains.

Notable Empirical Results

SkillOS: On ALFWorld, SkillOS raises average success rate (SR) from 47.9% (no memory) to 61.2% (frozen executor, RL-trained curator), a gain of 5.5 percentage points over the strongest static memory baseline, and reduces steps per episode by 10%. On Gemini-2.5-Pro, SR increases from 66.4% (no memory) to 80.2%; the reported +8.8-point gain is smaller than the 13.8-point gap to the no-memory setting, so it is presumably also measured against the strongest memory baseline (Ouyang et al., 7 May 2026).
TraceLift: Executor-grounded reward improves code pass@1 from 52.28% to 54.89% and math accuracy from 64.72% to 69.23%, both relative to exec-only RL (Han et al., 5 May 2026).
SEAM: Structured guidance via SEAM yields up to +9.7 percentage points on AIME24 (Qwen3-4B executor), with added robustness and transferability (23–28% relative boost on cross-executor tasks) (Li et al., 30 Jan 2026).
MoRAgent: Executor LoRA raises non-live function call execution success from 19.2% (base) to 80.0% (MoRAgent-Llama) on the BFCL (Han et al., 25 Dec 2025).
EvoSkill: Skill evolution under fixed executors improves OfficeQA accuracy from 60.6% to 67.9%, and SealQA accuracy from 26.6% to 38.7% (Alzubi et al., 3 Mar 2026).
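Because the results above mix absolute and relative figures, it can help to compute both from the reported before/after pairs. The short script below does only that arithmetic on numbers quoted in this section; the dictionary labels and the `gains` helper are illustrative names.

```python
# (before, after) scores in percent, copied from the results listed above.
reported = {
    "TraceLift code pass@1": (52.28, 54.89),
    "TraceLift math accuracy": (64.72, 69.23),
    "MoRAgent BFCL non-live execution": (19.2, 80.0),
    "EvoSkill OfficeQA": (60.6, 67.9),
    "EvoSkill SealQA": (26.6, 38.7),
}


def gains(before: float, after: float):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = round(after - before, 2)
    relative = round(100.0 * (after - before) / before, 1)
    return absolute, relative


for name, (before, after) in reported.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points absolute, +{relative}% relative")
```

The two measures can differ widely when the starting score is low, as in the MoRAgent case, so the unit matters when comparing frameworks.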

Limitations observed:

Executors cannot adapt their weights to new skills, causing potential skill-executor mismatch.
Relying on static retrieval (e.g., BM25) rather than learned retrievers can bottleneck skill usage (Ouyang et al., 7 May 2026).
In TraceLift and SEAM, overfitting to executor-specific idiosyncrasies or limited reward model coverage may diminish generality (Han et al., 5 May 2026, Li et al., 30 Jan 2026).
Faramesh-style frozen executors depend critically on the trustworthiness and liveness of the external authorization/control layer; the agent’s own “liveness” is mediated solely through its output artifacts (Fatmi, 25 Jan 2026).

5. Governance, Security, and Determinism

Frozen agent executors serve a crucial security and governance function in regulated, multi-tenant or safety-sensitive settings.

Authorization Boundaries: Systems such as Faramesh require all agent-emitted actions to pass through a canonicalization and signing stage (CAR + AAB) before execution. Frozen executors honor only valid, artifact-signed actions, guaranteeing non-bypassability and deterministic provenance (Fatmi, 25 Jan 2026).
Provenance Logging: Decided actions are recorded in an append-only, hash-chained ledger, recording for each action the canonical form, input policy/state, decision (PERMIT/DEFER/DENY), and cryptographic signature—enabling full audit and replay.
Determinism and Idempotence: Executing the same intent always yields the same outcome; actions deferred (“frozen”) due to policy must await an explicit RE-EVAL, lending strong predictability to the agent’s side-effect interface.
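The combination of canonicalization, signed decisions, and a hash-chained log can be sketched with the Python standard library. This is a simplified illustration, not Faramesh's design: the field names, the `DecisionLedger` class, and the use of a shared-key HMAC in place of whatever signature scheme the real system uses are all assumptions.

```python
import hashlib
import hmac
import json
from typing import Callable, Dict, List, Optional


def canonicalize(obj: Dict) -> str:
    # Sorted keys and fixed separators give one byte-stable form per action.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


class DecisionLedger:
    """Append-only log where each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self, key: bytes):
        self._key = key  # assumed shared secret; a stand-in for real signing
        self.entries: List[Dict[str, str]] = []

    def _seal(self, body: Dict[str, str]) -> Dict[str, str]:
        payload = canonicalize(body).encode()
        return dict(
            body,
            hash=hashlib.sha256(payload).hexdigest(),
            sig=hmac.new(self._key, payload, hashlib.sha256).hexdigest(),
        )

    def append(self, action: Dict, policy: str, decision: str) -> Dict[str, str]:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = self._seal({"action": canonicalize(action), "policy": policy,
                            "decision": decision, "prev": prev})
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("action", "policy", "decision", "prev")}
            expected = self._seal(body)
            if (entry["prev"] != prev or entry["hash"] != expected["hash"]
                    or not hmac.compare_digest(entry["sig"], expected["sig"])):
                return False
            prev = entry["hash"]
        return True


def execute_if_permitted(entry: Dict[str, str],
                         tool: Callable[[Dict], str]) -> Optional[str]:
    # The executor runs an action only when the recorded decision is PERMIT.
    if entry["decision"] != "PERMIT":
        return None
    return tool(json.loads(entry["action"]))


ledger = DecisionLedger(key=b"demo-key")
permit = ledger.append({"tool": "read", "path": "a.txt"}, "policy-v1", "PERMIT")
deny = ledger.append({"path": "b.txt", "tool": "delete"}, "policy-v1", "DENY")
ran = execute_if_permitted(permit, lambda a: "ran " + a["tool"])
blocked = execute_if_permitted(deny, lambda a: "ran " + a["tool"])
```

Editing any stored field afterwards makes `verify()` fail, either at the edited entry or at its successor, which is what makes the log tamper-evident for audit and replay.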

A plausible implication is that the “frozen” design pattern can be leveraged to provide deterministic, tamper-evident, and auditable agentic infrastructure irrespective of underlying model or protocol choices.

6. Extensions, Robustness, and Latency Optimization

Frozen executors are increasingly being extended for performance and usability:

Speculative Tool Execution: PASTE performs speculative execution of likely tool calls based on mined control/data-flow patterns, hiding up to 48.5% of tool latency and boosting throughput by 1.8×, all while respecting the executor’s frozen semantics (Sui et al., 19 Mar 2026).
Utility-Optimized Adapters: Small, externalized adapters (e.g., SEAM) trained with Group Relative Policy Optimization (GRPO) allow skillful guidance of frozen LLMs, with low latency and principled experience scaling (Li et al., 30 Jan 2026).
Hierarchical and Cross-Modal Extensions: Multi-level frozen executors are leveraged for cross-modal (e.g., vision-language) and hierarchical reasoning scenarios, with modular skill or plan grounding (Han et al., 5 May 2026).
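The speculative tool execution idea in the list above can be sketched as a prefetch cache: a predicted call is started in the background, and its result is reused only if the executor later issues exactly that call. This is a generic illustration, not PASTE's mechanism; the class name and interface are hypothetical, prediction itself is left out, and the sketch assumes the tools are free of side effects so that a wrong guess is harmless.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


class SpeculativeToolRunner:
    """Starts predicted tool calls early and reuses them on an exact match."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]):
        self._tools = tools
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._pending: Dict[Tuple[str, str], Future] = {}

    def speculate(self, name: str, arg: str) -> None:
        # Launch the predicted call in the background (assumed side-effect free).
        key = (name, arg)
        if key not in self._pending:
            self._pending[key] = self._pool.submit(self._tools[name], arg)

    def call(self, name: str, arg: str) -> str:
        # The executor's real request: reuse a matching speculation if present,
        # otherwise run the tool directly. The returned value is the same either way.
        future = self._pending.pop((name, arg), None)
        if future is not None:
            return future.result()
        return self._tools[name](arg)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


calls: List[str] = []


def search(query: str) -> str:
    calls.append(query)  # record every real invocation
    return "result for " + query


runner = SpeculativeToolRunner({"search": search})
runner.speculate("search", "weather")   # predicted next call
hit = runner.call("search", "weather")  # served from the speculation
miss = runner.call("search", "news")    # not predicted, executed directly
runner.close()
```

The executor's observable behavior is unchanged by speculation; only the waiting time for correctly predicted calls is reduced.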

A plausible implication is that the frozen agent executor paradigm scales to highly compositional, tool-rich, and regulated agentic deployments, provided that skill curation, policy control, or guide modules are engineered to maximize executor utility and efficiency.

7. Summary and Outlook

The frozen agent executor paradigm has established itself as a foundational abstraction for controllable, modular, and auditable agent architectures. Its role as an immutable, specification-respecting module enables:

Precise credit assignment to external learning components (curators, planners, adapters)
Robust evaluation unaffected by ongoing changes to execution logic
Secure, policy-compliant action enforcement and provenance auditing
Compositional assembly of multi-agent, multi-tool, and multi-modal reasoning pipelines

The continuing expansion of this architecture—spanning skill evolution, trust boundaries, experience adapters, and latency-hiding mechanisms—suggests a growing centrality in both research and practical agent deployments (Ouyang et al., 7 May 2026, Han et al., 5 May 2026, Li et al., 30 Jan 2026, Fatmi, 25 Jan 2026, Sui et al., 19 Mar 2026, Han et al., 25 Dec 2025, Alzubi et al., 3 Mar 2026).