Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents

Published 21 May 2026 in cs.AI | (2605.22166v1)

Abstract: LLM agents are shaped not only by their LLMs, but also by the runtime harness that mediates observation, tool use, action execution, feedback interpretation, and trajectory control. While existing agent adaptation methods mainly update model parameters, many failures in deterministic, rule-governed domains stem from mismatches at the model-environment interface. We propose Life-Harness, a lifecycle-aware runtime harness that improves frozen LLM agents without changing model weights or evaluation environments. Life-Harness evolves from training trajectories by converting recurring interaction failures into reusable interventions across environment contracts, procedural skills, action realization, and trajectory regulation, and remains fixed during held-out evaluation. On seven deterministic environments from $\tau$-bench, $\tau^2$-bench, and AgentBench, Life-Harness improves 116 out of 126 model-environment settings across 18 model backbones, with an average relative improvement of 88.5%. Harnesses evolved only from Qwen3-4B-Instruct trajectories transfer to 17 other models, showing that Life-Harness captures reusable environment-side structure rather than model-specific behavior. These results position runtime interface adaptation as a complementary alternative to model-centric agent training. Code is available at GitHub.

Authors (3)

Summary

The paper's main contribution is Life-Harness, a method that adapts the runtime harness instead of retraining model parameters.
The methodology uses four lifecycle layers (environment contract, procedural skill, action realization, and trajectory regulation) to calibrate, validate, and regulate LLM agent interactions, yielding an average relative improvement of 88.5%.
Experiments across 18 model backbones and seven deterministic environments show that a single evolved harness generalizes across models and complements existing model-centric techniques.

Interface Adaptation for Deterministic LLM Agents

The paper "Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents" (arXiv:2605.22166) argues for adapting LLM agents through the runtime harness that mediates the model's interactions with deterministic environments. Rather than modifying model parameters, the paper introduces Life-Harness, a method that evolves the runtime interface from training trajectories and then holds it fixed during evaluation, yielding substantial improvements in agent performance across a range of deterministic settings.

Introduction and Motivation

Agents powered by LLMs such as Qwen3-4B-Instruct are typically adapted by updating model parameters, for example through supervised fine-tuning or reinforcement learning. However, many failures in deterministic, rule-governed environments arise from mismatches at the model-environment boundary rather than from a deficiency in the model itself. The paper posits that adapting the runtime harness (the layers that mediate observation, tool use, action execution, feedback interpretation, and trajectory control) can deliver significant performance gains without altering the model weights.

Figure 1: An agent is not just an LLM; its behavior is shaped by the runtime harness.

Life-Harness Methodology

Life-Harness presents a structured approach to runtime adaptation, comprising four lifecycle layers that each address different phases of the agent-environment interaction:

Environment Contract Layer: Calibrates tool descriptions and interface constraints before interaction.
Procedural Skill Layer: Distills reusable procedures from training to guide task execution.
Action Realization Layer: Validates and canonicalizes model-generated actions before execution to ensure conformity with environment constraints.
Trajectory Regulation Layer: Monitors post-execution dynamics to rectify non-progressing patterns such as loops or invalid retries.

Figure 2: Overview of Life-Harness, detailing its multi-layer lifecycle approach.
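
The four layers above can be pictured as hooks around a frozen model. The following is a minimal Python sketch under stated assumptions, not the paper's implementation: the `Harness` and `Action` classes, the schema format (tool name mapped to required argument names), the lower-casing canonicalization rule, and the repeat threshold are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Action:
    tool: str
    args: dict[str, str]


@dataclass
class Harness:
    # Hypothetical contract format: tool name -> required argument names.
    schema: dict[str, set[str]]
    # Hypothetical skill format: short natural-language procedures.
    skills: list[str] = field(default_factory=list)
    # Assumed threshold: identical consecutive actions tolerated before intervening.
    max_repeats: int = 2
    history: list[tuple] = field(default_factory=list)

    def build_context(self, observation: str) -> str:
        """Contract and skill layers: prepend tool signatures and procedures."""
        contract = "; ".join(
            f"{tool}({', '.join(sorted(required))})"
            for tool, required in sorted(self.schema.items())
        )
        hints = " | ".join(self.skills)
        return f"TOOLS: {contract}\nSKILLS: {hints}\nOBS: {observation}"

    def realize(self, action: Action) -> tuple[Action | None, str]:
        """Action realization layer: canonicalize, then validate before execution."""
        tool = action.tool.strip().lower()
        args = {k.strip().lower(): v.strip() for k, v in action.args.items()}
        if tool not in self.schema:
            return None, f"unknown tool '{tool}'"
        missing = self.schema[tool] - set(args)
        if missing:
            return None, f"missing arguments: {sorted(missing)}"
        return Action(tool, args), "ok"

    def regulate(self, action: Action) -> str | None:
        """Trajectory regulation layer: flag an action repeated too many times in a row."""
        key = (action.tool, tuple(sorted(action.args.items())))
        self.history.append(key)
        recent = self.history[-(self.max_repeats + 1):]
        if len(recent) > self.max_repeats and len(set(recent)) == 1:
            return "loop detected: try a different action"
        return None
```

In this sketch the model is never touched: `build_context` shapes what it sees, `realize` decides whether its proposed action reaches the environment, and `regulate` returns a corrective message once the same action has been issued more than `max_repeats` times consecutively.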

Experimental Validation

The effectiveness of Life-Harness is evaluated on seven deterministic environments using 18 model backbones, giving 126 model-environment settings. Life-Harness improves 116 of these 126 settings, with an average relative improvement of 88.5%. These gains come from the harness alone: no model weights are changed, and a harness evolved only from Qwen3-4B-Instruct training trajectories is held fixed during held-out evaluation and reused across the other 17 backbones.
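
To make the headline numbers concrete, the sketch below shows how a relative improvement is commonly computed and how the setting count arises. It assumes the usual definition, 100 * (harnessed - baseline) / baseline, averaged over settings; this summary does not spell out the paper's exact aggregation. The success rates in `toy_settings` are invented for illustration and are not results from the paper.

```python
def relative_improvement(baseline: float, harnessed: float) -> float:
    """Relative gain in percent: 100 * (harnessed - baseline) / baseline."""
    if baseline <= 0:
        raise ValueError("relative improvement is undefined for a non-positive baseline")
    return 100.0 * (harnessed - baseline) / baseline


# Grid size reported in the paper: 18 backbones x 7 environments.
N_MODELS, N_ENVS = 18, 7
n_settings = N_MODELS * N_ENVS  # 126 model-environment settings

# Invented (baseline, with-harness) success rates for three settings; NOT paper data.
toy_settings = [(0.20, 0.40), (0.50, 0.60), (0.40, 0.38)]

gains = [relative_improvement(b, h) for b, h in toy_settings]
improved = sum(h > b for b, h in toy_settings)  # number of settings that improved
mean_gain = sum(gains) / len(gains)             # average relative improvement

print(f"{improved}/{len(toy_settings)} settings improved, mean gain {mean_gain:.1f}%")
```

Because the gain is divided by the baseline, a setting with a low starting score can contribute a very large relative improvement, which is why the paper reports the count of improved settings (116 of 126, roughly 92%) alongside the average.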

Figure 3: Absolute performance improvement across 18 model backbones and 7 benchmarks.

The experiments indicate that Life-Harness generalizes across models and environments, suggesting that it captures reusable environment-side structure rather than model-specific behavior.
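
One way to picture why such a harness transfers is that it is built from recurring environment-side failures rather than from anything specific to one model. The sketch below is a hypothetical illustration of that idea, not the paper's procedure: the trajectory format, the `error` labels, and the `min_count` threshold are all assumptions.

```python
from collections import Counter


def evolve_interventions(trajectories, min_count=2):
    """Promote failure types that recur across training trajectories.

    Each trajectory is a list of step dicts with an optional "error" label
    (a hypothetical format). The result is a frozenset, mirroring the idea
    that the harness stays fixed once evaluation begins.
    """
    counts = Counter()
    for trajectory in trajectories:
        # Count each failure type once per trajectory so that a single long
        # retry loop does not dominate the statistics.
        counts.update({step["error"] for step in trajectory if step.get("error")})
    return frozenset(err for err, n in counts.items() if n >= min_count)


# Invented training trajectories; the error labels are illustrative only.
training = [
    [{"action": "get_order", "error": "missing_argument"},
     {"action": "get_order", "error": None}],
    [{"action": "Refund", "error": "unknown_tool"},
     {"action": "refund", "error": "missing_argument"}],
    [{"action": "cancel", "error": "missing_argument"}],
]

interventions = evolve_interventions(training)
```

Here only `missing_argument` recurs in at least two trajectories, so it alone is promoted; a one-off `unknown_tool` error is ignored. Nothing in the promoted set refers to the model that generated the trajectories, which is the intuition behind reusing it with other backbones.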

Comparative Analysis

The paper compares Life-Harness with prompt-evolution methods and reports that Life-Harness performs better. Whereas prompt optimization refines only the initial model prompt, Life-Harness adapts the broader interface, including the tools, actions, and feedback loops that are crucial in deterministic domains.

Figure 4: Comparison with prompt evolving method highlighting Life-Harness's advantages.

Ablation Study and Harness Engineering

The study includes an ablation analysis, which reports that all four layers are needed for the best results. Furthermore, Life-Harness is shown to complement existing model-centric approaches. For instance, models that have already received tool-specific training still benefit significantly from harness adaptation, in terms of both performance and out-of-distribution generalization.

Figure 5: Comparison between specialized tool-use training and runtime harnessing.

Conclusion and Future Directions

Life-Harness makes a case for runtime interface adaptation as a complementary alternative to model-centric adaptation in deterministic environments. Because it evolves the runtime layer rather than updating model weights, the resulting harness can be reused across models, suggesting a new pathway for improving LLM agent performance in rule-governed tasks.

This research opens avenues for further exploration into runtime harnessing for non-deterministic or open-ended environments, where stability and reproducibility are more challenging. Such future work could further extend the principles of Life-Harness to a broader array of artificial intelligence applications.
