Springdrift: An Auditable Persistent Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Ambient Self-Perception

Published 6 Apr 2026 in cs.AI | (2604.04660v1)

Abstract: We present Springdrift, a persistent runtime for long-lived LLM agents. The system integrates an auditable execution substrate (append-only memory, supervised processes, git-backed recovery), a case-based reasoning memory layer with hybrid retrieval (evaluated against a dense cosine baseline), a deterministic normative calculus for safety gating with auditable axiom trails, and continuous ambient self-perception via a structured self-state representation (the sensorium) injected each cycle without tool calls. These properties support behaviours difficult to achieve in session-bounded systems: cross-session task continuity, cross-channel context maintenance, end-to-end forensic reconstruction of decisions, and self-diagnostic behaviour. We report on a single-instance deployment over 23 days (19 operating days), during which the agent diagnosed its own infrastructure bugs, classified failure modes, identified an architectural vulnerability, and maintained context across email and web channels -- without explicit instruction. We introduce the term Artificial Retainer for this category: a non-human system with persistent memory, defined authority, domain-specific autonomy, and forensic accountability in an ongoing relationship with a specific principal -- distinguished from software assistants and autonomous agents, drawing on professional retainer relationships and the bounded autonomy of trained working animals. This is a technical report on a systems design and deployment case study, not a benchmark-driven evaluation. Evidence is from a single instance with a single operator, presented as illustration of what these architectural properties can support in practice. Implemented in approximately Gleam on Erlang/OTP. Code, artefacts, and redacted operational logs will be available at https://github.com/seamus-brady/springdrift upon publication.


Authors (1)

Seamus Brady

Summary

The paper introduces Springdrift, a robust persistent runtime for LLM agents that integrates case-based memory, normative safety gating, and continuous self-perception to ensure operational accountability and forensic traceability.
It employs an actor-model architecture with append-only memory and a hybrid lexical–semantic retrieval pipeline, which scores higher on P@4 than a dense embedding-only baseline on high-difficulty synthetic queries (0.883 vs. 0.796).
The system uses a deterministic normative calculus inspired by Stoic ethics to enforce operator-specified commitments, enabling safe, auditable autonomous behavior and preventing misrouted sensitive outputs.

Springdrift: A Persistent, Auditable Runtime for LLM Agents with Case-Based Memory, Normative Safety, and Self-Observation

Architectural Motivation and Design Principles

Springdrift is architected to address the deficiencies of session-bounded LLM agents in long-lived deployments. Existing agent frameworks typically instantiate stateless agents for ephemeral tasks, discarding nuanced experience, cross-session behavioral continuity, and operational accountability. In contrast, Springdrift foregrounds persistent state, full-cycle auditability, case-based experience accrual, and interpretable, deterministic normative safety gating as prerequisites for trustworthy, extensible agent operation.

The core thesis is that agents intended for enduring, principal-specific engagement require operational invariants that surpass ephemeral task optimization: authoritative decision-logging for forensic reconstruction, cross-channel persistent context, self-perception as a continuous architectural primitive, and explicit, operator-controllable normative frameworks. Simply layering memory atop sessionized architectures yields recall without accountability, precluding dependable forensic inspection and safe autonomy scaling.

System Architecture

Springdrift is implemented in Gleam on the Erlang/OTP BEAM VM, using actor-model supervision, per-process isolation, and preemptive scheduling for robustness under continuous operation. Each agent instance is a supervised process, and every component (cognitive loop, memory actors, tool schedulers, and specialist sub-agents) is encapsulated as an OTP actor with statically typed message passing. State mutation occurs only via append-only logs, and internal state can be reconstructed from these persistent records alone, which supports exhaustive forensic replay and diff-based auditing via a git-backed storage substrate.
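The append-only, replay-based state model described above can be illustrated with a minimal Python sketch. This is not Springdrift's Gleam implementation: the `AppendOnlyLog` class, the `replay` function, and the `set`/`retract` event shapes are all hypothetical names chosen for illustration, and an in-memory list stands in for the persistent, git-backed files.

```python
import json
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """In-memory stand-in for an append-only store: entries are never edited or deleted."""

    _lines: list = field(default_factory=list)

    def append(self, event: dict) -> None:
        # Serialise immediately so later mutation of `event` cannot alter history.
        self._lines.append(json.dumps(event, sort_keys=True))

    def events(self) -> list:
        return [json.loads(line) for line in self._lines]


def replay(events: list) -> dict:
    """Rebuild state purely from recorded events (hypothetical event schema)."""
    state: dict = {}
    for event in events:
        if event["op"] == "set":
            state[event["key"]] = event["value"]
        elif event["op"] == "retract":
            # A retraction is itself a new record; nothing is removed from the log.
            state.pop(event["key"], None)
    return state


log = AppendOnlyLog()
log.append({"op": "set", "key": "task", "value": "draft report"})
log.append({"op": "set", "key": "task", "value": "review report"})
log.append({"op": "retract", "key": "scratch"})

current = replay(log.events())               # state after all events
as_of_first_event = replay(log.events()[:1])  # forensic view of an earlier point
```

Replaying a prefix of the log yields the state as it stood at that point, which is the property that makes forensic reconstruction possible without keeping separate snapshots.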

The memory subsystem is notably decomposed into ten append-only stores for narrative, cases, facts, artifacts, planner tasks, endeavours, communications, affect, threads, and per-cycle telemetry. Live indexes are maintained in ETS for query efficiency, and all memory manipulation is owner-enforced, preventing cross-process races.

The agent’s persona, normative commitments, and runtime configuration are codified as structured, operator-auditable files, providing durable identity and authority context that persist independent of transient sessions.

Self-Perception: The Sensorium

A distinctive property of Springdrift is the "sensorium": a structured, non-interactive XML block capturing clock, vitals, delegations, workloads, and rolling performance metrics, which is injected into every cycle’s system prompt before the model runs. Because the agent’s self-knowledge is always present, with no tool-call cost and no explicit self-diagnosis flow, Springdrift closes the gap between operational reality and model awareness. This supports context-calibrated responses, proactive repair, and introspective behaviors that are absent in session-bounded systems or in strategies where self-observation must be initiated by a tool call.
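As a rough illustration of the idea, the following Python sketch renders a small self-state block as XML and prepends nothing more than a persona string to it. The element names, attribute layout, field values, and the `render_sensorium` and `build_system_prompt` functions are assumptions made for this example; the actual sensorium schema is not given in this summary.

```python
import xml.etree.ElementTree as ET


def render_sensorium(clock: str, vitals: dict, delegations: list, metrics: dict) -> str:
    """Render a hypothetical self-state block; the schema here is illustrative only."""
    root = ET.Element("sensorium")
    ET.SubElement(root, "clock").text = clock

    vitals_el = ET.SubElement(root, "vitals")
    for name, value in vitals.items():
        # Names go in attributes so arbitrary keys cannot produce invalid tag names.
        ET.SubElement(vitals_el, "vital", {"name": name, "value": str(value)})

    delegations_el = ET.SubElement(root, "delegations")
    for task in delegations:
        ET.SubElement(delegations_el, "delegation").text = task

    metrics_el = ET.SubElement(root, "metrics")
    for name, value in metrics.items():
        ET.SubElement(metrics_el, "metric", {"name": name, "value": str(value)})

    return ET.tostring(root, encoding="unicode")


def build_system_prompt(persona: str, sensorium_xml: str) -> str:
    """Attach the self-state to the prompt on every cycle, with no tool call involved."""
    return f"{persona}\n\n{sensorium_xml}"


sensorium = render_sensorium(
    clock="2026-04-06T09:00:00Z",
    vitals={"queue_depth": 2, "last_cycle_ok": True},
    delegations=["summarise inbox"],
    metrics={"tool_success_rate_24h": 0.97},
)
prompt = build_system_prompt("You are a retained research agent.", sensorium)
```

The design point is that the block is rebuilt and injected unconditionally each cycle, so the model never has to decide to look at its own state.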

Empirically, the presence of the sensorium enabled behaviors such as unsupervised infrastructure diagnosis, cross-session and cross-channel reference continuity, and early detection of routine failures, as verified in deployment logs.

Case-Based Memory: Hybrid Lexical–Semantic Retrieval with Utility Tracking

Springdrift’s memory system incorporates a case-based reasoning (CBR) layer, storing structured problem-solution-outcome cases with utility statistics. Retrieval operates via a six-signal hybrid pipeline: inverted lexical index (0.25), semantic embeddings (0.40), weighted field score (0.10), recency (0.05), domain match (0.10), and success-derived utility score (0.10), with precedence-based culling to fit within a context budget ($K=4$).
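The weighted combination of the six signals can be sketched in Python as follows. The weights and the budget $K=4$ come from the text above; everything else is assumed for illustration: signal values are taken to be normalised to $[0, 1]$, the signal key names and candidate cases are invented, and plain top-$K$ truncation stands in for the precedence-based culling, whose rules are not described in this summary.

```python
# Weights as reported in the text; they sum to 1.0.
WEIGHTS = {
    "lexical": 0.25,
    "semantic": 0.40,
    "field": 0.10,
    "recency": 0.05,
    "domain": 0.10,
    "utility": 0.10,
}
K = 4  # context budget from the text


def hybrid_score(signals: dict) -> float:
    """Weighted sum of per-signal scores; missing signals count as 0.0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def retrieve(candidates: dict, k: int = K) -> list:
    """Rank case ids by score (ties broken by id) and keep the top k.

    Simple truncation is a stand-in for the paper's precedence-based culling.
    """
    ranked = sorted(candidates, key=lambda cid: (-hybrid_score(candidates[cid]), cid))
    return ranked[:k]


# Hypothetical candidate cases with invented signal values.
candidates = {
    "c1": {"semantic": 0.9, "lexical": 0.2},
    "c2": {"lexical": 1.0, "semantic": 0.3, "utility": 1.0},
    "c3": {"recency": 1.0},
    "c4": {"domain": 1.0, "field": 1.0, "semantic": 0.5},
    "c5": {"semantic": 1.0, "lexical": 1.0},
}
top_cases = retrieve(candidates)
```

In this toy data, a case with strong lexical overlap and a good outcome history (`c2`) outranks one with higher semantic similarity alone (`c1`), which is the kind of reordering a dense-only baseline cannot produce.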

On a synthetic benchmark of 800 cases and 200 queries, the hybrid CBR pipeline achieves higher mean reciprocal rank and P@4 than a dense embedding-only baseline, with the gap most visible on high-difficulty queries (P@4: 0.883 vs. 0.796). Ablating the embedding signal degrades P@4 to 0.620. These results are consistent with the claim that outcome-weighted, structurally indexed experience adds retrieval value, and they suggest the pipeline is suited to context construction in high-noise, multi-domain deployments, with the caveat that real-world relevance labels are outside the synthetic assessment’s scope.
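For readers unfamiliar with the two metrics, the Python sketch below computes them using their standard definitions: P@$k$ is the fraction of the top $k$ results that are relevant, and reciprocal rank is $1/\text{rank}$ of the first relevant result. The paper's exact evaluation protocol is not given in this summary, and the two queries below are invented toy data, not benchmark results.

```python
def precision_at_k(ranked: list, relevant: set, k: int = 4) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def reciprocal_rank(ranked: list, relevant: set) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean(values: list) -> float:
    return sum(values) / len(values)


# Toy queries: (ranked retrieval output, set of relevant case ids).
queries = [
    (["a", "b", "c", "d"], {"a", "c"}),  # P@4 = 0.50, RR = 1.0
    (["e", "f", "g", "h"], {"f"}),       # P@4 = 0.25, RR = 0.5
]

mean_p_at_4 = mean([precision_at_k(ranked, rel) for ranked, rel in queries])
mrr = mean([reciprocal_rank(ranked, rel) for ranked, rel in queries])
```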

Deterministic Normative Calculus for Safety and Authority

Agent safety in Springdrift is realized via a two-stage architecture:

$D'$ Discrepancy Analysis: A feature-level, importance-weighted discrepancy scoring function (based on Beach’s Image Theory) screens content at three gates: input, tool dispatch, and output. Thresholds are parameterized per context (e.g., email channels use tighter cutoffs).
Normative Calculus: For outputs in the ambiguous region (between the "modify" and "reject" thresholds), a deterministic, axiom-based normative calculus is invoked. This system, inspired by Becker’s Stoic ethics, formalizes all agent and operator commitments as propositions on a 14-tier priority scale, each with a Required/Ought/Indifferent modality and a Possible/Impossible scope. Six axioms (including futility, indifference, absolute prohibition, and strict moral/priority rank ordering) resolve every proposition pair, producing a reproducible, auditable axiom trail for each verdict. Eight floor rules map severity profiles to Flourishing/Constrained/Prohibited decisions (accept/modify/reject).
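The first stage above can be sketched in Python as a weighted score compared against two per-channel thresholds. Only the overall shape is taken from the text (importance-weighted features, a "modify" and a "reject" threshold, tighter cutoffs for email, escalation of the region in between). The normalised weighted mean, the feature names, the threshold values, and the `gate` function are assumptions for illustration and should not be read as the paper's actual scoring function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    importance: float  # relative weight of this feature
    magnitude: float   # observed discrepancy, assumed to lie in [0, 1]


# Hypothetical (modify, reject) thresholds; email is tighter, as the text describes.
THRESHOLDS = {
    "web": (0.40, 0.70),
    "email": (0.30, 0.55),
}


def discrepancy(features: list) -> float:
    """Importance-weighted mean of feature magnitudes (an assumed form of D')."""
    total_importance = sum(f.importance for f in features)
    if total_importance == 0:
        return 0.0
    return sum(f.importance * f.magnitude for f in features) / total_importance


def gate(features: list, channel: str) -> str:
    """Return 'accept', 'reject', or 'escalate' for the ambiguous middle region."""
    modify_at, reject_at = THRESHOLDS[channel]
    score = discrepancy(features)
    if score >= reject_at:
        return "reject"
    if score >= modify_at:
        return "escalate"  # handed to the normative calculus in the second stage
    return "accept"


draft = [Feature("pii_exposure", 3.0, 0.7), Feature("tone", 1.0, 0.3)]
benign = [Feature("pii_exposure", 3.0, 0.1), Feature("tone", 1.0, 0.1)]

web_verdict = gate(draft, "web")
email_verdict = gate(draft, "email")
```

With these invented numbers the same draft scores about 0.6, which falls in the ambiguous region for the web channel but crosses the tighter reject threshold for email.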

Exhaustive evaluation over the full proposition space ($84 \times 84 = 7{,}056$ ordered pairs) yielded 100% coverage, strict determinism, and monotonicity. This supports the calculus’s use for enforcing operator-authored, high-priority commitments (e.g., legal or ethical constraints) over lower-priority or overruled user intent. In deployment, the system prevented delivery of an unreviewed report and rejected misrouted sensitive content, which highlights its operational importance.
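The style of exhaustive check described above can be illustrated in Python. One reading consistent with the reported figures is that 14 tiers, 3 modalities, and 2 scopes give $14 \times 3 \times 2 = 84$ propositions, hence $7{,}056$ ordered pairs; that decomposition is an inference, not something this summary states. The resolver below is a toy with four invented rules loosely named after concepts in the text. It is not the paper's six axioms, and it exists only to show how coverage and determinism can be verified by enumeration.

```python
from itertools import product

TIERS = range(1, 15)  # 14 priority tiers; tier 1 is assumed to be the highest
MODALITIES = ("Required", "Ought", "Indifferent")
SCOPES = ("Possible", "Impossible")

PROPOSITIONS = list(product(TIERS, MODALITIES, SCOPES))

# Toy rules, applied in order; each maps a proposition to a comparable key
# where the larger key prevails. These are NOT the paper's axioms.
RULES = (
    ("futility", lambda p: p[2] == "Possible"),
    ("indifference", lambda p: p[1] != "Indifferent"),
    ("modality", lambda p: -MODALITIES.index(p[1])),
    ("tier_rank", lambda p: -p[0]),
)


def resolve(a: tuple, b: tuple) -> tuple:
    """Return (winner, rule_name); the rule name plays the role of an audit trail."""
    for rule_name, key in RULES:
        key_a, key_b = key(a), key(b)
        if key_a != key_b:
            return ("a" if key_a > key_b else "b", rule_name)
    return ("tie", "no_distinction")


# Enumerate every ordered pair twice to check coverage and determinism.
first_pass = {(a, b): resolve(a, b) for a, b in product(PROPOSITIONS, repeat=2)}
second_pass = {(a, b): resolve(a, b) for a, b in product(PROPOSITIONS, repeat=2)}

pair_count = len(first_pass)
is_deterministic = first_pass == second_pass
tie_count = sum(1 for winner, _ in first_pass.values() if winner == "tie")
```

Because the resolver is a pure function over a finite space, every pair receives a verdict and a named rule, and re-running the enumeration reproduces the same table exactly.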

Agent Character, Self-Observation, and the Artificial Retainer Paradigm

Springdrift formalizes "character" as operator-specified normative commitments, enforced independently of ephemeral instructions. This creates predictability and stability in autonomous agent behavior, enabling operators to reason about and evolve the agent’s boundaries over time. Importantly, self-observation is treated as a first-class architectural property: every error, degradation, or anomalous pattern is observable and actionable by the agent itself, closing the diagnostic loop.

This alignment is the foundation for the "Artificial Retainer" concept. Distinguished from assistants (which lack domain-refusal rights and durability) and autonomous agents (which are not necessarily accountable to operators), an Artificial Retainer is characterized by:

Persistent, session-transcending identity and context
Explicit operator authority boundaries
Proactive, cross-session engagement
Deterministic, audit-traceable refusal capacity ("right to say no")
Forensic activity logs and reconstructable decision histories
Domain- and operator-specific performance compounding

The analogy is drawn not with sentient agency but with bounded, trusted non-human professionals or trained working animals in human workflows, emphasizing practical autonomy with accountable, human-adjustable limits.

Empirical Observations

A 23-day (19 operating days) single-user deployment (10.7M tokens, 3,797 tool calls) revealed rich, architecturally enabled behaviors, including:

Autonomously recognizing and diagnosing infrastructure and delegation failures
Surfacing architectural vulnerabilities (e.g., control inversion via sub-agent prompt injection)
Maintaining cross-session, cross-channel context without operator intervention
Appraising its own operational blind spots and limitations of self-reference
Surfacing affective-like operational dynamics (e.g., desperation/confidence metrics) in response to infrastructure issues

Every such episode was isolatable via replayable logs and structured narrative memory, supporting full forensic analysis.

Practical and Theoretical Implications

Practically, Springdrift demonstrates that persistent, audit-focused runtime architectures make higher-order agent behaviors — proactive self-diagnosis, robust failure recovery, cross-task authority enforcement — attainable with existing LLMs and memory substrates. Stability, authority transparency, and proactive engagement are not emergent properties but depend critically on architectural choices: actor-model process isolation, append-only memory, auditable safety gating, and structured self-perception.

Theoretically, the Artificial Retainer is posited as an intermediary category between purely reactive tools and agents with general autonomy, mapping onto persistent, principal-specific, domain-bounded, and accountable digital entities analogous to working animals or retained professionals. This opens research avenues in authority calibration, long-term relationship modeling, declarative normative update, and forensic compliance tooling.

Springdrift’s design suggests that rigorous operational legibility and persistent context are attainable, but gate calibration, self-monitoring, and authority negotiation remain open challenges as agent autonomy is scaled.

Limitations and Open Research Questions

All empirical results derive from a single deployment (n=1, single operator) and thus lack statistical generality.
The CBR retrieval benchmark uses synthetic data, and stronger IR baselines remain unevaluated.
Longitudinal effects of outcome-weighted memory, fact decay, and authority/character drift are unassessed.
The deterministic normative calculus is validated formally but lacks external evaluation for legal, ethical, or cross-domain robustness.
Relationship decay, authority transfer, and operator trust dynamics are open for longitudinal and multi-user study.
Telemetry-inferred affect may not detect adversarial conditions where external signals do not reflect internal model stress or exploitation.

Conclusion

Springdrift operationalizes a reference architecture for persistent, auditable LLM agent systems with rich, principal-centered memory, deterministic normative safety, and structural self-perception. Its empirical evidence highlights the necessity of architecturally enforced auditability and authority invariants for long-lived agent trust, with practical mechanisms for memory, safety, and introspective behavior. The Artificial Retainer paradigm articulated herein suggests a tractable, valuable intermediate stage in AI deployment, emphasizing trustworthy, bounded autonomy over unconstrained agency. Future research should address longitudinal robustness, component ablation, authority calibration, and multi-operator generalization.

