- The paper introduces enclawed, a hardening framework that enforces binary-level security invariants for regulated AI assistant gateways.
- It details dual enforcement modes and customizable, data-driven classification to meet diverse compliance standards in sectors like healthcare and finance.
- Comprehensive testing validated robust policy enforcement, tamper-evident audit logging, and secure module loading under adversarial conditions.
enclawed: Architecture and Security Model for Regulated, Single-User AI Assistant Gateways
Motivation and Problem Statement
The paper presents "enclawed," a hard-forked hardening framework built atop OpenClaw that addresses the deployment of single-user AI assistant gateways in regulated environments. Emergent risks in generative AI usage, especially in sectors governed by regimes such as HIPAA, GDPR, PCI DSS, ISO/IEC 27001, SOC 2, and NIST SP 800-53, demand deeper security controls than consumer-focused AI gateways offer. The upstream OpenClaw framework, with its permissive defaults, broad plugin ecosystem, and cloud-first orientation, is fundamentally misaligned with the confidentiality, auditability, and supply-chain assurances required in sectors such as healthcare (PHI), finance (MNPI), defense (CUI/ITAR), and R&D.
The necessity of enclawed's architectural divergence is argued through analysis of (i) the inability of configuration-only strategies to enforce security-invariant properties at the binary level, (ii) the danger of treating plugin loading as a trusted boundary, and (iii) the infeasibility of landing mandatory security constraints upstream, given product-market-fit and community-compatibility pressures.
Design Commitments and Architecture
enclawed operationalizes three fundamental design commitments:
- Always-on Policy Enforcement: Classification and policy enforcement activate at process bootstrap, precluding any bypass via late-loaded or user-controlled code. Production builds intentionally exclude any compatibility mode.
- Dual Flavor Model: The framework ships in two enforcement flavors: an "open" mode for development and non-regulated deployment that preserves OpenClaw compatibility (warn-only, no enforcement), and an "enclaved" mode for regulated contexts featuring strict allowlists, FIPS-validated cryptography assertion, mandatory Ed25519-signed module manifests, and strong peer attestation for the Model Context Protocol.
- Configurable, Data-driven Classification: The label lattice is sector-neutral and not hardcoded; schemes (e.g., for US government, financial services, healthcare, R&D) are customizable via built-in presets or JSON input, with invariants for rank and name uniqueness mechanically enforced.
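The rank- and name-uniqueness invariants can be sketched as follows (the scheme shape and field names are assumptions for illustration, not enclawed's actual JSON schema):

```typescript
// Hypothetical shape of a data-driven classification scheme; field names
// are illustrative assumptions, since the paper does not publish the schema.
interface Level {
  name: string;  // unique label name, e.g. "PHI"
  rank: number;  // unique integer rank used for lattice comparisons
}

interface Scheme {
  levels: Level[];
  compartments: string[];
}

// Mechanically enforce the stated invariants: unique names, unique ranks.
function validateScheme(s: Scheme): void {
  const names = new Set(s.levels.map(l => l.name));
  if (names.size !== s.levels.length) throw new Error("duplicate level name");
  const ranks = new Set(s.levels.map(l => l.rank));
  if (ranks.size !== s.levels.length) throw new Error("duplicate level rank");
}

// Example: a healthcare-flavored preset in the same shape.
const healthcarePreset: Scheme = {
  levels: [
    { name: "PUBLIC", rank: 0 },
    { name: "INTERNAL", rank: 1 },
    { name: "PHI", rank: 2 },
  ],
  compartments: ["BILLING", "CLINICAL"],
};
validateScheme(healthcarePreset);
```

A sector swap (say, to a government or R&D scheme) would then be a data change only, with the same validator run at load time.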
The architectural fork excises 78 cloud- and external-facing modules from the extension tree, ensuring that only local-capable inference engines and channels survive. The bootstrap sequence front-loads environment checks, policy and scheme loading, audit-log initialization, module-manifest pre-validation, trust-root locking, and runtime singleton allocation with a genesis hash entry, providing a deterministic and verifiable chain of trust from the first executed instruction.
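Under the stated ordering, the bootstrap sequence might be sketched like this (every function name here is an illustrative assumption; only the ordering reflects the paper):

```typescript
import { createHash } from "node:crypto";

// Illustrative bootstrap ordering; names are assumptions, not enclawed's API.
let trustRootsLocked = false;

function checkEnvironment(): void { /* refuse to start on missing prerequisites */ }
function loadPolicyAndScheme(): void { /* fail closed if policy or scheme is absent */ }
function initAuditLog(): void { /* open the append-only log before any agent code runs */ }
function prevalidateManifests(): void {
  // Manifests must be verified before the trust roots are frozen.
  if (trustRootsLocked) throw new Error("manifests must be validated before lock");
}
function lockTrustRoots(): void { trustRootsLocked = true; }

// The runtime singleton carries a genesis hash anchoring the audit chain.
function bootstrap(): { genesisHash: string; locked: boolean } {
  checkEnvironment();
  loadPolicyAndScheme();
  initAuditLog();
  prevalidateManifests();
  lockTrustRoots();
  const genesisHash = createHash("sha256").update("enclawed-genesis").digest("hex");
  return { genesisHash, locked: trustRootsLocked };
}
```

The point of the ordering is that no user- or plugin-controlled code runs before the policy, log, and trust roots are in place.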
Security Properties and Implementation Details
The framework integrates several security primitives, collectively targeting regulated-deployment needs unaddressed by extant open-source assistant gateways and orchestration frameworks:
- Mandatory Classification and Access Control: Bell-LaPadula is implemented as the primary confidentiality model. Label dominance, least upper bound, and compartment handling are mechanized as total functions on integer ranks, with compartment/releasability support for richer sector-specific access semantics. The framework does not implement accredited identity; organizations must bind session identities externally.
- Tamper-Evident, Hash-Chained Audit Logging: Every action, state transition, and significant event is recorded in a hash-chained, append-only JSON audit log with concurrent-append correctness, deep sanitization against injection, and prototype pollution prevention. Adversarial test coverage demonstrates that tampering, record reordering, or injection attempts are reliably surfaced by verification routines.
- Denial of All Non-Local Egress: An egress guard replaces the fetch primitive, enforcing host allowlists and blocking all cloud egress. In enclaved mode, the guard's property descriptors are frozen and immutable, so attempted evasion via reassignment fails hard.
- Module-Loading Trust Boundaries: Only modules with manifests signed by explicitly trusted keys are loadable (enforced in enclaved mode), with trust roots locked post-bootstrap. Permission is decided statically at channel or provider registration rather than revisited at runtime, and any signature, clearance, or manifest drift results in module denial.
- DLP and Sensitive Content Scanning: Regex-based detectors for policy indicators, secrets, and PII run as a secondary defense; output with matches above severity thresholds is redacted. The limitations (lack of paraphrase or OCR-based detection) are clearly stated, making this a best-effort layer.
- Human-in-the-Loop (HITL) Controls: Real-time operator control via an approval queue and stateful per-agent sessions enables forced pause, resume, or stop commands; all actions and approvals are auditable, and transaction-buffer checkpointing supports explicit mediation.
- Secure Transaction Buffer with Bounded Rollback: Every reversible agent action registers an inverse. The buffer enforces rollback bounded by a percentage of system memory, with eviction mirroring Clark-Wilson's well-formed transaction discipline. Buffer and commits are hash-chained and auditable.
- Zero-Trust, Blockchained Key Broker: For deployments requiring escrow under shared custody (e.g., across multiple cloud HSM providers), the key broker achieves K-of-N consensus using Ed25519-attested blobs and chain-hashed ledgers. Both consensus and threshold-XOR schemes are supported. Auditors can reconstruct the chain independently, and limitations are discussed (e.g., threshold-XOR is not true K-of-N, and the ledger is unprotected against truncation).
- Prompt Injection Mitigation: Input sanitization includes control-character stripping, role-boundary neutralization, code-fence isolation, and handling of bidirectional-override and zero-width (ZWNJ) characters. Injection detection surfaces imperative attack markers. The scope is explicitly limited to mechanical confusion vectors; semantic prompt-injection defenses are acknowledged as open research.
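The label-dominance and least-upper-bound operations described above can be sketched as total functions over integer ranks with compartments (the Label type and function names are illustrative, not enclawed's real API):

```typescript
// Sketch of Bell-LaPadula label comparison; the Label shape is an assumption.
interface Label {
  rank: number;               // integer classification rank
  compartments: Set<string>;  // need-to-know compartments
}

// a dominates b iff a's rank is at least b's and a holds all of b's compartments.
function dominates(a: Label, b: Label): boolean {
  return a.rank >= b.rank && [...b.compartments].every(c => a.compartments.has(c));
}

// Least upper bound: maximum rank, union of compartments (total on all labels).
function lub(a: Label, b: Label): Label {
  return {
    rank: Math.max(a.rank, b.rank),
    compartments: new Set([...a.compartments, ...b.compartments]),
  };
}
```

Under Bell-LaPadula, a subject may read an object only if the subject's label dominates the object's ("no read up") and write only where the object's label dominates the subject's ("no write down"); both checks reduce to calls to a function like dominates above.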
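The audit log's tamper evidence can be illustrated with a minimal append/verify pair (the record shape and hashing details are assumptions; enclawed's actual log format is not reproduced here):

```typescript
import { createHash } from "node:crypto";

// Minimal hash-chained, append-only log sketch; field names are illustrative.
interface AuditRecord {
  prev: string;     // hash of the preceding record (or the genesis hash)
  payload: string;  // sanitized event content
  hash: string;     // hash over (prev, payload)
}

function entryHash(prev: string, payload: string): string {
  return createHash("sha256").update(prev).update(payload).digest("hex");
}

// Append links each record to the hash of its predecessor.
function append(log: AuditRecord[], payload: string, genesis: string): void {
  const prev = log.length > 0 ? log[log.length - 1].hash : genesis;
  log.push({ prev, payload, hash: entryHash(prev, payload) });
}

// Verification recomputes every link; edits, reordering, or mid-chain
// deletion break the recomputed hashes and are surfaced.
function verifyChain(log: AuditRecord[], genesis: string): boolean {
  let prev = genesis;
  for (const rec of log) {
    if (rec.prev !== prev || rec.hash !== entryHash(rec.prev, rec.payload)) return false;
    prev = rec.hash;
  }
  return true;
}
```

Truncation of the tail is the known blind spot of such chains, which is why the paper defers WORM shipping to external infrastructure.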
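The egress guard's deny-by-default stance and frozen binding can be sketched as follows (the allowlist contents, error text, and installation details are illustrative assumptions):

```typescript
// Deny-by-default egress guard sketch; only local hosts pass the allowlist.
const allowedHosts = new Set(["localhost", "127.0.0.1"]);

const realFetch = globalThis.fetch;  // capture the original before replacing it

const guardedFetch = (url: string): Promise<Response> => {
  const host = new URL(url).hostname;
  if (!allowedHosts.has(host)) {
    throw new Error(`egress denied: ${host}`);  // hard failure on non-local egress
  }
  return realFetch(url);
};

// Freeze the binding: non-writable, non-configurable, so reassignment
// cannot silently restore unguarded egress.
Object.defineProperty(globalThis, "fetch", {
  value: guardedFetch,
  writable: false,
  configurable: false,
});
```

Because the descriptor is non-configurable, a later defineProperty or assignment against fetch fails rather than reopening an egress path.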
Evaluation and Test Methodology
enclawed is supported by a comprehensive suite of 204 tests: 146 unit tests spanning lattice arithmetic, signature and manifest checking, audit-log invariants, and policy enforcement, plus 58 adversarial penetration tests covering egress bypass, DLP evasion, prompt injection, code injection, log and chain tampering, trust-root mutation, and more. All tests run on Node 22, enforced in CI on every push, pull request, and cron schedule, with strict static type checking (tsc --strict --noEmit). All bugs documented during development, including race conditions, control/log injection, and trust-root locking errors, were fixed pre-publication.
Empirically verified properties include: module signature verification, hash chain integrity (append, tamper, truncate), egress lock-down, audit event chain correctness under concurrency, HITL approval mediation, transaction buffer rollback (LIFO, partial error, eviction), and broker consensus and threshold-XOR key recovery.
Practical Implications and Theoretical Impact
enclawed provides an immediately composable, test-verified scaffold for regulated-enterprise deployments of LLM-based assistants, addressing systemic policy and supply-chain gaps in the current generation of open assistant platforms. Its modular approach keeps it agnostic to the underlying inference engine, able to integrate with dedicated DLP or content-moderation services, and extensible with sector-specific classification policies without recoding.
However, the framework abstains from:
- Delivering accredited cryptographic modules (organizations must supply FIPS 140-3 validated cryptography or equivalent);
- Enforcing kernel- or OS-level MLS/MAC (e.g., SELinux), which must be layered beneath enclawed;
- Providing identity-to-clearance binding (integration with SAML/OIDC/smart cards is external);
- Shipping audit logs to WORM storage or satisfying secure-facility requirements.
Sector neutrality and the decoupling of open and closed codebases, via an MIT-licensed open core and a proprietary submodule for advanced modules, enable rapid review, adoption, and penetration testing of the primitives while protecting organization-specific IP where mandated.
Future Work
Areas for future development include:
- Implementation of a true K-of-N secret-sharing key broker (e.g., via Shamir's scheme);
- Extending information-flow control with orthogonal Biba or Clark-Wilson integrity enforcement;
- OS-level integration for process hardening and tamper detection beyond JavaScript-controlled memory;
- Automated or semi-automated identity and clearance assignment at session establishment;
- Formal methods and proof-carrying code to enable third-party, machine-checkable certification;
- Enhanced DLP leveraging model-assisted (rather than regex-based) content understanding.
Conclusion
enclawed delivers a systematically engineered hardening framework for single-user AI assistant gateways in regulated environments. Through deny-by-default policy, mechanized classification, hash-chained audit, module signature enforcement, zero-trust key brokering, and robust adversarial test coverage, it defines a security envelope around AI assistants compatible with sectoral compliance regimes. The separation of enforcement into open and enclaved flavors facilitates both community ecosystem compatibility and high-assurance enclave deployment. While not a total-certification product, enclawed supplies a verifiable foundation upon which regulated organizations can layer their compliance program, identity, infrastructure, and cryptographic assurances.
The architecture and threat model advance practical deployment security for LLM-based agents and frame a template for evolving regulatory expectations and compliance integration in AI systems.