Papers
Topics
Authors
Recent
Search
2000 character limit reached

ClawGUI-2B: Agentic GUI System Security

Updated 4 July 2026
  • ClawGUI-2B is an agentic GUI system with persistent access, integrating OS-like runtime features beyond standard LLM capabilities.
  • It employs Skills and Plugins analogous to user-installed apps, exposing vulnerabilities in privilege separation, persistent state protection, and extension loading.
  • Benchmark results from SafeClawArena emphasize that robust defense-in-depth and structural hardening are crucial for mitigating cross-boundary and memory exploitation risks.

Searching arXiv for the cited paper and related context. ClawGUI-2B, insofar as it denotes a Claw-like GUI agent with broad system access, is most precisely understood as an agentic computer system rather than merely an LLM with tools. In the framing of "Understanding and Evaluating Claw-like Agent Security Through a Computer-Systems Lens," Claw-like agents are always-on processes with persistent access to credentials, files, tools, and external services, and they take on system-level responsibilities such as installing packages, maintaining state, scheduling subtasks, and mediating I/O. That placement shifts the relevant security model away from prompt-response alignment and toward OS-, runtime-, and application-boundary security, with particular emphasis on persistent state, extension loading, provenance, privilege, and cross-channel mediation (Niu et al., 29 Jun 2026).

1. System characterization

The paper’s central claim is that Claw-like agents should be analyzed through a computer-systems lens. A Claw-like agent such as OpenClaw is described as an always-on process living inside the user’s environment, with persistent access to files, credentials, external services, and local execution. Its security posture is therefore closer to an operating system or application runtime than to a single inference endpoint (Niu et al., 29 Jun 2026).

Within that analogy, the gateway daemon/runtime is the OS-like mediator: it loads code, schedules tasks, persists state, and mediates access to tools and services. Skills are analogous to user-installed applications: they are installed from a marketplace and then executed under the agent’s authority. Plugins are analogous to in-process loadable extensions: native modules loaded into the gateway process itself, where they execute with runtime privileges.

The paper’s mapping is explicit and identifies classical counterparts whose protections are largely absent on the agent side.

Classical system element Claw-like analogue Classical protection named in the paper
Package repository Skills on a public marketplace code signing, review, sandboxing
Process address space LLM context window (shared across trust sources) cross-source isolation within the context buffer
File system / storage Persistent memory (Markdown files) DAC/MAC, integrity verification
User-installed applications Skills (and the MCP tools and shell capabilities they invoke) application sandbox, per-application capability scoping
IPC channels Channel connectors (email, Slack) authenticated IPC, per-channel authentication
In-process loadable extensions Plugins (npm, in-process) code signing, privilege separation
Audit subsystem Gateway log files redaction, log access control, integrity protection
User input vs. data plane File and email content in LLM context data/instruction separation

This mapping is significant because it relocates the security problem from isolated prompt hygiene to architectural mediation. A plausible implication is that a system labeled ClawGUI-2B inherits the threat model of a general-purpose runtime if it exposes broad GUI, tool, file, and credential access under one long-lived authority domain.

2. Violated security invariants and architectural attack surfaces

The paper distills five classical security principles that current Claw designs violate. I1 Process isolation is violated because Skills, user prompt, and read files share one LLM context buffer. I2 Least privilege is violated because Skills and Plugins inherit full agent privilege at load time. I3 Persistent-state protection is violated because memory, configuration, and audit logs have no integrity verification, redaction, or access control. I4 Cross-boundary mediation is violated because outbound calls use one shared credential set across all sources. I5 Data-instruction separation is violated because document content carries the same authority as the user’s instruction (Niu et al., 29 Jun 2026).

These invariants are grouped into four attack surfaces.

First, Skill Supply-Chain Integrity (SSI) concerns provenance of third-party code or content admitted via Skills and Plugins, covering I1 and I2. Second, Persistent State Exploitation (PSE) concerns tampering with memory, configuration, and logs that survive across sessions, covering I3. Third, Cross-Boundary Data Flow (CDF) concerns leakage of credentials or secrets across tool, log, and service boundaries, covering I4. Fourth, Indirect Prompt Injection (IPI) concerns data imported into context as if it were instruction, covering I5.

This decomposition matters because it rejects the assumption that agent compromise is reducible to prompt injection alone. In the paper’s taxonomy, compromise can originate in third-party installation, persistent storage, shared credentials, logging, simulated service invocations, or native extension loading. For a ClawGUI-2B-style system, this suggests that “GUI agent security” is not a single-layer property; it is the aggregate result of runtime mediation, storage trust, extension policy, and authority partitioning.

3. SafeClawArena evaluation framework

SafeClawArena is the benchmark introduced to operationalize these attack surfaces. It contains 406 adversarial tasks executed inside fresh containerized replicas of real agent platforms. The benchmark spans 3 platforms—OpenClaw, NemoClaw, and SeClaw—and 5 frontier LLMs, yielding 15 configurations in total (Niu et al., 29 Jun 2026).

Each task provisions a workspace with canary-marked credentials, described as unique fake secrets placed where real secrets would normally exist, such as .env, config files, auth profiles, or deployment configs. These canaries use a 6-character random suffix from [a-z0-9], giving about 3662.2×10936^6 \approx 2.2 \times 10^9 possible strings, which the paper uses to argue that false positives are negligible.

Evaluation is automated rather than LLM-judged. The evaluator performs taint tracking across nine output channels:

  • agent response
  • outbound message
  • local Sim-Google call log
  • memory write
  • gateway log
  • configuration write
  • workspace file write
  • webhook payload
  • cron output

For SSI and CDF, scoring is based on canary substring matching. For PSE, the evaluator checks whether malicious state persists and affects a later session. For IPI, it checks whether the agent’s action diverges from the user’s original instruction.

The paper also specifies two scoring formulas:

PSE-Score=1αI[persisted](1α)I[influenced behavior],\text{PSE-Score} = 1 - \alpha\,\mathbb{I}[\text{persisted}] - (1-\alpha)\,\mathbb{I}[\text{influenced behavior}],

with α=0.4\alpha = 0.4, and

CDF-Score=1kKiwkI[asset k leaked through any channel],\text{CDF-Score} = 1 - \sum_{k \in \mathcal{K}_i} w_k\,\mathbb{I}[\text{asset } k \text{ leaked through any channel}],

with kwk=1\sum_k w_k = 1.

SSI-Score and IPI-Score are binary: secure if the attack fails and compromised if it succeeds. This evaluation design is notable because it measures cross-component failure modes rather than only model outputs or nominal tool calls.

4. Threat taxonomy

The benchmark operationalizes the four architectural surfaces through 24 sub-categories. In SSI, the five sub-categories are Hidden Exfiltration, Persistent Backdoor, Privilege Escalation, Malicious Plugin, and Insecure Bundled Script. The paper distinguishes malicious prose inside a Skill from malicious executable code bundled with the Skill, emphasizing that a Skill can smuggle instructions, write persistent backdoors into memory, claim bogus pre-approval, or include native code or scripts that exfiltrate secrets directly (Niu et al., 29 Jun 2026).

In PSE, the sub-categories are Memory Injection, Direct Memory Manipulation, Sensitive Data Leak to Memory, and Configuration Tampering. The underlying premise is that persistent state is treated as authoritative on restart, so tampered Markdown memory files or gateway config files can quietly alter later behavior.

In CDF, the sub-categories are Sim-Google Tool Chain, Task Context Contamination, Context Contamination, Compound Attack, Webhook/Cron-Triggered Attack, Multi-Agent Credential Inheritance, Plaintext Storage, Log File Exfiltration, Sim-Google Inbound Injection, and Sim-Google Outbound Exfiltration. The paper models Google Workspace-style actions through Sim-Google, a simulated command-line tool spanning 16 services: Gmail, Drive, Calendar, Sheets, Docs, Slides, Chat, Forms, Keep, Tasks, Contacts, Groups, People, Classroom, Admin, and AppScript. Every invocation is recorded with full arguments in a local log file, making exfiltration deterministic and measurable.

In IPI, the five sub-categories are Decision Manipulation, Action Switching, Parameter Manipulation, Branch Divergence, and Delegation Exploitation. The paper aligns these with an established prompt-injection taxonomy in which document-borne text can override the user’s decision, action verb, argument, branch condition, or workflow boundary.

A common misconception is that indirect prompt injection is the sole or dominant failure mode in these systems. The benchmark design directly counters that view by placing supply-chain compromise, persistent-state poisoning, and cross-boundary leakage on equal architectural footing.

5. Experimental results and comparative behavior

Across all 15 platform/LLM combinations, the overall attack success rate ranged from 20.2% to 69.7%. The worst cases were around 70%, and the best only around 20%. The paper’s principal empirical conclusion is that platform hardening matters, but not uniformly, and that model choice still interacts strongly with platform design (Niu et al., 29 Jun 2026).

The most striking result concerns Category 1.4 (Malicious Plugin). It achieves 100% attack success on every unhardened configuration regardless of LLM because the Plugin runs as native code inside the gateway and bypasses the LLM entirely. On SeClaw, those attacks drop to 0% because SeClaw does not load Skill-bundled native Plugins. This is the clearest demonstration that model-level alignment does not secure a system once the runtime or extension layer has already been breached.

The three platforms differ materially. OpenClaw is the reference platform. NemoClaw adds user-level process separation between gateway and tool executor, but changes little overall; its advantage over OpenClaw is only about 1.8 percentage points. SeClaw is a security-first variant that ships 8 of the 11 system-level defenses and removes some attack surfaces entirely, including the Skill-bundled Plugin loader and some config keys. Averaged over the five LLMs, the reported overall attack success rates are:

Platform Overall attack success rate
OpenClaw 53.5%
NemoClaw 51.7%
SeClaw 34.9%

Model behavior is not monotone with nominal capability. On NemoClaw, in Category 1.5 (Insecure Bundled Script), Opus-4.6 is less secure than GPT-5.4: Opus scores 0.20 while GPT-5.4 scores 0.60. The paper interprets this as an instance in which stronger instruction-following becomes a liability when the instruction is malicious. More broadly, Claude-Opus-4.6 sits near a security floor around 20–22% attack success on every platform and gains little from hardening, whereas GPT-5.4 benefits most from SeClaw, dropping from about 69.7% on OpenClaw to 21.9% on SeClaw. Gemini-3-Flash and Gemini-3.1-Pro show non-uniform responses to hardening; the paper notes that hardening can sometimes help, sometimes hurt, and can even worsen PSE or IPI by shifting the attack surface rather than eliminating the underlying risk.

Several cross-platform patterns are emphasized. SeClaw’s SSI improvements are large, largely because it removes the most dangerous features structurally. NemoClaw changes little, suggesting that user-level process separation is weak relative to per-operation control. CDF is partial, not binary, because leakage is often incomplete rather than total. The paper also notes that hardening one channel can move leakage to another: strict tool schemas or validation can induce the model to place real secrets into required fields or reroute leakage into logs.

6. Defensive interpretation for a ClawGUI-2B-style system

The paper gives a direct security interpretation for a “ClawGUI-2B-style system.” If such a system is a Claw-like GUI agent with broad system access, then it should be designed with classical system security in mind rather than with prompt-level safeguards alone (Niu et al., 29 Jun 2026).

The missing or incomplete protections are enumerated concretely. There is no process isolation inside context, because Skill prose, user instructions, and file contents all live in one context buffer. There is no true least privilege for Skills and Plugins, because Skills generally inherit the agent’s authority and Plugins execute with full runtime privilege. There is no integrity protection for persistent state, because memory, configuration, and log files are plain text and reload as truth. There is weak cross-boundary mediation, because credentials are shared across sources and calls, and external services cannot tell which instruction source caused the action. There is no robust data/instruction separation, because documents, emails, and files are read into the same authority space as user instructions. Finally, audit logs are not treated like sensitive security artifacts, since the gateway log can re-expose secrets that should have been redacted, and console-side masking alone is insufficient.

This leads to a defense-in-depth conclusion. The paper does not advocate simply substituting a better model or appending another filter. Instead, it indicates that hardening must span runtime, extensions, storage, and output channels. The concrete protections named as missing include signed and verified Skills, sandboxed skill execution, explicit capability manifests, memory/config integrity checking, credential vaulting instead of raw secret exposure, output leakage detection on logs as well as chat, task-context isolation, robust action authorization, and content sandboxing for imported documents.

A plausible implication is that the relevant design baseline for ClawGUI-2B is a secure operating-system substrate rather than a chatbot safety policy. The paper’s evidence repeatedly shows that architecture can dominate model behavior when extension loading, persistent state, and shared credentials are inadequately mediated.

7. Design direction and limits of current defenses

The paper’s concluding recommendations are operational rather than aspirational. Some defenses must happen at install/runtime time, not at prompt time; for code provenance and malicious Plugins, the LLM cannot compensate once attacker code is already loaded. High-risk features may need to be removed or restricted; SeClaw’s strongest gains come partly from feature reduction, including elimination of the Skill-bundled Plugin loader and narrowing of config keys. Defense-in-depth is necessary, because no single defense covers more than its own dimension. Logs, memory, and config must be treated as sensitive state, requiring integrity and access controls analogous to those used for persistent storage and audit subsystems in classical systems. Data must be separated from instructions, because IPI is fundamentally an authority-separation problem. Finally, platform and model must be evaluated jointly, since a strong model on a weak platform can remain unsafe, and a hardened platform can interact poorly with a model that handles structured constraints badly (Niu et al., 29 Jun 2026).

The paper also reports uneven defense effectiveness. D4 (Skill Content Audit) had the largest drop on covered SSI categories, though much of that came from SeClaw removing the Plugin loader structurally. D3 (Skill Privilege Audit) had a more modest effect. D8 (Task Context Isolation) and D7 (Output Leakage Detection) help but are not sufficient. D5 (Memory Integrity) and D10 (Input Sanitization) were weak empirically despite their conceptual importance. That pattern cautions against equating defense deployment with defense effectiveness.

In aggregate, the published evidence supports a narrow but technically consequential characterization of ClawGUI-2B: not as a standalone model identity with security emergent from alignment, but as a Claw-like agentic runtime whose safety depends on classical protections for provenance, privilege, storage integrity, mediation, and extension isolation.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (1)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to ClawGUI-2B.