ANX: Protocol-First Design for AI Agent Interaction with a Supporting 3EX Decoupled Architecture

Published 6 Apr 2026 in cs.AI and cs.CL | (2604.04820v1)

Abstract: AI agents, acting as autonomous digital actors, need agent-native protocols. Existing methods, including GUI automation and MCP-based skills, suffer from high token consumption, fragmented interaction, and inadequate security, because they lack a unified top-level framework and key components, and each independent module has its own flaws. To address these issues, we present ANX, an open, extensible, and verifiable agent-native protocol and top-level framework that integrates CLI, Skill, and MCP, resolving these pain points through protocol innovation, architectural optimization, and tool supplementation. ANX has four core innovations: 1) agent-native design (ANX Config, Markup, and CLI) with high information density, flexibility, and strong adaptability, which reduces tokens and eliminates inconsistencies; 2) human-agent interaction that builds on Skill's flexibility to support dual rendering as agent-executable instructions and as human-readable UI; 3) MCP-supported, on-demand lightweight apps that require no pre-registration; 4) machine-executable SOPs enabled by ANX Markup, which eliminate ambiguity for reliable long-horizon tasks and multi-agent collaboration. As the first paper in a series, we focus on ANX's design, present its 3EX decoupled architecture with ANXHub, and provide a preliminary feasibility analysis and experimental validation. ANX ensures native security: LLM-bypassed UI-to-Core communication keeps sensitive data out of the agent context, and human-only confirmation prevents automated misuse. Form-filling experiments with Qwen3.5-plus and GPT-4o show that ANX reduces tokens by 47.3% (Qwen3.5-plus) and 55.6% (GPT-4o) vs. MCP-based skills and by 57.1% (Qwen3.5-plus) and 66.3% (GPT-4o) vs. GUI automation, and that it shortens execution time by 58.1% and 57.7% vs. MCP-based skills.

Authors: Xu Mingze

Summary

The paper presents a protocol-first design built on the ANX protocol and the 3EX decoupled architecture, addressing the inefficiency and security vulnerabilities of existing agent interaction methods.
It uses the compact ANX Markup to express deterministic SOPs and to support dynamic, secure task delegation in multi-agent collaboration.
Experimental results show up to 66.3% fewer tokens and up to 58.1% shorter execution time, with statistically significant gains that support ANX's efficiency in agent interactions.

Motivation and Background

The proliferation of AI agents as autonomous, goal-oriented digital actors calls for agent-native interaction protocols that address the fundamental inadequacies of human-centric paradigms such as GUI automation, MCP-based skills, and ad-hoc CLIs. Prior solutions suffer from high token consumption, interface fragmentation, and critical security shortfalls stemming from absent data isolation and ad-hoc SOP specification. Privacy vulnerabilities and ambiguous task representation further undermine agent robustness in practical deployments, especially in domains that require sensitive data handling and reliable long-horizon multi-agent workflows.

ANX Protocol: Structured, Secure, and Extensible

The ANX protocol introduces a unified agent-native specification format consisting of ANX Config, ANX Markup, and ANX CLI. ANX Markup is a high-density, composable, and expressive task-specification language that is more compact than natural language or Markdown and supports unambiguous, machine-executable SOPs. Fields marked as sensitive in ANX Markup trigger privileged user interface (UI) rendering: data entered there is transmitted from the UI to ANX Core and bypasses the LLM entirely, which keeps it out of the agent's context. Human-agent interaction is natively supported: the same ANX definition is dual-rendered as executable instructions for agents and as an interactive UI for humans, enabling seamless co-use and lowering development cost.
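The dual-rendering and isolation idea can be illustrated with a small Python sketch. The summary does not give ANX Markup syntax, so `Field`, `FormSpec`, `render_for_agent`, `render_for_human`, and `CoreVault` are hypothetical names: one definition yields both a compact agent instruction and a list of human UI rows, and sensitive values submitted through the UI are stored on the Core side and exposed to the agent only as opaque references.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    label: str
    sensitive: bool = False  # hypothetical stand-in for an ANX Markup sensitivity annotation


@dataclass
class FormSpec:
    task: str
    fields: list[Field]


def render_for_agent(spec: FormSpec) -> str:
    # Compact instruction for the LLM: sensitive fields are named, but their values come from the UI.
    parts = [f"{f.name}=<UI>" if f.sensitive else f"{f.name}=?" for f in spec.fields]
    return f"{spec.task}: " + ", ".join(parts)


def render_for_human(spec: FormSpec) -> list[str]:
    # Human-facing UI rows generated from the same definition.
    return [f"[{'secure input' if f.sensitive else 'text'}] {f.label}" for f in spec.fields]


class CoreVault:
    """Core-side store: the UI submits raw values here; the agent only sees opaque references."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def submit_from_ui(self, value: str) -> str:
        ref = "ref_" + secrets.token_hex(4)
        self._values[ref] = value
        return ref

    def resolve(self, ref: str) -> str:
        # Called by the Core at execution time, never by the LLM.
        return self._values[ref]


spec = FormSpec("create_account", [Field("email", "Email"), Field("password", "Password", sensitive=True)])
print(render_for_agent(spec))  # create_account: email=?, password=<UI>
print(render_for_human(spec))
```

The agent-side string never contains the password value; in this sketch only the Core calls `resolve`.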

The protocol enables dynamic, autonomous application generation and zero-preinstall "use-and-go" by design. Progressive disclosure in task definition and tool invocation reduces token overhead by ensuring that agents receive only minimal, relevant information for the present context. Multi-agent collaboration and SOPs leverage structured, schema-based primitives for explicit dependency modeling (sources/targets), deterministic control flow, and explicit security annotations.
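A minimal sketch of progressive disclosure, assuming a two-level scheme in which the agent first sees a one-line summary per tool and receives full usage only on demand. The `Tool` and `DisclosureContext` classes, the tool names, and the whitespace-based `approx_tokens` counter are illustrative assumptions, not part of the ANX specification.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    summary: str    # one-line description, always visible
    full_spec: str  # detailed usage, disclosed only when requested


class DisclosureContext:
    """Progressive disclosure: the agent starts with summaries and expands only what it needs."""

    def __init__(self, tools: list[Tool]) -> None:
        self.tools = {t.name: t for t in tools}
        self.expanded: set[str] = set()

    def expand(self, name: str) -> None:
        if name not in self.tools:
            raise KeyError(name)
        self.expanded.add(name)

    def render(self) -> str:
        return "\n".join(
            f"{t.name}: {t.full_spec if t.name in self.expanded else t.summary}"
            for t in self.tools.values()
        )


def approx_tokens(text: str) -> int:
    # Crude whitespace count; real tokenizers differ, this only illustrates the trend.
    return len(text.split())


ctx = DisclosureContext([
    Tool("fill_form", "fill a registration form",
         "fill_form --task ID --field KEY=VALUE ... (repeat per field; sensitive fields are entered in the UI)"),
    Tool("search_jobs", "search job postings", "search_jobs --query TEXT --limit N --sort date|relevance"),
])
before = approx_tokens(ctx.render())
ctx.expand("fill_form")  # the agent asks for detail only on the tool it is about to use
after = approx_tokens(ctx.render())
print(before, after)
```

Only the expanded tool pays for its full description; the rest of the library stays at summary cost.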

The 3EX Architecture: Expression, Exchange, Execution

The ANX framework operationalizes the protocol via the 3EX (Expression–Exchange–Execution) decoupled architecture, with three compositional layers:

Expression Layer: Specifies tasks using ANX Markup, encapsulating all metadata, field constraints, and security annotations.
Exchange Layer (ANXHub): Implements a global semantic discovery marketplace for dynamic, zero-install skill and tool retrieval, outperforming MCP's directory-based registry in both scalability and start-up efficiency.
Execution Layer (ANX Core/CLI/Node): Translates structured tasks to lightweight CLI commands executed in a secure containerized environment, with ANX Core managing progressive command disclosure and real-time result feedback.
Figure 1: Overview of the ANX 3EX runtime, showing the decoupling between task specification, marketplace-based tool discovery, and execution orchestration.
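The division of labor among the three layers can be sketched as follows. Everything here is hypothetical: the task dictionary stands in for an Expression-layer ANX Markup document, `discover` replaces ANXHub's semantic retrieval with simple keyword overlap, and `to_argv` plays the Execution layer by turning the structured task into a CLI argument list (it does not run anything). The command names `anx-job` and `anx-invoice` are invented for illustration.

```python
import shlex
from dataclasses import dataclass


@dataclass
class HubEntry:
    skill: str
    keywords: set[str]
    command: str  # hypothetical CLI entry point


HUB = [
    HubEntry("job-account", {"create", "job", "account", "register"}, "anx-job"),
    HubEntry("invoice", {"invoice", "billing", "pay"}, "anx-invoice"),
]


def discover(query: str, hub: list[HubEntry] = HUB) -> HubEntry:
    # Exchange-layer stand-in: keyword overlap instead of real semantic (embedding) search.
    words = set(query.lower().split())
    best = max(hub, key=lambda e: len(e.keywords & words))
    if not best.keywords & words:
        raise LookupError(f"no skill matches {query!r}")
    return best


def to_argv(entry: HubEntry, action: str, params: dict[str, str]) -> list[str]:
    # Execution-layer stand-in: structured task -> argv list (no shell string parsing involved).
    argv = [entry.command, action]
    for key in sorted(params):  # sorted keys give deterministic command lines
        argv.append(f"--{key}={params[key]}")
    return argv


# Expression-layer stand-in: a structured task instead of free-form natural language.
task = {"intent": "create a job account", "action": "create", "params": {"name": "Ada", "role": "engineer"}}
argv = to_argv(discover(task["intent"]), task["action"], task["params"])
print(shlex.join(argv))  # anx-job create --name=Ada --role=engineer
```

Building an argument list rather than a shell string avoids quoting bugs; `shlex.join` is used only to display it.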

This design keeps baseline token consumption fixed regardless of the size of the application library, and it decouples protocol complexity from the LLM, minimizing serialization and transmission overhead.
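The claim that baseline cost does not depend on library size is simple arithmetic. The sketch below compares a registry that preloads every tool schema with a hub that exposes a single discovery entry point; the per-schema, search-tool, and base token counts (150, 200, 500) are made-up parameters rather than measurements from the paper, and per-task disclosures are ignored.

```python
def preloaded_context_tokens(n_tools: int, tokens_per_schema: int, base: int) -> int:
    # Registry-style: every tool schema sits in the context before the task starts.
    return base + n_tools * tokens_per_schema


def hub_context_tokens(n_tools: int, search_tool_tokens: int, base: int) -> int:
    # Hub-style: one discovery entry point, independent of how many tools exist.
    del n_tools  # deliberately unused: the baseline does not grow with the library
    return base + search_tool_tokens


for n in (10, 100, 1000):
    print(n, preloaded_context_tokens(n, 150, 500), hub_context_tokens(n, 200, 500))
```

With these assumed numbers, the preloaded context grows from 2,000 to 150,500 tokens as the library grows from 10 to 1,000 tools, while the hub baseline stays at 700.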

Security: Data Isolation and Human Validation

Security is architected as a first-class design objective. Sensitive data handling exploits direct, encrypted UI-to-Core channels that prohibit LLM access (agents see only reference tokens), and all critical actions invoke unbypassable human-only confirmation dialogs. The protocol's state machine strictly enforces these invariants: neither confirmation nor sensitive data input can be automated or subverted by the agent.

Figure 2: The ANX protocol state machine, highlighting security-critical WAITING_UI and CONFIRMING states that enforce UI-to-Core isolation and human-only gating.

Figure 3: Mechanisms for sensitive information handling, demonstrating context isolation of raw sensitive fields.

Figure 4: Human-validated handling, with confirmation dialogs requiring explicit user interaction and validation via user token.
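The security invariants can be expressed as a small state machine in which certain transitions are reserved for the human. The sketch below borrows the WAITING_UI and CONFIRMING state names from Figure 2; the other states, the event names, the transition table, and the token-checking scheme are assumptions for illustration. A human-only transition succeeds only when the caller presents the user token held by the trusted UI, so an agent that merely claims to be human is still rejected. In a real deployment the token would travel over the separate UI channel rather than through a method on the session object.

```python
import hmac
import secrets
from enum import Enum, auto
from typing import Optional


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    WAITING_UI = auto()   # waiting for sensitive input entered directly in the UI
    CONFIRMING = auto()   # waiting for an explicit human confirmation
    DONE = auto()


# (current state, event) -> (next state, actor allowed to trigger it)
TRANSITIONS = {
    (State.IDLE, "start"): (State.RUNNING, "agent"),
    (State.RUNNING, "need_sensitive_input"): (State.WAITING_UI, "agent"),
    (State.WAITING_UI, "ui_submitted"): (State.RUNNING, "human"),
    (State.RUNNING, "request_confirm"): (State.CONFIRMING, "agent"),
    (State.CONFIRMING, "confirmed"): (State.DONE, "human"),
}


class Session:
    def __init__(self) -> None:
        self.state = State.IDLE
        # Issued to the trusted UI process only; never placed in the agent's context.
        self._user_token = secrets.token_hex(16)

    def ui_token(self) -> str:
        return self._user_token

    def fire(self, event: str, actor: str, token: Optional[str] = None) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {self.state.name}")
        next_state, required = TRANSITIONS[key]
        if required == "human":
            # The actor label alone is not trusted: the UI-held token must also match.
            valid = token is not None and hmac.compare_digest(token, self._user_token)
            if actor != "human" or not valid:
                raise PermissionError(f"{event!r} requires human confirmation via the UI")
        self.state = next_state
        return self.state


s = Session()
s.fire("start", "agent")
s.fire("request_confirm", "agent")
try:
    s.fire("confirmed", "agent")  # an agent cannot confirm on the user's behalf
except PermissionError as err:
    print("blocked:", err)
print(s.fire("confirmed", "human", s.ui_token()))  # State.DONE
```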

These mechanisms provide application-level isolation that is absent from alternatives such as MCP, CHEQ, or network-layer encryption schemes, addressing both privacy leakage and automated misuse. The threat model also identifies residual risks, including UI spoofing, an untrusted Core or Hub, and LLM-induced social engineering, and recommends deployment best practices to mitigate them.

SOP Semantics and Multi-Agent Collaboration

ANX Markup enables unambiguous SOP specification for complex, long-horizon task scheduling, eliminating natural language ambiguity via deterministic dependency graphs, schema-validated logic, and explicit role assignment. The protocol supports fork/join control flow, conditional routing, and human-in-the-loop branching for nuanced workflows. ANXHub orchestrates cross-agent synchronization, decomposing SOPs and routing sub-tasks based on semantic capability discovery.
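Deterministic scheduling of an SOP with fork/join structure can be sketched with Kahn's topological sort, where each step lists its sources (the steps it depends on). Sorting each ready set by step ID makes the sequence of execution waves reproducible for the same SOP, and any step left unscheduled signals a cycle. The step names are hypothetical, and the paper's actual scheduler may work differently.

```python
from collections import defaultdict


def schedule(steps: dict[str, list[str]]) -> list[list[str]]:
    """Group SOP steps into execution waves; steps maps step id -> its source step ids."""
    indegree = {step: len(sources) for step, sources in steps.items()}
    targets = defaultdict(list)  # source -> steps that wait on it
    for step, sources in steps.items():
        for src in sources:
            if src not in steps:
                raise KeyError(f"{step!r} depends on unknown step {src!r}")
            targets[src].append(step)

    waves, scheduled = [], 0
    ready = sorted(step for step, deg in indegree.items() if deg == 0)
    while ready:
        waves.append(ready)          # steps in one wave can run in parallel (fork)
        scheduled += len(ready)
        released = []
        for step in ready:
            for t in targets[step]:
                indegree[t] -= 1
                if indegree[t] == 0:  # all of t's sources have finished (join)
                    released.append(t)
        ready = sorted(released)     # sorting keeps the order reproducible
    if scheduled != len(steps):
        raise ValueError("cycle detected: the SOP is not a DAG")
    return waves


sop = {
    "parse_resume": [],
    "check_skills": ["parse_resume"],
    "check_experience": ["parse_resume"],
    "merge_scores": ["check_skills", "check_experience"],
    "notify_hr": ["merge_scores"],
}
print(schedule(sop))
# [['parse_resume'], ['check_experience', 'check_skills'], ['merge_scores'], ['notify_hr']]
```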

Figure 5: The ANX SOP mechanism for deterministic and auditable workflow scheduling.

A case study of collaborative resume screening shows agents autonomously executing high-confidence branches while escalating ambiguous critical cases to human validation, with all steps coordinated through a shared ANX Node state.
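The escalation pattern in the resume-screening case can be sketched as confidence-based routing. The thresholds (0.85 and 0.15), the `Decision` type, and the automatic-reject band are assumptions; the summary only states that high-confidence branches run autonomously and ambiguous cases go to a human.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    candidate: str
    score: float  # agent's confidence (0..1) that the candidate fits the role


def route(decisions: list[Decision], accept_at: float = 0.85, reject_at: float = 0.15) -> dict[str, list[str]]:
    """Auto-execute confident decisions; send the ambiguous middle band to a human reviewer."""
    if not 0.0 <= reject_at < accept_at <= 1.0:
        raise ValueError("expected 0 <= reject_at < accept_at <= 1")
    result: dict[str, list[str]] = {"accept": [], "reject": [], "human_review": []}
    for d in decisions:
        if d.score >= accept_at:
            result["accept"].append(d.candidate)
        elif d.score <= reject_at:
            result["reject"].append(d.candidate)
        else:
            result["human_review"].append(d.candidate)
    return result


print(route([Decision("A", 0.93), Decision("B", 0.50), Decision("C", 0.05)]))
# {'accept': ['A'], 'reject': ['C'], 'human_review': ['B']}
```

Keeping the thresholds as explicit parameters makes the escalation policy auditable, in the same spirit as the schema-validated SOPs described above.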

Figure 6: Multi-agent and human collaborative SOP execution for role-aligned, security-validated workflows.

Comparative Analysis

The paper introduces a four-dimensional evaluation framework (Protocol Tooling, Discovery, Security, and Collaboration) that systematically differentiates ANX from prior approaches. ANX is the only approach in this comparison that combines zero-friction skill adoption, global marketplace discovery, built-in LLM data isolation, and deterministic, human-in-the-loop SOP orchestration within a unified protocol. The experiments below provide empirical support for its efficiency advantages.

Experimental Results

Benchmarking ANX, MCP-based skills, and GUI-based agents on a realistic multi-field account registration task (with dynamic option fetching), the following efficiency advantages were observed (Qwen3.5-plus / GPT-4o):

Token Reduction: ANX achieves a 47.3%/55.6% reduction vs. MCP-based skill, and 57.1%/66.3% vs. GUI, in task-incremental token cost.
Execution Time: ANX shortens runtime by 58.1%/57.7% vs. MCP-based skill.
Statistical Robustness: All improvements are statistically significant after Bonferroni correction.
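Because both token reductions for a given model are measured against the same ANX token count, they also imply how the two baselines compare: if tokens(ANX) = (1 - r_MCP) * tokens(MCP) = (1 - r_GUI) * tokens(GUI), then tokens(MCP) / tokens(GUI) = (1 - r_GUI) / (1 - r_MCP). The sketch below computes this ratio from the reported figures, assuming they refer to the same averaged runs, and includes a plain Bonferroni check in which each of m p-values is compared against alpha / m. The number of comparisons and the p-values used in the paper are not given in this summary.

```python
def implied_ratio(r_vs_a: float, r_vs_b: float) -> float:
    """tokens(A) / tokens(B), given ANX's relative reductions versus baselines A and B.

    Derived from tokens(ANX) = (1 - r_vs_a) * tokens(A) = (1 - r_vs_b) * tokens(B).
    """
    return (1 - r_vs_b) / (1 - r_vs_a)


def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    # Each of the m tests is compared against the corrected threshold alpha / m.
    m = len(p_values)
    return [p < alpha / m for p in p_values]


# Reported reductions: (vs. MCP-based skills, vs. GUI automation)
reported = {"Qwen3.5-plus": (0.473, 0.571), "GPT-4o": (0.556, 0.663)}
for model, (r_mcp, r_gui) in reported.items():
    ratio = implied_ratio(r_mcp, r_gui)
    print(f"{model}: MCP-based skills use {ratio:.3f}x the tokens of GUI automation")
```

From the reported figures, MCP-based skills use roughly 0.81x (Qwen3.5-plus) and 0.76x (GPT-4o) the task-incremental tokens of GUI automation, subject to rounding in the published percentages.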

These gains come from offloading data resolution and UI logic to ANX Core, which eliminates DOM parsing and removes the tokenization of dynamic options from the LLM context.
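The offloading of dynamic options can be illustrated by contrasting what enters the LLM context in each approach. In this sketch, `OPTION_SOURCES`, the `@source` reference syntax, and the 200-item category list are hypothetical; the lambda stands in for a live fetch that the Core would perform.

```python
from typing import Callable

# Hypothetical option sources; each lambda stands in for a live fetch done by the Core.
OPTION_SOURCES: dict[str, Callable[[], list[str]]] = {
    "job_categories": lambda: [f"category-{i}" for i in range(200)],
}


def agent_view(field: str, source: str) -> str:
    # What the LLM sees: the field name and a reference to where the options live.
    return f"{field}: choose from @{source}"


def inline_view(field: str, source: str) -> str:
    # What ends up in context when options are serialized inline (DOM text or a pre-loaded payload).
    return f"{field}: " + ", ".join(OPTION_SOURCES[source]())


def core_validate(source: str, choice: str) -> bool:
    # The Core checks a submitted choice against a fresh copy of the list.
    return choice in OPTION_SOURCES[source]()


print(len(agent_view("category", "job_categories")), len(inline_view("category", "job_categories")))
print(core_validate("job_categories", "category-7"), core_validate("job_categories", "category-999"))
```

The agent-side string stays the same length however many options the source returns, while the inline rendering grows with the list; validation of the chosen value stays on the Core side.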

Figure 7: Example of 'Create Job Account' form specification in ANX Markup.

Figure 8: GUI agent workflow, with high overhead from DOM traversal and action serialization.

Figure 9: MCP flow, with moderate efficiency but persistent payload serialization and pre-loaded dynamic options.

Figure 10: ANX workflow, with minimum LLM context exposure and efficient execution.

Practical and Theoretical Implications

ANX shifts the foundation of agent interaction away from ad-hoc natural language or declarative schemas with limited isolation and toward a protocol-first, semantically precise, security-enforced, and efficiently orchestrated multi-agent ecosystem. Its combination of dynamic skill discovery, robust data isolation, and deterministic SOP trajectories applies directly to production environments with stringent privacy, audit, and operational reliability requirements.

Theoretically, ANX demonstrates that meaningful protocol-level semantic abstraction is both feasible and necessary, as argued in recent critiques of the limitations of agentic communication protocols (Yuan et al., 30 Mar 2026). Progressive disclosure and market-driven discovery position ANX's architecture to keep scaling as the agent skill landscape grows.

Limitations and Future Work

Experimental validation is currently limited to single-task, synthetic scenarios and does not yet include adversarial security evaluation or large-scale SOP graphs. Future work will expand to:

Multi-agent distributed coordination and agent elimination,
Adversarial and usability-centered security analysis,
Comparative studies against centralized SkillHub paradigms,
Enterprise deployment via federated or decentralized ANXHub topologies,
Large, deeply branched SOPs with real-time state synchronization,
Augmentation with skill-growing and traceability primitives,
Specialization and quantization for lightweight protocol models.

Conclusion

ANX represents a unified, extensible agent-native protocol and architecture for efficient, secure, and semantically precise agent interaction. By decoupling expression, discovery, and execution, embedding robust data isolation and unbypassable human confirmation, and supporting deterministic SOP execution with native multi-agent and human collaboration, ANX sets a rigorous foundation for scalable, production-grade agent ecosystems and calls for further study on protocol-level semantic abstraction and governance in AI agent systems (2604.04820).