Agent–Computer Interface (ACI) Overview

Updated 26 May 2026

Agent–Computer Interface (ACI) is a paradigm defining machine-readable, composable protocols that enable autonomous agents to interact with computer systems.
It integrates formal capability registries, explicit contracts, and schema-based invocation protocols to ensure reliable and robust execution of system functions.
Empirical studies report that ACIs raise agent success rates and reduce task completion times, marking a shift away from traditional human-centric interfaces.

Agent–Computer Interface (ACI) refers to the set of structures, abstractions, and protocols by which AI agents discover, reason about, and invoke computer system capabilities autonomously, without human mediation. Traditional user interfaces (UIs), such as GUIs and CLIs, are optimized for human cognition and interaction. ACIs, by contrast, are designed to be machine-readable, composable, and robust for agentic consumption, supporting reliable, semantically explicit binding between agent intent and system functionality. LLMs and autonomous planning agents are emerging as primary consumers of software, and this is driving a paradigm shift from human-centric graphical workflows to AI-native, capability-oriented orchestration layers.

1. Formal Foundations and Notational Definitions

The ACI paradigm models the interaction between agents and digital systems as a precise interface specification over a set of invocable capabilities, defined as:

$\mathrm{ACI} = \langle \mathcal{C}, \Sigma_{\text{in}}, \Sigma_{\text{out}}, \Pi \rangle$

where:

$\mathcal{C} = \{c_1, c_2, ..., c_n\}$ is the set of capabilities (atomic, independently executable system functions),
$\Sigma_{\text{in}}: \mathcal{C} \to \text{Schemas}$ maps each capability to an explicit input schema,
$\Sigma_{\text{out}}: \mathcal{C} \to \text{Schemas}$ maps each to its output schema (typically formal JSON-Schema or protobuf definitions),
$\Pi: \mathcal{C} \to \text{Descriptions}$ assigns a contract (preconditions, postconditions, side-effects, error modes in logic formulae) to every capability (Wang et al., 19 Mar 2026).

Each capability $c$ is a record: $c = (\text{name}: \text{String},~ \text{inputSchema}: \text{JSON-Schema},~ \text{outputSchema}: \text{JSON-Schema},~ \text{contract}: \{\text{pre}: \text{LogicFormula},~ \text{post}: \text{LogicFormula},~ \text{errors}: \text{Set}\})$
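The record above maps naturally onto a typed data structure. The following is a minimal Python sketch under several assumptions. Contracts are represented as Python predicates rather than logic formulae, and schemas are plain JSON-Schema-style dicts. The names `Capability`, `Contract`, `Registry`, and the example `createTask` capability are hypothetical. The registry's lookup methods play the roles of $\Sigma_{\text{in}}$, $\Sigma_{\text{out}}$, and $\Pi$.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet

# Hypothetical stand-in for a logic formula: a predicate over a state/payload dict.
Predicate = Callable[[Dict[str, Any]], bool]


@dataclass(frozen=True)
class Contract:
    pre: Predicate                          # must hold before invocation
    post: Predicate                         # must hold after invocation
    errors: FrozenSet[str] = frozenset()    # enumerated error classes


@dataclass(frozen=True)
class Capability:
    name: str
    input_schema: Dict[str, Any]            # JSON-Schema-style dict
    output_schema: Dict[str, Any]
    contract: Contract


@dataclass
class Registry:
    """Indexed collection C; sigma_in, sigma_out and pi are lookups by name."""
    capabilities: Dict[str, Capability] = field(default_factory=dict)

    def register(self, cap: Capability) -> None:
        if cap.name in self.capabilities:
            raise ValueError(f"duplicate capability: {cap.name}")
        self.capabilities[cap.name] = cap

    def sigma_in(self, name: str) -> Dict[str, Any]:
        return self.capabilities[name].input_schema

    def sigma_out(self, name: str) -> Dict[str, Any]:
        return self.capabilities[name].output_schema

    def pi(self, name: str) -> Contract:
        return self.capabilities[name].contract


create_task = Capability(
    name="createTask",
    input_schema={"type": "object",
                  "properties": {"title": {"type": "string"}},
                  "required": ["title"]},
    output_schema={"type": "object",
                   "properties": {"task_id": {"type": "string"}},
                   "required": ["task_id"]},
    contract=Contract(
        pre=lambda s: bool(s.get("title")),      # a non-empty title is required
        post=lambda s: "task_id" in s,           # the result must carry an id
        errors=frozenset({"InvalidTitle", "QuotaExceeded"}),
    ),
)

registry = Registry()
registry.register(create_task)
```

Keeping contracts as data on the record, rather than burying them in handler code, lets an agent or orchestrator inspect them before deciding to invoke.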

For GUI-driven and accessibility-tree-based ACIs (e.g., Agent S), the interface is modeled as:

$\mathrm{ACI} : \mathcal{O} \times \mathcal{A} \to \mathcal{O}$

where $\mathcal{O}$ is the agent's observation space (e.g., a screenshot $I$ together with an accessibility tree), and $\mathcal{A}$ is the primitive action vocabulary, such as $\mathtt{click}$ or $\mathtt{type}$ (Agashe et al., 2024).
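To make the signature $\mathcal{O} \times \mathcal{A} \to \mathcal{O}$ concrete, here is a toy sketch. The observation is an accessibility tree flattened to element ids, with the screenshot omitted. The action vocabulary is limited to two hypothetical primitives, `Click` and `TypeText`. The `step` function is a pure map from (observation, action) to the next observation. This is an illustration of the formalism only, not Agent S's implementation.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Element:
    role: str            # e.g. "button", "textbox"
    label: str
    value: str = ""
    focused: bool = False


# Observation: accessibility tree flattened to (id, element) pairs (screenshot omitted).
Observation = Tuple[Tuple[int, Element], ...]


@dataclass(frozen=True)
class Click:
    element_id: int


@dataclass(frozen=True)
class TypeText:
    text: str


Action = Union[Click, TypeText]


def step(obs: Observation, action: Action) -> Observation:
    """Pure transition O x A -> O: builds a new observation and never mutates the old one."""
    tree: Dict[int, Element] = dict(obs)
    if isinstance(action, Click):
        if action.element_id not in tree:
            raise KeyError(f"no element {action.element_id}")
        # Clicking moves focus to the target element.
        tree = {i: replace(e, focused=(i == action.element_id)) for i, e in tree.items()}
    elif isinstance(action, TypeText):
        # Typing appends text to the focused textbox, if there is one.
        tree = {
            i: replace(e, value=e.value + action.text)
            if e.focused and e.role == "textbox" else e
            for i, e in tree.items()
        }
    return tuple(sorted(tree.items(), key=lambda kv: kv[0]))


obs0: Observation = ((1, Element("textbox", "Search")), (2, Element("button", "Go")))
obs2 = step(step(obs0, Click(1)), TypeText("weather"))
```

Because `step` is pure, the same function can also serve as a simulator for look-ahead planning.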

2. Core Components and Data Structures

Across implementation domains, an ACI typically includes:

Capability registry: An indexed collection of capabilities, each with input and output schemas plus a contract, expressed in JSON-Schema, OpenAPI, or Protobuf.
Machine-readable schemas: Used by agents to validate invocation payloads and parse results.
Invocation protocols: Standardized RPC or REST endpoints such as POST /agent_interface/invoke for action execution.
Feedback and observation: Structured state descriptions (e.g., screenshots, augmented accessibility trees, or semantic JSON trees) given to the agent after each action.
Contract definitions: Explicit preconditions, postconditions, side-effect annotations, and enumerated error classes, enabling reliable composition and programmatic recovery.
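Taken together, these components suggest an invocation path with four stages:

1. Validate the payload against the input schema.
2. Check the precondition.
3. Execute the capability.
4. Check the postcondition.

The path returns either a structured result or one of the enumerated error classes. The sketch below is a minimal, dependency-free version. It supports only a small subset of JSON-Schema: required keys and primitive types. The request shape (`capability` plus `arguments`), the `invoke` function, the capability table, and the error names are illustrative assumptions about what an endpoint such as POST /agent_interface/invoke might accept.

```python
import json
from typing import Any, Dict

# Subset of JSON-Schema primitive types mapped to Python types (assumption).
_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}


def validate(payload: Dict[str, Any], schema: Dict[str, Any]) -> None:
    """Check required keys and primitive property types; raise ValueError otherwise."""
    for key in schema.get("required", []):
        if key not in payload:
            raise ValueError(f"missing required field '{key}'")
    for key, spec in schema.get("properties", {}).items():
        if key not in payload:
            continue
        value, expected = payload[key], spec["type"]
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        if isinstance(value, bool) and expected in ("integer", "number"):
            raise ValueError(f"field '{key}' must be {expected}")
        if not isinstance(value, _TYPES[expected]):
            raise ValueError(f"field '{key}' must be {expected}")


# Hypothetical capability table: name -> (input schema, precondition, handler, postcondition).
CAPABILITIES: Dict[str, tuple] = {
    "createTask": (
        {"type": "object", "required": ["title"],
         "properties": {"title": {"type": "string"}, "priority": {"type": "integer"}}},
        lambda args: len(args["title"].strip()) > 0,
        lambda args: {"task_id": "t-1", "title": args["title"]},
        lambda out: "task_id" in out,
    ),
}


def invoke(request_body: str) -> str:
    """Handle a JSON request such as {"capability": ..., "arguments": {...}}."""
    req = json.loads(request_body)
    name = req.get("capability")
    if name not in CAPABILITIES:
        return json.dumps({"ok": False, "error": "UnknownCapability"})
    schema, pre, handler, post = CAPABILITIES[name]
    args = req.get("arguments", {})
    try:
        validate(args, schema)
    except ValueError as exc:
        return json.dumps({"ok": False, "error": "SchemaViolation", "detail": str(exc)})
    if not pre(args):
        return json.dumps({"ok": False, "error": "PreconditionFailed"})
    result = handler(args)
    if not post(result):
        return json.dumps({"ok": False, "error": "PostconditionFailed"})
    return json.dumps({"ok": True, "result": result})
```

Returning an enumerated error code instead of a free-form message is what lets an agent branch programmatically. For example, it can repair the payload after a `SchemaViolation` but re-plan after a `PreconditionFailed`.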

For environments such as AIOS/LiteCUA, the system state is abstracted as:

$\mathcal{S} = \{e_1, e_2, \ldots, e_m\}$

with each interactive element $e_i$ annotated with type, label, spatial bounds, enabled state, and auxiliary properties (Mei et al., 24 May 2025).
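One way to realize such a state is a list of annotated elements serialized to JSON for the agent's context. The field names below (`type`, `label`, `bounds`, `enabled`, `props`) follow the annotations listed above but are otherwise assumptions, not LiteCUA's actual schema. The `serialize_state` helper is hypothetical. It also shows how dropping disabled elements can keep the observation compact.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class UIElement:
    id: str
    type: str                          # e.g. "button", "menuitem", "textbox"
    label: str
    bounds: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    enabled: bool = True
    props: Dict[str, Any] = field(default_factory=dict)  # auxiliary properties


def serialize_state(elements: List[UIElement], only_enabled: bool = True) -> str:
    """Render the interactive-element state as compact JSON for the agent."""
    chosen = [e for e in elements if e.enabled or not only_enabled]
    return json.dumps({"elements": [asdict(e) for e in chosen]}, separators=(",", ":"))


state = [
    UIElement("e1", "button", "Save", (10, 10, 80, 24)),
    UIElement("e2", "button", "Undo", (100, 10, 80, 24), enabled=False),
    UIElement("e3", "textbox", "File name", (10, 50, 200, 24), props={"value": "draft.txt"}),
]
print(serialize_state(state))
```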

3. Design Principles of Agent–Computer Interfaces

Research converges on several guiding principles for ACIs:

Machine Interpretability: All operations, types, and constraints are explicitly typed and structured. Free-form prompts are replaced by rigorous schema-driven invocation.
Composable Capability Design: Capabilities should be fine-grained and orthogonal, enabling dynamic planning and reliable chaining by agents.
Explicit Contracts: Every capability exposes formal contracts encompassing preconditions, postconditions, and error specifications to enforce robust execution and facilitate recovery semantics (Wang et al., 19 Mar 2026).
Reliability and Idempotency: Invocation endpoints must support idempotency keys, explicit retry strategies (e.g., exponential back-off), and deterministic error recovery logic.
Context Efficiency: Capability descriptions are kept concise so they fit within agent context windows, with extraneous details deferred to external documentation.
Discovery and Documentation: Machine-readable capability indices and "describe" queries are provided for self-discovery and integration.
Monitoring: ACIs surface per-capability metrics (invocation latency, error rates, idempotency conflicts) to both development teams and agents.
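The reliability principle can be sketched on the client side as follows. One idempotency key is generated per logical invocation and reused on every retry, so the server can deduplicate repeated attempts. The delay between attempts grows exponentially, with random "full jitter". The `TransientError` class, the in-memory `FakeServer`, and the parameter defaults are assumptions for illustration only.

```python
import random
import time
import uuid
from typing import Any, Callable, Dict


class TransientError(Exception):
    """Retryable failure (e.g., timeout or HTTP 503); hypothetical error class."""


def invoke_with_retry(send: Callable[[str, Dict[str, Any]], Any],
                      payload: Dict[str, Any],
                      max_attempts: int = 5,
                      base_delay: float = 0.1,
                      max_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> Any:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    key = str(uuid.uuid4())  # same key for every attempt of this logical call
    for attempt in range(max_attempts):
        try:
            return send(key, payload)
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            # Exponential back-off with full jitter: uniform in [0, min(cap, base * 2^n)].
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))


class FakeServer:
    """Fails the first `failures` calls, then executes at most once per idempotency key."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.results: Dict[str, Any] = {}
        self.executions = 0

    def send(self, key: str, payload: Dict[str, Any]) -> Any:
        if self.failures > 0:
            self.failures -= 1
            raise TransientError("temporarily unavailable")
        if key not in self.results:  # a repeated key returns the stored result
            self.executions += 1
            self.results[key] = {"echo": payload}
        return self.results[key]


server = FakeServer(failures=2)
result = invoke_with_retry(server.send, {"title": "Write report"}, sleep=lambda s: None)
```

Injecting `sleep` keeps the function testable without real delays. Reusing the key across retries ensures that a request which succeeded server-side but timed out client-side is not executed twice.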

4. Architectural Patterns and System Evolutions

System architectures have evolved through distinct eras:

Era	Invocation Flow	Primary User	Interface Layer
Human Interface Era	users → GUI/forms → app logic	Human	GUIs
API-Centric Transition	programs/scripts → REST/GraphQL APIs → app logic	Programmers	APIs
Agent Interface Era	AI agents → ACI layer → capability catalog → services	AI Agents/LLMs	ACI (invocable capabilities registry)

In the agent era, the ACI layer acts as a dynamic orchestration hub. Agents plan over sequences of capabilities (e.g., "createTask", "scheduleMeeting"), invoke them directly, and assemble complex workflows at runtime instead of following hardcoded page flows or monolithic endpoint logic (Wang et al., 19 Mar 2026).
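A minimal way to picture runtime composition is a plan expressed as data: an ordered list of capability calls whose arguments may reference earlier outputs. Several details here are hypothetical: the `$<step>.<field>` reference syntax, the stub handlers for `createTask` and `scheduleMeeting`, and their fields. A real agent would generate the plan itself and dispatch through the ACI layer rather than through local functions.

```python
from typing import Any, Callable, Dict, List

# Stub capability handlers standing in for calls through the ACI layer (hypothetical).
HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "createTask": lambda title: {"task_id": "T42", "title": title},
    "scheduleMeeting": lambda topic, ref: {"meeting_id": "M7", "topic": topic,
                                           "linked_task": ref},
}


def resolve(value: Any, outputs: List[Dict[str, Any]]) -> Any:
    """Replace '$<step>.<field>' strings with a field from an earlier step's output."""
    if isinstance(value, str) and value.startswith("$"):
        step, fieldname = value[1:].split(".", 1)
        return outputs[int(step)][fieldname]
    return value


def run_plan(plan: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Execute calls in order, binding each call's arguments against prior outputs."""
    outputs: List[Dict[str, Any]] = []
    for call in plan:
        args = {k: resolve(v, outputs) for k, v in call["args"].items()}
        outputs.append(HANDLERS[call["capability"]](**args))
    return outputs


plan = [
    {"capability": "createTask", "args": {"title": "Draft Q3 report"}},
    {"capability": "scheduleMeeting", "args": {"topic": "Review draft", "ref": "$0.task_id"}},
]
outputs = run_plan(plan)
```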

Multi-layered orchestration has also been demonstrated in:

Skill graph-based systems: CUA-Skill encodes human computer-use skills as 4-tuples whose components include the target application, the argument schema, and a parameterized control-flow execution graph; skills are retrieved and composed dynamically (Chen et al., 28 Jan 2026).
API-UI hybrid models: AXIS prioritizes direct API invocation ("skills") over UI manipulation. Its action-selection criterion favors an API skill whenever one is available and falls back to UI actions otherwise. The system self-discovers skills by traversing application help documentation or by using exploration agents to synthesize reusable API and UI wrappers (Lu et al., 2024).
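The API-first policy can be sketched as a selector over two skill tables. For each requested operation, it uses an API skill if one exists and otherwise falls back to a UI skill that drives primitive actions. This is a simplified stand-in for the AXIS criterion, not its actual formula. The `Skill` fields loosely follow the CUA-Skill components named above, with the execution graph reduced to a linear list of steps. All skill names and step strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Skill:
    app: str                     # target application
    arg_schema: Tuple[str, ...]  # required argument names
    steps: Tuple[str, ...]       # execution graph, simplified to a linear sequence
    kind: str                    # "api" or "ui"


API_SKILLS: Dict[str, Skill] = {
    "insert_table": Skill("Word", ("rows", "cols"), ("api:insertTable",), "api"),
}
UI_SKILLS: Dict[str, Skill] = {
    "insert_table": Skill("Word", ("rows", "cols"),
                          ("click:Insert", "click:Table", "type:{rows}x{cols}"), "ui"),
    "set_font_color": Skill("Word", ("color",),
                            ("click:Home", "click:FontColor", "click:{color}"), "ui"),
}


def select_skill(op: str) -> Optional[Skill]:
    """Prefer an API skill; fall back to UI manipulation only when none exists."""
    return API_SKILLS.get(op) or UI_SKILLS.get(op)


def instantiate(skill: Skill, args: Dict[str, str]) -> List[str]:
    """Check arguments against the schema, then fill placeholders in each step."""
    missing = [a for a in skill.arg_schema if a not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return [s.format(**args) for s in skill.steps]
```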

5. Empirical Validation and Performance Metrics

Numerous studies provide quantitative evidence for the efficacy of ACIs:

Agent S ablation (OSWorld, 65 tasks):
- Baseline (coordinate-only): 10.77% success
- +ACI: 12.31%
- +ACI + retrieval-as-learning: 20.00%
PC Agent-E generalization (WindowsAgentArena-V2, 141 tasks):
- Baseline Qwen2.5-VL-72B: 14.9%
- PC Agent-E (few-shot, trajectory-boosted): 36.0% (+141% relative gain)
- OSWorld cross-domain: 10.9% on feasible tasks (more than 2× improvement) (He et al., 20 May 2025)
AXIS in Office Word (50 tasks):
- Task time reduction: 59.5s → 29.9s
- Success rate: 84.0% vs. 52.0% (p < 0.001)
- Mental workload (NASA-TLX): –88%
- API usage: 55.7% (AXIS) vs. 8.1% (UI agent baseline) (Lu et al., 2024)
Design heuristic augmentations: UI-Verse experiments show that agent-compatible visual cues, layout stability, explicit step controls, and structured procedural help yield up to a 3.5× relative gain in agent success rates (e.g., from 0.13 to 0.59 for Qwen3-VL), without regressions in human usability (Liu et al., 4 May 2026).
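The headline figures can be cross-checked with simple arithmetic. Assuming the Agent S percentages are success counts over the 65-task subset, they correspond to 7, 8, and 13 solved tasks; these counts are inferred, not reported. The remaining lines recompute the relative changes from the values listed above. The UI-Verse "3.5×" figure matches a relative gain, (0.59 − 0.13) / 0.13 ≈ 3.54. The ratio of final to baseline success is about 4.5.

```python
def relative_gain(before: float, after: float) -> float:
    """(after - before) / before; 0.5 means +50%, negative values are reductions."""
    return (after - before) / before


# Agent S ablation: success counts over 65 OSWorld tasks (counts inferred from percentages).
for label, solved in [("baseline", 7), ("+ACI", 8), ("+ACI+retrieval", 13)]:
    print(f"{label}: {100 * solved / 65:.2f}%")   # 10.77%, 12.31%, 20.00%

print(f"PC Agent-E: {relative_gain(14.9, 36.0):+.1%}")   # +141.6%
print(f"AXIS task time: {relative_gain(59.5, 29.9):+.1%}")   # -49.7%
print(f"UI-Verse gain: {relative_gain(0.13, 0.59):.2f}x, "
      f"ratio: {0.59 / 0.13:.2f}x")   # 3.54x, 4.54x
```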

6. Comparison Across Modalities and Practical Implementations

Different manifestations of ACIs span a wide spectrum:

Machine-oriented API layers: Machine-discoverable JSON-RPC or OpenAPI endpoints exposing explicit capability semantics (Wang et al., 19 Mar 2026; Mei et al., 24 May 2025).
Modular GUI-grounding layers: Systems like Agent S employ a glue layer aligning LLM symbolic utterances to GUI element IDs through accessibility trees and OCR, exposing macros such as click, type, drag_and_drop (Agashe et al., 2024).
Skill abstraction frameworks: Parameterized execution graphs over UI primitives, skill retrieval and argument instantiation, and memory-informed failure recovery, as in CUA-Skill (Chen et al., 28 Jan 2026).
Terminal-based ACIs: These offer maximal representational compatibility, transparency through chronological logs, and a low barrier to entry for mixed-initiative human–AI workflows (Masi, 11 Mar 2026).
Physical device ACIs: HIDAgent leverages hardware-in-the-loop emulation (keyboard/mouse) for external device control, supporting cross-platform, no-installation agent deployment on arbitrary devices (Bigham, 31 Jan 2026).

The empirical results in Section 5 support the view that text-based, semantically rich, contract-driven ACIs outperform pixel-based or opaque function-calling approaches in both agent reliability and system throughput.
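A GUI-grounding glue layer of the kind described for Agent S can be pictured as a translator from symbolic actions to device-level events. The sketch below parses an utterance such as `click(12)`. It resolves each element ID to the center of that element's bounding box, taken from an accessibility tree. The utterance grammar, the tree format (ID mapped to `(x, y, width, height)`), and the event tuples are assumptions, and the OCR fallback is omitted.

```python
import ast
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def center(box: Box) -> Tuple[int, int]:
    x, y, w, h = box
    return x + w // 2, y + h // 2


def ground(utterance: str, tree: Dict[int, Box]) -> List[tuple]:
    """Translate click(id), type(id, text) or drag_and_drop(src, dst) into low-level events."""
    call = ast.parse(utterance.strip(), mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"not a macro call: {utterance!r}")
    name = call.func.id
    args = [ast.literal_eval(a) for a in call.args]  # literals only, no code execution
    if name == "click":
        return [("mouse_click", *center(tree[args[0]]))]
    if name == "type":
        return [("mouse_click", *center(tree[args[0]])), ("key_text", args[1])]
    if name == "drag_and_drop":
        return [("mouse_down", *center(tree[args[0]])),
                ("mouse_up", *center(tree[args[1]]))]
    raise ValueError(f"unknown macro: {name}")


tree = {12: (100, 200, 40, 20), 13: (300, 200, 40, 20)}
events = ground("click(12)", tree)
```

Having the model emit element IDs rather than raw coordinates is the design choice that separates this approach from coordinate-only baselines. The pixel arithmetic moves out of the model and into deterministic code.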

7. Open Challenges, Research Gaps, and Future Outlook

Meta-analyses and surveys (Sager et al., 27 Jan 2025) highlight ongoing research gaps in ACI design:

Generalization: Agents frequently overfit to static UI conventions; broader, vision-centric observation and low-level HID action interfaces are proposed to improve out-of-domain robustness.
Learning efficiency: Sample complexity remains high; few-shot augmentation, synthetic trajectory boosting, and environment-adaptive RL are promising strategies (He et al., 20 May 2025).
Planning under partial observability: Most systems still rely on implicit, brittle planning; explicit environment models that predict the next observation from the current observation and action, combined with tree search, may yield improvements.
Benchmark alignment and evaluation: The field needs complex, multi-app, multi-modal benchmarks and standardized, macro-averaged metrics.
Deployment resilience and safety: Real-world constraints (hardware heterogeneity, OS updates, privacy) demand adaptive policies and human-in-the-loop confirmations.
Co-design with human UIs: Empirical findings (Liu et al., 4 May 2026) show that agent-oriented augmentations to human UIs can raise agent success rates by up to roughly 3.5× (relative gain) without impairing human usability, supporting co-design as a best practice.
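The suggestion to pair explicit environment models with tree search can be illustrated with breadth-first search. The search runs over a deterministic model that predicts the next observation from the current observation and action. Both the model (a hand-written table of screen transitions) and the goal test are hypothetical. In practice the model would be learned or approximate, and the search would be bounded and guided by heuristics.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def plan_bfs(start: Hashable,
             actions: Iterable[str],
             model: Callable[[Hashable, str], Optional[Hashable]],
             is_goal: Callable[[Hashable], bool],
             max_depth: int = 6) -> Optional[List[str]]:
    """Return the shortest action sequence reaching a goal under the model, or None."""
    actions = list(actions)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        obs, path = frontier.popleft()
        if is_goal(obs):
            return path
        if len(path) >= max_depth:
            continue
        for a in actions:
            nxt = model(obs, a)  # predicted next observation; None means invalid action
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [a]))
    return None


# Hypothetical model of a settings app: (screen, action) -> next screen.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("home", "open_settings"): "settings",
    ("settings", "open_display"): "display",
    ("settings", "back"): "home",
    ("display", "toggle_dark_mode"): "display_dark",
}
ACTIONS = ["open_settings", "open_display", "back", "toggle_dark_mode"]


def model(obs: str, action: str) -> Optional[str]:
    return TRANSITIONS.get((obs, action))


print(plan_bfs("home", ACTIONS, model, lambda o: o == "display_dark"))
# ['open_settings', 'open_display', 'toggle_dark_mode']
```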

Proposed future architectural developments include capability-centric operating systems, agent-native skill discovery pipelines, and end-to-end RL with dense formal contracts. Together, these would enable a shift from monolithic applications to modular, AI-native platforms in which autonomous, compositional agent workflows dominate software consumption.