Open Voice Network (OVON)
- OVON is a standards-driven architecture defined by a universal JSON conversation envelope that enables interoperability among diverse AI agents.
- It features dual API layers: a Conversation Envelope API for managing session events and a Discovery API for capability manifest publication and retrieval.
- The framework supports multiparty communication with secure floor management, empowering coordinated interactions among chatbots, voicebots, videobots, and human endpoints.
The Open Voice Network (OVON), more formally known as the Open Voice Interoperability Initiative, is a standards-driven architecture and API framework for interoperability among heterogeneous conversational AI agents (chatbots, voicebots, videobots, and human endpoints) across modalities and platforms. OVON uses minimal, JSON-based, natural-language-centric APIs to provide technology-agnostic communication envelopes, service discovery, multiparty session management, and manifest-based capability publication, supporting both dyadic and multiparty agentic AI exchanges (Gosmar et al., 2024, Gosmar et al., 2024).
1. Core Architectural Principles
OVON is defined around the concept of a universal "conversation envelope," a structured JSON schema that encapsulates turn-taking, conversational events, and multimodal utterance exchange between agents regardless of their internal technology or modality. The protocol delineates two principal API layers:
- Conversation Envelope API: Governs basic session primitives—invitation (invite), message exchange (utterance/whisper), and termination (bye).
- Discovery and Manifest API: Enables agents to advertise, locate, and negotiate capabilities through standardized manifest publications and retrieval.
This design facilitates "loose coupling", allowing interaction among disparate systems ranging from LLM-driven agents (e.g., GPT-3.5, Llama2, Claude.ai) to rule-based and human participants, without requiring proprietary SDKs or deep integration (Gosmar et al., 2024).
2. Envelope Schema, Event Types, and State Machines
Every OVON message is encapsulated in a standardized JSON envelope, referencing a versioned schema URL (e.g., "schema": {"version": "0.9.2"}), and includes:
- A unique `conversation.id` (UUID).
- `sender.from` and destination `to` URIs (service endpoints).
- An array of event objects, each denoting an action such as `invite`, `utterance`, `whisper`, `bye`, `requestManifest`, `publishManifest`, `findAssistant`, `proposeAssistant`.
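The fields listed above can be assembled with a few lines of Python. This is a minimal sketch, not a reference implementation: the helper names `make_event` and `make_envelope` and the `example.com` URIs are hypothetical, and the nesting under a top-level `"ovon"` key follows the formal envelope layout given in the multiparty section of this article.

```python
import json
import uuid

SCHEMA_VERSION = "0.9.2"  # version string quoted in the text


def make_event(event_type, to=None, parameters=None):
    """Build one event object; `to` may be a single URI or a list of URIs."""
    event = {"eventType": event_type, "parameters": parameters or {}}
    if to is not None:
        event["to"] = to
    return event


def make_envelope(sender_uri, events, conversation_id=None):
    """Wrap events in the top-level "ovon" envelope (hypothetical helper)."""
    return {
        "ovon": {
            "schema": {"version": SCHEMA_VERSION},
            # A fresh UUID is generated when no conversation id is supplied.
            "conversation": {"id": conversation_id or str(uuid.uuid4())},
            "sender": {"from": sender_uri},
            "events": list(events),
        }
    }


envelope = make_envelope(
    "https://example.com/agents/frontdesk",  # hypothetical sender endpoint
    [make_event("invite", to="https://example.com/agents/library")],
)
print(json.dumps(envelope, indent=2))
```

Passing an existing `conversation_id` lets later turns reuse the identifier of the conversation they belong to.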
The state transition models are formalized as follows:
- Serving Agent:
- Demanding Agent:
- Discovery Machines:
- Capability:
- Assistant:
These state interfaces enable formal, machine-validated session orchestration and agent lifecycle transitions (Gosmar et al., 2024).
3. Universal Natural Language APIs and Multimodal Support
Agents expose a single HTTPS POST endpoint ("serviceEndpoint"), receiving conversation envelopes as JSON payloads. Utterance and whisper events are conveyed using a Dialog Event Object, currently standardized for plain text tokens but extensible to support audio and video modalities.
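As a rough illustration of the single-endpoint transport, the sketch below prepares an HTTPS POST carrying an envelope using only the Python standard library. The function name `build_post`, the `example.com` URIs, the conversation id, and the `application/json` content type are assumptions made for this example; the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_post(service_endpoint, envelope):
    """Prepare (but do not send) a POST request carrying a conversation envelope."""
    body = json.dumps(envelope).encode("utf-8")
    return urllib.request.Request(
        service_endpoint,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


envelope = {
    "ovon": {
        "schema": {"version": "0.9.2"},
        "conversation": {"id": "00000000-0000-4000-8000-000000000000"},  # placeholder id
        "sender": {"from": "https://example.com/agents/frontdesk"},
        "events": [
            {"to": "https://example.com/agents/library",
             "eventType": "invite", "parameters": {}}
        ],
    }
}

request = build_post("https://example.com/agents/library", envelope)
# Sending would be a single call such as:
#   urllib.request.urlopen(request, timeout=10)
```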
Key eventType objects include:
| EventType | Description | Parameters Example |
|---|---|---|
| invite | Start participation | {} |
| utterance | Spoken or textual contribution | { "dialogEvent": { ... } } |
| whisper | Private or contextual addendum | { "dialogEvent": { ... }, "context": "..." } |
| bye | Session termination | {} |
| requestManifest | Request for agent's manifest | {} |
| publishManifest | Publication of an agent's manifest | { "manifest": { ... } } |
| findAssistant | Directory search for an agent via description | { "query": "find library bot" } |
| proposeAssistant | Response listing candidate serviceEndpoints | { "candidates": [ { ... } ] } |
This schema design supports minimal client and agent implementation, requiring only JSON parsing and production per the published OVON schemas (Gosmar et al., 2024).
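To make the "JSON parsing and production" claim concrete, here is a minimal sketch of how a receiving agent might dispatch on `eventType`. The handler registry, the function names, and the placeholder manifest are hypothetical; skipping unknown event types is a choice made for this example, not something the source prescribes.

```python
# Placeholder manifest for the demo; real manifests carry the fields
# described in the discovery section.
MANIFEST = {"identification": {"conversationalName": "demo"}, "capabilities": []}


def on_request_manifest(sender, parameters):
    """Answer requestManifest with a publishManifest event addressed to the sender."""
    return {"to": sender, "eventType": "publishManifest",
            "parameters": {"manifest": MANIFEST}}


def on_bye(sender, parameters):
    """Session termination needs no reply in this sketch."""
    return None


HANDLERS = {"requestManifest": on_request_manifest, "bye": on_bye}


def handle_envelope(envelope, handlers=HANDLERS):
    """Call the matching handler for each event and collect the reply events."""
    ovon = envelope["ovon"]
    sender = ovon["sender"]["from"]
    replies = []
    for event in ovon.get("events", []):
        handler = handlers.get(event.get("eventType"))
        if handler is None:
            continue  # unknown event types are skipped in this sketch
        reply = handler(sender, event.get("parameters", {}))
        if reply is not None:
            replies.append(reply)
    return replies


incoming = {
    "ovon": {
        "schema": {"version": "0.9.2"},
        "conversation": {"id": "demo-conversation"},
        "sender": {"from": "https://example.com/agents/frontdesk"},
        "events": [
            {"eventType": "requestManifest", "parameters": {}},
            {"eventType": "bye", "parameters": {}},
        ],
    }
}
replies = handle_envelope(incoming)
```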
4. Agent Discovery, Manifests, and Directory Services
OVON embeds agent discovery mechanisms based on two orthogonal models:
- Direct Manifest Lookup: An agent issues `requestManifest` to a target endpoint, which replies with `publishManifest` containing structured identification, capabilities, supported modalities, languages, keywords, and content type.
- Directory/Search Model: Agents broadcast `findAssistant` events with a natural language capability description, receiving `proposeAssistant` responses enumerating suitable candidates.
A sample manifest follows:
```json
{
  "identification": {
    "serviceEndpoint": "...",
    "organization": "...",
    "conversationalName": "...",
    "serviceName": "...",
    "role": "...",
    "synopsis": "..."
  },
  "capabilities": [
    {
      "keywords": [...],
      "languages": [...],
      "descriptiveTexts": [...],
      "modalities": [...],
      "contentType": "..."
    }
  ]
}
```
The discovery protocol aligns with defined state-machine transitions for capability and assistant search, offering support for both direct and federated registries (Gosmar et al., 2024).
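The directory model can be illustrated with a toy matcher that turns a `findAssistant` query into a `proposeAssistant` reply. Everything about the matching is an assumption made for this sketch: the source does not say how directories rank agents, so plain keyword overlap stands in for it, and each candidate is reduced to a `serviceEndpoint` entry. The manifests and URIs are invented for the demo.

```python
def propose_assistants(query, manifests):
    """Rank manifests by keyword overlap with the query (illustrative only)."""
    words = set(query.lower().split())
    scored = []
    for manifest in manifests:
        keywords = {
            keyword.lower()
            for capability in manifest.get("capabilities", [])
            for keyword in capability.get("keywords", [])
        }
        score = len(words & keywords)
        if score:
            endpoint = manifest["identification"]["serviceEndpoint"]
            scored.append((score, endpoint))
    # Highest overlap first; the stable sort keeps input order among ties.
    scored.sort(key=lambda item: -item[0])
    return {
        "eventType": "proposeAssistant",
        "parameters": {
            "candidates": [{"serviceEndpoint": endpoint} for _, endpoint in scored]
        },
    }


# Hypothetical directory contents.
DIRECTORY = [
    {"identification": {"serviceEndpoint": "https://example.com/agents/library"},
     "capabilities": [{"keywords": ["library", "books"]}]},
    {"identification": {"serviceEndpoint": "https://example.com/agents/florist"},
     "capabilities": [{"keywords": ["flowers", "bouquet"]}]},
]

proposal = propose_assistants("find library bot", DIRECTORY)
```

A production directory would more plausibly match against `descriptiveTexts` with a language model or search index; the overlap score here only shows where such logic would plug in.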
5. Multiparty Extensions: Convener Agent, Floor Management, and Security
To support multiparty conversational contexts, OVON introduces the following mechanisms (Gosmar et al., 2024):
- Convener Agent: Serves as the arbiter, handling invitations, floor management (speaking rights via `Floor_request`, `Floor_grant`, `Floor_revoke`), and participant control (mute, uninvite).
- Floor-Shared Conversational Space ("Floor"): Acts as the logical blackboard for message publication and shared context, administered by a Floor Manager.
- Floor Manager: Enforces delivery, enacts floor policies from the Convener, and maintains shared or local conversational context.
- Multi-Conversant Broadcasting: The `to` field in event objects supports arrays of URIs, enabling directed or broadcast communication in a single event.
- Interruptions and Access Control: Mechanisms for interjections, enforced ejections (via `uninvite`), and private utterances (`"private": true` in utterance objects).
A formal specification is given as:
$\texttt{ \{ "ovon": \{ "schema":\{ "version": "0.9.2" \}, "conversation":\{ "id":\langle\mathrm{UUID}\rangle\}, "sender":\{ "from":\langle\mathrm{URI}\rangle\}, "events": [\langle\mathrm{EventObject}\rangle] \} \} }$
With EventObject instances:
$\texttt{ \{ "to": \langle\mathrm{URI~or~[URI]}\rangle, "eventType": \langle\mathrm{String}\rangle, "parameters": \langle\mathrm{Object}\rangle \} }$
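A structural check against the two templates above might look like the following sketch. It is not the published JSON Schema and does not replace validation against it; the function name `check_envelope` and the decision to treat `to` as optional are assumptions of this example.

```python
def check_envelope(envelope):
    """Return a list of structural problems; an empty list means the shape matches."""
    ovon = envelope.get("ovon") if isinstance(envelope, dict) else None
    if not isinstance(ovon, dict):
        return ['missing top-level "ovon" object']
    problems = []
    for section, field in (("schema", "version"),
                           ("conversation", "id"),
                           ("sender", "from")):
        value = ovon.get(section)
        if not isinstance(value, dict) or field not in value:
            problems.append(f"missing {section}.{field}")
    events = ovon.get("events")
    if not isinstance(events, list):
        problems.append("events must be an array")
        events = []
    for index, event in enumerate(events):
        if not isinstance(event, dict) or not isinstance(event.get("eventType"), str):
            problems.append(f"events[{index}] needs a string eventType")
            continue
        # "to" may be one URI or an array of URIs (treated as optional here).
        to = event.get("to", [])
        targets = [to] if isinstance(to, str) else to
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            problems.append(f"events[{index}].to must be a URI or an array of URIs")
    return problems


good = {"ovon": {"schema": {"version": "0.9.2"},
                 "conversation": {"id": "demo"},
                 "sender": {"from": "https://example.com/agents/a"},
                 "events": [{"to": ["https://example.com/agents/b",
                                    "https://example.com/agents/c"],
                             "eventType": "utterance", "parameters": {}}]}}
bad = {"ovon": {"schema": {},
                "conversation": {"id": "demo"},
                "sender": {"from": "https://example.com/agents/a"},
                "events": [{"to": 5, "eventType": "bye"}]}}
```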
Security enhancements include exclusive Convener rights for invite/uninvite events, scoped delivery of private utterances, and controlled context whispers (Gosmar et al., 2024).
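The floor-management roles can be sketched as a small arbiter class. The event names `Floor_request`, `Floor_grant`, and `Floor_revoke` come from the text; the class layout, the one-speaker-at-a-time rule, and the first-come-first-served queue are assumptions chosen to keep the example short, since the source does not fix a floor policy.

```python
from collections import deque


class Convener:
    """Toy floor arbiter: one speaker at a time, FIFO queue for waiting requests."""

    def __init__(self):
        self.participants = set()
        self.holder = None
        self.waiting = deque()

    def invite(self, uri):
        """Only the Convener adds participants."""
        self.participants.add(uri)

    def uninvite(self, uri):
        """Eject a participant and release the floor if they held it."""
        self.participants.discard(uri)
        if uri in self.waiting:
            self.waiting.remove(uri)
        if self.holder == uri:
            return self.revoke()
        return []

    def request_floor(self, uri):
        """Handle a Floor_request; returns the floor events to emit."""
        if uri not in self.participants:
            return []  # requests from non-participants are ignored
        if self.holder is None:
            self.holder = uri
            return [{"to": uri, "eventType": "Floor_grant", "parameters": {}}]
        if uri != self.holder and uri not in self.waiting:
            self.waiting.append(uri)
        return []

    def revoke(self):
        """Revoke the floor from the holder and grant it to the next in line."""
        events = []
        if self.holder is not None:
            events.append({"to": self.holder, "eventType": "Floor_revoke",
                           "parameters": {}})
            self.holder = None
        if self.waiting:
            self.holder = self.waiting.popleft()
            events.append({"to": self.holder, "eventType": "Floor_grant",
                           "parameters": {}})
        return events
```

In a fuller design the returned events would be wrapped in envelopes and delivered by the Floor Manager, which is left out here.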
6. Scalability, Extensibility, and Open Implementation
OVON is horizontally scalable: independent agent endpoints may be hosted and replicated behind load balancers. Multi-agent communication requires only adherence to the envelope schema; agents may join or leave conversations dynamically, and context management can be centralized or decentralized depending on system design.
Extensibility is provided by JSON Schema evolution, permitting new event types and richer media payloads. All specifications and schemas are maintained as open repositories, including reference implementations and sandboxes (browser and Python/LLM frameworks):
- Specification schemas: https://github.com/open-voice-interoperability/docs/tree/main/specifications
- Sandbox implementations: https://github.com/open-voice-interoperability/open-voice-sandbox
- Forthcoming security and ethics work: https://openvoicenetwork.org/trustmark-initiative
The architecture supports manifest-driven agent discovery, multimodal extension, and end-to-end reproducibility via public resources (Gosmar et al., 2024, Gosmar et al., 2024).
7. Canonical Use Cases and Significance
OVON is illustrated by compositional use cases:
- Smart Errands: A human assistant orchestrates specialized bots (florist, store, restaurant, post-office), chaining invites and utterances—each agent processes requests and responds with natural language (Gosmar et al., 2024).
- Smart Library: A frontend bot locates and invokes library agents via manifest search and selection, enabling relay of complex informational queries and replies.
These scenarios demonstrate OVON’s agnosticism to agent modality and internal architecture, promoting a heterogeneous landscape of agentic interactions.
The introduction of multiparty coordination primitives (Convener, Floor, participant lifecycle) addresses requirements for secure, policy-driven, and scalable collaborative conversation sessions, crucial for cross-organization, multi-LLM, or mixed human-AI interoperability (Gosmar et al., 2024).