
Open Voice Network (OVON)

Updated 15 January 2026
  • OVON is a standards-driven architecture defined by a universal JSON conversation envelope that enables interoperability among diverse AI agents.
  • It features dual API layers: a Conversation Envelope API for managing session events and a Discovery API for capability manifest publication and retrieval.
  • The framework supports multiparty communication with secure floor management, empowering coordinated interactions among chatbots, voicebots, videobots, and human endpoints.

The Open Voice Network (OVON), more formally known as the Open Voice Interoperability Initiative, is a standards-driven architecture and API framework for achieving seamless interoperability among heterogeneous conversational AI agents—including chatbots, voicebots, videobots, and human endpoints—across modalities and platforms. OVON leverages minimal, JSON-based, natural language-centric APIs to provide technology-agnostic communication envelopes, service discovery, multiparty session management, and manifest-based capability publication, supporting both dyadic and multiparty agentic AI exchanges (Gosmar et al., 2024, Gosmar et al., 2024).

1. Core Architectural Principles

OVON is defined around the concept of a universal "conversation envelope," a structured JSON schema that encapsulates turn-taking, conversational events, and multimodal utterance exchange between agents regardless of their internal technology or modality. The protocol delineates two principal API layers:

  • Conversation Envelope API: Governs basic session primitives—invitation (invite), message exchange (utterance/whisper), and termination (bye).
  • Discovery and Manifest API: Enables agents to advertise, locate, and negotiate capabilities through standardized manifest publications and retrieval.

This design facilitates "loose coupling", allowing interaction among disparate systems ranging from LLM-driven agents (e.g., GPT-3.5, Llama2, Claude.ai) to rule-based and human participants, without requiring proprietary SDKs or deep integration (Gosmar et al., 2024).

2. Envelope Schema, Event Types, and State Machines

Every OVON message is encapsulated in a standardized JSON envelope, referencing a versioned schema (e.g., "schema": {"version": "0.9.2"}), and includes:

  • A unique conversation.id (UUID).
  • A sender.from URI and per-event destination (to) URIs identifying service endpoints.
  • An array of event objects, each denoting an action such as invite, utterance, whisper, bye, requestManifest, publishManifest, findAssistant, proposeAssistant.
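The envelope structure above can be sketched as a small builder function. This is an illustrative reconstruction from the fields listed here, not the normative schema; the URIs and dialog-event contents are placeholders.

```python
import json
import uuid

def make_envelope(sender_uri, to_uri, event_type, parameters):
    """Build a minimal OVON-style conversation envelope.

    Field names mirror the structure described above; the concrete
    values (URIs, text) are illustrative placeholders.
    """
    return {
        "ovon": {
            "schema": {"version": "0.9.2"},
            "conversation": {"id": str(uuid.uuid4())},
            "sender": {"from": sender_uri},
            "events": [
                {"to": to_uri, "eventType": event_type, "parameters": parameters}
            ],
        }
    }

envelope = make_envelope(
    "https://assistant.example/ovon",
    "https://library-bot.example/ovon",
    "utterance",
    {"dialogEvent": {"text": "Hello"}},  # simplified Dialog Event placeholder
)
print(json.dumps(envelope, indent=2))
```

Because the envelope is plain JSON, any agent that can parse and emit this structure can participate, regardless of its internal technology.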

The state transition models are formalized as follows:

  • Serving Agent: IDLE → READY → SEARCHING → (SENDING RESPONSE | TERMINATING) → (READY | IDLE)
  • Demanding Agent: IDLE → READY → CONSUMING RESPONSE → (READY | IDLE)
  • Discovery Machines:
    • Capability: CAPABILITY_SEARCH → WAITING_FOR_MANIFEST → READY
    • Assistant: ASSISTANT_SEARCH → WAITING_FOR_ASSISTANT_LIST → (READY | ASSISTANT_SEARCH)

These state interfaces enable formal, machine-validated session orchestration and agent lifecycle transitions (Gosmar et al., 2024).
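One way such machine-validated transitions could be encoded is as an explicit transition table. The table below transcribes the serving-agent lifecycle listed above; the encoding itself is an illustrative assumption, not part of the specification.

```python
# Legal transitions for the serving-agent lifecycle described above.
# State names mirror the document; the dict encoding is an assumption.
SERVING_AGENT = {
    "IDLE": {"READY"},
    "READY": {"SEARCHING"},
    "SEARCHING": {"SENDING_RESPONSE", "TERMINATING"},
    "SENDING_RESPONSE": {"READY", "IDLE"},
    "TERMINATING": {"READY", "IDLE"},
}

def step(machine, state, nxt):
    """Return the next state, rejecting transitions the machine forbids."""
    if nxt not in machine.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {nxt}")
    return nxt

# Walk one full serve cycle.
state = "IDLE"
for nxt in ("READY", "SEARCHING", "SENDING_RESPONSE", "READY"):
    state = step(SERVING_AGENT, state, nxt)
print(state)  # READY
```

Rejecting out-of-order events (e.g., an utterance arriving before an invite) is exactly the kind of session-orchestration check such a table enables.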

3. Universal Natural Language APIs and Multimodal Support

Agents expose a single HTTPS POST endpoint ("serviceEndpoint"), receiving conversation envelopes as JSON payloads. Utterance and whisper events are conveyed using a Dialog Event Object, currently standardized for plain text tokens but extensible to support audio and video modalities.

Key eventType objects include:

| EventType | Description | Parameters Example |
|---|---|---|
| invite | Start participation | {} |
| utterance | Spoken or textual contribution | { "dialogEvent": { ... } } |
| whisper | Private or contextual addendum | { "dialogEvent": { ... }, "context": "..." } |
| bye | Session termination | {} |
| requestManifest | Request for an agent's manifest | {} |
| publishManifest | Publication of an agent's manifest | { "manifest": { ... } } |
| findAssistant | Directory search for an agent via description | { "query": "find library bot" } |
| proposeAssistant | Response listing candidate serviceEndpoints | { "candidates": [ { ... } ] } |

This schema design supports minimal client and agent implementation, requiring only JSON parsing and production per the published OVON schemas (Gosmar et al., 2024).
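A minimal agent, then, is little more than a dispatcher over incoming event types. The sketch below assumes a pure handler function that a real deployment would expose behind a single HTTPS POST serviceEndpoint; the dispatch rules and reply texts are illustrative, not normative.

```python
def handle_envelope(payload, my_uri="https://agent.example/ovon"):
    """Dispatch incoming OVON events and build a reply envelope.

    A real agent would expose this behind one HTTPS POST endpoint;
    the reply wording and branch logic here are assumptions.
    """
    reply_events = []
    for event in payload["ovon"]["events"]:
        etype = event["eventType"]
        if etype == "invite":
            reply_events.append({
                "eventType": "utterance",
                "parameters": {"dialogEvent": {"text": "Hello, how can I help?"}},
            })
        elif etype == "utterance":
            text = event["parameters"]["dialogEvent"].get("text", "")
            reply_events.append({
                "eventType": "utterance",
                "parameters": {"dialogEvent": {"text": f"You said: {text}"}},
            })
        elif etype == "bye":
            reply_events.append({"eventType": "bye", "parameters": {}})
    return {
        "ovon": {
            "schema": {"version": "0.9.2"},
            # Echo the conversation id so the caller can correlate turns.
            "conversation": payload["ovon"]["conversation"],
            "sender": {"from": my_uri},
            "events": reply_events,
        }
    }
```

Note the handler needs no SDK at all: JSON in, JSON out, which is the "minimal client and agent implementation" property claimed above.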

4. Agent Discovery, Manifests, and Directory Services

OVON embeds agent discovery mechanisms based on two orthogonal models:

  • Direct Manifest Lookup: An agent issues requestManifest to a target endpoint, which replies with publishManifest containing structured identification, capabilities, supported modalities, languages, keywords, and content type.
  • Directory/Search Model: Agents broadcast findAssistant events with a natural language capability description, receiving proposeAssistant responses enumerating suitable candidates.

A sample manifest follows:

{
  "identification": {
    "serviceEndpoint": "...",
    "organization": "...",
    "conversationalName": "...",
    "serviceName": "...",
    "role": "...",
    "synopsis": "..."
  },
  "capabilities": [
    {
      "keywords": [...],
      "languages": [...],
      "descriptiveTexts": [...],
      "modalities": [...],
      "contentType": "..."
    }
  ]
}

The discovery protocol aligns with defined state-machine transitions for capability and assistant search, offering support for both direct and federated registries (Gosmar et al., 2024).
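To make the directory/search model concrete, a registry could match a findAssistant query against the keywords in its stored manifests and answer with a proposeAssistant event. The keyword-overlap scoring below is a deliberately naive assumption; a real directory might use semantic matching over the descriptiveTexts.

```python
def find_assistants(query, manifests):
    """Rank stored manifests by keyword overlap with a natural-language
    query and return a proposeAssistant-style event.

    The scoring heuristic is illustrative, not part of the spec.
    """
    words = set(query.lower().split())
    candidates = []
    for manifest in manifests:
        score = 0
        for cap in manifest.get("capabilities", []):
            score += len(words & {k.lower() for k in cap.get("keywords", [])})
        if score:
            candidates.append((score, manifest["identification"]["serviceEndpoint"]))
    candidates.sort(reverse=True)  # best match first
    return {
        "eventType": "proposeAssistant",
        "parameters": {"candidates": [endpoint for _, endpoint in candidates]},
    }
```

A demanding agent would then issue an invite to the top candidate endpoint, completing the ASSISTANT_SEARCH → WAITING_FOR_ASSISTANT_LIST → READY cycle.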

5. Multiparty Extensions: Convener Agent, Floor Management, and Security

To support multiparty conversational contexts, OVON introduces the following mechanisms (Gosmar et al., 2024):

  • Convener Agent: Serves as the arbiter, handling invitations, floor management (speaking rights via Floor_request, Floor_grant, Floor_revoke), and participant control (mute, uninvite).
  • Floor-Shared Conversational Space ("Floor"): Acts as the logical blackboard for message publication and shared context, administered by a Floor Manager.
  • Floor Manager: Enforces delivery, enacts floor policies from the Convener, and maintains shared or local conversational context.
  • Multi-Conversant Broadcasting: The to field in event objects supports arrays of URIs, enabling directed or broadcast communication in a single event.
  • Interruptions and Access Control: Mechanisms for interjections, enforced ejections (via uninvite), and private utterances ("private": true in utterance objects).

A formal specification is given as:

{ "ovon": { "schema": { "version": "0.9.2" }, "conversation": { "id": ⟨UUID⟩ }, "sender": { "from": ⟨URI⟩ }, "events": [ ⟨EventObject⟩ ] } }

With EventObject instances:

{ "to": ⟨URI or [URI]⟩, "eventType": ⟨String⟩, "parameters": ⟨Object⟩ }

Security enhancements include exclusive Convener rights for invite/uninvite events, scoped delivery of private utterances, and controlled context whispers (Gosmar et al., 2024).
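The Convener's single-speaker floor policy can be sketched as a small state holder. The class shape, method names, and the choice to deny (rather than queue) a request while the floor is held are all assumptions layered on the Floor_request / Floor_grant / Floor_revoke events described above.

```python
class Convener:
    """Toy floor manager for the multiparty policy described above.

    One participant holds the floor at a time; requests while the
    floor is held are denied (a real convener might queue them).
    """

    def __init__(self):
        self.participants = set()
        self.holder = None  # URI of the current floor holder, if any

    def invite(self, uri):
        self.participants.add(uri)

    def uninvite(self, uri):
        # Ejecting a participant also releases the floor if they held it.
        self.participants.discard(uri)
        if self.holder == uri:
            self.holder = None

    def floor_request(self, uri):
        """Grant the floor if the requester is invited and the floor is free."""
        if uri in self.participants and self.holder is None:
            self.holder = uri
            return "Floor_grant"
        return None  # denied; real policies may queue or negotiate instead

    def floor_revoke(self):
        self.holder = None
```

Because only the Convener mutates this state, the exclusive invite/uninvite rights noted below fall out naturally: no participant can grant itself the floor or eject another agent.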

6. Scalability, Extensibility, and Open Implementation

OVON is horizontally scalable: independent agent endpoints may be hosted and replicated behind load balancers. Multi-agent communication requires only adherence to the envelope schema; agents may join or leave conversations dynamically, and context management can be centralized or decentralized depending on system design.

Extensibility is provided by JSON Schema evolution, permitting new event types and richer media payloads. All specifications and schemas are maintained in open repositories, including reference implementations and sandboxes (browser-based and Python/LLM frameworks).

The architecture supports manifest-driven agent discovery, multimodal extension, and end-to-end reproducibility via public resources (Gosmar et al., 2024, Gosmar et al., 2024).

7. Canonical Use Cases and Significance

OVON is illustrated by compositional use cases:

  • Smart Errands: A human assistant orchestrates specialized bots (florist, store, restaurant, post-office), chaining invites and utterances—each agent processes requests and responds with natural language (Gosmar et al., 2024).
  • Smart Library: A frontend bot locates and invokes library agents via manifest search and selection, enabling relay of complex informational queries and replies.

These scenarios demonstrate OVON’s agnosticism to agent modality and internal architecture, promoting a heterogeneous landscape of agentic interactions.

The introduction of multiparty coordination primitives (Convener, Floor, participant lifecycle) addresses requirements for secure, policy-driven, and scalable collaborative conversation sessions, crucial for cross-organization, multi-LLM, or mixed human-AI interoperability (Gosmar et al., 2024).
