
Among-Device AI: Distributed Edge Intelligence

Updated 20 March 2026
  • Among-Device AI is a distributed, collaborative intelligence paradigm that interconnects heterogeneous devices for joint data processing and inference.
  • It leverages unified protocols and NNStreamer pipelines to dynamically share neural computations and balance workloads across varied endpoints.
  • Experimental validations on platforms like Raspberry Pi demonstrate efficient resource utilization, low latency, and robust failover in real-time applications.

Among-Device AI is a distributed, collaborative intelligence paradigm in which heterogeneous consumer and IoT devices dynamically share data streams, neural network computations, and service endpoints, jointly delivering AI-powered functionality without sending raw data to centralized cloud infrastructure. It goes beyond traditional cloud-based or isolated on-device AI by enabling unified, flexible, and privacy-preserving AI experiences at the network edge, drawing on device diversity and local resources through integrated protocols and stream-pipeline frameworks (Ham et al., 2022).

1. Conceptual Positioning and Motivations

Among-Device AI (sometimes termed "edge-to-edge" or "edge mesh" AI; these labels are the editor's, not the source paper's) is positioned as the successor to both cloud-based and on-device AI:

  • Cloud-based AI performs all model inference and training remotely, incurring privacy exposures, network latency, and continuous service costs, though benefiting from vast compute and streamlined management.
  • On-device AI localizes inference and data, preserving privacy, lowering latency, and reducing ongoing costs. However, it faces persistent challenges of restricted compute, hardware fragmentation, and redundant engineering across the device fleet.
  • Among-Device AI interlinks devices into a peer-to-peer or hub-and-spoke AI network, such that inputs, model execution, and outputs may each reside on disparate nodes but present a coherent service. This topology exploits local sensors, accelerators, and cross-device data without raw data egress, achieving both high quality and low latency across heterogeneous device types (TVs, phones, appliances, sensors, microcontrollers) (Ham et al., 2022).

This model is motivated by the proliferation of resource-diverse endpoints, the necessity to avoid cloud dependence due to regulatory/privacy or cost constraints, and the technical opportunity for topology-aware, service-composable AI pipelines.

2. Functional Requirements and System Design Challenges

Research by the NNStreamer group has distilled seven foundational requirements for among-device AI (Ham et al., 2022):

  1. Atomic, independently deployable AI services (R1): Inputs, inferences, and outputs must be isolatable, re-deployable, and reusable across different pipelines.
  2. Dynamic data stream schemas (R2): Support for synchronizing, compressing, and optionally sparsifying tensor streams with flexible or schema-less encodings.
  3. Capability-based discovery and connection (R3): Devices must auto-discover services, abstracting away fixed IPs and leveraging selection among multiple providers.
  4. Run-time robustness (R4): Automatic failover/rebinding when services become unavailable.
  5. Open-source licensing (R5): LGPL-2.1/Apache-2.0 licensing to prevent vendor lock-in.
  6. Cross-platform extensibility (R6): Compatibility extending down to microcontroller/RTOS endpoints.
  7. Full backward compatibility (R7): Seamless interoperation with existing on-device AI pipelines and their requirements.

These map directly to core challenges in heterogeneous hardware (CPUs, GPUs, NPUs, DSPs, microcontrollers), variable network conditions (bandwidth, jitter), privacy preservation, and dynamic resource scheduling. Among-device AI must arbitrate who executes inference, when to transfer streams, and how to load-balance in the face of arbitrary hardware, protocol, and service topology diversity (Ham et al., 2022).
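
One way to picture requirements R3 and R4 together is a small registry that picks a provider by capability and rebinds when that provider fails. The following is a minimal sketch under stated assumptions, not NNStreamer code: `Provider`, `Registry`, `infer_with_failover`, the `load` metric, and the device names are all hypothetical, and selection by lowest load is just one possible policy.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """A device advertising one capability (hypothetical record, not an NNStreamer type)."""
    device: str
    capability: str
    load: float          # assumed metric: fraction of busy compute, 0.0 to 1.0
    alive: bool = True


@dataclass
class Registry:
    providers: list = field(default_factory=list)

    def advertise(self, provider: Provider) -> None:
        self.providers.append(provider)

    def select(self, capability: str) -> Provider:
        """Return the least-loaded live provider of a capability (R3)."""
        candidates = [p for p in self.providers
                      if p.capability == capability and p.alive]
        if not candidates:
            raise LookupError(f"no live provider for {capability!r}")
        return min(candidates, key=lambda p: p.load)


def infer_with_failover(registry, capability, run, max_attempts=3):
    """Call run(provider); on ConnectionError mark it dead and rebind (R4)."""
    for _ in range(max_attempts):
        provider = registry.select(capability)
        try:
            return run(provider)
        except ConnectionError:
            provider.alive = False   # drop the failed node, then try the next one
    raise RuntimeError(f"{capability!r} failed on {max_attempts} providers")


# Usage: the preferred provider fails, so the call is rebound to the other one.
registry = Registry()
registry.advertise(Provider("tv", "objdetect", load=0.2))
registry.advertise(Provider("phone", "objdetect", load=0.6))


def run(provider):
    if provider.device == "tv":
        raise ConnectionError("tv is unreachable")
    return f"result from {provider.device}"


result = infer_with_failover(registry, "objdetect", run)
print(result)  # result from phone
```

The client names only a capability, never an address, which is the point of R3; the failure handling stays inside the helper, so the calling code does not change when a provider disappears.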

3. Protocol, Pipeline, and API Innovations

The extensible NNStreamer framework exemplifies the realization of among-device AI through the following core modifications:

  • Distributed GStreamer Pipelines: Preserving the pipe-and-filter logic, but extending it with cross-device plugins for messaging and inference-offload, including mqttsink/mqttsrc for publish/subscribe, and tensor_query_client/tensor_query_serversrc/tensor_query_serversink for offload queries.
  • Protocol Layering: MQTT is the chosen core pub/sub protocol, offering topic wildcards, retained messages, and broker-mediated service discovery and failover capabilities. An "MQTT-Hybrid mode" separates control-plane signaling (via MQTT) from high-bandwidth tensor payloads (via direct TCP), mitigating broker bottlenecks in inference-heavy workloads. Serialization supports both static and flexible schemas, with optional conversion among Protocol Buffers, FlatBuffers, and FlexBuffers. Synchronization across device clocks is maintained by NTP-derived timestamp propagation (Ham et al., 2022).
  • Atomic Inference APIs: Pipelines can be composed using GStreamer CLI, native NNStreamer APIs (C, Python, Java), or via the lightweight NNStreamer-Edge C library for non-GStreamer-capable endpoints.
  • Dynamic Service Discovery, Load Balancing, and Failover: Clients can subscribe to wildcard topics (e.g., /objdetect/#) and dynamically attach/detach to the best available server, with failover on stream interruption managed transparently at the protocol layer (Ham et al., 2022).
  • Performance Modeling: The system's end-to-end latency and network costs are modeled as:

$$L = \sum_{i=1}^{n} \left(t_{proc,i} + t_{comm,i}\right)$$

$$t_{comm} = S/B + \tau$$

where $t_{proc,i}$ is the local processing time (e.g., model execution), $t_{comm,i}$ is the inter-device transfer time, $S$ is the serialized frame size, $B$ is the available bandwidth, and $\tau$ is a protocol overhead such as broker handoff (Ham et al., 2022).
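
The latency model can be evaluated directly. The sketch below is a worked example only: the function names are hypothetical, and every number (a 224×224×3 input frame, a 100 Mbit/s link, 2 ms of protocol overhead, the processing times, the result size) is an assumed illustration rather than a measurement from the source.

```python
def comm_time(size_bytes, bandwidth_bytes_per_s, overhead_s):
    """t_comm = S / B + tau."""
    return size_bytes / bandwidth_bytes_per_s + overhead_s


def pipeline_latency(stages):
    """L = sum over stages of (t_proc + t_comm); stages is a list of (t_proc, t_comm)."""
    return sum(t_proc + t_comm for t_proc, t_comm in stages)


# Assumed, illustrative values (not taken from the source).
BANDWIDTH = 12.5e6            # bytes per second, i.e. 100 Mbit/s
TAU = 0.002                   # seconds of protocol overhead per transfer
frame_bytes = 224 * 224 * 3   # one uint8 RGB input frame, 150528 bytes
result_bytes = 4000           # serialized inference result

stages = [
    (0.005, comm_time(frame_bytes, BANDWIDTH, TAU)),   # device A: preprocess, send frame
    (0.030, comm_time(result_bytes, BANDWIDTH, TAU)),  # device B: inference, return result
]

total = pipeline_latency(stages)
print(f"end-to-end latency: {total * 1000:.2f} ms")  # about 51.36 ms with these numbers
```

With these assumed values the frame transfer costs roughly 14 ms against 30 ms of inference, which shows why the serialized size $S$ and the overhead $\tau$ matter as much as raw compute when deciding whether to offload.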

4. Experimental Validation and Representative Topologies

On a cluster of heterogeneous Raspberry Pi 4 devices, NNStreamer 2.1.0's among-device features were validated in both publish/subscribe and client-query (offload) topologies:

| Scenario | Protocol | Throughput (fps) | CPU Utilization | Memory |
|---|---|---|---|---|
| Pub/Sub (Full-HD) | ZeroMQ | Up to 2× MQTT | 40% | <50 MB |
| Query (MobileNet V2) | MQTT-Hybrid | Matches raw TCP | <40% | <50 MB |

MQTT-Hybrid achieves near parity with raw TCP throughput, while providing automatic discovery, failover, and efficient brokered service selection. At mid/low bandwidths, pipelines sustain 60 fps for varying stream resolutions at sub-40% CPU usage and modest RAM (Ham et al., 2022).
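
Brokered service selection rests on MQTT topic filters, where `+` matches exactly one topic level and `#` matches all remaining levels, including the parent level itself. The following is a minimal sketch of that matching rule in plain Python, not the broker's or NNStreamer's implementation: the function name and the advertised topics are hypothetical, and the special handling of topics beginning with `$` is omitted.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Match an MQTT topic against a filter using '+' and '#' wildcards."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: valid only as the last level; it matches
            # the parent topic and everything below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # No '#' seen: the topic must have exactly as many levels as the filter.
    return len(filter_levels) == len(topic_levels)


# Hypothetical topics advertised by servers on the local network.
advertised = [
    "/objdetect/mobilenetv2",
    "/objdetect/ssd/edgetpu",
    "/pose/posenet",
]

candidates = [t for t in advertised if topic_matches("/objdetect/#", t)]
print(candidates)  # ['/objdetect/mobilenetv2', '/objdetect/ssd/edgetpu']
```

A client that subscribes with a filter such as `/objdetect/#` therefore sees every matching server without knowing any address in advance, and can pick another candidate from the same list if its current server drops out.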

Representative topologies, described textually, include:

  • Offload: Input split on Device A, partial data sent via MQTT-Hybrid to Device B for Edge TPU inference.
  • Pub/Sub Fusion: Two camera devices publish frames with synchronization; another device merges and runs object detection.
  • Augmented Worker: Wearable streams sensor data to phones for data fusion and error inference, fully peer-to-peer.
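
For the Pub/Sub Fusion topology, the merging device has to decide which frames from the two cameras belong together. The sketch below shows one simple policy, a greedy two-pointer pairing of frames whose timestamps fall within a tolerance. It is an assumption for illustration and not NNStreamer's synchronization mechanism; the function name, the 10 ms tolerance, and the sample timestamps are hypothetical, and the timestamps are assumed to come from clocks that are already aligned.

```python
def pair_by_timestamp(stream_a, stream_b, tolerance_s):
    """Pair (timestamp, frame) items from two time-sorted streams whose
    timestamps differ by at most tolerance_s; unmatched frames are dropped."""
    pairs = []
    i = j = 0
    while i < len(stream_a) and j < len(stream_b):
        ts_a, frame_a = stream_a[i]
        ts_b, frame_b = stream_b[j]
        if abs(ts_a - ts_b) <= tolerance_s:
            pairs.append((frame_a, frame_b))
            i += 1
            j += 1
        elif ts_a < ts_b:
            i += 1   # frame_a is too old to match this or any later frame in stream_b
        else:
            j += 1   # frame_b is too old to match this or any later frame in stream_a
    return pairs


# Two cameras publishing at slightly different times (seconds).
cam_a = [(0.000, "a0"), (0.033, "a1"), (0.066, "a2")]
cam_b = [(0.005, "b0"), (0.050, "b1"), (0.070, "b2")]

pairs = pair_by_timestamp(cam_a, cam_b, tolerance_s=0.010)
print(pairs)  # [('a0', 'b0'), ('a2', 'b2')]
```

Frames `a1` and `b1` are 17 ms apart and are dropped rather than fused, which trades throughput for temporal consistency; a looser tolerance would keep more pairs at the cost of merging frames that show different moments.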

5. Representative Applications

Among-device AI enables new classes of distributed applications not possible with traditional on-device or cloud models, including:

  • Real-time cross-device sensor fusion (smart home, surveillance)
  • Low-latency pose estimation by offloading to local accelerators
  • Multi-modal, multi-user interactive systems (e.g., augmented/smart worker assistance)
  • Privacy-preserving data fusion (healthcare, industrial monitoring)

These applications leverage the ability to split data flows, inference endpoints, and actuation across arbitrary device boundaries while preserving responsive and private operation (Ham et al., 2022).

6. Limitations, Open Problems, and Future Prospects

Despite its advantages, current among-device AI implementations encounter several unresolved challenges:

  • Developer Onboarding: The pipe-and-filter abstraction is unfamiliar to many; improved tooling (WYSIWYG editors, MediaPipe graph converters) is required.
  • Distributed Profiling: System-wide latency and memory profiling for multi-device pipeline execution remains under development.
  • DevOps and Composability: Greater documentation, best-practice libraries, and declarative sub-pipeline repositories are needed for rapid prototyping.
  • Ecosystem Integration: Ongoing work seeks to standardize inter-device AI protocols across Matter, SmartThings, and microcontroller/cloud AI platforms.
  • Pipeline Orchestration and Security: Streamlined distributed control loops, robust encryption, and universal clock synchronization across devices remain open engineering areas.
  • Scalability: Efficient protocols for large-scale networks, especially in the context of dynamic device discovery, resource arbitration, and heterogeneous endpoint support, are under active investigation.

Scaling among-device AI offers the potential for mesh architectures supporting multi-hop or broadcast AI pipelines, inter-device federated training, and integrated privacy controls, bridging the gap between isolated on-device capabilities and highly centralized cloud AI (Ham et al., 2022).

7. Summary

Among-Device AI transforms the landscape of distributed intelligence by empowering collections of diverse endpoints to autonomously discover, connect, and compose atomic inference services into robust, privacy-aware, and highly adaptive distributed AI pipelines. The extended NNStreamer stream pipeline framework, through protocol innovation, API extensibility, and coordinated resource-sharing logic, provides a concrete, open, and high-performance realization of this vision, facilitating emergent applications across smart environments, IoT, and consumer electronics (Ham et al., 2022).

References (1)
