Communication-efficient distributed inference without edge offloading

Determine how to design a communication-efficient distributed inference protocol for partially executed artificial intelligence models on end devices that attains high decision accuracy without requiring transmission to edge servers for final decision-making, under the latency, synchronization, and reliability constraints of 6G wireless networks.

Background

The paper highlights that distributed inference must meet stringent latency, synchronization, and reliability requirements, and suggests splitting model execution between end devices and edge servers to reduce communication overhead.

However, when features are computed locally, it remains unclear how to coordinate device-to-device communications to preserve accuracy while avoiding uplink transmission to edge servers, given non-homogeneous device capabilities and dynamic wireless conditions.
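To make the coordination question concrete, the following is a minimal sketch of one possible baseline, not a method proposed in the paper: each device keeps a small vector of class scores from its locally executed part of the model, and neighbouring devices repeatedly average those vectors pairwise (randomized gossip) until every device can take the same decision locally, with no uplink to an edge server. The function name `gossip_decide`, the score vectors, and the link map are all hypothetical. The sketch assumes a connected, loss-free device-to-device topology and ignores quantization, packet loss, and timing, which are exactly the constraints the open problem asks about.

```python
import random


def gossip_decide(local_logits, neighbors, rounds=50, seed=0):
    """Pairwise gossip averaging of per-device class-score vectors.

    local_logits: dict device_id -> list of class scores
                  (hypothetical output of the on-device model part)
    neighbors:    dict device_id -> list of reachable device ids
                  (assumed symmetric, loss-free D2D links)
    Returns a dict device_id -> predicted class index.
    """
    rng = random.Random(seed)
    # Each device starts from its own local scores.
    state = {d: list(v) for d, v in local_logits.items()}
    devices = sorted(state)

    for _ in range(rounds):
        a = rng.choice(devices)
        if not neighbors[a]:
            continue  # isolated device: nothing to exchange this round
        b = rng.choice(neighbors[a])
        # One D2D exchange: both devices replace their scores by the average.
        avg = [(x + y) / 2 for x, y in zip(state[a], state[b])]
        state[a] = avg
        state[b] = list(avg)

    # Every device decides locally from its current (averaged) scores.
    return {d: max(range(len(v)), key=v.__getitem__) for d, v in state.items()}


# Three devices on a line topology 0 - 1 - 2 with two classes.
logits = {0: [2.0, 1.0], 1: [0.0, 1.5], 2: [0.5, 2.5]}
links = {0: [1], 1: [0, 2], 2: [1]}
decisions = gossip_decide(logits, links, rounds=200)
```

In this toy setting, device 0 on its own would pick class 0, while the network-wide average of the scores favours class 1. Each exchange carries only a class-score vector rather than raw features, which is where the communication saving would come from. How many rounds are affordable under a latency budget, and how accuracy degrades with unreliable links, is left open here.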

References

For example, AI models can be partially executed on the devices to extract features, while it is not clear how to design an efficient communications protocol with optimal accuracy, without the need to transmit to edge servers for final decision-making.

AI-Programmable Wireless Connectivity: Challenges and Research Directions Toward Interactive and Immersive Industry (2603.29752 - Gacanin, 31 Mar 2026) in Section 2.3, Device Communications Efficiency