Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
125 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Communication-Efficient Separable Neural Network for Distributed Inference on Edge Devices (2111.02489v1)

Published 3 Nov 2021 in cs.LG and cs.DC

Abstract: The inference of Neural Networks is usually restricted by the resources (e.g., computing power, memory, bandwidth) on edge devices. In addition to improving the hardware design and deploying efficient models, it is possible to aggregate the computing power of many devices to enable the machine learning models. In this paper, we proposed a novel method of exploiting model parallelism to separate a neural network for distributed inferences. To achieve a better balance between communication latency, computation latency, and performance, we adopt neural architecture search (NAS) to search for the best transmission policy and reduce the amount of communication. The best model we found decreases by 86.6% of the amount of data transmission compared to the baseline and does not impact performance much. Under proper specifications of devices and configurations of models, our experiments show that the inference of large neural networks on edge clusters can be distributed and accelerated, which provides a new solution for the deployment of intelligent applications in the internet of things (IoT).

Summary

We haven't generated a summary for this paper yet.