
Cooperative Inference with Interleaved Operator Partitioning for CNNs (2409.07693v1)

Published 12 Sep 2024 in cs.DC

Abstract: Deploying deep learning models on Internet of Things (IoT) devices often faces challenges due to limited memory resources and computing capabilities. Cooperative inference is an important method for addressing this issue, requiring the partitioning and distributed deployment of an intelligent model. To perform horizontal partitioning, existing cooperative inference methods take either the output channels of operators or the height and width of feature maps as the partition dimensions. Because the activations of an operator are then distributed across devices, they must be concatenated before being fed to the next operator, which incurs delay in cooperative inference. In this paper, we propose the Interleaved Operator Partitioning (IOP) strategy for CNN models. By partitioning one operator along the output channel dimension and its successor along the input channel dimension, activation concatenation becomes unnecessary; this reduces the number of communication connections and consequently the cooperative inference delay. Based on IOP, we further present a model segmentation algorithm for minimizing cooperative inference time, which greedily selects operators for IOP pairing based on the inference delay benefit harvested. Experimental results demonstrate that, compared with the state-of-the-art partitioning approach used in CoEdge, the IOP strategy achieves 6.39%–16.83% faster inference and reduces peak memory footprint by 21.22%–49.98% on three classical image classification models.
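The core identity behind IOP can be shown with a small numerical sketch. The snippet below is not from the paper: all names are illustrative, dense linear operators stand in for CNN convolutions (the identity holds per spatial position for 1x1 convolutions), and the two "workers" are simulated in one process. It illustrates how splitting operator A by output channels and its successor B by the matching input channels lets each worker feed its own activation slice directly into B, so the only cross-worker step is summing B's partial outputs rather than concatenating A's activations.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 8))    # input activation
W_a = rng.standard_normal((8, 6))  # operator A: 8 -> 6 channels
W_b = rng.standard_normal((6, 4))  # operator B: 6 -> 4 channels

# Reference: unpartitioned execution.
y_ref = (x @ W_a) @ W_b

# IOP across two workers: A is split along its OUTPUT channels (columns),
# B is split along the matching INPUT channels (rows).
splits = [(0, 3), (3, 6)]
partials = []
for lo, hi in splits:
    h_i = x @ W_a[:, lo:hi]               # worker i's slice of A's activation
    partials.append(h_i @ W_b[lo:hi, :])  # consumed locally by B's slice

# No concatenation between A and B; only a final element-wise sum
# of B's partial outputs crosses worker boundaries.
y_iop = sum(partials)
assert np.allclose(y_ref, y_iop)
```

The sum of partials replaces the all-to-all activation concatenation that channel-only partitioning would require, which is the source of the reduced communication connections the abstract describes.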

