Hybrid-Parallel: Achieving High Performance and Energy Efficient Distributed Inference on Robots (2405.19257v1)
Abstract: The rapid advancements in machine learning techniques have led to significant achievements in various real-world robotic tasks. These tasks rely heavily on fast and energy-efficient inference of deep neural network (DNN) models when deployed on robots. To enhance inference performance, distributed inference has emerged as a promising approach, parallelizing inference across multiple powerful GPU devices in modern data centers using techniques such as data parallelism, tensor parallelism, and pipeline parallelism. However, when deployed on real-world robots, existing parallel methods fail to provide low inference latency and to meet the energy requirements due to the limited bandwidth of robotic IoT. We present Hybrid-Parallel, a high-performance distributed inference system optimized for robotic IoT. Hybrid-Parallel employs a fine-grained approach to parallelize inference at the granularity of local operators within DNN layers (i.e., operators that can be computed independently with partial input, such as the convolution kernel in a convolution layer). By doing so, Hybrid-Parallel enables different operators of different layers to be computed and transmitted concurrently, and overlaps the computation and transmission phases within the same inference task. The evaluation demonstrates that Hybrid-Parallel reduces inference time by 14.9% to 41.1% and energy consumption per inference by up to 35.3% compared to the state-of-the-art baselines.
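The two ideas in the abstract, local operators that need only partial input and overlapping transmission with computation, can be illustrated with a minimal sketch. This is not the Hybrid-Parallel implementation: the function names (`conv1d`, `conv1d_in_chunks`, `serial_latency`, `overlapped_latency`), the 1-D convolution, and the per-chunk timings are all assumptions chosen for illustration, and the latency model is a simple two-stage pipeline.

```python
# Illustrative sketch only; all names and numbers are hypothetical.

def conv1d(x, k):
    """Plain 'valid' 1-D convolution (cross-correlation form)."""
    n_out = len(x) - len(k) + 1
    return [sum(x[i + j] * k[j] for j in range(len(k))) for i in range(n_out)]

def conv1d_in_chunks(x, k, chunk):
    """Compute the same outputs chunk by chunk.

    Each chunk needs only a slice of the input (plus len(k) - 1 extra
    elements), which is what makes the kernel a 'local operator':
    it can start as soon as its part of the input has arrived.
    """
    n_out = len(x) - len(k) + 1
    out = []
    for start in range(0, n_out, chunk):
        stop = min(start + chunk, n_out)
        partial_input = x[start:stop + len(k) - 1]
        out.extend(conv1d(partial_input, k))
    return out

def serial_latency(transmit_ms, compute_ms):
    """Baseline assumption: receive the whole input, then compute."""
    return sum(transmit_ms) + sum(compute_ms)

def overlapped_latency(transmit_ms, compute_ms):
    """Two-stage pipeline: chunk i is computed once it has arrived and
    the previous chunk has finished computing."""
    arrived = 0.0
    done = 0.0
    for t, c in zip(transmit_ms, compute_ms):
        arrived += t
        done = max(done, arrived) + c
    return done

x = [1, 2, 3, 4, 5, 6]
k = [1, 0, -1]
full = conv1d(x, k)
chunked = conv1d_in_chunks(x, k, chunk=2)

# Hypothetical timings: 4 chunks, 5 ms to transmit and 4 ms to compute each.
transmit = [5.0] * 4
compute = [4.0] * 4
serial = serial_latency(transmit, compute)
overlapped = overlapped_latency(transmit, compute)
print(full == chunked, serial, overlapped)
```

With these assumed timings the chunked result matches the full convolution, and overlapping hides most of the compute time behind transmission (24 ms instead of 36 ms). Actual gains depend on the model, the bandwidth, and the scheduling policy, none of which this sketch captures.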