Attention-aware Semantic Communications for Collaborative Inference (2404.07217v2)

Published 23 Feb 2024 in eess.SP, cs.AI, cs.CV, and cs.LG

Abstract: We propose a communication-efficient collaborative inference framework for edge inference, focusing on the efficient use of vision transformer (ViT) models. The partitioning strategy of conventional collaborative inference fails to reduce communication cost because ViTs maintain consistent layer dimensions across the entire transformer encoder, so intermediate activations are no smaller than the input. Instead of partitioning, our framework deploys a lightweight ViT model on the edge device and a larger, more complex ViT model on the server. To improve communication efficiency while matching the classification accuracy of the server model, we propose two strategies: 1) attention-aware patch selection and 2) entropy-aware image transmission. Attention-aware patch selection uses the attention scores generated by the edge device's transformer encoder to identify and select the image patches critical for classification, so the edge device transmits only the essential patches to the server. Entropy-aware image transmission uses min-entropy as a metric to decide whether to rely on the lightweight edge model's prediction or to request inference from the server model. In our framework, the lightweight ViT on the edge device acts as a semantic encoder, identifying and selecting the image information crucial for the classification task. Our experiments demonstrate that the proposed framework reduces communication overhead by 68% with only a minimal loss in accuracy relative to the server model on the ImageNet dataset.
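
The patch-selection step can be made concrete with a short sketch. The snippet below is a minimal, self-contained illustration, not the authors' code: it assumes query/key tensors taken from one encoder layer of the edge-side ViT, averages the [CLS] token's attention over heads, and keeps the top-scoring patches. The `keep_ratio` of 0.32 is an illustrative choice, roughly consistent with the 68% overhead reduction reported above.

```python
import torch

def cls_attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """CLS-to-patch attention averaged over heads.

    q, k: (batch, heads, 1 + num_patches, head_dim) query/key tensors
    taken from one encoder layer of the edge-side lightweight ViT.
    Returns: (batch, num_patches) attention of the [CLS] query to each patch.
    """
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale   # (B, H, 1+N, 1+N)
    attn = attn.softmax(dim=-1)
    return attn[:, :, 0, 1:].mean(dim=1)       # [CLS] row, drop the [CLS] column

def select_patches(patches: torch.Tensor, scores: torch.Tensor,
                   keep_ratio: float = 0.32):
    """Keep the top-k patches by attention score; also return their indices
    so the server can place the received patches at the right positions."""
    num_keep = max(1, int(keep_ratio * scores.shape[1]))
    idx = scores.topk(num_keep, dim=1).indices  # (B, num_keep)
    kept = torch.gather(
        patches, 1, idx.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))
    return kept, idx

if __name__ == "__main__":
    B, H, N, d, D = 2, 3, 196, 64, 192          # toy shapes for a 14x14 patch grid
    q, k = torch.randn(B, H, 1 + N, d), torch.randn(B, H, 1 + N, d)
    patches = torch.randn(B, N, D)
    kept, idx = select_patches(patches, cls_attention_scores(q, k))
    print(kept.shape, idx.shape)                # (2, 62, 192) (2, 62)
```

Only `kept` and `idx` need to be transmitted, so the payload scales with `keep_ratio` rather than with the full patch grid.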
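
The entropy-aware gating can be sketched similarly. The min-entropy of a predictive distribution p is H_min(p) = -log2 max_c p(c); a low value means the edge model concentrates its belief on a single class. The sketch below assumes the edge model's softmax output is the confidence signal; the threshold is an illustrative hyperparameter to be tuned on a validation set, not a value from the paper.

```python
import torch

def min_entropy(probs: torch.Tensor) -> torch.Tensor:
    """Min-entropy H_min(p) = -log2 max_c p(c), per example, in bits."""
    return -torch.log2(probs.max(dim=-1).values)

def should_offload(edge_logits: torch.Tensor,
                   threshold: float = 1.0) -> torch.Tensor:
    """Boolean mask: True where the edge prediction is too uncertain and the
    selected patches should be sent to the server model instead.
    `threshold` (in bits) is an assumed, tunable hyperparameter."""
    probs = edge_logits.softmax(dim=-1)
    return min_entropy(probs) > threshold
```

Images the edge model classifies confidently never leave the device, which is where the remaining communication savings come from.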

Authors (6)
  1. Jiwoong Im
  2. Nayoung Kwon
  3. Taewoo Park
  4. Jiheon Woo
  5. Jaeho Lee
  6. Yongjune Kim