OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning (2401.11652v1)

Published 22 Jan 2024 in cs.CV and cs.LG

Abstract: Federated learning (FL) has emerged as a promising approach to collaboratively train machine learning models across multiple edge devices while preserving privacy. The success of FL hinges on the efficiency of participating models and their ability to handle the unique challenges of distributed learning. While several variants of Vision Transformer (ViT) have shown great potential as alternatives to modern convolutional neural networks (CNNs) for centralized training, the unprecedented size and higher computational demands hinder their deployment on resource-constrained edge devices, challenging their widespread application in FL. Since client devices in FL typically have limited computing resources and communication bandwidth, models intended for such devices must strike a balance between model size, computational efficiency, and the ability to adapt to the diverse and non-IID data distributions encountered in FL. To address these challenges, we propose OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks with limited training data and resources. Our models incorporate image-specific inductive biases through the LCT tokenizer by leveraging efficient depthwise separable convolutions in residual linear bottleneck blocks to extract local features, while the multi-head self-attention (MHSA) mechanism in the LCT encoder implicitly facilitates capturing global representations of images. Extensive experiments on benchmark image datasets indicate that our models outperform existing lightweight vision models while having fewer parameters and lower computational demands, making them suitable for FL scenarios with data heterogeneity and communication bottlenecks.
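
The architecture described in the abstract combines a convolutional tokenizer (depthwise separable convolutions inside residual linear bottleneck blocks for local features) with a transformer encoder whose multi-head self-attention captures global context. The PyTorch sketch below illustrates that combination; the layer widths, depths, head counts, and the absence of positional embeddings are assumptions for illustration, not the authors' released OnDev-LCT configuration.

```python
# Minimal sketch (not the authors' code) of a lightweight convolutional transformer:
# a tokenizer of depthwise separable convolutions in residual linear bottleneck
# blocks, followed by a standard multi-head self-attention encoder.
import torch
import torch.nn as nn


class LinearBottleneck(nn.Module):
    """Inverted residual block with a depthwise separable convolution."""

    def __init__(self, in_ch, out_ch, stride=1, expansion=4):
        super().__init__()
        hidden = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),             # pointwise expansion
            nn.BatchNorm2d(hidden), nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),                # depthwise convolution
            nn.BatchNorm2d(hidden), nn.GELU(),
            nn.Conv2d(hidden, out_ch, 1, bias=False),            # linear pointwise projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out


class LCTSketch(nn.Module):
    """Convolutional tokenizer + MHSA encoder, in the spirit of the abstract."""

    def __init__(self, num_classes=10, embed_dim=128, depth=4, heads=4):
        super().__init__()
        self.tokenizer = nn.Sequential(                          # local features via convolutions
            nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.GELU(),
            LinearBottleneck(32, 64, stride=2),
            LinearBottleneck(64, embed_dim, stride=2),
        )
        encoder_layer = nn.TransformerEncoderLayer(              # global context via MHSA
            d_model=embed_dim, nhead=heads, dim_feedforward=2 * embed_dim,
            activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        tokens = self.tokenizer(x)                               # (B, C, H', W')
        tokens = tokens.flatten(2).transpose(1, 2)               # (B, H'*W', C) token sequence
        tokens = self.encoder(tokens)
        return self.head(tokens.mean(dim=1))                     # mean-pool tokens, classify


if __name__ == "__main__":
    model = LCTSketch()
    print(model(torch.randn(2, 3, 32, 32)).shape)                # torch.Size([2, 10])
```

Because the model stays small (a few hundred thousand parameters at these widths), its weights are cheap to transmit in each federated round, which is the property the abstract highlights for FL scenarios with communication bottlenecks.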
