DyCE: Dynamically Configurable Exiting for Deep Learning Compression and Real-time Scaling (2403.01695v3)

Published 4 Mar 2024 in cs.LG and cs.AI

Abstract: Conventional deep learning (DL) model compression and scaling methods focus on altering the model's components, impacting the results across all samples uniformly. However, since samples vary in difficulty, a dynamic model that adapts computation based on sample complexity offers a novel perspective for compression and scaling. Despite this potential, existing dynamic models are typically monolithic and model-specific, limiting their generalizability as broad compression and scaling methods. Additionally, most deployed DL systems are fixed, unable to adjust their scale once deployed and, therefore, cannot adapt to the varying real-time demands. This paper introduces DyCE, a dynamically configurable system that can adjust the performance-complexity trade-off of a DL model at runtime without requiring re-initialization or redeployment on inference hardware. DyCE achieves this by adding small exit networks to intermediate layers of the original model, allowing computation to terminate early if acceptable results are obtained. DyCE also decouples the design of an efficient dynamic model, facilitating easy adaptation to new base models and potential general use in compression and scaling. We also propose methods for generating optimized configurations and determining the types and positions of exit networks to achieve desired performance and complexity trade-offs. By enabling simple configuration switching, DyCE provides fine-grained performance tuning in real-time. We demonstrate the effectiveness of DyCE through image classification tasks using deep convolutional neural networks (CNNs). DyCE significantly reduces computational complexity by 23.5% for ResNet152 and 25.9% for ConvNextv2-tiny on ImageNet, with accuracy reductions of less than 0.5%.
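The early-exit mechanism the abstract describes can be sketched in code. The snippet below is a minimal, hypothetical illustration (not the authors' implementation): a backbone is split into stages, each followed by a small exit head, and inference stops at the first exit whose prediction confidence clears that exit's threshold. The per-exit thresholds play the role of DyCE's runtime-switchable "configuration"; all names and the confidence rule (max softmax probability) are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

class EarlyExitModel:
    """Sketch of an early-exiting model in the style of DyCE.

    stages      : list of callables, each mapping features -> features
    exit_heads  : list of callables, each mapping features -> class logits
    thresholds  : per-exit confidence thresholds; swapping this list
                  is the runtime "configuration switch" (assumption:
                  confidence = max softmax probability)
    """
    def __init__(self, stages, exit_heads, thresholds):
        assert len(stages) == len(exit_heads) == len(thresholds)
        self.stages = stages
        self.exit_heads = exit_heads
        self.thresholds = thresholds

    def predict(self, x):
        """Run stages in order; return (predicted class, exit index used)."""
        probs = None
        for i, (stage, head, thr) in enumerate(
                zip(self.stages, self.exit_heads, self.thresholds)):
            x = stage(x)
            probs = softmax(head(x))
            if probs.max() >= thr:  # confident enough: terminate early
                return int(probs.argmax()), i
        # No exit fired; fall back to the final exit's prediction.
        return int(probs.argmax()), len(self.stages) - 1
```

Raising the thresholds trades computation for accuracy sample-by-sample (easy inputs exit early, hard inputs run deeper), which is how a single deployed model can be rescaled at runtime without redeployment.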

