SoD$^2$: Statically Optimizing Dynamic Deep Neural Network (2403.00176v1)

Published 29 Feb 2024 in cs.LG, cs.AI, and cs.PL

Abstract: Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs. Dynamic DNNs, where tensor shapes and sizes, and even the set of operators used, depend on the input and/or execution, are becoming common. This paper presents SoD$^2$, a comprehensive framework for optimizing Dynamic DNNs. Our approach is based on a classification of the common operators that form DNNs, which drives a Rank and Dimension Propagation (RDP) method. RDP statically determines operator shapes as known constants, symbolic constants, or operations on these. Building on RDP, we enable a series of optimizations, such as fused code generation, execution (order) planning, and runtime memory allocation planning. Evaluating the framework on 10 emerging Dynamic DNNs and comparing it against several existing systems, we demonstrate reductions in both execution latency and memory requirements, with the RDP-enabled optimizations responsible for much of the gains. Our results show that SoD$^2$ runs up to $3.9\times$ faster than these systems while reducing peak memory consumption by up to $88\%$.
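To make the shape-propagation idea behind RDP concrete, here is a minimal Python sketch; all names (`Sym`, `matmul`, `add`) are hypothetical illustrations, not the paper's actual API. The premise, taken from the abstract, is that each dimension is either a known constant (a plain `int`) or a symbolic constant, and per-operator rules propagate these through the graph so the compiler can reason about shapes before any input arrives.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Sym:
    """A symbolic constant, e.g. an input-dependent sequence length."""
    name: str

    def __repr__(self) -> str:
        return self.name

# A dimension is either a known constant or a symbolic constant.
Dim = Union[int, Sym]
Shape = Tuple[Dim, ...]

def matmul(a: Shape, b: Shape) -> Shape:
    """Shape rule for matmul: (m, k) x (k, n) -> (m, n).
    The inner dimensions must be provably equal (the same constant
    or the same symbol); the result may still contain symbols."""
    (m, k1), (k2, n) = a, b
    assert k1 == k2, f"inner dimensions differ: {k1} vs {k2}"
    return (m, n)

def add(a: Shape, b: Shape) -> Shape:
    """Element-wise add: shapes must match (no broadcasting here)."""
    assert a == b, f"shape mismatch: {a} vs {b}"
    return a

# A dynamic sequence length 's' flows through a two-layer block with a
# residual connection; every intermediate shape is known symbolically
# before any input arrives.
s = Sym("s")
x = (s, 768)                    # [seq_len, hidden]
w1, w2 = (768, 3072), (3072, 768)
h = matmul(x, w1)               # (s, 3072)
y = matmul(h, w2)               # (s, 768)
print(add(x, y))                # (s, 768)
```

Because every intermediate shape is expressed in terms of the symbol `s`, a downstream planner could, for instance, derive an execution order or a memory allocation plan parameterized by the sequence length actually observed at runtime, which is the kind of RDP-enabled planning the abstract describes.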
