
Lowering PyTorch's Memory Consumption for Selective Differentiation

Published 15 Apr 2024 in cs.LG | arXiv:2404.12406v2

Abstract: Memory is a limiting resource for many deep learning tasks. Besides the neural network weights, one main memory consumer is the computation graph built up by automatic differentiation (AD) for backpropagation. We observe that PyTorch's current AD implementation neglects information about parameter differentiability when storing the computation graph. However, this information can be used to reduce memory whenever gradients are requested only for a subset of parameters, as is the case in many modern fine-tuning tasks. Specifically, the inputs to layers that act linearly in their parameters (dense, convolution, or normalization layers) can be discarded whenever those parameters are marked as non-differentiable. We provide a drop-in, differentiability-agnostic implementation of such layers and demonstrate its ability to reduce memory without affecting run time.
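To illustrate the idea from the abstract, here is a minimal sketch (not the authors' actual implementation; the class name and structure are assumptions) of a dense layer as a custom `torch.autograd.Function` that skips storing its input when the weight is marked as non-differentiable. The input `x` is needed in backward only for the weight gradient, so when `weight.requires_grad` is `False` it can be discarded at forward time:

```python
import torch

class MemorySavingLinear(torch.autograd.Function):
    """Dense layer whose stored tensors respect parameter differentiability.

    The input x is needed only for the weight gradient (grad_W = grad_out^T x),
    so if the weight is marked non-differentiable, x need not be stored.
    Hypothetical sketch; PyTorch's built-in linear layer stores x regardless.
    """

    @staticmethod
    def forward(ctx, x, weight, bias):
        # Record at forward time whether x must be kept for backward.
        ctx.weight_requires_grad = weight.requires_grad
        if weight.requires_grad:
            ctx.save_for_backward(x, weight)
        else:
            ctx.save_for_backward(weight)  # x is discarded -> memory saved
        out = x @ weight.T
        if bias is not None:
            out = out + bias
        return out

    @staticmethod
    def backward(ctx, grad_out):
        if ctx.weight_requires_grad:
            x, weight = ctx.saved_tensors
        else:
            (weight,) = ctx.saved_tensors
            x = None
        # Only compute the gradients that were actually requested.
        grad_x = grad_out @ weight if ctx.needs_input_grad[0] else None
        grad_w = grad_out.T @ x if ctx.needs_input_grad[1] else None
        grad_b = grad_out.sum(dim=0) if ctx.needs_input_grad[2] else None
        return grad_x, grad_w, grad_b
```

Because the forward output and all gradients match the standard layer, this is a drop-in replacement: with trainable weights it behaves identically, and with frozen weights it simply stores less.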

Authors (2)