Memory Optimization for Deep Networks (2010.14501v3)

Published 27 Oct 2020 in cs.LG and cs.CV

Abstract: Deep learning is slowly, but steadily, hitting a memory bottleneck. While the tensor computation in top-of-the-line GPUs increased by 32x over the last five years, the total available memory only grew by 2.5x. This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs. In this paper, we present MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks. MONeT jointly optimizes the checkpointing schedule and the implementation of various operators. MONeT is able to outperform all prior hand-tuned operations as well as automated checkpointing. MONeT reduces the overall memory requirement by 3x for various PyTorch models, with a 9-16% overhead in computation. For the same computation cost, MONeT requires 1.2-1.8x less memory than current state-of-the-art automated checkpointing frameworks. Our code is available at https://github.com/utsaslab/MONeT.

Essay on "Memory Optimization for Deep Networks"

The paper, entitled "Memory Optimization for Deep Networks," addresses a crucial challenge faced by the deep learning community—memory bottlenecks in the training of expansive neural network architectures. The authors present an innovative framework, MONeT (Memory Optimized Network Training), which minimizes memory usage while balancing computational overhead. This work is particularly significant as the growth in computational capabilities of GPUs has far outpaced the increase in memory availability.

Overview and Contributions

MONeT is framed as an optimization problem, focusing on reducing memory requirements while keeping computational costs manageable. Its twofold strategy combines global graph-level optimizations, such as checkpointing, with local, operator-level adjustments to execution strategies. This composite approach differentiates MONeT from existing frameworks that typically optimize either level in isolation.

Checkpointing is a well-established technique for saving memory during network training by recomputing certain values on the fly. MONeT enhances this strategy by integrating operator-level adaptations, effectively treating the selection of convolution algorithms, intermediate representations, and backward operator dependencies as part of the optimization process. Such an approach allows for more granular control over memory and compute trade-offs.
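
To make the graph-level half of this trade-off concrete, the short PyTorch sketch below uses the stock torch.utils.checkpoint API to store only segment-boundary activations and recompute everything else during the backward pass. This is a minimal, generic illustration of checkpointing, not MONeT's optimized schedule; the model, segment count, and tensor sizes are arbitrary choices for the example.

```python
# Minimal activation-checkpointing sketch with the standard PyTorch API.
# Only segment-boundary activations are kept in memory; intermediates inside
# each segment are recomputed during the backward pass.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
)

x = torch.randn(64, 1024, requires_grad=True)

# Split the network into 2 segments and checkpoint at the boundary.
# (use_reentrant=False requires a reasonably recent PyTorch release.)
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
loss = out.sum()
loss.backward()  # intermediate activations are recomputed here, trading compute for memory
```

MONeT goes beyond this kind of fixed, hand-chosen segmentation by deciding automatically what to store, what to recompute, and which operator implementation to use at each step.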

Key Results

The empirical evaluations reveal compelling insights: MONeT achieves a 3× reduction in memory usage across various deep network architectures, such as ResNet-50, GoogLeNet, and VGG-16, while incurring only a 9-16% increase in computational overhead relative to a baseline PyTorch implementation. Further, MONeT surpasses state-of-the-art automated checkpointing frameworks, requiring 1.2–1.8× less memory for equivalent computation costs.

Performance gains are demonstrated comprehensively through models with different architectural designs and memory requirements, validating the efficacy of MONeT's optimization across a spectrum of real-world scenarios.

Theoretical Significance and Implications

The theoretical framework underpinning MONeT is grounded in integer programming, effectively handling the interaction between various constraints posed by memory limits, computational dependencies, and operator implementations. By addressing both global and local memory consumption via an integrated optimization solution, MONeT provides tighter bounds on memory resources than previous methodologies.
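
To convey the flavor of such a formulation, the toy sketch below poses a heavily stripped-down version of the decision as a 0-1 integer program using the open-source PuLP solver: choose which activations to keep under a memory budget so that total recomputation time is minimized. The costs, sizes, and budget are made-up numbers, and this is only an illustrative simplification; MONeT's actual program additionally selects operator implementations and models memory usage at every step of the forward and backward schedule.

```python
# Toy 0-1 integer program in the spirit of a checkpointing schedule:
# keep_i = 1 means layer i's output is stored; otherwise it is recomputed.
# All numbers are illustrative, not taken from the paper.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

mem = [40, 60, 30, 50]       # MB to store each layer's output (assumed)
recompute = [5, 20, 8, 15]   # ms to recompute each output if not stored (assumed)
budget = 100                 # MB of activation memory available (assumed)

prob = LpProblem("toy_checkpointing", LpMinimize)
keep = [LpVariable(f"keep_{i}", cat=LpBinary) for i in range(len(mem))]

# Objective: pay recomputation time for every output we choose not to keep.
prob += lpSum((1 - keep[i]) * recompute[i] for i in range(len(mem)))

# Constraint: stored outputs must fit within the memory budget.
prob += lpSum(keep[i] * mem[i] for i in range(len(mem))) <= budget

prob.solve()
print([int(value(k)) for k in keep])  # -> [0, 1, 1, 0] for these numbers
```

The solver keeps the outputs that are cheap to store but expensive to recompute, which is exactly the trade-off MONeT's richer integer program navigates jointly with operator selection.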

This research carries significant implications for the field of artificial intelligence and deep learning, where model complexity is only set to grow. MONeT offers a path forward for training large-scale models without a commensurate increase in memory, thus enabling broader access to cutting-edge models on hardware with limited memory capacity.

Future Outlook

The novel approach delineated in this paper opens numerous avenues for further research and development. As neural networks increasingly find their place in edge computing and mobile applications, memory-efficient training paradigms will become more crucial. Future iterations of MONeT could incorporate broader aspects of model optimization, such as quantization or pruning, and be evaluated on more diverse hardware systems.

In conclusion, this paper makes an important contribution to the field of deep learning model training by systematically diminishing the memory constraints that currently pose a significant hindrance to the exploration of larger, more complex neural architectures. As such, MONeT stands as a noteworthy advancement with potential for wide-reaching impact in both practical and theoretical realms of AI research.

Authors (5)
  1. Aashaka Shah (7 papers)
  2. Chao-Yuan Wu (19 papers)
  3. Jayashree Mohan (17 papers)
  4. Vijay Chidambaram (19 papers)
  5. Philipp Krähenbühl (55 papers)
Citations (22)