Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
The paper presents a provable and efficient method for accelerating sparse neural network training with N:M transposable fine-grained sparsity masks. Existing N:M schemes accelerate only inference; to accelerate training as well, the backward pass, which multiplies by the transposed weight matrix, must also satisfy the N:M pattern. Transposable masks meet the constraint in both orientations, so both forward and backward matrix multiplications can exploit the sparse tensor cores of modern hardware.
Key Contributions
- Mask Diversity Metric: The paper introduces "mask diversity," a metric for comparing sparsity constraints by counting the number of mask configurations each one admits. Across several structured pruning schemes, higher mask diversity correlates with better preserved accuracy (a small enumeration sketch appears after this list).
- N:M Transposable Fine-grained Sparsity: To benefit both inference and training, the paper proposes transposable sparsity masks: the weight matrix and its transpose must satisfy the same N:M constraint, so the backward multiplication is accelerated as well as the forward one. Finding the transposable mask that retains the most weight magnitude is formulated as a minimum-cost flow problem, which can be solved exactly under the sparsity constraint. Two solution approaches are proposed:
- An exact min-cost flow algorithm, suited to one-shot restructuring of pretrained models.
- A fast 2-approximation algorithm with near-linear running time, cheap enough to be applied repeatedly during training (a greedy sketch in this spirit appears after the list).
- Experimental Verification and Comparison:
- Static Model Adjustments: Transposable masks are computed for pretrained dense models (e.g., ResNet-50, BERT), yielding efficient dense-to-sparse conversions that are compared against NVIDIA's ASP method.
- Dynamic Training Scenario: The approximation algorithm allows the transposable mask to be updated during training, removing the need for a pretrained dense model. This is demonstrated on computer vision and language processing tasks, matching or surpassing existing benchmarks at moderate resource budgets.
- AdaPrune Method: The paper also proposes AdaPrune, a conversion technique for adapting models between sparsity constraints, improving portability across hardware. AdaPrune converts existing unstructured sparse models into N:M formats with negligible accuracy loss and without full retraining, which empirically improves hardware compatibility (a simplified per-layer sketch appears after this list).
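To make the mask diversity metric concrete, the sketch below brute-forces a single 4x4 tile and counts how many masks satisfy plain 2:4 sparsity (exactly 2 nonzeros in every group of 4 along a row) versus transposable 2:4 sparsity (the 2:4 constraint holds along both rows and columns). The tile-level enumeration and the specific counts are illustrative only; the paper defines mask diversity per layer and compares more mask structures than are shown here.

```python
from itertools import product
from math import comb

M, N = 4, 2  # N:M sparsity: keep N weights out of every group of M

def rows_ok(mask):
    """Every row of the M x M tile keeps exactly N nonzeros."""
    return all(sum(row) == N for row in mask)

def cols_ok(mask):
    """Every column of the M x M tile keeps exactly N nonzeros."""
    return all(sum(col) == N for col in zip(*mask))

# Enumerate all binary M x M tiles (2**(M*M) = 65,536 for M = 4).
tiles = (tuple(zip(*[iter(bits)] * M)) for bits in product((0, 1), repeat=M * M))

plain = transposable = 0
for mask in tiles:
    if rows_ok(mask):
        plain += 1
        if cols_ok(mask):
            transposable += 1

print("2:4 masks per 4x4 tile:         ", plain)         # comb(4, 2)**4 = 1296
print("transposable 2:4 masks per tile:", transposable)  # 90
print("sanity check, comb(M, N)**M =   ", comb(M, N) ** M)
```

The gap between the two counts shows how the transposability constraint reduces mask diversity; the paper uses this kind of count to compare sparsity constraints against the accuracy they preserve.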
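The fast approximation in the second contribution can be conveyed with a simple greedy rule on one M x M tile: visit entries in order of decreasing magnitude and keep an entry only while its row and its column still have fewer than N kept entries. Greedy selection under row/column budgets is a well-known 1/2-approximation for maximum-weight matching-type problems; it is shown here only to illustrate the idea and is not the paper's exact algorithm (the exact optimum comes from the min-cost flow formulation).

```python
import numpy as np

def transposable_mask_tile(tile: np.ndarray, n: int = 2) -> np.ndarray:
    """Greedy transposable N:M mask for one square M x M tile.

    Entries are kept in decreasing order of magnitude as long as neither
    their row nor their column already holds n kept entries.  Because every
    (row, column) pair is considered, the result always has exactly n
    nonzeros per row and per column.
    """
    m = tile.shape[0]
    assert tile.shape == (m, m), "expects a square M x M tile"
    mask = np.zeros_like(tile, dtype=bool)
    row_cnt = np.zeros(m, dtype=int)
    col_cnt = np.zeros(m, dtype=int)
    for flat in np.argsort(-np.abs(tile), axis=None):  # largest magnitude first
        r, c = divmod(int(flat), m)
        if row_cnt[r] < n and col_cnt[c] < n:
            mask[r, c] = True
            row_cnt[r] += 1
            col_cnt[c] += 1
    return mask

# Usage: prune one 4x4 tile of weights to transposable 2:4 sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
mask = transposable_mask_tile(w, n=2)
assert (mask.sum(axis=0) == 2).all() and (mask.sum(axis=1) == 2).all()
print(w * mask)  # both this matrix and its transpose are 2:4 sparse
```

Applied tile by tile over a full weight matrix, the same rule produces a mask that is valid for both the forward multiplication with W and the backward multiplication with its transpose.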
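For AdaPrune, the sketch below captures the per-layer idea on a single linear layer: fix an N:M mask, then re-fit only the surviving weights so that the sparse layer reproduces the dense layer's outputs on a small batch of calibration activations. The closed-form least-squares fit and the magnitude-based mask used here are simplifications chosen for brevity; the paper's AdaPrune optimizes the weights on calibration data rather than using this exact formulation.

```python
import numpy as np

def adaprune_linear(W: np.ndarray, mask: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Re-fit the kept weights of a linear layer under a fixed sparsity mask.

    W    : (out_features, in_features) dense weights
    mask : boolean array, same shape as W, True where a weight is kept
    X    : (num_samples, in_features) calibration activations

    Each output unit is fit independently so that the sparse layer's output
    matches the dense layer's output in the least-squares sense, using only
    the input features the mask keeps for that unit.
    """
    Y = X @ W.T                              # reference outputs of the dense layer
    W_new = np.zeros_like(W)
    for i in range(W.shape[0]):
        keep = mask[i]
        coef, *_ = np.linalg.lstsq(X[:, keep], Y[:, i], rcond=None)
        W_new[i, keep] = coef
    return W_new

# Usage: adapt a dense layer to a magnitude-based 2:4 mask.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(128, 16))
# Keep the 2 largest-magnitude weights in every group of 4 along each row.
groups = np.abs(W).reshape(8, 4, 4)
ranks = np.argsort(np.argsort(-groups, axis=-1), axis=-1)
mask = (ranks < 2).reshape(8, 16)
W_sparse = adaprune_linear(W, mask, X)
print("output mismatch after AdaPrune-style re-fit:",
      np.linalg.norm(X @ W.T - X @ W_sparse.T))
```

Because each layer is adapted independently against a small calibration set, the conversion stays far cheaper than retraining the whole network.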
Implications and Future Work
This research has significant implications for the computational efficiency of neural network training, reducing both memory and runtime without compromising accuracy. Transposable masks, and the resulting acceleration of both forward and backward matrix multiplications, offer a practical path forward for deep learning infrastructure, particularly in resource-constrained or hardware-accelerated environments.
Moving forward, extending the set of supported mask structures and the associated optimization algorithms could improve performance across a wider range of hardware architectures. Combining the approach with neural architecture search (NAS) could offer a dual benefit: refining architectures while enforcing efficient computation through sparse training.
Overall, the paper establishes a theoretically grounded and practical framework for improving the efficiency of neural network training, with reported throughput gains that matter for a broad range of AI applications.