Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization (1712.01034v2)

Published 4 Dec 2017 in cs.CV

Abstract: Global covariance pooling in convolutional neural networks has achieved impressive improvement over the classical first-order pooling. Recent works have shown matrix square root normalization plays a central role in achieving state-of-the-art performance. However, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD), suffering from inefficient training due to limited support of EIG and SVD on GPU. Towards addressing this problem, we propose an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks. At the core of our method is a meta-layer designed with loop-embedded directed graph structure. The meta-layer consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration and post-compensation, respectively. Our method is much faster than EIG or SVD based ones, since it involves only matrix multiplications, suitable for parallel implementation on GPU. Moreover, the proposed network with ResNet architecture can converge in far fewer epochs, further accelerating network training. On large-scale ImageNet, we achieve competitive performance superior to existing counterparts. By finetuning our models pre-trained on ImageNet, we establish state-of-the-art results on three challenging fine-grained benchmarks. The source code and network models will be available at http://www.peihuali.org/iSQRT-COV

Authors (4)
  1. Peihua Li (18 papers)
  2. Jiangtao Xie (10 papers)
  3. Qilong Wang (34 papers)
  4. Zilin Gao (4 papers)
Citations (244)

Summary

Iterative Matrix Square Root Normalization for Efficient Training of Global Covariance Pooling Networks

The paper presents an approach to improving the efficiency of global covariance pooling in convolutional neural networks (ConvNets), a second-order pooling technique that has been pivotal in advancing image recognition. The authors propose an iterative matrix square root normalization method, introducing a meta-layer specifically designed for fast, end-to-end training on GPUs. This addresses the computational inefficiencies of the eigendecomposition (EIG) and singular value decomposition (SVD) traditionally used to normalize the pooled covariance matrices.
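For context, global covariance pooling replaces the usual first-order (average) pooling of the last convolutional features with their sample covariance. A minimal NumPy sketch follows; the function name and shapes are illustrative conventions, not taken from the authors' code.

```python
import numpy as np

def covariance_pool(X):
    """Global covariance pooling of a convolutional feature map.

    X: (d, N) matrix whose N = H * W columns are the spatial feature
    vectors of a d-channel feature map.
    Returns the d x d sample covariance Sigma = X J X^T, where
    J = (1/N)(I - (1/N) 1 1^T) centers the features.
    """
    d, N = X.shape
    J = (np.eye(N) - np.ones((N, N)) / N) / N
    return X @ J @ X.T  # symmetric positive semi-definite
```

The resulting matrix is symmetric positive semi-definite, which is exactly the setting in which the Newton-Schulz iteration discussed below applies.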

Key Contributions

  1. Iterative Methodology: The paper introduces a novel iterative method based on the Newton-Schulz iteration for computing matrix square roots. The iteration requires only matrix multiplications, which are well supported on GPUs, making it far more efficient than methods reliant on CPU-bound EIG or SVD (see the sketch after this list).
  2. Meta-layer Structure: A purpose-built meta-layer is the centerpiece of the approach. It is structured as a loop-embedded directed graph and comprises three consecutive nonlinear structured layers that perform pre-normalization, coupled matrix iteration, and post-compensation, respectively.
  3. Implementation and Performance: Implemented with the ResNet architecture, the approach converges faster and achieves competitive performance on large-scale datasets such as ImageNet, along with state-of-the-art results on fine-grained benchmarks. On ImageNet, the proposed method achieves a top-5 error rate of 6.22% with ResNet-50, outperforming several existing approaches.
  4. Scalability and Efficiency: As demonstrated, the iSQRT-COV method scales effectively across different GPU configurations, making it a viable solution for large-scale, real-time applications where computational resources are a consideration.
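To make items 1 and 2 concrete, here is a minimal NumPy sketch of the three-stage meta-layer built on the coupled Newton-Schulz iteration. It is an illustration under stated assumptions, not the authors' released code: variable names are ours, and the default of 5 iterations mirrors the small iteration counts the paper reports as sufficient.

```python
import numpy as np

def isqrt_cov_meta_layer(A, num_iter=5):
    """Approximate square root of an SPD covariance matrix A via the
    coupled Newton-Schulz iteration, organized as the paper's three
    stages (a sketch; num_iter is an assumed default)."""
    d = A.shape[0]
    I = np.eye(d)

    # 1) Pre-normalization: divide A by its trace so the eigenvalues
    #    of A_hat lie in (0, 1], which keeps ||I - A_hat|| < 1 and
    #    guarantees the iteration converges.
    tr = np.trace(A)
    A_hat = A / tr

    # 2) Coupled matrix iteration:
    #      Y_{k+1} = 0.5 * Y_k (3I - Z_k Y_k)
    #      Z_{k+1} = 0.5 * (3I - Z_k Y_k) Z_k
    #    with Y_0 = A_hat, Z_0 = I; Y_k -> A_hat^{1/2}, Z_k -> A_hat^{-1/2}.
    Y, Z = A_hat, I
    for _ in range(num_iter):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z

    # 3) Post-compensation: undo the trace scaling, since
    #    (A / tr(A))^{1/2} * sqrt(tr(A)) = A^{1/2}.
    return np.sqrt(tr) * Y
```

Because every step is a plain matrix product, the layer maps directly onto GPU matrix-multiply kernels, and gradients can be propagated through it without any EIG or SVD calls, which is the source of the speedups described below.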

Numerical Results

  • The proposed method runs significantly faster than CPU-based EIG and SVD, reducing matrix decomposition time from tens of milliseconds to sub-millisecond levels.
  • With both AlexNet and ResNet architectures, the proposed iterative normalization consistently matches or outperforms classical approaches, even slightly improving on methods that compute exact square roots via EIG.

Implications and Future Prospects

The method posited in this paper has significant implications for the practical deployment of ConvNets in real-time environments, including mobile and embedded systems, where parallel matrix-multiply hardware is increasingly available but EIG and SVD support is limited. By minimizing reliance on computationally expensive decompositions, the iSQRT-COV approach represents a critical step towards making complex machine learning models more accessible and deployable at scale.

For theoretical developments, this work opens avenues for further research into iterative solutions for other matrix functions fundamental to signal processing and machine learning. Subsequent research could explore adaptations of this iterative approach to higher-order statistics or other types of normalization functions, potentially improving the robustness and generalizability of ConvNet architectures across different domains.

Overall, this paper is a significant addition to the literature on network training methodologies, offering a clear example of how matching algorithms to hardware capabilities yields substantial gains in computational efficiency.