- The paper introduces a novel method that aggregates batch statistics over multiple iterations to overcome the limitations of small mini-batch sizes.
- It employs a Taylor series-based compensation technique to align outdated statistics with current network weights, enhancing normalization accuracy.
- Experimental results on ImageNet and COCO demonstrate that CBN outperforms standard BN at small batch sizes and rivals SyncBN, without cross-device synchronization and with minimal overhead.
Cross-Iteration Batch Normalization
The paper "Cross-Iteration Batch Normalization" explores a novel approach to address the limitations of Batch Normalization (BN) in scenarios involving small mini-batch sizes. BN is a widely adopted technique in neural network training but suffers from decreased accuracy when mini-batches comprise too few examples. This paper introduces Cross-Iteration Batch Normalization (CBN), which leverages accumulated statistics from multiple recent training iterations to enhance the reliability of normalization statistics and mitigate the challenges encountered with small batch sizes.
Problem Statement and Motivation
BN assumes that the statistics computed from each mini-batch are representative of the whole dataset. That assumption breaks down for small mini-batches, where estimates of the mean and variance become noisy, as the quick simulation below illustrates. This is particularly damaging in memory-intensive tasks such as object detection, semantic segmentation, and action recognition, where high-resolution inputs force small per-device batch sizes.
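As a rough illustration (not from the paper; the data and batch sizes are purely synthetic), the snippet below draws mini-batches of various sizes from a fixed population and measures how much the per-batch mean and variance fluctuate. The spread grows sharply as the batch shrinks, which is exactly the noise that degrades BN.

```python
# Minimal sketch: how noisy mini-batch statistics become as the batch shrinks.
# Activations are simulated as draws from a standard normal population.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=0.0, scale=1.0, size=100_000)

for batch_size in (256, 32, 8, 2):
    # Draw 1000 mini-batches and measure how far their means and variances
    # stray from the population values (0 and 1).
    batches = rng.choice(population, size=(1000, batch_size))
    print(f"batch={batch_size:4d}  "
          f"std of batch means={batches.mean(axis=1).std():.3f}  "
          f"std of batch variances={batches.var(axis=1).std():.3f}")
```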
Previous attempts to relax the batch-size constraint, such as Layer Normalization (LN), Instance Normalization (IN), Group Normalization (GN), and SyncBN, have shown varying degrees of success. However, they either forgo batch statistics entirely (LN, IN, GN) or, in the case of SyncBN, introduce additional synchronization overhead across devices.
Proposed Approach
CBN addresses these issues by accumulating batch statistics across several recent iterations. The core obstacle is that statistics computed in earlier iterations were produced under different network weights, so they cannot be reused directly. To overcome this, the authors propose a Taylor-series-based compensation that adjusts the stale statistics to the current weights, sketched in the equations below.
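Concretely, using notation close to the paper's (reconstructed here rather than quoted verbatim), let $\mu_{t-\tau}(\theta_{t-\tau})$ and $\nu_{t-\tau}(\theta_{t-\tau})$ be the mean and mean-of-squares of a layer's activations computed $\tau$ iterations ago under the weights of that time. CBN approximates what those statistics would be under the current weights $\theta_t$ with a first-order expansion:

$$
\mu_{t-\tau}(\theta_t) \approx \mu_{t-\tau}(\theta_{t-\tau}) + \frac{\partial \mu_{t-\tau}(\theta_{t-\tau})}{\partial \theta_{t-\tau}} \left(\theta_t - \theta_{t-\tau}\right), \qquad
\nu_{t-\tau}(\theta_t) \approx \nu_{t-\tau}(\theta_{t-\tau}) + \frac{\partial \nu_{t-\tau}(\theta_{t-\tau})}{\partial \theta_{t-\tau}} \left(\theta_t - \theta_{t-\tau}\right).
$$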
The first-order Taylor approximation makes the mean and variance from older iterations compatible with the network's current weights; in practice the required gradients are taken only with respect to the corresponding layer's own weights, since contributions from lower layers diminish rapidly, which keeps the compensation cheap. The compensated statistics are then averaged over a window of recent iterations, effectively simulating a larger batch size, as the toy sketch below illustrates.
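Below is a toy, self-contained sketch of this idea. It is not the authors' implementation: a single scalar weight stands in for a layer so that the gradients of the batch statistics have closed forms, and the window size, batch size, and update rule are all illustrative.

```python
# Toy sketch of CBN-style aggregation (illustrative only, not the paper's code).
# For a scalar "layer" y = w * x the statistics and their gradients are:
#   mu(w) = w * mean(x)        -> d mu / d w = mean(x)
#   nu(w) = w^2 * mean(x^2)    -> d nu / d w = 2 * w * mean(x^2)
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
k = 4           # number of recent iterations whose statistics are reused
batch_size = 4  # deliberately tiny, where plain BN statistics are noisy
eps = 1e-5

buffer = deque(maxlen=k)  # holds (mu, nu, d_mu, d_nu, w) from past iterations
w = 1.0

for t in range(20):
    x = rng.normal(size=batch_size)
    y = w * x

    # Statistics of the current mini-batch and their gradients w.r.t. w.
    mu, nu = y.mean(), (y ** 2).mean()
    d_mu, d_nu = x.mean(), 2.0 * w * (x ** 2).mean()
    buffer.append((mu, nu, d_mu, d_nu, w))

    # Compensate each stored statistic from its old weight to the current w
    # with a first-order Taylor term, then average across the window.
    mus, nus = [], []
    for mu_o, nu_o, d_mu_o, d_nu_o, w_old in buffer:
        mu_c = mu_o + d_mu_o * (w - w_old)
        nu_c = nu_o + d_nu_o * (w - w_old)
        mus.append(mu_c)
        nus.append(max(nu_c, mu_c ** 2))  # keep the implied variance non-negative
    mu_bar = np.mean(mus)
    var_bar = np.mean(nus) - mu_bar ** 2

    y_hat = (y - mu_bar) / np.sqrt(var_bar + eps)  # normalize with aggregated stats
    print(f"t={t:2d}  mu_bar={mu_bar:+.3f}  var_bar={var_bar:.3f}")

    w -= 0.05  # pretend an optimizer step changed the weight
```

In a full network the same bookkeeping is done per layer and per channel, and the gradients of the statistics are obtained during the regular backward pass, which is why the added cost stays small.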
Results and Evaluation
The paper provides empirical analyses demonstrating that CBN substantially improves the quality of the estimated statistics, particularly at small batch sizes. Experiments on ImageNet classification and COCO object detection show that CBN consistently outperforms standard BN and a naive variant that reuses past statistics without compensation, and it reaches performance comparable to SyncBN.
For instance, on ImageNet classification CBN achieves higher top-1 accuracy at small batch sizes than standard BN and Batch Renormalization (BRN), while remaining competitive with GN. In object detection on COCO, CBN attains average precision comparable to GN and SyncBN across several architectural settings, underlining its effectiveness.
Theoretical and Practical Implications
The proposed Cross-Iteration Batch Normalization opens a new avenue for exploiting statistics from past training iterations in normalization, presenting a feasible solution for networks operating under memory constraints. The approximation of statistics through Taylor polynomials could inspire further work on managing weight dynamics across iterations.
From a practical perspective, CBN incurs minimal computational overhead, making it a viable option for integration into existing deep learning pipelines without significant performance trade-offs. As deep learning models grow in complexity and memory demands increase, techniques like CBN will become crucial for training with limited resources while preserving accuracy.
Future Directions
The promising initial results of CBN suggest multiple directions for future research. Integration with other normalization techniques and further optimization of the compensation mechanism could yield additional improvements. Moreover, exploring CBN within other domains such as reinforcement learning and unsupervised learning could test the robustness and versatility of this approach beyond traditional supervised tasks.
In summary, the introduction of Cross-Iteration Batch Normalization provides an effective method to enhance batch normalization in scenarios with limited batch sizes, combining theoretical elegance with practical usability in neural network training.