Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning (2101.08482v2)

Published 21 Jan 2021 in cs.LG, cs.AI, and cs.CV

Abstract: We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques. Unlike the standard BN, where the statistics are computed within each batch, EMAN, used in the teacher, updates its statistics by exponential moving average from the BN statistics of the student. This design reduces the intrinsic cross-sample dependency of BN and enhances the generalization of the teacher. EMAN improves strong baselines for self-supervised learning by 4-6/1-2 points and semi-supervised learning by about 7/2 points, when 1%/10% supervised labels are available on ImageNet. These improvements are consistent across methods, network architectures, training duration, and datasets, demonstrating the general effectiveness of this technique. The code is available at https://github.com/amazon-research/exponential-moving-average-normalization.

Citations (110)

Summary

  • The paper introduces a plug-in replacement for Batch Normalization by using EMAN in teacher networks to mitigate cross-sample dependency and parameter mismatch.
  • It demonstrates gains of 4-6/1-2 points in self-supervised learning and about 7/2 points in semi-supervised learning when 1%/10% of ImageNet labels are available.
  • EMAN adds minimal complexity, providing a simple yet robust alternative to methods requiring intricate cross-GPU communication.

Analyzing Exponential Moving Average Normalization for Self-supervised and Semi-supervised Learning

The paper introduces a normalization technique called Exponential Moving Average Normalization (EMAN) as an enhancement over traditional Batch Normalization (BN) for self-supervised and semi-supervised learning. The technique targets the student-teacher models that are prevalent in these settings: the core proposal is to substitute the BN layers in the teacher network with EMAN, addressing specific drawbacks of BN in such architectures.

Key Contributions

  1. Plug-in Replacement for Batch Normalization: The paper proposes using EMAN in place of BN in the teacher network. EMAN mitigates the intrinsic issues such as cross-sample dependency and parameter-statistics mismatch encountered in standard BN usage within student-teacher architectures.
  2. Performance Improvements: Empirically, EMAN improves self-supervised baselines by 4-6 points with 1% labels (1-2 points with 10% labels) and semi-supervised baselines by about 7 points with 1% labels (about 2 points with 10% labels) on ImageNet. These results hold consistently across architectures, datasets, and training durations.
  3. Minimal Complexity Addition: EMAN is straightforward to implement, requiring only minor changes to the usual EMA teacher update (a minimal sketch follows this list). This simplicity contrasts with normalization workarounds such as ShuffleBN or SyncBN, which require cross-GPU communication.
  4. Evaluation and Generalization: The evaluation spans state-of-the-art frameworks such as MoCo, BYOL (Bootstrap Your Own Latent), and FixMatch, where EMAN consistently improves on the baselines, attesting to its general applicability.
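
To make the "minimal code change" claim concrete, here is a minimal PyTorch-style sketch of the idea, not the authors' released implementation: the function name eman_update, the momentum value 0.999, and the assumption that student and teacher share the same architecture are all illustrative.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def eman_update(student: nn.Module, teacher: nn.Module, momentum: float = 0.999) -> None:
    """EMA update of the teacher from the student, covering BN buffers too.

    Unlike a parameter-only EMA teacher, the BN running_mean / running_var
    buffers are also moving-averaged from the student, so the teacher never
    relies on statistics computed from its own current batch.
    """
    # Learnable parameters: conv/linear weights, biases, BN affine params.
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

    # Buffers: for BN layers these include running_mean and running_var.
    for b_s, b_t in zip(student.buffers(), teacher.buffers()):
        if b_t.dtype.is_floating_point:
            b_t.mul_(momentum).add_(b_s, alpha=1.0 - momentum)
        else:
            # Integer buffers such as num_batches_tracked are copied directly.
            b_t.copy_(b_s)
```

In such a setup the teacher's BN layers would be run in evaluation mode (e.g. teacher.eval()), so that normalization uses the moving-average statistics rather than per-batch ones.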

Technical Insights

The novel aspect of EMAN lies in keeping the teacher's normalization statistics aligned with its slowly evolving parameters: both are updated by exponential moving average from the student. Unlike standard BN, where statistics are recomputed within each batch, the teacher in EMAN normalizes with these moving-average statistics, which reduces the cross-sample dependency of BN and avoids the mismatch between the teacher's averaged parameters and batch statistics computed on the fly.
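
In symbols, writing m for the EMA momentum and primes for teacher quantities (this notation is ours, following the abstract's description rather than the paper's exact symbols), the teacher's weights and BN statistics are all updated in the same way:

```latex
\theta' \leftarrow m\,\theta' + (1 - m)\,\theta, \qquad
\mu' \leftarrow m\,\mu' + (1 - m)\,\mu, \qquad
\sigma'^{2} \leftarrow m\,\sigma'^{2} + (1 - m)\,\sigma^{2}
```

where \theta are the student's parameters and \mu, \sigma^{2} its BN statistics; the teacher then normalizes with \mu', \sigma'^{2} instead of statistics computed from its own batch.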

Implications and Future Directions

The introduction of EMAN could potentially reform how normalization is perceived in robust learning paradigms like self-supervised and semi-supervised learning. By effectively minimizing the known drawbacks of BN, EMAN provides a clearer learning signal during training, especially beneficial in scenarios with scarce labeled data.

From a theoretical standpoint, the paper's insights could prompt further research into stability and efficiency in self-supervised models, extending usage beyond ImageNet to more domains such as medical imaging, where annotation resources are limited. Future work could delve into adapting EMAN to other architectures or exploring its synergy with different parameter averaging techniques, like those in ensemble learning.

In conclusion, this research underscores the importance of novel normalization techniques in enhancing learning paradigms dependent on sparse supervision, paving the way for more effective deployment of AI in resource-constrained environments.