The Unusual Effectiveness of Averaging in GAN Training
This paper presents a theoretical and experimental analysis of two techniques for parameter averaging in Generative Adversarial Network (GAN) training: Moving Average (MA) and Exponential Moving Average (EMA). The main objective is to investigate the effectiveness of these averaging methods in improving GAN training stability and performance.
GANs are formulated as two-player zero-sum games that often suffer from training instability and non-convergence. Prior approaches to GAN optimization either seek more stable function families or substitute alternative objectives, yet non-convergence persists because the iterates cycle around optimal solutions. The paper addresses these cyclic behaviors with simple parameter-averaging strategies applied outside the adversarial training loop, so the game dynamics themselves are left unchanged.
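As a concrete illustration, the sketch below shows how the two averages can be maintained alongside the generator's parameters; the function names, the dict-of-arrays representation, and the decay value 0.999 are assumptions for illustration, not taken from the paper's implementation.

```python
# Minimal sketch of the two averaging schemes, assuming the generator's
# parameters live in a dict of NumPy arrays. Names and the decay 0.999 are
# illustrative, not from the paper's code.

def ema_update(avg_params, params, beta=0.999):
    """Exponential moving average: recent iterates are weighted most heavily."""
    for k in params:
        avg_params[k] = beta * avg_params[k] + (1.0 - beta) * params[k]

def ma_update(avg_params, params, t):
    """Uniform moving average over all iterates seen so far (t = update count, starting at 1)."""
    for k in params:
        avg_params[k] += (params[k] - avg_params[k]) / t
```

Either routine would be called after each generator update, with the averaged copy (rather than the raw iterate) used only to generate samples at evaluation time, leaving the adversarial updates untouched.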
Theoretical Contributions
The authors provide the first theoretical insights into the EMA technique by analyzing its effect on bilinear games. In simple bilinear settings, EMA does not converge to the equilibrium; instead, it stabilizes the cyclic behavior by shrinking its amplitude. In non-bilinear settings, the paper shows that EMA preserves the stability of locally stable fixed points. This perspective helps explain why EMA can improve training stability even though convergence to an exact equilibrium is not guaranteed.
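The toy simulation below illustrates this effect. It is a minimal sketch rather than the paper's experiment, using alternating gradient descent/ascent on the bilinear objective f(x, y) = x·y with illustrative step size and decay values: the raw iterates orbit the equilibrium at (0, 0), while their EMA settles into a much smaller neighborhood of it.

```python
import math

# Toy bilinear game f(x, y) = x * y with its unique equilibrium at (0, 0).
# Alternating gradient descent on x / ascent on y produces bounded cycles
# around the equilibrium; the EMA of the iterates stays far closer to it.
# Step size, decay, and iteration count are illustrative choices.
eta, beta, steps = 0.1, 0.99, 5000
x, y = 1.0, 1.0
x_ema, y_ema = x, y

for _ in range(steps):
    x = x - eta * y                      # descent step on x (grad_x f = y)
    y = y + eta * x                      # ascent step on y (grad_y f = x), using the fresh x
    x_ema = beta * x_ema + (1.0 - beta) * x
    y_ema = beta * y_ema + (1.0 - beta) * y

print("raw iterate distance to (0, 0):", math.hypot(x, y))          # stays near its initial scale (cycling)
print("EMA iterate distance to (0, 0):", math.hypot(x_ema, y_ema))  # typically much smaller (damped oscillation)
```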
Experimental Findings
The paper empirically evaluates both MA and EMA across a diverse set of GAN architectures and objectives on several datasets, including CIFAR-10, STL-10, CelebA, and ImageNet. The experiments show that both MA and EMA improve standard GAN metrics such as the Inception Score (IS) and the Fréchet Inception Distance (FID). EMA consistently provides larger and more reliable gains than MA, whose equal weighting of all past iterates can degrade performance when the iterates vary substantially over long training runs.
The paper also compares averaging against other stabilization techniques such as Consensus Optimization, Optimistic Adam, and Zero-centered Gradient Penalty, finding that simple averaging is unusually effective at alleviating the cycling and non-convergence problems commonly seen in GAN training, with EMA in particular improving results considerably across diverse experimental settings.
Implications and Future Directions
While the theoretical framework is limited to bilinear games and local stability analysis, the implications are valuable in practical settings where GAN applications demand robust training strategies for better visual quality and stability. The findings suggest that parameter averaging, particularly EMA, can serve as a simple yet powerful addition to existing GAN optimization protocols.
The paper's results warrant further exploration of different settings of the EMA decay factor to optimize its performance across various GAN models and tasks. This could involve more comprehensive hyperparameter studies that tune EMA for conditional GAN setups or larger-scale generative settings.
In sum, this research makes a significant contribution to understanding and improving GAN training, offering a strong case for including parameter averaging as an effective option in the GAN training toolkit. It also paves the way for future analyses of more complex games beyond the bilinear model, which would deepen our understanding of GAN dynamics and advance generative modeling more broadly.