Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search
The paper "Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search" introduces an innovative approach to neural architecture search (NAS) for Generative Adversarial Networks (GANs) through the use of off-policy reinforcement learning (RL). The novel methodology addresses inefficiencies and optimization challenges associated with conventional GAN architecture search processes, proposing a reformulation of the problem as a Markov Decision Process (MDP) that facilitates smoother architecture sampling and more effective RL-based search algorithms.
The authors start by observing the significant resource demands and expertise required to manually design high-performance GAN architectures, drawing attention to state-of-the-art GAN models that necessitate complex network designs. Recognizing the potential to mitigate these challenges through automation, the authors turn to NAS, which has been effective in discriminative models and is beginning to find applications in GANs. Prior efforts in RL-based GAN architecture search, such as AGAN and AutoGAN, have encountered limitations including high variance, noise in gradient updates, and inefficiencies tied to on-policy learning approaches.
In response, the authors propose a reformulation that leverages off-policy methods, which have shown strong sample efficiency across a range of RL tasks. The core of their approach is to express GAN architecture search as an MDP, decomposing the search into a sequence of decisions, each a cell-design step that contributes incrementally to the overall architecture. This decomposition allows past experience to be reused in policy updates, a key advantage of off-policy learning.
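To make the decomposition concrete, here is a minimal sketch of how such an MDP might be expressed in Python, with the state as the partial architecture built so far and each action selecting the design of the next cell. All names here (the OPERATIONS list, ArchSearchMDP, evaluate_partial_generator) are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative operation choices for one generator cell. The paper keeps
# its action space close to prior NAS-GAN benchmarks for comparability,
# but this exact list is an assumption.
OPERATIONS = ["conv_1x1", "conv_3x3", "deconv_3x3", "nearest_upsample"]

def evaluate_partial_generator(cells: List[int]) -> float:
    """Placeholder scorer: in the real search this would briefly train
    the partial generator and evaluate it (see the reward sketch below)."""
    return 0.0

@dataclass
class ArchSearchMDP:
    """Hypothetical MDP view of GAN architecture search: the state is the
    partial architecture built so far; each action picks the next cell."""
    num_cells: int = 3
    cells: List[int] = field(default_factory=list)  # indices into OPERATIONS

    def reset(self) -> List[int]:
        self.cells = []
        return list(self.cells)  # initial state: empty architecture

    def step(self, action: int) -> Tuple[List[int], float, bool]:
        self.cells.append(action)                  # one cell-design decision
        done = len(self.cells) == self.num_cells   # episode ends at full depth
        reward = evaluate_partial_generator(self.cells)
        return list(self.cells), reward, done
```

Because each transition is stored as an ordinary (state, action, reward, next state) tuple, trajectories collected under older policies can populate a replay buffer and be reused, which is precisely what on-policy searches such as AutoGAN cannot do.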
The implementation uses Soft Actor-Critic (SAC), an off-policy RL algorithm known for its sample efficiency. The paper emphasizes several critical design choices, such as a progressive state representation inspired by human-designed Progressive GANs, which reduces variance and stabilizes training. In addition, the reward function combines Inception Score (IS) and Fréchet Inception Distance (FID) to provide a robust performance signal across varying architecture trajectories. The action space is defined to cover operations relevant to generator cell design and is kept similar to previous benchmarks to ensure comparability.
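As a concrete illustration of how the two metrics could be folded into a single scalar reward, the snippet below mixes IS and FID with a tunable weight. The linear form and the weight lam are assumptions made here for exposition; the paper's exact reward shaping may differ in detail:

```python
def architecture_reward(inception_score: float, fid: float,
                        lam: float = 0.01) -> float:
    """Hypothetical scalar reward mixing the two metrics: higher IS is
    better, lower FID is better, so FID enters with a negative weight.
    Both the linear form and the weight `lam` are illustrative."""
    return inception_score - lam * fid

# Example: an architecture scoring IS = 8.5 and FID = 12.0
r = architecture_reward(8.5, 12.0)  # 8.5 - 0.01 * 12.0 = 8.38
```

Combining the two metrics hedges against the known weaknesses of each: IS alone can reward low-diversity samples, while FID alone is noisy when estimated from few generated images.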
Empirical results underscore the efficiency and effectiveness of the proposed E2GAN framework. The method discovers high-performing architectures on the standard CIFAR-10 and STL-10 datasets in just 7 GPU hours, a striking reduction compared with other RL-based approaches that can require up to 1200 GPU days. The competitive performance of the discovered architectures, reflected in both IS and FID, shows that the technique can match or exceed past GAN models designed by hand or found by other NAS methods.
The implications of these findings are both practical and theoretical. Practically, the paper delivers a significant reduction in the search time and computational resources needed for GAN architecture design, broadening access to powerful GAN applications. Theoretically, it contributes a formalized treatment of architecture search as an MDP, opening avenues for multi-agent extensions that optimize generator and discriminator networks simultaneously. The paper presents an important step forward in automated GAN model design, with future work expected to address its remaining intricacies and to improve scalability and generalization.