Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search
The paper "Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search" introduces an innovative approach to neural architecture search (NAS) for Generative Adversarial Networks (GANs) through the use of off-policy reinforcement learning (RL). The novel methodology addresses inefficiencies and optimization challenges associated with conventional GAN architecture search processes, proposing a reformulation of the problem as a Markov Decision Process (MDP) that facilitates smoother architecture sampling and more effective RL-based search algorithms.
The authors start by observing the significant resource demands and expertise required to manually design high-performance GAN architectures, drawing attention to state-of-the-art GAN models that necessitate complex network designs. Recognizing the potential to mitigate these challenges through automation, the authors turn to NAS, which has been effective in discriminative models and is beginning to find applications in GANs. Prior efforts in RL-based GAN architecture search, such as AGAN and AutoGAN, have encountered limitations including high variance, noise in gradient updates, and inefficiencies tied to on-policy learning approaches.
In response, the authors propose a reformulation that leverages off-policy methods, which have shown strong sample efficiency across a range of RL tasks. The core of their approach is to express GAN architecture search as an MDP, decomposing the search into a sequence of decisions, each a cell-design step that contributes incrementally to the overall architecture. This decomposition allows past experience to be reused in policy updates, a key advantage of off-policy learning.
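To make the decomposition concrete, here is a minimal sketch of how such an MDP might be expressed in Python, with the state as the partial architecture built so far and each action selecting the design of the next cell. All names here (the OPERATIONS list, ArchSearchMDP, evaluate_partial_generator) are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Illustrative operation choices for one generator cell. The paper keeps
# its action space close to prior NAS-GAN benchmarks for comparability,
# but this exact list is an assumption.
OPERATIONS = ["conv_1x1", "conv_3x3", "deconv_3x3", "nearest_upsample"]

def evaluate_partial_generator(cells: List[int]) -> float:
    """Placeholder scorer: in the real search this would briefly train
    the partial generator and evaluate it (see the reward sketch below)."""
    return 0.0

@dataclass
class ArchSearchMDP:
    """Hypothetical MDP view of GAN architecture search: the state is the
    partial architecture built so far; each action picks the next cell."""
    num_cells: int = 3
    cells: List[int] = field(default_factory=list)  # indices into OPERATIONS

    def reset(self) -> List[int]:
        self.cells = []
        return list(self.cells)  # initial state: empty architecture

    def step(self, action: int) -> Tuple[List[int], float, bool]:
        self.cells.append(action)                  # one cell-design decision
        done = len(self.cells) == self.num_cells   # episode ends at full depth
        reward = evaluate_partial_generator(self.cells)
        return list(self.cells), reward, done
```

Because each transition is stored as an ordinary (state, action, reward, next state) tuple, trajectories collected under older policies can populate a replay buffer and be reused, which is precisely what on-policy searches such as AutoGAN cannot do.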
The implementation uses Soft Actor-Critic (SAC), an off-policy RL algorithm known for its sample efficiency. The paper emphasizes several critical design choices, such as a progressive state representation inspired by human-designed Progressive GANs, which reduces variance and stabilizes training. In addition, the reward function combines Inception Score (IS) and Fréchet Inception Distance (FID) to provide a robust performance signal across varying architecture trajectories. The action space is defined to cover operations relevant to generator cell design and is kept similar to previous benchmarks to ensure comparability.
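As a concrete illustration of how the two metrics could be folded into a single scalar reward, the snippet below mixes IS and FID with a tunable weight. The linear form and the weight lam are assumptions made here for exposition; the paper's exact reward shaping may differ in detail:

```python
def architecture_reward(inception_score: float, fid: float,
                        lam: float = 0.01) -> float:
    """Hypothetical scalar reward mixing the two metrics: higher IS is
    better, lower FID is better, so FID enters with a negative weight.
    Both the linear form and the weight `lam` are illustrative."""
    return inception_score - lam * fid

# Example: an architecture scoring IS = 8.5 and FID = 12.0
r = architecture_reward(8.5, 12.0)  # 8.5 - 0.01 * 12.0 = 8.38
```

Combining the two metrics hedges against the known weaknesses of each: IS alone can reward low-diversity samples, while FID alone is noisy when estimated from few generated images.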
Empirical results underscore the efficiency and effectiveness of the proposed E2GAN framework. The method discovers high-performing architectures on the standard CIFAR-10 and STL-10 datasets in just 7 GPU hours, a striking reduction compared with other RL-based approaches that can require up to 1200 GPU days. The competitive performance of the discovered architectures, reflected in both IS and FID, shows that the technique can match or exceed past GAN models designed by hand or found by other NAS methods.
The implications of these findings are both practical and theoretical. Practically, the paper delivers a significant reduction in the search time and computational resources needed for GAN architecture design, broadening access to powerful GAN applications. Theoretically, it contributes a formalized treatment of architecture search as an MDP, opening avenues for multi-agent extensions that optimize generator and discriminator networks simultaneously. The paper presents an important step forward in automated GAN model design, with future work expected to address its remaining intricacies and to improve scalability and generalization.