Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search (2007.09180v1)

Published 17 Jul 2020 in cs.CV

Abstract: In this paper, we introduce a new reinforcement learning (RL) based neural architecture search (NAS) methodology for effective and efficient generative adversarial network (GAN) architecture search. The key idea is to formulate the GAN architecture search problem as a Markov decision process (MDP) for smoother architecture sampling, which enables a more effective RL-based search algorithm by targeting the potential global optimal architecture. To improve efficiency, we exploit an off-policy GAN architecture search algorithm that makes efficient use of the samples generated by previous policies. Evaluation on two standard benchmark datasets (i.e., CIFAR-10 and STL-10) demonstrates that the proposed method is able to discover highly competitive architectures for generally better image generation results with a considerably reduced computational burden: 7 GPU hours. Our code is available at https://github.com/Yuantian013/E2GAN.

Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search

The paper "Off-Policy Reinforcement Learning for Efficient and Effective GAN Architecture Search" introduces an off-policy reinforcement learning (RL) approach to neural architecture search (NAS) for Generative Adversarial Networks (GANs). The methodology addresses inefficiencies and optimization challenges in conventional GAN architecture search by reformulating the problem as a Markov Decision Process (MDP), which facilitates smoother architecture sampling and a more effective RL-based search algorithm.

The authors start by observing the significant resource demands and expertise required to manually design high-performance GAN architectures, drawing attention to state-of-the-art GAN models that necessitate complex network designs. Recognizing the potential to mitigate these challenges through automation, the authors turn to NAS, which has been effective in discriminative models and is beginning to find applications in GANs. Prior efforts in RL-based GAN architecture search, such as AGAN and AutoGAN, have encountered limitations including high variance, noise in gradient updates, and inefficiencies tied to on-policy learning approaches.

In response, the authors propose a comprehensive reformulation leveraging off-policy methods, which have shown promise in enhancing sample efficiency across various RL tasks. The core of their approach is the expression of the GAN architecture search as an MDP, effectively decomposing the architecture search into a sequence of decisions. Each decision constitutes a cell design step contributing incrementally to the overall architecture, thus facilitating the use of past experience in policy updates—a key advantage of off-policy learning.
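The MDP decomposition described above can be sketched as a tiny environment in which each step appends one cell-design decision and a terminal reward scores the completed architecture. This is an illustrative sketch, not the authors' code: the operation names, cell count, and the placeholder evaluation function are all assumptions for demonstration (the real method trains each candidate GAN and scores it with IS/FID).

```python
# Illustrative sketch (not the authors' implementation): GAN architecture
# search framed as an MDP, where each step appends one cell-design decision.
from dataclasses import dataclass, field

# Hypothetical operation choices for a generator cell; the paper's actual
# action space follows prior GAN-NAS benchmarks.
OPERATIONS = ["conv_1x1", "conv_3x3", "deconv_3x3", "nearest_upsample", "skip"]

@dataclass
class ArchSearchMDP:
    """State: the sequence of decisions so far. Action: next operation.
    The episode ends once `num_cells` decisions have been made."""
    num_cells: int = 3
    decisions: list = field(default_factory=list)

    def reset(self):
        self.decisions = []
        return tuple(self.decisions)  # initial (empty) state

    def step(self, action: int):
        assert 0 <= action < len(OPERATIONS)
        self.decisions.append(OPERATIONS[action])
        done = len(self.decisions) == self.num_cells
        # Reward is only meaningful for a complete architecture;
        # intermediate steps receive 0.
        reward = self._evaluate() if done else 0.0
        return tuple(self.decisions), reward, done

    def _evaluate(self) -> float:
        # Placeholder score: the real method would train the candidate GAN
        # and compute IS/FID. Here, a dummy diversity count.
        return float(len(set(self.decisions)))

env = ArchSearchMDP()
state = env.reset()
done = False
while not done:
    state, reward, done = env.step(0)  # e.g. always pick operation 0
```

Because each (state, action, reward) transition is stored explicitly, trajectories generated by earlier policies can be replayed for later policy updates, which is exactly the sample reuse that off-policy learning exploits.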

The implementation utilizes Soft Actor-Critic (SAC), an off-policy RL algorithm known for its sample efficiency. The paper emphasizes several critical design choices, such as a progressive state representation inspired by human-designed Progressive GANs, which helps reduce variance and supports stable training. In addition, the reward function combines Inception Score (IS) and Fréchet Inception Distance (FID) to provide a robust performance metric across varying architecture trajectories. The action space is defined to encompass operations relevant to generator cell design, maintaining similarity with previous benchmarks to ensure comparability.
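Since IS rewards quality and diversity (higher is better) while FID penalizes distance from the real data distribution (lower is better), a combined reward must give the two opposite signs. The following one-liner is a hedged sketch of that idea only; the paper's exact weighting and normalization may differ, and the trade-off coefficient `beta` is an assumption.

```python
# Illustrative sketch: combine IS (maximize) and FID (minimize) into a
# single scalar reward. `beta` is an assumed trade-off weight, not a
# value taken from the paper.
def combined_reward(inception_score: float, fid: float, beta: float = 1.0) -> float:
    # FID enters with a negative sign because lower FID means
    # generated samples are closer to the real data distribution.
    return inception_score - beta * fid
```

For example, `combined_reward(8.0, 12.0, beta=0.5)` yields 2.0: a high IS is offset by the FID penalty, so an agent maximizing this reward is pushed to improve both metrics at once.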

Empirical results underscore the efficiency and effectiveness of the proposed E2GAN framework. The method discovers high-performing architectures on the standard CIFAR-10 and STL-10 datasets in just 7 GPU hours, remarkably efficient compared to other RL-based approaches that can require up to 1200 GPU days. The competitive performance of the discovered architectures, reflected in both IS and FID metrics, demonstrates the technique's ability to match or exceed past GAN models crafted through manual effort or other NAS methods.

The implications of these findings are both practical and theoretical. Practically, the paper delivers a significant reduction in search time and computational resources for GAN architecture design, broadening access to powerful GAN applications. Theoretically, it contributes a formalized treatment of architecture search as an MDP, opening avenues for further exploration into multi-agent scenarios that simultaneously optimize generator and discriminator networks. This paper presents an important step forward in automated GAN model design; future work is expected to address its remaining intricacies and to enhance scalability and generalization.

Authors (8)
  1. Yuan Tian (183 papers)
  2. Qin Wang (143 papers)
  3. Zhiwu Huang (41 papers)
  4. Wen Li (107 papers)
  5. Dengxin Dai (99 papers)
  6. Minghao Yang (12 papers)
  7. Jun Wang (991 papers)
  8. Olga Fink (104 papers)
Citations (59)