GAN Compression: Efficient Architectures for Interactive Conditional GANs (2003.08936v4)

Published 19 Mar 2020 in cs.CV and eess.IV

Abstract: Conditional Generative Adversarial Networks (cGANs) have enabled controllable image synthesis for many vision and graphics applications. However, recent cGANs are 1-2 orders of magnitude more compute-intensive than modern recognition CNNs. For example, GauGAN consumes 281G MACs per image, compared to 0.44G MACs for MobileNet-v3, making it difficult for interactive deployment. In this work, we propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs. Directly applying existing compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures. We address these challenges in two ways. First, to stabilize GAN training, we transfer knowledge of multiple intermediate representations of the original model to its compressed model and unify unpaired and paired learning. Second, instead of reusing existing CNN designs, our method finds efficient architectures via neural architecture search. To accelerate the search process, we decouple the model training and search via weight sharing. Experiments demonstrate the effectiveness of our method across different supervision settings, network architectures, and learning methods. Without losing image quality, we reduce the computation of CycleGAN by 21x, Pix2pix by 12x, MUNIT by 29x, and GauGAN by 9x, paving the way for interactive image synthesis.

Authors (6)

Muyang Li
Ji Lin
Yaoyao Ding
Zhijian Liu
Jun-Yan Zhu
Song Han

Summary

An Analysis of "GAN Compression: Efficient Architectures for Interactive Conditional GANs"

Conditional Generative Adversarial Networks (cGANs) enable controllable image synthesis for vision and graphics applications. However, state-of-the-art cGANs such as GauGAN and CycleGAN are computationally expensive (GauGAN requires 281G MACs per image, versus 0.44G MACs for MobileNet-v3), which hinders interactive deployment on resource-constrained devices. The paper "GAN Compression: Efficient Architectures for Interactive Conditional GANs" addresses this concern by introducing a general-purpose compression framework that reduces the inference time and model size of cGAN generators.

Research Contribution

This paper identifies two core challenges of compressing cGANs: the instability in GAN training and architectural discrepancies with conventional CNNs that preclude the direct application of existing compression techniques. To counteract these difficulties, the authors propose a two-pronged strategy:

Stabilized GAN Training: The compression framework leverages a novel knowledge distillation process where multiple intermediate representations from the original "teacher" model are aligned with those in the compressed "student" model. This mechanism unifies the learning processes in both paired and unpaired settings, fostering enhanced training stability.
Efficient Architecture Search: Rather than reusing existing CNN designs, the paper uses neural architecture search (NAS) to discover efficient generator architectures. A notable step in the process is the decoupling of model training from the architecture search through weight sharing: a single "once-for-all" network is trained once, and candidate sub-networks are then evaluated with its shared weights, which significantly accelerates the search.
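
The distillation idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each student feature map is passed through a learnable per-layer channel map (a 1x1 convolution, written here as a plain matrix) so that it can be compared with a wider teacher feature map, and that the mismatch is scored with mean squared error. The names `map_channels` and `distill_loss`, the list-of-lists feature layout, and the toy values are all hypothetical.

```python
def map_channels(student_feat, weight):
    """Apply a 1x1 convolution: mix channels independently at every pixel.

    student_feat: list of C_s channels, each a flat list of P pixel values.
    weight: C_t x C_s matrix (list of rows), assumed learnable in practice.
    Returns a list of C_t channels with P pixel values each.
    """
    num_pixels = len(student_feat[0])
    return [
        [sum(w * ch[p] for w, ch in zip(row, student_feat))
         for p in range(num_pixels)]
        for row in weight
    ]


def distill_loss(teacher_feats, student_feats, channel_maps):
    """Sum, over the chosen layers, of the mean squared error between the
    teacher features and the channel-mapped student features."""
    total = 0.0
    for t_feat, s_feat, weight in zip(teacher_feats, student_feats, channel_maps):
        mapped = map_channels(s_feat, weight)
        sq_err = [
            (t - m) ** 2
            for t_ch, m_ch in zip(t_feat, mapped)
            for t, m in zip(t_ch, m_ch)
        ]
        total += sum(sq_err) / len(sq_err)
    return total


# Toy example (assumed values): one layer, two pixels.
# The teacher has 2 channels, the student only 1.
teacher_feats = [[[1.0, 2.0], [3.0, 4.0]]]
student_feats = [[[1.0, 2.0]]]
channel_maps = [[[1.0], [3.0]]]  # 2 x 1 matrix lifting 1 channel to 2

loss = distill_loss(teacher_feats, student_feats, channel_maps)
print(loss)  # 1.0
```

In an actual training loop the channel maps would be trained jointly with the student generator, and a term like this would be added to the usual cGAN objective. How the terms are weighted is not covered in this summary.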

Experimental Outcomes

The proposed GAN Compression method demonstrates substantial computational savings without sacrificing image quality. The paper reports a reduction in computation of $21\times$ for CycleGAN, $12\times$ for Pix2pix, $29\times$ for MUNIT, and $9\times$ for GauGAN. The efficiency stems from replacing conventional convolutions with depthwise-separable convolutions and selectively pruning channels using NAS.
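
To see why these two changes reduce computation, it helps to count multiply-accumulate operations (MACs) for a single layer. The sketch below uses the standard MAC formulas for a regular convolution and a depthwise-separable one; the layer shape (a 64x64 output map, 256 input and output channels, 3x3 kernel) is a hypothetical example, not a layer taken from the paper.

```python
def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def separable_conv_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise-separable convolution: a k x k depthwise
    convolution (one filter per input channel) followed by a 1x1
    pointwise convolution that mixes channels."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer shape, chosen only for illustration.
H, W, C_IN, C_OUT, K = 64, 64, 256, 256, 3

standard = conv_macs(H, W, C_IN, C_OUT, K)
separable = separable_conv_macs(H, W, C_IN, C_OUT, K)
ratio = standard / separable

# Halving both channel counts (a simple stand-in for channel pruning)
# shrinks the pointwise term by roughly 4x.
pruned = separable_conv_macs(H, W, C_IN // 2, C_OUT // 2, K)

print(f"standard conv:       {standard:,} MACs")
print(f"separable conv:      {separable:,} MACs ({ratio:.2f}x fewer)")
print(f"separable, 1/2 width: {pruned:,} MACs")
```

For this layer the saving from the separable convolution works out to $1 / (1/C_{out} + 1/k^2)$, so it approaches $k^2$ as the channel count grows. These are per-layer figures for an assumed shape; the end-to-end reductions reported in the paper depend on the whole generator and on the channel widths the search selects.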

The compression results are backed by measurements on real hardware, which show notable speedups on devices such as the NVIDIA Jetson AGX Xavier, the Jetson Nano, and an Intel Xeon CPU. This suggests that running cGANs on edge devices is feasible, a significant step toward wider deployment of these models in interactive scenarios.

Theoretical and Practical Implications

Theoretically, the work underscores the potential of architecture search and intermediate representation distillation in the GAN domain, suggesting that these techniques could be extended to other generative modeling tasks. The paper reflects an ongoing paradigm shift in machine learning where model efficiency is becoming increasingly critical alongside accuracy, particularly for resource-sensitive applications.

Practically, this work enables the deployment of advanced cGAN models on consumer-grade hardware, opening new avenues for interactive applications such as image generation, VR art tools, and real-time video processing (for example, style transfer and object manipulation).

Future Directions

This research may inspire further investigation into more granular and automated approaches for architecture optimization in generative models. Given the practical constraints of edge computing, future advancements could explore hybrid models that strategically offload certain computations to cloud resources while maintaining hardware-efficient deployment for on-device interactions.

In conclusion, the paper addresses the computational cost of contemporary cGANs by combining distillation-based training stabilization with NAS-driven architecture search, with clear implications for interactive image synthesis applications.
