MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis (2104.04767v2)

Published 10 Apr 2021 in cs.CV and eess.IV

Abstract: In recent years, the use of Generative Adversarial Networks (GANs) has become very popular in generative image modeling. While style-based GAN architectures yield state-of-the-art results in high-fidelity image synthesis, computationally, they are highly complex. In our work, we focus on the performance optimization of style-based generative models. We analyze the most computationally hard parts of StyleGAN2, and propose changes in the generator network to make it possible to deploy style-based generative networks on edge devices. We introduce the MobileStyleGAN architecture, which has 3.5x fewer parameters and is 9.5x less computationally complex than StyleGAN2, while providing comparable quality.

MobileStyleGAN: Redefining Efficiency in High-Fidelity Image Synthesis

In the field of generative image modeling, Generative Adversarial Networks (GANs) have demonstrated exceptional capabilities, particularly in high-fidelity image synthesis. StyleGAN2 in particular has set a high benchmark for the quality of generated images. Despite this success, the computational burden of such models limits their deployment on edge devices, which lack the resources these complex architectures require. To address this challenge, the paper introduces MobileStyleGAN, a lightweight convolutional neural network designed to deliver comparable image synthesis quality at significantly reduced computational cost.

The primary innovation of MobileStyleGAN lies in an architectural redesign aimed at computational efficiency without compromising image quality. Compared to StyleGAN2, MobileStyleGAN reduces the parameter count by a factor of 3.5 and the computational complexity by a factor of 9.5, while maintaining a comparable Fréchet Inception Distance (FID) score. These results indicate a successful balance between model size and performance, making high-resolution image synthesis feasible on edge devices.

Architectural Modifications

MobileStyleGAN retains the foundational framework of StyleGAN2 but incorporates several strategic modifications to optimize its structure:

  1. Wavelet-Based Representation: Unlike StyleGAN2, which operates directly on pixels, MobileStyleGAN uses a frequency-based approach, employing the discrete wavelet transform (DWT) for image representation. This gives the model direct access to multi-scale structure and lets it generate high-resolution images efficiently from lower-resolution feature maps (see the DWT/IDWT sketch after this list).
  2. Depthwise Separable Modulated Convolution: Inspired by the MobileNet architecture, the authors replace StyleGAN2's dense modulated convolutions with depthwise separable ones, decomposing a standard convolution into a depthwise step followed by a pointwise step and thereby saving substantial compute (see the convolution sketch after this list).
  3. Revised Upsampling Strategy: The upsampling operation is pivotal in generating high-resolution outputs. MobileStyleGAN replaces transposed convolutions with a mechanism based on the inverse discrete wavelet transform (IDWT), consistent with its frequency-domain processing and further improving efficiency.
  4. Demodulation Mechanism Optimization: By making demodulation a trainable parameter rather than a function of the input style, the authors enable inference-time optimizations such as operation fusion, further boosting model speed.
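
Items 1 and 3 both rest on the Haar wavelet transform. Below is a minimal single-level 2D Haar DWT/IDWT pair in PyTorch; it is an independent sketch of the standard transform, not the paper's implementation, and subband labeling conventions vary.

```python
import torch

def haar_dwt(x):
    """Single-level 2D Haar DWT: (B, C, H, W) -> (B, 4*C, H/2, W/2). Assumes even H, W."""
    # Polyphase components: even/odd rows and columns.
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-frequency approximation
    lh = (a + b - c - d) / 2  # detail subbands (labeling conventions vary)
    hl = (a - b + c - d) / 2
    hh = (a - b - c + d) / 2
    return torch.cat([ll, lh, hl, hh], dim=1)

def haar_idwt(y):
    """Inverse of haar_dwt: (B, 4*C, H/2, W/2) -> (B, C, H, W)."""
    ll, lh, hl, hh = torch.chunk(y, 4, dim=1)
    a = (ll + lh + hl + hh) / 2
    b = (ll + lh - hl - hh) / 2
    c = (ll - lh + hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    n, ch, h2, w2 = ll.shape
    x = torch.zeros(n, ch, h2 * 2, w2 * 2, device=y.device, dtype=y.dtype)
    x[..., 0::2, 0::2] = a
    x[..., 0::2, 1::2] = b
    x[..., 1::2, 0::2] = c
    x[..., 1::2, 1::2] = d
    return x
```

Because `haar_idwt(haar_dwt(x))` reconstructs `x` exactly (up to floating-point error), the generator can work in the wavelet domain without losing information.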
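
For items 2 and 4, the following is a hedged sketch of what a depthwise separable modulated convolution with a learned demodulation parameter could look like. The module structure, the names (`DWModulatedConv`, `to_style`), and the choice to modulate activations rather than weights are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class DWModulatedConv(nn.Module):
    """Illustrative depthwise-separable modulated convolution (sketch)."""

    def __init__(self, in_ch, out_ch, style_dim, kernel_size=3):
        super().__init__()
        # Depthwise step: one kernel per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=in_ch, bias=False)
        # Pointwise step: 1x1 convolution mixing channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        # Style vector -> per-channel modulation scales.
        self.to_style = nn.Linear(style_dim, in_ch)
        # Demodulation as a trained per-channel parameter rather than a
        # function of the style (enables fusion at inference time).
        self.demod = nn.Parameter(torch.ones(out_ch))

    def forward(self, x, w):
        s = self.to_style(w).unsqueeze(-1).unsqueeze(-1)  # (B, in_ch, 1, 1)
        x = x * s                                         # modulate
        x = self.pointwise(self.depthwise(x))             # separable conv
        return x * self.demod.view(1, -1, 1, 1)           # fixed demodulation

# Example usage:
# layer = DWModulatedConv(in_ch=64, out_ch=128, style_dim=512)
# out = layer(torch.randn(2, 64, 32, 32), torch.randn(2, 512))  # (2, 128, 32, 32)
```

Because the demodulation scale is constant at inference, it can be folded into the pointwise weights, which is the kind of operation fusion item 4 refers to.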

Training Approach

MobileStyleGAN is trained with knowledge distillation, transferring knowledge from a pretrained StyleGAN2 teacher. A particular highlight is the multi-scale training framework, which supervises the student at multiple intermediate resolutions and stabilizes image synthesis even in the reduced-size network. The loss function combines pixel-level, perceptual, and GAN-based components to balance reconstruction fidelity and perceptual quality in the output images.
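
As a concrete illustration of such a combined objective, the sketch below assembles pixel-level, perceptual, and non-saturating GAN terms for one distillation step. The helper names (`perceptual_net`, `discriminator`) and the loss weights are hypothetical; the paper's exact formulation and weighting may differ.

```python
import torch.nn.functional as F

def distillation_loss(student_img, teacher_img, perceptual_net, discriminator):
    # Pixel-level term: match the frozen teacher's output directly.
    l_pix = F.l1_loss(student_img, teacher_img)
    # Perceptual term: match deep features of a fixed network (e.g. VGG).
    l_percep = F.l1_loss(perceptual_net(student_img),
                         perceptual_net(teacher_img))
    # Adversarial term: non-saturating GAN loss on the student output.
    l_gan = F.softplus(-discriminator(student_img)).mean()
    return l_pix + l_percep + 0.1 * l_gan  # weights are illustrative
```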

Results and Implications

The experimental results show that MobileStyleGAN achieves an FID score of 7.75 on the FFHQ dataset, a competitive result given the computational savings. Inference time on CPU improves markedly over StyleGAN2, demonstrating the practical viability of deploying MobileStyleGAN in environments with limited computational power, such as handheld and IoT devices.

In terms of broader implications, the development of MobileStyleGAN represents a crucial step towards democratizing the deployment of high-fidelity generative models beyond high-end computing infrastructures. This paper provides a template for achieving efficiency without compromising on the high standards set by state-of-the-art GAN architectures.

Future Prospects

Further optimizations such as model quantization and pruning could extend the practical benefits of MobileStyleGAN, reducing computational requirements further and broadening deployment across a wider range of devices. As research progresses, such architectures could significantly impact real-time applications in augmented reality, on-device personalization, and generative art. Continued advances in model compression will likely make models like MobileStyleGAN even more compact and efficient, opening new avenues for on-device AI deployments.
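
As one concrete direction, PyTorch's post-training dynamic quantization can convert a model's linear layers (for example, the style mapping network) to int8 in a single call. The model below is a hypothetical stand-in, not the paper's network; convolutions would need static quantization with calibration, which this sketch omits.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a trained generator's mapping network.
model = nn.Sequential(nn.Linear(512, 512), nn.LeakyReLU(0.2),
                      nn.Linear(512, 512))

# Convert Linear layers to int8 weights; returns a quantized copy.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
```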

Author: Sergei Belousov
Citations: 20