Alias-Free Generative Adversarial Networks (2106.12423v4)

Published 23 Jun 2021 in cs.CV, cs.AI, cs.LG, cs.NE, and stat.ML

Abstract: We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.

Citations (1,434)

Summary

The paper introduces StyleGAN3, a generator architecture that eliminates aliasing by reformulating the synthesis pipeline around carefully designed low-pass filtering and Fourier-feature inputs.
The paper demonstrates improved translation and rotation equivariance, with StyleGAN3-R reaching an EQ-T of 66.65 dB and an EQ-R of 40.48 dB, largely thanks to redesigned up- and downsampling.
FID remains competitive with StyleGAN2 while geometric consistency improves, which should benefit applications such as video and animation that require coherent transformations.

Alias-Free Generative Adversarial Networks

The paper "Alias-Free Generative Adversarial Networks" by Tero Karras et al. introduces a revised GAN architecture designed to mitigate the aliasing that affects traditional generative models. It primarily targets artifacts that arise because the synthesis process of standard GANs such as StyleGAN2 depends on absolute pixel coordinates, for example fine detail that appears glued to the image grid instead of to the surfaces of depicted objects. The authors propose architectural modifications that yield a new model, StyleGAN3, which suppresses aliasing and achieves a high degree of translation and rotation equivariance.

Technical Contributions

The primary technical contribution of this paper is a reformulation of the generator pipeline that eliminates aliasing. It rests on a careful signal processing framework: every signal in the network is interpreted as a continuous signal represented by discrete samples, and up- and downsampling use high-quality low-pass filters, namely Kaiser-windowed sinc filters. The key components of the proposed architecture are:

Fourier Features and Signal Boundaries: The authors replace the learned constant input of StyleGAN2 with Fourier features, which have a naturally infinite spatial extent. In addition, the signal is computed on a canvas extended by a fixed-size margin around the target region, which keeps border padding from leaking absolute image coordinates into the synthesis.
Filtered Nonlinearities: Pointwise nonlinearities such as (leaky) ReLU introduce arbitrarily high frequencies. The paper therefore applies them to a temporarily upsampled signal: the feature map is upsampled (by 2x), the nonlinearity is applied, and the result is low-pass filtered and downsampled back, so that only frequencies representable at the original sampling rate survive.
Non-Critical Sampling: Instead of critical sampling, where the filter cutoff equals half the sampling rate ($f_c = s/2$), the cutoff is lowered to $f_c = s/2 - f_h$, where $f_h$ is the half-width of the filter's transition band. The stopband then begins at $s/2$, so all alias frequencies (above $s/2$) fall within it.
Radially Symmetric Filters: To obtain rotation equivariance in StyleGAN3-R, radially symmetric low-pass filtering is introduced via a jinc-based filter windowed with a Kaiser window. This configuration also replaces 3x3 convolutions with 1x1 convolutions, whose kernels are trivially rotation-equivariant.
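The filtering steps above can be illustrated with a one-dimensional sketch. It is not the paper's implementation, which works on 2D feature maps; the function names, the Kaiser parameter `beta`, the tap count, the half-width `f_h = 0.1`, and the zero-padded borders are all assumptions chosen for illustration, and the sketch does not tune `beta` or the tap count to hit an exact transition band. It designs a Kaiser-windowed sinc low-pass filter at the 2x upsampled rate with the non-critical cutoff $f_c = s/2 - f_h$ (here $f_h$ is taken as half the transition-band width), then uses it for a filtered leaky ReLU: upsample, apply the nonlinearity, filter, and keep every second sample. The 2D radially symmetric jinc filter of StyleGAN3-R is not covered.

```python
import math


def bessel_i0(x, terms=30):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_sinc_lowpass(numtaps, cutoff, sampling_rate, beta=8.0):
    """Kaiser-windowed sinc low-pass taps with unit DC gain (numtaps odd, >= 3)."""
    center = (numtaps - 1) / 2.0
    fc = cutoff / sampling_rate  # cutoff in cycles per sample
    taps = []
    for n in range(numtaps):
        t = n - center
        # Ideal low-pass impulse response sin(2*pi*fc*t) / (pi*t), with its limit at t = 0.
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        ratio = t / center
        window = bessel_i0(beta * math.sqrt(max(0.0, 1.0 - ratio * ratio))) / bessel_i0(beta)
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def convolve_same(signal, taps):
    """Zero-padded convolution that keeps the input length (odd-length taps assumed)."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, h in enumerate(taps):
            k = i + half - j
            if 0 <= k < len(signal):
                acc += h * signal[k]
        out.append(acc)
    return out


def filtered_leaky_relu(signal, taps, slope=0.2):
    """Upsample 2x, apply leaky ReLU, low-pass filter, and keep every second sample.

    `taps` must be designed at the upsampled rate (2x the input rate).
    """
    upsampled = [0.0] * (2 * len(signal))
    for i, v in enumerate(signal):
        upsampled[2 * i] = 2.0 * v  # factor 2 compensates for the inserted zeros
    upsampled = convolve_same(upsampled, taps)  # interpolation: removes spectral images
    activated = [v if v >= 0.0 else slope * v for v in upsampled]
    filtered = convolve_same(activated, taps)  # removes frequencies the nonlinearity created
    return filtered[::2]


# Input sampling rate s = 1 (cycles per sample); hypothetical half-width f_h = 0.1.
s, f_h = 1.0, 0.1
taps = kaiser_sinc_lowpass(numtaps=25, cutoff=s / 2 - f_h, sampling_rate=2 * s)
out_pos = filtered_leaky_relu([0.5] * 40, taps)
out_neg = filtered_leaky_relu([-0.5] * 40, taps)
print([round(v, 3) for v in out_pos[12:16]])
```

With a constant positive input the leaky ReLU acts as the identity, so interior samples stay close to 0.5; a constant negative input is scaled by the slope to roughly -0.1. Samples near the ends deviate because of the zero padding, which is the kind of border effect the margin described above is meant to keep away from the target canvas.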

Strong Numerical Results and Metrics

The paper provides a detailed empirical evaluation of the proposed modifications and shows that StyleGAN3 is substantially more equivariant to translation and rotation than StyleGAN2. For instance:

Translation Equivariance (EQ-T): StyleGAN3-R reaches 66.65 dB, with StyleGAN3-T slightly below it; for the StyleGAN2 baseline the metric is undefined, since that configuration has no Fourier-feature input that could be translated.
Rotation Equivariance (EQ-R): StyleGAN3-R achieves 40.48 dB, a substantial improvement in handling rotations; as with EQ-T, the metric is not defined for the StyleGAN2 baseline.
FID Scores: Although the modifications mainly target equivariance, the FID of StyleGAN3-T (4.62) and StyleGAN3-R (4.50) remains on par with, and here slightly better than, that of StyleGAN2 (5.14; lower is better), indicating no degradation in image quality as measured by FID.
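EQ-T is reported in dB as a peak signal-to-noise ratio, $10 \log_{10}(I_{\max}^2 / \mathrm{MSE})$, between two images: the generator output for a translated input and the translated output for the original input. The toy below illustrates that definition in one dimension as a sketch under stated assumptions, not the paper's evaluation code: `toy_generator` is a hypothetical stand-in for a generator, built from a few Fourier features evaluated at the continuous coordinate; the shift is an integer number of pixels; and values are assumed to lie in [-1, 1], so $I_{\max} = 2$. Adding a small pattern that depends only on the absolute pixel index, a crude model of detail glued to image coordinates, turns an exactly equivariant (infinite) score into a finite one.

```python
import math


def psnr_db(a, b, peak_to_peak=2.0):
    """PSNR in dB; peak_to_peak=2.0 assumes values in [-1, 1]."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0.0 else 10.0 * math.log10(peak_to_peak ** 2 / mse)


def toy_generator(n, shift=0, glued=0.0, freqs=(0.05, 0.13), phases=(0.3, 1.1)):
    """Hypothetical 1D 'generator' with n output pixels.

    Sums Fourier features evaluated at the continuous coordinate x - shift, so
    translating the input by `shift` translates the output. `glued` adds a
    pattern tied to the absolute pixel index, which breaks that property.
    """
    out = []
    for i in range(n):
        x = float(i) - shift
        value = sum(0.4 * math.cos(2.0 * math.pi * f * x + p) for f, p in zip(freqs, phases))
        value += glued * (-1) ** i  # depends only on the pixel index i
        out.append(value)
    return out


def eq_t_db(n=64, k=3, glued=0.0):
    """PSNR between g(translated input) and translated g(input), for an integer shift k."""
    from_translated_input = toy_generator(n, shift=k, glued=glued)
    base = toy_generator(n, glued=glued)
    translated_output = [base[i - k] for i in range(k, n)]  # shift right by k pixels
    # Compare only the pixels where both images are defined.
    return psnr_db(from_translated_input[k:], translated_output)


print(eq_t_db(glued=0.0))
print(eq_t_db(glued=0.1))
```

Without the glued pattern the two images coincide exactly and the PSNR is infinite. With a glued amplitude of 0.1 and an odd shift, every compared sample differs by 0.2, so the MSE is 0.04 and the score is $10 \log_{10}(4 / 0.04) = 20$ dB.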

Implications and Future Directions

The implications of this work are both theoretical and practical, particularly for tasks that require generated images to remain coherent under small transformations. Because detail now moves with depicted surfaces instead of sticking to pixel coordinates, applications in video and animation generation, where consistency across frames is crucial, stand to benefit considerably. In addition, the radially symmetric filters and the more principled treatment of signal processing inside neural networks may spur further research into equivariance to scaling, anisotropic scaling, or even arbitrary deformations in future GAN architectures.

Because the alias-free layers add some computational cost (described as minor, with StyleGAN3 models only marginally heavier than their StyleGAN2 counterparts), future work might explore further optimizations or more efficient implementations. Other possible improvements include reintroducing noise inputs, which this design removes, in a controlled way, and developing regularization methods that support hierarchical synthesis.

Conclusion

The presented work marks a significant step toward more robust and geometrically consistent generative models. Replacing coordinate-dependent synthesis with an alias-free architecture lets generated structures transform coherently under translation and rotation, even at subpixel scales. These insights are likely to inform future advances in generative modeling.