SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer (2301.12811v4)

Published 30 Jan 2023 in cs.LG

Abstract: Generative adversarial networks (GANs) learn a target probability distribution by optimizing a generator and a discriminator with minimax objectives. This paper addresses the question of whether such optimization actually provides the generator with gradients that make its distribution close to the target distribution. We derive metrizable conditions, sufficient conditions for the discriminator to serve as the distance between the distributions by connecting the GAN formulation with the concept of sliced optimal transport. Furthermore, by leveraging these theoretical results, we propose a novel GAN training scheme, called slicing adversarial network (SAN). With only simple modifications, a broad class of existing GANs can be converted to SANs. Experiments on synthetic and image datasets support our theoretical results and the SAN's effectiveness as compared to usual GANs. Furthermore, we also apply SAN to StyleGAN-XL, which leads to state-of-the-art FID score amongst GANs for class conditional generation on ImageNet 256$\times$256. Our implementation is available on https://ytakida.github.io/san.


Summary

  • The paper proposes a SAN framework integrating Functional Mean Divergence and sliced optimal transport to enforce metrizable conditions in GANs.
  • It establishes key properties like injectivity, separability, and direction optimality to improve discriminator reliability and guide effective generator updates.
  • It demonstrates, through experiments on CIFAR-10, CelebA, and ImageNet, improved FID scores and reduced mode collapse in image synthesis.

Overview of "SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer"

The paper introduces Slicing Adversarial Networks (SAN), a novel GAN training scheme aimed at the theoretical question of whether GAN optimization truly minimizes the dissimilarity between the generator and target distributions. Building on the standard GAN formulation, the authors introduce metrizable conditions, sufficient conditions under which the discriminator serves as a distance between the data and generator distributions. These conditions are direction optimality, separability, and injectivity.

Theoretical Contributions

The authors draw a connection between GAN optimization and sliced optimal transport, which underpins the derivation of the metrizable conditions. By analyzing the functional mean divergence (FM$^*$) and relating it to sliced optimal transport, they arrive at the slicing adversarial network (SAN) training scheme, which converts a broad class of existing GANs into SANs through only slight modifications.
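To make this connection concrete, write the discriminator as an inner product $d(x) = \langle \omega, h(x) \rangle$ between a feature map $h$ and a last linear direction $\omega$ on the unit sphere; this notation is illustrative and may differ from the paper's exact definitions. The IPM-style, max-sliced quantity underlying the argument is then

$$\max_{\omega \in \mathbb{S}^{D-1}} \Big( \mathbb{E}_{x \sim \mu_0}\big[\langle \omega, h(x) \rangle\big] - \mathbb{E}_{x \sim \mu_g}\big[\langle \omega, h(x) \rangle\big] \Big),$$

where $\mu_0$ and $\mu_g$ denote the data and generator distributions. Roughly, the metrizable conditions require that $h$ does not conflate distinct distributions (injectivity, separability) and that $\omega$ actually attains this maximum (direction optimality), so that the quantity behaves like a distance the generator can decrease.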

  1. Functional Mean Divergence (FM$^*$): The paper defines a generalized notion of divergence that is broad enough to encompass integral probability metrics (IPMs), including Wasserstein distances.
  2. Injectivity and Separability: These properties are crucial for ensuring that the discriminator can act as a valid metric between distributions. Injectivity prevents the feature map from discarding information by collapsing distinct inputs, while separability ensures that the discriminator's features actually distinguish the data distribution from the generator's distribution.
  3. Direction Optimality: This concerns whether the discriminator supplies effective gradients to the generator. Under direction optimality, the learned direction of the discriminator maximizes the dissimilarity between the two distributions, which is crucial in adversarial training; a hedged implementation sketch of this decomposition follows the list.
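As a concrete but unofficial illustration of the "simple modifications" mentioned above, the sketch below splits an existing discriminator into a feature backbone and a normalized last linear layer, so that its output takes the form $\langle \omega, h(x) \rangle$. The class and variable names are ours, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SanStyleDiscriminatorHead(nn.Module):
    """Last layer of a discriminator rewritten as <omega, h(x)>, with omega kept on
    the unit sphere. Illustrative sketch of the SAN idea, not the authors' code."""

    def __init__(self, feature_dim: int):
        super().__init__()
        # Unconstrained parameter; renormalized into a direction on every forward pass.
        self.weight = nn.Parameter(torch.randn(feature_dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, feature_dim) features from the discriminator backbone.
        omega = F.normalize(self.weight, dim=0)
        return h @ omega  # one scalar score per sample


# Usage sketch: any backbone that outputs feature vectors can be reused as h.
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
head = SanStyleDiscriminatorHead(feature_dim=64)
x = torch.randn(8, 128)
scores = head(backbone(x))  # shape: (8,)
```

Keeping $\omega$ explicitly normalized makes direction optimality something the training scheme can target directly, rather than a property one hopes an unconstrained last layer happens to satisfy.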

Empirical Evaluation

The authors conduct extensive experiments on synthetic data, CIFAR-10, CelebA, and ImageNet, demonstrating that SANs outperform traditional GAN frameworks in terms of Fréchet Inception Distance (FID) scores—a popular measure of image generation quality. Specifically:

  • SANs showed improved performance in avoiding mode collapse and better coverage of the data distribution on mixture-of-Gaussians data.
  • In terms of image quality, applying the SAN methodology led to significant improvements over the corresponding standard GANs, and combining SAN with StyleGAN-XL achieved a state-of-the-art FID score among GANs for class-conditional generation on ImageNet 256$\times$256.

Practical Implications and Future Directions

From a practical perspective, SAN allows for more stable and efficient GAN training by ensuring that the discriminator reliably reflects the discrepancy between the generator's distribution and the data distribution. This has applications in image, video, and audio generation, where GANs are widely used.

One of the strengths of the SAN framework is its compatibility with existing GAN architectures, allowing for straightforward implementation adjustments. As AI techniques evolve, it is likely that further insights and improvements will be made in the context of discriminator effectiveness, extending beyond generative modeling to other adversarial learning scenarios.
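To illustrate what such an adjustment could look like inside an existing training loop, here is a hedged sketch (our reading, not the paper's official recipe) in which the feature part of the discriminator keeps a standard hinge objective while the normalized direction is updated with a plain difference-of-means term; the detach calls keep the two updates separate. All function and variable names are illustrative.

```python
import torch.nn.functional as F


def san_style_discriminator_loss(h_real, h_fake, weight):
    """Hedged sketch of a SAN-style discriminator objective.

    h_real, h_fake: backbone features for real and generated batches, (batch, dim).
    weight: unnormalized last-layer parameter, shape (dim,).
    The split below (hinge loss for the features, difference-of-means for the
    direction) is an illustrative reading, not the paper's exact formulation.
    """
    omega = F.normalize(weight, dim=0)

    # Feature update: standard hinge GAN loss with the direction held fixed.
    out_real = h_real @ omega.detach()
    out_fake = h_fake @ omega.detach()
    loss_h = F.relu(1.0 - out_real).mean() + F.relu(1.0 + out_fake).mean()

    # Direction update: widen the mean gap along omega with the features held fixed.
    gap = (h_real.detach() @ omega).mean() - (h_fake.detach() @ omega).mean()
    loss_omega = -gap

    return loss_h + loss_omega
```

In this sketch the generator loss is left unchanged from the base GAN, which is what would make the conversion a drop-in change for architectures such as StyleGAN-XL.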

Future work could examine the separability and injectivity conditions more thoroughly in practice, explore other domains where adversarial strategies could benefit from this analytical perspective, and investigate the SAN framework's adaptability to other cutting-edge neural architectures.
