LOGAN: Latent Optimization for GAN Stability
- The paper introduces LOGAN, which stabilizes GAN training by applying a latent optimization step based on either plain gradient descent or natural gradient descent.
- Methodologically, LOGAN incorporates an intermediate latent update before the generator's forward pass, adding second-order terms that dampen adversarial cycling.
- Empirical results on ImageNet demonstrate significant improvements in FID and IS metrics, validating the practical benefits of LOGAN in high-capacity GANs.
LOGAN (Latent Optimisation for Generative Adversarial Networks) is a technique that improves the stability and performance of generative adversarial network (GAN) training by applying a gradient-based optimisation step in latent space before each forward pass through the generator. The latent update uses either plain gradient descent or natural gradient descent, and it improves convergence behavior, adversarial dynamics, and sample quality, particularly on large-scale datasets such as ImageNet at $128 \times 128$ resolution (Wu et al., 2019).
1. Mathematical Foundations of Latent Optimisation
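LOGAN modifies the canonical GAN training loop by inserting an intermediate "latent optimisation" step before samples are generated. Let $\theta_D$ and $\theta_G$ denote the discriminator and generator parameters, $z$ a vector in latent space, $x = G(z)$ a generated sample, and $f(z) = D(G(z))$ the discriminator score.
- Standard GAN Gradients: For a Wasserstein-type GAN loss with discriminator loss $L_D$ and generator loss $L_G$ (real-data term omitted), parameters are updated via the gradients $\frac{\partial L_D}{\partial \theta_D} = \frac{\partial f(z)}{\partial \theta_D}$, $\frac{\partial L_G}{\partial \theta_G} = -\frac{\partial f(z)}{\partial \theta_G}$.
- Latent Optimisation by Gradient Descent (GD): A single latent step with step size $\alpha$ is introduced:
$$\Delta z = \alpha \frac{\partial f(z)}{\partial z}, \qquad z' = z + \Delta z$$
Afterward, GAN losses are computed from $z'$, and the resulting gradients retain second-order terms because $\Delta z$ depends on both $\theta_D$ and $\theta_G$.
- Natural Gradient Descent (NGD) in Latent Space: Addressing curvature mismatches in latent space, LOGAN-NGD uses
$$\Delta z = \frac{\alpha}{\beta + \lVert g \rVert^2}\, g, \qquad g = \frac{\partial f(z)}{\partial z}$$
where $\beta$ is a damping factor. A regularization term $w_r \lVert \Delta z \rVert_2^2$ is added to both losses.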
This modification introduces additional cross-derivative terms, fundamentally changing the adversarial dynamics during training.
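The two latent updates can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the GD step $\Delta z = \alpha g$ and the damped NGD step $\Delta z = \frac{\alpha}{\beta + \lVert g \rVert^2} g$ with $g = \partial f(z)/\partial z$. The function names and the toy gradient values are hypothetical, and the defaults are the best-performing values listed later in this article.

```python
def gd_step(g, alpha=0.9):
    """Plain gradient step: delta_z = alpha * g."""
    return [alpha * gi for gi in g]


def ngd_step(g, alpha=0.9, beta=5.0):
    """Damped natural-gradient step: delta_z = alpha / (beta + ||g||^2) * g."""
    sq_norm = sum(gi * gi for gi in g)
    scale = alpha / (beta + sq_norm)
    return [scale * gi for gi in g]


def regulariser(delta_z, w_r=300.0):
    """Penalty w_r * ||delta_z||^2 that is added to the losses."""
    return w_r * sum(d * d for d in delta_z)


# Toy latent gradient g = df/dz (hypothetical values, not from the paper).
g = [3.0, 4.0]           # ||g||^2 = 25
dz_gd = gd_step(g)       # approximately [2.7, 3.6]
dz_ngd = ngd_step(g)     # 0.9 / (5 + 25) * g, approximately [0.09, 0.12]
penalty = regulariser(dz_ngd)
```

For the same gradient, the NGD step is much shorter than the GD step, because the factor $\beta + \lVert g \rVert^2$ shrinks the update when the gradient is large.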
2. Integration into the GAN Training Loop
LOGAN inserts latent optimisation into the standard GAN mini-batch training cycle as follows:
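- For each batch element, sample $z \sim p(z)$.
- Forward propagate: $x = G(z)$, $f(z) = D(x)$.
- Compute the gradient in latent space, $g = \partial f(z) / \partial z$.
- Compute $\Delta z$ from $g$ via GD or NGD and update $z' = z + \Delta z$.
- Forward propagate $x' = G(z')$, obtain $f(z') = D(x')$.
- Define per-sample generator and discriminator losses: $L_G = -f(z')$, $L_D = f(z') - D(x_{\text{real}})$, where $x_{\text{real}}$ is a real data sample.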
- Batch-aggregate losses; update parameters via Adam.
Empirically, one latent optimisation step per iteration suffices for improved dynamics; additional steps can destabilize the training process.
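The steps above can be sketched on a toy problem. The following is not the authors' implementation: it assumes a hypothetical scalar generator `G(z) = a * z` and discriminator `D(x) = w * x`, uses the GD latent step only, leaves out the regularization term, replaces backpropagation with central finite differences, and replaces Adam with plain SGD. The point is only to show that the parameter gradients are taken through the latent step.

```python
import random

ALPHA = 0.9   # latent step size (the value this article reports as best)
LR = 0.01     # parameter learning rate (hypothetical)
EPS = 1e-5    # finite-difference half-width


def G(z, a):
    """Toy generator (hypothetical): x = a * z."""
    return a * z


def D(x, w):
    """Toy discriminator (hypothetical): score = w * x."""
    return w * x


def score_after_latent_step(z, w, a):
    """Return f(z') with z' = z + ALPHA * df/dz (one GD latent step)."""
    g = (D(G(z + EPS, a), w) - D(G(z - EPS, a), w)) / (2 * EPS)  # df/dz
    return D(G(z + ALPHA * g, a), w)


def losses(zs, x_reals, w, a):
    """Batch-averaged (L_D, L_G) computed from the optimised latents."""
    f_fake = sum(score_after_latent_step(z, w, a) for z in zs) / len(zs)
    f_real = sum(D(x, w) for x in x_reals) / len(x_reals)
    return f_fake - f_real, -f_fake


def train_step(zs, x_reals, w, a):
    """One simultaneous SGD update, differentiating through the latent step."""
    d_w = (losses(zs, x_reals, w + EPS, a)[0]
           - losses(zs, x_reals, w - EPS, a)[0]) / (2 * EPS)
    d_a = (losses(zs, x_reals, w, a + EPS)[1]
           - losses(zs, x_reals, w, a - EPS)[1]) / (2 * EPS)
    return w - LR * d_w, a - LR * d_a


random.seed(0)
zs = [random.uniform(-1.0, 1.0) for _ in range(8)]      # z ~ p(z), assumed uniform
x_reals = [random.gauss(2.0, 0.5) for _ in range(8)]    # stand-in "real" data
w, a = train_step(zs, x_reals, w=0.5, a=1.0)
```

Because `losses` recomputes the latent step for every perturbed parameter value, the finite-difference gradients automatically include the dependence of $z'$ on the parameters. In an autodiff framework the same effect comes from not stopping gradients through $\Delta z$.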
3. Theoretical Rationale for Improved Stability
Several theoretical principles underlie LOGAN’s stabilization effects:
- Symplectic Gradient Adjustment (SGA) Connection: In adversarial games, the joint gradient field has antisymmetric (rotational) components that induce cycling. LOGAN’s backpropagation through $\Delta z$ yields second-order terms that couple discriminator and generator updates in a manner analogous to SGA, dampening rotation and reducing cycling.
- Analogy to Unrolling: LOGAN’s single-step latent update is comparable to unrolling optimization in generator parameter space but operates on the substantially lower-dimensional $z$, making it computationally inexpensive.
- Two-Time-Scale Update Rule (TTUR) Perspective: By optimizing $z$ before the parameter update, LOGAN effectively speeds up the discriminator’s updates relative to the generator’s, increasing the effective time-scale separation between the players, which is conducive to stable convergence.
These mechanisms act jointly to suppress divergence, oscillation, and other pathologies typical in GAN training.
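A small worked example may help make the second-order coupling concrete. It assumes a hypothetical scalar game with generator $G(z) = a z$ and discriminator $D(x) = w x$, so that $f(z) = w a z$, and a single GD latent step with step size $\alpha$; none of these choices come from the paper.

$$
\begin{aligned}
\Delta z &= \alpha \frac{\partial f(z)}{\partial z} = \alpha w a, \qquad z' = z + \alpha w a,\\
f(z') &= w a z + \alpha w^2 a^2,\\
\frac{\mathrm{d} f(z')}{\mathrm{d} w} &= \underbrace{a z'}_{z' \text{ held fixed}} + \underbrace{\alpha w a^2}_{\text{through } \Delta z},\\
\frac{\mathrm{d} f(z')}{\mathrm{d} a} &= \underbrace{w z'}_{z' \text{ held fixed}} + \underbrace{\alpha w^2 a}_{\text{through } \Delta z}.
\end{aligned}
$$

The first term in each derivative is what a standard GAN update evaluated at $z'$ would use. The second term appears only because $\Delta z$ itself depends on $w$ and $a$, and it is this kind of cross term that the SGA analogy refers to.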
4. Empirical Results and Experimental Protocol
LOGAN reported state-of-the-art results at the time of publication on ImageNet ($128 \times 128$) using an established architecture:
- Architecture: BigGAN-deep with class conditioning, spectral-normalized ResNets, self-attention, latent dimension expanded from $128$ to $256$, latent prior switched from $\mathcal{N}(0, I)$ to $\mathcal{U}(-1, 1)$, LeakyReLU (slope $0.2$) in generator final layers.
- Optimizers and Hyperparameters:
- Adam: , , learning rate
- Batch size: $2048$
- Latent optimiser: $\alpha = 0.9$, $\beta = 5$, $w_r = 300$
- Fraction of latent dimensions optimised per iteration: (chosen randomly)
- Training duration: Up to $600$k steps (LOGAN delays collapse compared to vanilla BigGAN-deep)
- No latent optimisation used at evaluation time
A direct comparison on the ImageNet benchmark yielded significant improvements:
| Model | FID (↓) | IS (↑) |
|---|---|---|
| BigGAN-deep (orig.) | 5.7 ± 0.3 | 124.5 ± 2.0 |
| Baseline (ours) | 4.92 ± 0.05 | 126.6 ± 1.3 |
| LOGAN (GD) | 4.86 ± 0.09 | 127.7 ± 3.5 |
| LOGAN (NGD) | 3.36 ± 0.14 | 148.2 ± 3.1 |
For LOGAN (NGD), this corresponds to a 32% improvement in FID and a 17% increase in IS relative to the authors' re-implemented BigGAN-deep baseline. LOGAN also shows a better FID/IS trade-off when the truncation parameter is varied (Wu et al., 2019).
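The quoted percentages follow directly from the table. A short check, using only the mean values from the "Baseline (ours)" and "LOGAN (NGD)" rows:

```python
baseline_fid, logan_fid = 4.92, 3.36
baseline_is, logan_is = 126.6, 148.2

# FID is lower-is-better, IS is higher-is-better.
fid_improvement = (baseline_fid - logan_fid) / baseline_fid * 100
is_improvement = (logan_is - baseline_is) / baseline_is * 100

print(f"FID reduction: {fid_improvement:.1f}%")  # 31.7%
print(f"IS increase:   {is_improvement:.1f}%")   # 17.1%
```

These round to the 32% and 17% figures stated in the text.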
5. Implementation Considerations and Hyperparameter Regimes
LOGAN has the following computational and practical implications:
- Overhead: Each training iteration is slower because of the extra forward/backward pass needed to compute $\Delta z$. There is no additional cost at evaluation, as latent optimisation is not used at inference.
- Recommended Hyperparameter Ranges (from grid search):
- $\alpha$: $0.7$–$1.0$ (best: $0.9$)
- $\beta$: $0.1$–$10$ (best: $5$ for BigGAN)
- $w_r$: $0.1$–$500$ (best: $300$ for deep models)
- Fraction : – (best: $50$–)
- Failure Modes:
- Excessively large $\alpha$ or too small $\beta$ causes latent overshoot and destabilization.
- LOGAN substantially delays (but does not eliminate) mode collapse; divergence may occur with very prolonged training due to higher-order adversarial effects.
- Very small batch sizes can weaken the intended two-time-scale effect.
A plausible implication is that hyperparameters such as $\alpha$, $\beta$, and $w_r$ should scale with model capacity and dataset size to balance stability and expressiveness.
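The overshoot failure mode can be illustrated numerically. Assuming the damped NGD step $\Delta z = \frac{\alpha}{\beta + \lVert g \rVert^2} g$, the step length is $\alpha \lVert g \rVert / (\beta + \lVert g \rVert^2)$, which is largest at $\lVert g \rVert = \sqrt{\beta}$ and never exceeds $\alpha / (2\sqrt{\beta})$. The sketch below, with hypothetical helper names, prints that bound for a few values of $\beta$ drawn from the range listed above.

```python
import math


def ngd_step_norm(g_norm, alpha, beta):
    """Length of the NGD latent step for a gradient of norm g_norm."""
    return alpha * g_norm / (beta + g_norm ** 2)


def max_step_norm(alpha, beta):
    """Upper bound on the step length, attained at ||g|| = sqrt(beta)."""
    return alpha / (2.0 * math.sqrt(beta))


alpha = 0.9
for beta in (0.1, 5.0, 10.0):
    print(f"beta={beta:>4}: max step length = {max_step_norm(alpha, beta):.3f}")
```

With $\alpha = 0.9$, lowering $\beta$ from $5$ to $0.1$ raises the largest possible step from about $0.2$ to about $1.4$, which is one way to read the sensitivity to small $\beta$ and large $\alpha$ described above.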
6. Relationship to Broader Adversarial Optimisation Advances
LOGAN situates itself conceptually at the intersection of game-theoretic GAN optimization and efficient second-order stabilisation methods:
- By exploiting low-dimensional latent optimisations and closed-form natural gradients, LOGAN approximates theoretical stabilisation terms found in SGA and unrolled GANs, both of which are computationally intensive at large scale.
- The approach is not tied to a single architecture and may carry over to other GANs with delicate training dynamics, suggesting broader applicability in stabilising adversarial learning.
- The methodology aligns with TTUR principles for two-player minimax optimization, offering an actionable strategy to enforce discriminator dominance via the latent channel.
LOGAN thus provides a tractable means to enhance the stability, efficiency, and quality of high-capacity GANs in large-scale settings, as supported by empirical evidence on ImageNet benchmarks (Wu et al., 2019).