MemoryGAN: Memory-Augmented GAN
- MemoryGAN is a generative adversarial architecture augmented with explicit memory units that address catastrophic forgetting and improve mode diversity.
- It integrates memory modules within both the generator and discriminator to partition latent space into discrete, learnable slots that guide conditional sample synthesis.
- Empirical evaluations demonstrate that MemoryGAN achieves higher Inception Scores and better stability compared to traditional GANs, reducing mode collapse effectively.
MemoryGAN refers to a family of generative adversarial networks (GANs) that are augmented with explicit memory mechanisms, designed to overcome limitations such as catastrophic forgetting and structural discontinuity in unsupervised generative modeling. These architectures integrate memory units either as episodic buffers or as life-long slot networks, influencing generator conditioning and discriminator decision boundaries to enhance representation learning, sample diversity, and long-term stability.
1. Architecture and Memory Integration
MemoryGAN is characterized by the introduction of a learnable memory component accessible by both the generator and discriminator:
- Generator (Memory-Conditional Generative Network, MCGN): Accepts a continuous latent vector $z \sim \mathcal{N}(0, \sigma^2 I)$ and a discrete memory index $c$, retrieving a key $K_c$ to synthesize samples as $x = G(z, K_c)$.
- Discriminator (Discriminative Memory Network, DMN): Encodes samples $x$ to a query $q = \mu(x)$, computing a posterior $p(c \mid x) \propto p(c)\,\exp(\kappa\, K_c^\top q)$ over slot keys using von Mises-Fisher (vMF) similarity. Real/fake discrimination is via $D(x) = \sum_c p(c \mid x)\, v_c$, where $v_c \in \{0, 1\}$ denotes the real/fake status of slot $c$.
- Memory Module: Stores key vectors $K_c \in \mathbb{R}^M$, slot values $v_c$, a usage histogram $h \in \mathbb{R}^N$, and an age vector $a \in \mathbb{R}^N$. Updates occur via least-recently-used (LRU) slot replacement or incremental EM steps that shift slot centroids and update histograms.
This explicit slot-based memory partitions the latent space, allocating each slot to an implicit cluster in data (class or semantic mode), thereby enabling both generators and discriminators to address and preserve diverse modes without traversing structurally invalid regions in latent space (Kim et al., 2018).
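The slot-based addressing and memory-weighted discrimination described above can be sketched in NumPy. This is an illustrative reconstruction, not the authors' code; the class name `MemoryModule`, the concentration parameter `kappa`, and the toy dimensions are assumptions:

```python
import numpy as np

class MemoryModule:
    """Illustrative slot memory: unit-norm keys K_c, binary values v_c,
    usage histogram h, and age vector a."""

    def __init__(self, n_slots=8, key_dim=4, kappa=5.0, seed=0):
        rng = np.random.default_rng(seed)
        K = rng.normal(size=(n_slots, key_dim))
        self.keys = K / np.linalg.norm(K, axis=1, keepdims=True)  # K_c, unit vectors
        self.values = rng.integers(0, 2, size=n_slots).astype(float)  # v_c in {0, 1}
        self.hist = np.ones(n_slots)   # usage histogram h
        self.age = np.zeros(n_slots)   # age vector a
        self.kappa = kappa

    def posterior(self, q):
        """vMF-style posterior p(c|q) ∝ p(c) * exp(kappa * K_c^T q)."""
        q = q / np.linalg.norm(q)
        logits = self.kappa * self.keys @ q + np.log(self.hist / self.hist.sum())
        p = np.exp(logits - logits.max())  # subtract max for numerical stability
        return p / p.sum()

    def discriminate(self, q):
        """D(x) = sum_c p(c|x) * v_c -- probability mass on 'real' slots."""
        return float(self.posterior(q) @ self.values)
```

Because the posterior mixes vMF similarity with the usage prior $p(c) = h_c / \sum_i h_i$, frequently used slots attract more queries, which is what lets the memory track the data's dominant modes.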
2. Mathematical Formulation and Loss Functions
MemoryGAN leverages a joint model
$$p(x) = \sum_{c=1}^{N} p(c) \int p(x \mid z, c)\, p(z)\, dz,$$
where $p(z) = \mathcal{N}(z; 0, \sigma^2 I)$ is the latent prior and $p(c) = h_c / \sum_i h_i$ reflects memory slot usage.
- Generator mapping: $x = G(z, K_c)$
- Discriminator evaluation: $D(x) = \sum_c p(c \mid x)\, v_c$, with $p(c \mid x) \propto p(c)\,\exp(\kappa\, K_c^\top \mu(x))$
The adversarial objective is augmented by a mutual information regularizer:
$$\mathcal{L} = \mathcal{L}_{\text{GAN}} - \lambda\, I(c;\, G(z, K_c)),$$
where the mutual-information term enforces high cosine similarity between sampled keys $K_c$ and the embeddings $\mu(G(z, K_c))$ of generated samples. This regularizer enforces cluster–sample alignment, critically lowering mode collapse and improving the fidelity and diversity of outputs (Kim et al., 2018).
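Mutual-information terms of this kind are commonly estimated with an InfoGAN-style variational lower bound: maximize the average log-posterior that the discriminator's slot posterior assigns to the index each sample was generated from. A minimal sketch (the function name `info_reg` and the toy posteriors are illustrative assumptions, not the paper's exact estimator):

```python
import numpy as np

def info_reg(posteriors, sampled_c):
    """Variational lower bound on I(c; x) up to the constant H(c):
    mean log-probability of the generating slot index under the
    discriminator's slot posterior q(c|x)."""
    eps = 1e-12  # guard against log(0)
    return float(np.mean([np.log(p[c] + eps) for p, c in zip(posteriors, sampled_c)]))

# Toy check: posteriors concentrated on the generating slot yield a
# higher bound than flat posteriors, so maximizing the bound pushes
# generated samples back toward their own slot's cluster.
sharp = [np.array([0.9, 0.05, 0.05]), np.array([0.05, 0.9, 0.05])]
flat = [np.ones(3) / 3, np.ones(3) / 3]
assert info_reg(sharp, [0, 1]) > info_reg(flat, [0, 1])
```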
3. Memory Network Update Policies and Operational Algorithms
Updates to the slot-based memory network follow principles that blend reinforcement of active clusters with continual refreshment:
- Addressing: For each query, a vMF posterior selects the top-$k$ slots with highest $p(c \mid x)$ values.
- LRU-Rotation: If no retrieved slot's value matches the sample's real/fake label $y$, select the oldest slot ($c^* = \arg\max_i a_i$), overwrite $K_{c^*} \leftarrow q$ and $v_{c^*} \leftarrow y$, and reset $h_{c^*}$ and $a_{c^*}$.
- Incremental EM: Otherwise, update selected slots via EM steps on soft assignments $\gamma_c \propto p(c \mid x)$, adjusting $K_c$ and $h_c$ toward modal centroids over observed data.
Training alternates memory updates, discriminator learning on real/generated samples, and generator updates with the latent $z$ and slot index $c$ sampled from recent slot statistics (Kim et al., 2018).
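The update policy above can be sketched as a single self-contained NumPy function. This is an assumed simplification of the procedure in Kim et al. (2018), not their code; the moving-average form of the EM key update and the `top_k` default are illustrative choices:

```python
import numpy as np

def update_memory(keys, values, hist, age, q, y, kappa=5.0, top_k=3):
    """One illustrative memory update.
    keys: (N, M) unit key vectors K_c; values: (N,) real/fake bits v_c;
    hist: (N,) usage counts h_c; age: (N,) slot ages a_c;
    q: query embedding of the current sample; y: its real/fake label."""
    q = q / np.linalg.norm(q)
    logits = kappa * keys @ q + np.log(hist / hist.sum())   # vMF posterior
    p = np.exp(logits - logits.max())
    p /= p.sum()
    top = np.argsort(p)[::-1][:top_k]                       # top-k addressing
    age += 1
    match = [c for c in top if values[c] == y]
    if not match:
        # LRU rotation: no retrieved slot carries label y -> overwrite oldest.
        c = int(np.argmax(age))
        keys[c], values[c], hist[c], age[c] = q, y, 1.0, 0.0
        return c
    for c in match:
        # Incremental EM step: shift the slot centroid toward q with soft
        # weight gamma, then renormalize the key and grow the histogram.
        gamma = p[c]
        keys[c] = hist[c] * keys[c] + gamma * q
        keys[c] /= np.linalg.norm(keys[c])
        hist[c] += gamma
        age[c] = 0.0
    return int(match[0])
```

The two branches correspond directly to the bullets above: LRU-Rotation spawns a fresh slot for unexplained samples, while the EM branch reinforces and refines existing clusters.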
4. Impact on GAN Training Dynamics: Structural Discontinuity and Forgetting
The slot-based memory mitigates GAN limitations by:
- Structural discontinuity: Discrete latent indices partition latent space into cluster regions, avoiding unnatural interpolation and invalid transitions characteristic of unimodal latent distributions. Each key $K_c$ acts as an anchor for a data mode: generator traversal along $z$ within one slot yields intra-class variations, while inter-slot transitions produce inter-class changes (Kim et al., 2018).
- Discriminator forgetting: Persistent slot centroids and nonzero priors ensure continued representation of rare or previously generated samples, stabilizing adversarial learning and reducing catastrophic forgetting of generator modes. This persistence is not achievable with purely continuous or memoryless GAN architectures.
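The within-slot versus between-slot behavior can be illustrated schematically. The linear map `G` below is a toy stand-in for a trained generator, and all dimensions are arbitrary assumptions; it only shows the mechanics of varying $z$ with $K_c$ fixed versus swapping $K_c$ with $z$ fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
W_z, W_k = rng.normal(size=(5, 2)), rng.normal(size=(5, 3))
G = lambda z, k: W_z @ z + W_k @ k   # toy generator x = G(z, K_c)

keys = rng.normal(size=(4, 3))       # four memory slot keys
z0, z1 = rng.normal(size=2), rng.normal(size=2)

# Intra-class variation: interpolate z while holding the slot key fixed.
intra = [G((1 - t) * z0 + t * z1, keys[0]) for t in np.linspace(0, 1, 5)]

# Inter-class change: hold z fixed and switch the slot key.
inter = [G(z0, keys[c]) for c in range(4)]
```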
5. Empirical Evaluation and Results
MemoryGAN demonstrates quantitative and qualitative superiority in unsupervised image generation:
- Unsupervised Inception Scores (CIFAR-10): MemoryGAN attains an Inception Score of $8.04$, exceeding contemporaneous unsupervised baselines such as WGAN-GP and Fisher GAN.
- Qualitative results: On datasets like Fashion-MNIST and affine-MNIST, discrete slots each encapsulate distinct object categories or transformation clusters. Interpolations along $z$ within a slot yield smooth stylistic changes; transitions between slots effect semantic shifts.
- Ablation studies: Removing the memory network (“–Memory”) leads to a sharp drop in Inception Score from $8.04$ to $5.35$. Eliminating memory-based sampling or using a simple moving average for slot centroids also degrades performance.
- Failure cases: Occur when slots mix visually similar but semantically diverse samples, suggesting future research in adaptive slot granularity (Kim et al., 2018).
6. Extensions, Trade-offs, and Limitations
- Scalability: MemoryGAN’s slot memory scales to thousands of clusters, but memory management overhead and EM computation may pose challenges for extremely large or fine-grained datasets.
- Mode granularity: Slot collapse or mixing can arise when clusters in data are poorly separated or visually ambiguous. Modulating the key dimension $M$, the slot count $N$, or the update rules could improve allocation fidelity.
- Integrability: MemoryGAN can be integrated with other GAN models without optimization tricks or weaker divergences and remains unsupervised, highlighting its flexibility for broad application domains.
7. Context and Related Continuous Memory GANs
MemoryGAN is conceptually distinct from GANs using simple episodic buffers (e.g., in continual learning contexts such as CloGAN (Rios et al., 2018)). While CloGAN maintains a small buffer of real samples to regularize continual class learning, MemoryGAN applies a life-long unsupervised memory slot architecture tuned for structural latent representation and adversarial stability. Both approaches address forgetting and diversity, but employ orthogonal mechanisms—episodic replay versus slot-based mixture models—to condition and regularize generation.
In summary, MemoryGAN’s discrete–continuous latent decomposition and persistent slot memory fundamentally alleviate mode collapse, improve interpretability, and stabilize adversarial training in unsupervised settings. Its architecture extends the generative modeling frontier by leveraging memory as a high-dimensional, learnable scaffold for both representation and synthesis (Kim et al., 2018).