Mimicry: Towards the Reproducibility of GAN Research
The paper "Mimicry: Towards the Reproducibility of GAN Research," authored by Kwot Sin Lee and Christopher Town, addresses a significant challenge in modern machine learning research—ensuring accurate and comparable evaluations of Generative Adversarial Networks (GANs). With the remarkable advancement in generative modeling through GANs, the necessity for reproducible research has gained paramount importance. However, the heterogeneous nature of implementations and evaluation procedures across studies has posed a considerable barrier to precise inter-model comparisons.
Core Contributions
Mimicry Library: This work introduces Mimicry, a lightweight PyTorch library serving as a comprehensive toolset for the research community. Mimicry standardizes the implementation and evaluation of GAN variants, ensuring that researchers can derive comparable results without the additional overhead of managing disparate codebases or evaluation methods.
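To make the intended workflow concrete, the sketch below follows the standard pattern for the torch-mimicry package: load a dataset, instantiate a generator and discriminator, run the standardized training loop, and evaluate with a single metric call. Module paths and argument names (e.g. mmc.datasets.load_dataset, mmc.training.Trainer, mmc.metrics.evaluate) are assumptions based on the library's public documentation and may differ between releases; treat this as an illustrative sketch rather than a verbatim copy of the library's API.

```python
import torch
import torch.optim as optim
import torch_mimicry as mmc
from torch_mimicry.nets import sngan

# Device and data: CIFAR-10 loaded via the library's dataset helper (name assumed).
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
dataset = mmc.datasets.load_dataset(root='./datasets', name='cifar10')
dataloader = torch.utils.data.DataLoader(
    dataset, batch_size=64, shuffle=True, num_workers=4)

# Models and optimizers: SNGAN at 32x32 resolution with the commonly used Adam betas.
netG = sngan.SNGANGenerator32().to(device)
netD = sngan.SNGANDiscriminator32().to(device)
optD = optim.Adam(netD.parameters(), 2e-4, betas=(0.0, 0.9))
optG = optim.Adam(netG.parameters(), 2e-4, betas=(0.0, 0.9))

# Training: the Trainer encapsulates the standardized training loop
# (n_dis = discriminator updates per generator update).
trainer = mmc.training.Trainer(
    netD=netD,
    netG=netG,
    optD=optD,
    optG=optG,
    n_dis=5,
    num_steps=100000,
    dataloader=dataloader,
    log_dir='./log/sngan_cifar10',
    device=device)
trainer.train()

# Evaluation: compute FID with matched numbers of real and fake samples.
# Note: the dataset keyword may be named dataset_name in some releases.
mmc.metrics.evaluate(
    metric='fid',
    log_dir='./log/sngan_cifar10',
    netG=netG,
    dataset='cifar10',
    num_real_samples=50000,
    num_fake_samples=50000,
    evaluate_step=100000,
    device=device)
```

Swapping in a different architecture (e.g. the library's SSGAN or InfoMax-GAN variants) is intended to require only changing the model classes, which is what enables the like-for-like comparisons the paper emphasizes.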
Standardized Implementations: The paper details the inclusion of several GAN models, such as DCGAN, WGAN-GP, SNGAN, cGAN-PD, SSGAN, and InfoMax-GAN. Each model is implemented uniformly to facilitate direct comparisons with reported scores, thereby enhancing the reproducibility of results.
Unified Evaluation Metrics: The library consolidates commonly used GAN evaluation metrics—Inception Score (IS), Fréchet Inception Distance (FID), and Kernel Inception Distance (KID)—to provide a consistent evaluation framework. This enables researchers to assess model performance using standardized, consistent procedures.
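For context on what such a metric computes, FID fits a Gaussian to the Inception features of real and generated images and measures the Fréchet distance between the two fits. The snippet below is a self-contained NumPy/SciPy sketch of that computation (feature extraction omitted); it illustrates the standard formula rather than reproducing Mimicry's internal implementation.

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake, eps=1e-6):
    """FID between two sets of Inception features, each of shape [N, D].

    FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})
    """
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)

    # Matrix square root of the covariance product; add jitter if it is near-singular.
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    if not np.isfinite(covmean).all():
        offset = np.eye(cov_r.shape[0]) * eps
        covmean, _ = linalg.sqrtm((cov_r + offset) @ (cov_f + offset), disp=False)
    # Discard small imaginary components introduced by numerical error.
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * np.trace(covmean))
```

KID replaces the Gaussian assumption with a polynomial-kernel MMD estimate over the same features, which gives an unbiased estimator; IS instead scores samples with the Inception classifier's label distribution. Standardizing all three in one library is what allows scores to be compared across papers.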
Extensive Baseline Experiments: The authors conduct comprehensive experiments across seven datasets and the aforementioned GAN models. They highlight the relative performances of these models under controlled conditions, emphasizing the importance of using consistent datasets, network architectures, and metrics for proper evaluation.
Key Findings
- Reproducibility of Scores: The results demonstrate that the implementations provided in Mimicry can successfully replicate scores reported across various studies. The paper outlines the specific configurations and methods employed, which are critical to achieving this reproducibility.
- Comparison with Reported Scores: The paper also identifies discrepancies between the replicated scores and those reported in prior literature; in some instances, the Mimicry implementations surpass the originally reported results, which the authors attribute to refined training practices or implementation adjustments.
- Dataset and Metric Diversity: By leveraging multiple datasets, Mimicry establishes comprehensive baseline results, which are crucial for benchmarking and further research developments. Evaluation on diverse datasets, including CIFAR-10, CelebA, and ImageNet, supports the generalizability of the GAN implementations.
Practical and Theoretical Implications
Practical Implications: By simplifying the process of GAN research, Mimicry lets researchers focus more on innovating model architectures and training techniques rather than wrestling with boilerplate code. This could accelerate the pace of advancements in generative modeling by providing a reliable baseline for experimentation.
Theoretical Implications: The consistent evaluation framework set forth by Mimicry aids in understanding the fundamental aspects of GAN performance, particularly how different architectures and training techniques affect generation quality and diversity.
Future Speculations
Moving forward, the integration of additional GAN architectures and the adoption of more complex evaluation metrics could enrich Mimicry's library. Incorporating tasks beyond image generation, such as 3D synthesis, could expand its applicability. Moreover, as GAN research progresses, the inclusion of newer datasets and metrics reflecting the latest advancements will be crucial. This paper sets a strong foundation for reproducibility in GAN research and suggests a promising trajectory for addressing similar challenges in other domains of AI.
By leveraging Mimicry, the paper not only advances reproducibility in GAN research but also provides a roadmap for improving transparency and comparability across machine learning research communities.