Generalizing Dataset Distillation via Deep Generative Prior
Overview
The paper "Generalizing Dataset Distillation via Deep Generative Prior" addresses significant limitations within existing methods of dataset distillation. Dataset distillation is a technique aimed at condensing the knowledge of a large dataset into a smaller set of synthetic images, which can approximate the learning results of models trained on the full dataset. Despite the appeal of this approach, previous distillation methods face challenges in generalizing across different architectures and scaling effectively to high-resolution datasets. This work proposes leveraging deep generative models to overcome these challenges, introducing Generative Latent Distillation (GLaD).
Methodology
GLaD uses a deep generative prior, in the form of a pretrained generative model such as a GAN, to synthesize the distilled dataset. Instead of directly optimizing synthetic pixel values, the method optimizes latent feature vectors that the generative model decodes into images. By distilling into the latent space of the generator, GLaD imposes a coherence and regularization on the synthetic images that benefits cross-architecture generalization.
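As a rough illustration of this idea, the sketch below optimizes a small set of latent codes through a frozen, pretrained generator. The `generator` and `distillation_loss` arguments, the latent dimensionality, and the placeholder class labels are all assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch: optimize latent codes through a frozen pretrained generator
# instead of raw pixels. `generator` and `distillation_loss` are placeholders
# for any pretrained decoder and any distillation objective.
import torch

def distill_into_latents(generator, distillation_loss, num_images, latent_dim,
                         steps=1000, lr=0.01, device="cuda"):
    generator.eval().requires_grad_(False)            # the generative prior stays frozen
    latents = torch.randn(num_images, latent_dim,     # one latent code per synthetic image
                          device=device, requires_grad=True)
    labels = torch.arange(num_images, device=device) % 10  # placeholder labels (assumes 10 classes)
    opt = torch.optim.Adam([latents], lr=lr)

    for _ in range(steps):
        synthetic_images = generator(latents)                 # decode latents to images
        loss = distillation_loss(synthetic_images, labels)    # any matching objective
        opt.zero_grad()
        loss.backward()                                       # gradients flow through the generator into the latents
        opt.step()

    # The final distilled set is the decoded images, not the latent codes themselves.
    with torch.no_grad():
        return generator(latents).detach().cpu()
```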
The method integrates with existing distillation objectives such as Gradient Matching (DC), Distribution Matching (DM), and Trajectory Matching (MTT). In each case, latent vectors replace raw pixels as the optimization target, with dataset synthesis occurring in an intermediate feature space of the generative model. This alleviates the tendency of previous methods to overfit to the particular architecture used during distillation and enables effective distillation at high resolutions.
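To make the integration concrete, here is a hedged sketch of a gradient-matching (DC-style) loss in which the synthetic images are decoded from latent codes rather than stored as pixels. The `student` network, the per-layer cosine-distance matching, and the argument names are simplifying assumptions; the actual DC, DM, and MTT objectives involve additional details.

```python
# Hedged sketch of a gradient-matching loss with a latent parameterization.
# The caller backpropagates the returned loss into the latent codes.
import torch
import torch.nn.functional as F

def gradient_match_loss(student, generator, latents, syn_labels,
                        real_images, real_labels):
    # Decode the current latents into synthetic images (differentiable w.r.t. latents).
    syn_images = generator(latents)

    # Gradients of the classification loss on real data act as the matching target.
    real_loss = F.cross_entropy(student(real_images), real_labels)
    g_real = torch.autograd.grad(real_loss, list(student.parameters()))

    # Gradients on synthetic data; keep the graph so the loss can reach the latents.
    syn_loss = F.cross_entropy(student(syn_images), syn_labels)
    g_syn = torch.autograd.grad(syn_loss, list(student.parameters()),
                                create_graph=True)

    # Match per-layer gradient directions (cosine distance is one common choice).
    return sum(1 - F.cosine_similarity(gs.flatten(), gr.detach().flatten(), dim=0)
               for gs, gr in zip(g_syn, g_real))
```

Because the synthetic-gradient graph is retained with `create_graph=True`, backpropagating this loss reaches the latent codes through both the student network and the generator.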
Numerical Results
The paper presents extensive empirical results demonstrating the improved generalization and scalability of GLaD across multiple datasets and distillation algorithms. It highlights significant gains in cross-architecture evaluations, where datasets distilled with a generative prior generalize better to diverse architectures such as AlexNet, VGG11, ResNet18, and Vision Transformers.
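A typical cross-architecture protocol trains each unseen architecture from scratch on the distilled images and measures accuracy on the real test set. The sketch below illustrates such a loop; the model constructors, optimizer, and hyperparameters are placeholders, not the paper's exact evaluation settings.

```python
# Illustrative cross-architecture evaluation: train each unseen architecture from
# scratch on the (tiny) distilled set, then report accuracy on the real test split.
import torch
import torch.nn.functional as F

def evaluate_distilled(distilled_images, distilled_labels, make_models, test_loader,
                       epochs=300, lr=0.01, device="cuda"):
    results = {}
    x_syn = distilled_images.to(device)
    y_syn = distilled_labels.to(device)
    for name, make_model in make_models.items():   # e.g. {"AlexNet": ..., "ViT": ...}
        model = make_model().to(device)
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        for _ in range(epochs):                    # full-batch training on the distilled set
            loss = F.cross_entropy(model(x_syn), y_syn)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Accuracy on the real test split measures cross-architecture generalization.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in test_loader:
                pred = model(x.to(device)).argmax(dim=1)
                correct += (pred == y.to(device)).sum().item()
                total += y.numel()
        results[name] = correct / total
    return results
```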
Experiments on CIFAR-10 and subsets of ImageNet at resolutions up to 512×512 show that GLaD produces noticeably cleaner distilled images, avoiding the high-frequency noise that often dominates pixel-space distillation. For example, on selected ImageNet subsets, the approach raised accuracy on unseen architectures from 26.8% to 28.0%, demonstrating improved generalization.
Implications and Future Directions
The success of GLaD suggests the strong potential of generative priors in dataset distillation, paving the way for future research on incorporating various generative models. The results imply practical applications in scenarios requiring cross-architecture adaptability and high-resolution synthetic data generation, such as neural architecture search, federated learning, and privacy-preserving model training.
Future research might explore the integration of more advanced generative models, customized architectures, or alternative latent-space configurations to further increase the effectiveness of dataset distillation. Additionally, the visual coherence of the distilled images opens up interesting avenues in artistic and design applications.
Conclusion
The introduction of deep generative priors through GLaD marks a crucial development in dataset distillation methodology, addressing key limitations of generalization and scalability. By utilizing deep generative models, this approach advances the ability to distill datasets into flexible, coherent synthetic images fit for a wide range of architectures, unlocking new efficiencies and possibilities in machine learning workflows.