Generalizing Dataset Distillation via Deep Generative Prior
Overview
The paper "Generalizing Dataset Distillation via Deep Generative Prior" addresses significant limitations within existing methods of dataset distillation. Dataset distillation is a technique aimed at condensing the knowledge of a large dataset into a smaller set of synthetic images, which can approximate the learning results of models trained on the full dataset. Despite the appeal of this approach, previous distillation methods face challenges in generalizing across different architectures and scaling effectively to high-resolution datasets. This work proposes leveraging deep generative models to overcome these challenges, introducing Generative Latent Distillation (GLaD).
Methodology
GLaD uses a deep generative prior, in the form of a pretrained generative model such as a GAN, to synthesize the distilled dataset. Instead of directly optimizing synthetic pixel values, the method optimizes latent feature vectors that the generative model decodes into images. By distilling into the latent space of the generator, GLaD imposes a coherence and regularization on the synthetic images that benefits cross-architecture generalization.
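As a rough illustration of this idea, the sketch below optimizes a small set of latent codes through a frozen, pretrained generator. The `generator` and `distillation_loss` arguments, the latent dimensionality, and the placeholder class labels are all assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch: optimize latent codes through a frozen pretrained generator
# instead of raw pixels. `generator` and `distillation_loss` are placeholders
# for any pretrained decoder and any distillation objective.
import torch

def distill_into_latents(generator, distillation_loss, num_images, latent_dim,
                         steps=1000, lr=0.01, device="cuda"):
    generator.eval().requires_grad_(False)            # the generative prior stays frozen
    latents = torch.randn(num_images, latent_dim,     # one latent code per synthetic image
                          device=device, requires_grad=True)
    labels = torch.arange(num_images, device=device) % 10  # placeholder labels (assumes 10 classes)
    opt = torch.optim.Adam([latents], lr=lr)

    for _ in range(steps):
        synthetic_images = generator(latents)                 # decode latents to images
        loss = distillation_loss(synthetic_images, labels)    # any matching objective
        opt.zero_grad()
        loss.backward()                                       # gradients flow through the generator into the latents
        opt.step()

    # The final distilled set is the decoded images, not the latent codes themselves.
    with torch.no_grad():
        return generator(latents).detach().cpu()
```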
The method integrates with existing distillation objectives such as Gradient Matching (DC), Distribution Matching (DM), and Trajectory Matching (MTT). In each case, latent vectors replace raw pixels as the optimization target, with dataset synthesis occurring in an intermediate feature space of the generative model. This alleviates the tendency of previous methods to overfit to the particular architecture used during distillation and enables effective distillation at high resolutions.
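To make the integration concrete, here is a hedged sketch of a gradient-matching (DC-style) loss in which the synthetic images are decoded from latent codes rather than stored as pixels. The `student` network, the per-layer cosine-distance matching, and the argument names are simplifying assumptions; the actual DC, DM, and MTT objectives involve additional details.

```python
# Hedged sketch of a gradient-matching loss with a latent parameterization.
# The caller backpropagates the returned loss into the latent codes.
import torch
import torch.nn.functional as F

def gradient_match_loss(student, generator, latents, syn_labels,
                        real_images, real_labels):
    # Decode the current latents into synthetic images (differentiable w.r.t. latents).
    syn_images = generator(latents)

    # Gradients of the classification loss on real data act as the matching target.
    real_loss = F.cross_entropy(student(real_images), real_labels)
    g_real = torch.autograd.grad(real_loss, list(student.parameters()))

    # Gradients on synthetic data; keep the graph so the loss can reach the latents.
    syn_loss = F.cross_entropy(student(syn_images), syn_labels)
    g_syn = torch.autograd.grad(syn_loss, list(student.parameters()),
                                create_graph=True)

    # Match per-layer gradient directions (cosine distance is one common choice).
    return sum(1 - F.cosine_similarity(gs.flatten(), gr.detach().flatten(), dim=0)
               for gs, gr in zip(g_syn, g_real))
```

Because the synthetic-gradient graph is retained with `create_graph=True`, backpropagating this loss reaches the latent codes through both the student network and the generator.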
Numerical Results
The paper presents extensive empirical results demonstrating the improved generalization and scalability of GLaD across multiple datasets and distillation algorithms. It highlights significant gains in cross-architecture evaluations, where datasets distilled with a generative prior generalize better to diverse architectures such as AlexNet, VGG11, ResNet18, and Vision Transformers.
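A typical cross-architecture protocol trains each unseen architecture from scratch on the distilled images and measures accuracy on the real test set. The sketch below illustrates such a loop; the model constructors, optimizer, and hyperparameters are placeholders, not the paper's exact evaluation settings.

```python
# Illustrative cross-architecture evaluation: train each unseen architecture from
# scratch on the (tiny) distilled set, then report accuracy on the real test split.
import torch
import torch.nn.functional as F

def evaluate_distilled(distilled_images, distilled_labels, make_models, test_loader,
                       epochs=300, lr=0.01, device="cuda"):
    results = {}
    x_syn = distilled_images.to(device)
    y_syn = distilled_labels.to(device)
    for name, make_model in make_models.items():   # e.g. {"AlexNet": ..., "ViT": ...}
        model = make_model().to(device)
        opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        for _ in range(epochs):                    # full-batch training on the distilled set
            loss = F.cross_entropy(model(x_syn), y_syn)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Accuracy on the real test split measures cross-architecture generalization.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in test_loader:
                pred = model(x.to(device)).argmax(dim=1)
                correct += (pred == y.to(device)).sum().item()
                total += y.numel()
        results[name] = correct / total
    return results
```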
Experiments on CIFAR-10 and subsets of ImageNet at resolutions up to 512×512 show that GLaD produces noticeably cleaner distilled images, avoiding the high-frequency noise that often dominates pixel-space distillation. For example, on selected ImageNet subsets, the approach raised accuracy on unseen architectures from 26.8% to 28.0%, demonstrating improved generalization.
Implications and Future Directions
The success of GLaD suggests the strong potential of generative priors in dataset distillation, paving the way for future research on incorporating various generative models. The results imply practical applications in scenarios requiring cross-architecture adaptability and high-resolution synthetic data generation, such as neural architecture search, federated learning, and privacy-preserving model training.
Future research might explore the integration of more advanced generative models, customized architectures, or alternative latent-space configurations to further increase the effectiveness of dataset distillation. Additionally, the visual coherence of the distilled images opens up interesting avenues in artistic and design applications.
Conclusion
The introduction of deep generative priors through GLaD marks a crucial development in dataset distillation methodology, addressing key limitations of generalization and scalability. By utilizing deep generative models, this approach advances the ability to distill datasets into flexible, coherent synthetic images fit for a wide range of architectures, unlocking new efficiencies and possibilities in machine learning workflows.