On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm (2312.03526v2)
Abstract: Contemporary machine learning requires training large neural networks on massive datasets and thus faces high computational demands. Dataset distillation, a recently emerging strategy, aims to compress real-world datasets into much smaller ones for efficient training. However, this line of research currently struggles with large-scale, high-resolution datasets, which limits its practicality. We therefore re-examine existing dataset distillation methods and identify three properties required for large-scale real-world applications: realism, diversity, and efficiency. As a remedy, we propose RDED, a novel computationally efficient yet effective dataset distillation paradigm that enables both diversity and realism in the distilled data. Extensive empirical results across various neural architectures and datasets demonstrate the advantages of RDED: we distill the full ImageNet-1K down to a small dataset of 10 images per class within 7 minutes, achieving a notable 42% top-1 accuracy with ResNet-18 on a single RTX-4090 GPU (whereas the prior state of the art reaches only 21% and requires 6 hours).
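For context, accuracy numbers like the 42% reported above are obtained with the standard dataset-distillation evaluation protocol: train a fresh model from scratch on only the distilled images, then measure its top-1 accuracy on the real validation split. The sketch below illustrates that protocol in PyTorch; the tensor file names, ImageNet path, epoch budget, and optimizer settings are placeholders chosen for illustration, and it omits any steps specific to how RDED itself constructs or relabels the distilled images.

```python
# Minimal sketch of the standard distilled-dataset evaluation protocol
# (assumed setup; paths and hyperparameters are illustrative placeholders).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical distilled set: 10 images per class for 1,000 classes (IPC = 10).
images = torch.load("distilled_images.pt")   # shape [10_000, 3, 224, 224]
labels = torch.load("distilled_labels.pt")   # shape [10_000]
train_loader = DataLoader(TensorDataset(images, labels),
                          batch_size=64, shuffle=True)

# Real ImageNet-1K validation split (directory path is a placeholder).
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
val_loader = DataLoader(datasets.ImageFolder("imagenet/val", transform=val_tf),
                        batch_size=256, num_workers=8)

model = resnet18(num_classes=1000).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

# Train the student from scratch on the tiny distilled set.
model.train()
for epoch in range(300):  # illustrative epoch budget, not the paper's setting
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()

# Report top-1 accuracy on the full real validation set.
model.eval()
correct = total = 0
with torch.no_grad():
    for x, y in val_loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
print(f"top-1 accuracy: {correct / total:.1%}")
```

The key design point of this protocol is that the distilled images are the only training signal: the validation data is never used for training, so the reported accuracy directly measures how much useful information the tiny distilled set retains about the original dataset.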