On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm (2312.03526v2)
Abstract: Contemporary machine learning requires training large neural networks on massive datasets and thus faces high computational demands. Dataset distillation, a recently emerging strategy, aims to compress real-world datasets into much smaller ones for efficient training. However, this line of research currently struggles with large-scale, high-resolution datasets, which limits its practicality. We therefore re-examine existing dataset distillation methods and identify three properties required for large-scale real-world applications: realism, diversity, and efficiency. As a remedy, we propose RDED, a novel computationally efficient yet effective dataset distillation paradigm that enables both diversity and realism in the distilled data. Extensive empirical results across various neural architectures and datasets demonstrate the advantages of RDED: we distill the full ImageNet-1K down to a small dataset of 10 images per class within 7 minutes, achieving a notable 42% top-1 accuracy with ResNet-18 on a single RTX-4090 GPU (whereas the prior state of the art reaches only 21% and requires 6 hours).
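For context, accuracy numbers like the 42% reported above are obtained with the standard dataset-distillation evaluation protocol: train a fresh model from scratch on only the distilled images, then measure its top-1 accuracy on the real validation split. The sketch below illustrates that protocol in PyTorch; the tensor file names, ImageNet path, epoch budget, and optimizer settings are placeholders chosen for illustration, and it omits any steps specific to how RDED itself constructs or relabels the distilled images.

```python
# Minimal sketch of the standard distilled-dataset evaluation protocol
# (assumed setup; paths and hyperparameters are illustrative placeholders).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms
from torchvision.models import resnet18

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical distilled set: 10 images per class for 1,000 classes (IPC = 10).
images = torch.load("distilled_images.pt")   # shape [10_000, 3, 224, 224]
labels = torch.load("distilled_labels.pt")   # shape [10_000]
train_loader = DataLoader(TensorDataset(images, labels),
                          batch_size=64, shuffle=True)

# Real ImageNet-1K validation split (directory path is a placeholder).
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
val_loader = DataLoader(datasets.ImageFolder("imagenet/val", transform=val_tf),
                        batch_size=256, num_workers=8)

model = resnet18(num_classes=1000).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

# Train the student from scratch on the tiny distilled set.
model.train()
for epoch in range(300):  # illustrative epoch budget, not the paper's setting
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()

# Report top-1 accuracy on the full real validation set.
model.eval()
correct = total = 0
with torch.no_grad():
    for x, y in val_loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
print(f"top-1 accuracy: {correct / total:.1%}")
```

The key design point of this protocol is that the distilled images are the only training signal: the validation data is never used for training, so the reported accuracy directly measures how much useful information the tiny distilled set retains about the original dataset.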