
On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm (2312.03526v2)

Published 6 Dec 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Contemporary machine learning requires training large neural networks on massive datasets and thus faces the challenge of high computational demands. Dataset distillation, a recently emerging strategy, aims to compress real-world datasets for efficient training. However, this line of research currently struggles with large-scale and high-resolution datasets, hindering its practicality and feasibility. To this end, we re-examine existing dataset distillation methods and identify three properties required for large-scale real-world applications, namely realism, diversity, and efficiency. As a remedy, we propose RDED, a novel computationally efficient yet effective data distillation paradigm, to enable both diversity and realism of the distilled data. Extensive empirical results over various neural architectures and datasets demonstrate the advancement of RDED: we can distill the full ImageNet-1K to a small dataset comprising 10 images per class within 7 minutes, achieving a notable 42% top-1 accuracy with ResNet-18 on a single RTX-4090 GPU (while the SOTA only achieves 21% but requires 6 hours).
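The headline numbers above follow the usual dataset-distillation evaluation protocol: train a network from scratch using only the distilled images (here, 10 per class) and report top-1 accuracy on the real validation split. Below is a minimal sketch of that protocol, not the authors' RDED implementation; `distilled_images`, `distilled_labels`, and `val_loader` are assumed to come from an existing distillation pipeline, and the hyperparameters are illustrative.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18


def evaluate_distilled(distilled_images, distilled_labels, val_loader,
                       num_classes=1000, epochs=300, lr=0.01, device="cuda"):
    """Train ResNet-18 from scratch on the distilled set, then measure
    top-1 accuracy on the real validation set (assumed protocol sketch)."""
    model = resnet18(num_classes=num_classes).to(device)
    train_loader = DataLoader(TensorDataset(distilled_images, distilled_labels),
                              batch_size=64, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()

    # Top-1 accuracy on the full (real) validation split.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            pred = model(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.numel()
    return correct / total
```

Because the distilled set is tiny (10,000 images for ImageNet-1K), the training loop itself is cheap; the cost reported in the abstract refers to producing the distilled data, not to this evaluation step.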

Authors (4)
  1. Peng Sun (210 papers)
  2. Bei Shi (10 papers)
  3. Daiwei Yu (4 papers)
  4. Tao Lin (167 papers)
Citations (20)
