Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality (2310.06982v1)
Abstract: Dataset distillation aims to minimize the time and memory needed for training deep networks on large datasets by creating a small set of synthetic images that yields generalization performance similar to that of the full dataset. However, current dataset distillation techniques fall short, showing a notable performance gap compared to training on the original data. In this work, we are the first to argue that using just one synthetic subset for distillation will not yield optimal generalization performance. This is because the training dynamics of deep networks change drastically over the course of training; hence, multiple synthetic subsets are required to capture the dynamics at different phases of training. To address this issue, we propose Progressive Dataset Distillation (PDD). PDD synthesizes multiple small sets of synthetic images, each conditioned on the previous sets, and trains the model on the cumulative union of these subsets without requiring additional training time. Our extensive experiments show that PDD can improve the performance of existing dataset distillation methods by up to 4.3%. In addition, our method enables, for the first time, the generation of considerably larger synthetic datasets.
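The abstract describes a multi-stage loop: each stage distills a new small synthetic subset conditioned on a model trained on the union of the previously distilled subsets, and the final model is trained on the cumulative union. The sketch below (Python/PyTorch) only illustrates that control flow under our own assumptions: the inner `distill_subset` step uses a simple per-class feature-mean matching objective as a stand-in for whichever base distillation method PDD wraps, and the model, hyperparameters, and toy data are hypothetical placeholders, not the paper's implementation.

```python
# Minimal sketch of a progressive (multi-stage) distillation loop as described in
# the abstract. NOT the authors' implementation: the inner distillation objective,
# model, and hyperparameters below are illustrative stand-ins.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model(num_classes=10):
    # Tiny ConvNet stand-in for the networks typically used in distillation work.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        nn.Linear(32 * 16, num_classes),
    )

def train_on(model, images, labels, steps=100, lr=1e-2):
    # Ordinary supervised training on the cumulative synthetic set.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(images), labels).backward()
        opt.step()
    return model

def distill_subset(model, real_images, real_labels, ipc=1, steps=200, lr=0.1):
    # Stand-in distillation step: optimize synthetic images so the current model's
    # per-class mean features match those of the real data (distribution-matching style).
    classes = real_labels.unique()
    syn_x = torch.randn(len(classes) * ipc, *real_images.shape[1:], requires_grad=True)
    syn_y = classes.repeat_interleave(ipc)
    opt = torch.optim.Adam([syn_x], lr=lr)
    feats = nn.Sequential(*list(model.children())[:-1])  # drop the classifier head
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for c in classes:
            real_mu = feats(real_images[real_labels == c]).mean(0).detach()
            syn_mu = feats(syn_x[syn_y == c]).mean(0)
            loss = loss + F.mse_loss(syn_mu, real_mu)
        loss.backward()
        opt.step()
    return syn_x.detach(), syn_y

def progressive_distillation(real_images, real_labels, num_stages=3, ipc=1):
    # Outer loop: each stage synthesizes a new small subset conditioned on a model
    # trained on the union of all previously synthesized subsets.
    model = make_model(num_classes=int(real_labels.max()) + 1)
    syn_sets = []
    for _ in range(num_stages):
        if syn_sets:
            xs = torch.cat([x for x, _ in syn_sets])
            ys = torch.cat([y for _, y in syn_sets])
            model = train_on(model, xs, ys)  # condition on earlier subsets
        syn_sets.append(distill_subset(model, real_images, real_labels, ipc=ipc))
    return syn_sets  # final models train on the cumulative union of these subsets

if __name__ == "__main__":
    # Toy random "dataset" just to exercise the control flow end to end.
    x = torch.randn(256, 3, 32, 32)
    y = torch.randint(0, 10, (256,))
    subsets = progressive_distillation(x, y, num_stages=2, ipc=1)
    print([s[0].shape for s in subsets])
```

The structural point the sketch captures is that each stage's synthesis sees network weights already shaped by the earlier subsets, so later subsets can target later phases of training without adding to the final training budget.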
- Xuxi Chen
- Yu Yang
- Zhangyang Wang
- Baharan Mirzasoleiman