Distributional Dataset Distillation with Subtask Decomposition (2403.00999v1)
Abstract: What does a neural network learn when training from a task-specific dataset? Synthesizing this knowledge is the central idea behind Dataset Distillation, which recent work has shown can be used to compress large datasets into a small set of input-label pairs ($\textit{prototypes}$) that capture essential aspects of the original dataset. In this paper, we make the key observation that existing methods that distill into explicit prototypes are often suboptimal, incurring an unexpected storage cost from distilled labels. In response, we propose $\textit{Distributional Dataset Distillation}$ (D3), which encodes the data using minimal sufficient per-class statistics paired with a decoder, distilling the dataset into a compact distributional representation that is more memory-efficient than prototype-based methods. To scale up the process of learning these representations, we propose $\textit{Federated distillation}$, which decomposes the dataset into subsets, distills them in parallel using sub-task experts, and then re-aggregates them. We thoroughly evaluate our algorithm on a three-dimensional metric and show that our method achieves state-of-the-art results on TinyImageNet and ImageNet-1K. Specifically, we outperform the prior art by $6.9\%$ on ImageNet-1K under a storage budget of 2 images per class.
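To make the distributional representation concrete, the sketch below stores each class as the parameters of a latent Gaussian (its "minimal sufficient per-class statistics") together with one shared decoder, and draws synthetic training images by sampling and decoding. This is a minimal sketch under stated assumptions: the class names, tensor sizes, and decoder architecture are illustrative, not the paper's released implementation, and the distillation objective that would actually fit these parameters to the real dataset is omitted.

```python
# Minimal PyTorch sketch of a distributional distillate:
# per-class Gaussian latent statistics + one shared decoder.
# Sizes and architecture are illustrative assumptions only.
import torch
import torch.nn as nn


class DistributionalDistillate(nn.Module):
    def __init__(self, num_classes: int, latent_dim: int = 64, img_channels: int = 3):
        super().__init__()
        # Per-class statistics: mean and log-std of a latent Gaussian.
        self.mu = nn.Parameter(torch.zeros(num_classes, latent_dim))
        self.log_sigma = nn.Parameter(torch.zeros(num_classes, latent_dim))
        # One small shared decoder maps latent samples to synthetic images.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 4 * 4),
            nn.Unflatten(1, (128, 4, 4)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, img_channels, 4, stride=2, padding=1), nn.Tanh(),
        )

    def sample(self, labels: torch.Tensor) -> torch.Tensor:
        """Draw synthetic images for the given class labels (reparameterization trick)."""
        mu = self.mu[labels]
        sigma = self.log_sigma[labels].exp()
        z = mu + sigma * torch.randn_like(mu)
        return self.decoder(z)


# Usage: sample a synthetic batch for classes 0..9 and feed it to a student model.
# The statistics and decoder would be optimized by a distillation loss (not shown).
distillate = DistributionalDistillate(num_classes=10)
labels = torch.arange(10)
synthetic_images = distillate.sample(labels)  # shape (10, 3, 32, 32)
```

Under this sketch, the paper's federated distillation step would correspond to training separate distillates of this form on disjoint class subsets in parallel (one per sub-task expert) and then re-aggregating their per-class statistics into a single representation.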
- Tian Qin (27 papers)
- Zhiwei Deng (33 papers)
- David Alvarez-Melis (48 papers)