M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy (2312.15927v3)
Abstract: Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs. To address these challenges, dataset condensation has been developed to learn a small synthetic set that preserves the essential information of the original large-scale dataset. Optimization-oriented methods are currently the dominant approach to dataset condensation and achieve SOTA results, but their bi-level optimization hinders practical application to realistic, larger datasets. To improve condensation efficiency, previous works proposed Distribution Matching (DM) as an alternative, which significantly reduces the condensation cost. Nonetheless, current DM-based methods still lag behind SOTA optimization-oriented methods. In this paper, we argue that existing DM-based methods overlook the alignment of higher-order moments of the distributions, which can lead to sub-optimal matching. Motivated by this, we present a novel DM-based method, M3D, which condenses the dataset by Minimizing the Maximum Mean Discrepancy between the feature representations of the synthetic and real images. By embedding their distributions in a reproducing kernel Hilbert space, we align all orders of moments of the real and synthetic distributions, yielding a more generalizable condensed set. Notably, our method even surpasses the SOTA optimization-oriented method IDC on the high-resolution ImageNet dataset. Extensive analysis is conducted to verify the effectiveness of the proposed method. Source code is available at https://github.com/Hansong-Zhang/M3D.
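To make the matching objective concrete, the sketch below shows a biased empirical estimate of the squared Maximum Mean Discrepancy with a Gaussian (characteristic) kernel in PyTorch. It is illustrative only: the function names, the fixed kernel bandwidth, and the use of directly learnable synthetic features (rather than features extracted from synthetic images by an encoder) are simplifying assumptions and do not reproduce the authors' released implementation.

```python
import torch


def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of x and y,
    # mapped through the Gaussian kernel exp(-d^2 / (2 sigma^2)).
    dist_sq = torch.cdist(x, y, p=2).pow(2)
    return torch.exp(-dist_sq / (2 * sigma ** 2))


def mmd_squared(real_feat, syn_feat, sigma=1.0):
    """Biased empirical estimate of squared MMD between two feature sets.

    With a characteristic kernel such as the Gaussian, driving this
    quantity to zero aligns all orders of moments of the two
    distributions in the induced RKHS.
    """
    k_rr = gaussian_kernel(real_feat, real_feat, sigma).mean()
    k_ss = gaussian_kernel(syn_feat, syn_feat, sigma).mean()
    k_rs = gaussian_kernel(real_feat, syn_feat, sigma).mean()
    return k_rr + k_ss - 2 * k_rs


if __name__ == "__main__":
    # Hypothetical usage: real features for one class vs. a small
    # learnable synthetic set (in practice, features come from a network).
    real_feat = torch.randn(128, 512)
    syn_feat = torch.randn(10, 512, requires_grad=True)
    loss = mmd_squared(real_feat, syn_feat, sigma=5.0)
    loss.backward()  # gradients flow back to the synthetic set
```

In a full condensation pipeline this loss would be computed on features of real and synthetic images per class and minimized with respect to the synthetic images themselves.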