M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy (2312.15927v3)

Published 26 Dec 2023 in cs.CV and cs.LG

Abstract: Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs. To address these challenges, dataset condensation has been developed to learn a small synthetic set that preserves essential information from the original large-scale dataset. Optimization-oriented methods are currently the primary approach in dataset condensation for achieving SOTA results. However, their bi-level optimization process hinders practical application to realistic, larger datasets. To improve condensation efficiency, previous works proposed Distribution Matching (DM) as an alternative, which significantly reduces the condensation cost. Nonetheless, current DM-based methods still yield results that fall short of SOTA optimization-oriented methods. In this paper, we argue that existing DM-based methods overlook the higher-order alignment of the distributions, which may lead to sub-optimal matching results. Motivated by this, we present a novel DM-based method named M3D for dataset condensation by Minimizing the Maximum Mean Discrepancy (MMD) between feature representations of the synthetic and real images. By embedding their distributions in a reproducing kernel Hilbert space, we align all orders of moments of the real and synthetic image distributions, resulting in a more generalized condensed set. Notably, our method even surpasses the SOTA optimization-oriented method IDC on the high-resolution ImageNet dataset. Extensive analysis verifies the effectiveness of the proposed method. Source code is available at https://github.com/Hansong-Zhang/M3D.
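
For a concrete picture of the objective described in the abstract, the snippet below is a minimal, illustrative PyTorch sketch of an empirical squared-MMD loss between real and synthetic feature batches under a Gaussian (RBF) kernel. The kernel choice, bandwidth, feature dimensions, and random feature tensors are assumptions for illustration only; this is not the released M3D implementation (see the linked repository for that).

```python
# Illustrative sketch (not the authors' code) of the empirical MMD^2 objective
# that distribution-matching condensation of this kind minimizes: compare the
# kernel mean embeddings of real and synthetic feature batches in an RKHS.
import torch

def gaussian_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Pairwise RBF kernel matrix: k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * sigma^2)).
    dist_sq = torch.cdist(x, y, p=2).pow(2)
    return torch.exp(-dist_sq / (2 * sigma ** 2))

def mmd_squared(real_feats: torch.Tensor, syn_feats: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Biased empirical estimate of MMD^2 between the two feature distributions.
    k_rr = gaussian_kernel(real_feats, real_feats, sigma).mean()
    k_ss = gaussian_kernel(syn_feats, syn_feats, sigma).mean()
    k_rs = gaussian_kernel(real_feats, syn_feats, sigma).mean()
    return k_rr + k_ss - 2 * k_rs

# Toy usage: in practice the features would come from a feature extractor
# applied to real images and to the learnable synthetic images of one class;
# random tensors stand in for them here.
real_feats = torch.randn(128, 512)                     # batch of real-image features
syn_feats = torch.randn(10, 512, requires_grad=True)   # learnable synthetic features (illustration)
loss = mmd_squared(real_feats, syn_feats)
loss.backward()                                        # gradients flow to the synthetic set
```

Because the RBF kernel's expansion contains polynomial terms of every order, driving this kernel-mean discrepancy to zero aligns all moments of the two feature distributions, which is the higher-order alignment the abstract refers to, in contrast to matching first-order statistics (means) alone.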

References (70)
  1. Unifying divergence minimization and statistical inference via convex duality. In COLT, 139–153.
  2. Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics, 22(14): e49–e57.
  3. Dataset Distillation by Matching Training Trajectories. In CVPR, 4750–4759.
  4. Generalizing Dataset Distillation via Deep Generative Prior. In CVPR, 3739–3748.
  5. A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness. arXiv.
  6. Selection via proxy: Efficient data selection for deep learning. arXiv.
  7. DC-BENCH: Dataset condensation benchmark. NeurIPS, 35: 810–822.
  8. Imagenet: A large-scale hierarchical image database. In CVPR, 248–255.
  9. Remember the Past: Distilling Datasets into Addressable Memories for Neural Networks. In NeurIPS.
  10. Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation. In CVPR.
  11. Facility location: concepts, models, algorithms and case studies. Springer Science & Business Media.
  12. Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces. JMLR, 5(Jan): 73–99.
  13. Federated Learning via Synthetic Data. arXiv.
  14. Deep residual learning for image recognition. In CVPR, 770–778.
  15. Densely connected convolutional networks. In CVPR, 4700–4708.
  16. Neural tangent kernel: Convergence and generalization in neural networks. NeurIPS, 31.
  17. Delving into Effective Gradient Matching for Dataset Condensation. arXiv.
  18. Condensing graphs via one-step gradient matching. In KDD, 720–730.
  19. Dataset Condensation via Efficient Synthetic-Data Parameterization. In ICML, 11102–11118.
  20. Optimal continual learning has perfect memory and is np-hard. In ICML, 5327–5337.
  21. Learning multiple layers of features from tiny images.
  22. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11): 2278–2324.
  23. Dataset condensation with contrastive signals. In ICML, 12352–12364.
  24. A comprehensive survey to dataset distillation. arXiv.
  25. Trustable Co-label Learning from Multiple Noisy Annotators. IEEE TMM, 25: 1045–1057.
  26. Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds. arXiv.
  27. Selective-Supervised Contrastive Learning with Noisy Labels. In CVPR, 316–325.
  28. Estimating Noise Transition Matrix with Label Correlations for Noisy Multi-Label Learning. In NeurIPS.
  29. Investigating bi-level optimization for learning and vision from a unified perspective: A survey and beyond. IEEE TPAMI, 44(12): 10045–10067.
  30. Dataset distillation via factorization. NeurIPS, 35: 1100–1113.
  31. Few-Shot Dataset Distillation via Translative Pre-Training. In ICCV, 18654–18664.
  32. Slimmable dataset condensation. In CVPR, 3759–3768.
  33. DREAM: Efficient Dataset Distillation by Representative Matching. In ICCV.
  34. Efficient dataset distillation using random feature approximation. NeurIPS, 35: 13877–13891.
  35. Dataset Distillation with Convexified Implicit Gradients. In ICML, 22649–22674.
  36. Reducing Catastrophic Forgetting with Learning on Synthetic Data. In CVPR Workshop.
  37. Learning to Generate Synthetic Training Data using Gradient Matching and Implicit Differentiation. In AIST, 138–150.
  38. Kernel mean embedding of distributions: A review and beyond. Found. Trends Mach. Learn., 10(1-2): 1–141.
  39. Reading digits in natural images with unsupervised feature learning.
  40. Dataset meta-learning from kernel ridge-regression. arXiv.
  41. Dataset Distillation with Infinitely Wide Convolutional Networks. In NeurIPS, 5186–5198.
  42. Adaptive second order coresets for data-efficient machine learning. In ICML, 17848–17869.
  43. Empirical analysis of the hessian of over-parametrized neural networks. arXiv.
  44. DataDAM: Efficient Dataset Distillation with Attention Matching. In ICCV, 17097–17107.
  45. Active learning for convolutional neural networks: A core-set approach. arXiv.
  46. A Hilbert space embedding for distributions. In ALT, 13–31.
  47. Beyond neural scaling laws: beating power law scaling via data pruning. NeurIPS, 35: 19523–19536.
  48. Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data. In ICML, 9206–9216.
  49. An empirical study of example forgetting during deep neural network learning. arXiv.
  50. Gradient Matching for Categorical Data Distillation in CTR Prediction. In RecSys ’23, 161–170. New York, NY, USA: Association for Computing Machinery. ISBN 9798400702419.
  51. CAFE: Learning to Condense Dataset by Aligning Features. In CVPR, 12196–12205.
  52. Dataset Distillation. arXiv.
  53. Welling, M. 2009. Herding dynamical weights to learn. In ICML, 1121–1128.
  54. Condensed Composite Memory Continual Learning. In IJCNN, 1–8.
  55. Moderate coreset: A universal method of data selection for real-world data-efficient deep learning. In ICLR.
  56. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv.
  57. Feddm: Iterative distribution matching for communication-efficient federated learning. In CVPR, 16323–16332.
  58. Dataset pruning: Reducing training data by examining generalization influence. arXiv.
  59. Nas-bench-101: Towards reproducible neural architecture search. In ICML, 7105–7114.
  60. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In CVPR, 2636–2645.
  61. Dataset distillation: A comprehensive review. arXiv.
  62. Coupled Confusion Correction: Learning from Crowds with Sparse Annotations. AAAI.
  63. Accelerating Dataset Distillation via Model Augmentation. In CVPR.
  64. Dataset condensation with Differentiable Siamese Augmentation. In ICML, 12674–12685.
  65. Dataset Condensation with Gradient Matching. In ICLR.
  66. Dataset Condensation with Distribution Matching. In WACV.
  67. Improved distribution matching for dataset condensation. In CVPR, 7856–7865.
  68. Coverage-centric Coreset Selection for High Pruning Rates. arXiv.
  69. Dataset quantization. In ICCV, 17205–17216.
  70. Dataset Distillation using Neural Feature Regression. In NeurIPS.