Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning (2311.13613v3)

Published 22 Nov 2023 in cs.CV and cs.LG

Abstract: Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting in poor generalization across various pruning and cross-architecture scenarios. Recent studies have addressed this issue by expanding the scope of training dynamics considered, including factors such as forgetting events and probability changes, typically using an averaging approach. However, these works struggle to integrate a broader range of training dynamics without overlooking well-generalized samples, which averaging may not sufficiently highlight. In this study, we propose a novel dataset pruning method termed Temporal Dual-Depth Scoring (TDDS) to tackle this problem. TDDS utilizes a dual-depth strategy to achieve a balance between incorporating extensive training dynamics and identifying representative samples for dataset pruning. In the first depth, we estimate the series of each sample's individual contributions spanning the training progress, ensuring comprehensive integration of training dynamics. In the second depth, we focus on the variability of the sample-wise contributions identified in the first depth to highlight well-generalized samples. Extensive experiments conducted on CIFAR and ImageNet datasets verify the superiority of TDDS over previous SOTA methods. Specifically, on CIFAR-100, our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
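The abstract describes TDDS only at a high level, so the following is a minimal NumPy sketch of the dual-depth idea rather than the authors' implementation. It assumes per-sample training losses recorded at each epoch as a stand-in for the first-depth "individual contributions", and uses the sliding-window variance of those contributions as the second-depth score; the `window_size` parameter, the loss-difference proxy, and the `prune` helper are illustrative assumptions.

```python
import numpy as np

def tdds_scores(loss_history, window_size=10):
    """Sketch of a dual-depth score; NOT the authors' exact formulation.

    loss_history: array of shape (num_epochs, num_samples) holding each
    sample's training loss at every epoch (an assumed proxy for the
    per-sample contribution described in the paper).
    """
    # Depth 1: per-epoch contribution of each sample, approximated here
    # by the magnitude of its loss change between consecutive epochs.
    contributions = np.abs(np.diff(loss_history, axis=0))  # (E-1, N)

    num_steps, _ = contributions.shape
    if num_steps < window_size:
        # Fewer recorded steps than the window: fall back to full-run variance.
        return contributions.var(axis=0)

    # Depth 2: variability of those contributions over a sliding window,
    # averaged across training, to highlight samples whose contribution
    # fluctuates (assumed here to indicate well-generalized samples).
    window_vars = [
        contributions[start:start + window_size].var(axis=0)
        for start in range(num_steps - window_size + 1)
    ]
    return np.mean(window_vars, axis=0)  # higher score = more valuable

def prune(loss_history, keep_fraction=0.1):
    """Keep the top `keep_fraction` of samples by the sketch score above."""
    scores = tdds_scores(loss_history)
    num_keep = int(len(scores) * keep_fraction)
    return np.argsort(scores)[::-1][:num_keep]  # indices of the coreset
```

Under these assumptions, calling `prune(loss_history, keep_fraction=0.1)` would return the indices of a 10% coreset, mirroring the CIFAR-100 setting quoted in the abstract.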

Authors (5)
  1. Xin Zhang (904 papers)
  2. Jiawei Du (31 papers)
  3. Yunsong Li (41 papers)
  4. Weiying Xie (31 papers)
  5. Joey Tianyi Zhou (116 papers)
Citations (4)
