Not All Data Matters: An End-to-End Adaptive Dataset Pruning Framework for Enhancing Model Performance and Efficiency (2312.05599v1)

Published 9 Dec 2023 in cs.AI and cs.LG

Abstract: While deep neural networks have demonstrated remarkable performance across various tasks, they typically require massive amounts of training data. Due to the presence of redundancies and biases in real-world datasets, not all data in the training set contributes to model performance. To address this issue, dataset pruning techniques have been introduced to enhance model performance and efficiency by eliminating redundant training samples and reducing computational and memory overhead. However, most previous works rely on manually crafted scalar scores, limiting their practical performance and scalability across diverse deep networks and datasets. In this paper, we propose AdaPruner, an end-to-end Adaptive DAtaset PRUNing framEwoRk. AdaPruner can perform effective dataset pruning without the need for explicitly defined metrics. Our framework jointly prunes training data and fine-tunes models with task-specific optimization objectives. AdaPruner leverages (1) an adaptive dataset pruning (ADP) module, which iteratively prunes redundant samples to an expected pruning ratio; and (2) a pruning performance controller (PPC) module, which optimizes the model performance for accurate pruning. As a result, AdaPruner exhibits high scalability and compatibility across various datasets and deep networks, yielding improved dataset distribution and enhanced model performance. AdaPruner can still significantly enhance model performance even after pruning up to 10-30% of the training data. Notably, these improvements are accompanied by substantial savings in memory and computation costs. Qualitative and quantitative experiments suggest that AdaPruner outperforms other state-of-the-art dataset pruning methods by a large margin.
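For intuition about how such an end-to-end pruning loop could be organized, below is a minimal, heavily simplified PyTorch sketch: per-sample importance is modeled as a learnable score trained jointly with the network, and the lowest-scoring samples are dropped each round until a target keep ratio is reached. The score parameterization, the sparsity penalty, the index-returning dataset, and the per-round pruning schedule are assumptions made for illustration only; they are not the paper's actual ADP/PPC implementation.

```python
# Hypothetical sketch of an iterative, end-to-end dataset-pruning loop (PyTorch).
# The learnable per-sample scores, sparsity penalty, and 10%-per-round schedule
# are illustrative assumptions, not AdaPruner's actual ADP/PPC modules.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

def adaptive_prune(model, dataset, target_keep=0.8, rounds=5,
                   epochs_per_round=1, lr=1e-3, device="cpu"):
    """Jointly updates the model and per-sample importance scores, then drops
    the lowest-scoring samples each round until `target_keep` of the data remains."""
    model.to(device)
    keep_idx = torch.arange(len(dataset))                    # indices still in the training set
    scores = torch.zeros(len(dataset), requires_grad=True)   # learnable per-sample logits
    opt = torch.optim.Adam(list(model.parameters()) + [scores], lr=lr)

    for _ in range(rounds):
        loader = DataLoader(Subset(dataset, keep_idx.tolist()),
                            batch_size=128, shuffle=True)
        for _ in range(epochs_per_round):
            for x, y, idx in loader:          # dataset is assumed to return (x, y, sample_index)
                x, y = x.to(device), y.to(device)
                w = torch.sigmoid(scores[idx]).to(device)     # soft keep-probability per sample
                per_sample = F.cross_entropy(model(x), y, reduction="none")
                loss = (w * per_sample).mean() + 1e-3 * w.mean()  # task loss + mild sparsity pressure
                opt.zero_grad()
                loss.backward()
                opt.step()

        # Shrink the kept set toward the target ratio, pruning at most 10% per round.
        n_target = int(target_keep * len(dataset))
        n_keep = max(n_target, int(len(keep_idx) * 0.9))
        order = torch.argsort(scores.detach()[keep_idx], descending=True)
        keep_idx = keep_idx[order[:n_keep]]

    return keep_idx                                           # indices of the retained samples
```

The returned indices can then be used to build a pruned training set (e.g., `Subset(dataset, keep_idx.tolist())`) for the final fine-tuning run; the 10%-per-round cap is simply one way to prune gradually rather than in a single cut.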

Authors (5)
  1. Suorong Yang (13 papers)
  2. Hongchao Yang (1 paper)
  3. Suhan Guo (7 papers)
  4. Furao Shen (44 papers)
  5. Jian Zhao (218 papers)
Citations (2)
