
DRoP: Distributionally Robust Data Pruning (2404.05579v4)

Published 8 Apr 2024 in cs.LG and cs.CV

Abstract: In the era of exceptionally data-hungry models, careful selection of the training data is essential to mitigate the extensive costs of deep learning. Data pruning offers a solution by removing redundant or uninformative samples from the dataset, which yields faster convergence and improved neural scaling laws. However, little is known about its impact on the classification bias of the trained models. We conduct the first systematic study of this effect and reveal that existing data pruning algorithms can produce highly biased classifiers. We present a theoretical analysis of the classification risk in a mixture of Gaussians to argue that choosing appropriate class pruning ratios, coupled with random pruning within classes, has the potential to improve worst-class performance. We thus propose DRoP, a distributionally robust approach to pruning, and empirically demonstrate its performance on standard computer vision benchmarks. In sharp contrast to existing algorithms, our proposed method continues improving distributional robustness at a tolerable drop of average performance as we prune more from the datasets.
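The recipe the abstract describes, per-class pruning ratios combined with uniform random pruning within each class, can be sketched as below. This is a minimal illustration of class-wise random subsampling, not the paper's exact DRoP rule for choosing the ratios; the `keep_ratios` values and the function name are placeholder assumptions.

```python
import numpy as np

def prune_per_class(labels, keep_ratios, seed=0):
    """Randomly subsample each class to its target keep ratio.

    labels: (N,) integer array of class labels.
    keep_ratios: dict mapping class label -> fraction of that class to keep.
    Returns the sorted indices of retained samples.
    """
    rng = np.random.default_rng(seed)
    kept = []
    for c, ratio in keep_ratios.items():
        idx = np.flatnonzero(labels == c)          # indices of class c
        n_keep = int(round(ratio * len(idx)))      # target class size
        kept.append(rng.choice(idx, size=n_keep, replace=False))
    return np.sort(np.concatenate(kept))

# Hypothetical usage: keep 20% of an easy class, 80% of a hard one.
labels = np.concatenate([np.zeros(100, dtype=int), np.ones(50, dtype=int)])
retained = prune_per_class(labels, {0: 0.2, 1: 0.8})
```

Pruning harder classes less aggressively, as in this sketch, is what allows the class distribution of the retained data to shift toward the worst-performing classes.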
