Efficient Architecture Search via Bi-level Data Pruning (2312.14200v1)

Published 21 Dec 2023 in cs.CV

Abstract: Improving the efficiency of Neural Architecture Search (NAS) is a challenging but significant task that has received much attention. Previous works mainly adopted the Differentiable Architecture Search (DARTS) and improved its search strategies or modules to enhance search efficiency. Recently, some methods have started considering data reduction for speedup, but they are not tightly coupled with the architecture search process, resulting in sub-optimal performance. To this end, this work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization, and then proposes a novel Bi-level Data Pruning (BDP) paradigm that targets the weights and architecture levels of DARTS to enhance efficiency from a data perspective. Specifically, we introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric, to gradually prune unsuitable samples for DARTS during the search. An effective automatic class balance constraint is also integrated into BDP, to suppress potential class imbalances resulting from data-efficient algorithms. Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50% while achieving superior performance when applied to baseline DARTS. Besides, we demonstrate that BDP can harmoniously integrate with advanced DARTS variants, like PC-DARTS and β-DARTS, offering an approximately 2 times speedup with minimal performance compromises.
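
The abstract sketches how BDP operates: progressive pruning of training samples driven by supernet prediction dynamics, combined with an automatic class-balance constraint. As a rough illustration only (not the paper's implementation), the minimal Python sketch below scores samples by prediction flips and current misclassification as a stand-in for the dynamics metric, and enforces balance with a simple per-class quota; the function name, scoring rule, and keep ratio are all hypothetical assumptions.

```python
import numpy as np


def prune_step(prev_preds, curr_preds, labels, keep_ratio=0.5):
    """Keep the most 'dynamic' samples under a simple proxy for supernet
    prediction dynamics, with a per-class quota as a crude class-balance
    constraint. Both choices are illustrative assumptions, not the paper's
    exact formulation."""
    changed = (prev_preds != curr_preds).astype(float)  # prediction flipped between checkpoints
    wrong = (curr_preds != labels).astype(float)        # still misclassified by the supernet
    score = changed + wrong                             # higher score = more informative sample

    classes = np.unique(labels)
    per_class = max(1, int(keep_ratio * len(labels) / len(classes)))
    keep_idx = []
    for c in classes:
        idx = np.where(labels == c)[0]
        top = idx[np.argsort(-score[idx])[:per_class]]  # best-scoring samples within this class
        keep_idx.extend(top.tolist())
    return np.array(sorted(keep_idx))


# Toy usage: prune half of a fake 10-class dataset after one search epoch.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)
prev_preds = rng.integers(0, 10, size=1000)
curr_preds = rng.integers(0, 10, size=1000)
kept = prune_step(prev_preds, curr_preds, labels, keep_ratio=0.5)
print(f"kept {len(kept)} of {len(labels)} samples")
```

In the actual BDP paradigm this kind of pruning would be applied progressively during the DARTS search, shrinking the data used for both the weight and architecture update levels.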

References (58)
  1. Nlu on data diets: Dynamic data subset selection for nlp classification tasks. arXiv preprint arXiv:2306.03208, 2023.
  2. Proxylessnas: Direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332, 2018.
  3. Once-for-all: Train one network and specialize it for efficient deployment. arXiv preprint arXiv:1908.09791, 2019.
  4. Searching for efficient multi-scale architectures for dense image prediction. Advances in neural information processing systems, 31, 2018.
  5. Stabilizing differentiable architecture search via perturbation-based regularization. In International conference on machine learning, pages 1554–1565. PMLR, 2020.
  6. Progressive differentiable architecture search: Bridging the depth gap between search and evaluation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 1294–1303, 2019.
  7. Noisy differentiable architecture search. arXiv preprint arXiv:2005.03566, 2020.
  8. Darts-: robustly stepping out of performance collapse without indicators. arXiv preprint arXiv:2009.01027, 2020a.
  9. Moga: Searching beyond mobilenetv3. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4042–4046. IEEE, 2020b.
  10. Fair darts: Eliminating unfair advantages in differentiable architecture search. In European conference on computer vision, pages 465–480. Springer, 2020c.
  11. Searching for a robust neural architecture in four gpu hours. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1761–1770, 2019.
  12. Nas-bench-201: Extending the scope of reproducible neural architecture search. arXiv preprint arXiv:2001.00326, 2020.
  13. Spinenet: Learning scale-permuted backbone for recognition and localization. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11592–11601, 2020.
  14. Bert on a data diet: Finding important examples by gradient-based pruning. arXiv preprint arXiv:2211.05610, 2022.
  15. What neural networks memorize and why: Discovering the long tail via influence estimation. Advances in Neural Information Processing Systems, 33:2881–2891, 2020.
  16. Dots: Decoupling operation and topology in differentiable architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12311–12320, 2021.
  17. Generalized global ranking-aware neural architecture ranker for efficient image classifier search. In Proceedings of the 30th ACM International Conference on Multimedia, pages 3730–3741, 2022.
  18. On coresets for k-means and k-median clustering. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 291–300, 2004.
  19. Large-scale dataset pruning with dynamic uncertainty. arXiv preprint arXiv:2306.05175, 2023.
  20. Dsnas: Direct neural architecture search without parameter retraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12084–12092, 2020.
  21. Grad-match: Gradient matching based data subset selection for efficient deep model training. In International Conference on Machine Learning, pages 5464–5474. PMLR, 2021a.
  22. Glister: Generalization based data subset selection for efficient and robust learning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 8110–8118, 2021b.
  23. Mobilenetv3. Convolutional Neural Networks with Swift for Tensorflow: Image Recognition and Dataset Categorization, pages 125–144, 2021.
  24. Block-wisely supervised neural architecture search with knowledge distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1989–1998, 2020a.
  25. Adapting neural architectures between domains. Advances in neural information processing systems, 33:789–798, 2020b.
  26. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018.
  27. Automatic loss function search for adversarial unsupervised domain adaptation. IEEE Transactions on Circuits and Systems for Video Technology, 2023.
  28. Coresets for data-efficient training of machine learning models. In International Conference on Machine Learning, pages 6950–6960. PMLR, 2020.
  29. Accelerating neural architecture search via proxy data. arXiv preprint arXiv:2106.04784, 2021.
  30. Deep learning on a data diet: Finding important examples early in training. Advances in Neural Information Processing Systems, 34:20596–20607, 2021.
  31. Speeding up nas with adaptive subset selection. In First Conference on Automated Machine Learning (Late-Breaking Workshop), 2022.
  32. Aging evolution for image classifier architecture search. In AAAI conference on artificial intelligence, page 2, 2019a.
  33. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 4780–4789, 2019b.
  34. Efficient architecture search for diverse tasks. Advances in Neural Information Processing Systems, 35:16151–16164, 2022.
  35. Core-set sampling for efficient neural architecture search. arXiv preprint arXiv:2107.06869, 2021.
  36. Beyond neural scaling laws: beating power law scaling via data pruning. Advances in Neural Information Processing Systems, 35:19523–19536, 2022.
  37. Dataset cartography: Mapping and diagnosing datasets with training dynamics. arXiv preprint arXiv:2009.10795, 2020.
  38. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning, pages 6105–6114. PMLR, 2019a.
  39. Mixconv: Mixed depthwise convolutional kernels. arXiv preprint arXiv:1907.09595, 2019b.
  40. Mnasnet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2820–2828, 2019.
  41. An empirical study of example forgetting during deep neural network learning. arXiv preprint arXiv:1812.05159, 2018.
  42. Rethinking architecture selection in differentiable nas. arXiv preprint arXiv:2108.04392, 2021.
  43. Snas: stochastic neural architecture search. arXiv preprint arXiv:1812.09926, 2018.
  44. Pc-darts: Partial channel connections for memory-efficient architecture search. arXiv preprint arXiv:1907.05737, 2019.
  45. Partially-connected neural architecture search for reduced computational redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9):2953–2970, 2021.
  46. Revisiting training-free nas metrics: An efficient training-based method. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 4751–4760, 2023.
  47. Efficient joint-dimensional search with solution space regularization for real-time semantic segmentation. International Journal of Computer Vision, 130(11):2674–2694, 2022a.
  48. β-darts: Beta-decay regularization for differentiable architecture search. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10864–10873. IEEE, 2022b.
  49. β-darts++: Bi-level regularization for proxy-robust differentiable architecture search. arXiv preprint arXiv:2301.06393, 2023.
  50. Cyclic differentiable architecture search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(1):211–228, 2022.
  51. Deep learning on a healthy data diet: Finding important examples for fairness. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 14593–14601, 2023.
  52. Understanding and robustifying differentiable architecture search. arXiv preprint arXiv:1909.09656, 2019.
  53. A2s-nas: Asymmetric spectral-spatial neural architecture search for hyperspectral image classification. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.
  54. One-shot neural architecture search: Maximising diversity to overcome catastrophic forgetting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(9):2921–2935, 2020.
  55. idarts: Differentiable architecture search with stochastic implicit gradients. In International Conference on Machine Learning, pages 12557–12566. PMLR, 2021.
  56. Coverage-centric coreset selection for high pruning rates. arXiv preprint arXiv:2210.15809, 2022.
  57. Blockqnn: Efficient block-wise neural network architecture generation. IEEE transactions on pattern analysis and machine intelligence, 43(7):2314–2328, 2020.
  58. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8697–8710, 2018.