DREAM: Debugging and Repairing AutoML Pipelines (2401.00379v1)

Published 31 Dec 2023 in cs.SE and cs.AI

Abstract: Deep learning models have become an integral component of modern software systems. In response to the challenge of model design, researchers proposed Automated Machine Learning (AutoML) systems, which automatically search for model architectures and hyperparameters for a given task. Like other software systems, existing AutoML systems suffer from bugs. We identify two common and severe bugs in AutoML: the performance bug (i.e., searching for the desired model takes an unreasonably long time) and the ineffective search bug (i.e., the AutoML system cannot find a sufficiently accurate model). After analyzing the workflow of AutoML, we observe that existing AutoML systems overlook potential opportunities in the search space, search method, and search feedback, which results in both performance and ineffective search bugs. Based on this analysis, we design and implement DREAM, an automatic debugging and repairing system for AutoML systems. It monitors the AutoML process to collect detailed feedback and automatically repairs bugs by expanding the search space and leveraging a feedback-driven search strategy. Our evaluation shows that DREAM can effectively and efficiently repair AutoML bugs.
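
To make the repair loop concrete, below is a minimal, self-contained Python sketch of the feedback-driven strategy the abstract describes: run a trial, diagnose it, and expand the search space when the diagnosis calls for it. Every name here (the search-space dictionary, the `trial` stub, the `EXPANSIONS` table) is hypothetical and invented for illustration; DREAM's actual implementation monitors real training signals and is available in the project's open-source repository.

```python
import random

# Toy sketch (not DREAM's code) of a feedback-driven AutoML repair loop:
# monitor each trial, and when it exhibits a known training problem,
# repair the search by expanding the search space.

# Hypothetical initial search space: deliberately narrow, as an AutoML
# system might start out.
search_space = {
    "learning_rate": [1e-1, 1e-2],
    "activation": ["relu"],
    "depth": [3, 5],
}

# Wider options the repairer adds when feedback asks for them.
EXPANSIONS = {
    "dying_relu": ("activation", ["leaky_relu", "selu"]),
    "diverging": ("learning_rate", [1e-3, 1e-4]),
}

def trial(config):
    """Stand-in for training + evaluation. Returns (accuracy, diagnosis)."""
    if config["learning_rate"] >= 1e-1:
        return 0.10, "diverging"      # loss blew up during the trial
    if config["activation"] == "relu" and config["depth"] >= 5:
        return 0.30, "dying_relu"     # many dead units observed
    # Pretend robust configurations train reasonably well.
    return 0.80 + random.uniform(0.0, 0.15), None

best, best_acc = None, 0.0
for step in range(20):
    config = {k: random.choice(v) for k, v in search_space.items()}
    acc, diagnosis = trial(config)
    if acc > best_acc:
        best, best_acc = config, acc
    if diagnosis in EXPANSIONS:       # feedback-driven repair step
        key, extra = EXPANSIONS[diagnosis]
        for option in extra:
            if option not in search_space[key]:
                search_space[key].append(option)   # expand the search space
    if best_acc >= 0.9:
        break                          # stop once the model is accurate enough

print(f"best config: {best}, accuracy: {best_acc:.2f}")
```

The point this sketch isolates is the control flow the abstract highlights: rather than sampling blindly from a fixed space, the loop treats each failed trial as feedback and repairs the search itself.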

Authors (4)
  1. Xiaoyu Zhang
  2. Juan Zhai
  3. Shiqing Ma
  4. Chao Shen