Integration of Evolutionary Automated Machine Learning with Structural Sensitivity Analysis for Composite Pipelines (2312.14770v1)
Abstract: Automated machine learning (AutoML) systems propose an end-to-end solution to a given machine learning problem, creating either fixed or flexible pipelines. Fixed pipelines are task-independent constructs: their general composition remains the same regardless of the data. In contrast, the structure of flexible pipelines varies depending on the input, making them finely tailored to individual tasks. However, flexible pipelines can be structurally overcomplicated and have poor explainability. We propose the EVOSA approach, which compensates for the drawbacks of flexible pipelines by incorporating a sensitivity analysis that increases the robustness and interpretability of the flexible solutions. EVOSA quantitatively estimates the positive and negative impact of each edge and node in a pipeline graph, and feeds this information to the evolutionary AutoML optimizer. The correctness and efficiency of EVOSA were validated on tabular, multimodal, and computer vision tasks, suggesting generalizability of the proposed approach across domains.
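The abstract describes scoring the structural impact of individual pipeline nodes and passing those scores to an optimizer. A minimal sketch of that idea is node ablation: remove each node in turn, re-evaluate the pipeline, and record the change in fitness. The names below (`fitness`, `node_sensitivity`, the toy scoring rule) are illustrative assumptions, not the EVOSA API or its actual evaluation procedure.

```python
def fitness(nodes):
    """Toy stand-in for pipeline quality: rewards having a model and a
    scaler, and applies a small complexity penalty per node. A real
    system would train and cross-validate the pipeline here."""
    score = 0.0
    if "model" in nodes:
        score += 1.0
    if "scaler" in nodes:
        score += 0.3
    score -= 0.05 * len(nodes)  # structural complexity penalty
    return score


def node_sensitivity(nodes):
    """Impact of each node: drop in fitness when that node is ablated.
    Positive values mean the node helps; negative values flag
    structurally redundant nodes an evolutionary optimizer could prune."""
    base = fitness(nodes)
    return {n: base - fitness([m for m in nodes if m != n]) for n in nodes}


pipeline = ["imputer", "scaler", "pca", "model"]
print(node_sensitivity(pipeline))
```

In this sketch, "model" and "scaler" get positive sensitivities while "imputer" and "pca" come out slightly negative (their complexity penalty outweighs their contribution), which is the kind of signal the abstract says is fed back to the optimizer.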