
Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools (1908.05557v2)

Published 15 Aug 2019 in cs.LG and stat.ML

Abstract: There has been considerable growth and interest in industrial applications of ML in recent years. ML engineers, as a consequence, are in high demand across the industry, yet improving the efficiency of ML engineers remains a fundamental challenge. Automated machine learning (AutoML) has emerged as a way to save time and effort on repetitive tasks in ML pipelines, such as data pre-processing, feature engineering, model selection, hyperparameter optimization, and prediction result analysis. In this paper, we investigate the current state of AutoML tools aiming to automate these tasks. We conduct various evaluations of the tools on many datasets, in different data segments, to examine their performance, and compare their advantages and disadvantages on different test cases.

An Evaluation and Comparison of Automated Machine Learning Tools

The paper "Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools" presents a rigorous evaluation of various automated machine learning (AutoML) tools, assessing their capabilities in automating multiple stages of the machine learning pipeline. This paper provides a foundational understanding of the current landscape of AutoML tools and highlights practical insights into their performance across different datasets and machine learning tasks.

The research assesses AutoML tools against criteria covering data preprocessing, model selection and hyperparameter optimization, model interpretation, and prediction analysis. The authors systematically profile major AutoML platforms, comparing their functionality and performance. The analysis is backed by experiments on a broad set of datasets spanning diverse data segments, characterized by differences in size, number of features, and class imbalance.
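
The data segments described above can be made concrete with a small profiling helper. This is a minimal sketch, not the authors' code: the function name `profile_dataset` and the imbalance measure (majority-class count divided by minority-class count) are assumptions chosen for illustration, and the paper's own segment definitions may differ.

```python
from collections import Counter


def profile_dataset(rows, labels):
    """Summarize a tabular dataset by size, feature count, and class imbalance.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not rows:
        raise ValueError("dataset is empty")
    counts = Counter(labels)
    majority = max(counts.values())
    minority = min(counts.values())
    return {
        "n_rows": len(rows),
        "n_features": len(rows[0]),
        "n_classes": len(counts),
        # Assumed imbalance measure: 1.0 means perfectly balanced classes.
        "imbalance_ratio": majority / minority,
    }


# Toy data made up for this example: 4 rows, 2 features, 3:1 class split.
toy_rows = [[0.1, 1.0], [0.4, 0.0], [0.3, 1.0], [0.9, 0.0]]
toy_labels = ["a", "a", "a", "b"]
toy_profile = profile_dataset(toy_rows, toy_labels)
```

A profile like this could be used to bucket datasets into segments (for example small versus large, or balanced versus imbalanced) before comparing tools within each bucket; any thresholds for those buckets would need to be chosen by the evaluator.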

Key Findings and Comparative Analysis:

  1. Model Selection and Optimization: The research confirms that AutoML tools consistently outperform traditional hand-crafted models in several supervised tasks due to their robust automated model selection and hyperparameter optimization capabilities. Notably, tools like H2O-Automl, Auto-keras, and Auto-sklearn demonstrated superior adaptability and efficiency in model training and selection.
  2. Data Preprocessing and Feature Engineering: The paper reveals considerable variation in data preprocessing abilities among AutoML tools. While platforms like TransmogrifAI excel in schema detection and feature engineering with detailed data type identification, others require substantial human input, which reduces their automation benefits.
  3. Performance Across Data Segments: According to the experiments, the performance of AutoML tools varies significantly across data segments, particularly on high-dimensional and imbalanced data. Nonetheless, H2O-Automl and Auto-keras were noted for maintaining stable performance across these variations.
  4. Time Limit and Convergence: The investigation into how time limits affect these platforms found that, given more computational time, AutoML tools generally improved their model accuracy. Tools such as H2O-Automl and Auto-keras were reported to reach their best performance relatively quickly compared to others, indicating efficient use of computational resources.
  5. Robustness: The paper also examines tool robustness and finds that certain tools produce widely varying results across multiple runs, which points to the need for more robust AutoML executions.
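
The run-to-run variability mentioned in the robustness finding can be quantified with simple summary statistics. The sketch below is one plausible way to do this and is not taken from the paper: the helper `run_variability`, the tool names `tool_a` and `tool_b`, and all scores are placeholders invented for illustration.

```python
import statistics


def run_variability(scores):
    """Return mean, sample standard deviation, and range of repeated-run scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to measure variability")
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "range": max(scores) - min(scores),
    }


# Placeholder scores for hypothetical tools; these are NOT results from the paper.
placeholder_runs = {
    "tool_a": [0.80, 0.80, 0.80],
    "tool_b": [0.70, 0.80, 0.90],
}

summary = {name: run_variability(scores) for name, scores in placeholder_runs.items()}

# The tool with the largest standard deviation is the least stable across runs.
least_stable = max(summary, key=lambda name: summary[name]["stdev"])
```

With real experiments, each list would hold the metric from repeated runs of one tool on the same dataset under the same time limit, so that differences reflect the tool's internal randomness and not the setup.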

Implications and Future Directions:

The paper's comparative analysis of AutoML tools has implications for both industrial application and academic research. As the demand for rapid and efficient machine learning solutions grows, AutoML platforms are expected to play an increasingly important role. The findings point to the potential of hybrid systems that combine robust model selection with advanced preprocessing capabilities.

For academia, the paper lays foundational knowledge for subsequent research in optimizing AutoML tools further and developing novel architectures that enhance automation without compromising user interpretability. The diverse feature offerings among tools highlight potential pathways for cross-tool integration to leverage individual strengths. Practically, as tools like H2O-Automl and Auto-keras emerge as top contenders in efficiency and stability, enterprises might leverage these platforms in scalable machine learning tasks, provided ongoing enhancements address current robustness and adaptability challenges.

Overall, while the paper illustrates considerable advancements in AutoML technology, it also underscores the need for ongoing refinement and innovation to fully realize automated solutions that integrate seamlessly within complex machine learning ecosystems.

Authors (6)
  1. Anh Truong
  2. Austin Walters
  3. Jeremy Goodsitt
  4. Keegan Hines
  5. C. Bayan Bruss
  6. Reza Farivar
Citations (177)