A survey of active learning in materials science: Data-driven paradigm for accelerating the research pipeline

Published 11 Jan 2026 in cond-mat.mtrl-sci (arXiv:2601.06971v1)

Abstract: The exploration of materials composition, structure, and processing spaces is constrained by high dimensionality and the cost of data acquisition. While machine learning has supported property prediction and design, its effectiveness depends on labeled data, which remains expensive to generate via experiments or high-fidelity simulations. Improving data efficiency is thus a central concern in materials informatics. Active learning (AL) addresses this by coupling model training with adaptive data acquisition. Instead of relying on a static dataset, AL iteratively prioritizes candidates for labeling based on uncertainty, diversity, or task-specific objectives. By guiding data collection under limited budgets, AL offers a structured approach to decision-making, complementing physical insight with quantitative measures of informativeness. Recently, AL has been applied to computational simulation, structure optimization, and autonomous experimentation. However, the diversity of AL formulations has led to fragmented methodologies and inconsistent assessments. This Review provides a concise overview of AL methods in materials science, focusing on their role in improving data efficiency under realistic constraints. We summarize key methodological principles, representative applications, and persistent challenges, aiming to clarify the scope and limitations of AL as a practical tool within contemporary materials informatics.
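To make the "iteratively prioritizes candidates based on uncertainty" step concrete, here is a minimal sketch of one common AL formulation: pool-based uncertainty sampling with a Gaussian-process surrogate, written in plain Python. It is an illustration, not the Review's method. The oracle `expensive_oracle`, the kernel length scale, the noise level, the 1D candidate pool, and the labeling budget are all hypothetical choices. In practice one would use a GP library and a domain-specific acquisition function.

```python
import math

# Hypothetical settings chosen for illustration only.
LENGTH_SCALE = 0.3   # kernel length scale (assumed)
NOISE = 1e-4         # jitter / observation-noise variance (assumed)


def rbf(a, b):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((a - b) / LENGTH_SCALE) ** 2)


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low @ z = b for z, where low is lower-triangular."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def gp_predict(x_train, y_train, x):
    """Zero-mean GP posterior mean and variance at a single input x."""
    K = [[rbf(a, b) + (NOISE if i == j else 0.0) for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    low = cholesky(K)
    v = forward_solve(low, [rbf(a, x) for a in x_train])  # L^{-1} k_*
    w = forward_solve(low, y_train)                         # L^{-1} y
    mean = sum(vi * wi for vi, wi in zip(v, w))             # k_*^T K^{-1} y
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)    # k(x,x) - k_*^T K^{-1} k_*
    return mean, var


def expensive_oracle(x):
    """Hypothetical stand-in for an experiment or high-fidelity simulation."""
    return math.sin(3.0 * x) + 0.5 * x


def active_learning(pool, initial, budget):
    """Pool-based uncertainty sampling: label the most uncertain candidate each round."""
    xs = list(initial)
    ys = [expensive_oracle(x) for x in xs]
    while len(xs) < budget:
        candidates = [x for x in pool if x not in xs]
        # Acquisition score = GP posterior variance (pure exploration).
        best = max(candidates, key=lambda x: gp_predict(xs, ys, x)[1])
        xs.append(best)
        ys.append(expensive_oracle(best))
    return xs, ys


pool = [i * 0.05 for i in range(41)]  # hypothetical 1D candidate pool on [0, 2]
xs, ys = active_learning(pool, initial=[pool[0], pool[-1]], budget=8)
print("queried:", [round(x, 2) for x in xs])
```

Starting from the two endpoints, the first query lands at the midpoint of the pool, where the surrogate is least certain. Later queries keep filling the largest gaps, so the model sees informative points rather than a fixed grid. Replacing the variance score with a diversity term or an objective-driven criterion such as expected improvement would correspond to the other acquisition styles the abstract mentions. This sketch does not implement those.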