A survey of active learning algorithms for supervised remote sensing image classification (2104.07784v1)

Published 15 Apr 2021 in cs.CV

Abstract: Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee, large margin and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing the good architecture are provided for new and/or unexperienced user.

Summary

The paper presents a comprehensive survey of active learning algorithms, categorizing methods into committee-based, large margin-based, and posterior probability-based approaches.
It details methodologies that select the most informative samples to reduce labeling efforts and improve classification performance in remote sensing imagery.
The results indicate that large margin-based techniques, especially when combined with diversity criteria, consistently enhance accuracy across various remote sensing datasets.

Active Learning Algorithms for Supervised Remote Sensing Image Classification

This paper surveys active learning algorithms for supervised remote sensing image classification. The authors focus on three major families of algorithms: committee-based, large margin-based, and posterior probability-based approaches. Each family uses a different strategy to select the most informative training samples, with the aim of improving classification performance without requiring a large labeled dataset.

Key Concepts and Methodologies

Remote sensing classification relies heavily on robust training datasets. However, building such datasets is often constrained by the time, money, and manual labeling effort available. Active learning addresses this by iteratively refining the training set: at each iteration the current model ranks the unlabeled samples by uncertainty, the user labels the most uncertain ones, and the model is retrained on the enlarged set. The summary attributes several benefits to this approach, including computational efficiency and improved model generalization.
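The iterative procedure can be sketched as a pool-based loop. This is a minimal illustration and not the setup used in the paper: the nearest-centroid classifier, the distance-gap uncertainty measure, and the names `fit_centroids`, `uncertainty`, `active_learning_loop`, and `oracle` are all assumptions made for the example. In practice the classifier would be an SVM or similar, and `oracle` would be a human annotator.

```python
import math


def fit_centroids(X, y):
    """Toy classifier (assumption): one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in sums[c]] for c in sums}


def uncertainty(model, x):
    """Higher value = more uncertain.

    Uses the negated gap between the distances to the two nearest
    centroids, so a sample lying between two classes scores highest.
    """
    d = sorted(math.dist(x, c) for c in model.values())
    return -(d[1] - d[0]) if len(d) > 1 else 0.0


def active_learning_loop(X_lab, y_lab, pool, oracle, n_iter=5, batch=2):
    """Retrain, rank the pool, query the oracle, and grow the training set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(pool)
    for _ in range(n_iter):
        if not pool:
            break
        model = fit_centroids(X_lab, y_lab)
        # Rank the unlabeled pool from most to least uncertain.
        ranked = sorted(pool, key=lambda x: uncertainty(model, x), reverse=True)
        queried, pool = ranked[:batch], ranked[batch:]
        for x in queried:
            X_lab.append(x)
            y_lab.append(oracle(x))  # the user supplies the label
    return fit_centroids(X_lab, y_lab), X_lab, y_lab
```

Only the `uncertainty` function changes between the heuristic families discussed in the survey; the surrounding loop stays the same.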

Committee-Based Approaches: This family of algorithms utilizes a committee of diverse models to evaluate the uncertainty of unlabeled data points. The normalized entropy query-by-bagging (nEQB) method exploits a committee of classifiers trained on varied subsets of training data, while the adaptive maximum disagreement (AMD) method partitions the feature space into views to maximize disagreement among classifiers.
Large Margin-Based Approaches: These methods leverage support vector machine (SVM) properties to select samples near decision boundaries. Key techniques include margin sampling (MS), multiclass level uncertainty (MCLU), and augmented strategies incorporating diversity criteria, such as manifold-aware clustering and adaptive subspace decomposition.
Posterior Probability-Based Approaches: These rely on the posterior probabilities of class assignments to gauge sample uncertainty. Methods like Breaking Ties (BT) and KL-max optimize candidate selection based on the model's confidence levels, proving particularly effective with models providing probabilistic outputs.
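The scoring rules behind several of these heuristics are short enough to write out. The sketch below is a hedged reading of their usual definitions, not code from the paper: the function names are hypothetical, the inputs (committee votes, one-vs-rest decision values, class posteriors) are assumed to come from classifiers trained elsewhere, and the nEQB normalization by the number of classes actually predicted is an assumption about the method's definition. AMD and KL-max are not covered.

```python
import math
from collections import Counter


def neqb_score(votes):
    """Normalized vote entropy of a committee (higher = more uncertain).

    votes: the labels predicted for one sample by each committee member.
    The entropy is divided by log(number of distinct predicted classes);
    a unanimous committee scores 0.
    """
    counts = Counter(votes)
    if len(counts) == 1:
        return 0.0
    k = len(votes)
    entropy = -sum((n / k) * math.log(n / k) for n in counts.values())
    return entropy / math.log(len(counts))


def margin_sampling_score(decision_values):
    """MS: distance to the closest hyperplane (lower = more uncertain)."""
    return min(abs(f) for f in decision_values)


def mclu_score(decision_values):
    """MCLU: gap between the two largest decision values (lower = more uncertain)."""
    top, second = sorted(decision_values, reverse=True)[:2]
    return top - second


def breaking_ties_score(posteriors):
    """BT: gap between the two largest posteriors (lower = more uncertain)."""
    top, second = sorted(posteriors, reverse=True)[:2]
    return top - second


def most_uncertain(scores, n, higher_is_uncertain):
    """Indices of the n samples to send to the user for labeling."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_uncertain)
    return order[:n]
```

For example, `most_uncertain([breaking_ties_score(p) for p in pool_posteriors], 10, False)` would pick the ten pool samples whose two most likely classes are hardest to separate, where `pool_posteriors` is a hypothetical list of per-sample posterior vectors.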

Results and Implications

The empirical evaluation across various remote sensing datasets, including hyperspectral and multispectral images, reveals that active learning significantly boosts classification accuracy compared to random sampling methods. Specifically, the results indicate that large margin-based approaches, especially when combined with diversity criteria, consistently outperform other strategies when SVM classifiers are utilized. The findings emphasize the importance of choosing appropriate active learning strategies based on dataset characteristics and operational constraints, such as the nature of the data (high-dimensional vs. low-dimensional) and the mode of label acquisition (manual vs. automated).
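To illustrate what combining an uncertainty score with a diversity criterion means in practice, the sketch below first shortlists the most uncertain candidates and then picks a batch that is spread out in feature space. The greedy farthest-point rule used here is a simple stand-in chosen for brevity, not one of the diversity criteria evaluated in the survey, and `diverse_batch` and its parameters are hypothetical names.

```python
import math


def diverse_batch(candidates, scores, n_candidates, batch_size):
    """Select a batch that is both uncertain and diverse.

    candidates: feature tuples of the unlabeled samples.
    scores: uncertainty per sample, where lower = more uncertain
            (as with margin-style scores).
    Returns the indices of the selected samples.
    """
    # Step 1: keep only the n_candidates most uncertain samples.
    order = sorted(range(len(candidates)), key=lambda i: scores[i])
    shortlist = order[:n_candidates]
    if not shortlist:
        return []

    # Step 2: start from the most uncertain sample, then repeatedly add
    # the shortlisted sample farthest from its nearest chosen sample.
    chosen = [shortlist[0]]
    while len(chosen) < min(batch_size, len(shortlist)):
        remaining = [i for i in shortlist if i not in chosen]
        best = max(
            remaining,
            key=lambda i: min(math.dist(candidates[i], candidates[j])
                              for j in chosen),
        )
        chosen.append(best)
    return chosen
```

Without the second step, a batch drawn from a dense region near the decision boundary can contain near-duplicate pixels, so the user labels several samples that carry almost the same information.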

Practical and Theoretical Implications

Practically, active learning methodologies offer a viable solution for efficiently managing the growing volume of remote sensing data, enabling refined classification models with reduced labeled datasets. Theoretically, the survey provides insights into the strengths and limitations of various active learning heuristics and suggests potential areas for future research, such as noise robustness and the integration of contextual information into learning heuristics.

Future Developments

Future research directions include making active learning frameworks more robust to noisy data, such as SAR imagery, and integrating spatial and contextual information into the sample selection process. Combining active learning with semi-supervised learning and domain adaptation is identified as a further open direction.

In summary, the survey presents active learning as an effective mechanism for remote sensing classification and offers guidance on using informative sampling to improve classification models. Its findings and recommendations are intended to help researchers and practitioners choose a suitable strategy for remote sensing image analysis.
