Active learning with MaskAL reduces annotation effort for training Mask R-CNN (2112.06586v3)

Published 13 Dec 2021 in cs.CV

Abstract: The generalisation performance of a convolutional neural network (CNN) is influenced by the quantity, quality, and variety of the training images. Training images must be annotated, and this is time consuming and expensive. The goal of our work was to reduce the number of annotated images needed to train a CNN while maintaining its performance. We hypothesised that the performance of a CNN can be improved faster by ensuring that the set of training images contains a large fraction of hard-to-classify images. The objective of our study was to test this hypothesis with an active learning method that can automatically select the hard-to-classify images. We developed an active learning method for Mask Region-based CNN (Mask R-CNN) and named this method MaskAL. MaskAL involved the iterative training of Mask R-CNN, after which the trained model was used to select a set of unlabelled images about which the model was most uncertain. The selected images were then annotated and used to retrain Mask R-CNN, and this was repeated for a number of sampling iterations. In our study, MaskAL was compared to a random sampling method on a broccoli dataset with five visually similar classes. MaskAL performed significantly better than the random sampling. In addition, MaskAL had the same performance after sampling 900 images as the random sampling had after 2300 images. Compared to a Mask R-CNN model that was trained on the entire training set (14,000 images), MaskAL achieved 93.9% of that model's performance with 17.9% of its training data. The random sampling achieved 81.9% of that model's performance with 16.4% of its training data. We conclude that by using MaskAL, the annotation effort can be reduced for training Mask R-CNN on a broccoli dataset with visually similar classes. Our software is available on https://github.com/pieterblok/maskal.

Citations (28)

Summary

  • The paper presents MaskAL, an active learning framework designed to significantly reduce the data annotation effort required to train Mask R-CNN models for instance segmentation.
  • MaskAL improves uncertainty sampling by combining semantic, spatial, and occurrence uncertainties to select the most informative images for annotation.
  • Empirical results show MaskAL matching random sampling's performance with substantially fewer annotated images (900 versus 2300), saving considerable manual labelling time and cost.

Overview of Active Learning with MaskAL for Mask R-CNN

The paper "Active learning with MaskAL reduces annotation effort for training Mask R-CNN" presents an approach to optimising the training of Mask R-CNN through an active learning framework named MaskAL. The framework addresses the challenge of reducing image-annotation effort for instance segmentation tasks, particularly in the agricultural sector, as illustrated by a case study on broccoli head identification.

Active learning is predicated on the idea that a model's performance can be enhanced more quickly by curating training datasets comprising predominantly complex, hard-to-classify images. To exploit this concept in the domain of Mask R-CNN for instance segmentation, the MaskAL framework employs uncertainty sampling to effectively select the most informative subset of unlabelled images for annotation.
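The uncertainty-sampling cycle described above can be sketched as a generic loop. The helper callables (`annotate`, `train`, `uncertainty`) are hypothetical placeholders standing in for an annotation tool, a Mask R-CNN training run, and an image-level uncertainty score; this is a minimal sketch, not MaskAL's actual API:

```python
import random

def active_learning_loop(unlabelled, annotate, train, uncertainty,
                         init_size=100, batch_size=100, iterations=5):
    """Generic uncertainty-sampling loop (a sketch; `annotate`, `train`,
    and `uncertainty` are hypothetical callables, not MaskAL's API)."""
    pool = list(unlabelled)
    # Start from a small random seed set, since no model exists yet
    random.shuffle(pool)
    labelled = [annotate(x) for x in pool[:init_size]]
    pool = pool[init_size:]
    model = train(labelled)
    for _ in range(iterations):
        # Rank the remaining images by model uncertainty, most uncertain first
        pool.sort(key=lambda x: uncertainty(model, x), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        # Annotate the hardest images and retrain on the enlarged set
        labelled.extend(annotate(x) for x in batch)
        model = train(labelled)
    return model, labelled
```

In MaskAL's setting, each sampling iteration replaces the stubbed `train` call with a full Mask R-CNN training run, so the loop trades extra compute for a large reduction in annotation effort.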

Research Highlights and Methodology

MaskAL is a multi-step process that integrates sampling, annotation, training, and evaluation in iterative cycles. The method uses uncertainty measures to identify the images about which the model's predictions are most uncertain. Its main advancement is the combination of three metrics (semantic, spatial, and occurrence uncertainty) into a single robust measure of image-level uncertainty.
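To make the three uncertainty measures concrete, the sketch below computes plausible versions of each and aggregates them into one image score. The function names and the aggregation scheme are illustrative assumptions, not MaskAL's exact formulas:

```python
import math

def semantic_uncertainty(class_probs):
    # Normalised entropy of the class distribution (0 = certain, 1 = uniform)
    n = len(class_probs)
    h = -sum(p * math.log(p) for p in class_probs if p > 0)
    return h / math.log(n) if n > 1 else 0.0

def spatial_uncertainty(pairwise_ious):
    # Mask disagreement across repeated forward passes
    # (e.g. Monte-Carlo dropout): 1 minus the mean pairwise IoU
    return 1.0 - sum(pairwise_ious) / len(pairwise_ious)

def occurrence_uncertainty(instance_counts):
    # Instability of the predicted instance count across forward passes:
    # fraction of passes that disagree with the modal count
    mode = max(set(instance_counts), key=instance_counts.count)
    return 1.0 - instance_counts.count(mode) / len(instance_counts)

def image_uncertainty(per_instance, instance_counts):
    """Combine the three measures into one image-level score.
    `per_instance` is a list of (class_probs, pairwise_ious) tuples,
    one per detected instance. The max-based aggregation here is an
    illustrative assumption, not MaskAL's published formula."""
    u_inst = [max(semantic_uncertainty(p), spatial_uncertainty(i))
              for p, i in per_instance]
    u_img = sum(u_inst) / len(u_inst) if u_inst else 0.0
    return max(u_img, occurrence_uncertainty(instance_counts))
```

Under this scheme an image scores high if any of its instances has an ambiguous class distribution, unstable masks, or a fluctuating instance count, which is exactly the kind of hard-to-classify image active learning wants to surface.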

The empirical validation of MaskAL used a dataset of 16,000 images acquired in broccoli fields. The dataset comprises five visually similar classes: healthy broccoli heads and heads with damage, maturation defects, cat-eye, or head rot, with the class distribution heavily skewed toward healthy instances. The primary objective was to compare MaskAL's efficiency against a random sampling baseline.

Key Findings and Outcomes

The central finding of this research is that MaskAL outperforms random sampling in annotation efficiency. With only 900 annotated images, MaskAL matched the performance that random sampling reached with 2300 annotated images, a substantial reduction in manual annotation effort. Moreover, MaskAL achieved 93.9% of the performance of a model trained on the full training set (14,000 images) using only 17.9% of that data, whereas random sampling reached 81.9% with 16.4% of the data, underscoring MaskAL's potential for cost-effective training in data-intensive applications.

Implications for Future Research

This work has several practical and theoretical implications. For practical applications, particularly those involving large-scale and high-variance datasets, MaskAL provides a structured methodology to manage annotation budgets effectively. Theoretically, the introduction of composite measures of uncertainty into active learning paradigms for instance segmentation offers avenues for future research to refine uncertainty estimation, possibly through neural network architectures augmented by dynamic dropout strategies.

Future research could also explore how this framework interacts with diversity sampling mechanisms, potentially delivering a hybrid method that leverages both uncertainty and diversity metrics. Moreover, evaluation using more diverse datasets, such as those in dynamic and less controlled environments or those used in open-set learning scenarios, could further enhance understanding of MaskAL’s effectiveness in broader contexts.

Conclusion

In conclusion, the introduced MaskAL framework represents a substantial step forward in reducing annotation efforts while maintaining robust model performance in instance segmentation tasks using Mask R-CNN. The integration of advanced uncertainty measures into the active learning pipeline demonstrates potential for broad applicability and provides a compelling case for re-evaluating current methodologies employed in labor-intensive data labelling processes across various domains. As the intersection of active learning strategies and deep learning techniques evolves, frameworks like MaskAL illuminate pathways toward more efficient and scalable solutions.
