An Analysis of Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++
The paper presents a detailed examination of Polygon-RNN++, an enhanced model for efficient interactive annotation of segmentation datasets that traces object outlines as polygonal contours. Focusing largely on the Cityscapes dataset, the paper provides comprehensive quantitative and qualitative evaluations of both automatic and interactive annotation modes.
Model Overview and Training
Polygon-RNN++ builds on the original Polygon-RNN by adding a reinforcement learning (RL) fine-tuning stage that optimizes intersection-over-union (IoU) directly, rather than per-vertex likelihood. Compared with a maximum likelihood estimation (MLE) baseline, the RL-trained model achieves higher IoU scores and also produces noticeably fewer self-intersecting polygons.
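The idea of rewarding IoU directly can be illustrated with a small sketch. The function names and the grid-rasterization IoU below are illustrative constructions, not the paper's implementation; the hedged assumption is only that a polygon-vs-polygon IoU serves as the reward, with the greedy decode acting as a self-critical baseline.

```python
def point_in_polygon(x, y, poly):
    # Ray-casting test: count how many polygon edges a horizontal
    # ray from (x, y) crosses; an odd count means "inside".
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            xi = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xi:
                inside = not inside
    return inside

def raster_iou(poly_a, poly_b, size=28):
    # Rasterize both polygons on a size x size grid and compute
    # intersection-over-union of the covered cells.
    inter = union = 0
    for gy in range(size):
        for gx in range(size):
            a = point_in_polygon(gx + 0.5, gy + 0.5, poly_a)
            b = point_in_polygon(gx + 0.5, gy + 0.5, poly_b)
            inter += a and b
            union += a or b
    return inter / union if union else 0.0

def reinforce_advantage(sampled_poly, greedy_poly, gt_poly):
    # Self-critical policy gradient: the log-probability of the sampled
    # polygon is weighted by how much its IoU beats the greedy decode's.
    return raster_iou(sampled_poly, gt_poly) - raster_iou(greedy_poly, gt_poly)
```

A positive advantage pushes the model toward the sampled polygon; a negative one pushes it away, which is how direct IoU optimization sidesteps the mismatch between vertex-level likelihood and the final mask quality.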
Evaluator Network and Decoding Strategies
A notable component of the model is the evaluator network, which scores candidate polygons during decoding. The paper compares decoding strategies and finds that beam search combined with the evaluator network yields a measurable IoU gain over greedy decoding, with the largest improvements when the evaluator re-ranks multiple candidate sequences.
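The interplay between beam search and the evaluator can be sketched with a toy token-level example. This is a generic beam search over per-step log-probabilities, not the paper's vertex decoder; the `evaluator` callable stands in for the network that predicts IoU for a full candidate polygon.

```python
def beam_search(step_logprobs, beam_size):
    # step_logprobs: one dict per time step mapping token -> log-probability.
    # Keeps the beam_size highest-scoring partial sequences at each step.
    beams = [((), 0.0)]
    for dist in step_logprobs:
        expanded = [(seq + (tok,), lp + l)
                    for seq, lp in beams
                    for tok, l in dist.items()]
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_size]
    return beams

def rerank_with_evaluator(beams, evaluator):
    # Return the candidate the evaluator scores highest (predicted IoU),
    # which need not be the most likely sequence under the decoder.
    return max((seq for seq, _ in beams), key=evaluator)
```

The point of the re-ranking step is exactly the paper's observation: the most likely sequence under the RNN is not always the best polygon, and a learned quality estimate can pick a better candidate from the beam.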
Output Resolution and Graph-based Neural Networks
The model uses a gated graph neural network (GGNN) to upscale the predicted polygon to a higher output resolution, with stable performance across varying hyperparameters and little added cost. The chosen 112x112 resolution balances accuracy against computational efficiency, and the minimal gain (0.02%) from increasing resolution further suggests that pushing beyond it is not worth the extra processing during training.
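One propagation step of a GGNN can be sketched as follows. This is a deliberately simplified scalar-state version with hand-picked weights; the actual model uses learned weight matrices, GRU-style vector updates, and a polygon graph augmented with midpoint nodes, none of which are reproduced here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ggnn_step(h, edges, w_msg=0.5, w_z=1.0, w_r=1.0, w_h=1.0):
    # One gated propagation step over a graph with scalar node states.
    # h: dict node -> hidden state; edges: list of (src, dst) pairs.
    msg = {v: 0.0 for v in h}
    for s, d in edges:
        msg[d] += w_msg * h[s]          # aggregate messages from in-neighbours
    out = {}
    for v in h:
        z = sigmoid(w_z * (msg[v] + h[v]))           # update gate
        r = sigmoid(w_r * (msg[v] + h[v]))           # reset gate
        cand = math.tanh(w_h * (msg[v] + r * h[v]))  # candidate state
        out[v] = (1 - z) * h[v] + z * cand           # gated blend
    return out
```

Running several such steps lets information flow around the polygon's vertex graph, which is what allows the GGNN to refine vertex positions at a finer output grid than the RNN decodes at.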
Automatic and Interactive Annotation Performance
Polygon-RNN++ was evaluated extensively on the Cityscapes dataset in both automatic and interactive modes. In automatic mode, the model achieved strong Average Precision (AP) scores, indicating its competence at full-image instance-level segmentation. One acknowledged limitation is that the model outputs a single polygon per bounding box, which hurts performance on objects split into multiple components by occlusion.
Interactive-mode assessments showed that the majority of predicted polygons required only minimal correction, often with as few as five user clicks. This highlights the model's potential to ease annotation workloads, a vital consideration in large-scale dataset creation.
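The interactive evaluation relies on a simulated annotator that corrects a predicted vertex when it strays too far from the ground truth. The sketch below is a simplified one-pass variant with hypothetical names; the paper's protocol corrects vertices iteratively and feeds each correction back into the RNN before re-predicting.

```python
def simulate_annotator(pred_poly, gt_poly, threshold=1.0):
    # For each predicted vertex, snap it to the nearest ground-truth
    # vertex if their distance exceeds the threshold; each snap counts
    # as one annotator click.
    clicks = 0
    corrected = []
    for px, py in pred_poly:
        gx, gy = min(gt_poly, key=lambda g: (g[0] - px) ** 2 + (g[1] - py) ** 2)
        if ((gx - px) ** 2 + (gy - py) ** 2) ** 0.5 > threshold:
            corrected.append((gx, gy))
            clicks += 1
        else:
            corrected.append((px, py))
    return corrected, clicks
```

Counting clicks under such a simulation is what grounds the claim that most polygons need only a handful of corrections: the click budget is a direct proxy for annotator effort.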
Cross-domain and Adaptive Evaluations
Demonstrating versatility, Polygon-RNN++ was tested without additional fine-tuning on domains beyond Cityscapes, including general scene, aerial, and medical datasets. The model maintained reasonable performance, underscoring its robustness across diverse visual conditions. Complementing this, an online fine-tuning scheme was introduced, offering adaptive improvement when transferring the model to disparate datasets.
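The online fine-tuning loop can be expressed generically. All names below are hypothetical stand-ins: `predict`, `correct`, and `update` abstract the model, the simulated annotator, and the training step; the paper's actual scheme additionally smooths the corrected polygons before training on them.

```python
def online_finetune(chunks, predict, correct, update, state):
    # Chunk-wise online adaptation: annotate each new chunk of the target
    # dataset with the current model, simulate annotator corrections,
    # then update the model on the corrected chunk before moving on.
    for chunk in chunks:
        preds = [predict(state, x) for x in chunk]
        fixed = [correct(pred, x) for pred, x in zip(preds, chunk)]
        state = update(state, chunk, fixed)
    return state
```

The design choice is that each chunk is annotated with the freshest model, so correction effort shrinks as adaptation proceeds, which is the mechanism behind the reported gains when transferring to new domains.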
Reflecting on the Model's Implications
The paper presents Polygon-RNN++ as a significant stride in segmentation annotation efficiency. Its RL-based training process, use of evaluator networks, and adjustable output resolution strategy provide valuable insights into designing models capable of handling complex image segmentation tasks. The implementation of such models could mitigate manual annotation burdens, fostering more extensive application in domains requiring rapid dataset generation.
Future explorations could delve into enhancing multi-component object mask predictions, potentially overcoming the limitation of single polygon outputs per bounding box. Moreover, leveraging this model’s adaptability through further domain-specific optimizations could catalyze progress within varied fields reliant on precise image annotations, including autonomous vehicles, medical imaging, and geographic information systems.