COCONut: Enhancing Image Segmentation with High-Quality, Large-Scale Datasets
Introduction to COCONut Dataset
The Common Objects in Context (COCO) dataset has been a mainstay in computer vision research, particularly in the realms of object detection and image segmentation. Despite its widespread application, the growth of machine learning capabilities has outpaced the advancements in dataset quality, particularly concerning segmentation tasks. In response to this challenge, the authors introduce COCONut (COCO Next Universal segmenTation dataset), aiming to modernize COCO's segmentation capabilities by augmenting annotation quality and dataset size. This new dataset comprises approximately 383K images with over 5.18M panoptic masks, making it a comprehensive resource for semantic, instance, and panoptic segmentation tasks. COCONut is characterized by its high-quality, human-verified segmentation masks, offering a robust benchmark that promises to facilitate significant progress in image understanding tasks.
Reevaluation and Enhancement of Existing Annotations
COCONut's creation involved a thorough reevaluation of COCO's existing annotations, identifying several issues such as over-annotations, missing labels, and inaccurate segmentations — particularly in 'stuff' classes. The authors addressed these deficiencies by redesigning the annotation pipeline, integrating modern neural networks to generate initial annotations, which were then refined through a meticulous manual editing process. This approach significantly improved the consistency and quality of the resulting segmentation masks.
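One of the issues named above, over-annotation, amounts to multiple masks covering the same region. A minimal sketch of how such duplicates could be flagged automatically is shown below; the function names and the IoU threshold are illustrative assumptions, not part of the COCONut pipeline itself.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0

def flag_overlapping_masks(masks, iou_thresh=0.5):
    """Return index pairs whose masks overlap above the threshold,
    a simple proxy for detecting over-annotated regions."""
    flagged = []
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            if mask_iou(masks[i], masks[j]) > iou_thresh:
                flagged.append((i, j))
    return flagged
```

A check like this can only surface candidates; as the dataset's creation process emphasizes, deciding which mask is correct still requires human review.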
Data Splits and Sources
The COCONut dataset draws images from the original COCO dataset and Objects365, yielding a larger and more diverse collection for training and validation. The dataset is divided into several splits for scalability: COCONut-S (small) with 118K images, COCONut-B (base) with 242K images, and COCONut-L (large) with 358K images. This organization lets researchers select a data split that matches their computational resources and experimental needs.
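The split sizes above suggest a simple way to choose a split programmatically given a compute budget. The snippet below is a hedged sketch: the dictionary keys mirror the split names from the text, the image counts are the approximate figures quoted above, and the `pick_split` helper is a hypothetical convenience, not part of any official COCONut tooling.

```python
# Approximate image counts per split, from the dataset description.
COCONUT_SPLITS = {
    "COCONut-S": 118_000,  # small
    "COCONut-B": 242_000,  # base
    "COCONut-L": 358_000,  # large
}

def pick_split(max_images: int) -> str:
    """Pick the largest split whose image count fits the given budget."""
    fitting = {name: n for name, n in COCONUT_SPLITS.items() if n <= max_images}
    if not fitting:
        raise ValueError("No split fits within the given image budget")
    return max(fitting, key=fitting.get)
```

For example, a budget of 250K images would select COCONut-B, while anything above 358K admits the full COCONut-L split.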
The Innovative Annotation Pipeline
Central to the success of COCONut is its novel annotation process, which combines machine-generated proposals with human verification and refinement. This assisted-manual annotation pipeline significantly increases the efficiency of producing high-quality masks. It begins with automatic proposal generation, proceeds through a detailed human inspection and editing phase, and concludes with final verification by experts. This pipeline ensures that COCONut's masks exhibit superior quality compared to their COCO counterparts, capturing intricate details and maintaining consistency across different segmentation tasks.
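The three-stage flow described above (automatic proposals, human editing, expert verification) can be sketched as a pipeline of pluggable callables. This is a structural illustration only: the `Mask` dataclass and the `propose`/`edit`/`verify` interfaces are assumptions for the sketch, not the actual COCONut tooling.

```python
from dataclasses import dataclass, field

@dataclass
class Mask:
    label: str
    polygon: list = field(default_factory=list)  # mask geometry (placeholder)
    verified: bool = False

def assisted_manual_pipeline(image, propose, edit, verify):
    """Sketch of the three stages: machine proposals, human editing,
    expert sign-off. All three callables are caller-supplied."""
    proposals = propose(image)             # 1. neural-network proposals
    edited = [edit(m) for m in proposals]  # 2. human inspection and editing
    final = []
    for m in edited:
        m.verified = verify(m)             # 3. expert verification
        if m.verified:
            final.append(m)
    return final
```

Separating the stages this way mirrors the pipeline's key property: the model accelerates annotation, but only masks that pass human and expert review enter the dataset.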
Dataset's Implications and Future Directions
The COCONut dataset's introduction has profound implications for both theoretical and practical aspects of computer vision research:
- Benchmarking and Model Evaluation: COCONut provides a much-needed platform for evaluating advanced neural network models, especially those requiring high-quality, diverse annotations for accurate segmentation tasks.
- Progress in Model Development: With its comprehensive coverage and high-quality annotations, COCONut is poised to drive advancements in segmentation model accuracy and efficiency.
- Research on Annotation Efficiency: The dataset's creation process offers valuable insights into balancing automated and manual annotation methods, contributing to ongoing discussions about dataset scalability and quality.
With its release, COCONut challenges the research community to leverage this rich resource for developing more sophisticated image understanding algorithms. Future work may involve exploring more efficient annotation techniques, expanding the dataset with new classes and images, and creating models that can exploit this dataset's depth and quality to set new benchmarks in image segmentation tasks.