MSeg: A Composite Dataset for Multi-domain Semantic Segmentation (2112.13762v1)

Published 27 Dec 2021 in cs.CV

Abstract: We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains. A naive merge of the constituent datasets yields poor performance due to inconsistent taxonomies and annotation practices. We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images, requiring more than 1.34 years of collective annotator effort. The resulting composite dataset enables training a single semantic segmentation model that functions effectively across domains and generalizes to datasets that were not seen during training. We adopt zero-shot cross-dataset transfer as a benchmark to systematically evaluate a model's robustness and show that MSeg training yields substantially more robust models in comparison to training on individual datasets or naive mixing of datasets without the presented contributions. A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training. We evaluate our models in the 2020 Robust Vision Challenge (RVC) as an extreme generalization experiment. MSeg training sets include only three of the seven datasets in the RVC; more importantly, the evaluation taxonomy of RVC is different and more detailed. Surprisingly, our model shows competitive performance and ranks second. To evaluate how close we are to the grand aim of robust, efficient, and complete scene understanding, we go beyond semantic segmentation by training instance segmentation and panoptic segmentation models using our dataset. Moreover, we also evaluate various engineering design decisions and metrics, including resolution and computational efficiency. Although our models are far from this grand aim, our comprehensive evaluation is crucial for progress. We share all the models and code with the community.

Citations (184)

Summary

  • The paper introduces a composite dataset that unifies seven semantic segmentation datasets under a single standardized taxonomy of 194 categories.
  • It details an extensive relabeling process of over 220,000 object masks to enable robust zero-shot cross-dataset transfer and competitive leaderboard rankings.
  • The study demonstrates that models trained on MSeg achieve strong results under diverse evaluation conditions, including first place on the WildDash-v1 leaderboard, advancing domain-general vision models.

Overview of MSeg: A Composite Dataset for Multi-domain Semantic Segmentation

The paper "MSeg: A Composite Dataset for Multi-domain Semantic Segmentation" presents an innovative approach to achieving semantic segmentation across multiple domains by creating a unified dataset known as MSeg. The researchers identify a significant challenge within semantic segmentation tasks: the inconsistency in taxonomies and annotation practices across existing datasets. This inconsistency obstructs the creation of a single model capable of robustly performing across diverse environments.

Dataset Composition and Taxonomic Reconciliation

To address these challenges, the authors construct MSeg, a composite dataset built from seven existing semantic segmentation datasets: COCO, ADE20K, Mapillary, IDD, BDD, Cityscapes, and SUN RGB-D. A naive merge of these datasets would yield an unwieldy classification system of 316 classes with substantial internal inconsistencies. The authors therefore carry out a reconciliation process, merging and splitting classes to derive a unified, flat taxonomy of 194 categories. This required manually relabeling more than 220,000 object masks in more than 80,000 images, amounting to over 1.34 years of collective annotator effort.
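
To make the relabeling mechanics concrete, here is a minimal sketch of how a per-dataset lookup table can remap mask labels into a unified taxonomy. The class IDs and mappings below are illustrative placeholders, not the actual MSeg tables (the authors release the real ones with their code). Note that a lookup table alone only handles merges; splitting a class requires the kind of manual re-annotation the paper describes.

```python
# Sketch of per-dataset label remapping into a unified taxonomy.
# All IDs and mappings are hypothetical, for illustration only.
import numpy as np

# dataset-specific class ID -> unified class ID; 255 stays the "ignore" label.
# Two source classes mapping to one target ID is a class merge.
COCO_TO_UNIFIED = {0: 95, 1: 12, 2: 12, 255: 255}
ADE20K_TO_UNIFIED = {0: 95, 7: 40, 255: 255}

def remap_mask(mask: np.ndarray, table: dict) -> np.ndarray:
    """Relabel a segmentation mask via a dense 256-entry lookup table."""
    lut = np.full(256, 255, dtype=np.uint8)  # unmapped IDs fall back to ignore
    for src, dst in table.items():
        lut[src] = dst
    return lut[mask]

coco_mask = np.array([[0, 1], [2, 255]], dtype=np.uint8)
print(remap_mask(coco_mask, COCO_TO_UNIFIED))  # [[ 95  12] [ 12 255]]
```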

Experimental Evaluation and Zero-shot Transfer

The strength of MSeg lies in enabling models trained on the composite dataset to generalize to previously unseen data. The paper adopts zero-shot cross-dataset transfer as a benchmark, comparing the robustness of models trained on MSeg to those trained on individual datasets. The experiments demonstrate that MSeg-trained models not only outperform models trained on single datasets but also approach the performance of oracle models trained directly on the test datasets. Notably, a model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation without any exposure to WildDash data during training.
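
A minimal sketch of how such a zero-shot evaluation can be scored follows: hard predictions made in the unified taxonomy are remapped to the held-out dataset's own classes, and mean IoU is computed from a confusion matrix. The class count and mapping are invented for illustration, and remapping hard labels is a simplification of what a full evaluation harness would do.

```python
# Sketch of zero-shot cross-dataset scoring. All IDs are hypothetical.
import numpy as np

NUM_TEST_CLASSES = 19                      # e.g. a driving dataset's taxonomy
UNIFIED_TO_TEST = {95: 0, 12: 1, 40: 255}  # unified ID -> test ID; 255 = ignore

def to_test_taxonomy(pred: np.ndarray) -> np.ndarray:
    """Remap hard predictions from the unified to the test taxonomy."""
    lut = np.full(256, 255, dtype=np.uint8)
    for src, dst in UNIFIED_TO_TEST.items():
        lut[src] = dst
    return lut[pred]

def miou(pred: np.ndarray, gt: np.ndarray, n: int) -> float:
    """Mean IoU from an n x n confusion matrix, ignoring 255-labeled pixels."""
    valid = (gt != 255) & (pred != 255)    # a real harness would instead count
                                           # unmapped predictions as errors
    cm = np.bincount(n * gt[valid].astype(int) + pred[valid].astype(int),
                     minlength=n * n).reshape(n, n)
    inter = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - inter
    ious = inter / np.maximum(union, 1)
    return float(ious[union > 0].mean())

gt = np.array([[0, 1], [1, 255]], dtype=np.uint8)
pred = to_test_taxonomy(np.array([[95, 12], [95, 12]], dtype=np.uint8))
print(miou(pred, gt, NUM_TEST_CLASSES))    # 0.5
```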

Robust Vision Challenge Participation

The 2020 Robust Vision Challenge (RVC) provided an extreme generalization test for MSeg-trained models: only three of the seven RVC datasets appear in MSeg's training sets, and the RVC evaluation taxonomy is different and more detailed. Despite these conditions, MSeg-trained models performed competitively, ranking second in the challenge.
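
Relating the unified taxonomy to a foreign evaluation taxonomy is what makes such transfers possible at all. One plausible scheme, sketched below with invented mappings, is to sum the model's per-pixel softmax scores over all unified classes that fold into each evaluation class and take the argmax in the evaluation taxonomy; evaluation classes finer than any unified class cannot be recovered this way, which is part of what makes the RVC setting difficult. The released MSeg code may implement the mapping differently.

```python
# Sketch of mapping unified-taxonomy scores into an evaluation taxonomy.
# The mapping and evaluation class count are hypothetical.
import numpy as np

NUM_UNIFIED = 194                        # size of the unified MSeg taxonomy
NUM_EVAL_CLASSES = 2                     # illustrative target taxonomy
UNIFIED_TO_EVAL = {12: 0, 40: 0, 95: 1}  # many-to-one: unified ID -> eval ID

def eval_taxonomy_argmax(probs: np.ndarray) -> np.ndarray:
    """probs: (NUM_UNIFIED, H, W) softmax scores; returns (H, W) eval labels."""
    h, w = probs.shape[1:]
    acc = np.zeros((NUM_EVAL_CLASSES, h, w), dtype=probs.dtype)
    for uni, ev in UNIFIED_TO_EVAL.items():
        acc[ev] += probs[uni]            # merge scores of grouped classes
    return acc.argmax(axis=0)

probs = np.random.rand(NUM_UNIFIED, 4, 4)
probs /= probs.sum(0, keepdims=True)     # normalize to per-pixel distributions
print(eval_taxonomy_argmax(probs))       # (4, 4) array of eval-class labels
```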

Implications and Future Directions

The paper's contributions significantly advance the pursuit of robust, domain-independent semantic segmentation. The authors report substantial numerical performance improvements and underscore the inherent difficulty of taxonomic reconciliation across diverse datasets. Practically, MSeg provides a valuable resource for practitioners who need domain-general models in deployment scenarios. The theoretical implications are equally notable, suggesting broader applications in scene understanding tasks beyond semantic segmentation, including instance and panoptic segmentation, both of which the authors also evaluate using MSeg.

Conclusion

In summary, "MSeg: A Composite Dataset for Multi-domain Semantic Segmentation" outlines a comprehensive framework for overcoming dataset-specific limitations in semantic segmentation. By creating a unified taxonomy and resolving annotation discrepancies, the authors chart a path toward robust, domain-general vision models applicable across many environments. The MSeg dataset, shared with the community along with models and code, provides a foundation on which more advanced semantic segmentation models can be built and studied.