Improving Semantic Segmentation via Video Propagation and Label Relaxation (1812.01593v3)

Published 4 Dec 2018 in cs.CV, cs.AI, cs.MM, and cs.RO

Abstract: Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate models. In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. We exploit video prediction models' ability to predict future frames in order to also predict future labels. A joint propagation strategy is also proposed to alleviate mis-alignments in synthesized samples. We demonstrate that training segmentation models on datasets augmented by the synthesized samples leads to significant improvements in accuracy. Furthermore, we introduce a novel boundary label relaxation technique that makes training robust to annotation noise and propagation artifacts along object boundaries. Our proposed methods achieve state-of-the-art mIoUs of 83.5% on Cityscapes and 82.9% on CamVid. Our single model, without model ensembles, achieves 72.8% mIoU on the KITTI semantic segmentation test set, which surpasses the winning entry of the ROB challenge 2018. Our code and videos can be found at https://nv-adlr.github.io/publication/2018-Segmentation.

Citations (371)

Summary

  • The paper presents a novel method that combines video propagation with label relaxation (sketched after this list) to enhance semantic segmentation performance.
  • It introduces an efficient data augmentation approach that mitigates sparse annotations by propagating labels across video frames.
  • The technique demonstrates improved accuracy and robustness, offering practical benefits for real-world segmentation applications.
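
The boundary label relaxation mentioned above can be summarized as a single relaxed loss term. The form below is a sketch based on the paper's description rather than a quotation of its equations: for a pixel near an annotated object boundary, let N denote the set of classes that appear in a small window (e.g., 3x3) around that pixel, and let P(c) be the predicted softmax probability of class c. Instead of one-hot cross-entropy against a possibly noisy boundary label, the loss rewards placing probability mass on any of the plausible boundary classes:

    % Relaxed loss at boundary pixels (sketch): maximize the total
    % probability of the classes observed in the local neighborhood N.
    \mathcal{L}_{\text{boundary}} = -\log \sum_{c \in N} P(c)

Away from boundaries, N contains only the single annotated class and the expression reduces to the standard cross-entropy term.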

Paper Formatting Guidelines for CVPR Proceedings

This paper serves as a guide for authors preparing their manuscripts for submission to the Conference on Computer Vision and Pattern Recognition (CVPR), specifically focusing on the formatting requirements established by the IEEE Computer Society Press. Such guidelines are critical for maintaining uniformity in submissions, which facilitates the review process and enhances the readability of the published proceedings.

Content Overview

The manuscript addresses multiple aspects of paper preparation, including language, submission policies, paper length, and formatting requirements. It reiterates that manuscripts must be written in English and discusses the policy on dual submissions. The document sets a strict page limit, with references excluded from the page count, and emphasizes that non-compliant submissions will not be reviewed.

A notable requirement is the inclusion of a printed ruler in review submissions prepared with the LaTeX template; the ruler gives reviewers a standardized line reference for their comments. Authors using other document preparation systems are instructed to provide an equivalent feature.
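
For LaTeX users, the ruler is produced by the style file from the author kit while the document is in review mode. The preamble sketch below reflects how recent CVPR kits are typically structured; the package name cvpr, the \cvprfinalcopy switch, and the \cvprPaperID macro should be checked against the actual template you download rather than taken from this sketch.

    \documentclass[10pt,twocolumn,letterpaper]{article}
    \usepackage{cvpr}    % conference style file shipped with the author kit
    \usepackage{times}

    % Keep the next line commented out for the review copy: the side ruler
    % is printed only while \cvprfinalcopy is NOT declared.
    % \cvprfinalcopy

    \def\cvprPaperID{****}   % the paper ID assigned by the submission system

    \begin{document}
    % ... title, author block, and body ...
    \end{document}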

Structural and Formatting Specifications

The paper provides comprehensive instructions on laying out content within the specified margins and with the prescribed fonts. Authors must prepare manuscripts in a two-column format, with precise dimensions for the overall text area and the column separation. It further defines the hierarchical organization of headings and the typographic style required at each level, ensuring clarity and consistency across manuscripts.
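
In practice, the prescribed heading hierarchy maps onto the standard LaTeX sectioning commands; the section names below are placeholders, not part of the guidelines:

    \section{Introduction}            % first-level heading
    \subsection{Related Work}         % second-level heading
    \subsubsection{Further Remarks}   % third-level heading, used sparingly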

The guidelines also elaborate on the proper use of mathematical expressions and figures, with recommendations on numbering equations and placing figures so that they do not collide with the surrounding text. The necessity of single-spacing the text, including equations and figures, is highlighted to keep the presentation compact.
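
As an illustration, numbered display equations and top-anchored figures follow the usual LaTeX article conventions; the equation, file name, and caption below are placeholders:

    \begin{equation}
      y = f(x; \theta)
      \label{eq:model}
    \end{equation}

    \begin{figure}[t]
      \centering
      % requires the graphicx package; example-figure.pdf is a placeholder
      \includegraphics[width=0.9\linewidth]{example-figure.pdf}
      \caption{Single-sentence caption set below the figure.}
      \label{fig:example}
    \end{figure}

    Equation~\ref{eq:model} and Figure~\ref{fig:example} can then be
    referenced from the text by number.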

Review Process and Anonymization

An essential consideration in the paper is anonymization for blind review. The document explains how to cite previous work, including one's own, without inadvertently revealing the authorship of the current submission, thereby preserving the integrity of the blind review process.
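
One standard way to follow this advice is to cite one's own prior work in the third person, exactly as one would cite work by others; the author names and citation key below are hypothetical:

    % Avoid: reveals that the cited work shares authors with this submission.
    %   ``In our previous work~\cite{doe2017}, we showed that ...''

    % Prefer: reads as a citation of independent prior work.
    Doe and Roe~\cite{doe2017} showed that ...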

Furthermore, the guidelines clarify the procedure for listing references, advising authors to number citations in the order they appear and to follow specific formats for multi-author papers.
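
With LaTeX and BibTeX, that numbering is produced automatically by the bibliography style shipped with the kit; the style and file names below follow recent CVPR kits but should be confirmed against the downloaded template:

    {\small
    \bibliographystyle{ieee_fullname}   % style file provided in the author kit
    \bibliography{egbib}                % egbib.bib is the kit's sample bibliography
    }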

Practical Implications

For practitioners and researchers preparing their submissions for CVPR, adherence to these formatting instructions is crucial for ensuring their work is eligible for review. The guidelines aim to streamline the review process, thereby reducing administrative overhead and allowing reviewers to focus on the substantive aspects of the research.

From a theoretical perspective, the formatting guidelines reflect broader academic standards in computer science conferences, serving as a template for authors contributing to various domains within the field. Importantly, these guidelines underscore the necessity of precision and consistency in scholarly communication.

Conclusion

In conclusion, the paper provides a detailed framework that authors must follow to conform to the CVPR 2019 submission requirements. While the technical nature of the document might appear restrictive, it offers a standardized approach that benefits both authors and reviewers. These guidelines act as a cornerstone for the preparation of high-quality technical contributions, facilitating both the dissemination and advancement of knowledge within the computer vision community. As the field of artificial intelligence and computer vision continues to evolve, such standards will likely adapt to accommodate emerging research methodologies and presentation paradigms.
