- The paper presents a self-ensembling semi-supervised framework that reduces the need for extensive labeled point cloud data while maintaining competitive detection performance.
- It introduces novel point cloud perturbations and tailored consistency losses (center-, class-, and size-aware) to align student and teacher predictions.
- Experimental results on SUN RGB-D and ScanNetV2 show up to 28.09% mAP improvement with partial labeling, highlighting significant practical benefits.
Overview of SESS: Self-Ensembling Semi-Supervised 3D Object Detection
The paper addresses 3D object detection in point clouds, where existing methods depend heavily on extensive labeled data. It presents SESS, a self-ensembling semi-supervised framework that reduces this dependency by exploiting both labeled and unlabeled data.
Key Contributions
- Semi-Supervised Approach: SESS is a semi-supervised 3D object detection framework built on the self-ensembling Mean Teacher paradigm. Because it does not rely on vast annotated datasets, it offers a more label-efficient alternative for 3D object detection tasks.
- Perturbation Scheme: Unlike image-based perturbations, the framework introduces a novel set of perturbations tailored for point clouds, including random sub-sampling and stochastic transformations such as random flipping, rotation, and scaling.
- Consistency Losses: The framework employs three carefully defined consistency losses—center-aware, class-aware, and size-aware—to align outputs from student and teacher networks, enhancing the network's generalization capabilities.
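The point cloud perturbation scheme above can be sketched in a few lines. This is an illustrative, minimal implementation: the sub-sampling ratio, rotation range, and scaling range below are assumed values for demonstration, not the paper's settings.

```python
import math
import random

def perturb(points, angle_range=math.pi / 6, scale_range=(0.85, 1.15),
            keep_ratio=0.9, rng=None):
    """Apply SESS-style stochastic perturbations to a point cloud.

    `points` is a list of (x, y, z) tuples. The ranges here are
    illustrative assumptions, not the values used in the paper.
    """
    rng = rng or random.Random()
    # Random sub-sampling: keep a random subset of the points.
    n_keep = max(1, int(len(points) * keep_ratio))
    pts = rng.sample(points, n_keep)
    # Random flip about the YZ plane.
    if rng.random() < 0.5:
        pts = [(-x, y, z) for x, y, z in pts]
    # Random rotation about the up (z) axis.
    theta = rng.uniform(-angle_range, angle_range)
    c, s = math.cos(theta), math.sin(theta)
    pts = [(c * x - s * y, s * x + c * y, z) for x, y, z in pts]
    # Random global scaling.
    scale = rng.uniform(*scale_range)
    return [(scale * x, scale * y, scale * z) for x, y, z in pts]
```

In the Mean Teacher setup, the teacher sees the original (or differently perturbed) cloud while the student sees a perturbed copy, and the consistency losses pull their predictions together.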
Experimental Results
Evaluated on the SUN RGB-D and ScanNetV2 indoor-scene datasets, SESS achieved performance competitive with fully-supervised methods while using only 50% of the labeled data. Against methods trained under the same partial-labeling conditions, it delivered substantial mAP gains, including an improvement of up to 28.09% when only 10% of the labels were available, underscoring its efficacy in semi-supervised scenarios.
Moreover, even with full utilization of labeled data, SESS outperformed some state-of-the-art methods, implying the complementary nature of the designed perturbations and consistency constraints with the established detection models.
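The three consistency losses that drive these gains can be sketched as follows. This is a simplified sketch under assumptions: proposals are aligned by nearest centers (following the paper's alignment idea), but the dict keys, the KL form of the class term, and the squared-error size term are illustrative choices, not the paper's exact formulation.

```python
import math

def _dist2(a, b):
    # Squared Euclidean distance between two coordinate tuples.
    return sum((u - v) ** 2 for u, v in zip(a, b))

def _nearest(center, centers):
    # Index of the teacher proposal whose center is closest.
    return min(range(len(centers)), key=lambda j: _dist2(center, centers[j]))

def consistency_losses(student, teacher):
    """Compute center-, class-, and size-aware consistency terms.

    `student` and `teacher` are lists of proposal dicts with keys
    'center' (x, y, z), 'probs' (class distribution), 'size' (w, l, h).
    """
    center_loss = class_loss = size_loss = 0.0
    for s in student:
        j = _nearest(s['center'], [t['center'] for t in teacher])
        t = teacher[j]
        # Center-aware: distance between aligned proposal centers.
        center_loss += math.sqrt(_dist2(s['center'], t['center']))
        # Class-aware: KL divergence from teacher to student distribution.
        class_loss += sum(p * math.log(p / q)
                          for p, q in zip(t['probs'], s['probs']) if p > 0)
        # Size-aware: squared error between aligned box sizes.
        size_loss += _dist2(s['size'], t['size'])
    n = len(student)
    return center_loss / n, class_loss / n, size_loss / n
```

When student and teacher agree perfectly, all three terms vanish; during training their weighted sum is added to the supervised detection loss on the labeled portion of the data.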
Theoretical and Practical Implications
The SESS framework proposes an innovative direction for research in semi-supervised learning by adapting a widely successful image classification strategy, self-ensembling, to the domain of 3D object detection. The framework’s model-agnostic nature makes it extensible to other architectures, potentially facilitating adoption across various applications like autonomous driving or augmented reality.
From a practical perspective, SESS reduces the cost of data labeling, an often resource-intensive step in developing reliable 3D detectors. The improved generalization and robustness to perturbations make the resulting detectors more versatile and dependable in real-world scenarios where large labeled datasets are infeasible to procure.
Future Directions
Future research could extend SESS to other domains, such as outdoor scenes or dynamic environments, which demand more adaptive and responsive detection systems. Integrating additional modalities beyond raw point clouds, such as RGB information, might further improve detection accuracy and robustness. Exploring more sophisticated perturbation techniques and consistency measures could drive further advances in semi-supervised learning for 3D domains.
In summary, this work introduces a practical semi-supervised framework that reduces annotation costs while maintaining strong detection performance, setting a precedent for further innovations in 3D object detection methodologies.