- The paper introduces the ClearPose dataset, with over 350K labeled RGB-D frames and nearly 5M instance annotations, to address the perception challenges posed by transparent objects.
- Its annotation pipeline, built on visual SLAM, eliminates the need for fiducial markers and per-frame manual alignment during labeling.
- Benchmarking shows that TransCG leads in depth completion, while state-of-the-art pose estimators struggle under complex lighting and heavy occlusion.
ClearPose: Large-scale Transparent Object Dataset and Benchmark
The paper introduces the ClearPose dataset, a large-scale RGB-D dataset tailored to the perception challenges posed by transparent objects. It addresses key limitations of existing transparent-object datasets: limited scale, few object categories, and little variation in scene complexity and lighting. ClearPose provides over 350,000 labeled real-world RGB-D frames and approximately 5 million instance annotations covering 63 household objects. The motivation for the dataset is that commodity depth sensors produce missing or corrupted depth on transparent surfaces, since light refracts through and reflects off them rather than returning a reliable measurement, which undermines both depth and pose estimation.
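To make the data layout concrete, the following is a minimal sketch of loading one annotated RGB-D frame. The file names, the millimeter depth encoding, and the JSON metadata schema are assumptions for illustration, not the dataset's documented format; the released ClearPose toolkit defines the actual layout.

```python
# Minimal sketch of loading one annotated RGB-D frame.
# File names, depth encoding, and metadata schema are hypothetical.
import json

import cv2
import numpy as np

def load_frame(scene_dir: str, frame_id: int):
    """Load color image, depth map in meters, and per-object 6D poses."""
    color = cv2.imread(f"{scene_dir}/{frame_id:06d}-color.png")  # BGR, uint8
    depth_raw = cv2.imread(f"{scene_dir}/{frame_id:06d}-depth.png",
                           cv2.IMREAD_UNCHANGED)                 # uint16, millimeters (assumed)
    depth = depth_raw.astype(np.float32) / 1000.0                # millimeters to meters
    with open(f"{scene_dir}/{frame_id:06d}-meta.json") as f:
        meta = json.load(f)  # assumed schema: {"objects": [{"name": ..., "pose": 4x4 list}, ...]}
    poses = {obj["name"]: np.asarray(obj["pose"]) for obj in meta["objects"]}
    return color, depth, poses
```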
Dataset Description and Methodology
ClearPose fills the gap in large-scale, real-world datasets for transparent object perception by including numerous challenging scenarios: heavy occlusion, non-planar object orientations, and varying lighting, captured across the full set of household objects. The data were collected with an Intel RealSense L515 camera under different lighting conditions. A central feature of ClearPose is its annotation pipeline, ProgressLabeller, which leverages visual SLAM for accurate camera pose estimation and efficient annotation of object poses in RGB-D videos, as sketched below. This approach obviates the need for fiducial markers and per-frame manual object alignment, both common in earlier datasets.
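The efficiency gain of a SLAM-based pipeline comes from annotating each static object once per scene in the reconstructed world frame, then using the per-frame camera poses recovered by visual SLAM to propagate that single annotation to every frame of the video. Below is a minimal sketch of that propagation step; the 4x4 homogeneous-matrix conventions are assumptions for illustration, not ProgressLabeller's actual interface.

```python
import numpy as np

def propagate_object_pose(T_world_obj, T_world_cam_per_frame):
    """Propagate a single world-frame object annotation to every camera frame.

    T_world_obj: 4x4 object pose annotated once in the world frame.
    T_world_cam_per_frame: list of 4x4 world-from-camera poses from visual SLAM.
    Returns the object pose in each camera frame:
        T_cam_obj = inv(T_world_cam) @ T_world_obj
    """
    return [np.linalg.inv(T_wc) @ T_world_obj for T_wc in T_world_cam_per_frame]
```

Because the object is annotated once per scene rather than once per frame, labeling cost stays roughly constant as video length grows, which is what makes annotating hundreds of thousands of frames tractable.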
Benchmarking and Analysis
In conjunction with the dataset, the authors benchmark several state-of-the-art depth completion and object pose estimation algorithms on its challenging scenarios. Two depth completion methods are analyzed: ImplicitDepth and TransCG. TransCG moderately outperforms ImplicitDepth across the test scenarios, suggesting an advantage of its DFNet-based architecture over ImplicitDepth's voxel-based design.
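Depth completion in this literature is commonly scored with RMSE, MAE, and threshold accuracies (the fraction of pixels whose predicted-to-true depth ratio stays below 1.05, 1.10, or 1.25), computed only over transparent-object pixels. The following is a minimal sketch of such an evaluation; the masking convention is an assumption for illustration, not taken verbatim from the paper.

```python
import numpy as np

def depth_metrics(pred, gt, mask):
    """pred, gt: depth maps in meters; mask: boolean array of transparent pixels."""
    valid = mask & (gt > 0) & (pred > 0)   # ignore pixels with no measured depth
    p, g = pred[valid], gt[valid]
    ratio = np.maximum(p / g, g / p)       # symmetric predicted/true depth ratio
    return {
        "RMSE": float(np.sqrt(np.mean((p - g) ** 2))),
        "MAE": float(np.mean(np.abs(p - g))),
        "delta_1.05": float(np.mean(ratio < 1.05)),
        "delta_1.10": float(np.mean(ratio < 1.10)),
        "delta_1.25": float(np.mean(ratio < 1.25)),
    }
```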
For object pose estimation, Xu et al. and FFB6D serve as the baselines. Notably, FFB6D shows significant performance drops when trained and tested on raw or completed depth rather than ground-truth depth. This underscores the difficulty current models face with the incomplete and distorted depth measurements that transparent objects typically produce. Qualitative evaluation suggests that FFB6D performs comparably to Xu et al. in general scenarios but falters in complex scenes involving opaque distractors and liquid-filled transparent objects.
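Pose estimators such as FFB6D are typically scored with the ADD metric (mean distance between model points under the predicted and ground-truth poses) and its symmetry-aware variant ADD-S (mean closest-point distance, which tolerates object symmetries). A minimal sketch of both, assuming model points as an Nx3 array and poses as 4x4 matrices:

```python
import numpy as np
from scipy.spatial import cKDTree

def transform(points, T):
    """Apply a 4x4 rigid transform to an Nx3 point array."""
    return points @ T[:3, :3].T + T[:3, 3]

def add_error(model_pts, T_pred, T_gt):
    """ADD: mean distance between corresponding model points under both poses."""
    return float(np.mean(np.linalg.norm(
        transform(model_pts, T_pred) - transform(model_pts, T_gt), axis=1)))

def adds_error(model_pts, T_pred, T_gt):
    """ADD-S: mean closest-point distance; invariant to object symmetries."""
    pred = transform(model_pts, T_pred)
    gt = transform(model_pts, T_gt)
    dists, _ = cKDTree(gt).query(pred, k=1)
    return float(np.mean(dists))
```

ADD-S matters here because many transparent household objects (bottles, glasses, bowls) are rotationally symmetric, so ADD would penalize pose estimates that are visually and functionally correct.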
Implications and Future Work
The inclusion of diverse object categories exhibiting transparency and translucency, together with varying backgrounds and lighting conditions, is a significant contribution to transparent object perception in computer vision. The dataset's implications extend to advancing robotic manipulation, refining depth completion algorithms, and improving object pose estimation frameworks. Moreover, the introduction of multi-layer appearance, where transparent and translucent objects coexist and overlap, invites exploration of new segmentation and detection paradigms.
Future research leveraging ClearPose could explore RGB-only estimators that bypass depth-related inaccuracies, or develop category-level pose estimation methods that handle symmetric and translucent variations. Neural rendering techniques may also advance depth prediction under varying environmental conditions. The dataset's public availability should catalyze further research in these directions, promoting advances in AI-driven perception tasks.