T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects (1701.05498v1)

Published 19 Jan 2017 in cs.CV, cs.AI, and cs.RO

Abstract: We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and with a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.

Authors (6)
  1. Tomas Hodan (22 papers)
  2. Pavel Haluza (1 paper)
  3. Stepan Obdrzalek (1 paper)
  4. Jiri Matas (133 papers)
  5. Manolis Lourakis (1 paper)
  6. Xenophon Zabulis (2 papers)
Citations (458)

Summary

  • The paper presents the T-LESS dataset, a challenging benchmark with approximately 39,000 training and 10,000 test images per sensor and accurate ground-truth annotations for 6D pose estimation.
  • It provides images from three synchronized sensors and two 3D models per object, with test scenes that replicate complex industrial environments offering minimal visual cues.
  • Initial evaluations reveal that current techniques struggle with occlusion and texture-less objects, spurring the need for advanced pose estimation algorithms.

An Academic Overview of the T-LESS Dataset for 6D Pose Estimation of Texture-less Objects

The paper presents T-LESS, an RGB-D dataset crafted for the challenging task of 6D pose estimation of texture-less objects. The authors designed the dataset to address the difficulties posed by texture-less rigid objects commonly encountered in industrial applications. Comprising thirty objects with minimal textural cues, it serves as a critical tool for evaluating and advancing 6D pose estimation methods.

Key Features of the T-LESS Dataset

The T-LESS dataset distinguishes itself through several unique characteristics:

  • Object Diversity: It includes thirty industrially relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size, and some objects are parts of others, replicating real-world conditions where objects bear minimal visual discriminators.
  • Camera and Sensor Setup: Images were captured with three synchronized sensors: a structured-light RGB-D sensor, a time-of-flight RGB-D sensor, and a high-resolution RGB camera. This setup offers complementary modalities for model training and evaluation.
  • Data Complexity and Volume: The dataset offers a substantial number of images, with approximately 39,000 training images and 10,000 test images per sensor type. This volume ensures robust training opportunities and an extensive evaluation scale for assessing pose estimation algorithms.
  • 3D Model Provision: Each object is provided with two 3D models, a manually created CAD model and a semi-automatically reconstructed one. This dual provision supports algorithms that rely on different types of model data.
  • Ground Truth Annotations: Test images are annotated with accurate ground-truth 6D poses of all modeled objects, captured with carefully calibrated sensors, which is essential for the quantitative evaluation of 6D pose estimation methods (a minimal sketch of applying such a pose follows this list).
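
For concreteness, each ground-truth annotation is a 6D pose, i.e. a 3D rotation and a 3D translation that map points from the object's model frame into the camera frame. The sketch below is a minimal illustration of applying such a pose; the function name and the example numbers are assumptions for demonstration, not the dataset's actual file format or values.

```python
import numpy as np

def apply_6d_pose(model_points, R, t):
    """Transform an (N, 3) array of model points into the camera frame.

    R is a 3x3 rotation matrix and t a 3-vector translation
    (translations are typically given in millimetres).
    """
    model_points = np.asarray(model_points, dtype=np.float64)
    # Row-vector form of X_cam = R @ X_model + t
    return model_points @ R.T + t

# Illustrative (made-up) pose for one object instance in one test image.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])       # 90-degree rotation about the z-axis
t = np.array([12.5, -40.0, 650.0])     # translation in millimetres

vertices_cam = apply_6d_pose(np.random.rand(100, 3) * 50.0, R, t)
print(vertices_cam.shape)  # (100, 3)
```

Projecting the transformed points with the camera intrinsics distributed with the dataset then yields the object's silhouette in the image, which is how such annotations are typically visualized and verified.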

Initial Evaluation and Implications

Initial results from evaluating recent 6D localization methods on T-LESS suggest that the challenges presented by the dataset are not sufficiently addressed by current methodologies. Specifically, object recall rates are notably impacted by mutual object resemblance and occlusions—a common occurrence in intricate industrial environments.
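
Recall in such evaluations is typically the fraction of annotated object instances for which the estimated pose falls within an error threshold of the ground truth. The sketch below uses a simple average point-to-point distance as the pose error purely for illustration; the paper's own error function and thresholds may differ.

```python
import numpy as np

def avg_distance_error(model_points, R_gt, t_gt, R_est, t_est):
    """Mean distance between model points under the ground-truth and estimated poses."""
    p_gt  = model_points @ R_gt.T  + t_gt
    p_est = model_points @ R_est.T + t_est
    return float(np.mean(np.linalg.norm(p_gt - p_est, axis=1)))

def recall(errors, threshold):
    """Fraction of test instances whose pose error is below the threshold."""
    errors = np.asarray(errors, dtype=np.float64)
    return float(np.mean(errors < threshold)) if errors.size else 0.0

# A pure 2 mm translation offset gives an average error of exactly 2 mm.
pts = np.random.rand(200, 3) * 50.0
print(avg_distance_error(pts, np.eye(3), np.zeros(3),
                         np.eye(3), np.array([2.0, 0.0, 0.0])))  # 2.0

# Hypothetical per-instance errors (millimetres) and a 10 mm acceptance threshold.
print(recall([3.2, 7.9, 15.4, 42.0], threshold=10.0))  # -> 0.5
```

Occlusion shrinks the visible evidence for an object, and mutual similarity between objects lets estimates lock onto the wrong but similar-looking model; both effects push errors above the acceptance threshold and depress recall.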

The dataset's intentional complexity in object arrangement and scene clutter provides fertile ground for developing more sophisticated algorithms that can handle such conditions. Such comprehensive datasets also advance the field by highlighting key areas where existing techniques fall short.

Implications for Future Research and Applications

The introduction of T-LESS is poised to influence both theoretical research and practical applications significantly:

  1. Advancement in Algorithm Development: The dataset underscores the limitations of existing techniques in dealing with occlusion and lack of texture, thereby motivating researchers to develop improved models and methods.
  2. Robotics and Machine Vision: In robotics, particularly for industrial automation, precise object localization is paramount. T-LESS has the potential to contribute significantly to this domain by improving the accuracy and reliability of object detection and manipulation tasks.
  3. Augmented Reality (AR): For AR systems, the accurate alignment of digital content over physical objects can be enhanced by robust 6D pose estimation, as facilitated by datasets like T-LESS.

Future Directions

Potential developments stemming from this research include leveraging machine learning techniques, particularly deep learning, to extract more meaningful features from minimal visual cues. Additionally, investigating the use of hybrid sensor data could yield breakthroughs in accurately estimating poses in complex, real-world scenarios.

In conclusion, T-LESS represents a pivotal contribution to the object pose estimation domain, offering a well-structured, complex dataset that challenges existing paradigms and pushes the boundaries of what current systems can achieve. As the field progresses, the data and insights derived from engagements with T-LESS will undoubtedly foster significant advancements.