Synthetica: Large Scale Synthetic Data for Robot Perception (2410.21153v1)

Published 28 Oct 2024 in cs.CV and cs.RO

Abstract: Vision-based object detectors are a crucial basis for robotics applications as they provide valuable information about object localisation in the environment. These need to ensure high reliability in different lighting conditions, occlusions, and visual artifacts, all while running in real-time. Collecting and annotating real-world data for these networks is prohibitively time consuming and costly, especially for custom assets, such as industrial objects, making it untenable for generalization to in-the-wild scenarios. To this end, we present Synthetica, a method for large-scale synthetic data generation for training robust state estimators. This paper focuses on the task of object detection, an important problem which can serve as the front-end for most state estimation problems, such as pose estimation. Leveraging data from a photorealistic ray-tracing renderer, we scale up data generation, generating 2.7 million images, to train highly accurate real-time detection transformers. We present a collection of rendering randomization and training-time data augmentation techniques conducive to robust sim-to-real performance for vision tasks. We demonstrate state-of-the-art performance on the task of object detection while having detectors that run at 50-100Hz which is 9 times faster than the prior SOTA. We further demonstrate the usefulness of our training methodology for robotics applications by showcasing a pipeline for use in the real world with custom objects for which there do not exist prior datasets. Our work highlights the importance of scaling synthetic data generation for robust sim-to-real transfer while achieving the fastest real-time inference speeds. Videos and supplementary information can be found at this URL: https://sites.google.com/view/synthetica-vision.

Summary

The paper presents Synthetica, a method using NVIDIA Omniverse to generate 2.7 million highly randomized synthetic images for training robust robot object detectors.
Synthetica achieves state-of-the-art object detection performance on the YCB-Video benchmark and near SOTA on T-LESS, running nine times faster than prior methods.
This approach shows that scaling synthetic data generation is critical for sim-to-real transfer, and it allows accurate custom object detectors to be created efficiently without extensive real-world data.

The paper "Synthetica: Large Scale Synthetic Data for Robot Perception" (2410.21153) presents a method for generating large-scale synthetic data to train object detectors for robotic applications. The core of the method is the generation of 2.7 million images using NVIDIA Omniverse Isaac Sim, a photorealistic ray-tracing renderer.

The authors use extensive rendering randomizations, including procedurally generated rooms, random HDRI backgrounds, diverse object configurations, varied material properties and lighting, and distractor objects. They also apply a variety of training-time data augmentations: color jittering, random background replacement, random blending, JPEG compression, shot noise, snow simulation, reflectance augmentation, histogram equalization, random perspective transformations, large-scale jittering, and PASTA. The detection transformers are based on the RT-DETR architecture with ResNet-50 and ConvNext-S backbones; they are trained with the AdamW optimizer and accelerated with TensorRT for real-time inference.
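To make one of the listed augmentations concrete, the sketch below shows how large-scale jittering could be applied to bounding-box labels: the image is resized by a random scale, then cropped or padded to a fixed canvas, and the boxes are transformed to match. This is only an illustrative sketch, not the authors' implementation. The function name `large_scale_jitter_boxes`, the `min_side` filter, and the default scale range of 0.1 to 2.0 (a range commonly used for this augmentation) are assumptions; the summary does not state the parameters used in the paper.

```python
import random


def large_scale_jitter_boxes(boxes, image_size, out_size,
                             scale_range=(0.1, 2.0), min_side=1.0, rng=None):
    """Transform boxes for a random resize followed by crop or pad.

    boxes:      list of (x_min, y_min, x_max, y_max) in source pixels
    image_size: (width, height) of the source image
    out_size:   (width, height) of the fixed output canvas
    Returns (scale, (offset_x, offset_y), transformed_boxes).
    All parameter defaults are assumptions made for illustration.
    """
    if rng is None:
        rng = random.Random()
    width, height = image_size
    out_w, out_h = out_size

    # Sample one scale factor for the whole image.
    scale = rng.uniform(*scale_range)
    scaled_w, scaled_h = width * scale, height * scale

    # If the resized image exceeds the canvas, pick a random crop origin.
    # Otherwise the image is padded on the right/bottom and the offset is 0.
    off_x = rng.uniform(0, max(scaled_w - out_w, 0.0))
    off_y = rng.uniform(0, max(scaled_h - out_h, 0.0))

    # Region of the canvas that actually contains image content.
    vis_w, vis_h = min(out_w, scaled_w), min(out_h, scaled_h)

    new_boxes = []
    for x0, y0, x1, y1 in boxes:
        # Scale, shift by the crop origin, then clip to the visible region.
        nx0 = min(max(x0 * scale - off_x, 0.0), vis_w)
        ny0 = min(max(y0 * scale - off_y, 0.0), vis_h)
        nx1 = min(max(x1 * scale - off_x, 0.0), vis_w)
        ny1 = min(max(y1 * scale - off_y, 0.0), vis_h)
        # Drop boxes that were cropped away or shrunk below min_side pixels.
        if nx1 - nx0 >= min_side and ny1 - ny0 >= min_side:
            new_boxes.append((nx0, ny0, nx1, ny1))
    return scale, (off_x, off_y), new_boxes
```

In a real training pipeline the same scale and offset would also be applied to the image pixels; only the label side is shown here so that the example stays dependency-free.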

The method achieves state-of-the-art performance on the YCB-Video object detection benchmark, outperforming prior methods in mean average precision (mAP) and mean average recall (mAR). The detectors run at 50-100 Hz (10-20 ms per frame), nine times faster than the prior state of the art. On the T-LESS dataset the method performs near the state of the art, which suggests that it generalizes beyond a single benchmark. Detection performance is also reported to be robust to the choice of confidence threshold. Finally, the authors demonstrate a pipeline for scanning real-world objects and training custom object detectors with this synthetic data generation approach, emphasizing its practical applicability.
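The sensitivity of a detector to its confidence threshold can be probed by recomputing precision and recall at several thresholds. The sketch below does this for a single image and a single object class using greedy IoU matching. It is a simplified illustration, not the benchmark protocol behind the reported mAP and mAR numbers; the function names, the 0.5 IoU threshold, and the toy boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, conf_threshold, iou_threshold=0.5):
    """Precision and recall for one image and one class.

    detections:   list of (box, score)
    ground_truth: list of boxes
    Detections below conf_threshold are discarded; the rest are matched
    greedily, highest score first, to unmatched ground-truth boxes.
    """
    kept = sorted((d for d in detections if d[1] >= conf_threshold),
                  key=lambda d: d[1], reverse=True)
    matched = set()
    true_positives = 0
    for box, _score in kept:
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    # By convention here, empty denominators yield 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / len(ground_truth) if ground_truth else 1.0
    return precision, recall


# Toy data (hypothetical): two objects, three detections.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [((0, 0, 10, 10), 0.9), ((21, 21, 30, 30), 0.6), ((50, 50, 60, 60), 0.3)]
sweep = {t: precision_recall(detections, ground_truth, t) for t in (0.2, 0.5, 0.7)}
```

A detector that is robust in this sense would show precision and recall changing only gradually across the sweep, rather than collapsing once the threshold moves away from a tuned value.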

In summary, the research demonstrates that scaling synthetic data generation is critical for robust sim-to-real transfer in object detection. The presented method produces highly accurate and real-time object detectors suitable for robotics applications, and facilitates custom object detector creation without extensive real-world data.
