- The paper presents PEGASUS, a simulation system that merges 3D Gaussian Splatting with a physics engine to create realistic 6DoF pose datasets for AI training.
- It demonstrates practical use with a UR5 robot: a pose estimator trained on PEGASUS-generated data (RGB images, depth maps, and semantic masks) drives consistent, precise pick-and-place operations.
- The approach minimizes the domain gap by capturing real objects with commodity cameras, enabling neural networks like DOPE to generalize effectively from synthetic to real-world data.
Introduction to PEGASUS
Researchers and industry professionals are continuously exploring techniques to enhance machine perception, and a core ingredient of such advances is the availability of accurate, robust datasets. To this end, the paper introduces PEGASUS (Physically Enhanced Gaussian Splatting Simulation System), a simulation system for generating high-quality datasets for 6 Degrees of Freedom (6DoF) object pose estimation. PEGASUS reconstructs objects and environments with 3D Gaussian Splatting and leverages a physics engine to simulate natural object interactions within a scene, which is crucial for producing realistic and diverse data.
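Since PEGASUS builds on standard 3D Gaussian Splatting, each reconstructed asset is essentially a cloud of anisotropic Gaussians with appearance parameters. The sketch below is not the paper's code; the field names and the merge helper are illustrative assumptions about what such an asset carries.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianAsset:
    """One reconstructed asset (an object or an environment)."""
    means: np.ndarray      # (N, 3) Gaussian centers in the asset's local frame
    rotations: np.ndarray  # (N, 4) unit quaternions orienting each covariance
    scales: np.ndarray     # (N, 3) per-axis standard deviations
    opacities: np.ndarray  # (N,) alpha values in [0, 1]
    sh_coeffs: np.ndarray  # (N, K, 3) spherical-harmonics color coefficients

def merge(assets):
    """Concatenate several assets into one renderable scene cloud."""
    return GaussianAsset(
        means=np.concatenate([a.means for a in assets]),
        rotations=np.concatenate([a.rotations for a in assets]),
        scales=np.concatenate([a.scales for a in assets]),
        opacities=np.concatenate([a.opacities for a in assets]),
        sh_coeffs=np.concatenate([a.sh_coeffs for a in assets]),
    )
```

Because merging is just concatenation, the renderer still knows which Gaussians belong to which object, which is what later makes per-object semantic masks cheap to produce.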
Generating Datasets with PEGASUS
A diverse and realistic dataset is paramount when training AI models for real-world deployment. PEGASUS composes scenes by merging Gaussian Splatting point clouds of environments and objects, with object placement governed by physical interactions simulated in a physics engine, yielding both static and dynamic datasets that reflect real-world conditions. The system renders a range of data modalities, including RGB images, depth maps, and semantic masks. As a proof of concept, the paper introduces the Ramen dataset, a collection of Gaussian Splatting scans of 30 different Japanese cup-noodle products that serves as foundational asset material for PEGASUS.
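To make the composition step concrete, here is a hedged sketch of the drop-and-settle idea, using PyBullet as a stand-in physics engine; the paper does not specify this exact code, and the mesh file names, masses, and scene size are placeholders.

```python
import numpy as np
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                              # headless simulation
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")                         # ground to catch the objects

num_objects = 5                                  # placeholder scene size
body_ids = []
for i in range(num_objects):
    # Placeholder collision meshes; PEGASUS derives collision geometry
    # from the scanned objects.
    col = p.createCollisionShape(p.GEOM_MESH, fileName=f"object_{i}.obj")
    body_ids.append(p.createMultiBody(baseMass=0.1,
                                      baseCollisionShapeIndex=col,
                                      basePosition=[0, 0, 0.3 + 0.1 * i]))

for _ in range(240):                             # ~1 s at the default 240 Hz
    p.stepSimulation()

# Apply the settled 6DoF poses to each object's Gaussians before rendering.
for body in body_ids:
    position, quaternion = p.getBasePositionAndOrientation(body)
    R = np.array(p.getMatrixFromQuaternion(quaternion)).reshape(3, 3)
    # object_means @ R.T + position places that asset's Gaussian centers
    # into the scene (per-Gaussian rotations must also be composed).
```

Once the objects have settled into physically plausible poses, the merged scene is rendered to produce RGB, depth, and per-object semantic masks along with ground-truth poses.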
Novel Approach to Minimize the Domain Gap
The transition from synthetic data to real-world applications has often been hindered by a domain gap due to the lack of realism in synthetic datasets. PEGASUS tackles this issue with photorealistic rendering, allowing neural networks to better generalize from synthetic to real-world data. The system doesn't require time-consuming model creation for each object and environment; instead, it relies on scanning real objects and environments with commodity cameras, significantly simplifying the process of asset acquisition. When trained on PEGASUS-generated datasets, networks like Deep Object Pose (DOPE) have shown successful transfer from synthetic to real-world tasks, confirming the system's utility.
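On the training side, the generated samples plug into a standard supervised pipeline. The loader below is only a sketch: the directory layout (rgb/, mask/, pose/) and file formats are assumptions for illustration, not the paper's export format.

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class SyntheticPoseDataset(Dataset):
    """Minimal loader for PEGASUS-style synthetic samples (layout assumed)."""

    def __init__(self, root):
        self.root = Path(root)
        self.ids = sorted(f.stem for f in (self.root / "rgb").glob("*.png"))

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, i):
        s = self.ids[i]
        rgb = np.asarray(Image.open(self.root / "rgb" / f"{s}.png"))
        mask = np.asarray(Image.open(self.root / "mask" / f"{s}.png"))
        pose = json.loads((self.root / "pose" / f"{s}.json").read_text())
        return rgb, mask, np.asarray(pose["matrix"])  # 4x4 object-to-camera
```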
Practical Applications and Further Development
The efficacy of PEGASUS was demonstrated through experiments with a UR5 robot, confirming the practical applicability of the generated datasets. The robot performed consistent and precise pick-and-place operations, indicating the high quality of the training data. The authors acknowledge the system's limitations, such as the absence of realistic shadow rendering, and suggest enhancements including shadow maps and re-lighting effects. Expanding the variety of captured environments and objects would make PEGASUS an even more powerful dataset-generation tool, and future work might also explore LiDAR for capturing complex environments, adding another layer of depth and realism to the synthetic datasets.
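For the pick-and-place step, the essential computation is chaining the estimated object pose through the camera's extrinsics into the robot's base frame. A minimal, hedged sketch with dummy values (the hand-eye calibration and grasp offset are assumptions, not the paper's numbers):

```python
import numpy as np

def to_base_frame(T_base_cam, T_cam_obj, T_obj_grasp=np.eye(4)):
    """Chain homogeneous transforms: base <- camera <- object <- grasp."""
    return T_base_cam @ T_cam_obj @ T_obj_grasp

# T_cam_obj would come from the trained pose estimator (e.g., DOPE);
# T_base_cam from a one-off hand-eye calibration. Values here are dummies.
T_base_cam = np.eye(4)
T_cam_obj = np.eye(4)
T_cam_obj[:3, 3] = [0.1, 0.0, 0.5]

grasp = to_base_frame(T_base_cam, T_cam_obj)
print(grasp[:3, 3])   # Cartesian pick target for the UR5's motion planner
```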
In conclusion, PEGASUS stands out as a promising approach for creating customized, lifelike datasets for AI training in object pose estimation, paving the way for smoother and more effective synthetic-to-real transfer.