DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation

Published 30 Nov 2023 in cs.CV and cs.RO | (2312.00583v2)

Abstract: Teaching robots to fold, drape, or reposition deformable objects such as cloth will unlock a variety of automation applications. While remarkable progress has been made for rigid object manipulation, manipulating deformable objects poses unique challenges, including frequent occlusions, infinite-dimensional state spaces and complex dynamics. Just as object pose estimation and tracking have aided robots for rigid manipulation, dense 3D tracking (scene flow) of highly deformable objects will enable new applications in robotics while aiding existing approaches, such as imitation learning or creating digital twins with real2sim transfer. We propose DeformGS, an approach to recover scene flow in highly deformable scenes, using simultaneous video captures of a dynamic scene from multiple cameras. DeformGS builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for state-of-the-art and fast novel-view synthesis. DeformGS learns a deformation function to project a set of Gaussians with canonical properties into world space. The deformation function uses a neural-voxel encoding and a multilayer perceptron (MLP) to infer Gaussian position, rotation, and a shadow scalar. We enforce physics-inspired regularization terms based on conservation of momentum and isometry, which leads to trajectories with smaller trajectory errors. We also leverage existing foundation models SAM and XMEM to produce noisy masks, and learn a per-Gaussian mask for better physics-inspired regularization. DeformGS achieves high-quality 3D tracking on highly deformable scenes with shadows and occlusions. In experiments, DeformGS improves 3D tracking by an average of 55.8% compared to the state-of-the-art. With sufficient texture, DeformGS achieves a median tracking error of 3.3 mm on a cloth of 1.5 x 1.5 m in area. Website: https://deformgs.github.io

Summary

  • The paper introduces DeformGS, which combines a canonical neural-voxel encoding and a deformation MLP with physics-inspired regularization, improving 3D tracking by an average of 55.8% over the state of the art.
  • The method learns a canonical state for a set of Gaussians and a deformation function that projects them into world space, enabling reconstruction of dynamic views and accurate tracking of complex geometries.
  • Experiments on six synthetic cloth scenes demonstrate significant accuracy gains, with implications for robotics and augmented reality applications.

Introduction

Three-dimensional tracking and novel-view synthesis of highly deformable objects, such as clothing or soft materials, pose significant challenges in computer vision. These challenges stem from complex deformations combined with variable lighting, shadows, and occlusions. DeformGS pushes the boundaries of current methods by delivering improved 3D tracking while also supporting the synthesis of dynamic novel views.

Methodology

DeformGS operates by learning a canonical state for a set of Gaussians and a deformation function that projects them into world space. This enables not only reconstruction of the observed views but also tracking of the objects' 3D geometry despite complexities in the scene. The methodology is grounded in the following components:

  • Canonical Neural Voxel Encoding: DeformGS uses a neural-voxel encoding so that the deformation function can effectively capture complex scene deformations.
  • Deformation MLP (Multilayer Perceptron): This network infers each Gaussian's position, rotation, and a shadow scalar from the local voxel encoding.
  • Physics-Inspired Regularization: Regularization terms based on principles such as local rigidity, conservation of momentum, and isometry make the inferred trajectories physically plausible rather than relying on photometric consistency alone (see the sketch after this list).
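
To make these components concrete, below is a minimal, hedged sketch of the deformation model and two of the physics-inspired losses. It illustrates the ideas above and is not the authors' implementation: the single trainable feature grid stands in for the paper's neural-voxel encoding, and all names (DeformationField, momentum_loss, isometry_loss) and hyperparameters are assumptions.

```python
# Illustrative sketch only -- not the authors' code. A single trainable 3D
# feature grid stands in for the paper's neural-voxel encoding; names and
# hyperparameters are assumed for the example.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformationField(nn.Module):
    """Maps canonical Gaussian centers plus time to world-space properties."""

    def __init__(self, grid_res=32, feat_dim=16, hidden=128, time_dim=8):
        super().__init__()
        # Trainable feature volume queried by trilinear interpolation.
        self.grid = nn.Parameter(
            0.01 * torch.randn(1, feat_dim, grid_res, grid_res, grid_res))
        self.time_mlp = nn.Sequential(nn.Linear(1, time_dim), nn.ReLU())
        # Deformation MLP: infers position offset, rotation, and shadow scalar.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + time_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3 + 4 + 1),  # position offset, quaternion, shadow
        )

    def forward(self, x_canonical, t):
        # x_canonical: (P, 3) canonical centers scaled to [-1, 1]^3; t in [0, 1].
        P = x_canonical.shape[0]
        coords = x_canonical.view(1, 1, 1, P, 3)  # layout expected by grid_sample
        feats = F.grid_sample(self.grid, coords, align_corners=True)  # (1, C, 1, 1, P)
        feats = feats.view(-1, P).t()                                 # (P, C)
        t_emb = self.time_mlp(x_canonical.new_full((P, 1), float(t)))
        out = self.mlp(torch.cat([feats, t_emb], dim=-1))
        d_pos, quat, shadow = out[:, :3], out[:, 3:7], out[:, 7:]
        return x_canonical + d_pos, F.normalize(quat, dim=-1), torch.sigmoid(shadow)

def momentum_loss(traj):
    # traj: (T, P, 3) Gaussian positions over T consecutive frames.
    # Conservation-of-momentum prior: penalize second finite differences
    # (accelerations) along each trajectory.
    accel = traj[2:] - 2.0 * traj[1:-1] + traj[:-2]
    return accel.norm(dim=-1).mean()

def isometry_loss(traj, knn_idx):
    # knn_idx: (P, k) canonical-frame nearest-neighbor indices. Penalize
    # changes in neighbor distances over time, encouraging locally rigid
    # (isometric) deformation.
    neigh = traj[:, knn_idx]                            # (T, P, k, 3)
    dists = (traj[:, :, None, :] - neigh).norm(dim=-1)  # (T, P, k)
    return (dists - dists[:1]).abs().mean()
```

In training, such regularizers would be added to the photometric rendering loss with scalar weights, with knn_idx computed once from the canonical Gaussian positions.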

Experiments

Evaluations were conducted on six photo-realistic synthetic scenes with diverse cloth dynamics. DeformGS improves 3D tracking accuracy by an average of 55.8% over existing state-of-the-art methods, with the gains most pronounced in scenes with complex shadows and textures. With sufficient texture, it achieves a median tracking error of 3.3 mm on a 1.5 m x 1.5 m cloth.
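
As a point of reference for the 3.3 mm figure, the sketch below shows one standard way a median 3D tracking error can be computed from predicted and ground-truth point trajectories. This is an assumption about the metric's form, not the paper's evaluation code; the array names are illustrative.

```python
# Hedged sketch of a standard trajectory-error computation; not taken from
# the paper's evaluation code.
import numpy as np

def median_tracking_error(pred, gt):
    # pred, gt: (T, P, 3) positions in meters of P tracked points over T frames.
    errors = np.linalg.norm(pred - gt, axis=-1)  # per-point, per-frame error
    return np.median(errors)

# Example with synthetic data: points on a 1.5 m x 1.5 m cloth, ~3 mm noise.
rng = np.random.default_rng(0)
gt = rng.random((100, 500, 3)) * 1.5
pred = gt + rng.normal(scale=0.003, size=gt.shape)
print(f"median error: {median_tracking_error(pred, gt) * 1000:.1f} mm")
```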

Implications and Applications

The advancements represented by DeformGS promise to enable new applications in augmented reality, robotics, and AI, opening the door to more interactive and immersive experiences. For instance, robots could benefit from more accurate tracking for manipulation tasks, and augmented reality systems could offer improved interaction with dynamically changing environments. Furthermore, the work contributes a novel dataset comprising six synthetic scenes that will serve as a valuable resource for continued research in this area.

The methodological contributions and empirical results suggest that DeformGS is an effective approach for handling complex dynamic scenes, setting a new benchmark for the joint tasks of 3D tracking and novel-view synthesis in highly deformable scenes.
