- The paper introduces a layered neural representation that separately models body and garment details for improved 3D reconstruction.
- The method leverages a virtual bone deformation module that enables accurate tracking of dynamic, free-form garment movements.
- Multi-layer differentiable volume rendering yields high-fidelity reconstructions, outperforming state-of-the-art baselines on challenging datasets.
ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild
The paper "ReLoo: Reconstructing Humans Dressed in Loose Garments from Monocular Video in the Wild" presents a novel methodology for 3D human reconstruction from monocular video, with a specific focus on subjects wearing loose garments. This focus distinguishes it from prior methods that predominantly target tight-fitting clothing, which is easier to model because it adheres closely to the body's contours.
Core Contributions
The key contributions of this work are three-fold:
- Layered Neural Human Representation: The authors introduce a multi-layer representation that separately models the inner body and outer clothing layers using neural implicit functions. This decomposition improves model expressiveness and allows capturing intricate details of the garments, which are often lost in single-layer models.
- Virtual Bone Deformation Module: Unlike conventional methods that rely on skeletal deformations derived from body poses, the proposed virtual bone deformation module permits free-form movement, thereby accurately tracking dynamically deforming garments. This module applies non-hierarchical deformations that are not limited by the anatomical constraints of human skeletons, enabling the recovery of complex and dynamic garment motions.
- Multi-Layer Differentiable Volume Rendering: By extending standard volume rendering techniques to handle multiple neural layers, the authors achieve high-fidelity reconstruction of both the human body and outer clothing. This rendering approach ensures temporally consistent and detailed visual outputs.
Methodological Insights
Layered Representation
The inner body and outer garments are modeled through separate networks that predict Signed Distance Fields (SDFs) and radiance values. By decomposing the clothed human figure into these two layers, the model can capture the intricate details and large deformations of loose garments, which single-layer models struggle to represent.
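To make the layered idea concrete, here is a minimal PyTorch sketch of two independent implicit layers, each mapping a canonical 3D point to an SDF value and a radiance. The class name `ImplicitLayer` and the small MLP sizes are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ImplicitLayer(nn.Module):
    """One neural layer: maps a 3D point in canonical space to an SDF value
    and an RGB radiance. Deliberately small MLP, for illustration only."""
    def __init__(self, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.Softplus(beta=100),
            nn.Linear(hidden, hidden), nn.Softplus(beta=100),
        )
        self.sdf_head = nn.Linear(hidden, 1)                               # signed distance to the layer's surface
        self.rgb_head = nn.Sequential(nn.Linear(hidden, 3), nn.Sigmoid())  # radiance in [0, 1]

    def forward(self, x):
        h = self.backbone(x)
        return self.sdf_head(h), self.rgb_head(h)

# Two independent layers: one for the inner body, one for the outer garment.
body_field = ImplicitLayer()
garment_field = ImplicitLayer()

points = torch.rand(1024, 3)                 # query points in canonical space
body_sdf, body_rgb = body_field(points)
garment_sdf, garment_rgb = garment_field(points)
```

Keeping the two fields separate is what allows the garment layer to be deformed and supervised independently of the body layer.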
Hybrid Deformation Modeling
The hybrid deformation strategy comprises:
- Skeletal Deformation: Uses Linear Blend Skinning (LBS) for inner body deformations, driven by SMPL model skeletal poses.
- Virtual Bone Deformation: Introduces a set of virtual bones for the garment layer, which follow non-hierarchical, free-form motions to accurately capture the movement of loose garments.
The virtual bones' positions are refined through a learning process that ensures their transformations accurately reflect the dynamics of the garment fabric under various motions.
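A rough sketch of the shared blending formula is given below: the same linear blend skinning operation serves both layers, but for the garment the per-bone transforms are free, non-hierarchical parameters rather than transforms derived from the SMPL skeleton. Variable names, bone counts, and the per-frame optimization of `virtual_transforms` are assumptions for illustration, not the paper's exact implementation.

```python
import torch

def blend_skinning(points, weights, transforms):
    """Linear blend skinning: warp canonical points with a weighted sum of
    per-bone rigid transforms.
    points:     (N, 3)    canonical positions
    weights:    (N, B)    skinning weights per bone (rows sum to 1)
    transforms: (B, 4, 4) rigid transform of each bone for the current frame
    """
    homog = torch.cat([points, torch.ones(points.shape[0], 1)], dim=-1)  # (N, 4) homogeneous coords
    blended = torch.einsum('nb,bij->nij', weights, transforms)           # (N, 4, 4) per-point transform
    warped = torch.einsum('nij,nj->ni', blended, homog)
    return warped[:, :3]

# Inner body: weights and transforms would come from the SMPL skeleton.
# Garment layer: same formula, but the "bones" are virtual ones whose
# transforms are optimised freely per frame with no parent-child hierarchy,
# so the garment can move independently of the anatomical skeleton.
num_points, num_virtual_bones = 2048, 16
garment_points = torch.rand(num_points, 3)
virtual_weights = torch.softmax(torch.randn(num_points, num_virtual_bones), dim=-1)
virtual_transforms = torch.eye(4).repeat(num_virtual_bones, 1, 1)  # stand-in for learned transforms
deformed = blend_skinning(garment_points, virtual_weights, virtual_transforms)
```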
Differentiable Volume Rendering
The authors apply a multi-round sampling process in which points sampled within the body and garment layers are evaluated and then combined through a sorting and weighting mechanism. This allows complex occlusion scenarios to be rendered correctly, keeps the layers coherent, and enables realistic reconstruction from monocular images.
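The following is a minimal sketch of how samples from two layers can be merged along a single ray: samples are sorted by depth and alpha-composited so that whichever layer is closer occludes the other. It assumes a generic density-based composition rather than the paper's exact SDF-to-density conversion, and all tensor shapes and names are illustrative.

```python
import torch

def render_two_layers(depths_a, sigma_a, rgb_a, depths_b, sigma_b, rgb_b):
    """Composite samples from two neural layers along one ray.
    Samples from both layers are merged, sorted by depth, and alpha-composited.
    depths_*: (S,), sigma_*: (S,), rgb_*: (S, 3)
    """
    depths = torch.cat([depths_a, depths_b])
    sigma = torch.cat([sigma_a, sigma_b])
    rgb = torch.cat([rgb_a, rgb_b], dim=0)

    order = torch.argsort(depths)                              # sort merged samples by depth
    depths, sigma, rgb = depths[order], sigma[order], rgb[order]

    deltas = torch.diff(depths, append=depths[-1:] + 1e10)     # interval lengths between samples
    alpha = 1.0 - torch.exp(-sigma * deltas)                   # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = alpha * trans                                    # contribution of each sample
    return (weights[:, None] * rgb).sum(dim=0)                 # final pixel colour

# One ray, 64 samples per layer; densities and radiances would come from each network.
da, db = torch.sort(torch.rand(64))[0], torch.sort(torch.rand(64))[0]
color = render_two_layers(da, torch.rand(64), torch.rand(64, 3),
                          db, torch.rand(64), torch.rand(64, 3))
```

Because the composition is fully differentiable, photometric losses on the rendered pixels can supervise both layers jointly.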
Experimental Validation
The efficacy of ReLoo was demonstrated on the newly introduced MonoLoose dataset as well as the existing DynaCap dataset. On sequences of humans wearing loose garments in highly dynamic scenarios, the method surpassed existing baselines (e.g., SelfRecon, Vid2Avatar, SCARF) on both 3D surface reconstruction and novel view synthesis across various metrics.
Quantitative Results
- In 3D surface reconstruction on the MonoLoose dataset, ReLoo achieved a Chamfer distance of 1.93 cm, outperforming baselines that recorded up to 3.13 cm (a generic sketch of this metric follows this list).
- For novel view synthesis, the method attained a PSNR of 29.2 and an SSIM of 0.970 on the MonoLoose dataset, clearly higher than competing approaches.
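For reference, a symmetric Chamfer distance between a reconstructed surface and a ground-truth scan can be computed as below. This is a generic sketch of the metric; the paper's exact convention (sum vs. average of the two directions, sampling density) may differ, and the point clouds here are placeholders.

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point clouds (same units, e.g. cm):
    average nearest-neighbour distance in both directions."""
    d_ab, _ = cKDTree(points_b).query(points_a)   # each point in A to its nearest in B
    d_ba, _ = cKDTree(points_a).query(points_b)   # each point in B to its nearest in A
    return 0.5 * (d_ab.mean() + d_ba.mean())

# Placeholder point clouds sampled from a reconstructed mesh and a ground-truth scan.
recon = np.random.rand(10000, 3) * 100.0   # coordinates in cm
gt = np.random.rand(10000, 3) * 100.0
print(f"Chamfer distance: {chamfer_distance(recon, gt):.2f} cm")
```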
Theoretical Implications and Future Work
Theoretically, ReLoo paves the way for more sophisticated and nuanced human modeling techniques by modeling the dynamics of outer garments independently of the body. This challenges the traditional exclusive reliance on skeletal deformation and opens new avenues for modeling other complex deformations and interactions between the body and outer layers.
Future work could build on this foundation by exploring broader applications such as virtual try-ons in e-commerce, more complex multi-layer garment reconstruction, and integrating additional sensory inputs (e.g., depth sensors) to further enhance reconstruction fidelity. Integrating unsupervised or semi-supervised learning techniques might further reduce the reliance on annotated data, extending the model's generalizability.
Practical Implications
From a practical standpoint, this method can significantly enhance applications requiring realistic human avatars, such as in virtual reality, gaming, film production, and telepresence systems. By accommodating a wider variety of clothing and motions, ReLoo addresses a critical need in the democratization of high-quality virtual human representations from easily obtainable video data.
Conclusion
In conclusion, the ReLoo methodology offers substantial improvements in handling the unique challenges posed by loose garments in 3D human reconstruction. By employing a layered neural representation, non-hierarchical virtual bone deformations, and a sophisticated volume rendering approach, it sets a new benchmark in achieving realistic, high-fidelity reconstructions from monocular video in the wild.