
Nerfies: Deformable Neural Radiance Fields (2011.12948v5)

Published 25 Nov 2020 in cs.CV and cs.GR

Abstract: We present the first method capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones. Our approach augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. We observe that these NeRF-like deformation fields are prone to local minima, and propose a coarse-to-fine optimization method for coordinate-based models that allows for more robust optimization. By adapting principles from geometry processing and physical simulation to NeRF-like models, we propose an elastic regularization of the deformation field that further improves robustness. We show that our method can turn casually captured selfie photos/videos into deformable NeRF models that allow for photorealistic renderings of the subject from arbitrary viewpoints, which we dub "nerfies." We evaluate our method by collecting time-synchronized data using a rig with two mobile phones, yielding train/validation images of the same pose at different viewpoints. We show that our method faithfully reconstructs non-rigidly deforming scenes and reproduces unseen views with high fidelity.

Citations (994)

Summary

  • The paper presents a deformable neural radiance field that warps observed points into a canonical 5D space using per-image latent codes.
  • The paper introduces elastic regularization to penalize deviations from rigid transformations, effectively mitigating local minima during optimization.
  • The paper employs a coarse-to-fine optimization strategy that progressively refines positional encodings for enhanced photorealistic 3D reconstruction of dynamic scenes.

Deformable Neural Radiance Fields: A Comprehensive Analysis

The paper "Nerfies: Deformable Neural Radiance Fields" presents a method for photorealistically reconstructing deformable scenes from casually captured photos and videos. The approach generalizes Neural Radiance Fields (NeRF) to dynamic, deformable subjects by introducing a continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. Because such deformation fields are prone to local minima during optimization, the authors propose a coarse-to-fine optimization strategy together with an elastic regularization of the deformation field.
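The warp can be pictured as a small network that maps an observed point, conditioned on a per-frame latent code, to a position in the canonical frame, where the shared NeRF is then queried. Below is a minimal NumPy sketch of that structure; the layer sizes, latent dimension, and function names are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(sizes):
    """Random weights for a tiny fully connected network (illustrative sizes)."""
    return [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_apply(params, x):
    """Forward pass with ReLU on hidden layers, linear output."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

latent_dim = 8
warp_params = mlp_init([3 + latent_dim, 32, 32, 3])  # input: 3D point + per-frame code

def warp_to_canonical(x, latent_code):
    """Deformation field T(x, code): observation frame -> canonical frame.
    Predicting an offset keeps the warp near the identity at initialization."""
    delta = mlp_apply(warp_params, np.concatenate([x, latent_code]))
    return x + delta

# One latent code per training frame, optimized jointly with the networks.
codes = rng.normal(0, 0.01, (5, latent_dim))  # 5 frames (illustrative)
x_obs = np.array([0.1, -0.2, 0.5])
x_canonical = warp_to_canonical(x_obs, codes[0])  # point to feed the canonical NeRF
```

Conditioning on a learned per-frame code is what lets a single deformation MLP represent a different warp for every observation.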

Key Contributions

  1. Deformation Field per Observation: The authors extend NeRF by optimizing a deformation field for every frame or observation. The deformation field is represented as a multi-layer perceptron (MLP), similar to the radiance field in NeRF, and is conditioned on a per-image learned latent code. This allows for adaptive deformation between observations.
  2. Elastic Regularization: To mitigate local minima and overfitting, the paper introduces an elastic energy term that penalizes deviations of the deformation from a rigid transformation. This is akin to regularization terms used in mesh fitting and geometry processing.
  3. Coarse-to-Fine Optimization: The authors propose a novel coarse-to-fine optimization scheme that progressively introduces higher-frequency positional encodings in the deformation field network. This strategy helps in initially learning smooth deformations before refining them.
  4. Robustness to Dynamic Scenes: The method allows for photorealistic renderings of dynamic scenes, which the authors demonstrate through the reconstruction of “nerfies” — dynamic 3D portraits captured with a mobile phone. This is validated against a new dataset and compared with several state-of-the-art methods.
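The elastic penalty in item 2 can be illustrated by taking the Jacobian of the warp at a point and penalizing the log of its singular values, which vanishes exactly when the local transformation is a rotation. The NumPy sketch below is a simplified rendition: a finite-difference Jacobian stands in for the automatic differentiation used in practice, and the robust loss wrapper from the paper is omitted.

```python
import numpy as np

def jacobian_fd(warp, x, eps=1e-5):
    """Finite-difference 3x3 Jacobian of a warp R^3 -> R^3 at point x."""
    J = np.zeros((3, 3))
    for j in range(3):
        dx = np.zeros(3)
        dx[j] = eps
        J[:, j] = (warp(x + dx) - warp(x - dx)) / (2 * eps)
    return J

def elastic_energy(warp, x):
    """Penalize deviation of the warp's Jacobian from a rigid map:
    sum of squared log singular values, zero iff all singular values are 1."""
    J = jacobian_fd(warp, x)
    sigma = np.linalg.svd(J, compute_uv=False)
    return float(np.sum(np.log(sigma) ** 2))

# A pure rotation incurs no penalty; a non-uniform stretch does.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
x0 = np.array([0.2, 0.1, -0.3])
e_rigid = elastic_energy(lambda x: R @ x, x0)                       # ~0
e_stretch = elastic_energy(lambda x: x * np.array([2.0, 1.0, 1.0]))  if False else \
            elastic_energy(lambda x: x * np.array([2.0, 1.0, 1.0]), x0)  # (ln 2)^2
```

Working in log-singular-value space treats stretching by a factor and compressing by the same factor symmetrically, which is the sense in which the penalty measures distance from rigidity.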
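The schedule in item 3 can be sketched as a window over the positional-encoding frequency bands: a parameter alpha ramps from 0 to the number of bands over training, smoothly turning on one band after another so the network first fits smooth, low-frequency deformations. The band count and encoding details below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def band_weights(alpha, num_bands):
    """Cosine easing window over frequency bands j = 0..num_bands-1.
    w_j is 0 while alpha <= j and ramps smoothly to 1 as alpha passes j + 1."""
    j = np.arange(num_bands)
    t = np.clip(alpha - j, 0.0, 1.0)
    return 0.5 * (1.0 - np.cos(np.pi * t))

def windowed_posenc(x, alpha, num_bands):
    """Positional encoding with the per-band window applied (coarse-to-fine).
    x: (3,) point; returns the identity concatenated with weighted sin/cos features."""
    w = band_weights(alpha, num_bands)            # (num_bands,)
    freqs = 2.0 ** np.arange(num_bands)           # (num_bands,)
    angles = np.pi * x[None, :] * freqs[:, None]  # (num_bands, 3)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # (num_bands, 6)
    return np.concatenate([x, (w[:, None] * feats).ravel()])

x = np.array([0.3, -0.1, 0.7])
early = windowed_posenc(x, alpha=1.0, num_bands=8)  # only the lowest band active
late = windowed_posenc(x, alpha=8.0, num_bands=8)   # all bands fully active
```

Because the high-frequency features are zeroed early on, gradients initially flow only through the coarse bands, which is what makes the optimization robust to the local minima noted above.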

Experimental Evaluation

The method is evaluated on a newly created dataset that captures both quasi-static and dynamic scenes. The evaluation uses a validation rig with two time-synchronized mobile phones, so that held-out views of the same pose provide ground truth for comparison. The results indicate that the proposed method outperforms several baselines on LPIPS and MS-SSIM, even though its PSNR results are modest. This suggests that LPIPS and MS-SSIM may be better suited for judging the visual quality of dynamic scene reconstructions.
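The gap between PSNR and the perceptual metrics is easy to demonstrate: PSNR compares pixels at identical coordinates, so even a visually negligible one-pixel shift of a high-frequency image can drop the score sharply. The NumPy sketch below implements PSNR and an extreme such case (LPIPS and MS-SSIM require learned features and multi-scale filtering, so they are omitted here).

```python
import numpy as np

def psnr(a, b, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images with values in [0, peak]."""
    mse = np.mean((a - b) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

# A high-frequency test image: alternating 0/1 vertical stripes.
img = np.tile(np.arange(64) % 2, (64, 1)).astype(float)
shifted = np.roll(img, 1, axis=1)  # one-pixel horizontal shift

# Every pixel now differs by 1, so MSE = 1 and PSNR = 0 dB,
# even though the two images are visually indistinguishable.
p = psnr(img, shifted)
```

Reconstructions of deforming scenes inevitably contain small misalignments of this kind, which is one plausible reason the perceptual metrics track visual quality better than PSNR here.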

Implications and Future Work

The implications of this work are both practical and theoretical. Practically, the ability to create dynamic, photorealistic 3D models using casual captures dramatically lowers the barrier to high-quality 3D modeling, making it accessible for consumer devices such as mobile phones. This could revolutionize applications in VR/AR, animation, and even online communication.

Theoretically, the introduction of deformation fields and the proposed optimization strategies extend the capabilities of NeRF-based models and offer new directions for future research. Key areas for future work include addressing the limitations related to topological changes, rapid motion, and orientation flips. Furthermore, integrating explicit modeling of static regions could enhance the robustness and accuracy of scene reconstructions, particularly for quasi-static settings.

Conclusion

This paper contributes significantly to the field by introducing a method that handles non-rigid, deformable scenes with high fidelity and realism. Through the integration of deformation fields, elastic regularization, and coarse-to-fine optimization, the authors provide a versatile and effective solution. This work lays the groundwork for further advances in dynamic scene reconstruction and opens up numerous practical applications in everyday consumer technology.

The possibilities for future improvements and extensions are vast, with potential impacts on how we interact with and create digital content. This work stands as a robust advancement in neural radiance fields, substantially enhancing their applicability to real-world, dynamic scenarios.
