- The paper presents a 4D dynamic neural scene representation that achieves high-resolution 12MP novel view synthesis of human motion.
- It employs temporal matrix-vector decomposition and low-rank spatio-temporal partitioning for efficient, scalable rendering of complex dynamic scenes.
- The research leverages the ActorsHQ dataset with 12MP recordings from 160 cameras to benchmark and advance realistic human avatar synthesis.
# High-Fidelity Neural Radiance Fields for Humans in Motion
The paper presents HumanRF, a method for building high-fidelity neural radiance fields (NeRFs) that capture dynamic human motion. It addresses a long-standing challenge in computer graphics and computer vision: photo-realistic novel view synthesis of people in motion.
HumanRF synthesizes images from unseen viewpoints using a 4D dynamic neural scene representation. Its temporal matrix-vector decomposition encodes high-resolution detail compactly and remains effective across long sequences, a departure from earlier methods that are limited to static scenes or operate at considerably lower resolutions.
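The flavor of such a decomposition can be illustrated with a small sketch. This is a heavily simplified assumption-laden toy, not the paper's implementation: dense NumPy arrays and nearest-neighbour lookups stand in for hash-encoded grids and interpolation, and the resolution `R` and channel count `C` are arbitrary. The idea shown is that a 4D feature is a sum of four products, each pairing a 3D grid over three of the coordinates with a 1D vector over the remaining one:

```python
import numpy as np

R = 16   # resolution per axis (illustrative)
C = 8    # feature channels (illustrative)

rng = np.random.default_rng(0)
# Four (3D grid, 1D vector) pairs, one per 3-subset of (x, y, z, t).
grids = [rng.normal(0.0, 0.1, size=(R, R, R, C)) for _ in range(4)]
vecs  = [rng.normal(0.0, 0.1, size=(R, C)) for _ in range(4)]

# Coordinate groupings: which three axes index the grid, which one the vector.
GROUPS = [((0, 1, 2), 3),   # g(x, y, z) * v(t)
          ((0, 1, 3), 2),   # g(x, y, t) * v(z)
          ((0, 2, 3), 1),   # g(x, z, t) * v(y)
          ((1, 2, 3), 0)]   # g(y, z, t) * v(x)

def query(p):
    """p = (x, y, z, t), each in [0, 1). Returns a C-dim feature vector."""
    idx = np.minimum((np.asarray(p) * R).astype(int), R - 1)
    feat = np.zeros(C)
    for g, v, (tri, uni) in zip(grids, vecs, GROUPS):
        i, j, k = idx[list(tri)]
        feat += g[i, j, k] * v[idx[uni]]   # elementwise product of C-dim features
    return feat

f = query((0.3, 0.7, 0.5, 0.25))
print(f.shape)  # → (8,)
```

Storing four 3D grids plus four 1D vectors grows far more slowly with sequence length than a dense 4D grid, which is what makes high resolutions tractable.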
A pivotal component of the research is the introduction of ActorsHQ, a dataset of 12MP recordings from 160 cameras covering 16 high-fidelity sequences with per-frame mesh reconstructions. It enables evaluation of novel view synthesis techniques at a level of detail previously unattainable.
One noteworthy aspect of HumanRF is its low-rank spatio-temporal decomposition, which reconstructs dynamic radiance fields efficiently from multi-view inputs. An adaptive temporal partitioning scheme further improves scalability, letting the method handle sequences of varying length without exceeding the memory constraints of modern GPUs.
The paper provides strong quantitative results demonstrating the superior performance of HumanRF over current state-of-the-art methods. The method's ability to produce temporally coherent reconstructions and accurate novel view synthesis at a 12MP resolution sets it apart from its contemporaries, which typically struggle with such large volumes of high-resolution data.
In terms of implications, HumanRF enriches the field of computer graphics by providing a powerful tool for creating realistic human avatars in motion. This has immediate applications in film production, gaming, and virtual reality. The theoretical implications extend to how dynamic radiance fields can be further optimized for even more intricate motion capture and synthesis tasks.
Looking towards future developments, this research could pave the way for integrating neural models into real-time applications, leveraging the high fidelity of neural representations for interactive digital environments. Moreover, the dataset and methodologies could be expanded for broader applications beyond human motion, potentially leading to a comprehensive framework for dynamic scene rendering.
Overall, this paper advances the state of dynamic neural rendering, taking a significant step toward production-level quality in novel view synthesis while robustly handling challenging motions and long sequences.