
ActorsHQ: High-Fidelity Multi-View Dataset

Updated 22 September 2025
  • ActorsHQ is a high-fidelity multi-view dataset featuring ~39,765 frames, 160 synchronized 12MP cameras, and dense per-frame mesh reconstructions.
  • It enables production-level dynamic human performance capture for applications in film, gaming, and telepresence with precise photometric and geometric data.
  • The dataset establishes a benchmark for novel view synthesis and dynamic reconstruction, fostering innovations in neural rendering and algorithm optimization.

ActorsHQ is a high-fidelity multi-view dataset designed for dynamic human performance capture at production-level quality for novel view synthesis, as introduced in the HumanRF framework. Comprising synchronized high-resolution video captured from a large array of cameras and dense ground-truth mesh reconstructions, ActorsHQ establishes a new standard for benchmarking neural rendering and dynamic reconstruction methods in computer vision.

1. Dataset Composition and Structure

ActorsHQ consists of approximately 39,765 frames of dynamic motion captured using 160 Ximea cameras, each recording 12-megapixel images at 25 frames per second. Eight actors (four male, four female) each perform two distinct 100-second sequences in a controlled environment. The first sequence features a choreographed set of 32 actions aimed at activating major joints and limbs (interspersed with A-poses for calibration), while the second comprises 20 everyday movements emphasizing exaggerated dynamics. Every frame is paired with a high-quality mesh reconstruction containing approximately 500,000 vertices, generated via state-of-the-art multi-view stereo (RealityCapture).

Property            Value                   Notes
Frames              ~39,765                 8 actors × 2 sequences × ~100 s × 25 fps (per camera)
Cameras             160                     Ximea, synchronized
Resolution          12 megapixels           Per camera, per frame
Actors              8 (4 male, 4 female)    2 sequences per actor
Ground-truth mesh   ~500k vertices/frame    Per-frame reconstruction (RealityCapture)

This dataset's composition—a combination of dense multi-view imagery and precise geometric meshes—provides opportunities for both photometric and geometric validation.
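The nominal frame count follows directly from the capture parameters above. A quick sanity check (the interpretation that the count is per camera, with a small shortfall from trimmed sequences, is an inference from the figures in the text, not a statement from the dataset authors):

```python
# Nominal per-camera frame count from the stated capture parameters.
ACTORS = 8
SEQUENCES_PER_ACTOR = 2
SEQUENCE_SECONDS = 100   # sequences are ~100 s each
FPS = 25                 # capture rate per camera

nominal_frames = ACTORS * SEQUENCES_PER_ACTOR * SEQUENCE_SECONDS * FPS
print(nominal_frames)  # 40000 nominal; the dataset reports ~39,765
```

The small gap between the nominal 40,000 and the reported ~39,765 presumably reflects minor trimming of individual sequences.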

2. Capture Methodology and Illumination Control

The acquisition process relies on a custom-built multi-camera rig, synchronized at 25 Hz, with each camera employing a shutter speed of 650 µs to minimize motion blur. A globally-calibrated volume (1.6 m diameter × 2.2 m height) serves as the capture space, enabling unrestricted human motion within a photometrically consistent scene. Illumination is managed using 420 programmable LEDs, timed precisely with camera shutters to ensure appearance fidelity in captured details such as skin, hair, and fabric texture.

Mesh reconstructions for each frame are generated with RealityCapture using the complete multi-view image set, which enables explicit geometric supervision during downstream learning and comparison.

3. Applications and Use Cases

The ActorsHQ dataset is tailored for research areas where extreme fidelity and geometric accuracy are required:

  • Film Production: Enables synthesis of detailed digital human avatars suitable for visual effects, including subtle appearance cues (hair strands, cloth wrinkles).
  • Computer Games: Facilitates realistic asset generation for immersive environments by capturing both geometry and photometric appearance of humans in motion.
  • Videoconferencing & Telepresence: Provides avatars with high realism for real-time or delayed visual communication.

By offering ground-truth meshes and ultra-high-res imagery, ActorsHQ serves as a benchmark for dynamic scene reconstruction and novel view synthesis, challenging models to address both temporal and spatial complexity at scale.

4. Technical Challenges and Algorithmic Solutions

Memory and Computation

Operating at 12MP across thousands of frames and 160 cameras pushes the limits of GPU memory and compute, so efficient data representation and scalable optimization are mandatory. HumanRF addresses this via an adaptive temporal partitioning algorithm that splits the sequence into segments based on spatial occupancy, using a greedy strategy that starts a new segment once the union of foreground masks expands past a threshold.
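The greedy partitioning idea can be sketched as follows. This is a minimal reimplementation of the concept described in the text, not HumanRF's actual code; the function name, the mask representation, and the specific threshold value are all illustrative assumptions.

```python
import numpy as np

def partition_sequence(occupancy_masks, expansion_threshold=1.15):
    """Greedily split a sequence into temporal segments.

    occupancy_masks: (T, H, W) boolean foreground masks, one per frame.
    A new segment begins whenever the union of masks accumulated in the
    current segment grows beyond `expansion_threshold` times the area of
    the segment's first mask, keeping per-segment occupancy compact.
    Returns a list of (start, end) index pairs covering [start, end).
    """
    segments = []
    start = 0
    union = occupancy_masks[0].copy()
    base_area = union.sum()
    for t in range(1, len(occupancy_masks)):
        union |= occupancy_masks[t]          # grow the segment's union
        if union.sum() > expansion_threshold * base_area:
            segments.append((start, t))      # close segment before frame t
            start = t
            union = occupancy_masks[t].copy()
            base_area = union.sum()
    segments.append((start, len(occupancy_masks)))
    return segments
```

A stationary actor yields one long segment; a large relocation of the foreground triggers a split, which keeps each segment's occupied volume small for the shared feature grids.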

Detail Preservation in Dynamic Content

Fast motions and topological changes (hair, cloth, limb articulation) necessitate representations capable of capturing fine details without succumbing to oversmoothing or noise. HumanRF employs a compact 4D feature grid decomposed into four 3D hash grids and four 1D dense grids:

T_{xyzt}(p_{xyzt}) = T_{xyz}(p_{xyz}) \odot T_t(p_t) + T_{xyt}(p_{xyt}) \odot T_z(p_z) + T_{xzt}(p_{xzt}) \odot T_y(p_y) + T_{yzt}(p_{yzt}) \odot T_x(p_x)

where “⊙” denotes the Hadamard product. This low-rank tensor decomposition, combined with multi-resolution hash grids, enables both shared space-time features and per-frame specificity. A shallow MLP processes density/geometry and view-dependent radiance components. Only about 5.2% of the parameters are needed compared to per-frame models such as Instant-NGP.
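The decomposition can be made concrete with a toy query. The sketch below substitutes small dense arrays for the multi-resolution hash grids and uses nearest-neighbour lookup instead of interpolation (both illustrative simplifications); the grid names mirror the symbols in the formula above.

```python
import numpy as np

F, R = 4, 8  # feature dimension, grid resolution (toy values)
rng = np.random.default_rng(0)

# Four 3D grids over triples of the (x, y, z, t) axes...
T_xyz, T_xyt, T_xzt, T_yzt = (rng.normal(size=(R, R, R, F)) for _ in range(4))
# ...each paired with a 1D grid over the complementary axis.
T_t, T_z, T_y, T_x = (rng.normal(size=(R, F)) for _ in range(4))

def query(x, y, z, t):
    """Evaluate the low-rank 4D feature: sum of four Hadamard products."""
    return (T_xyz[x, y, z] * T_t[t]
            + T_xyt[x, y, t] * T_z[z]
            + T_xzt[x, z, t] * T_y[y]
            + T_yzt[y, z, t] * T_x[x])

feat = query(1, 2, 3, 4)
print(feat.shape)  # (4,)
```

Storing four 3D grids plus four 1D grids in place of one dense 4D grid is what yields the parameter savings the text cites, while the time-indexed factors retain per-frame specificity.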

Volume rendering uses standard ray integration:

C(r, t) = \int_{near}^{far} T(s)\, L(s, d, t)\, ds

where T(s) represents transmittance and L(s, d, t) radiance.
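In practice the integral is discretized along each ray. The sketch below is the standard NeRF-style quadrature for this integral (a generic rendering convention, not HumanRF-specific code); symbols follow the equation above, with density samples standing in for the opacity that drives T(s).

```python
import numpy as np

def render_ray(sigmas, radiances, deltas):
    """Discretized volume rendering along one ray.

    sigmas:    (N,) volume densities at the ray samples
    radiances: (N, 3) emitted radiance at the samples (view/time fixed)
    deltas:    (N,) distances between consecutive samples
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                   # per-sample opacity
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])  # T(s)
    weights = trans * alphas                                  # contribution per sample
    return (weights[:, None] * radiances).sum(axis=0)         # composited color C
```

A single near-opaque sample returns its own radiance; zero density everywhere returns black, matching the behaviour of the continuous integral.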

5. Benchmark Impact and Downstream Evaluation

By providing both photometric and geometric ground-truth, ActorsHQ enables rigorous quantitative and qualitative evaluation:

  • Resolution Gap: ActorsHQ pushes from the 4MP norm to 12MP synthesis, challenging models to resolve features at a scale suited to production-level tasks.
  • Temporal Consistency: Adaptive segmentation of the time axis maintains consistent rendering over long sequences, preventing artifacts and temporal flicker.
  • Computational Efficiency: The compact grid/MLP architecture leverages dataset richness with manageable resource demands, making high fidelity synthesis tractable on modern hardware.

Recent studies, including MoAngelo (Ebbed et al., 19 Sep 2025), report superior reconstruction accuracy and smoother mesh fidelity versus prior methods (e.g., Tensor4D, GauSTAR, HumanRF)—with evaluation on ActorsHQ establishing a new benchmark for dynamic neural surface reconstruction.

6. Challenges in Dynamic Reconstruction and Evolving Methodologies

Novel view synthesis and dynamic surface reconstruction on ActorsHQ impose additional demands due to occlusions, topological changes, and motion complexity. MoAngelo addresses these by jointly optimizing a static template (SDF-based, initialized via NeuralAngelo) and per-frame deformation fields modeled in \mathfrak{so}(3). Explicit template updates from all frames enable accurate reconstruction in regions revealed over time or subject to topological change.
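Parameterizing rotations in \mathfrak{so}(3) means each local rotation is stored as a 3-vector (axis times angle) and mapped to a rotation matrix with the exponential map. A standard Rodrigues-formula sketch of that mapping (generic math, not MoAngelo's implementation):

```python
import numpy as np

def so3_exp(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula.

    w: (3,) axis-angle vector; |w| is the rotation angle, w/|w| the axis.
    """
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)                       # near-zero rotation
    k = w / theta                              # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],          # skew-symmetric cross-product matrix
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```

The 3-vector parameterization is unconstrained and singularity-free near the identity, which makes it convenient for gradient-based optimization of per-frame deformations.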

Quantitatively, MoAngelo reports a lower L1-Chamfer error than prior art (0.0052, with competing methods roughly an order of magnitude higher), and qualitatively recovers intricate details robustly across frames, as demonstrated in side-by-side visual comparisons.
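For reference, a brute-force sketch of one common "L1-Chamfer" convention between two point sets is shown below. Papers differ in the exact distance and normalization used, so this is an illustrative definition rather than the metric as computed in the MoAngelo evaluation.

```python
import numpy as np

def l1_chamfer(P, Q):
    """Symmetric Chamfer distance between point sets P (N, 3) and Q (M, 3),
    with L1 (Manhattan) point-to-point distances. O(N*M) memory: fine for
    small clouds, replace with a KD-tree for real meshes."""
    d = np.abs(P[:, None, :] - Q[None, :, :]).sum(axis=-1)  # (N, M) pairwise L1
    return d.min(axis=1).mean() + d.min(axis=0).mean()      # both directions
```

Identical clouds score 0; the metric penalizes points in either set that lack a nearby counterpart in the other.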

This suggests that the ActorsHQ dataset not only advances neural rendering benchmarks, but also catalyzes innovations in dynamic scene reconstruction algorithms through its rich and unique structure.

7. Significance and Future Directions

ActorsHQ identifies several future research avenues:

  • How to further optimize neural representations for memory and computation, enabling even higher-fidelity synthesis at scale.
  • How to address remaining challenges posed by the dataset’s dynamic sequences, such as accurate tracking of severe topological changes and fine-grained appearance modeling under fast motion and complex lighting.
  • Integration of end-to-end deep learning approaches and advanced multi-task architectures to jointly process geometry, appearance, and physical dynamics.
  • Expansion to broader actor categories, complex interactions, and real-world scene complexity.

As a high-resolution, ground-truth-rich benchmark, ActorsHQ is poised to remain central to research at the intersection of computer vision, graphics, and immersive human-centered applications.
