Identifying conditioning strategies that best preserve hand fidelity, realism, and temporal coherence
Determine which conditioning strategies best preserve hand fidelity, realism, and temporal coherence when video diffusion models are conditioned on tracked joint-level hand poses.
References
Furthermore, it is unclear which conditioning strategies best preserve hand fidelity, realism, and temporal coherence in video generation.
— Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control
(2602.18422 - Xie et al., 20 Feb 2026) in Section 1: Introduction