Cause of GEN3C’s initial-frame viewpoint transformation failure

Determine whether the observed failure of GEN3C to transform the viewpoint of the initial frame in camera-controlled video generation arises from a bias in the GEN3C image-to-video backbone toward preserving the initial source frame.

Background

In the qualitative evaluation, GEN3C, a reprojection-based approach to camera-controlled video generation, was observed to fail to transform the viewpoint of the initial frame. The authors conjecture a causal explanation: a bias in GEN3C’s image-to-video backbone toward preserving the initial source frame.

If validated, this conjecture would identify a core limitation of GEN3C’s backbone that affects its camera-control behavior. It is consistent with similar observations reported in prior work (GCD). Establishing or refuting it would clarify whether backbone bias is the primary cause and would guide targeted remedies or architectural improvements.

References

GEN3C also relies on reprojection, but fails to transform the viewpoint of the initial frame. We conjecture that this result stems from a bias in its image-to-video backbone toward preserving the initial source frame.

Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation (2512.17040 - Kim et al., 18 Dec 2025) in Section: Qualitative Results (sec:exp_qual)