Design quantitative metrics for subject-driven 3D/4D identity preservation

Develop robust, human-aligned quantitative evaluation metrics for subject-driven 3D and 4D generation that accurately assess subject fidelity across novel viewpoints. Such metrics should integrate both appearance similarity (e.g., shape, color, texture, and facial features relative to the reference views) and geometric correctness of the rendered foreground across viewpoints, and should overcome the known deficiencies of existing vision-model-based metrics such as DINO similarity when evaluating identity preservation in 3D/4D assets.

Background

The paper finds that standard metrics used in DreamBooth, such as DINO feature similarity, can produce misleading results when evaluating subject-driven 3D/4D generation. In qualitative comparisons, methods with clearly flawed geometry sometimes score higher, suggesting that these metrics fail to capture identity preservation across viewpoints.
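
For concreteness, the DreamBooth-style DINO metric in question is typically the average pairwise cosine similarity between ViT [CLS] embeddings of the reference images and the rendered views. Below is a minimal sketch of that baseline; the specific checkpoint (ViT-S/16 DINO loaded via torch.hub) and the preprocessing pipeline are assumptions, not the paper's exact evaluation code.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
# ViT-S/16 DINO backbone; its forward pass returns the [CLS] embedding.
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def dino_embed(image_paths):
    """L2-normalized DINO [CLS] embeddings for a list of image paths."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB"))
                         for p in image_paths]).to(device)
    return F.normalize(dino(batch), dim=-1)

@torch.no_grad()
def dino_similarity(reference_paths, rendered_paths):
    """Average pairwise cosine similarity between reference and rendered views."""
    ref, gen = dino_embed(reference_paths), dino_embed(rendered_paths)
    return (ref @ gen.T).mean().item()
```

A score like this rewards globally similar appearance but carries no explicit notion of 3D structure, which is consistent with the failure mode described above: a render with broken geometry can still score well if its overall appearance statistics match the reference.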

The authors propose that better metrics should combine geometric assessment of the rendered foreground with appearance-based similarity to the source subject, reflecting the multi-view and structural nature of 3D/4D assets.
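
One hypothetical instantiation of such a combined metric is a weighted blend of an appearance term (the DINO similarity from the sketch above) and a per-viewpoint geometric term, for example silhouette IoU between rendered foreground masks and viewpoint-matched reference silhouettes. The weight alpha, the choice of silhouette IoU as the geometry proxy, and the assumption that matched masks are available are all illustrative; the paper deliberately leaves the metric design open.

```python
import numpy as np

def mask_iou(pred_mask: np.ndarray, ref_mask: np.ndarray) -> float:
    """IoU between two boolean foreground masks of equal resolution."""
    pred, ref = pred_mask.astype(bool), ref_mask.astype(bool)
    union = np.logical_or(pred, ref).sum()
    return float(np.logical_and(pred, ref).sum() / union) if union > 0 else 1.0

def identity_preservation_score(reference_paths, rendered_paths,
                                rendered_masks, reference_masks, alpha=0.5):
    """Blend appearance similarity with viewpoint-wise geometric agreement.

    rendered_masks / reference_masks: lists of boolean arrays giving the
    foreground silhouette at viewpoint-matched poses (assumed to come from
    the renderer and from segmenting the reference views, respectively).
    alpha: illustrative weight between the appearance and geometry terms.
    """
    # Appearance term: reuses dino_similarity from the previous sketch.
    appearance = dino_similarity(reference_paths, rendered_paths)
    # Geometry term: mean silhouette IoU over matched viewpoints.
    geometry = float(np.mean([mask_iou(p, r)
                              for p, r in zip(rendered_masks, reference_masks)]))
    return alpha * appearance + (1.0 - alpha) * geometry
```

Whether a linear blend, silhouette IoU, or a richer geometric signal (e.g., depth or normal consistency) best matches human judgments is exactly the open question this direction is meant to resolve.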

References

"We leave the design of proper evaluation metrics on subject-driven 3D/4D generation as future work."

Zheng et al., "Track, Inpaint, Resplat: Subject-driven 3D and 4D Generation with Progressive Texture Infilling," arXiv:2510.23605, 27 Oct 2025; Appendix Section C (Discussions and Limitations), "Quantitative metrics for 3D/4D subject-driven generation."