Evaluation protocols for models with per-image GLO embeddings

Develop a principled and standardized evaluation methodology for view synthesis models that incorporate per-image Generative Latent Optimization (GLO) embeddings to handle exposure and lighting variations, enabling fair quantitative comparison of such models on real-world datasets.

Background

In real-world captures, models often employ per-image latent vectors (e.g., GLO embeddings) to explain away exposure and lighting variation, which is crucial for achieving high-quality reconstructions. However, this complicates fair and consistent evaluation: held-out test images have no trained embedding, and naively optimizing one against the full ground-truth test image leaks appearance information from exactly the pixels being scored.
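To make the setup concrete, below is a minimal sketch of how a per-image GLO embedding typically enters a view-synthesis model. This is an illustrative PyTorch snippet, not code from RadSplat; the class name, layer sizes, and `num_train_images` argument are assumptions.

```python
import torch
import torch.nn as nn

class ColorHead(nn.Module):
    """Predicts RGB from a point feature plus a per-image GLO embedding (sketch)."""

    def __init__(self, feature_dim: int, num_train_images: int, glo_dim: int = 4):
        super().__init__()
        # One learnable latent per *training* image; held-out test images
        # have no entry here, which is exactly what complicates evaluation.
        self.glo = nn.Embedding(num_train_images, glo_dim)
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim + glo_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 3),
            nn.Sigmoid(),
        )

    def forward(self, features: torch.Tensor, image_ids: torch.Tensor) -> torch.Tensor:
        # features: (N, feature_dim); image_ids: (N,) index of each ray's source image
        z = self.glo(image_ids)  # (N, glo_dim) appearance latent, trained jointly
        return self.mlp(torch.cat([features, z], dim=-1))
```

Because the embeddings are optimized jointly with the scene representation, they can absorb any per-image appearance shift, which improves reconstructions but makes held-out metrics ill-defined without a protocol for choosing a test-time latent.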

Because of this difficulty, recent works commonly train two separate models: one with such embeddings for visualizations, and another without them for quantitative evaluation. This underscores the lack of a standardized, principled protocol for assessing models that use per-image embeddings; one candidate protocol from the literature is sketched below.
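A protocol adopted in prior work (e.g., NeRF in the Wild) avoids both training a second model and leaking ground truth: optimize the test image's latent against only the left half of the ground-truth image, then report metrics on the right half, which the latent never saw. The sketch below assumes a differentiable `render(camera, z)` callable, images in [0, 1] with shape (3, H, W), and illustrative hyperparameters; none of these names are taken from RadSplat.

```python
import torch

def evaluate_with_glo(render, camera, gt_image, glo_dim=4, steps=100, lr=1e-2):
    """Half-image protocol: fit the appearance latent on the left half of the
    ground-truth test image, then score the model on the untouched right half."""
    w = gt_image.shape[-1]
    z = torch.zeros(glo_dim, requires_grad=True)  # fresh latent for the test image
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = render(camera, z)  # (3, H, W), differentiable w.r.t. z
        # Fit the latent using only the left half of the image.
        loss = ((pred[..., : w // 2] - gt_image[..., : w // 2]) ** 2).mean()
        loss.backward()
        opt.step()
    with torch.no_grad():
        pred = render(camera, z)
        # Report PSNR only on the right half, which never influenced z
        # (assumes pixel values in [0, 1]).
        mse = ((pred[..., w // 2 :] - gt_image[..., w // 2 :]) ** 2).mean()
        psnr = -10.0 * torch.log10(mse)
    return psnr.item()
```

Since the scored pixels never contribute to the latent's optimization, the metric cannot be inflated by the embedding memorizing the test image, while the model is still evaluated under a plausible appearance for that capture.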

References

While using techniques such as GLO vectors is essential for high quality on real-world captures (see Sec. [subsec:nerf-prior] of the paper), the evaluation of such models is an open problem, such that recent methods train two separate models: one for visualizations, and one (without GLO vectors) purely for the quantitative comparison.

RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS (2403.13806 - Niemeyer et al., 20 Mar 2024) in Experiments, Metrics and Evaluation paragraph (Section 4)