Empirically verify Reliev3R’s behavior under large-scale data scaling

Determine how Reliev3R behaves when its training is scaled to substantially larger datasets, assessing whether the weakly supervised paradigm maintains or improves reconstruction and camera pose estimation performance without multi-view geometric annotations.

Background

Reliev3R is proposed as a weakly supervised training paradigm for feed-forward 3D reconstruction models that avoids reliance on multi-view geometric annotations by using pseudo monocular relative depth and sparse image correspondences. A central motivation is that removing costly SfM/MVS labels could enable scaling to larger, more diverse training data.

However, the paper’s experiments are conducted on DL3DV-10K-scale data, and the authors explicitly state that they have not empirically validated Reliev3R’s behavior at substantially larger scales. This leaves open whether the intended data-scaling advantage translates into measurable gains in performance or robustness once the training set grows well beyond that scale.

References

The major limitation lies in the absence of a large-scale data scaling analysis. Although Reliev3R is designed to reduce the reliance on SfM/MVS annotations and thereby increase the feasible training scale, we have not empirically verified its behavior under substantially larger datasets.

Reliev3R: Relieving Feed-forward Reconstruction from Multi-View Geometric Annotations  (2604.00548 - Chen et al., 1 Apr 2026) in Section 5 (Limitation)