COLMAP-Based Metrics in 3D Reconstruction

Updated 4 July 2026

COLMAP-based metrics are evaluation methods derived from COLMAP's SfM and MVS pipelines that assess sparse geometric consistency, pose accuracy, and dense support.
They combine native reprojection errors, explicit pose metrics (ATE, RPE), and dense image-space scores (PSNR, SSIM, LPIPS) into unified evaluation regimes.
They serve as both intrinsic optimization objectives and reliable failure indicators, enabling ground-truth-free evaluation in neural rendering and multi-modal settings.

COLMAP-based metrics denote a family of evaluation criteria and optimization objectives that are inherited from COLMAP’s structure-from-motion and multi-view stereo pipeline, defined on top of COLMAP outputs, or interpreted through COLMAP-centered benchmarking conventions. In recent work, the term encompasses sparse geometric residuals such as the reprojection error minimized in bundle adjustment; pose metrics such as Absolute Trajectory Error and Relative Pose Error when COLMAP trajectories or COLMAP-derived poses are used as references; dense consistency scores derived from COLMAP registration and depth outputs; and extensions that augment COLMAP with LiDAR constraints or refractive camera models (Paul et al., 18 May 2026, Xiong et al., 5 Mar 2026, Bai et al., 2023, She et al., 2024). In neural rendering, PSNR, SSIM, and LPIPS are mathematically independent of COLMAP, yet they become COLMAP-conditioned whenever camera poses originate from COLMAP or results are compared against COLMAP-based baselines (Xiong et al., 5 Mar 2026, Wei et al., 18 Jul 2025).

1. Scope of the term

Recent papers do not use “COLMAP-based metrics” to denote a single scalar. Rather, the phrase is attached to several evaluation regimes. In GloSplat, “COLMAP-based” baselines are methods that use COLMAP’s standard incremental SfM initialization—SIFT features, exhaustive matching, view-graph construction, triangulation, and COLMAP’s bundle adjustment—and then keep poses frozen during downstream 3D Gaussian Splatting training (Xiong et al., 5 Mar 2026). In PCR-GS, COLMAP also appears as a reference trajectory provider: on Free-Dataset, the reported “ground-truth poses” are obtained from COLMAP, and pose metrics are computed after Procrustes alignment (Wei et al., 18 Jul 2025). In a different usage, “COLMAP-based metrics” in multiview 3D consistency refer to scene-level scores built directly from COLMAP registration success, dense support, and reconstruction failure signals (Paul et al., 18 May 2026).

Usage	Signal source	Representative quantities
Native SfM/MVS objective	feature tracks, reprojection, dense depth	reprojection error, BA loss, registration rate, GPC, ICM, W-GPC
COLMAP as reference or baseline	COLMAP poses or trajectories	ATE, RPE, rotation error
COLMAP-conditioned downstream evaluation	COLMAP-initialized neural reconstruction	PSNR, SSIM, LPIPS

This suggests that the expression is best understood as a family resemblance rather than a strict taxonomy. The common denominator is that COLMAP supplies either the geometry, the initialization, the failure mode, or the reference frame against which quality is judged.

2. Native sparse-geometry objectives

At the core of COLMAP-style evaluation lies the reprojection objective over feature tracks. GloSplat writes the sparse SfM bundle-adjustment loss explicitly as a robustified sum of reprojection errors over track observations,

$\mathcal{L}_\text{BA}^{\text{SfM}} = \sum_k \sum_{(i,p)\in\mathcal{T}_k} \rho_{\text{Huber}} \left( \left\|\pi_i(\mathbf{X}_k)-\mathbf{x}_{i,p}\right\|^2 \right),$

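with projection $\pi_i(\mathbf{X})=\mathrm{proj}(\mathbf{K}_i(\mathbf{R}_i\mathbf{X}+\mathbf{t}_i))$, where $\mathbf{K}_i$, $\mathbf{R}_i$, and $\mathbf{t}_i$ are the intrinsics, rotation, and translation of camera $i$, $\mathbf{X}_k$ is the 3D point of track $\mathcal{T}_k$, and $\mathbf{x}_{i,p}$ is its observed keypoint in image $i$. The same paper also makes explicit a global rotation averaging loss with Geman–McClure robustification and a point-to-ray translation/point-positioning objective. These are described as classic SfM metrics in the COLMAP sense, even when implemented in a different engine (Xiong et al., 5 Mar 2026).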
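The loss above can be illustrated with a minimal pure-Python sketch. It is not GloSplat's or COLMAP's implementation: the function names, the data layout (dictionaries of cameras and points, a list of observations), and the choice of a Ceres-style Huber function applied to the squared residual are assumptions made for illustration.

```python
import math

def project(K, R, t, X):
    """Pinhole projection proj(K (R X + t)); K and R are 3x3 nested lists."""
    Xc = [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]
    u = [sum(K[r][c] * Xc[c] for c in range(3)) for r in range(3)]
    return (u[0] / u[2], u[1] / u[2])

def huber(s, delta=1.0):
    """Huber robustifier on a squared residual s (assumed Ceres-style form)."""
    return s if s <= delta ** 2 else 2.0 * delta * math.sqrt(s) - delta ** 2

def ba_loss(cameras, points, observations, delta=1.0):
    """Sum of robustified squared reprojection errors over all observations.

    cameras: {cam_id: (K, R, t)}, points: {point_id: [x, y, z]},
    observations: iterable of (cam_id, point_id, (u_obs, v_obs)).
    """
    total = 0.0
    for cam_id, pt_id, (u_obs, v_obs) in observations:
        K, R, t = cameras[cam_id]
        u, v = project(K, R, t, points[pt_id])
        total += huber((u - u_obs) ** 2 + (v - v_obs) ** 2, delta)
    return total

# Hypothetical toy scene: one camera at the origin, one point on the optical axis.
K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = {0: (K, I3, [0.0, 0.0, 0.0])}
points = {0: [0.0, 0.0, 2.0]}
print(ba_loss(cameras, points, [(0, 0, (53.0, 54.0))]))
```

A full bundle adjuster would minimize this quantity over poses, intrinsics, and points; the sketch only evaluates it for fixed parameters.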

Colmap-PCD preserves this COLMAP-style reprojection term and augments it with a LiDAR point-to-plane distance term,

$\mathbf{E}_d=\sum w \times \|\mathbf{X}_k\cdot n_q-n_q\cdot l_q\|,$

where $(l_q,n_q)$ is an associated LiDAR point and plane normal and $w$ weights each association. The combined optimization minimizes the visual reprojection error together with the geometric disagreement with a fixed LiDAR map, thereby using COLMAP’s residual structure as the visual component of a metrically constrained registration problem (Bai et al., 2023).
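As a minimal sketch of the point-to-plane term, assuming unit-length normals and a single shared weight, the residual for one association is the absolute value of $n_q\cdot(\mathbf{X}_k-l_q)$. The function names and the tuple layout are hypothetical and not taken from Colmap-PCD.

```python
def point_to_plane(X, l, n):
    """Distance |n . X - n . l| from 3D point X to the plane through l with unit normal n."""
    return abs(sum(ni * (xi - li) for ni, xi, li in zip(n, X, l)))

def lidar_term(associations, weight=1.0):
    """Weighted sum of point-to-plane distances over (X, l, n) associations."""
    return sum(weight * point_to_plane(X, l, n) for X, l, n in associations)

# Hypothetical example: a point 2 units above a horizontal plane through (0, 0, 1).
print(lidar_term([((1.0, 2.0, 3.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))]))
```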

Refractive COLMAP retains the same bundle-adjustment form—sum of robustified squared reprojection errors—but replaces the standard pinhole projection by a refractive projection implemented through per-feature virtual cameras. The optimized state includes the usual camera intrinsics, camera poses, and 3D points, and can additionally refine refractive parameters such as interface normal, camera–interface distance, or dome decentering. The metric remains COLMAP-like in solver structure and residual definition, but its projection model is physically refractive rather than central-perspective (She et al., 2024).

A recurrent implication is that reprojection error is not merely an evaluation scalar; it is the organizing principle of the optimization itself. Later papers question whether this principle is sufficient as a terminal criterion, but they continue to use it as a geometric anchor.

3. Pose and image-space evaluation protocols

A second major layer of COLMAP-based evaluation consists of pose metrics and image-based rendering metrics. In GloSplat, the dominant reporting protocol uses PSNR, SSIM, and LPIPS on held-out views. The paper states that these metrics do not depend on COLMAP directly, but COLMAP enters indirectly as the source of camera poses for some baselines: better COLMAP poses lead to better 3DGS training and therefore better PSNR, SSIM, and LPIPS (Xiong et al., 5 Mar 2026). PCR-GS adopts the same image-space metrics for novel-view synthesis and combines them with explicit pose metrics, namely Relative Pose Error in translation $RPE_t$, Relative Pose Error in rotation $RPE_r$, and Absolute Trajectory Error, computed after Procrustes analysis aligns the estimated and reference trajectories in a common coordinate frame (Wei et al., 18 Jul 2025).

The image-space component follows standard definitions. For PSNR,

$\text{PSNR}=10\log_{10}\frac{MAX^2}{\mathrm{MSE}(\hat{\mathbf{I}},\mathbf{I})},$

where $\hat{\mathbf{I}}$ is the rendered image, $\mathbf{I}$ is the ground-truth image, and $MAX$ is the maximum possible pixel value. GloSplat further uses a photometric training loss

$\mathcal{L}_\text{photo} = (1-\lambda_\text{SSIM})\|\hat{\mathbf{I}}-\mathbf{I}\|_1 + \lambda_\text{SSIM}\bigl(1-\mathrm{SSIM}(\hat{\mathbf{I}},\mathbf{I})\bigr), \quad \lambda_\text{SSIM}=0.2,$

and combines it with a persistent reprojection loss during joint pose–appearance optimization (Xiong et al., 5 Mar 2026).
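The PSNR definition and the weighting in the photometric loss can be sketched in a few lines of pure Python. This is an illustrative sketch, not GloSplat's code: images are assumed to be flat sequences of intensities in $[0, 1]$, and the L1 term and the SSIM value are assumed to be computed elsewhere and passed in as numbers.

```python
import math

def mse(rendered, reference):
    """Mean squared error between two equally sized flat pixel sequences."""
    diffs = [(a - b) ** 2 for a, b in zip(rendered, reference)]
    return sum(diffs) / len(diffs)

def psnr(rendered, reference, max_val=1.0):
    """PSNR in dB; max_val is the maximum possible pixel value."""
    err = mse(rendered, reference)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)

def photometric_loss(l1_term, ssim_value, lambda_ssim=0.2):
    """(1 - lambda) * L1 + lambda * (1 - SSIM), with both terms precomputed."""
    return (1.0 - lambda_ssim) * l1_term + lambda_ssim * (1.0 - ssim_value)

# Hypothetical two-pixel example.
print(psnr([0.5, 0.5], [0.4, 0.6]))
print(photometric_loss(l1_term=0.1, ssim_value=0.9))
```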

The pose component is dataset-dependent. On ScanNet, GloSplat reports rotation error in degrees and ATE in meters against dataset ground truth rather than against COLMAP; here COLMAP is a baseline to be beaten, not a source of truth (Xiong et al., 5 Mar 2026). On Free-Dataset in PCR-GS, COLMAP trajectories are explicitly used as ground truth for ATE and RPE computation (Wei et al., 18 Jul 2025). This dual use is a source of ambiguity in the literature: COLMAP reconstructions may function either as strong baselines or as pseudo-ground-truth references, depending on the benchmark.
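A minimal sketch of ATE after alignment is given below, assuming NumPy is available and that both trajectories are given as matched $(N, 3)$ arrays of camera positions. The alignment uses the closed-form Umeyama similarity solution, which is one common way to carry out the Procrustes step; whether the cited papers use exactly this variant (for example with or without scale) is not stated in the text, and the function names are hypothetical.

```python
import numpy as np

def align_similarity(est, ref):
    """Least-squares similarity transform (scale s, rotation R, translation t)
    mapping est onto ref, following Umeyama's closed-form solution."""
    mu_e, mu_r = est.mean(axis=0), ref.mean(axis=0)
    E, F = est - mu_e, ref - mu_r
    cov = F.T @ E / len(est)              # cross-covariance of ref and est
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                    # avoid a reflection
    R = U @ D @ Vt
    var_e = (E ** 2).sum() / len(est)     # variance of the estimated positions
    s = np.trace(np.diag(S) @ D) / var_e
    t = mu_r - s * R @ mu_e
    return s, R, t

def ate_rmse(est, ref):
    """Root-mean-square position error after similarity alignment."""
    s, R, t = align_similarity(est, ref)
    aligned = (s * (R @ est.T)).T + t
    return float(np.sqrt(((aligned - ref) ** 2).sum(axis=1).mean()))

# Hypothetical check: a rotated, scaled, shifted copy of the reference aligns almost exactly.
ref = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
est = 2.0 * ref @ Rz.T + np.array([1.0, 2.0, 3.0])
print(ate_rmse(est, ref))
```

When the reference trajectory itself comes from COLMAP, as on Free-Dataset, the resulting number measures agreement with COLMAP rather than error against an independent ground truth.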

A common misconception is that these image metrics are somehow “non-geometric.” The cited work instead treats them as downstream consequences of pose quality. GloSplat’s ablations state that holding densification fixed while changing only pose quality can produce large PSNR gaps, and PCR-GS similarly frames improvements in PSNR, SSIM, and LPIPS as a consequence of better pose regularization (Xiong et al., 5 Mar 2026, Wei et al., 18 Jul 2025).

4. Failure-aware metrics from COLMAP outputs

A distinct development appears in work that treats COLMAP not as an initializer but as a verifier of whether a view set supports any coherent static 3D scene at all. “Can These Views Be One Scene?” introduces COLMAP-based metrics that convert registration success, dense support, and reconstruction failure into multiview 3D consistency scores (Paul et al., 18 May 2026).

The paper runs standard COLMAP SfM and MVS on an image set and defines three nested subsets: the attempted images; the images successfully registered in SfM; and the registered images with valid dense depth maps from MVS. It then reports the registration rate

$\text{registration rate} = \dfrac{\#\,\text{registered images}}{\#\,\text{attempted images}}.$

For each densified view, the valid pixels are collected, and a per-pixel quality score measures the normalized agreement between COLMAP’s geometric-consistency depth map and its photometric depth hypothesis, up to a tolerance. On top of this, the paper defines

[two display equations not recoverable from the source]

Two integrated variants summarize reliable dense support. The first is

[equation not recoverable from the source]

and the second keeps the same numerator but uses all attempted images in the denominator, thereby explicitly penalizing views that fail SfM or MVS. To address narrow-viewpoint reconstructions, the paper further defines angular coverage as $360^\circ$ minus the largest azimuthal hole among registered cameras and uses it to construct coverage-weighted GPC (W-GPC),

[equation not recoverable from the source]

W-GPC is the main COLMAP-based metric reported in most tables (Paul et al., 18 May 2026).
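Two of the quantities in this section are specified well enough in words to sketch directly: the registration rate (registered images over attempted images) and the angular coverage (the full circle minus the largest azimuthal hole among registered cameras). The sketch below assumes azimuths are supplied in degrees and that the full circle is $360^\circ$. How the paper derives azimuths from camera poses is not given here, so they are taken as inputs, and the function names are hypothetical.

```python
def registration_rate(num_registered, num_attempted):
    """Fraction of attempted images that SfM registered (0.0 if none were attempted)."""
    return num_registered / num_attempted if num_attempted else 0.0

def angular_coverage(azimuths_deg):
    """360 degrees minus the largest gap between neighbouring camera azimuths."""
    if not azimuths_deg:
        return 0.0
    a = sorted(x % 360.0 for x in azimuths_deg)
    gaps = [hi - lo for lo, hi in zip(a, a[1:])]
    gaps.append(a[0] + 360.0 - a[-1])  # wrap-around gap closing the circle
    return 360.0 - max(gaps)

# Hypothetical scene: 8 of 10 images registered, cameras spanning half a circle.
print(registration_rate(8, 10))
print(angular_coverage([0.0, 90.0, 180.0]))
```

Under this definition a single registered camera has zero coverage, which is consistent with the intent of penalizing narrow-viewpoint reconstructions.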

The significance of these metrics is methodological. Unlike learned 3D-consistency scores, which may hallucinate geometry for unrelated scenes, repeated images, or Gaussian noise, COLMAP-based metrics are designed to collapse when geometry cannot be verified. The paper therefore treats failure as informative evidence rather than as missing data. On real NVS outputs and a structured human study, these metrics achieve higher correlation with human judgments than MEt3R, and on a targeted SysCON3D slice the paper reports W-GPC’s category-level Spearman correlation with the intended severity ordering (Paul et al., 18 May 2026).

The same paper also states the limitations plainly: very low texture, specular or reflective surfaces, repetitive patterns, large viewpoint changes, poor overlap, heavy blur, or compression artifacts can cause COLMAP to fail for physically coherent scenes as well. In such cases, the resulting near-zero scores are transparent but not necessarily discriminative.

5. Extensions with LiDAR constraints and refractive optics

COLMAP-based metrics have also been extended beyond pure RGB pinhole reconstruction. Colmap-PCD builds image-to-point-cloud registration around COLMAP’s SfM pipeline and supplements reprojection error with point-to-plane residuals against a pre-built LiDAR map. The paper treats point-to-plane distance as the primary alignment metric, tracks the number of associated LiDAR planes per bundle-adjustment stage, and reports convergence of the initial camera pose in the LiDAR frame. Mean reprojection error remains the diagnostic for visual consistency, while point-to-plane distance measures metric alignment and drift suppression (Bai et al., 2023).

This produces a characteristic joint interpretation. Low reprojection error with high point-to-plane distance indicates that visual geometry is internally consistent but poorly aligned or mis-scaled relative to LiDAR. Low values for both indicate that visual and external geometric consistency agree. The paper’s figures use this dual metric regime to compare original COLMAP against Colmap-PCD and to show that LiDAR constraints correct scale and alignment without degrading visual residuals (Bai et al., 2023).

Refractive COLMAP generalizes the same logic to underwater and non-central imaging. It integrates refractive camera models for flat and dome ports into COLMAP, uses generalized P3P for absolute pose, an approximate perspective model plus 5-point initialization and refractive refinement for relative pose, and performs bundle adjustment on virtual image planes. The reported metrics remain recognizably COLMAP-style: reprojection error in pixels, rotation error, translation error, 3D model error, inlier ratio in RANSAC, number of registered images, and calibration accuracy for refractive parameters (She et al., 2024).

The most important conclusion in that work is that reprojection error alone becomes misleading under refraction. The paper gives cases in which the baseline UWPinhole model attains low reprojection error yet yields large pose and 3D errors, including curved seafloor reconstructions in large-scale AUV scenarios. By contrast, refractive COLMAP yields low reprojection error together with low pose and model error. In a small, weakly refractive water-tank setup, however, UWPinhole and refractive COLMAP perform similarly, indicating that standard COLMAP metrics remain meaningful when refraction is negligible or nearly symmetric (She et al., 2024).

Taken together, these extensions show that COLMAP-based metrics are not restricted to image-space pinhole SfM. They can be lifted into mixed-modality and physics-aware settings so long as the residual structure—projection consistency, robust correspondence geometry, and bundle-adjustment reasoning—remains intact.

6. Ground-truth-free formulations and broader implications

A different line of work addresses the case in which COLMAP-style reconstructions must be evaluated without any external ground truth. The Dense Map Posterior metric is designed precisely for this regime. It considers a reconstruction consisting of a dense model and an estimated trajectory, together with geometric observations. Under conditional independence assumptions and a Gaussian sensor model, the negative log-likelihood of the observations given the map reduces, up to constants, to a sum of covariance-weighted squared residuals between each observation and the predicted measurement under the sensor model. The paper explicitly calls this quantity the Dense Map Posterior, with lower values indicating a more probable or better reconstruction (Zhang et al., 2021).
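The general shape of such a Gaussian negative log-likelihood can be written out as a sketch. The notation is chosen here for illustration and is not taken from the paper: $z_i$ denotes the $i$-th geometric observation, $\hat{z}_i$ its prediction from the dense model and the estimated trajectory, and $\sigma$ an assumed isotropic noise level shared by all observations. With independent observations,

$-\log p(z_1,\dots,z_N \mid \text{map}, \text{trajectory}) = \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left\|z_i-\hat{z}_i\right\|^2 + \text{const}.$

Lower values of the sum mean the observations are better explained by the reconstruction, which matches the reading of lower DMP as better.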

Although DMP is demonstrated on RGB-D and LiDAR SLAM rather than image-only COLMAP, the paper frames it as a way to evaluate “systems like COLMAP” when no ground-truth poses or geometry exist. Empirically, DMP is benchmarked against ground-truth-based metrics and is reported to give almost the same ranking of reconstructions as ATE, trajectory RMSE, and Surface Mean Distance on the datasets considered (Zhang et al., 2021). The same paper notes that DMP is explicitly defined for dense maps and requires geometry-related sensor readings and a sensor model; it is therefore not plug-and-play for pure RGB collections without adaptation.

These developments have changed how COLMAP-based metrics are interpreted. GloSplat argues that sparse reprojection error alone is not a sufficient final metric for modern neural rendering, because poses with good bundle-adjustment residuals can still be suboptimal for PSNR, SSIM, and LPIPS; its solution is to keep a COLMAP-style reprojection loss active during 3DGS training as a persistent geometric regularizer (Xiong et al., 5 Mar 2026). The multiview-consistency literature highlights nearly the opposite failure mode: if learned backbones are allowed to hallucinate geometry, then classical COLMAP failure signals (registration collapse, missing dense support, narrow angular coverage) become valuable as negative evidence rather than as nuisances (Paul et al., 18 May 2026).

A plausible implication is that contemporary usage no longer treats any single COLMAP residual as exhaustive. Instead, COLMAP-based metrics are increasingly assembled into composite regimes that combine sparse geometric consistency, dense support, pose accuracy, downstream rendering quality, external geometric constraints, and explicit failure handling. Within that broader regime, COLMAP remains less a single software dependency than a reference geometry for how reconstruction quality is formalized and tested.