Depth Normal Consistency in 3D Reconstruction
- Depth Normal Consistency (DNC) is a technique that measures the angular difference between sensor-derived and predicted normals to filter out unreliable geometric cues.
- It is applied in adaptive Gaussian splatting pipelines to improve mesh accuracy and photorealism by selectively trusting reliable depth and normal guidance.
- Reported ablations show that DNC substantially improves mesh F-score and rendering quality, making it an important component for robust indoor 3D reconstruction.
Depth Normal Consistency (DNC) is a regularization and filtering strategy employed during Gaussian Splatting-based 3D reconstruction to robustly integrate geometric priors, especially when combining noisy or low-resolution sensor depth with data-driven or monocular normal estimates. DNC measures the agreement between depth- and normal-derived surface orientation at each pixel, and adaptively filters unreliable geometric supervision in regions where these modalities disagree. The result is a more reliable geometric alignment, improved mesh accuracy, and better photorealistic rendering in challenging scenarios such as smartphone-based indoor reconstruction.
1. Concept and Definition
Depth Normal Consistency (DNC) quantifies the alignment between the local surface normal computed from depth data, $\mathbf{n}_d(p)$, and an external normal estimate, $\mathbf{n}_m(p)$ (typically from a monocular normal predictor). The consistency at pixel $p$ is measured by

$$\theta(p) = \arccos\big(\mathbf{n}_d(p) \cdot \mathbf{n}_m(p)\big),$$

where $\theta(p)$ is the angular deviation, in radians or degrees, between the two unit normals. High values of $\theta(p)$ indicate disagreement and thus potential unreliability in the raw depth or in the normal estimate at that location. During training, this metric decides where depth supervision is trusted and where it is disregarded, thereby controlling the influence of external priors on the optimization (Ren et al., 2024).
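The angular deviation reduces to a few per-pixel array operations. The following is a minimal NumPy sketch, not the AGS-Mesh implementation: the function name `angular_deviation` is hypothetical, both normal maps are assumed to be `(H, W, 3)` arrays in the same camera frame with the same orientation convention, and inputs are renormalized in case predicted normals are not exactly unit length.

```python
import numpy as np


def angular_deviation(n_a: np.ndarray, n_b: np.ndarray, degrees: bool = True) -> np.ndarray:
    """Per-pixel angle between two (H, W, 3) normal maps (hypothetical helper)."""
    # Renormalize defensively: predicted normals are not always exactly unit length.
    a = n_a / np.clip(np.linalg.norm(n_a, axis=-1, keepdims=True), 1e-8, None)
    b = n_b / np.clip(np.linalg.norm(n_b, axis=-1, keepdims=True), 1e-8, None)
    # Clip the dot product so floating-point error never pushes arccos outside [-1, 1].
    cos_theta = np.clip(np.sum(a * b, axis=-1), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return np.degrees(theta) if degrees else theta


# Usage: pixel 0 agrees perfectly, pixel 1 is off by a right angle.
n_depth = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
n_mono = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]])
theta_deg = angular_deviation(n_depth, n_mono)
```

Clipping the cosine matters in practice: two nearly identical unit vectors can produce a dot product like `1.0000000002`, for which `arccos` would return NaN.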
2. Role in Adaptive Gaussian Splatting Pipelines
In the AGS-Mesh framework, DNC is integral to the training phase of Gaussian Splatting models intended for mesh extraction and novel view synthesis:
- Sensor depth from smartphone LiDAR or similar sources is noisy and not always geometrically consistent with other cues.
- Monocular normals predicted by pretrained networks (e.g., Omnidata) provide high-resolution but possibly biased normal estimates.
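DNC operates by first computing the depth-derived normal $\mathbf{n}_d(p)$ at each pixel, using local plane fitting or K-NN covariance on the raw sensor depth image. Its agreement with the external normal $\mathbf{n}_m(p)$ is then measured by the angular deviation $\theta(p)$ defined above. If $\theta(p)$ exceeds a threshold $\tau_d$, the sensor depth at that pixel is considered unreliable and suppressed:

$$\tilde{D}_s(p) = \begin{cases} D_s(p), & \theta(p) \le \tau_d, \\ 0, & \theta(p) > \tau_d, \end{cases}$$

where $D_s$ is the original sensor depth, $\tilde{D}_s$ is the filtered version, and a value of 0 marks a pixel excluded from depth supervision (Ren et al., 2024).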
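A sketch of the depth filter follows. It is illustrative and rests on several assumptions: a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`; depth-derived normals obtained from finite-difference cross products of back-projected points, a simpler stand-in for the plane fitting or K-NN covariance mentioned above; normals oriented toward a camera looking down +z; and a caller-chosen threshold `tau_deg` (the 10° in the usage lines is arbitrary, not a value from the paper). Rejected pixels get depth 0, meaning "no supervision".

```python
import numpy as np


def normals_from_depth(depth, fx, fy, cx, cy):
    """Depth-derived normals via finite differences (stand-in for plane fitting)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64), np.arange(h, dtype=np.float64))
    # Back-project every pixel to a camera-space 3D point (pinhole model, assumed).
    pts = np.stack([(u - cx) / fx * depth, (v - cy) / fy * depth, depth], axis=-1)
    # Tangent vectors along image columns and rows; np.gradient uses central
    # differences in the interior and one-sided differences at the borders.
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    n = np.cross(du, dv)
    n = n / np.clip(np.linalg.norm(n, axis=-1, keepdims=True), 1e-8, None)
    # Assumed convention: normals point back toward a camera that looks down +z.
    n[n[..., 2] > 0] *= -1.0
    return n


def dnc_filter_depth(depth, mono_normals, fx, fy, cx, cy, tau_deg):
    """Zero out sensor depth where depth-derived and monocular normals disagree."""
    n_d = normals_from_depth(depth, fx, fy, cx, cy)
    n_m = mono_normals / np.clip(np.linalg.norm(mono_normals, axis=-1, keepdims=True), 1e-8, None)
    cos_theta = np.clip(np.sum(n_d * n_m, axis=-1), -1.0, 1.0)
    theta_deg = np.degrees(np.arccos(cos_theta))
    keep = (theta_deg <= tau_deg) & (depth > 0)
    # Depth 0 marks pixels that receive no depth supervision.
    return np.where(keep, depth, 0.0), keep


# Usage: a fronto-parallel wall 2 m away; one monocular normal is deliberately wrong.
depth = np.full((4, 4), 2.0)
mono = np.tile(np.array([0.0, 0.0, -1.0]), (4, 4, 1))
mono[0, 0] = [1.0, 0.0, 0.0]
filtered, keep = dnc_filter_depth(depth, mono, fx=100.0, fy=100.0, cx=1.5, cy=1.5, tau_deg=10.0)
```

Finite differences are noisier than a least-squares plane fit on LiDAR-grade depth, which is one reason a neighbourhood-based estimator may be preferred in practice.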
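During optimization, the loss used to supervise the model geometry is then

$$\mathcal{L}_d = \begin{cases} \sum_{p:\,D_s(p)>0} \big|\hat{D}(p) - D_s(p)\big|, & t < T, \\ \sum_{p:\,\tilde{D}_s(p)>0} \big|\hat{D}(p) - \tilde{D}_s(p)\big|, & t \ge T, \end{cases}$$

where $\hat{D}$ is the model's rendered depth, $t$ is the current training iteration, and $T$ is a transition iteration.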
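One way to realize an iteration-dependent depth term is sketched below. It assumes that supervision uses raw sensor depth before the transition iteration and DNC-filtered depth afterwards, that a depth of 0 marks a pixel without a target, and that the L1 error is averaged over supervised pixels (a normalization choice of this sketch). The function and argument names are hypothetical.

```python
import numpy as np


def depth_loss(rendered, sensor, filtered, step, transition_step):
    """Iteration-dependent L1 depth loss (hypothetical schedule, see lead-in)."""
    # Before the transition iteration use raw sensor depth, afterwards the DNC-filtered map.
    target = sensor if step < transition_step else filtered
    valid = target > 0  # depth 0 = no target at this pixel
    if not np.any(valid):
        return 0.0
    return float(np.mean(np.abs(rendered[valid] - target[valid])))


rendered = np.array([[1.0, 2.0]])
sensor = np.array([[1.5, 2.0]])    # the first pixel disagrees with the normal prior...
filtered = np.array([[0.0, 2.0]])  # ...so DNC removed it from the filtered map
early = depth_loss(rendered, sensor, filtered, step=0, transition_step=10)
late = depth_loss(rendered, sensor, filtered, step=10, transition_step=10)
```

Here `early` includes the inconsistent pixel and `late` ignores it, so the rendered depth is no longer pulled toward the suspect measurement.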
3. Integration with Normal Regularization and Filtering
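AGS-Mesh further extends the DNC idea to normal supervision. The consistency between the model's rendered normal, $\hat{\mathbf{n}}(p)$, and the monocular normal $\mathbf{n}_m(p)$ is computed as

$$\hat{\theta}(p) = \arccos\big(\hat{\mathbf{n}}(p) \cdot \mathbf{n}_m(p)\big).$$

Filtering is performed analogously, keeping only pixels whose deviation stays within a threshold $\tau_n$:

$$M_n(p) = \mathbb{1}\big[\hat{\theta}(p) \le \tau_n\big].$$

With this adaptive filtering, the model's normal supervision switches over iterations from using the full prior to only those pixels where consistency is high:

$$\mathcal{L}_n = \begin{cases} \sum_p \big\lVert \hat{\mathbf{n}}(p) - \mathbf{n}_m(p) \big\rVert_1, & t < T, \\ \sum_p M_n(p)\,\big\lVert \hat{\mathbf{n}}(p) - \mathbf{n}_m(p) \big\rVert_1, & t \ge T, \end{cases}$$

where $t$ is the current training iteration and $T$ is the transition iteration.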
This dual adaptation ensures that both depth and normal priors are enforced only in plausible, reliably reconstructed regions, and that ambiguous or outlier regions do not bias the optimization (Ren et al., 2024).
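The adaptive normal term can be sketched in the same style. This is a hedged illustration, not the paper's code: the name `adaptive_normal_loss` is hypothetical, the per-pixel penalty is an L1 distance between unit normals (a cosine penalty would be an equally plausible choice), and `tau_deg` and `transition_step` are caller-supplied values with no recommended defaults.

```python
import numpy as np


def adaptive_normal_loss(rendered_n, mono_n, step, transition_step, tau_deg):
    """L1 normal loss that switches to consistency-masked supervision at transition_step."""
    r = rendered_n / np.clip(np.linalg.norm(rendered_n, axis=-1, keepdims=True), 1e-8, None)
    m = mono_n / np.clip(np.linalg.norm(mono_n, axis=-1, keepdims=True), 1e-8, None)
    if step < transition_step:
        mask = np.ones(r.shape[:-1], dtype=bool)  # trust the full monocular prior
    else:
        cos_theta = np.clip(np.sum(r * m, axis=-1), -1.0, 1.0)
        mask = np.degrees(np.arccos(cos_theta)) <= tau_deg  # keep consistent pixels only
    if not np.any(mask):
        return 0.0
    per_pixel = np.sum(np.abs(r - m), axis=-1)
    return float(np.mean(per_pixel[mask]))


rendered_n = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
mono_n = np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]])
early = adaptive_normal_loss(rendered_n, mono_n, step=0, transition_step=10, tau_deg=10.0)
late = adaptive_normal_loss(rendered_n, mono_n, step=10, transition_step=10, tau_deg=10.0)
```

Unlike the depth filter, the mask here depends on the model's own rendered normals, so it changes as training progresses; pixels where the model has converged to a different orientation than the prior stop being penalized.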
4. Optimization Objective with DNC
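The full objective function for AGS-Mesh employing DNC is

$$\mathcal{L} = \mathcal{L}_{\text{rgb}} + \lambda_d\,\mathcal{L}_d + \lambda_n\,\mathcal{L}_n,$$

where $\mathcal{L}_{\text{rgb}}$ is a photometric loss (e.g., D-SSIM), $\mathcal{L}_d$ and $\mathcal{L}_n$ are the filtered depth and normal losses, and $\lambda_d$, $\lambda_n$ are empirically chosen weights.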
This objective enables robust learning of Gaussian splatting models that are well-aligned with real surfaces, suppressing noisy or inconsistent guidance sources, and is key for producing high-fidelity geometry from uncontrolled sensor input in indoor environments (Ren et al., 2024).
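To make the weighting concrete, the sketch below evaluates a combined objective of this form on one rendered view. It is a simplification under stated assumptions: the photometric term is plain L1 (the D-SSIM part is omitted), the depth and normal targets are assumed to be already filtered upstream (depth 0 and a boolean `normal_mask` mark excluded pixels), and the weights in the usage lines are illustrative, not the paper's values. The function name is hypothetical.

```python
import numpy as np


def total_objective(rgb, rgb_gt, depth, depth_target, normals, normals_target,
                    normal_mask, lambda_d, lambda_n):
    """L = L_rgb + lambda_d * L_d + lambda_n * L_n on one rendered view (sketch)."""
    # Photometric term: plain L1 only; the D-SSIM part is omitted for brevity.
    l_rgb = float(np.mean(np.abs(rgb - rgb_gt)))
    # Depth term: pixels whose target depth is 0 were rejected by DNC and are skipped.
    valid = depth_target > 0
    l_d = float(np.mean(np.abs(depth - depth_target)[valid])) if valid.any() else 0.0
    # Normal term: per-pixel L1 between normal maps, restricted to the consistency mask.
    per_pixel = np.sum(np.abs(normals - normals_target), axis=-1)
    l_n = float(np.mean(per_pixel[normal_mask])) if normal_mask.any() else 0.0
    return l_rgb + lambda_d * l_d + lambda_n * l_n


loss = total_objective(
    rgb=np.zeros((1, 2, 3)), rgb_gt=np.full((1, 2, 3), 0.1),
    depth=np.array([[1.0, 2.0]]), depth_target=np.array([[1.5, 0.0]]),
    normals=np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]),
    normals_target=np.array([[[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]]),
    normal_mask=np.array([[True, False]]),
    lambda_d=0.2, lambda_n=0.1,  # illustrative weights, not the paper's values
)
```

Because excluded pixels simply drop out of the averages, a heavily filtered frame still contributes a well-scaled geometric term rather than one diluted by zeros.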
5. Empirical Impact and Mesh Refinement
Empirical results on the MuSHRoom and ScanNet++ datasets confirm that DNC-based adaptive filtering leads to:
- Significant improvements in mesh accuracy (F-score, Chamfer-L1, normal consistency) over both vanilla and prior-augmented pipelines.
- Superior photorealistic rendering (PSNR, SSIM) of novel views.
- Ability to leverage both low-resolution sensor-based depth and high-resolution monocular normals simultaneously, extracting their complementary strengths and mitigating individual weaknesses.
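The geometric metrics listed above are commonly computed from points sampled on the predicted and ground-truth meshes. The sketch below follows widely used definitions: precision and recall are the fractions of points within a distance threshold of the other set, the F-score is their harmonic mean, and Chamfer-L1 is the mean of the accuracy and completeness distances. The function names, the brute-force nearest-neighbour search, and the 0.05 threshold in the usage lines are illustrative assumptions; real evaluations sample many more points and use a spatial index such as a KD-tree.

```python
import numpy as np


def nearest_dist(a, b):
    """Distance from each point of a (N, 3) to its nearest neighbour in b (M, 3)."""
    # Brute force O(N * M); fine for a toy example, too slow for real meshes.
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).min(axis=1)


def mesh_metrics(pred_pts, gt_pts, threshold):
    d_pred = nearest_dist(pred_pts, gt_pts)  # accuracy: predicted -> ground truth
    d_gt = nearest_dist(gt_pts, pred_pts)    # completeness: ground truth -> predicted
    precision = float(np.mean(d_pred < threshold))
    recall = float(np.mean(d_gt < threshold))
    denom = precision + recall
    f_score = 0.0 if denom == 0 else 2.0 * precision * recall / denom
    chamfer_l1 = 0.5 * (float(d_pred.mean()) + float(d_gt.mean()))
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "chamfer_l1": chamfer_l1}


gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])  # one outlier
metrics = mesh_metrics(pred, gt, threshold=0.05)
```

The single outlier lowers precision to 2/3 while recall stays at 1, giving an F-score of 0.8; this is the kind of floating geometry that filtering unreliable depth is meant to suppress.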
An ablation on the MuSHRoom dataset demonstrates that introducing both priors and DNC increases mesh F-score from 0.6039 (no priors) to 0.9061 (with both priors and DNC), and further to 0.9157 with full adaptive filtering and multiscale meshing (Ren et al., 2024).
6. Generalization and Limitations
Depth Normal Consistency as a gating mechanism is not limited to AGS-Mesh but can generalize to other 3DGS and 2DGS pipelines requiring robust geometric supervision under multi-source uncertainty. Its main limitation is that, in regions where both priors (sensor depth and predicted normal) fail or disagree due to extreme noise or occlusion, it provides no strong supervision—thus, the final reconstruction quality still relies on the coverage and base quality of the priors.
A plausible implication is that DNC enables stable learning in real-world scenarios where geometric priors are essential but inherently noisy, such as in consumer device-based room scanning and real-time scene updating (Ren et al., 2024).