Confidence-Aware Densification & Fusion

Updated 3 March 2026

CADF adaptively weights sensor inputs by per-pixel confidence to improve densification accuracy.
It draws on variational optimization, deep convolutions, and graph-based propagation to fuse multi-modal data.
CADF is applied in SLAM, cross-sensor view synthesis, and 3D reconstruction, where it reports better quantitative results than fixed-weight and non-confidence-aware baselines.

Confidence-Aware Densification and Fusion (CADF) is a paradigm for integrating spatially or temporally disparate, often sparse or noisy, sensor measurements into dense, reliable, and geometrically consistent outputs by explicitly modeling and leveraging per-pixel or per-point confidence values. CADF addresses incompleteness and misalignment in multi-modal or multi-view data by combining data fidelity and local spatial coherence, adaptively weighting contributions according to learned or computed confidence. CADF methodologies span variational optimization, deep convolutional architectures, graph-based propagation, and multi-stage fusion pipelines, showing particular efficacy for tasks such as depth completion, cross-sensor view synthesis, visual SLAM, and dense 3D scene reconstruction.

1. Theoretical Foundations and Canonical Formulations

CADF emerged in variational data fusion as a solution to ill-posed densification, where reliability varies spatially or across sensor modalities. A canonical variational framework is the confidence-driven TGV fusion model, where for a set of input maps $\{d_k\}$ and confidence vector $c$ the energy is

$E(u, c) = \mathrm{TGV}_\alpha^\ell(u) + \sum_{i=1}^{n} \sum_{k=1}^K c_i |u_i - d_{k,i}| + \frac{1}{2} \sum_{i=1}^n w_i^{-1} c_i - b\sum_{i=1}^n \log c_i,$

with $u$ as the fused output, $c$ the spatial confidence, $\mathrm{TGV}_\alpha^\ell$ a higher-order regularization (edge- and surface-preserving), and $b, w$ hyperparameters controlling the confidence scale (Ntouskos et al., 2016). The joint estimation proceeds by biconvex alternating minimization, encouraging $u$ to interpolate among consistent observations, while $c$ down-weights outliers and ambiguous regions. The confidence prior prevents degenerate solutions and controls the spatial spread of confidence.
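
For the confidence step of the alternation, the energy as written above admits a closed form: with $u$ fixed and $r_i = \sum_k |u_i - d_{k,i}|$, setting the derivative with respect to $c_i$ to zero gives $c_i = b / (r_i + \tfrac{1}{2} w_i^{-1})$. The sketch below implements only that step, derived from the formula in this section; the update used by Ntouskos et al. may differ in detail, and the function name and list-based layout are illustrative assumptions.

```python
def update_confidence(u, d, w, b):
    """Closed-form minimizer of the energy over c with u held fixed.

    u: fused values, length n
    d: K input maps, each of length n
    w: per-pixel scale hyperparameters, length n (all > 0)
    b: weight of the log barrier (> 0)
    """
    c = []
    for i in range(len(u)):
        # Accumulated L1 residual of the fused value against all K inputs.
        r_i = sum(abs(u[i] - d_k[i]) for d_k in d)
        # Stationary point of c*r_i + 0.5*c/w_i - b*log(c).
        c.append(b / (r_i + 0.5 / w[i]))
    return c


# Pixel 0: both inputs agree with u; pixel 1: one input is an outlier.
conf = update_confidence(u=[1.0, 2.0], d=[[1.0, 2.0], [1.0, 4.0]], w=[1.0, 1.0], b=1.0)
```

In this toy case the consistent pixel receives the maximum confidence $2 b w_i$, while the pixel with an outlier is down-weighted, which is the qualitative behavior described above.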

2. Confidence Estimation Mechanisms

Core to CADF is the design of confidence measures that reflect geometric or statistical reliability. In contemporary implementations:

Multi-view geometric confidence: Quantified as the normalized count of corroborating reconstructions within a spatial threshold, e.g.,

$w_\mathrm{mv}(p) = \frac{n_i(p)}{N_{\mathrm{key}}},$

where $n_i(p)$ is the number of neighbors whose 3D reconstructions agree with the reference at pixel $p$ (Dufera et al., 21 Sep 2025).
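
A minimal sketch of this counting rule follows. It assumes the neighboring keyframes' depths have already been warped into the reference view and simplifies 3D agreement to a per-pixel depth difference below a threshold `tau`; both the threshold value and the function name are hypothetical.

```python
def multiview_confidence(ref_depth, neighbor_depths, tau=0.05):
    """Fraction of keyframes whose depth agrees with the reference.

    ref_depth: reference depths, one per pixel
    neighbor_depths: N_key depth lists, already warped to the reference view
    tau: agreement threshold in depth units (assumed value)
    """
    n_key = len(neighbor_depths)
    conf = []
    for p, d_ref in enumerate(ref_depth):
        # n_i(p): number of neighbors within tau of the reference depth.
        agree = sum(1 for nd in neighbor_depths if abs(nd[p] - d_ref) <= tau)
        conf.append(agree / n_key)
    return conf


w_mv = multiview_confidence(
    ref_depth=[1.0, 2.0],
    neighbor_depths=[[1.01, 2.5], [0.99, 2.02], [1.2, 2.0], [1.0, 3.0]],
)
```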

Monocular prior or cross-modal confidence: Often taken as the complement of multi-view, or as the output of sparse matching or regression network “uncertainty” maps, e.g., $w_\mathrm{mono}(p) = 1-w_\mathrm{mv}(p)$.

Data-driven confidence: In sparsity-aware CNNs, confidence $C^{l}$ propagates through layers by normalized convolution,

$C_{i,j}^{l} = \frac{\sum C_{i+m, j+n}^{l-1} \Gamma(W_{m,n}^{l}) + \varepsilon} {\sum \Gamma(W_{m,n}^{l})}$

where $\Gamma(\cdot)$ ensures non-negativity (Eldesokey et al., 2018).
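
The propagation rule can be illustrated in one dimension. This sketch assumes softplus as the non-negativity function $\Gamma$ and zero padding at the borders; both are illustrative choices rather than details taken from the text, and the function names are hypothetical.

```python
import math


def softplus(x):
    # One possible non-negative mapping for Gamma (assumed here).
    return math.log1p(math.exp(x))


def propagate_confidence(conf, weights, eps=1e-20):
    """One layer of 1D confidence propagation by normalized convolution.

    conf: input confidences C^{l-1}
    weights: raw (unconstrained) kernel weights W^l, odd length
    """
    gamma = [softplus(w) for w in weights]
    total = sum(gamma)
    half = len(weights) // 2
    n = len(conf)
    out = []
    for i in range(n):
        acc = 0.0
        for m, g in enumerate(gamma):
            j = i + m - half
            if 0 <= j < n:  # positions outside the signal count as zero confidence
                acc += conf[j] * g
        out.append((acc + eps) / total)
    return out


# Two valid samples at the ends of an otherwise empty signal.
c_next = propagate_confidence([1.0, 0.0, 0.0, 0.0, 1.0], weights=[0.0, 0.0, 0.0])
```

With equal kernel weights, each position next to a valid sample inherits one third of its confidence, and the center position, which has no valid sample in its window, stays near zero.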

Cross-sensor confidence: In multimodal fusion, confidence is derived from local consistency between matching points or region-wise agreement, and refined by spatial or object-aware heuristics (e.g., 3D box overlap or proximity in radar-camera fusion) (Sun et al., 2024).

3. Densification and Fusion Algorithms

CADF converts sparse, noisy, and/or multi-modal input into a dense representation by blending the data according to the computed confidence. The principal schemes are:

Per-pixel confidence-weighted fusion: As in ConfidentSplat,

$D_\mathrm{proxy}(p) = w_\mathrm{geo}(p)d_\mathrm{geo}(p) + w_\mathrm{mono}(p)d_\mathrm{mono}(p)$

utilizing linear or softmax-normalized weights (Dufera et al., 21 Sep 2025).
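
A minimal sketch of the linear variant, assuming the monocular weight is the complement of the geometric weight so that the two sum to one at every pixel (the function name and the flat-list layout are illustrative):

```python
def fuse_depth(d_geo, d_mono, w_geo):
    """Per-pixel convex combination of geometric and monocular depth.

    w_geo: geometric confidence in [0, 1]; w_mono is taken as 1 - w_geo.
    """
    fused = []
    for dg, dm, wg in zip(d_geo, d_mono, w_geo):
        wm = 1.0 - wg
        fused.append(wg * dg + wm * dm)
    return fused


# Pixel 0 trusts geometry at 75%; pixel 1 has no multi-view support.
d_proxy = fuse_depth(d_geo=[2.0, 2.0], d_mono=[4.0, 4.0], w_geo=[0.75, 0.0])
```

Where multi-view support vanishes, the proxy depth falls back entirely to the monocular prior.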

Normalized-convolution propagation: For sparse input $F$, confidence $C$, and applicability $a$,

$r[k] = \frac{\sum_i a[i]F[k-i]C[k-i]}{\sum_i a[i]C[k-i]}$

with recurrence through CNN layers for depth completion (Eldesokey et al., 2018).
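
A one-dimensional sketch of a single normalized-convolution pass is given below. It assumes a centered, odd-length applicability kernel, skips out-of-range samples, and returns zero where no confident sample falls inside the window; these conventions and the function name are assumptions for illustration.

```python
def normalized_convolution(F, C, a):
    """r[k] = sum_i a[i] F[k-i] C[k-i] / sum_i a[i] C[k-i] for a 1D signal.

    F: sparse input values (arbitrary where C is zero)
    C: confidences, zero for missing samples
    a: applicability kernel, odd length, indexed from -half to +half
    """
    half = len(a) // 2
    n = len(F)
    out = []
    for k in range(n):
        num = 0.0
        den = 0.0
        for idx, a_i in enumerate(a):
            j = k - (idx - half)
            if 0 <= j < n:
                num += a_i * F[j] * C[j]
                den += a_i * C[j]
        # No confident sample in the window: leave the output empty (zero).
        out.append(num / den if den > 0 else 0.0)
    return out


# The middle sample is missing and is filled from its two neighbors.
dense = normalized_convolution(F=[2.0, 0.0, 4.0], C=[1.0, 0.0, 1.0], a=[0.25, 0.5, 0.25])
```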

Recurrent graph propagation: Densification via DySPN with confidence blending,

$L^{t+1} = (1 - C_sC_m)\sum w_{r,a,b}L^t_{a,b} + (C_s C_m) X_m$

where $C_s$ is certainty from network features and $C_m$ is matcher-produced confidence (Wu et al., 27 Feb 2026).
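
One propagation step of this recurrence can be sketched in one dimension. In the sketch a fixed 3-tap kernel stands in for the learned affinities $w_{r,a,b}$, and the kernel is renormalized at the borders; both are simplifying assumptions, as is the function name.

```python
def propagate_step(L, X_m, C_s, C_m, weights):
    """L^{t+1} = (1 - C_s C_m) * sum(w * L^t over neighbors) + C_s C_m * X_m.

    L: current dense estimate L^t
    X_m: sparse matched measurements (arbitrary where confidence is zero)
    C_s, C_m: network certainty and matcher confidence, each in [0, 1]
    weights: fixed odd-length affinity kernel (stand-in for learned affinities)
    """
    half = len(weights) // 2
    n = len(L)
    out = []
    for p in range(n):
        acc = 0.0
        wsum = 0.0
        for j, w in enumerate(weights):
            q = p + j - half
            if 0 <= q < n:
                acc += w * L[q]
                wsum += w
        neighborhood = acc / wsum  # renormalize the kernel at the borders
        gate = C_s[p] * C_m[p]
        out.append((1.0 - gate) * neighborhood + gate * X_m[p])
    return out


# A single fully trusted measurement at pixel 0 spreads to its neighbor.
L1 = propagate_step(
    L=[1.0, 0.0, 0.0],
    X_m=[1.0, 0.0, 0.0],
    C_s=[1.0, 0.0, 0.0],
    C_m=[1.0, 0.0, 0.0],
    weights=[0.25, 0.5, 0.25],
)
```

Pixels with high joint confidence stay anchored to their measurement, while low-confidence pixels are filled by their neighborhood over repeated steps.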

Variational optimization: Alternating updates of $u$ (the fused depth/image) and $c$ (the confidence), each via convex sub-problems, typically solved with primal-dual splitting (Ntouskos et al., 2016).

Confidence-aware gated fusion: In CaFNet, feature gating is modulated by confidence at each scale,

$(F_R^i)'_{u,v} = \alpha_{u,v}^i \beta_{u,v}^i \widehat c_{u,v}^i$

before additive fusion with image features (Sun et al., 2024).

4. CADF in Large-Scale Visual SLAM and 3D Reconstruction

CADF plays a pivotal role in enabling robust visual SLAM and dense 3DGS-based scene modeling, particularly when depth cues are unreliable or only available via indirect inference:

ConfidentSplat employs CADF to fuse geometric depth (from multi-view photometry/SfM) and monocular priors (Omnidata DPT/ViT). The dense proxy depth constrains 3D Gaussian parameters in the splatting map, with per-pixel confidence modulating loss terms in joint pose, depth, and appearance optimization. This achieves SOTA reconstruction on TUM-RGBD and ScanNet despite RGB-only input (Dufera et al., 21 Sep 2025).
Unposed sparse-view 3DGS: CADF mechanisms restore missing views via bidirectional, diffusion-based pseudo-view synthesis, with confidence masks inferred from feature correspondence consistency. These are then used in a two-stage joint optimization to regularize 3DGS model parameters and improve geometric completeness, especially critical in large-scale outdoor scenes (Zhao et al., 25 Feb 2026).
Cross-sensor view synthesis: CADF enables automatic, calibration-free alignment and densification of RGB-X (thermal, NIR, SAR) data. The match-densify-consolidate pipeline uses matcher-derived confidence for DySPN-based propagation, patch-level self-matching for outlier rejection, and multi-level fusion for robust, dense cross-modality image and 3DGS representation (Wu et al., 27 Feb 2026).

5. Applications and Empirical Performance

CADF frameworks are validated in diverse domains, consistently surpassing fixed weighting and non-confidence-aware baselines. Key quantitative results include:

Study	Domain	Task	CADF result	Baseline / Δ
(Dufera et al., 21 Sep 2025) ConfidentSplat	Monocular SLAM	3DGS Map, RGB-only recon	L1 rel depth ↓, PSNR ↑, SSIM ↑	Outperforms all RGB-only baselines
(Ntouskos et al., 2016) TGV Fusion	Stereo depth fusion	Dense depth, disparity	Out-3 (SGM+VISO2) 7.3%, MC-CNN 2.46%	Single-view: 8.7%, 2.61%
(Eldesokey et al., 2018) NConv-HMS	Sparse-to-dense CNN	KITTI depth completion	MAE 0.37m, RMSE 1.29m	SparseConv, DCCS-3-layers
(Sun et al., 2024) CaFNet	Radar+camera dense depth	nuScenes test, 80m	MAE 2.11m (↓3.2%), RMSE 4.77m (↓2.7%)	RadarNet (MAE 2.18m, RMSE 4.90m)
(Wu et al., 27 Feb 2026) Cross-sensor	RGB/thermal/NIR/SAR fusion	Novel view, 3DGS synth	RGB-NIR: 21.15dB PSNR	20.39dB; –3DGS: 21.04dB
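
The ↓ percentages in the CaFNet row are consistent with relative reductions against the RadarNet baseline in the same row, as the short check below shows (the helper name is illustrative):

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` improves on `baseline` (lower is better)."""
    return 100.0 * (baseline - value) / baseline


mae_gain = round(relative_reduction(2.18, 2.11), 1)   # MAE, metres
rmse_gain = round(relative_reduction(4.90, 4.77), 1)  # RMSE, metres
print(mae_gain, rmse_gain)  # 3.2 2.7
```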

Empirical ablation studies consistently show degradations upon removing confidence modeling, multi-level thresholds, or self-matching filtering, corroborating the importance of CADF for output fidelity, artifact suppression, and robustness to outliers or domain gap.

6. Implementation Considerations and Design Patterns

Across variants, CADF is characterized by a strict adherence to per-pixel or per-point confidence propagation, typically without binarization. Confidence enters both data fidelity (weighting observed values) and spatial regularization or fusion (guiding propagation and attenuation):

Deep learning variants (e.g., NConv-HMS, CaFNet) employ normalized convolutions or gated fusions at multiple scales, with explicit propagation or inference of confidence maps as part of the end-to-end pipeline (Eldesokey et al., 2018, Sun et al., 2024).
Variational or recurrent inference leverages confidence both in closed-form weighting (e.g., the alternate convex search (ACS) update of $c$; Ntouskos et al., 2016) and in iterative graph-based propagation (e.g., DySPN; Wu et al., 27 Feb 2026).
Computational efficiency is achieved by keyframe subsampling, streaming of only active 3D primitives, and restricted window optimization (e.g., a 30-keyframe limit in SLAM; Dufera et al., 21 Sep 2025).
Hyperparameter tuning (e.g., threshold sets, update rates, regularization weights) directly impacts confidence sharpness and, ultimately, fusion selectivity and completeness.

7. Limitations and Practical Considerations

Despite their empirical success, CADF methods are subject to certain limitations:

Nonconvexity in alternating minimization can lead to local minima, particularly if confidence initialization is poor or data quality is low (Ntouskos et al., 2016).
Overconfidence and bias: Excessively peaked or biased confidence can suppress useful information in ambiguous regions, necessitating careful regularization and prior calibration.
Propagation of sensor bias: CADF does not inherently correct systematic sensor bias; confidence weighting must be combined with domain-specific correction or calibration to achieve optimal results.
Scalability: While efficient in design, large-scale or real-time deployment (e.g., in SLAM) can be bottlenecked by memory (feature volumes, parameter buffers) or runtime (multi-stage fusion passes) (Dufera et al., 21 Sep 2025).

CADF continues to provide a principled and extensible framework for reliable, dense estimation in multimodal and multi-view sensing, substantiated by consistent improvements over non-confidence-aware baselines in depth fusion, scene completion, SLAM, and cross-modal view synthesis.