
Uncertainty-Aware Noisy-Or Fusion (UNO)

Updated 1 July 2026
  • UNO is a multimodal, uncertainty-aware fusion framework that integrates deep segmentation predictions using spatially adaptive calibration and a probabilistic noisy-or rule.
  • It dynamically recalibrates per-modality softmax outputs with spatial temperature scaling based on both epistemic and aleatoric uncertainty to improve robustness under various degradations.
  • Empirical evaluations show that UNO and its UNO++ variant raise mean IoU over prior methods by up to 17.5 percentage points, with the largest gains under challenging, out-of-distribution conditions.

Uncertainty-Aware Noisy-Or Fusion (UNO) is a multimodal late-fusion framework for semantic segmentation that combines independent deep networks (“experts”) for each input modality (e.g., RGB, depth), integrating both probabilistic uncertainty estimates and spatially adaptive calibration. UNO aims to achieve robustness to both familiar and previously unseen input degradations by dynamically tempering each expert’s confidence according to estimated epistemic and aleatoric uncertainties, then combining the recalibrated predictions via a probabilistically principled noisy-or rule. This approach yields significant gains over prior state-of-the-art models, especially under challenging or out-of-distribution corruptions (Tian et al., 2019).

1. Model Architecture and Inference Pipeline

UNO employs a late-fusion architecture in which an independent segmentation network is assigned to each sensor modality. During inference, for each modality $m$, the input $x_m$ is processed by a pre-trained deep network to produce a set of per-class logits:

$$z_m(x_m) = [z_m^1(x_m), z_m^2(x_m), \dots, z_m^C(x_m)] \in \mathbb{R}^C.$$

In standard approaches, per-class probabilities are obtained via the softmax function:

$$p_m(c \mid x_m) = \frac{\exp(z_m^c(x_m))}{\sum_{c'=1}^{C} \exp(z_m^{c'}(x_m))}.$$

UNO interposes an uncertainty-aware recalibration of the softmax temperature, modulating the logits according to both global and spatial indicators of input quality. The resulting uncertainty-scaled probabilities $\tilde p_m$ from all modalities are finally fused per class by a noisy-or rule, preserving high confidence even if only a single expert is confident.

2. Uncertainty Quantification

UNO computes three uncertainty metrics per pixel and averages each one spatially to obtain an overall score. These are:

  • Predictive entropy: Quantifies total uncertainty under Monte Carlo dropout (MCDO) forward passes,

$$H[y \mid x] = -\sum_{c=1}^{C} \bar p(c)\,\log \bar p(c), \quad \bar p(c) = \frac{1}{T}\sum_{t=1}^{T} p(y = c \mid x, \hat\theta_t).$$

  • Mutual information: Captures epistemic (model) uncertainty as the difference between predictive entropy and expected entropy over dropout samples,

$$I[y, w \mid x] = H[y \mid x] - \frac{1}{T}\sum_{t=1}^{T} \left[ -\sum_{c=1}^{C} p(y = c \mid x, \hat\theta_t)\,\log p(y = c \mid x, \hat\theta_t) \right].$$

  • Deterministic (single-pass) entropy:

$$H_{\rm det}[y \mid x] = -\sum_{c=1}^{C} p(y = c \mid x, \theta)\,\log p(y = c \mid x, \theta).$$

Each metric delivers a distinct detection sensitivity for out-of-distribution and degraded inputs.
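The three metrics above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the array layout `(T, H, W, C)`, and the random toy data are assumptions, and the first stochastic pass stands in for a true deterministic (dropout-free) forward pass.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along the class axis."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def mc_dropout_uncertainties(mc_probs):
    """mc_probs: (T, H, W, C) softmax outputs from T stochastic passes.

    Returns per-pixel predictive entropy and mutual information, each (H, W).
    """
    mean_probs = mc_probs.mean(axis=0)             # \bar p(c)
    predictive = entropy(mean_probs)               # H[y | x]
    expected = entropy(mc_probs).mean(axis=0)      # mean entropy of single passes
    return predictive, predictive - expected       # second item is I[y, w | x]

# Toy data (hypothetical): T=8 passes, a 2x3 image, C=4 classes.
rng = np.random.default_rng(0)
raw = rng.random((8, 2, 3, 4))
mc_probs = raw / raw.sum(axis=-1, keepdims=True)

pred_entropy, mutual_info = mc_dropout_uncertainties(mc_probs)
det_entropy = entropy(mc_probs[0])  # stand-in for one deterministic pass

# Spatial averages give one scalar score per metric.
scores = {
    "predictive_entropy": float(pred_entropy.mean()),
    "mutual_information": float(mutual_info.mean()),
    "deterministic_entropy": float(det_entropy.mean()),
}
```

Because entropy is concave, the mutual information computed this way is non-negative up to floating-point error, and it is zero when all passes agree.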

3. Data-Dependent Spatial Temperature Scaling

UNO supplements global uncertainty with spatial temperature scaling via the TempNet auxiliary network. For each modality, TempNet predicts a 2D temperature map $T_m \in \mathbb{R}^{H \times W}$, with entries $T_m(i,j)$, conditioned on $x_m$:

$$T_m = \mathrm{TempNet}_m(x_m).$$

The spatial temperature is applied to each logit before the softmax:

$$p_m(c \mid x_m)(i,j) = \frac{\exp\bigl(z_m^c(i,j) / T_m(i,j)\bigr)}{\sum_{c'=1}^{C} \exp\bigl(z_m^{c'}(i,j) / T_m(i,j)\bigr)}.$$

TempNet is trained by minimizing negative log-likelihood over the per-pixel scaled softmax. This mechanism enables localized calibration, e.g., down-weighting fog-obscured distant regions, and captures region-specific uncertainty that global metrics may miss.
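As a rough sketch of per-pixel temperature scaling, the snippet below divides each pixel's logits by a temperature before the softmax, which is the usual temperature-scaling convention and is assumed here. The temperature map is written by hand as a stand-in for a TempNet output, and `spatially_scaled_softmax` is a hypothetical helper name.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the class axis."""
    shifted = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

def spatially_scaled_softmax(logits, temperature):
    """logits: (H, W, C); temperature: (H, W), strictly positive."""
    return softmax(logits / temperature[..., None])

# Same logits at every pixel of a 2x2 image with 3 classes.
logits = np.tile(np.array([2.0, 0.0, -1.0]), (2, 2, 1))

# Hand-written stand-in for a TempNet map: the bottom row is "foggy",
# so it receives a higher temperature and a flatter distribution.
temperature = np.array([[1.0, 1.0],
                        [4.0, 4.0]])

probs = spatially_scaled_softmax(logits, temperature)
```

A temperature above 1 lowers the peak probability without changing the predicted class, so the ranking of classes at each pixel is untouched and only the confidence changes.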

4. Uncertainty-Scaled Softmax and Deviation Ratios

The softmax input for each modality is dynamically sharpened or flattened by a deviation ratio $r_m$, which aggregates the normalized difference between current and training-phase statistics of each uncertainty metric. For uncertainty measure $u$, the per-metric ratio $r_{m,u}$ is computed from $\mu_u$ and $\sigma_u$, the mean and standard deviation of $u$ on clean data, and $\hat u$, its test-time value. The overall scaling factor is set as $r_m = \min_u r_{m,u}$, a conservative strategy that discounts the expert’s prediction if any metric signals unpredictability. The per-class probabilities are thus:

$$\tilde p_m(c \mid x_m) = \frac{\exp\bigl(r_m\, z_m^c(x_m)\bigr)}{\sum_{c'=1}^{C} \exp\bigl(r_m\, z_m^{c'}(x_m)\bigr)}.$$
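The scaling step can be illustrated as follows. The functional form of `deviation_ratio` below is an assumption chosen only for illustration (it equals 1 on clean-looking inputs and shrinks as the test-time value exceeds the clean mean by more standard deviations); it is not claimed to be the published formula. The clean-data statistics and test-time values are made up.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the class axis."""
    shifted = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

def deviation_ratio(u_test, mu_clean, sigma_clean):
    """ASSUMED illustrative form: 1 / (1 + positive z-score of u_test)."""
    z_score = max(0.0, (u_test - mu_clean) / sigma_clean)
    return 1.0 / (1.0 + z_score)

# Hypothetical (mean, std) of each metric measured on clean data.
clean_stats = {
    "predictive_entropy": (0.20, 0.05),
    "mutual_information": (0.02, 0.01),
    "deterministic_entropy": (0.18, 0.05),
}
# Hypothetical spatially averaged values on a degraded test input.
test_values = {
    "predictive_entropy": 0.45,
    "mutual_information": 0.03,
    "deterministic_entropy": 0.28,
}

ratios = {
    name: deviation_ratio(test_values[name], *clean_stats[name])
    for name in clean_stats
}
scale = min(ratios.values())  # conservative: the most alarmed metric wins

logits = np.array([3.0, 1.0, 0.0])
p_plain = softmax(logits)
p_scaled = softmax(scale * logits)  # flatter when scale < 1
```

Taking the minimum means that a single metric flagging the input is enough to flatten the expert's distribution, which is the conservative behaviour described above.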

5. Noisy-Or Fusion Rule

The final multimodal fusion is accomplished by the noisy-or rule, a probabilistic model in which each modality acts as an independent “cause” for class $c$. With $M$ modalities, the combined belief for class $c$ is computed as:

$$p(c \mid x_1, \dots, x_M) = 1 - \prod_{m=1}^{M} \bigl(1 - \tilde p_m(c \mid x_m)\bigr).$$

In this framework, high confidence from any single expert is preserved (i.e., if one $\tilde p_m(c \mid x_m)$ is close to 1, the fused probability is high for that class), and weak or noisy experts do not dominate the decision. No fusion parameters are trained; the method is zero-parameter at the fusion stage.
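A small worked example of the noisy-or rule, using the standard form $1 - \prod_m (1 - \tilde p_m)$. The two expert distributions are invented, and taking the argmax of the fused scores as the label is an assumption about how the fused beliefs are turned into a prediction.

```python
import numpy as np

def noisy_or_fusion(expert_probs):
    """expert_probs: (M, ..., C) recalibrated probabilities from M experts.

    Returns fused per-class scores of shape (..., C).
    """
    return 1.0 - np.prod(1.0 - expert_probs, axis=0)

# Hypothetical single-pixel outputs for 3 classes.
rgb = np.array([0.90, 0.05, 0.05])    # confident expert
depth = np.array([0.34, 0.33, 0.33])  # nearly uniform, e.g. after flattening

fused = noisy_or_fusion(np.stack([rgb, depth]))
label = int(np.argmax(fused))
```

For class 0 the fused score is 1 - (0.10)(0.66) = 0.934, so the confident expert's vote survives the uninformative one. The fused scores are never lower than any single expert's probability for the same class, and they do not generally sum to one.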

6. Training Procedures

The base segmentation networks are trained conventionally to minimize pixel-wise cross-entropy on clean data per modality. TempNet is subsequently trained with a negative log-likelihood loss using ground-truth labels and the pixel-wise temperature-scaled logits:

$$\mathcal{L}_{\rm NLL} = -\sum_{i,j} \log \frac{\exp\bigl(z_m^{y_{ij}}(i,j) / T_m(i,j)\bigr)}{\sum_{c'=1}^{C} \exp\bigl(z_m^{c'}(i,j) / T_m(i,j)\bigr)},$$

where $y_{ij}$ is the ground-truth class at pixel $(i,j)$ and $T_m(i,j)$ is the temperature TempNet predicts there. Neither the uncertainty scaling nor the noisy-or fusion rule introduces any additional trainable parameters. The reference uncertainty statistics (mean and standard deviation on clean data) are estimated from the training data and applied only at inference.
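The TempNet objective can be sketched as a per-pixel negative log-likelihood over temperature-scaled logits. This is an illustrative NumPy version with hypothetical names and toy data; it averages over pixels rather than summing, which only rescales the loss, and it evaluates the loss for two fixed temperature maps instead of training a network.

```python
import numpy as np

def log_softmax(z, axis=-1):
    """Numerically stable log-softmax along the class axis."""
    shifted = z - np.max(z, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

def temperature_nll(logits, temperature, labels):
    """Mean per-pixel NLL.

    logits: (H, W, C); temperature: (H, W), positive; labels: (H, W) ints.
    """
    log_probs = log_softmax(logits / temperature[..., None])
    picked = np.take_along_axis(log_probs, labels[..., None], axis=-1)
    return float(-picked.mean())

# Overconfident logits for class 0 at every pixel of a 2x2 image.
logits = np.tile(np.array([4.0, 0.0, 0.0]), (2, 2, 1))
# Ground truth: the prediction is wrong at one of the four pixels.
labels = np.array([[0, 0],
                   [0, 1]])

nll_t1 = temperature_nll(logits, np.ones((2, 2)), labels)
nll_t2 = temperature_nll(logits, np.full((2, 2), 2.0), labels)
```

On this toy input the higher temperature yields the lower loss, because softening the overconfident prediction reduces the penalty at the misclassified pixel by more than it costs at the correct ones.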

7. Empirical Evaluation and Ablation Analysis

UNO was evaluated on AirSim-generated urban RGB-D datasets with two training conditions (fog level 0, fog level 50) and diverse test degradations including fog (level 100), snow, frost, motion blur, brightness shifts, masking, impulse noise, Gaussian noise, and shot noise. Performance was measured by mean Intersection-over-Union (mIoU).
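For reference, mean IoU on a single label map can be computed as below. This is a generic sketch with made-up labels; whether the reported numbers accumulate intersections and unions over the whole test set or average per image is not stated here, so the per-image form is an assumption.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU, skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class appears in neither prediction nor ground truth
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Toy 2x3 label maps with 3 classes.
target = np.array([[0, 0, 1],
                   [1, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 2, 0]])

score = mean_iou(pred, target, num_classes=3)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2, giving a mean IoU of 0.5.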

Key results summarize comparative performance:

| Method | mIoU (avg. over all test conditions) | mIoU (in-distribution) | mIoU (degraded) |
|---|---|---|---|
| RGB-D SSMA | ≈ 62.5% | >80% | Collapses on severe |
| UNO | ≈ 78.4% | >80% | ≫ Baseline |
| UNO++ (with SSMA) | ≈ 80.0% | >80% | ≫ Baseline |

UNO improved mIoU by +15.9 pts over SSMA across all test conditions, and UNO++ by +17.5 pts (Tian et al., 2019). In-distribution performance remained comparable to other methods, but UNO degraded gracefully in conditions unseen at training, avoiding the catastrophic failures noted in learned fusion baselines under, for example, input blackout or severe noise.

Ablation studies indicated that single-pass entropy ($H_{\rm det}$) combined with spatial temperature scaling sufficed for robust uncertainty estimation; including the MCDO-based metrics incurred computational overhead without commensurate performance gains. Choosing the minimum deviation ratio across uncertainty measures proved a conservative and effective way to flag input corruption.

8. Context, Significance, and Limitations

UNO’s principal advance is its ability to handle unscripted, out-of-distribution input corruptions without fixed, explicit modeling of each degradation type. Dynamic uncertainty-driven recalibration, rather than hard learned gates or fixed fusion strategies, enables greater generalization to novel sensor failures or adversarial degradations. The spatial temperature maps provide region-adaptive modulation, which is critical for handling spatially structured corruption (e.g., distance-dependent fog). Noisy-or fusion preserves the influence of highly confident experts while suppressing unreliable ones, outperforming averaging or multiplicative fusion schemes, in which one unreliable expert can dilute or veto a correct signal.

A plausible implication is that zero-parameter, uncertainty-aware fusion schemes may be broadly applicable for multimodal systems where training-time anticipation of every input perturbation is impractical. The method’s reliance on only pre-trained segmentation networks and a compact auxiliary TempNet ensures tractability and deployment efficiency.

Observed limitations include increased inference cost for Monte Carlo-based uncertainty estimation, though ablation finds deterministic entropy plus learned temperature sufficient for typical scenarios. Empirical results are demonstrated in a photo-realistic simulation environment; real-world generality depends on the similarity between simulated and physical sensor failure modes.

UNO establishes a rigorous, extensible probabilistic foundation for multimodal sensor fusion under uncertainty, emphasizing adaptivity and out-of-distribution robustness without fusion-stage re-training (Tian et al., 2019).
