Aleatoric Entrenchment Overview

Updated 4 July 2026

Aleatoric entrenchment is the persistence of inherent, irreducible noise in outcomes even as data volume and model quality improve.
It is formalized via diverse approaches, including Bayesian risk ratios, entropy measures, and credal-set methods, highlighting its task-specific nature.
This concept underpins advances in uncertainty disentanglement, robust out-of-distribution detection, and fairness analysis by separating reducible from persistent uncertainty.

Aleatoric entrenchment denotes the persistence of irreducible uncertainty after reducible uncertainty has been reduced, but the term is not used uniformly across the literature. In work on uncertainty disentanglement, it appears as an assumption or phenomenon that the aleatoric component should remain stable as data size increases, under distribution shift, and with improved modeling (Valdenegro-Toro et al., 2022). In credal uncertainty modeling, it is operationalized as high intrinsic ambiguity that remains high even as epistemic uncertainty decreases (Mukherjee et al., 11 Feb 2026). In a decision-theoretic treatment, it is the fraction of decision-relevant uncertainty that is irreducible for a specified task and loss (Smith et al., 2024). Related formulations also arise in Bayesian out-of-distribution detection, where aleatoric uncertainty becomes a posterior-stable signal for inputs without a well-defined task label (Wang et al., 2021), in fairness analysis as the irreducible price of fairness (Wang et al., 2023), and in robust uncertainty quantification as the persistence of randomness induced by variables with known laws (Chowdhary et al., 2011). Taken together, these formulations distinguish aleatoric entrenchment from generic predictive uncertainty by emphasizing what remains invariant, or remains dominant, after improvements in estimation, model coverage, or data volume.

1. Conceptual definitions and scope

The underlying distinction is the standard one. Aleatoric uncertainty is stochastic variability in outcomes conditioned on inputs, arising from intrinsic noise in the data-generating process; for fixed parameters $w$ , it is captured by $H[y\mid x,w]$ or $\operatorname{Var}[y\mid x,w]$ (Valdenegro-Toro et al., 2022). Epistemic uncertainty is ignorance about parameters or models due to finite data, model misspecification, or posterior multimodality, and is reducible with more data under well-specified models (Valdenegro-Toro et al., 2022).

Across the cited work, aleatoric entrenchment refers to the irreducible side of that distinction, but with different formal targets. In the disentanglement literature, it is the assumption that the aleatoric component “remains irreducible and stable as data size increases, under distribution shift, and with improved modeling” (Valdenegro-Toro et al., 2022). In the credal-set literature, entrenched aleatoric uncertainty is high entropy of the ground-truth label distribution, $H[p^*(\cdot\mid x)]$ , that remains high as epistemic divergence decreases (Mukherjee et al., 11 Feb 2026). In the decision-theoretic literature, it is quantified as the ratio of Bayes risk to total predictive risk for a chosen loss (Smith et al., 2024). In fairness analysis, the same structural idea is the performance gap between unconstrained and fairness-constrained Bayes risk, which persists even with perfect knowledge of the data-generating distribution (Wang et al., 2023).

A concise comparison is useful because the term is polysemous rather than singular.

Source	Formal object	Meaning of entrenchment
(Valdenegro-Toro et al., 2022)	Stability of estimated aleatoric component	Aleatoric should not be reallocated as epistemic shrinks
(Mukherjee et al., 11 Feb 2026)	$H[p^*(\cdot\mid x)]$ or aligned $\sigma_{\text{ale}}$	Persistent ambiguity under improved estimation
(Smith et al., 2024)	$\mathrm{AE}_\ell = R^*(\ell)/R(\hat a_D,\ell)$	Fraction of risk that is irreducible
(Wang et al., 2021)	High expected conditional entropy or $P(\mathrm{Undef}\mid x)$	Stable signal for undefined-label OOD inputs
(Wang et al., 2023)	$D_{\text{aleatoric}} = R^*_{\text{fair}} - R^*_{\text{uncon}}$	Irreducible price of fairness
(Chowdhary et al., 2011)	Residual risk under known aleatoric law	Randomness that persists under epistemic robustness

This suggests that aleatoric entrenchment is best treated as a family of related formalisms rather than a single canonical definition. The common thread is irreducibility relative to some intervention class: more data, better posterior approximation, stronger model class, fairness post-processing, or tighter ambiguity sets.

2. Formalizations of irreducibility

In Bayesian predictive modeling, disentanglement usually starts from

$p(y\mid x,D)=\int p(y\mid x,w)p(w\mid D)\,dw.$

For classification, predictive entropy is

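$H[y\mid x,D] = -\sum_{c} p(y=c\mid x,D)\,\log p(y=c\mid x,D),$

with aleatoric uncertainty defined as the expected conditional entropy $\mathbb{E}_{p(w\mid D)}\big[H[y\mid x,w]\big]$ and epistemic uncertainty as mutual information,

$I[y;w\mid x,D] = H[y\mid x,D] - \mathbb{E}_{p(w\mid D)}\big[H[y\mid x,w]\big].$

For regression and logit-space classification, the same paper uses the law of total variance, with aleatoric uncertainty $\mathbb{E}_{p(w\mid D)}\big[\operatorname{Var}[y\mid x,w]\big]$ and epistemic uncertainty $\operatorname{Var}_{p(w\mid D)}\big[\mathbb{E}[y\mid x,w]\big]$ (Valdenegro-Toro et al., 2022). In that setting, a strong entrenchment assumption would mean that the aleatoric term is invariant to changes in uncertainty-quantification method and to reductions in epistemic uncertainty.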
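The entropy-based split can be sketched for a single input as follows. This is a minimal illustration, assuming the posterior is represented by a finite list of per-sample class-probability vectors (for example, ensemble members); the function names and the toy inputs are hypothetical and not taken from the cited paper.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution; zero entries are skipped."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def decompose_entropy(member_probs):
    """Split predictive entropy into aleatoric and epistemic parts.

    member_probs: list of class-probability vectors, one per posterior
    sample w_m (assumed representation; e.g. ensemble members).
    """
    n = len(member_probs)
    k = len(member_probs[0])
    # Posterior-averaged predictive distribution p(y | x, D).
    mean_p = [sum(p[c] for p in member_probs) / n for c in range(k)]
    total = entropy(mean_p)                                  # H[y | x, D]
    aleatoric = sum(entropy(p) for p in member_probs) / n    # E_w H[y | x, w]
    epistemic = total - aleatoric                            # mutual information
    return total, aleatoric, epistemic

# Members agree that the label is ambiguous: all uncertainty is aleatoric.
agree = decompose_entropy([[0.5, 0.5], [0.5, 0.5]])
# Members are individually confident but disagree: mostly epistemic.
disagree = decompose_entropy([[0.99, 0.01], [0.01, 0.99]])
```

Both toy inputs have the same averaged prediction, so the same total entropy, yet the split differs completely. That is the sense in which both components are computed from one predictive mixture.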

The credal-set formulation replaces single-distribution decomposition by a set of predictive distributions, assigning epistemic uncertainty to the size of the set and aleatoric uncertainty to the noise within its elements (Mukherjee et al., 11 Feb 2026). In the Variational Credal Concept Bottleneck Model, the credal set is constructed in logit space.

There, aleatoric entrenchment is explicitly the case in which $H[p^*(\cdot\mid x)]$ remains high while epistemic uncertainty decreases (Mukherjee et al., 11 Feb 2026).

The decision-theoretic account makes the irreducible/reducible split depend on the task and loss. For action space $\mathcal{A}$, loss $\ell$, and true conditional $p^*(y\mid x)$, the pointwise Bayes risk is

$R^*(x;\ell) = \inf_{a\in\mathcal{A}} \mathbb{E}_{y\sim p^*(\cdot\mid x)}\big[\ell(a,y)\big],$

and the global Bayes risk is

$R^*(\ell) = \mathbb{E}_{x}\big[R^*(x;\ell)\big].$

For a learned predictor $\hat a_D$ with risk $R(\hat a_D,\ell)$, the excess risk is

$R(\hat a_D,\ell) - R^*(\ell).$

Aleatoric entrenchment is then quantified by

$\mathrm{AE}_\ell = \frac{R^*(\ell)}{R(\hat a_D,\ell)},$

with a local version

$\mathrm{AE}_\ell(x) = \frac{R^*(x;\ell)}{R(\hat a_D,x;\ell)},$

where $R(\hat a_D,x;\ell)$ is the predictor's expected loss at $x$. Under log loss, $R^*(x;\ell) = H[p^*(\cdot\mid x)]$; under $0$–$1$ loss, $R^*(x;\ell) = 1-\max_y p^*(y\mid x)$; under squared loss, $R^*(x;\ell) = \operatorname{Var}_{p^*}[y\mid x]$ (Smith et al., 2024). This makes entrenchment loss-dependent rather than universal.
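The ratio of Bayes risk to predictor risk can be sketched at a single input. This is a minimal illustration, assuming a known true conditional and a probabilistic predictor scored by log loss; the distributions and function names are hypothetical, chosen only to make the arithmetic concrete.

```python
import math

def bayes_risk_log(p_true):
    """Pointwise Bayes risk under log loss: the entropy of the true conditional."""
    return -sum(p * math.log(p) for p in p_true if p > 0.0)

def bayes_risk_zero_one(p_true):
    """Pointwise Bayes risk under 0-1 loss: one minus the modal probability."""
    return 1.0 - max(p_true)

def expected_log_loss(p_true, q_pred):
    """Expected log loss of predictor q when labels follow p (cross-entropy)."""
    return -sum(p * math.log(q) for p, q in zip(p_true, q_pred) if p > 0.0)

def entrenchment_ratio(bayes_risk, predictor_risk):
    """Share of the predictor's risk that is irreducible (Bayes risk / total risk)."""
    return bayes_risk / predictor_risk

# Hypothetical true conditional and an under-confident prediction at one input.
p_true = [0.7, 0.1, 0.1, 0.1]
q_pred = [0.4, 0.2, 0.2, 0.2]

ae_imperfect = entrenchment_ratio(bayes_risk_log(p_true),
                                  expected_log_loss(p_true, q_pred))
# A predictor that matches the true conditional has no reducible risk left.
ae_perfect = entrenchment_ratio(bayes_risk_log(p_true),
                                expected_log_loss(p_true, p_true))
```

The ratio reaches one only when the predictor matches the true conditional, so it rises as the predictor improves even though the underlying noise is unchanged.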

A related but distinct formalization appears in robust uncertainty quantification. There, variables with exactly known distributions are treated as aleatoric, epistemic ambiguity is represented by KL-balls around nominal laws, and hybrid risk-sensitive functionals retain expectation over the aleatoric variables even in the support-only limit for the epistemic ones (Chowdhary et al., 2011). In that limit, ignorance about the epistemic variables does not eliminate the variability induced by the aleatoric ones (Chowdhary et al., 2011).

3. Mechanisms by which aleatoric entrenchment is preserved or broken

One line of work challenges a strong invariance view of entrenchment. In “A Deeper Look into Aleatoric and Epistemic Uncertainty Disentanglement,” the reported interaction between learning aleatoric and epistemic components violates the assumption that aleatoric depends only on data while epistemic depends only on the model posterior (Valdenegro-Toro et al., 2022). The paper states that the uncertainty-quantification method and training loss affect both components, that aleatoric uncertainty is unreliable in out-of-distribution settings, and that some methods, notably Flipout, collapse epistemic uncertainty and reallocate uncertainty mass to the aleatoric head (Valdenegro-Toro et al., 2022). Under that reading, apparent entrenchment can be an artifact of the estimator.

The 2026 credal-set work gives a different diagnosis of why standard decompositions fail. It argues that most practical decompositions compute both components from the same predictive distribution, so changes in predictive spread simultaneously affect both estimates, producing strong correlation and blurring semantics (Mukherjee et al., 11 Feb 2026). Its empirical summary reports high correlation between the two estimates for baselines, which it calls the “algebraic trap,” and proposes disjoint parameters, disjoint objectives, and non-overlapping gradient paths to prevent leakage between the two signals (Mukherjee et al., 11 Feb 2026). In that setting, entrenchment is preserved by construction rather than inferred from a single predictive mixture.

Distribution shift is a central failure mode across formulations. In the disentanglement paper’s toy heteroscedastic regression, ensembles and Flipout often output nearly constant aleatoric variance in OOD regions even when the true noise increases with the input $x$ (Valdenegro-Toro et al., 2022). In Bayesian OOD detection, aleatoric entrenchment is explicitly tied only to a specific kind of OOD point: an input without a well-defined task label. The curation model gives

$P(\mathrm{Undef}\mid x) = 1 - \sum_{y} p(y\mid x)^{S},$

where $S$ is the number of annotators and $p(y\mid x)$ are the per-annotator class probabilities; it is largest when these probabilities are uniform (Wang et al., 2021). This means aleatoric uncertainty is entrenched only for undefined-label OOD inputs; OOD inputs with well-defined labels need not display high aleatoric uncertainty (Wang et al., 2021).
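The behaviour of the undefined-label probability can be sketched numerically. This is a minimal illustration under an explicit assumption: a consensus curation model in which each of `num_annotators` annotators labels independently from the same class probabilities, and the label counts as defined only when all of them agree. That specific form, the function name, and the example values are assumptions of the sketch rather than quotations from the cited paper.

```python
def prob_undefined(class_probs, num_annotators):
    """Probability that independent annotators fail to agree unanimously.

    Assumed model: each annotator draws a label from class_probs, and the
    label is 'defined' only if every annotator picks the same class.
    """
    prob_all_agree = sum(p ** num_annotators for p in class_probs)
    return 1.0 - prob_all_agree

# Uniform per-annotator probabilities over 10 classes: consensus is very unlikely.
uniform = prob_undefined([0.1] * 10, num_annotators=3)
# A clearly labelled input: consensus is likely.
peaked = prob_undefined([0.91] + [0.01] * 9, num_annotators=3)
# A deterministic label: consensus is certain.
certain = prob_undefined([1.0] + [0.0] * 9, num_annotators=3)
```

Under this assumed model the undefined-label probability is driven by how flat the per-annotator distribution is. It says nothing about inputs that are far from the training data but still unambiguous.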

A further limitation comes from model misspecification and evaluation mismatch. The decision-theoretic analysis argues that popular information-theoretic quantities such as expected conditional entropy and mutual information are estimators of predictive entropy, not guaranteed measures of decision-relevant irreducible and reducible components under arbitrary losses (Smith et al., 2024). This implies that entrenchment defined through entropy can disagree with entrenchment defined through Bayes risk when the task loss is not log loss, when the model is misspecified, or when the posterior is ill-behaved.

4. Architectures and estimation strategies

The methodologies used to study or enforce aleatoric entrenchment vary sharply by domain. In standard Bayesian deep learning, the disentanglement paper generalizes Kendall and Gal’s formulation beyond MC Dropout to deep ensembles, MC DropConnect, and Flipout, and evaluates both entropy-based and variance-based decompositions (Valdenegro-Toro et al., 2022). For classification, logit uncertainty is propagated into probability space through a sampling softmax,

$p(y\mid x) \approx \frac{1}{N}\sum_{j=1}^{N}\operatorname{softmax}(\hat z_j),$

with logit samples $\hat z_j \sim \mathcal{N}(\mu(x),\sigma^2(x))$ or with parameter samples $w_j \sim p(w\mid D)$ (Valdenegro-Toro et al., 2022). The same study recommends choosing the number of samples $N$ large enough that the probability of misclassification due to sampling error approaches zero (Valdenegro-Toro et al., 2022).
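A sampling softmax of this kind can be sketched with the standard library alone. This is a minimal illustration, assuming independent Gaussian noise on each logit with a predicted mean and standard deviation; the function names, the example logits, and the sample count are hypothetical and are not the settings used in the cited study.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sampling_softmax(mean_logits, std_logits, num_samples, rng):
    """Average softmax outputs over Gaussian logit samples (Monte Carlo estimate)."""
    k = len(mean_logits)
    acc = [0.0] * k
    for _ in range(num_samples):
        # Draw one noisy logit vector, assuming independent noise per class.
        z = [rng.gauss(mu, sd) for mu, sd in zip(mean_logits, std_logits)]
        for c, pc in enumerate(softmax(z)):
            acc[c] += pc
    return [a / num_samples for a in acc]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
mean_logits = [2.0, 0.0, -1.0]
noisy = sampling_softmax(mean_logits, [0.5, 0.5, 0.5], 1000, rng)
# With zero logit noise the estimate reduces to the ordinary softmax.
noiseless = sampling_softmax(mean_logits, [0.0, 0.0, 0.0], 10, rng)
```

Averaging in probability space rather than logit space is what lets predicted logit variance show up as a flatter class distribution.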

The V-CCBM architecture is a direct attempt to separate entrenched aleatoric uncertainty from reducible epistemic uncertainty structurally. Its pipeline is input $\rightarrow$ encoder $\rightarrow$ concept bottleneck $\rightarrow$ task, with a frozen encoder and orthogonal projections into disjoint subspaces for the mean, epistemic, and aleatoric heads (Mukherjee et al., 11 Feb 2026). The aleatoric head is trained only against annotator disagreement entropy, whereas the epistemic head uses detached error supervision plus credal-set regularization. With frozen encoder, orthogonal projections, and stop-gradient in the epistemic loss, the paper states that the aleatoric and epistemic parameters are updated only by their respective losses (Mukherjee et al., 11 Feb 2026).

In regression, CLEAR treats aleatoric and epistemic components as separately estimated interval widths and then calibrates their combination into a single prediction interval, with aleatoric uncertainty estimated by residual quantile regression and epistemic uncertainty by PCS ensembles (Azizi et al., 10 Jul 2025). The paper then proposes an aleatoric entrenchment index built from the total aleatoric and epistemic width contributions (Azizi et al., 10 Jul 2025). In this framework, entrenchment is dominance of calibrated aleatoric width rather than entropy or variance.

For robust performance analysis with mixed aleatoric and epistemic inputs, the 2011 risk-sensitive framework uses relative entropy and polynomial chaos expansions. Aleatoric variables with known laws are integrated under their known distribution, while epistemic variables are varied within KL-balls around a nominal law (Chowdhary et al., 2011). The resulting hybrid functional yields robustification with respect to epistemic uncertainty while preserving aleatoric averaging (Chowdhary et al., 2011). That construction is useful precisely because it prevents epistemic robustness from absorbing irreducible stochastic variation.
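The general idea of averaging over aleatoric variables while robustifying over epistemic ones can be sketched on discrete toy distributions. This is an illustration only: it uses the standard relative-entropy (Donsker–Varadhan) bound, under which the worst-case expectation over a KL-ball of radius eta around a nominal law is at most (1/c)(log E[exp(c g)] + eta) for any c > 0. It is not the polynomial-chaos implementation of the cited paper, and the performance function, distributions, and names are all hypothetical.

```python
import math

def aleatoric_average(f, x, y_values, y_probs):
    """g(x) = E_Y[f(x, Y)], integrating Y under its known (aleatoric) law."""
    return sum(py * f(x, y) for y, py in zip(y_values, y_probs))

def kl_robust_bound(g_values, nominal_probs, kl_radius, c_grid):
    """Upper bound on the worst-case mean of g over a KL-ball around the nominal law.

    For each c > 0 the Donsker-Varadhan inequality gives the bound
    (log E_P[exp(c g)] + kl_radius) / c; the smallest value on the grid is returned.
    """
    best = float("inf")
    for c in c_grid:
        shift = max(c * g for g in g_values)  # log-sum-exp shift to avoid overflow
        log_mgf = shift + math.log(
            sum(p * math.exp(c * g - shift) for g, p in zip(g_values, nominal_probs))
        )
        best = min(best, (log_mgf + kl_radius) / c)
    return best

def f(x, y):
    """Hypothetical performance function of epistemic x and aleatoric y."""
    return x + y * y

x_values = [0.0, 1.0, 2.0]            # epistemic variable, nominal law below
x_nominal = [1 / 3, 1 / 3, 1 / 3]
y_values = [-1.0, 0.0, 1.0]           # aleatoric variable with a known law
y_probs = [0.25, 0.5, 0.25]

g_values = [aleatoric_average(f, x, y_values, y_probs) for x in x_values]
c_grid = [0.05 * i for i in range(1, 2001)]
nominal_mean = sum(p * g for p, g in zip(x_nominal, g_values))
tight = kl_robust_bound(g_values, x_nominal, 0.1, c_grid)
wide = kl_robust_bound(g_values, x_nominal, 1.0, c_grid)
```

In this toy setup the aleatoric contribution of 0.5 from the squared noise term is present in every entry of `g_values`. Widening the KL-ball shifts weight among the epistemic values but never removes that averaged contribution.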

5. Empirical evidence across tasks

The empirical record is mixed. The strongest challenge to naïve entrenchment comes from the disentanglement study on toy heteroscedastic regression and FER+ classification. It reports that Flipout produces near-zero epistemic uncertainty across both settings, that aleatoric uncertainty is unreliable OOD, and that deep ensembles provide the best disentangling quality (Valdenegro-Toro et al., 2022). On FER+, test accuracy and loss are reported for the Baseline, Dropout, DropConnect, Flipout, and Ensembles models (Valdenegro-Toro et al., 2022). The same paper also reports that $\beta$-NLL improves aleatoric estimation for Dropout and DropConnect but also shifts epistemic behavior, again contradicting the idea that aleatoric should remain stable as epistemic changes (Valdenegro-Toro et al., 2022).

By contrast, the credal-set experiments report structural decorrelation and improved semantic alignment. Across CEBaB, HateXplain, GoEmotions, MAQA, and AmbigQA, V-CCBM is reported to reduce the correlation between epistemic and aleatoric estimates relative to baselines and to improve semantic alignment, with dataset-level results highlighted for CEBaB and MAQA* (Mukherjee et al., 11 Feb 2026). The same study compares AUROC degradation under high ambiguity for V-CCBM against baselines on MAQA* and AmbigQA* (Mukherjee et al., 11 Feb 2026). These results support the claim that entrenched aleatoric uncertainty can be estimated as a distinct object if it is supervised directly and geometrically separated from epistemic uncertainty.

The OOD detection evidence supports a domain-specific version of entrenchment. Using CIFAR-10 and CIFAR-100 as in-distribution data, Downsampled ImageNet for outlier exposure, and Gaussian noise, Rademacher noise, Blob shapes, Texture, and SVHN at test time, the Bayesian OE method is reported to outperform both a Bayesian neural network trained on in-distribution data only and aleatoric-only outlier exposure (Wang et al., 2021). Results are reported as AUROC and FPR@95 for each OOD set, with substantial gains on more challenging sets such as Texture and SVHN (Wang et al., 2021). The important qualification is that the mechanism is tailored to OOD inputs with undefined labels rather than all forms of shift (Wang et al., 2021).

Evidence against off-the-shelf proxies also appears in conformal prediction. On CIFAR-10H, MLRSNet, FER+, and ImageNet-ReaL, using three conformal predictors and eight deep models, the 2025 study finds that the correlation between prediction-set size and human ambiguity is predominantly very weak to weak across the dataset–model–CP combinations, with only one moderate case (Hagos et al., 6 Sep 2025). For example, MLRSNet correlations between set size and distinct human labels are near zero across methods, while correlations with softmax entropy are strong to very strong (Hagos et al., 6 Sep 2025). This suggests that coverage-preserving predictive sets can track model confidence without faithfully tracking entrenched human ambiguity.

6. Extensions to fairness, decision-making, and open problems

In fairness analysis, aleatoric entrenchment is not ambiguity within labels but an irreducible performance cost induced by the data distribution and the fairness constraint. Let $R^*_{\text{uncon}}$ be unconstrained Bayes risk and $R^*_{\text{fair}}$ the fairness-constrained Bayes risk. The paper defines aleatoric discrimination as

$D_{\text{aleatoric}} = R^*_{\text{fair}} - R^*_{\text{uncon}},$

with epistemic discrimination the gap between a learned fair model and the fairness frontier (Wang et al., 2023). On Adult, COMPAS, German Credit, and HSLS, state-of-the-art fairness interventions are reported to operate near the estimated fairness frontier on standard tabular datasets, implying small epistemic discrimination there, while disparate missingness patterns substantially degrade the frontier and increase aleatoric entrenchment (Wang et al., 2023). In that formulation, entrenched aleatoric structure is a limit to fair performance rather than a property of predictive entropy.

The decision-theoretic perspective generalizes this logic beyond fairness. Because entrenchment is defined as the ratio of irreducible Bayes risk to total risk, it can increase either because the predictor improves or because the data-generating process becomes noisier under the relevant loss (Smith et al., 2024). This clarifies why different applications can report apparently contradictory behavior: under log loss, high conditional entropy yields high entrenchment; under $0$–$1$ loss, the same conditional can have modest Bayes error if the modal class is still dominant (Smith et al., 2024). A plausible implication is that the phrase “aleatoric entrenchment” should always be accompanied by its task, loss, and evaluation distribution.
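As a worked illustration of this loss dependence (the distribution is a hypothetical example, not taken from the cited work), consider a four-class conditional $p^*(\cdot\mid x) = (0.7, 0.1, 0.1, 0.1)$. Under log loss the pointwise Bayes risk is the entropy,

$H[p^*(\cdot\mid x)] = -0.7\ln 0.7 - 3\,(0.1\ln 0.1) \approx 0.250 + 0.691 \approx 0.940\ \text{nats},$

which is about 68% of the four-class maximum $\ln 4 \approx 1.386$. Under $0$–$1$ loss the pointwise Bayes risk is

$1 - \max_y p^*(y\mid x) = 1 - 0.7 = 0.3,$

which is 40% of the four-class maximum $0.75$. The irreducible part is therefore a larger share of the worst-case risk under log loss than under $0$–$1$ loss, even though the conditional distribution is the same.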

Several practical implications recur across the literature. Deep ensembles are repeatedly favored for disentanglement quality in standard Bayesian neural settings (Valdenegro-Toro et al., 2022). Structural separation with disjoint heads and isolated gradients can dramatically reduce correlation between epistemic and aleatoric signals (Mukherjee et al., 11 Feb 2026). In regression, the learned trade-off parameter in CLEAR can diagnose whether uncertainty is dominated by aleatoric or epistemic contributions; in the Ames Housing case study, both this parameter and the epistemic-to-aleatoric width ratio shift when the model moves from two predictors to all features (Azizi et al., 10 Jul 2025). In OOD detection, aleatoric entrenchment is actionable only when the relevant OOD mechanism is undefined-label ambiguity (Wang et al., 2021).

The open problems are correspondingly precise. The disentanglement paper highlights that there is no universal guarantee that aleatoric uncertainty is independent of the uncertainty-quantification method and notes that entropy-based decompositions in classification are not additive in probability space (Valdenegro-Toro et al., 2022). The credal-set paper notes dependence on multi-annotator signals, parameterization sensitivity, and the approximate status of its Hausdorff KL geometry (Mukherjee et al., 11 Feb 2026). The decision-theoretic critique argues that entropy-based proxies can be poor estimators of decision-relevant irreducible and reducible uncertainty outside well-specified, calibrated, log-loss settings (Smith et al., 2024). The conformal prediction study further indicates that widely used post hoc uncertainty sets may not capture entrenched ambiguity even when marginal coverage is correct (Hagos et al., 6 Sep 2025).

Aleatoric entrenchment therefore functions less as a settled quantity than as a unifying research question: which part of predictive uncertainty is fundamentally irreducible, under which formalism, and under which interventions does that irreducible component remain stable? The cited work agrees on the existence of irreducible uncertainty, but disagrees on whether standard estimators isolate it, on which geometric or decision-theoretic object best represents it, and on how robustly it survives distribution shift, model misspecification, and evaluation mismatch.